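I wrote a test program in which a client sends data to a server, and the server's reads are triggered manually. After the client had sent part of its data it could not send any more; at that point the server began reading 1 KB at a time.
Intuitively, once the server has read 1 KB the client should be able to send again. In practice the client stayed blocked, and could only resume sending after the server had read a certain amount of data.
The tcpdump capture follows: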
- 11:42:40.217984 IP localhost.6379 > localhost.28944: . ack 65665 win 0 <nop,nop,timestamp 1816613366 1816613366>
- 0x0000: 4500 0034 5e08 4000 4006 deb9 7f00 0001 E..4^.@.@.......
- 0x0010: 7f00 0001 18eb 7110 7c79 0efb 7c5f 2ff1 ......q.|y..|_/.
- 0x0020: 8010 0000 3a7f 0000 0101 080a 6c47 51f6 ....:.......lGQ.
- 0x0030: 6c47 51f6 lGQ.
- 11:42:40.425034 IP localhost.28944 > localhost.6379: . ack 1 win 257 <nop,nop,timestamp 1816613573 1816613366>
- 0x0000: 4500 0034 7f94 4000 4006 bd2d 7f00 0001 E..4..@.@..-....
- 0x0010: 7f00 0001 7110 18eb 7c5f 2ff0 7c79 0efb ....q...|_/.|y..
- 0x0020: 8010 0101 38b0 0000 0101 080a 6c47 52c5 ....8.......lGR.
- 0x0030: 6c47 51f6 lGQ.
- 11:42:40.425047 IP localhost.6379 > localhost.28944: . ack 65665 win 0 <nop,nop,timestamp 1816613573 1816613366>
- 0x0000: 4500 0034 5e09 4000 4006 deb8 7f00 0001 E..4^.@.@.......
- 0x0010: 7f00 0001 18eb 7110 7c79 0efb 7c5f 2ff1 ......q.|y..|_/.
- 0x0020: 8010 0000 39b0 0000 0101 080a 6c47 52c5 ....9.......lGR.
- 0x0030: 6c47 51f6 lGQ.
- 11:42:40.838967 IP localhost.28944 > localhost.6379: . ack 1 win 257 <nop,nop,timestamp 1816613987 1816613573>
- 0x0000: 4500 0034 7f95 4000 4006 bd2c 7f00 0001 E..4..@.@..,....
- 0x0010: 7f00 0001 7110 18eb 7c5f 2ff0 7c79 0efb ....q...|_/.|y..
- 0x0020: 8010 0101 3643 0000 0101 080a 6c47 5463 ....6C......lGTc
- 0x0030: 6c47 52c5 lGR.
- 11:42:40.838983 IP localhost.6379 > localhost.28944: . ack 65665 win 0 <nop,nop,timestamp 1816613987 1816613366>
- 0x0000: 4500 0034 5e0a 4000 4006 deb7 7f00 0001 E..4^.@.@.......
- 0x0010: 7f00 0001 18eb 7110 7c79 0efb 7c5f 2ff1 ......q.|y..|_/.
- 0x0020: 8010 0000 3812 0000 0101 080a 6c47 5463 ....8.......lGTc
- 0x0030: 6c47 51f6 lGQ.
- 11:42:41.666922 IP localhost.28944 > localhost.6379: . ack 1 win 257 <nop,nop,timestamp 1816614815 1816613987>
- 0x0000: 4500 0034 7f96 4000 4006 bd2b 7f00 0001 E..4..@.@..+....
- 0x0010: 7f00 0001 7110 18eb 7c5f 2ff0 7c79 0efb ....q...|_/.|y..
- 0x0020: 8010 0101 3169 0000 0101 080a 6c47 579f ....1i......lGW.
- 0x0030: 6c47 5463 lGTc
- 11:42:41.666939 IP localhost.6379 > localhost.28944: . ack 65665 win 0 <nop,nop,timestamp 1816614815 1816613366>
- 0x0000: 4500 0034 5e0b 4000 4006 deb6 7f00 0001 E..4^.@.@.......
- 0x0010: 7f00 0001 18eb 7110 7c79 0efb 7c5f 2ff1 ......q.|y..|_/.
- 0x0020: 8010 0000 34d6 0000 0101 080a 6c47 579f ....4.......lGW.
- 0x0030: 6c47 51f6 lGQ.
- 11:42:43.322908 IP localhost.28944 > localhost.6379: . ack 1 win 257 <nop,nop,timestamp 1816616471 1816614815>
- 0x0000: 4500 0034 7f97 4000 4006 bd2a 7f00 0001 E..4..@.@..*....
- 0x0010: 7f00 0001 7110 18eb 7c5f 2ff0 7c79 0efb ....q...|_/.|y..
- 0x0020: 8010 0101 27b5 0000 0101 080a 6c47 5e17 ....'.......lG^.
- 0x0030: 6c47 579f lGW.
- 11:42:43.322921 IP localhost.6379 > localhost.28944: . ack 65665 win 0 <nop,nop,timestamp 1816616471 1816613366>
- 0x0000: 4500 0034 5e0c 4000 4006 deb5 7f00 0001 E..4^.@.@.......
- 0x0010: 7f00 0001 18eb 7110 7c79 0efb 7c5f 2ff1 ......q.|y..|_/.
- 0x0020: 8010 0000 2e5e 0000 0101 080a 6c47 5e17 .....^......lG^.
- 0x0030: 6c47 51f6 lGQ.
- 11:42:46.634889 IP localhost.28944 > localhost.6379: . ack 1 win 257 <nop,nop,timestamp 1816619783 1816616471>
- 0x0000: 4500 0034 7f98 4000 4006 bd29 7f00 0001 E..4..@.@..)....
- 0x0010: 7f00 0001 7110 18eb 7c5f 2ff0 7c79 0efb ....q...|_/.|y..
- 0x0020: 8010 0101 144d 0000 0101 080a 6c47 6b07 .....M......lGk.
- 0x0030: 6c47 5e17 lG^.
The capture shows the server returning a large number of "ack 65665 win 0" packets.
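The client-to-server segments in the capture (port 28944 to 6379) are spaced further and further apart. They look like zero-window probes sent with exponential backoff; that reading is my interpretation, not something the original text states. A minimal sketch that checks the spacing, with the timestamps copied from the capture above and the helper name `probe_gaps` made up for illustration:

```python
from datetime import datetime

# Timestamps of the client -> server segments, copied from the capture above.
PROBE_TIMES = [
    "11:42:40.425034",
    "11:42:40.838967",
    "11:42:41.666922",
    "11:42:43.322908",
    "11:42:46.634889",
]


def probe_gaps(times):
    """Return the gap in seconds between consecutive timestamps (hypothetical helper)."""
    parsed = [datetime.strptime(t, "%H:%M:%S.%f") for t in times]
    return [(later - earlier).total_seconds()
            for earlier, later in zip(parsed, parsed[1:])]


gaps = probe_gaps(PROBE_TIMES)
# Ratio of each gap to the one before it; a value near 2 means the interval doubles.
ratios = [later / earlier for earlier, later in zip(gaps, gaps[1:])]

for gap in gaps:
    print(f"{gap:.6f} s")
print([round(r, 2) for r in ratios])
```

Each gap is roughly twice the previous one, which would be consistent with a sender that keeps probing a closed window and backs off between attempts.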
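After some reading, it turned out that this behavior comes from TCP flow control. There is a lot of material involved, so only the key points are summarized here:
1) The "win 0" in "ack 65665 win 0" is the server telling the client: my TCP receive window is full and there is no room left. On receiving such a packet, the client stops sending data.
2) Why does the server keep returning "ack 65665 win 0" after it has read some data and its window is no longer full?
This is required by the TCP protocol. Once the window has filled up, the receiver avoids being filled again almost immediately by waiting until the free space reaches half the buffer size or one MSS, whichever is smaller, before telling the client it may continue, that is, before sending an ACK whose win is no longer 0. See the following explanation: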
To avoid SWS, we simply make the rule that the receiver may not update its advertised receive window in such a way that this leaves too little usable window space on the part of the sender. In other words, we restrict the receiver from moving the right edge of the window by too small an amount. The usual minimum that the edge may be moved is either the value of the MSS parameter, or one-half the buffer size, whichever is less.
Testing and a look at the code suggest that on Linux the threshold is the MSS.
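The rule quoted above can be sketched as a small simulation. This is only an illustration of the rule as quoted, not the Linux kernel's actual logic; the function names and the buffer and MSS values are assumptions chosen for the example.

```python
def advertised_window(free_space, mss, buffer_size):
    """Receiver-side SWS avoidance as quoted above (hypothetical helper).

    Advertise 0 until the free space reaches min(MSS, buffer_size / 2),
    then advertise all of the free space.
    """
    threshold = min(mss, buffer_size // 2)
    return free_space if free_space >= threshold else 0


def reads_until_window_opens(buffer_size, mss, read_size=1024):
    """Count how many read_size reads a full buffer needs before win > 0."""
    free_space = 0  # the receive buffer starts completely full
    reads = 0
    while advertised_window(free_space, mss, buffer_size) == 0:
        free_space += read_size  # the application reads another chunk
        reads += 1
    return reads


# Assumed values: a 64 KB buffer with an Ethernet-sized MSS of 1460 bytes.
print(reads_until_window_opens(buffer_size=65536, mss=1460))
# Assumed values: the same buffer with a very large MSS, so half the
# buffer becomes the smaller of the two limits.
print(reads_until_window_opens(buffer_size=65536, mss=65483))
```

With a small MSS the window reopens after a couple of 1 KB reads; when the MSS is larger than half the buffer, many more reads are needed, which matches the observation that the client stays blocked until the server has read "a certain amount" of data.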
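Working through this problem touched on a good deal of TCP knowledge, for example MSS, the sliding window, SWS (silly window syndrome), TCP buffers, and the ACK mechanism. Interested readers may want to look these up.
For the full explanation, see the following link: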
http://www.tcpipguide.com/free/t_TCPWindowManagementIssues.htm