Re: iperf buffer again.


On Thu, 5 Dec 2002 chongz --at-- positioning-research.com wrote:

> Hi all,
> Finally I am running my computer in University and testing iperf again,
> but I didn't get what I expected :(

As I said before, the kernel has ultimate authority over the size of the 
TCP window. One extra thing to remember is that the advertised window is 
not the same as the buffer size: data that has been received but not yet 
read by the application still occupies the buffer, so that space is not 
advertised as available. 

 
> The window sizes from 129.12.48.51 (my pc) are not consistent with
> the -w option, and the default setting has a maximum value of 63712
> (super autotuning?).

Well, it looks like there is a problem, but in reality there isn't. 
You are missing a key point about the TCP header. There are only 16 bits 
for the window size, which limits it to 64K (and some early 
implementations of TCP did signed math on that field, which limited it 
even further). To get past this little hangup the window scale option 
(WS) was added. It tells the other end how many bits to shift the window 
size value left to get the window that is really meant. tcpdump reports 
the raw window field of each packet like a good packet sniffer, but it 
maintains NO state. So if a connection is set up with WS=2, the values 
reported by tcpdump will be n/4, _NOT_ n. You have to look at the first 
two packets exchanged (the SYN and the SYN-ACK) to determine the value of 
WS. Your real values are more like 95568 for 128K and 182448 for 240K. It 
is working correctly; you just need to do a bit more work to read the 
trace.
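
To make the arithmetic concrete, here is a minimal Python sketch of the 
scaling. The WS=2 value is only an assumption for illustration, and the 
raw values 23892 and 45612 are simply the figures above divided by 4, not 
numbers taken from your trace. The function name is made up.

```python
def actual_window(advertised: int, shift: int) -> int:
    """Scale a raw 16-bit window field by the negotiated shift count."""
    if not 0 <= advertised <= 0xFFFF:
        raise ValueError("the window field is only 16 bits wide")
    if not 0 <= shift <= 14:
        raise ValueError("the shift count is limited to 14")
    return advertised << shift


# Hypothetical raw values as tcpdump might print them, assuming WS=2.
for raw in (23892, 45612):
    print(raw, "->", actual_window(raw, 2))
```

With a shift of 0 (no scaling negotiated) the function returns the raw 
value unchanged, which is the case where tcpdump's numbers are the real 
ones.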

> Also the bandwidth values in iperf don't increase as the buffer
> size increases. How could that be?
> 
> The delay now is about 20msec and bw is about 10M(or <20M). 

Well, the list of reasons for this is long! First, if your second host 
does not have window scaling enabled, no shift is applied and the 
connection really does use the values that tcpdump reports. Second, there 
is a sweet spot for every connection, which is usually around the 
bandwidth-delay product. If you go past this value, performance usually 
drops, because the connection will frequently exceed the available 
bandwidth, experience packet loss, and back off. If the receive window is 
too small, the connection will not fill the pipe and performance suffers. 
The perfect advertised receive window is at all times exactly the 
available bandwidth * delay of the link: it keeps the sender from sending 
more data than the network can handle, so the connection experiences no 
loss and does not slow down. 

Other than those 2 large factors, problems could be due to the number of 
passes between the TCP layer and iperf, scheduling of the iperf process, 
and others. I have found that if you hand chunks of 4 times the window 
size to the TCP implementation you get the best performance for that 
window size (not always the case). To do that, simply use the -l option. 
For your machine it would be like `iperf -s -w <size>`, and on the other 
machine `iperf -c blah -w <size> -l <size*2>`, since your OS doubles the 
value of -w already. Though you can use -l on the server side, it only 
changes the results by <1%, whereas on the client side it can have a 
large effect. 
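
As a rough sketch of where that sweet spot sits for your link, here is 
the bandwidth-delay product worked out in Python. I am assuming the 20 
msec you quote is the round-trip time and that the bandwidth is 
somewhere between 10 and 20 Mbit/s; the function name is made up.

```python
def bdp_bytes(bandwidth_bits_per_s: float, rtt_s: float) -> int:
    """Bandwidth-delay product in bytes for a given rate and round trip."""
    return round(bandwidth_bits_per_s * rtt_s / 8)


# Assumed figures: 10 or 20 Mbit/s with a 20 ms round-trip time.
for mbit in (10, 20):
    print(f"{mbit} Mbit/s: {bdp_bytes(mbit * 1e6, 0.020)} bytes")
```

Under those assumptions the product is about 25000 bytes at 10 Mbit/s 
and 50000 bytes at 20 Mbit/s, so a window far above that range would not 
be expected to buy extra throughput on this path.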

> And my default netsetup are:
> net.core.rmem_max = 8388608
> net.core.wmem_max = 8388608
> net.core.rmem_default = 131072
> net.core.wmem_default = 131072
> net.ipv4.tcp_rmem = 10240 87380 8388608
> net.ipv4.tcp_wmem = 10240 65536 8388608
> net.ipv4.tcp_mem = 8388608 8388608 8388608

Also make sure that net.ipv4.tcp_window_scaling is 1. I can see that 
net.ipv4.tcp_sack and net.ipv4.tcp_timestamps are already 1, so I assume 
that window scaling is as well, but I figured I would make sure.
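
If you want to check all three in one go, a small Python sketch like the 
following reads them straight from /proc/sys. It assumes a Linux host 
where sysctl keys map to files under /proc/sys; the helper name and its 
`root` parameter are made up for illustration.

```python
from pathlib import Path


def read_sysctl(name: str, root: str = "/proc/sys"):
    """Return the sysctl value as a string, or None if it cannot be read."""
    path = Path(root) / name.replace(".", "/")
    try:
        return path.read_text().strip()
    except OSError:
        return None


for key in ("net.ipv4.tcp_window_scaling",
            "net.ipv4.tcp_sack",
            "net.ipv4.tcp_timestamps"):
    print(key, "=", read_sysctl(key))
```

A value of None just means the file could not be read on that machine.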

Kevin


