Re: iperf buffer again.
On Thu, 5 Dec 2002 chongz --at-- positioning-research.com wrote:
> Hi all,
> Finally I am running my computer in University and testing iperf again,
> but I didn't get what I expected :(
As I said before, the kernel has ultimate authority over the size of the
TCP window. One extra thing to remember is that advertised window sizes
are not the same as actual window sizes: data that has arrived but has not
yet been read by the application still occupies the buffer, so that space
is not advertised as available.
> The window sizes from 129.12.48.51 (my pc) are not consistent with
> the -w option, and the default setting has the maximum 63712 value
> (super autotuning?).
Well, it would look like there is a problem, but in reality there isn't.
You are missing a key point about TCP headers. There are only 16 bits for
the window size, so on its own the field cannot advertise more than 64K
(and some early implementations of TCP did signed math on it, which
restricted it further). To get past this little hangup the window scale
option (WS, shown as "wscale" by tcpdump) was added. It tells the other
side how many bits to shift the window size value left to get what the
sender really wants to say.
TCPDump will report the window size of each packet like a good packet
sniffer, but it maintains NO state. So if a connection is set up with
WS=2 then the values reported by TCPDump will be n/4, _NOT_ n. You have to
look at the first two packets exchanged (the SYN and the SYN-ACK) to
determine the value of WS for each direction.
Your values are more like: for 128K -> 95568 and for 240K -> 182448. It
is working correctly; you just need to apply the shift yourself.
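
If it helps, the shift arithmetic can be sketched in a few lines of Python. This is only an illustration: the function name is mine, and the raw values in the loop are hypothetical numbers chosen to show the calculation, not values read from your trace. The shift to use for a given packet is the one its sender announced in its own SYN or SYN-ACK.

```python
def actual_window(raw_window: int, shift: int) -> int:
    """Scale the 16-bit window field by the negotiated shift count."""
    if not 0 <= raw_window <= 0xFFFF:
        raise ValueError("the TCP window field is only 16 bits wide")
    if not 0 <= shift <= 14:
        raise ValueError("the window scale shift count is limited to 14")
    return raw_window << shift


# Hypothetical raw window values as a stateless sniffer would print them.
for raw, shift in [(23892, 2), (45612, 2), (63712, 0)]:
    print(f"raw={raw:5d} shift={shift} -> {actual_window(raw, shift)} bytes")
```

With a shift of 0 (no scaling negotiated) the raw value is the real window, which is the case where what the sniffer prints can be taken at face value.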
> Also the values of bandwidth in iperf don't increase while the buffer
> size increases. How could that be?
>
> The delay now is about 20msec and bw is about 10M(or <20M).
Well, the list of reasons for this is long! First, if your second host
does not have window scaling enabled, no shift is applied and the windows
really are the values that tcpdump reports.

Second, there is a sweet spot for every connection, which is usually
around the bandwidth-delay product. If you go past this value performance
usually drops, because the connection will frequently go past the
available bandwidth, experience packet loss, and back off exponentially.
If the receive window is too small then the connection will not fill the
pipe and performance suffers. The perfect advertised receive window is at
all times exactly the available bandwidth * delay of the link, which keeps
the sender from sending more data than the network can handle, so it
experiences no loss and does not slow down.

Other than those two large factors, problems could be due to the number of
passes between the TCP layer and iperf, scheduling of the iperf process,
and others. I have found that if you hand the TCP implementation chunks of
4 times the window size you get the best performance for that window size
(not always the case). To do that simply use the -l option. For your
machine it would be
    `iperf -s -w <size>`
and on the other machine
    `iperf -c blah -w <size> -l <size*2>`
since your OS doubles the value of -w already. Though you can use -l on
the server side, it only changes the results by <1%, whereas on the client
side it can have a large effect.
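
To put a rough number on the sweet spot, here is a small Python sketch of the bandwidth * delay calculation using the figures you quoted. I am assuming the 20 msec is a round-trip time and that "10M" and "20M" mean Mbit/s; if either assumption is wrong the result changes accordingly. The function name is just for illustration.

```python
def bdp_bytes(bandwidth_bits_per_s: int, rtt_ms: int) -> int:
    """Bandwidth-delay product in bytes (integer arithmetic, rounded down)."""
    # bits/s * ms / 1000 gives bits in flight; divide by 8 for bytes.
    return bandwidth_bits_per_s * rtt_ms // 8000


# Assumed inputs: 20 ms round trip, 10 and 20 Mbit/s of available bandwidth.
for mbit in (10, 20):
    window = bdp_bytes(mbit * 1_000_000, 20)
    print(f"{mbit} Mbit/s x 20 ms -> {window} bytes (~{window / 1024:.1f} KB)")
```

Under those assumptions the product is a few tens of kilobytes, so windows of 128K or 240K are well past it, and that would fit with bandwidth not increasing as the buffer grows.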
> And my default netsetup are:
> net.core.rmem_max = 8388608
> net.core.wmem_max = 8388608
> net.core.rmem_default = 131072
> net.core.wmem_default = 131072
> net.ipv4.tcp_rmem = 10240 87380 8388608
> net.ipv4.tcp_wmem = 10240 65536 8388608
> net.ipv4.tcp_mem = 8388608 8388608 8388608
Also make sure that net.ipv4.tcp_window_scaling is 1. I can see that
net.ipv4.tcp_sack and net.ipv4.tcp_timestamps are already 1, so I assume
that window scaling is as well, but I figured I would make sure.
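
If you want to check all three settings in one go, a minimal Python sketch follows. It relies on the fact that on Linux a sysctl name maps to a file under /proc/sys with the dots replaced by slashes. The helper name and the `root` parameter are my own, the latter only there so the function can be pointed at another directory.

```python
from pathlib import Path


def read_sysctl(name: str, root: str = "/proc/sys") -> str:
    """Return the value of a sysctl by reading its file under /proc/sys."""
    return Path(root, *name.split(".")).read_text().strip()


for name in (
    "net.ipv4.tcp_window_scaling",
    "net.ipv4.tcp_sack",
    "net.ipv4.tcp_timestamps",
):
    try:
        print(f"{name} = {read_sysctl(name)}")
    except OSError as exc:
        # Not on Linux, or the kernel does not expose this setting.
        print(f"{name}: unavailable ({exc})")
```

Running `sysctl net.ipv4.tcp_window_scaling` from a shell gives the same answer for a single setting.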
Kevin