RE: Iperf logging oddity on Linux with iperf 1.7.0
Actually the output in both cases is accurate. TCP is easier to explain
so I will start with that one. As in my previous message about Paul's
strange reports and TCP processing Iperf shows goodput not throughput. As
such when a packet is received it flows through several checks. It then
checks to see if it is time to output a periodic update. If it determines
that it has past a time boundry it reports that data BEFORE adding the
current packet, it then clears the temporary values and checks to make
sure that it did not pass multiple periodic updates and reports any that
have been past (thus these are 0 byte reports). Only after this checking
does it add the current packets amount to the temporary reporting numbers.
Thus lets look at the output Paul provided in this light:
> 17:44:13 [ 18] local 172.27.2.28 port 5001 connected with 172.27.2.13
> 17:44:16 [ 18] 0.0- 2.0 sec 12.8 KBytes 52.5 Kbits/sec
This report is generated after the first "packet" past the 2 second
interval arrives. That packet will be included in the next report.
> 17:44:18 [ 18] 2.0- 4.0 sec 11.4 KBytes 46.7 Kbits/sec
This report is likewise generated by the first "packet" past the 4 second
interval. That packet will be included in the next report.
> 17:44:23 [ 18] 4.0- 6.0 sec 2.85 KBytes 11.7 Kbits/sec
> 17:44:23 [ 18] 6.0- 8.0 sec 0.00 KBytes 0.00 Kbits/sec
This report was generated by a "packet" arriving at 44:23 resulting in two
reports being due. Between 44:18 and 44:20 2.85KB worth of "packets"
arrived. Also between 44:20 and 44:22 no "packets" were received. The
"packet" arriving at 44:23 is then added to the next report as it should
be.
> 17:44:27 [ 18] 8.0-10.0 sec 1.43 KBytes 5.84 Kbits/sec
> 17:44:27 [ 18] 10.0-12.0 sec 0.00 KBytes 0.00 Kbits/sec
Here again a "packet" arriving at 44:27 causes two reports to be generated
one of which has no data because none was received during that period. And
this packet is added to the 12-14 period because 44:27 - 44:13 = :14 with
some rounding in there it is probably closer to say 13.5 seconds into the
transfer.
I quoted all the above packets because these are "packets" as seen by the
application which are not exactly equal to TCP packets sent on the
network. TCP is allowed to repackage "packets" from the application to
the network and actual packets from the network to the application.
Therefore the reports are accurate to what the application saw. This
does not completely equate to what the network saw with all the
retransmissions that you say you saw with TCPDump. However the report is
of the activity above the TCP layer and is accurate. TCP uses many
mechanisms to prevent spurious activity on both the network and
application side. One method you could try to "force" TCP to more
accurately show you what it has received would be to supply a smaller
length on the server side with the (-l) parameter. If you set it to the
MSS that the TCP connection uses (often significantly smaller than the
network maximum frame size) then anytime an in-order segment arrives it
should alert your application. However you will still get stored up
out-of-order packets delivered potentially much later than they arrived.
To get the MSS used by the connection look at the size of application data
in 95% of the actual packets received in TCPDump. Then if that number is
say 564 start the server:
iperf -s -l 564
Let me know if this provides "smoother" results. One potential problem
with shrinking the length of receives is that you increase the amount of
processing required for the stream and that may slow down the connection.
However if you are doing low bandwidth testing with fast idle machines it
should not hurt anything.
Now UDP on the other hand is a wholely different bag of worms. Jim is
correct that Matt's observations can not be attributed to retransmission
and TCP mechanisms since UDP provides no such mechanisms. My guess is that
for whatever reason the client is not sending any packets during the 0
byte periods. This is mostly due to scheduling issues. Iperf performs busy
waiting to try and time the packets it sends on the network to be as close
to CBR as possible. However it can be interrupted by just about anything
for extended periods of time. Unlike most TCP stack implementations, Iperf
does not get scheduled by the kernel every 200ms and on every interrupt
from the NIC on the receiving of an ACK to create the ability to have
control of the CPU when we want it. The same problems can exist on the
server side where data is waiting for the socket but the thread does not
get control of the CPU immediately to return from the recv() and process
the results. I would be interested in a TCPDump along with a utilization
on the serverside during a run that has 0 bytes for an interval when the
rest are pushing 900Mb/s.
Kevin
PS: Before my time with Iperf there were no 0 byte intervals, however the
intervals were never exact. And you would have received a report for the
Paul's run that looked like:
17:44:13 [ 18] local 172.27.2.28 port 5001 connected with 172.27.2.13
17:44:16 [ 18] 0.0- 2.0 sec 12.8 KBytes 52.5 Kbits/sec
17:44:18 [ 18] 2.0- 4.0 sec 11.4 KBytes 46.7 Kbits/sec
17:44:23 [ 18] 4.0- 8.6 sec 2.85 KBytes 5.7 Kbits/sec
17:44:27 [ 18] 8.6-12.2 sec 1.43 KBytes 2.84 Kbits/sec
or something. Now to me this is flat out wrong. Here you not only have
fluctuations in the transfer amount but the time intervals are variable!
Granted you do get a average on the throughput for the 4 second intervals
but if the application received NO data between 6.0-8.0 why not
just come out and say it? The current implementation provides much more
information as to when data was received.
PPS: Sorry for the HUGE response but this is probably the "short" version.
=-)
On Thu, 11 Mar 2004, Jim Ferguson wrote:
> Paul,
>
> This observation is something that we need to look into before pushing out
> a 2.0 version. A fair amount of "code cleanup" was done between the 1.65
> and 1.70 versions, and it appears this 'short interval' behavior has been
> repeatable. Thanks for sending the report, along with the tidbit that
> tcpdump was showing significant retransmissions. I'm guessing that that is
> *not* the case for Matt's observations, however...?
>
> -Jim
>
> At 08:46 AM 3/11/2004 -0600, you wrote:
> >TCPdump shows a much more consistent rate *although there are a lot of
> >retransmissions happening). The expected rate is about 10-50K depending on
> >fluctuations in bw allocations on the wireless system.
> >
> >I think this all goes back to iperf reports not being purely time based
> >and the code not being able to properly detect when it is going to
> >generate multiple reports at the same time and thus produce inaccurate
> >reports for some intervals.
> >
> >If I change the reporting interval to 10 seconds, this is less likely to
> >happen and the TCP connections actually perform better (fewer
> >retransmissions and better rate).
> >
> >Paul
> >
> >
> >-----Original Message-----
> >From: Matthew J Zekauskas [mailto:matt --at-- internet2.edu]
> >Sent: Wednesday, March 10, 2004 7:30 PM
> >To: iperf-users --at-- dast.nlanr.net
> >Subject: Re: Iperf logging oddity on Linux with iperf 1.7.0
> >
> >
> >--On Wednesday, March 10, 2004 6:07 PM -0600 paul shields
> ><pshields --at-- NAVINI.com> wrote:
> >
> > > Below is a snippet of a log from a Linux server with the report
> > interval set for 2 seconds and
> > > timestamps added as to exactly when iperf generated the line of output.
> > The handling of time gaps
> > > seems incorrect. (notice timestamps are the same for some report
> > intervals and all of the data
> > > rate is attributed to the fist interval print at that time. Iperf seems
> > to want to "catch up" on
> > > the reporting process and generates a lot of false 0.0 output lines.
> > >
> > > The report should generate output for the combined 4-8 interval instead
> > of creating two intervals
> > > and incorrectly reporting the data rate distribution between them.
> > >
> > > Comments?
> > > Paul
> > >
> > > 17:44:13 [ 18] local 172.27.2.28 port 5001 connected with 172.27.2.13
> > port 1215
> > > 17:44:16 [ 18] 0.0- 2.0 sec 12.8 KBytes 52.5 Kbits/sec
> > > 17:44:18 [ 18] 2.0- 4.0 sec 11.4 KBytes 46.7 Kbits/sec
> > > 17:44:23 [ 18] 4.0- 6.0 sec 2.85 KBytes 11.7 Kbits/sec
> > > 17:44:23 [ 18] 6.0- 8.0 sec 0.00 KBytes 0.00 Kbits/sec
> > > 17:44:27 [ 18] 8.0-10.0 sec 1.43 KBytes 5.84 Kbits/sec
> > > 17:44:27 [ 18] 10.0-12.0 sec 0.00 KBytes 0.00 Kbits/sec
> > > 17:44:30 [ 18] 12.0-14.0 sec 4.28 KBytes 17.5 Kbits/sec
> > > 17:44:30 [ 18] 14.0-16.0 sec 0.00 KBytes 0.00 Kbits/sec
> > > 17:44:32 [ 18] 16.0-18.0 sec 5.70 KBytes 23.4 Kbits/sec
> > > 17:44:37 [ 18] 18.0-20.0 sec 1.50 KBytes 6.16 Kbits/sec
> > > 17:44:37 [ 18] 20.0-22.0 sec 0.00 KBytes 0.00 Kbits/sec
> > > 17:44:41 [ 18] 22.0-24.0 sec 1.43 KBytes 5.84 Kbits/sec
> > > 17:44:41 [ 18] 24.0-26.0 sec 0.00 KBytes 0.00 Kbits/sec
> > > 17:44:41 [ 18] 26.0-28.0 sec 1.43 KBytes 5.84 Kbits/sec
> > > 17:44:44 [ 18] 28.0-30.0 sec 5.15 KBytes 21.1 Kbits/sec
> >
> >I've seen something similar with iperf 1.7 as well as iperf 2 betas
> >when running under RH9 on gigE-connected systems. The bitrates
> >are quite a bit higher, but some runs have intervals that report
> >0 bits sent! Other runs are consistent at 980-ish Mb/s.
> >
> >A related observation is that when I try to do 900 Mb/s UDP
> >streams, some runs are consistently right around 900 Mb/s,
> >and some runs also have intervals that report 0 bits sent.
> >
> >In both of these cases, the intervals with data do not try
> >to "make up" for the missing bits; overall reports show
> >low bitrate transfers (usually 50 to 200 Mb/s for me :),
> >and no interval ever spikes above the physical maximum
> >bitrate. (What were you expecting in the runs you show?
> >is 0 bits sent over 2 seconds reasonable for you? Have
> >you tried any tcpdump or equivalent passive traces?)
> >
> >I believe there may be an issue at the sender... alarms there
> >look like they are not delivered in time either. Maybe I've
> >got a bogus device driver? I haven't had time to pursue the
> >matter :(, instead choosing to filter out runs that look to
> >be completely bogus. This is not the correct long-term solution...
> >
> >--Matt
>