One area that I’ve been spending quite a bit of time looking at lately is the TCP layer on our servers. We have seen multiple issues that involve TCP and it is an oft-overlooked area when troubleshooting.
There are two tools that I’d like to focus on today – netstat and nstat. Both tools pull statistics from the following Linux files, which track network-related statistics and SNMP counters:
/proc/net/netstat
/proc/net/snmp
Here is what the output of these two files looks like:
$ cat /proc/net/netstat
TcpExt: SyncookiesSent SyncookiesRecv SyncookiesFailed EmbryonicRsts PruneCalled RcvPruned OfoPruned OutOfWindowIcmps LockDroppedIcmps ArpFilter TW TWRecycled TWKilled PAWSPassive PAWSActive PAWSEstab DelayedACKs DelayedACKLocked DelayedACKLost ListenOverflows ListenDrops TCPPrequeued TCPDirectCopyFromBacklog TCPDirectCopyFromPrequeue TCPPrequeueDropped TCPHPHits TCPHPHitsToUser TCPPureAcks TCPHPAcks TCPRenoRecovery TCPSackRecovery TCPSACKReneging TCPFACKReorder TCPSACKReorder TCPRenoReorder TCPTSReorder TCPFullUndo TCPPartialUndo TCPDSACKUndo TCPLossUndo TCPLoss TCPLostRetransmit TCPRenoFailures TCPSackFailures TCPLossFailures TCPFastRetrans TCPForwardRetrans TCPSlowStartRetrans TCPTimeouts TCPRenoRecoveryFail TCPSackRecoveryFail TCPSchedulerFailed TCPRcvCollapsed TCPDSACKOldSent TCPDSACKOfoSent TCPDSACKRecv TCPDSACKOfoRecv TCPAbortOnData TCPAbortOnClose TCPAbortOnMemory TCPAbortOnTimeout TCPAbortOnLinger TCPAbortFailed TCPMemoryPressures TCPSACKDiscard TCPDSACKIgnoredOld TCPDSACKIgnoredNoUndo TCPSpuriousRTOs TCPMD5NotFound TCPMD5Unexpected TCPSackShifted TCPSackMerged TCPSackShiftFallback TCPBacklogDrop TCPMinTTLDrop TCPChallengeACK TCPSYNChallenge BusyPollRxPackets TCPFromZeroWindowAdv TCPToZeroWindowAdv TCPWantZeroWindowAdv
TcpExt: 0 0 8026471 447 6 0 0 0 0 0 5917361 0 7 0 0 0 49265954 43356 924413 62736 62736 95 0 12468 0 4884204246 39 160296961 4791040967 22 2544 34 47 14 19 1 4 8 12677 15385 840 26 298 25526 856 3881 1406 15297 220550 4 177 0 312 921035 545 21187 37 10449576 1113312 0 6189 0 0 0 1241 46 2249 220 0 0 6229 12500 86375 0 0 15643 8695 0 10 10 14
IpExt: InNoRoutes InTruncatedPkts InMcastPkts OutMcastPkts InBcastPkts OutBcastPkts InOctets OutOctets InMcastOctets OutMcastOctets InBcastOctets OutBcastOctets
IpExt: 0 0 0 0 99960 0 954772643799 971866880824 0 0 57011704 0
$ cat /proc/net/snmp
Ip: Forwarding DefaultTTL InReceives InHdrErrors InAddrErrors ForwDatagrams InUnknownProtos InDiscards InDelivers OutRequests OutDiscards OutNoRoutes ReasmTimeout ReasmReqds ReasmOKs ReasmFails FragOKs FragFails FragCreates
Ip: 2 64 5489492045 0 0 0 0 0 5423054428 5431400754 0 0 0 0 0 0 0 0 0
Icmp: InMsgs InErrors InDestUnreachs InTimeExcds InParmProbs InSrcQuenchs InRedirects InEchos InEchoReps InTimestamps InTimestampReps InAddrMasks InAddrMaskReps OutMsgs OutErrors OutDestUnreachs OutTimeExcds OutParmProbs OutSrcQuenchs OutRedirects OutEchos OutEchoReps OutTimestamps OutTimestampReps OutAddrMasks OutAddrMaskReps
Icmp: 37480 44 688 0 0 0 0 36779 13 0 0 0 0 37709 0 616 0 0 0 0 314 36779 0 0 0 0
IcmpMsg: InType0 InType3 InType8 OutType0 OutType3 OutType8
IcmpMsg: 13 688 36779 36779 616 314
Tcp: RtoAlgorithm RtoMin RtoMax MaxConn ActiveOpens PassiveOpens AttemptFails EstabResets CurrEstab InSegs OutSegs RetransSegs InErrs OutRsts
Tcp: 1 200 120000 -1 10197009 23179951 8255 2067864 49 5417270379 5390785056 592798 7343 28892503
Udp: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
Udp: 5746478 574 0 39938128 0 0
UdpLite: InDatagrams NoPorts InErrors OutDatagrams RcvbufErrors SndbufErrors
UdpLite: 0 0 0 0 0 0
Since that output is not very readable, we have ‘netstat -s’ and nstat.
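If you would rather read the raw files yourself, the layout is regular enough to parse: each protocol contributes a header line of names followed by a line of values with the same prefix. Below is a minimal Python sketch of that idea. The function names are my own, and joining the prefix to the counter name (for example "TcpExt" + "ListenOverflows") is done only to mirror the names nstat prints.

```python
def parse_proc_net(text):
    """Parse 'Prefix: names' / 'Prefix: values' line pairs into one dict."""
    stats = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Lines come in pairs: a row of counter names, then a row of values.
    for header, values in zip(lines[0::2], lines[1::2]):
        prefix, names = header.split(":", 1)
        value_prefix, numbers = values.split(":", 1)
        if prefix != value_prefix:
            raise ValueError(f"mismatched pair: {prefix!r} vs {value_prefix!r}")
        for name, number in zip(names.split(), numbers.split()):
            # Build nstat-style names such as TcpExtListenOverflows.
            stats[prefix.strip() + name] = int(number)
    return stats


def read_counters(paths=("/proc/net/netstat", "/proc/net/snmp")):
    """Read and merge the counters from both files (Linux only)."""
    counters = {}
    for path in paths:
        with open(path) as f:
            counters.update(parse_proc_net(f.read()))
    return counters
```

Calling read_counters() on a Linux host should give a dictionary you can index by name, for example read_counters()["TcpExtListenDrops"].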
netstat -s
netstat – when used with the “-s” flag – displays summary statistics for each protocol. It lists absolute values and (unfortunately) only those with non-zero values. It also displays verbose descriptions of the statistics.
Here is a sample of the abbreviated output:
$ netstat -s
Ip:
5489957924 total packets received
0 forwarded
0 incoming packets discarded
5423507481 incoming packets delivered
5431857527 requests sent out
. . .
Tcp:
10197583 active connections openings
23183019 passive connection openings
8255 failed connection attempts
2068049 connection resets received
48 connections established
5417722682 segments received
5391237518 segments send out
592799 segments retransmited
7343 bad segments received.
28896234 resets sent
. . .
TcpExt:
8028072 invalid SYN cookies received
447 resets received for embryonic SYN_RECV sockets
6 packets pruned from receive queue because of socket buffer overrun
5917765 TCP sockets finished time wait in fast timer
7 TCP sockets finished time wait in slow timer
49272962 delayed acks sent
43364 delayed acks further delayed because of locked socket
Quick ack mode was activated 924413 times
62736 times the listen queue of a socket overflowed
62736 SYNs to LISTEN sockets ignored
. . .
nstat
The nstat command dumps the values of the two files listed above, one row per statistic. I generally use ‘nstat -az’. The “-z” flag dumps all counters (even the ones with zero values), and the “-a” flag dumps the absolute values rather than the deltas since the previous run. That way, I can easily dump the values into a spreadsheet and track them over time.
Here is an abbreviated sample of the output:
$ nstat -az
#8716.1804289383 sampling_interval=5 time_const=60
IpInReceives 5489680539 558.6
IpInDelivers 5423237728 543.3
IpOutRequests 5431585485 547.1
. . .
TcpActiveOpens 10197238 0.6
TcpPassiveOpens 23181196 3.7
TcpAttemptFails 8255 0.0
TcpEstabResets 2067937 0.2
TcpInSegs 5417453375 542.4
TcpOutSegs 5390968045 542.2
TcpRetransSegs 592798 0.0
TcpInErrs 7343 0.0
TcpOutRsts 28894010 4.4
. . .
TcpExtSyncookiesFailed 8027081 2.0
TcpExtEmbryonicRsts 447 0.0
TcpExtPruneCalled 6 0.0
TcpExtTW 5917516 0.6
TcpExtTWKilled 7 0.0
TcpExtDelayedACKs 49268720 9.2
TcpExtDelayedACKLocked 43360 0.0
TcpExtDelayedACKLost 924413 0.0
TcpExtListenOverflows 62736 0.0
TcpExtListenDrops 62736 0.0
. . .
The first column is the statistic name; the second column is the absolute value of the counter; and the third column is the average rate over the last time interval (default is 60 seconds).
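Since the three-column layout is so regular, tracking counters over time does not have to involve a spreadsheet. Here is a small Python sketch that parses two saved ‘nstat -az’ dumps and reports which counters moved between them. The function names are hypothetical, and the sketch assumes the whitespace-separated format shown above.

```python
def parse_nstat(text):
    """Return {name: absolute_value} from saved `nstat -az` output."""
    values = {}
    for line in text.splitlines():
        fields = line.split()
        # Skip blank lines and the '#...' header line.
        if len(fields) != 3 or fields[0].startswith("#"):
            continue
        name, value, _rate = fields
        # Skip anything whose second column is not a plain integer,
        # such as a '. . .' elision line.
        if not value.isdigit():
            continue
        values[name] = int(value)
    return values


def counter_deltas(before, after):
    """Counters present in both snapshots whose value changed."""
    return {
        name: after[name] - before[name]
        for name in after
        if name in before and after[name] != before[name]
    }
```

For example, feeding it one snapshot with TcpPassiveOpens at 23179951 and a later one at 23181196 reports an increase of 1245, while counters that did not change are left out.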
I prefer nstat because, as I mentioned earlier, it can print all statistics including those with zero values, which makes it handy to throw into a spreadsheet. But the main reason I like it is that the statistic names it uses are the names that show up in the Linux source code, which makes them easier to research. Unfortunately, some of these statistics are not well-documented, so looking at the source code at least gives you some insight into what they mean.
Example
Let’s illustrate this with an example. I recently was researching an issue where we were focusing on two statistics – TcpExtListenOverflows and TcpExtListenDrops.
Note: in the case of these two particular statistics, there is already an excellent post on TCP Overflow on Andreas Veithen’s blog that explains more about them. However, not all of these statistics are as well-documented and I wanted to extend that article by showing how you can search the source code.
In ‘netstat -s’ output, they look like this:
TcpExt:
. . .
62736 times the listen queue of a socket overflowed
62736 SYNs to LISTEN sockets ignored
. . .
In nstat output, they look like this:
. . .
TcpExtListenOverflows 62736 0.0
TcpExtListenDrops 62736 0.0
. . .
If I want to know why those values are increasing but come up short on a Google search, I can go to a site like Linux Cross Reference and search the source code. If I strip off the prefix “TcpExt” and search the Linux code for “ListenOverflows” or “ListenDrops”, I can see where these values are defined. Here I searched version 2.6.37 of the source code, which produces hits in files such as snmp.h and tcp_ipv4.c.
If I look in snmp.h, I see the SNMP MIBs that these statistics map to:
/* linux mib definitions */
enum
{
    LINUX_MIB_NUM = 0,
    LINUX_MIB_SYNCOOKIESSENT, /* SyncookiesSent */
    LINUX_MIB_SYNCOOKIESRECV, /* SyncookiesRecv */
    LINUX_MIB_SYNCOOKIESFAILED, /* SyncookiesFailed */
    LINUX_MIB_EMBRYONICRSTS, /* EmbryonicRsts */
    LINUX_MIB_PRUNECALLED, /* PruneCalled */
    LINUX_MIB_RCVPRUNED, /* RcvPruned */
    LINUX_MIB_OFOPRUNED, /* OfoPruned */
    LINUX_MIB_OUTOFWINDOWICMPS, /* OutOfWindowIcmps */
    LINUX_MIB_LOCKDROPPEDICMPS, /* LockDroppedIcmps */
    LINUX_MIB_ARPFILTER, /* ArpFilter */
    LINUX_MIB_TIMEWAITED, /* TimeWaited */
    LINUX_MIB_TIMEWAITRECYCLED, /* TimeWaitRecycled */
    LINUX_MIB_TIMEWAITKILLED, /* TimeWaitKilled */
    LINUX_MIB_PAWSPASSIVEREJECTED, /* PAWSPassiveRejected */
    LINUX_MIB_PAWSACTIVEREJECTED, /* PAWSActiveRejected */
    LINUX_MIB_PAWSESTABREJECTED, /* PAWSEstabRejected */
    LINUX_MIB_DELAYEDACKS, /* DelayedACKs */
    LINUX_MIB_DELAYEDACKLOCKED, /* DelayedACKLocked */
    LINUX_MIB_DELAYEDACKLOST, /* DelayedACKLost */
    LINUX_MIB_LISTENOVERFLOWS, /* ListenOverflows */
    LINUX_MIB_LISTENDROPS, /* ListenDrops */
. . .
And if I search the code itself (tcp_ipv4.c), I can see where they get incremented:
exit_overflow:
    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENOVERFLOWS);
exit_nonewsk:
    dst_release(dst);
exit:
    NET_INC_STATS_BH(sock_net(sk), LINUX_MIB_LISTENDROPS);
    return NULL;
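Note that exit_overflow falls through into exit, so every overflow is also counted as a drop, which is consistent with both counters showing 62736 in the output above. If I want to see what conditions cause the jumps to the exit_overflow or exit labels, I can search for those as well. Earlier in the same function in tcp_ipv4.c, I see: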
    if (sk_acceptq_is_full(sk))
        goto exit_overflow;

    if (!dst && (dst = inet_csk_route_req(sk, req)) == NULL)
        goto exit;

. . .

    if (__inet_inherit_port(sk, newsk) < 0) {
        sock_put(newsk);
        goto exit;
    }
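To make the relationship between the two counters concrete, here is a toy Python model of just the paths shown in the excerpts above. It is not kernel code: the function and parameter names are made up for illustration, and it ignores any other paths in the real function that are not quoted in this post. The point it illustrates is that the exit_overflow label falls through into exit, so an overflow bumps both counters while the other failures bump only ListenDrops.

```python
from collections import Counter


def record_connection_attempt(stats, acceptq_full,
                              route_found=True, port_inherited=True):
    """Toy model of the counter updates; returns True if a socket is created."""
    if acceptq_full:
        # goto exit_overflow: count the overflow, then fall through to exit.
        stats["ListenOverflows"] += 1
        stats["ListenDrops"] += 1
        return False
    if not route_found or not port_inherited:
        # goto exit: counted as a drop only.
        stats["ListenDrops"] += 1
        return False
    return True


def drops_without_overflow(stats):
    """Drops that were caused by something other than a full accept queue."""
    return stats["ListenDrops"] - stats["ListenOverflows"]


stats = Counter()
record_connection_attempt(stats, acceptq_full=True)
record_connection_attempt(stats, acceptq_full=False, route_found=False)
record_connection_attempt(stats, acceptq_full=False)
print(dict(stats), drops_without_overflow(stats))
```

Applied to the numbers in this post, the difference is 62736 - 62736 = 0, which suggests that all of the drops we saw came from a full accept queue.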
FYI: the verbose descriptions used by the netstat command are stored in the file statistics.c, which can be viewed on the net-tools sourceforge.net site:
struct entry Tcpexttab[] =
{
{"SyncookiesSent", N_("%u SYN cookies sent"), opt_number},
{"SyncookiesRecv", N_("%u SYN cookies received"), opt_number},
{"SyncookiesFailed", N_("%u invalid SYN cookies received"), opt_number},
{ "EmbryonicRsts", N_("%u resets received for embryonic SYN_RECV sockets"),
opt_number },
{ "PruneCalled", N_("%u packets pruned from receive queue because of socket"
" buffer overrun"), opt_number },
/* obsolete: 2.2.0 doesn't do that anymore */
{ "RcvPruned", N_("%u packets pruned from receive queue"), opt_number },
{ "OfoPruned", N_("%u packets dropped from out-of-order queue because of"
" socket buffer overrun"), opt_number },
{ "OutOfWindowIcmps", N_("%u ICMP packets dropped because they were "
"out-of-window"), opt_number },
{ "LockDroppedIcmps", N_("%u ICMP packets dropped because"
" socket was locked"), opt_number },
{ "TW", N_("%u TCP sockets finished time wait in fast timer"), opt_number },
{ "TWRecycled", N_("%u time wait sockets recycled by time stamp"), opt_number },
{ "TWKilled", N_("%u TCP sockets finished time wait in slow timer"), opt_number },
{ "PAWSPassive", N_("%u passive connections rejected because of"
" time stamp"), opt_number },
{ "PAWSActive", N_("%u active connections rejected because of "
"time stamp"), opt_number },
{ "PAWSEstab", N_("%u packets rejects in established connections because of"
" timestamp"), opt_number },
{ "DelayedACKs", N_("%u delayed acks sent"), opt_number },
{ "DelayedACKLocked", N_("%u delayed acks further delayed because of"
" locked socket"), opt_number },
{ "DelayedACKLost", N_("Quick ack mode was activated %u times"), opt_number },
{ "ListenOverflows", N_("%u times the listen queue of a socket overflowed"),
opt_number },
{ "ListenDrops", N_("%u SYNs to LISTEN sockets ignored"), opt_number },
{ "TCPPrequeued", N_("%u packets directly queued to recvmsg prequeue."),
opt_number },
...
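This table is also what lets you go from a line of ‘netstat -s’ output back to the counter name that nstat and the kernel use. As a rough Python sketch, assuming only a handful of entries copied by hand from the excerpt above (the helper names are my own), each format string can be turned into a regular expression by replacing its %u with a number pattern:

```python
import re

# A few (name, format string) pairs copied from the Tcpexttab excerpt above.
TCPEXT_DESCRIPTIONS = [
    ("SyncookiesFailed", "%u invalid SYN cookies received"),
    ("DelayedACKLost", "Quick ack mode was activated %u times"),
    ("ListenOverflows", "%u times the listen queue of a socket overflowed"),
    ("ListenDrops", "%u SYNs to LISTEN sockets ignored"),
]


def _pattern(fmt):
    # Escape the literal text and turn the single %u into a capturing group.
    head, tail = fmt.split("%u")
    return re.compile(
        r"^\s*" + re.escape(head) + r"(\d+)" + re.escape(tail) + r"\s*$"
    )


PATTERNS = [("TcpExt" + name, _pattern(fmt)) for name, fmt in TCPEXT_DESCRIPTIONS]


def to_nstat_name(netstat_line):
    """Map one `netstat -s` TcpExt line to (nstat_name, value), or None."""
    for name, pattern in PATTERNS:
        match = pattern.match(netstat_line)
        if match:
            return name, int(match.group(1))
    return None
```

Lines that do not match any of the listed descriptions return None, so the sketch only covers the entries you choose to copy into the list.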
I hope this gives you some ideas as to how you can use nstat and netstat to dive into performance at the TCP layer. It has definitely been eye-opening for me and has already borne fruit in our troubleshooting in this layer. Feel free to add any more insights in the comments section.