As a long-time performance DBA, I’ve often felt that it is important to know something about troubleshooting the layers upstream and downstream of the database in the technology stack. Lately, I’ve been using packet captures and Wireshark to solve tough issues in the TCP layer. We recently resolved a long-standing issue in which TCP retransmissions were causing connection drops between an application server and one of our databases, and I thought sharing it might help others facing similar issues.
This problem started with a series of TNS-12535 messages that were seen in the Oracle alert logs for one of our databases:
One area I’ve been spending quite a bit of time on lately is the TCP layer on our servers. We have seen multiple issues involving TCP, and it is an oft-overlooked area when troubleshooting.
There are two tools I’d like to focus on today: netstat and nstat. Both pull their numbers from the following Linux files, which track network statistics and SNMP counters:
Here is what the output of these two files looks like:
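Because both tools simply read and format these files, it can help to see how little is involved in reading them directly. The following is a minimal sketch, assuming a Linux host and assuming the files in question are `/proc/net/snmp` and `/proc/net/netstat`, where each protocol appears as a pair of lines: a header line of counter names followed by a line of integer values with the same prefix (for example `Tcp:`). The function names `parse_proc_net`, `delta`, and `retrans_percent` are hypothetical helpers for illustration, not part of netstat or nstat. Most of these counters accumulate since boot, so the sketch compares two samples, similar in spirit to nstat, which by default reports increments since its previous run. Using `RetransSegs` relative to `OutSegs` is just one rough retransmission indicator chosen here for illustration.

```python
import os
import time
from typing import Dict

# Hypothetical type alias: protocol prefix -> {counter name -> value}
Counters = Dict[str, Dict[str, int]]

INTERVAL_S = 5  # sampling interval in seconds (an arbitrary choice)


def parse_proc_net(text: str) -> Counters:
    """Parse text in the /proc/net/snmp or /proc/net/netstat layout.

    Each block is a pair of lines sharing a prefix such as "Tcp:":
    the first names the counters, the second holds their integer values.
    """
    counters: Counters = {}
    lines = [ln for ln in text.splitlines() if ln.strip()]
    # Walk the non-empty lines two at a time: (header, values).
    for header, values in zip(lines[0::2], lines[1::2]):
        proto, names = header.split(":", 1)
        proto_values, nums = values.split(":", 1)
        if proto != proto_values:
            raise ValueError(f"unpaired lines: {proto!r} vs {proto_values!r}")
        # Some prefixes (e.g. IcmpMsg) can span several pairs, so merge.
        counters.setdefault(proto, {}).update(
            zip(names.split(), (int(n) for n in nums.split()))
        )
    return counters


def delta(before: Counters, after: Counters) -> Counters:
    """Per-counter change between two samples taken some time apart."""
    return {
        proto: {
            name: value - before.get(proto, {}).get(name, 0)
            for name, value in fields.items()
        }
        for proto, fields in after.items()
    }


def retrans_percent(snmp: Counters) -> float:
    """RetransSegs as a percentage of OutSegs from the Tcp block."""
    tcp = snmp["Tcp"]
    if tcp["OutSegs"] == 0:
        return 0.0
    return 100.0 * tcp["RetransSegs"] / tcp["OutSegs"]


def read_counters(path: str) -> Counters:
    with open(path) as f:
        return parse_proc_net(f.read())


if __name__ == "__main__":
    snmp_path = "/proc/net/snmp"
    if os.path.exists(snmp_path):
        first = read_counters(snmp_path)
        time.sleep(INTERVAL_S)
        interval = delta(first, read_counters(snmp_path))
        print(f"TCP retransmits over {INTERVAL_S}s: "
              f"{retrans_percent(interval):.3f}% of OutSegs")
```

Comparing two samples rather than the raw totals matters on a long-running server, because the totals include everything since the last reboot and can hide a recent spike.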