Lies, Damned Lies and Metrics

(Apologies to Mark Twain…)

I’ve long subscribed to the principle of “Follow the Data” when it comes to troubleshooting performance. However, sometimes the data can be misleading (“lie” is an awful strong word) and sometimes the metrics you need just aren’t there. I was reminded of that this week while looking into a production performance issue with one of our critical applications.

The issue was presenting itself as an I/O problem in the database layer. Oracle wait event metrics from ASH (Active Session History) were indicating that I/O operations were taking longer than normal. Normally when we see this, we gather data about the I/O subsystem using utilities like iostat. Since this was on an Exadata, we also used the cellcli utility to report on storage cell information. However – this time – neither of these utilities was showing long I/O waits corresponding to our issue.

Continue reading

Red Flags – How a Data Quality Issue Becomes a Performance Issue

When I’m reviewing the query performance of our applications, one of SQL constructs that raises a red flag for me is the use of LOWER and UPPER functions in the WHERE clause (as well as TRIM/LTRIM/RTRIM).

The reason these functions trigger a red flag is because they are typically used as a workaround to a data quality problem and this workaround usually causes downstream performance impacts. Developers use these functions when they don’t trust the format of the data in the column or when they don’t trust the format of the value that is being compared.

How does this create a downstream performance problem?  Wrapping an indexed column in a function disables the use of the index and often causes a full table scan to occur. This can take a query that would normally execute in less than a millisecond and cause it to take much longer – depending on how big the table is that needs to be full-scanned.

Here is an example from a recently reviewed application:

Continue reading