SecureFile LOB Performance Issue

I wanted to document an issue that we experienced in one of our databases after upgrading to Oracle 12c (12.1.0.2).  The issue was that occasionally an update to one of our tables would take much longer to execute than normal.  Unfortunately, this issue did not get caught in testing and made its way into production.  It didn’t happen every time the update occurred and for this reason, it was difficult to track down.

Here’s how I approached the troubleshooting of this issue and the bug that we identified.

Since this wasn’t happening every time but we knew which SQL statement was involved, I elected to do a SQL_ID trace:

alter system set events 'sql_trace [sql:sql_id=89hm7xzttp9c9] plan_stat=all_executions,wait=true,bind=false';

After setting the trace, I monitored the trace directory waiting to find the long running UPDATEs.  To do this, I grep’d the trace files looking for EXEC operations on that SQL_ID with long elapsed time (“e=”) values.  The longest one was just over 89 seconds:

find . -type f -mtime -1 -name "*.trc" -exec grep -ln "89hm7xzttp9c9" {} \; |xargs egrep "EXEC" |cut -d"," -f2|cut -d"=" -f2|sort -n
6134275
6406358
53427796
78365426
89383890

Once I got several trace files, I turned the SQL_ID trace off:

alter system set events 'sql_trace [sql:sql_id=89hm7xzttp9c9] off';

I then pulled the trace files that had the longest EXEC times.  Here is the pattern I saw in the various trace files:

 PARSING IN CURSOR . . .
 UPDATE  x SET . . .
 . . .
 RETURNING  INTO :dl0
 END OF STMT
 BINDS #140344624162424:
. . .
 WAIT #140344624162424: nam='db file sequential read' ela= 634 file#=107 block#=3220853 blocks=1 obj#=161211 tim=2967436896291
 WAIT #140344624162424: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=87726 tim=2967436896532
 EXEC #140344624162424:c=3000,e=3721,p=1,cr=4,cu=23,mis=0,r=1,dep=0,og=1,plh=2409839646,tim=2967436896563
 WAIT #140344624162424: nam='SQL*Net message from client' ela= 88 driver id=1650815232 #bytes=1 p3=0 obj#=87726 tim=2967436896739
 LOBPGSIZE: type=PERSISTENT LOB,bytes=8060,c=0,e=24,p=0,cr=0,cu=0,tim=2967436896797
 WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=87726 tim=2967436896823
 WAIT #0: nam='SQL*Net message from client' ela= 78 driver id=1650815232 #bytes=1 p3=0 obj#=87726 tim=2967436896918
 WAIT #140344624822120: nam='SQL*Net more data from client' ela= 13 driver id=1650815232 #bytes=222 p3=0 obj#=87726 tim=2967436897091
 WAIT #140344624822120: nam='db file sequential read' ela= 350 file#=124 block#=212669 blocks=1 obj#=87726 tim=2967436897811
 WAIT #140344624822120: nam='db file sequential read' ela= 329 file#=124 block#=212670 blocks=1 obj#=87726 tim=2967436898239
 WAIT #140344624822120: nam='db file sequential read' ela= 336 file#=124 block#=212671 blocks=1 obj#=87726 tim=2967436898675

. . .  < 196000 WAITS !!! > . . .

 WAIT #140344624822120: nam='gc current grant 2-way' ela= 238 p1=15 p2=22656 p3=33554444 obj#=87726 tim=2967520128096
 WAIT #140344624822120: nam='gc current grant 2-way' ela= 208 p1=15 p2=22657 p3=33554433 obj#=87726 tim=2967520129038
 WAIT #140344624822120: nam='gc current grant 2-way' ela= 189 p1=15 p2=22658 p3=33554433 obj#=87726 tim=2967520129359
 LOBWRITE: type=PERSISTENT LOB,bytes=6468,c=33450915,e=83232687,p=133280,cr=419937,cu=274972,tim=2967520129628
 WAIT #0: nam='SQL*Net message to client' ela= 2 driver id=1650815232 #bytes=1 p3=0 obj#=87726 tim=2967520129713
 WAIT #0: nam='SQL*Net message from client' ela= 608 driver id=1650815232 #bytes=1 p3=0 obj#=87726 tim=2967520130349

Between the LOBPGSIZE operation and the LOBWRITE operation, there were over 196,000 wait events, most of them “db file sequential read” and “gc cr disk read”.  The LOBWRITE operation took about 83 seconds.

I wanted to see if there were any particular blocks that were being visited as part of this operation so I grep’d and counted the various occurrences of the “sequential read” and “gc cr disk read” operations.   What I noticed is that the same blocks were being visited over and over again – some of them as many as 681 times:

$ grep "sequential read" pdb2_ora_76294.trc|cut -d" " -f9,10,12|sort|uniq -c|sort -rn|more
    681 file#=28 block#=1793355 obj#=96666
    681 file#=28 block#=1793354 obj#=96666
    681 file#=28 block#=1793353 obj#=96666
. . .
    680 file#=28 block#=1620571 obj#=96660
    680 file#=28 block#=1620570 obj#=96660
    680 file#=28 block#=1620569 obj#=96660
. . .
    320 file#=28 block#=3592814 obj#=96643
    320 file#=28 block#=3592813 obj#=96643
    320 file#=28 block#=3592812 obj#=96643
. . .

$ grep "gc cr disk read" pdb2_ora_76294.trc|cut -d" " -f9,10,12|sort|uniq -c|sort -rn|more
    681 p1=28 p2=1793355 obj#=96666
    681 p1=28 p2=1793354 obj#=96666
    681 p1=28 p2=1793353 obj#=96666
. . .
    680 p1=28 p2=1620570 obj#=96660
    680 p1=28 p2=1620569 obj#=96660
    680 p1=28 p2=1620568 obj#=96660
. . .
    320 p1=28 p2=3592814 obj#=96643
    320 p1=28 p2=3592813 obj#=96643
    320 p1=28 p2=3592812 obj#=96643
. . .

Here is what a “normal” execution of this UPDATE looked like.  The normal LOBWRITE took only 680 microseconds:

 PARSING IN CURSOR . . .
 UPDATE   set 
 RETURNING  INTO :dl0
 END OF STMT
 BINDS #140344626255832:
. . .
 WAIT #140344626255832: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=2967434169512
 EXEC #140344626255832:c=3000,e=4262,p=0,cr=5,cu=17,mis=0,r=1,dep=0,og=1,plh=2409839646,tim=2967434169557
 WAIT #140344626255832: nam='SQL*Net message from client' ela= 76 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=2967434169722
 LOBPGSIZE: type=PERSISTENT LOB,bytes=8060,c=0,e=25,p=0,cr=0,cu=0,tim=2967434169813
 WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=2967434169845
 WAIT #0: nam='SQL*Net message from client' ela= 76 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=2967434169938
 WAIT #140344635454240: nam='SQL*Net more data from client' ela= 12 driver id=1650815232 #bytes=222 p3=0 obj#=-1 tim=2967434170178
 LOBWRITE: type=PERSISTENT LOB,bytes=5358,c=0,e=680,p=0,cr=2,cu=8,tim=2967434170670
 WAIT #0: nam='SQL*Net message to client' ela= 1 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=2967434170717
 WAIT #0: nam='SQL*Net message from client' ela= 254 driver id=1650815232 #bytes=1 p3=0 obj#=-1 tim=2967434170990

Believing this to be a bug, we then opened an SR and came to find out that we were experiencing bug # 22905136 – “SECUREFILE LOB HIGH SINGLE BLOCK PHYSICAL READ”.  The note for this bug is MOS Doc ID 2189248.1.

Applying the patch for this bug fixed the issue.

So if your application makes use of Securefile LOBs and you are upgrading to 12.1, check out this bug and make sure that this patch is applied before going into test and production.

Leave a Reply

Fill in your details below or click an icon to log in:

WordPress.com Logo

You are commenting using your WordPress.com account. Log Out /  Change )

Google photo

You are commenting using your Google account. Log Out /  Change )

Twitter picture

You are commenting using your Twitter account. Log Out /  Change )

Facebook photo

You are commenting using your Facebook account. Log Out /  Change )

Connecting to %s