In this earlier post, I wrote about “reliable message” waits in Oracle and how to find out more about the channel that is being waited on. One of these waits that we’ve been seeing with some frequency is on “Result Cache: Channel”. In some cases, these waits have been debilitating and I believe it merits its own post, in case someone else runs into this issue.
Here is an example of one issue we experienced in 18.104.22.168. Users were reporting a slow system and in looking at session waits, we saw long wait times for the “reliable message” event. In querying GV$CHANNEL_WAITS, we saw the following waits on instance 3 of a RAC database:
select inst_id, channel, messages_published, wait_count, WAIT_TIME_USEC/1000000 wait_time_sec from GV$CHANNEL_WAITS order by inst_id, wait_time_sec desc; INST ID CHANNEL MESSAGES_PUBLISHED WAIT_COUNT WAIT_TIME_SEC ----- ------------------------------------------------------ ------------------ ---------- ------------- . . . 3 Result Cache: Channel 474745796 473767959 267825.106 3 obj broadcast channel 4928144 2643616 5065.185 3 kxfp control signal channel 3349090 1140344 484.500 3 MMON remote action broadcast channel 401055 6999 266.940 3 RBR channel 320568 37809 137.223 . . .
Notice how much time is spent on the “Result Cache: Channel” waits. Foreground processes were waiting on this and we needed to find out why, quickly. Some fast research showed us that this particular channel wait is used to track time waited for synchronization of the result set caches. Oracle tracks DML changes to the tables that are being result cached and it sends messages across the interconnect to all other instances to update their caches related to that table. If those tables have a high DML (insert/update/delete) frequency, it can cause quite a bit of contention on the system.
The RESULT_CACHE Hint
However, to our knowledge, we were not deliberately using result set caching in this database – so where was it coming from? I checked DBA_TABLES to validate that no tables were being result cached and none were. I was aware of the RESULT_CACHE hint, so I ran a query against GV$SQLAREA to see if there were any queries using that particular hint. Lo and behold – there was:
SELECT /* DS_SVC */ /*+ dynamic_sampling(0) no_sql_tune no_monitoring optimizer_features_enable(default) opt_param('parallel_execution_enabled', 'false') result_cache */ COUNT(C1) FROM (SELECT /*+ qb_name("innerQuery") NO_INDEX_FFS( "PR") */ 1 AS C1 FROM ""."PRVSN_RQST" SAMPLE BLOCK(8.72684, 8) SEED(4) "PR" WHERE ("PR"."PRVSN_RQST_ID"=TO_NUMBER(:B1))) innerQuery
The Culprit? SQL Tuning Advisor
Since this appeared to be an Oracle-generated query (based on the hints), we searched on MOS and found that this type of query comes from the SQL Tuning Advisor (STA). Subsequent searching within Oracle Enterprise Manager (OEM) showed us that we did have some STA jobs that had been run recently:
Apparently, one of our DBAs had been using the SQL Tuning Advisor to try to improve the performance of some of the queries. However, because of this unknown side effect of the STA, it actually ended up causing performance issues.
The table involved has highly concurrent INSERT/UPDATE activity and the system was very busy trying to constantly update the result caches across all four nodes on this RAC cluster. Consequently, foreground processes/sessions were spending a lot of time waiting on this cache synchronization.
Resolving this issue was accomplished by doing two things: first we flushed the result cache:
set serveroutput on BEGIN IF dbms_result_cache.flush THEN dbms_output.put_line('Flush Successful'); ELSE dbms_output.put_line('Flush Failure'); END IF; END; /
Then, we told the DBAs not to use STA for tuning or that if they did, they would need to make sure they clean up after themselves afterwards.
According to MOS Doc ID 1951729.1, another option would be to disable the result cache altogether, using:
alter system set result_cache_max_size=0;
This definitely seems like buggy behavior to me – a tool that is supposed to help improve performance should not be able to cause performance issues. I’m not sure what value there is to the STA caching its results and I’m therefore not sure why they felt they needed to add that hint to the code they run.
Any insights are appreciated – please leave them in the comments.