Diagnosing Oracle “reliable message” Waits

“reliable message” waits are cryptic by nature. The event is a general-purpose wait that covers many different types of channel communication within the Oracle database. I’ve read some blogs suggesting that it is a benign wait event that can be ignored. In my experience these waits are not benign and should not be ignored. This post shows how to decipher them and resolve the underlying issue.

Here is what you might see in an AWR report:


Notice that “reliable message” is the top wait event at 61% of overall DB time.

To decipher these waits, focus on the P1 values associated with them. I have found it most useful to pull the P1 values with the highest wait times, either from GV$ / V$ACTIVE_SESSION_HISTORY or from DBA_HIST_ACTIVE_SESS_HISTORY. Because there are many different channels, you may need to restrict the query to the timeframe in which performance was severely affected.

Getting the P1 values (“Channel Context”)

select
  to_char(p1, 'XXXXXXXXXXXXXXXX') event_param,
  count(*), sum(time_waited/1000000) time_waited
from gv$active_session_history
where event = 'reliable message'
group by to_char(p1, 'XXXXXXXXXXXXXXXX')
order by time_waited desc;

EVENT_PARAM                                           COUNT(*) TIME_WAITED
--------------------------------------------------- ---------- -----------
        3CCF8A1D8                                          202  104.148331
        3CCF96200                                          106   39.235101
        3CCF9AFF0                                           64   23.084554
        3CCF8A330                                            3    1.004721
        3CCF87C38                                            1    1.000295

select
  to_char(p1, 'XXXXXXXXXXXXXXXX') event_param,
  count(*), sum(time_waited/1000000) time_waited
from dba_hist_active_sess_history
where event = 'reliable message'
group by to_char(p1, 'XXXXXXXXXXXXXXXX')
order by time_waited desc;

EVENT_PARAM                                           COUNT(*) TIME_WAITED
--------------------------------------------------- ---------- -----------
        3C9071160                                          200  149.222728
        3CD0B2980                                          204  130.562281
        3CD0B7770                                          151   88.174098
        3CCF9AFF0                                          120   78.041257
        3C906C370                                           21   11.966220
        3CCF8A1D8                                           20   10.306872
        3CCF96200                                           16    4.381981
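
If the problem was confined to a particular window, the same aggregation can be restricted by SAMPLE_TIME. The following is a minimal sketch, not a tuned script: the substitution variables &start_time and &end_time and the column aliases are placeholders of my own choosing, and the timestamp format is only an assumption about how you want to enter the window.

```sql
-- Sketch: top "reliable message" channel contexts within a chosen window.
-- &start_time / &end_time are hypothetical SQL*Plus substitution variables,
-- e.g. 2015-03-18 09:00 and 2015-03-18 10:00.
select
  to_char(p1, 'XXXXXXXXXXXXXXXX') event_param,
  count(*) samples,
  sum(time_waited/1000000) time_waited_sec
from gv$active_session_history
where event = 'reliable message'
  and sample_time between to_timestamp('&start_time', 'YYYY-MM-DD HH24:MI')
                      and to_timestamp('&end_time',   'YYYY-MM-DD HH24:MI')
group by to_char(p1, 'XXXXXXXXXXXXXXXX')
order by time_waited_sec desc;
```

The same predicate should work against DBA_HIST_ACTIVE_SESS_HISTORY when the window has already aged out of the in-memory ASH buffer.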

Look up the channel description from the P1 value

select name_ksrcdes 
from x$ksrcdes 
where indx = (select name_ksrcctx from x$ksrcctx where addr like '%&addr%');
SQL> /
Enter value for addr: 3CCF8A1D8
RBR channel
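
Looking up addresses one at a time gets tedious when several channels are involved. As a rough sketch, the lookup can be folded into the ASH query so that each P1 value is resolved to its channel name in one pass. This assumes you are connected as SYS (the x$ tables are not visible otherwise), that the x$ksrcctx and x$ksrcdes columns behave as in the lookup above, and that ADDR can be converted to a number for comparison with P1. Because x$ tables are local to an instance, the sketch reads V$ACTIVE_SESSION_HISTORY rather than the GV$ view.

```sql
-- Sketch: resolve "reliable message" P1 values to channel names in one query.
-- Run as SYS on the instance whose waits you are investigating.
select
  d.name_ksrcdes channel,
  count(*) samples,
  sum(a.time_waited/1000000) time_waited_sec
from v$active_session_history a
join x$ksrcctx c
  on to_number(rawtohex(c.addr), 'XXXXXXXXXXXXXXXX') = a.p1  -- context address = P1
join x$ksrcdes d
  on d.indx = c.name_ksrcctx                                 -- context -> channel description
where a.event = 'reliable message'
group by d.name_ksrcdes
order by time_waited_sec desc;
```

Channel context addresses are only meaningful for the running instance, so values taken from older DBA_HIST data may no longer resolve after a restart.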


There are also two views (GV$ / V$CHANNEL_WAITS) that can be used to pull all channel waits regardless of the timeframe.  You can use this to see the cumulative totals for all channel waits since instance startup:

select
  inst_id, channel, messages_published, wait_count,
  WAIT_TIME_USEC/1000000 wait_time_sec
from gv$channel_waits
order by inst_id, wait_time_sec desc;

   ID CHANNEL                                                MESSAGES_PUBLISHED WAIT_COUNT WAIT_TIME_SEC
----- ------------------------------------------------------ ------------------ ---------- -------------
    1 obj broadcast channel                                             4172316    1371368    3097.02528
    1 MMON remote action broadcast channel                               289208     272849    731.903429
    1 Result Cache: Channel                                           474745359     959831    604.858179
    1 kxfp control signal channel                                       2666851    1085395    412.432036
    1 RBR channel                                                        166981     128774     66.667725

    2 MMON remote action broadcast channel                               289202       6349    685.204776
    2 kxfp control signal channel                                       2666813     545023    203.762707
    2 obj broadcast channel                                             4172316      52995      96.11189
    2 Result Cache: Channel                                           474745359        248       .214467
    2 RBR channel                                                        166981        336       .151961

    3 Result Cache: Channel                                           474745796  473767959    267825.106
    3 obj broadcast channel                                             4928144    2643616    5065.18535
    3 kxfp control signal channel                                       3349090    1140344    484.500164
    3 MMON remote action broadcast channel                               401055       6999    266.940782
    3 RBR channel                                                        320568      37809    137.223524
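
Cumulative totals can hide whether a channel suffers from many short waits or a few long ones. A small variation on the query, offered as a sketch, adds the average time per wait. The avg_wait_ms alias and the rounding are my own additions; the columns come from GV$CHANNEL_WAITS.

```sql
-- Sketch: cumulative channel waits plus average wait per occurrence.
select
  inst_id, channel, wait_count,
  round(wait_time_usec/1000000, 2) wait_time_sec,
  round(wait_time_usec/nullif(wait_count, 0)/1000, 3) avg_wait_ms  -- nullif avoids divide-by-zero
from gv$channel_waits
order by inst_id, wait_time_sec desc;
```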

Notice in the above output that the “Result Cache” channel wait on Instance 3 has the largest wait times by far.  I’ll devote a separate post to what we experienced with this particular channel wait, but for now, here is some information that I’ve found on “RBR channel” and “obj broadcast channel”.

“RBR channel”

If you are seeing high RBR channel waits, see MOS/Metalink Doc ID 15826962.8 – Bug 15826962 – High “reliable message” wait due to “RBR channel”.

“obj broadcast channel”

See MOS/Metalink Doc ID 1644828.1 – Checkpoint Contention With Block Change Tracking Enabled

Hope this helps!


2 thoughts on “Diagnosing Oracle “reliable message” Waits”

  1. Hi,
    I had similar issue and this is oracle support answer:
    Patch 20470877
    Product Oracle Database Family
    Platform Linux x86-64
    Last Updated 18-MAR-2015

