Wednesday, February 23, 2022

Commvault restore issue troubleshooting

Development recommends that you apply the fix for Bug 33304775 - IPCDAT Errors - IBV Resources are Used for Processes Which Have No Need of Them (Doc ID 33304775.8).

There is no workaround; you will need to apply the fix. I'm checking whether a fix is available for you.

Per Dev findings:
=============
The suspicion is that RMAN forks database processes that then use a libcell context, IPCDAT context, and QPs (InfiniBand queue pairs) created by RMAN.

Because of a possible kernel issue, the child does not correctly inherit the parent's security context: the context is deep-copied as expected, but the ibverbs-specific code compares only the security context pointers (a shallow comparison), so the child's copy is not recognized as matching the parent's.

When that happens, any ibv_modify_qp() call the child makes on a QP created by RMAN fails with EACCES.


The fix is to ensure that RMAN does not create QPs at all. It should not need to, because it does not perform I/O directly against the cell. The fix for Bug 33304775 prevents processes from creating QPs unnecessarily and could help with this issue.
=============
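To make the suspected failure mode concrete, here is a toy Python model of the difference between a deep copy and a pointer (identity) comparison. It is only an analogy under stated assumptions: `SecurityContext`, `QueuePair`, `modify_qp_by_identity`, and `modify_qp_by_value` are hypothetical names, `copy.deepcopy` stands in for the child receiving a copy of the context at fork time, and none of this is the actual kernel or libibverbs code. The return convention (0 on success, an errno value such as EACCES on failure) loosely mirrors that of ibv_modify_qp().

```python
import copy
import errno
from dataclasses import dataclass


@dataclass
class SecurityContext:
    """Hypothetical stand-in for a process security context (toy model only)."""
    label: str


@dataclass
class QueuePair:
    """Hypothetical QP that records the security context of the process that created it."""
    owner_ctx: SecurityContext


def modify_qp_by_identity(qp, caller_ctx):
    # Suspected behavior: compare context pointers (object identity), not contents.
    return 0 if qp.owner_ctx is caller_ctx else errno.EACCES


def modify_qp_by_value(qp, caller_ctx):
    # For contrast: compare context contents, so a faithful copy is accepted.
    return 0 if qp.owner_ctx == caller_ctx else errno.EACCES


parent_ctx = SecurityContext("rman")
qp = QueuePair(parent_ctx)              # QP created by the parent (RMAN)
child_ctx = copy.deepcopy(parent_ctx)   # child gets an equal but distinct copy

print(modify_qp_by_identity(qp, parent_ctx) == 0)            # parent is allowed
print(modify_qp_by_identity(qp, child_ctx) == errno.EACCES)  # child is refused
print(modify_qp_by_value(qp, child_ctx) == 0)                # a value check would allow it
```

The child's context matches the parent's in content, yet the identity check still rejects it, which is the EACCES pattern the findings describe.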
We have a few questions for you:
1. Does this issue reproduce every time you restore this database? If so, how long after the restore starts does it occur?
2. Do you see the same issue with the other databases you are migrating to 19c?
3. Are any other database or OS processes affected at the same time?

If this issue is easily reproducible, we can try to arrange a Zoom call with RMAN development and Commvault support during US hours. Engineering is also requesting the following information.

First, whether the Commvault software changes the uid, gid, or any other process credential at runtime.

Second, please perform the following steps:
1. Copy the attached dtrace script, trace.d, to the node where RMAN is launched. The script logs a message for every attempt to change the uid or gid.
2. Run “dtrace -s trace.d > /tmp/dtrace.out”.
3. Reproduce the issue.
a. While Commvault and RMAN are running, periodically capture the output of “cat /proc/<pid>/status”, where <pid> is the PID of the Commvault process and of the RMAN process.
4. Once the issue is reproduced, collect the following:
a. The output of “dmesg -T”.
b. The ExaWatcher PS data covering the reproduction window, from /opt/oracle.ExaWatcher/archive/Ps.ExaWatcher/
c. /var/log/audit/audit.log
d. /var/log/messages
e. /tmp/dtrace.out
f. The periodic dump of “cat /proc/<pid>/status” from Step 3a
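Step 3a asks for periodic snapshots of /proc/<pid>/status. Below is a minimal Python sketch of such a collector, assuming a Linux node with Python 3.6 or later available. The script name poll_status.py, the 5-second interval, and the 720-sample (one-hour) limit are arbitrary assumptions, not values Engineering specified. Besides appending each raw snapshot, it flags any change in the Uid, Gid, or Groups lines between samples, which relates to the question of whether Commvault changes process credentials at runtime.

```python
import sys
import time
from datetime import datetime
from pathlib import Path

# Lines in /proc/<pid>/status that carry process credentials.
CRED_FIELDS = ("Uid", "Gid", "Groups")


def read_status(pid):
    """Return the raw text of /proc/<pid>/status (Linux only)."""
    return Path(f"/proc/{pid}/status").read_text()


def parse_status(text):
    """Parse status text into a {field: value} dict."""
    fields = {}
    for line in text.splitlines():
        key, sep, value = line.partition(":")
        if sep:
            fields[key.strip()] = value.strip()
    return fields


def credentials(fields):
    """Keep only the credential fields, for change detection."""
    return {name: fields.get(name) for name in CRED_FIELDS}


def poll(pids, interval, samples, out_path, reader=read_status):
    """Append timestamped status snapshots for each PID to out_path and
    flag any Uid/Gid/Groups change since that PID's previous sample."""
    last = {}
    with open(out_path, "a") as out:
        for _ in range(samples):
            stamp = datetime.now().isoformat(timespec="seconds")
            for pid in pids:
                try:
                    text = reader(pid)
                except (FileNotFoundError, ProcessLookupError):
                    out.write(f"{stamp} pid={pid} not running\n")
                    continue
                out.write(f"==== {stamp} pid={pid} ====\n{text}")
                creds = credentials(parse_status(text))
                if pid in last and last[pid] != creds:
                    out.write(f"!!!! {stamp} pid={pid} credentials changed: "
                              f"{last[pid]} -> {creds}\n")
                last[pid] = creds
            out.flush()
            time.sleep(interval)


def main(argv):
    if len(argv) < 3:
        print("usage: poll_status.py OUTFILE PID [PID ...]", file=sys.stderr)
        return 2
    # Assumed defaults: one sample every 5 s for up to 1 hour (720 samples).
    poll([int(p) for p in argv[2:]], interval=5, samples=720, out_path=argv[1])
    return 0


if __name__ == "__main__":
    main(sys.argv)
```

For example, start `python3 poll_status.py /tmp/proc_status.out <commvault_pid> <rman_pid>` before reproducing the issue, and send the output file as item f. It complements, rather than replaces, the dtrace output, which records the uid/gid change attempts themselves.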
