Thursday, July 15, 2021

Data Guard Physical Standby - Converting a Snapshot Standby back to a Physical Standby using Data Guard Broker (Doc ID 1546657.1)

In this Document

Goal
Solution
References

APPLIES TO:

Oracle Database - Enterprise Edition - Version 11.2.0.1 to 12.1.0.2 [Release 11.2 to 12.1]
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Information in this document applies to any platform.

GOAL

 ***Checked for relevance on 20-Oct-2016***

Perform a conversion of a Snapshot Standby database back to a Physical Standby Database.

Perform checks relevant to a successful conversion.

Monitor the conversion process to ensure that it has been executed correctly without error.

SOLUTION

  

NOTE: In the images and/or the document content below, the user information and environment data used represents fictitious data
from the Oracle sample schema(s), Public Documentation delivered with an Oracle database product or other training material.
Any similarity to actual environments, actual persons, living or dead, is purely coincidental and not intended in any manner.

  

For the purposes of this document, the following fictitious environment is used as an example to describe the procedure:

Primary database name: rs1_stb

Standby/Snapshot database name: rs1

Disk groups: +DATA, +FRA

The current configuration

The Primary Site

$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Thu Apr 18 11:58:26 2013



Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

SQL> select db_unique_name,database_role,open_mode from v$database;

DB_UNIQUE_NAME                 DATABASE_ROLE    OPEN_MODE
------------------------------ ---------------- --------------------
rs1_stb                        PRIMARY          READ WRITE

 
The Standby Site

$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Thu Apr 18 11:50:31 2013


Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

SQL> select db_unique_name,database_role,open_mode from v$database;

DB_UNIQUE_NAME                 DATABASE_ROLE    OPEN_MODE
------------------------------ ---------------- --------------------
rs1                            SNAPSHOT STANDBY READ WRITE

 

Checks to perform prior to the conversion

1. Check from a Data Guard Broker perspective that the configuration is healthy and that the configuration status is reported as "SUCCESS".

$ dgmgrl
DGMGRL for Linux: Version 11.2.0.3.0 - 64bit Production

Welcome to DGMGRL, type "help" for information.

DGMGRL> connect sys@rs1_stb
Password:
Connected.
DGMGRL> show configuration;

Configuration - rs1_dg

  Protection Mode: MaxPerformance
  Databases:
    rs1_stb - Primary database
    rs1     - Snapshot standby database

Fast-Start Failover: DISABLED

Configuration Status:
SUCCESS

 
2. Confirm that redo log shipping is occurring between the sites.

$ sqlplus / as sysdba

SQL*Plus: Release 11.2.0.3.0 Production on Thu Apr 18 12:08:18 2013



Connected to:
Oracle Database 11g Enterprise Edition Release 11.2.0.3.0 - 64bit Production
With the Partitioning, Automatic Storage Management, OLAP, Data Mining
and Real Application Testing options

SQL> alter system switch logfile;

System altered.

SQL> /

System altered.


SQL> archive log list
Database log mode              Archive Mode
Automatic archival             Enabled
Archive destination            USE_DB_RECOVERY_FILE_DEST
Oldest online log sequence     297
Next log sequence to archive   299
Current log sequence           299
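
As a supplementary check (not part of the original note), the highest archived sequence number can be compared on the primary and the standby to confirm receipt:

SQL> select thread#, max(sequence#) from v$archived_log group by thread#;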

Primary Site Alert Log

The primary alert log details the log switches and the shipping of the redo to the standby:

Starting background process SMCO
Thu Apr 18 11:57:15 2013
SMCO started with pid=24, OS id=3796
Thu Apr 18 12:09:04 2013
Destination LOG_ARCHIVE_DEST_2 is SYNCHRONIZED
LGWR: Standby redo logfile selected for thread 1 sequence 298 for destination LOG_ARCHIVE_DEST_2
Thread 1 advanced to log sequence 298 (LGWR switch)
  Current log# 3 seq# 298 mem# 0: +DATA/rs1_stb/onlinelog/group_3.278.801239719
  Current log# 3 seq# 298 mem# 1: +FRA/rs1_stb/onlinelog/group_3.315.801239719
Thu Apr 18 12:09:07 2013
Archived Log entry 380 added for thread 1 sequence 297 ID 0x6cd69690 dest 1:
Thu Apr 18 12:09:19 2013
LGWR: Standby redo logfile selected for thread 1 sequence 299 for destination LOG_ARCHIVE_DEST_2
Thread 1 advanced to log sequence 299 (LGWR switch)
  Current log# 1 seq# 299 mem# 0: +DATA/rs1_stb/onlinelog/group_1.276.801239713
  Current log# 1 seq# 299 mem# 1: +FRA/rs1_stb/onlinelog/group_1.327.801239715
Thu Apr 18 12:09:22 2013
Archived Log entry 382 added for thread 1 sequence 298 ID 0x6cd69690 dest 1:

Standby Site Alert Log

The standby alert log details the receipt of the redo at the Snapshot Standby:

$ tail -f /u01/app/oracle/diag/rdbms/rs1/rs1/trace/alert_rs1.log
Thu Apr 18 11:52:11 2013
RFS[4]: Assigned to RFS process 3799
RFS[4]: Selected log 5 for thread 1 sequence 296 dbid 1823795963 branch 800115647
Archived Log entry 528 added for thread 1 sequence 296 ID 0x6cd69690 dest 3:
Thu Apr 18 11:53:30 2013
ALTER SYSTEM SET local_listener='(DESCRIPTION=(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(HOST=IP)(PORT=1521))))' SCOPE=MEMORY SID='rs1';
Thu Apr 18 11:54:09 2013
Starting background process SMCO
Thu Apr 18 11:54:09 2013
SMCO started with pid=23, OS id=3812
Thu Apr 18 12:09:03 2013
RFS[2]: Selected log 5 for thread 1 sequence 298 dbid 1823795963 branch 800115647
Thu Apr 18 12:09:04 2013
Archived Log entry 529 added for thread 1 sequence 297 ID 0x6cd69690 dest 3:
Thu Apr 18 12:09:18 2013
RFS[2]: Selected log 4 for thread 1 sequence 299 dbid 1823795963 branch 800115647
Thu Apr 18 12:09:19 2013
Archived Log entry 530 added for thread 1 sequence 298 ID 0x6cd69690 dest 3:

 

Perform the conversion from a Snapshot Standby to a Physical Standby

Given that log shipping is successful and the broker configuration is being reported as healthy, the conversion back to a physical standby can be performed.

1. Start the broker command line and connect as sys@TNSALIAS so that the broker can perform the shutdown and startup of the standby site required during this process.

DGMGRL> connect sys@rs1_stb
Password:
Connected.

 
2. Issue the convert command from the broker:

DGMGRL> convert database rs1 to physical standby;

 
The convert command will perform the following steps:

- Shut down the snapshot standby and restart it to the mount state

Converting database "rs1" to a Physical Standby database, please wait...
Operation requires shutdown of instance "rs1" on database "rs1"
Shutting down instance "rs1"...
Database closed.
Database dismounted.
ORACLE instance shut down.
Operation requires startup of instance "rs1" on database "rs1"
Starting instance "rs1"...
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.
Database mounted.

 
- Once the flashback operation has been performed, another shutdown of the standby is initiated. The flashback rewinds the standby to a point in time prior to its conversion to a snapshot standby.

Continuing to convert database "rs1" ...
Operation requires shutdown of instance "rs1" on database "rs1"
Shutting down instance "rs1"...
ORA-01109: database not open

Database dismounted.
ORACLE instance shut down.

 
- The standby, now converted back to a Physical Standby, is then restarted.

Operation requires startup of instance "rs1" on database "rs1"
Starting instance "rs1"...
ORA-32004: obsolete or deprecated parameter(s) specified for RDBMS instance
ORACLE instance started.
Database mounted.
Database "rs1" converted successfully
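
To verify the result, repeat the role check used at the start of this note. The expected output (shown here as an illustration; it is not part of the original note) is a DATABASE_ROLE of PHYSICAL STANDBY with the database mounted:

SQL> select db_unique_name,database_role,open_mode from v$database;

DB_UNIQUE_NAME                 DATABASE_ROLE    OPEN_MODE
------------------------------ ---------------- --------------------
rs1                            PHYSICAL STANDBY MOUNTED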

 
Alert Log Extract from the Standby Site detailing the change from Snapshot Standby to Physical Standby

$ tail -f /u01/app/oracle/diag/rdbms/rs1/rs1/trace/alert_rs1.log
..
.
Fri Apr 19 11:41:46 2013
Shutting down instance (immediate)
Stopping background process SMCO
Shutting down instance: further logons disabled
..
.
Fri Apr 19 11:41:55 2013
NOTE: Deferred communication with ASM instance
NOTE: deferred map free for map id 12
Completed: alter database CLOSE NORMAL
alter database DISMOUNT
Shutting down archive processes
Archiving is disabled
..
.
Fri Apr 19 11:42:03 2013
Instance shutdown complete
Fri Apr 19 11:42:04 2013
Starting ORACLE instance (normal)
..
.
Fri Apr 19 11:42:33 2013
RSM0 started with pid=41, OS id=13962
alter database convert to physical standby
ALTER DATABASE CONVERT TO PHYSICAL STANDBY (rs1)
krsv_proc_kill: Killing 4 processes (all RFS)
Flashback Restore Start
..
.
Fri Apr 19 11:42:50 2013
Flashback Restore Complete
Drop guaranteed restore point
Guaranteed restore point  dropped
Deleted Oracle managed file +FRA/rs1/flashback/log_6.412.802083919
Deleted Oracle managed file +FRA/rs1/flashback/log_4.265.800203569
Deleted Oracle managed file +FRA/rs1/flashback/log_3.308.801298337
Clearing standby activation ID 1836586315 (0x6d78154b)
The primary database controlfile was created using the
'MAXLOGFILES 16' clause.
There is space for up to 13 standby redo logfiles
Use the following SQL commands on the standby database to create
standby redo logfiles that match the primary database:
ALTER DATABASE ADD STANDBY LOGFILE 'srl1.f' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE 'srl2.f' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE 'srl3.f' SIZE 52428800;
ALTER DATABASE ADD STANDBY LOGFILE 'srl4.f' SIZE 52428800;
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Fri Apr 19 11:42:50 2013
ARC2: Becoming the active heartbeat ARCH
Clearing online redo logfile 1 +DATA/rs1/onlinelog/group_1.261.800115647
Clearing online log 1 of thread 1 sequence number 7
Clearing online redo logfile 1 complete
Clearing online redo logfile 2 +DATA/rs1/onlinelog/group_2.262.800115651
Clearing online log 2 of thread 1 sequence number 8
Clearing online redo logfile 2 complete
Clearing online redo logfile 3 +DATA/rs1/onlinelog/group_3.263.800115655
Clearing online log 3 of thread 1 sequence number 6
Clearing online redo logfile 3 complete
Completed: alter database convert to physical standby
Fri Apr 19 11:43:01 2013
Primary database is in MAXIMUM PERFORMANCE mode
Changing standby controlfile to MAXIMUM PERFORMANCE mode
RFS[4]: Assigned to RFS process 13967
RFS[4]: Selected log 4 for thread 1 sequence 304 dbid 1823795963 branch 800115647
Fri Apr 19 11:43:03 2013
Shutting down instance (immediate)
Shutting down instance: further logons disabled
..
.
Fri Apr 19 11:43:04 2013
NOTE: Deferred communication with ASM instance
All dispatchers and shared servers shutdown
alter database CLOSE NORMAL
ORA-1109 signalled during: alter database CLOSE NORMAL...
alter database DISMOUNT
Shutting down archive processes
Archiving is disabled
..
.
Completed: alter database DISMOUNT
..
.
Fri Apr 19 11:43:16 2013
Instance shutdown complete
Fri Apr 19 11:43:17 2013
Starting ORACLE instance (normal)
..
.
Fri Apr 19 11:43:31 2013
ARC3 started with pid=32, OS id=14084
ARC1: Archival started
ARC2: Archival started
ARC1: Becoming the 'no FAL' ARCH
ARC1: Becoming the 'no SRL' ARCH
ARC2: Becoming the heartbeat ARCH
Physical Standby Database mounted.
Lost write protection disabled
ARC2: Becoming the active heartbeat ARCH
Completed: alter database  mount
..
.
Fri Apr 19 11:43:44 2013
RSM0 started with pid=39, OS id=14112
ALTER SYSTEM SET log_archive_trace=0 SCOPE=BOTH SID='rs1';
ALTER SYSTEM SET log_archive_format='%t_%s_%r.dbf' SCOPE=SPFILE SID='rs1';
ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE
Attempt to start background Managed Standby Recovery process (rs1)
Fri Apr 19 11:43:48 2013
MRP0 started with pid=40, OS id=14114
MRP0: Background Managed Standby Recovery process started (rs1)
 started logmerger process
Fri Apr 19 11:43:54 2013
Managed Standby Recovery starting Real Time Apply
Parallel Media Recovery started with 2 slaves
Waiting for all non-current ORLs to be archived...
All non-current ORLs have been archived.
Media Recovery Log /tmp/1_284_800115647.dbf
Media Recovery Log /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_285_800115647.dbf
Fri Apr 19 11:43:56 2013
Completed: ALTER DATABASE RECOVER MANAGED STANDBY DATABASE  THROUGH ALL SWITCHOVER DISCONNECT  USING CURRENT LOGFILE
Media Recovery Log /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_286_800115647.dbf
Media Recovery Log /tmp/1_287_800115647.dbf
..
.
Media Recovery Log /u01/app/oracle/product/11.2.0/dbhome_1/dbs/arch1_302_800115647.dbf
Media Recovery Log +FRA/rs1/archivelog/2013_04_19/thread_1_seq_303.308.813152615
Media Recovery Log +FRA/rs1/archivelog/2013_04_19/thread_1_seq_304.265.813152621
Media Recovery Waiting for thread 1 sequence 305 (in transit)
Recovery of Online Redo Log: Thread 1 Group 5 Seq 305 Reading mem 0
  Mem# 0: +DATA/rs1/onlinelog/group_5.269.800367453
  Mem# 1: +FRA/rs1/onlinelog/group_5.316.800367455

Standby Data Guard Broker Log detailing the change from Snapshot Standby to Physical Standby

$ tail -f /u01/app/oracle/diag/rdbms/rs1/rs1/trace/drcrs1.log
04/18/2013 13:53:50
Creating process RSM0
04/18/2013 13:53:54
Process RSM0 re-created with PID = 7625
04/18/2013 13:56:39
Process RSM0, PID = 7625, will be killed
04/18/2013 13:56:46
Creating process RSM0
04/18/2013 13:56:50
Process RSM0 re-created with PID = 7635
04/19/2013 11:41:37
Data Guard notifying Oracle Clusterware of database role change
conversion to physical standby requires this database be restarted to the mounted state
04/19/2013 11:41:52
Notifying DMON of db close
Notifying RSM0 of db close
04/19/2013 11:41:56
Data Guard Broker shutting down
posting shutdown message to primary database 0x02001000
RSM0 successfully terminated
04/19/2013 11:41:58
Command CONVERT DATABASE rs1 completed with warning ORA-16570
>> DMON Process Shutdown <<
04/19/2013 11:42:22
>> Starting Data Guard Broker bootstrap <<
Broker Configuration File Locations:
      dg_broker_config_file1 = "+DATA/rs1/dr1rs1.dat"
      dg_broker_config_file2 = "+FRA/rs1/dr2rs1.dat"
04/19/2013 11:42:27
DMON Registering service rs1_DGB with listener(s)
Broker Configuration:       "rs1_dg"
      Protection Mode:            Maximum Performance
      Fast-Start Failover (FSFO): Disabled, flags=0x0, version=3
      Primary Database:           rs1_stb (0x02010000)
      Standby Database:           rs1, Enabled Snapshot Standby (0x01010000)
04/19/2013 11:42:28
Data Guard notifying Oracle Clusterware of database role change
04/19/2013 11:42:32
Creating process RSM0
04/19/2013 11:43:00
CONVERT to PHYSICAL succeeded. Restart required.
04/19/2013 11:43:09
Data Guard Broker shutting down
posting shutdown message to primary database 0x02001000
RSM0 successfully terminated
04/19/2013 11:43:12
>> DMON Process Shutdown <<
04/19/2013 11:43:35
>> Starting Data Guard Broker bootstrap <<
Broker Configuration File Locations:
      dg_broker_config_file1 = "+DATA/rs1/dr1rs1.dat"
      dg_broker_config_file2 = "+FRA/rs1/dr2rs1.dat"
04/19/2013 11:43:39
DMON Registering service rs1_DGB with listener(s)
Broker Configuration:       "rs1_dg"
      Protection Mode:            Maximum Performance
      Fast-Start Failover (FSFO): Disabled, flags=0x0, version=3
      Primary Database:           rs1_stb (0x02010000)
      Standby Database:           rs1, Enabled Physical Standby (0x01010000)
Physical standby bootstrap requires Oracle Clusterware buildup
04/19/2013 11:43:44
rs1 version check successfully completed
rs1 has current configuration metadata,
      completing bootstrap
Creating process RSM0
04/19/2013 11:43:54
Notifying Oracle Clusterware that database is open
Data Guard notifying Oracle Clusterware to start services and other instances change

 

REFERENCES

NOTE:443720.1 - Using Snapshot Standby Database.
NOTE:1623815.1 - Data Guard Snapshot Standby Archive Redo Log Management and Configuration

Repairing SQL Performance Regression with SQL Plan Management

Nigel Bayliss
PRODUCT MANAGER

You were hoping for an uneventful day at work when you notice that you have a critical workload query that is taking much longer to execute than usual, and you can see that it is beginning to affect your service levels. You suspect that it has a sub-optimal execution plan and you need to get it sorted out immediately. Is there a quick and easy way to do that?

Automatic SQL plan management can deal with this type of issue without DBA intervention, but what can you do if this feature is not available to you? Fortunately, you can take advantage of enhancements made in SQL plan management (SPM) from Oracle Database 18c onwards (I have a word to say about Oracle Database 12c Release 2 later on). The approach presented here must be initiated by a DBA, but it is nevertheless very easy to use.

The steps are as follows:

  1. Capture the 'problem' SQL statement plan in a SQL plan baseline.
  2. Run the SPM evolve task for this SQL plan baseline (using a particular set of parameters which I'll show you below).
  3. Accept the recommended plan.

In many cases, that's it!

SPM evolution will locate and test execute previously-used SQL execution plans and figure out which ones are really the best. It does not rely on optimizer costings alone. It can be configured to search the automatic workload repository (AWR), the Oracle Database cursor cache and SQL tuning sets. In other words, if you have a better plan in a query repository or the cursor cache, SPM can find it and apply it to the problem query.

In broad terms, this is what the procedure looks like:

DECLARE
   n      PLS_INTEGER;
   handle VARCHAR2(35);
   tname  VARCHAR2(128);
   ename  VARCHAR2(128);
BEGIN
   --
   -- Create a SQL plan baseline for the problem query plan
   -- (in this case assuming that it is in the cursor cache)
   --
   n := DBMS_SPM.LOAD_PLANS_FROM_CURSOR_CACHE(
                  sql_id          => '<problem_SQL_ID>',
                  plan_hash_value => <problem_plan_hash_value>,
                  enabled         => 'no');
   --
   -- Look up the SQL handle of the baseline just created
   -- (this step was implicit in the original sketch)
   --
   SELECT DISTINCT sql_handle
   INTO   handle
   FROM   dba_sql_plan_baselines
   WHERE  signature = (SELECT exact_matching_signature
                       FROM   v$sql
                       WHERE  sql_id = '<problem_SQL_ID>'
                       AND    rownum = 1);
   --
   -- Set up evolve
   --
   tname := DBMS_SPM.CREATE_EVOLVE_TASK(sql_handle => handle);

   DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
      task_name => tname,
      parameter => 'ALTERNATE_PLAN_BASELINE',
      value     => 'EXISTING');

   DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
      task_name => tname,
      parameter => 'ALTERNATE_PLAN_SOURCE',
      value     => 'CURSOR_CACHE+AUTOMATIC_WORKLOAD_REPOSITORY+SQL_TUNING_SET');

   DBMS_SPM.SET_EVOLVE_TASK_PARAMETER(
      task_name => tname,
      parameter => 'ALTERNATE_PLAN_LIMIT',
      value     => 'UNLIMITED');
   --
   -- Evolve
   --
   ename := DBMS_SPM.EXECUTE_EVOLVE_TASK(tname);
   --
   -- Optionally, choose to implement immediately
   --
   n := DBMS_SPM.IMPLEMENT_EVOLVE_TASK(tname);
END;
/

There will be cases where a SQL statement was previously performing very well but was never picked up and stored in AWR (and is perhaps no longer in the cursor cache). In this case, SPM might not be able to find the previous good plan. You can mitigate this risk if you periodically capture your workloads in SQL tuning sets or if you enable the automatic SQL tuning set. The 'SQL_TUNING_SET' option shown above will include SQL tuning sets as a source of potential execution plans. This improves the likelihood that you can repair queries that previously ran very quickly and did not show up in AWR.
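
To capture workloads periodically, a SQL tuning set can be populated from the cursor cache. A minimal sketch using DBMS_SQLTUNE follows; the tuning set name MY_WORKLOAD_STS and the sampling intervals are illustrative, not taken from the post:

BEGIN
   -- Create a SQL tuning set to hold the captured workload
   DBMS_SQLTUNE.CREATE_SQLSET(sqlset_name => 'MY_WORKLOAD_STS');
   -- Sample the cursor cache for an hour, polling every five minutes
   DBMS_SQLTUNE.CAPTURE_CURSOR_CACHE_SQLSET(
      sqlset_name     => 'MY_WORKLOAD_STS',
      time_limit      => 3600,
      repeat_interval => 300);
END;
/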

The method described above is available in Oracle Database 18c, but note that the SPM internals in this area were enhanced in Oracle Database 19c, so you may find that the technique works best in that release. Oracle Database 12c Release 2 includes the parameter settings used above, but the task (at the time of writing) will sometimes fail with ORA-01422 due to bug number 29539794.

SQL plan management applies some acceptance criteria and will only accept plans that perform better than the regressed plan by some margin. In the fully worked example in GitHub you will see that the SPM evolve report is displayed (the primary script is spm.sql). The report clearly indicates whether or not the acceptance criteria are met; if they are not, then the plan found in AWR will not be accepted.
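
To view the evolve report yourself, DBMS_SPM.REPORT_EVOLVE_TASK can be used; a minimal sketch, where <task_name> stands for the task name returned by CREATE_EVOLVE_TASK in the block above:

SET LONG 1000000 LONGCHUNKSIZE 100000 LINESIZE 200
SELECT DBMS_SPM.REPORT_EVOLVE_TASK(task_name => '<task_name>') AS report
FROM   dual;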

The use of SQL tuning sets and AWR has licensing implications. Always check the Database Licensing Information User Manual.

You can help to improve the scripts - comments welcome.

How to Gather Optimizer Statistics Fast!

Nigel Bayliss
PRODUCT MANAGER

There are a number of ways to speed up the process of gathering optimizer statistics, but I'm not sure that it's common knowledge just how much of an effect some simple changes can make. If you have been asking yourself, "why is stats gathering taking so long and what can I do about it?", then this post is for you.

If you are already familiar with the different methods of gathering optimizer statistics, you might want to jump to the end of this post where I compare them and make some recommendations.

Overview

The perception that it's difficult to speed up statistics gathering has sometimes motivated DBAs to manipulate the number of rows sampled on a table-by-table basis using the ESTIMATE_PERCENT parameter (or DBMS_STATS preference). For example, large tables may have estimate percent set to 1% and small tables, 100%. Legacy scripts play their part too: some systems are still using procedures established before the performance enhancements available with auto sample size. One of the reasons we recommend ESTIMATE_PERCENT=>AUTO_SAMPLE_SIZE is that it includes number-of-distinct-values (NDV) optimizations that yield high performance and accurate statistics. Gathering statistics using a 1% sample of rows might complete very quickly, but inaccurate statistics are the likely result, along with sub-optimal SQL execution plans.
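
For reference, the recommended setting can be applied (or restored, where legacy scripts have overridden it) at the global level like this; AUTO_SAMPLE_SIZE is also the default:

exec dbms_stats.set_global_prefs('ESTIMATE_PERCENT', 'DBMS_STATS.AUTO_SAMPLE_SIZE')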

Instead of manipulating ESTIMATE_PERCENT, the time taken to gather statistics can be reduced by using more machine resources. This post compares some before-and-after scenarios to demonstrate how you can do this. Fully worked examples are available in GitHub.

I will concentrate on using automatic optimizer statistics gathering, but the lessons are broadly applicable to manual statistics gathering too (there's an example at the end of the post). The examples are intended for use on Oracle Database 12c and Oracle Database 18c. The same techniques are applicable to Oracle Database 11g, but note that the resource consumer groups have different names in that release.

Option #1 - Default Statistics Gathering

Consider the following trace of CPU consumption over time: 

[Image: CPU utilization over time during default (serial) statistics gathering]

It shows my CPU utilization while the automatic statistics gathering job is running and there's not much else happening on the system. Notice that about 75% of the CPU is not utilized. This fact is easy to understand once you know that the environment has a 4-core CPU with one thread per core. By default, statistics gathering uses a single process (with a single worker-thread) and this will utilize the processing power of a single CPU core. In my case, this equates to a utilization of 25% (one quarter of the 4-core chip). For systems with a higher core count, the single process will utilize an even smaller proportion of the available CPU.

Gathering statistics like this is not necessarily a problem. If stats gathering runs to completion most nights and there's no urgent need to have fresh statistics by a certain time, then there's no need to do anything more. Always keep things as simple as possible and only make changes if you need to. If your environment is large and/or volatile, the auto statistics job might regularly fail to run to completion in the batch window. In other words, the window might close before all tables considered stale have fresh statistics. If this is the case, then some tables might remain stale for a long time.

Fortunately, this situation is easy to see. If you view the statistics advisor report available in Oracle Database 12c Release 2, then it will tell you. The data dictionary stores this information too. In the example below, my batch window is 20 minutes long and the auto stats job has sometimes failed to complete (status STOPPED). The JOB_INFO column reveals the reason: auto statistics collection is occasionally taking longer than 20 minutes and terminates when the batch window closes.

[Image: auto stats task history, showing runs with status STOPPED and the JOB_INFO explanation]
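
A hedged sketch of how to see this in the data dictionary (DBA_OPTSTAT_OPERATIONS; the columns available vary by release, so verify against your version):

select operation, start_time, end_time, status
from   dba_optstat_operations
where  operation = 'gather_database_stats (auto)'
order  by start_time desc;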

How can we fix this? We could (a) make the batch window longer and/or (b) speed up statistics gathering. I am going to consider option b (because option a is less interesting).

How do you speed up statistics gathering? If you have spare resources on your database server, then you can dedicate more of them to gathering statistics. You can reduce the elapsed time of gathering statistics at the cost of a more fully utilized database server. It is of course necessary to identify a window of time where there's spare system resource, so this solution requires that the system is not running at 100% all of the time.

It is worth noting that other techniques are available to reduce the time required to maintain statistics (such as incremental statistics maintenance), but this is out of scope for the purposes of this blog post.

Option #2 - Gathering Statistics in Parallel – AUTO_DEGREE

Gathering statistics with auto sample size initiates full table scans to inspect table data. We can leverage parallel execution to make these scans complete in less time. To do this you can, for example, identify large tables and define a specific degree of parallelism (DOP):

exec dbms_stats.set_table_prefs(user, 'BIG_TABLE', 'DEGREE', 16)

There is an easier, set-and-forget approach where you can let Oracle decide on the DOP for you:

exec dbms_stats.set_table_prefs(user, 'BIG_TABLE', 'DEGREE', DBMS_STATS.AUTO_DEGREE)

A clean and simple approach is to set the property at the global level:

exec dbms_stats.set_global_prefs('DEGREE', DBMS_STATS.AUTO_DEGREE)

With parallel execution in play, statistics gathering has the potential to consume lots of system resource, so you need to consider how to control this. When the auto stats gathering job executes it (by default) uses the resource management plan DEFAULT_MAINTENANCE_PLAN and a consumer group called ORA$AUTOTASK. This makes it very easy to make some adjustments and control just how much resource you want to dedicate to gathering statistics.

You first need to decide what priority to attribute to auto stats gathering depending on what other processes are likely to be running at the same time. In the following example, the auto stats job has a minimum of 5% CPU if other tasks running in the database are competing for CPU. However, if the system is not busy, we will allow the job to consume up to 80% of the CPU (this will protect processes that must run outside the control of the database). The maximum degree of parallelism an individual session can use is four in this case. It is useful to control the maximum DOP because you will want to make sure that you do not reach the maximum number of parallel server processes allowed for the system (this will become more relevant later in this post).

[Image: plan directives for DEFAULT_MAINTENANCE_PLAN, including the ORA$AUTOTASK consumer group]
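
The directives themselves were shown as an image in the original post; as a hedged sketch, an adjustment along these lines can be made with DBMS_RESOURCE_MANAGER (the 5%, 80% and DOP 4 values come from the text above):

BEGIN
   dbms_resource_manager.create_pending_area();
   dbms_resource_manager.update_plan_directive(
      plan                         => 'DEFAULT_MAINTENANCE_PLAN',
      group_or_subplan             => 'ORA$AUTOTASK',
      new_mgmt_p1                  => 5,   -- at least 5% CPU when the system is busy
      new_utilization_limit        => 80,  -- cap CPU at 80% when the system is idle
      new_parallel_degree_limit_p1 => 4);  -- maximum DOP of 4 per session
   dbms_resource_manager.validate_pending_area();
   dbms_resource_manager.submit_pending_area();
END;
/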

For completeness, the example above includes all plan directives for the DEFAULT_MAINTENANCE_PLAN, but it is only necessary to specify the plan directives you want to modify. By default, when the maintenance window opens, it will activate the DEFAULT_MAINTENANCE_PLAN. If you prefer, you can create your own resource management plan and associate it with any maintenance windows of your choosing. If you also set the RESOURCE_MANAGER_PLAN initialization parameter, then you can use the same resource management plan when the batch windows are both open and closed. Here's an example:

[Image: a bespoke resource management plan associated with the maintenance windows]
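
A hedged illustration of the two steps just described (MY_MAINTENANCE_PLAN is a hypothetical plan name, not the one from the original image):

-- Associate the plan with an existing maintenance window
exec dbms_scheduler.set_attribute('SYS.MONDAY_WINDOW', 'RESOURCE_PLAN', 'MY_MAINTENANCE_PLAN')

-- Use the same plan when no maintenance window is open
ALTER SYSTEM SET resource_manager_plan = 'MY_MAINTENANCE_PLAN' SCOPE=BOTH;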

When AUTO_DEGREE is used, the resource utilization can look very different. In this example, the tables are all identical, so there's a very regular pattern:

[Image: CPU utilization with AUTO_DEGREE enabled]

We are now using much more CPU, and consequently the job completes in only 12 minutes and 26 seconds (where, previously, it failed to complete within the 20-minute window):

[Image: job status showing completion in 12 minutes and 26 seconds]

Remember that database resource management (DBRM) is in force during the batch window, so it is very easy to adjust CPU utilization even while the job is running. For example, consider what happens when I adjust the utilization limit down from 80% to 40% and then back again:

[Image: CPU utilization responding as the utilization limit is lowered to 40% and raised back to 80%]
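
Such an on-the-fly adjustment can be made with the same UPDATE_PLAN_DIRECTIVE call sketched earlier:

BEGIN
   dbms_resource_manager.create_pending_area();
   dbms_resource_manager.update_plan_directive(
      plan                  => 'DEFAULT_MAINTENANCE_PLAN',
      group_or_subplan      => 'ORA$AUTOTASK',
      new_utilization_limit => 40);  -- lower the cap from 80% to 40%
   dbms_resource_manager.submit_pending_area();
END;
/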

Let's look at a more realistic AUTO_DEGREE scenario. In the following example we have a schema containing tables that have a wide variation in size. The CPU profile is now less consistent:

[Image: CPU profile with AUTO_DEGREE on a schema of mixed table sizes]

The DOP is changing in response to the size of each individual table. The job runs serially at first (about 25% CPU), then DOP 2 for a while, then DOP 3 and then back to serial. We could micro-manage DOP on a table-by-table basis, but it is much better to avoid that level of manual intervention. The global AUTO_DEGREE solution will be good enough in many cases, so there will often be no need for anything more.

Option #3 - Gathering Statistics Concurrently - CONCURRENT

Parallel statistics gathering has enabled us to increase CPU utilization significantly, but what if we have spare machine resources and want to go even faster? In the previous example, the CPU could be more fully utilized. If you want to achieve that, then how do you go about it?

Firstly, disable parallel execution (we will come back to that later):

exec dbms_stats.set_global_prefs('DEGREE', 1)

The CONCURRENT preference allows DBMS_SCHEDULER to initiate multiple statistics gathering jobs at once, so that the database will gather statistics on multiple tables and partitions concurrently. We can choose to enable this behavior for auto stats gathering only:

exec dbms_stats.set_global_prefs('CONCURRENT','AUTOMATIC')

If you enable a database resource management plan, you can use concurrent for manual stats or for manual and auto:

exec dbms_stats.set_global_prefs('CONCURRENT','MANUAL')
exec dbms_stats.set_global_prefs('CONCURRENT','ALL')

The database will now gather statistics using multiple scheduler jobs. In my case, auto stats initiated 16 job processes and the CPU profile looked like this:

[Image: CPU utilization with concurrent statistics gathering]

I am using max_utilization_limit set to 80%, and the job completes in 11 minutes and 39 seconds:

[Image: job status showing completion in 11 minutes and 39 seconds]

Concurrent statistics gathering works very well if tables are of a similar size, but without parallel execution, serial jobs running on very large tables can take a long time and they might not complete before the batch window closes. If this is a problem for you, you can use a combination of concurrent processing and parallel execution.

Option #4 - Gathering Statistics Concurrently and in Parallel - CONCURRENT and AUTO_DEGREE

Care is required when implementing concurrency and parallel execution because there's scope to execute a very large number of concurrent parallel execution servers and generate a very high system load. Multiple jobs will start and each has the potential to initiate a number of parallel query servers. As a very general rule of thumb, you want to have no more than about 2*CPUCoreCount to 4*CPUCoreCount parallel servers executing at any one time.

You can mitigate the risk of initiating too many parallel execution servers as follows:

[Image: settings used to limit the number of concurrent parallel execution servers]

There is currently no way to cap the number of job queue processes allocated to concurrent stats gathering, so 'turning down' the job_queue_processes setting is the only way to do this. I have created an enhancement request with respect to this limitation.
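
For example (the value shown is illustrative; choose one appropriate to your core count and the DOP limit in your resource plan):

ALTER SYSTEM SET job_queue_processes = 16 SCOPE=BOTH;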

Enable concurrent stats gathering for the automatic statistics gathering job:

exec dbms_stats.set_global_prefs('CONCURRENT','AUTOMATIC')

Set AUTO_DEGREE globally:

exec dbms_stats.set_global_prefs('DEGREE', DBMS_STATS.AUTO_DEGREE)

Or, for individual large tables:

exec dbms_stats.set_table_prefs(user, 'BIG_TABLE', 'DEGREE', DBMS_STATS.AUTO_DEGREE)

On my system, the auto stats initiated 16 job processes and the resource management plan I used limited DOP to four. This resulted in 64 parallel execution servers executing concurrently, so a DOP limited to two might have been a better choice in my case. Here is the new CPU profile:

[Image: CPU profile with CONCURRENT and AUTO_DEGREE combined]

The job completed in 13 minutes 16 seconds:

[Image: job status showing completion in 13 minutes and 16 seconds]

In this case, why was there no benefit in run time using CONCURRENT and AUTO_DEGREE? It is because CONCURRENT without AUTO_DEGREE consumed CPU up to the 80% limit imposed by DBRM. In the more general case where there is a good mix of table sizes, some very large tables and a faster IO subsystem, then AUTO_DEGREE used in combination with CONCURRENT has the potential to yield the shortest gather stats times.

If you are in a position to be able to decrease job_queue_processes to limit the number of jobs that execute concurrently, then you will be able to increase the DOP limit to a higher value:

[Image: a higher DOP limit used together with a reduced job_queue_processes setting]

The same is true if there are only a small number of tables where parallelism is used.

Oracle Multitenant

At the time of writing there's a bug with parallel statistics gathering in a multitenant database when used with CONCURRENT (unpublished bug# 27249531). Parallel execution servers initiated by gather stats are not constrained by max_utilization_limit. This can result in high CPU consumption. Using DEGREE above 1 or AUTO_DEGREE is OK if CONCURRENT is not used.

Until a fix is available for your platform and version, the best solution is to use DEGREE=>1 if you want to use CONCURRENT in multitenant environments.

Manual Statistics Gathering

If you want to initiate stats gathering manually, and still make full use of parallel and concurrent settings, then you can use the following approach:

[Image: manual statistics gathering using CONCURRENT and AUTO_DEGREE]
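
A hedged sketch of what such a manual gather can look like (the original post showed this as an image; MY_SCHEMA is a hypothetical schema name):

BEGIN
   dbms_stats.set_global_prefs('CONCURRENT', 'MANUAL');
   dbms_stats.set_global_prefs('DEGREE', dbms_stats.auto_degree);
   -- Tables are processed by concurrent jobs and scanned in parallel
   dbms_stats.gather_schema_stats('MY_SCHEMA');
   -- Restore the defaults afterwards
   dbms_stats.set_global_prefs('CONCURRENT', 'OFF');
   dbms_stats.set_global_prefs('DEGREE', NULL);
END;
/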

Performance Comparisons

A small test system was used for the examples above, so it will be useful to see what an enterprise-class system looks like (let's say 72 cores with HT). The Oracle Real World Performance Group ran some tests to check out the different techniques.

The relative performance of the stats gathering methods will be different on every system you try, so treat this as entertainment rather than science. For example, the test tables were all large and all the same size, so this will work in favor of AUTO_DEGREE (used without CONCURRENT) because a high degree of parallelism was used for every table.

A large number of CPU cores will make the default method look exceptionally underpowered. In this case the CPU is only 1.5% utilized; a single core in an enterprise-class system:

[Image: performance test results for each method]

Here are the results plotted:

[Image: the results plotted as a chart]

Summary

Remember that you will need spare machine capacity to gain benefit from the techniques outlined in this blog post.

Generally speaking, option #2 is most likely to give you a quick and easy win if there are a number of very large tables. Option #3 is great if you have plenty of spare machine resource and a large number of smaller tables.

Option #4 requires more care to avoid initiating too many parallel execution servers.

Options #3 and #4 are particularly useful if you need to get a one-off stats gathering task done very quickly: perhaps when you are commissioning a new deployment or gathering statistics after an upgrade.

Here's a high-level summary:

[Image: high-level summary table of the four options]

I've uploaded self-contained test scripts to GitHub.

Comments welcome!

https://blogs.oracle.com/optimizer/how-to-gather-optimizer-statistics-fast

Database Options/Management Packs Usage Reporting for Oracle Databases 11.2 and later (Doc ID 1317265.1)

  Database Options/Management Packs Usage Report You can determine whether an option is currently in use in a database by running options_pa...