Monday, December 2, 2019

Oracle Exadata Database Machine Setup/Configuration Best Practices (Doc ID 1274318.1)

Copyright (c) 2019, Oracle. All rights reserved. Oracle Confidential.


Oracle Database Backup Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Exadata Storage Server Software - Version to [Release 11.2 to 12.2]
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Schema Service - Version N/A and later
Information in this document applies to any platform.


The goal of this document is to present the best practices for the deployment of Sun Oracle Database Machine V2/X2-2/X2-8/X3-2/X3-8/X4-2/X4-8/X5-2 in the area of Setup and Configuration.


General audience working on Oracle Exadata X2-2/X2-8/X3-2/X3-8/X4-2/X4-8/X5-2/X6-2/X6-8/X7-2/X-8


Primary and standby databases should NOT reside on the same IB Fabric
Use hostname and domain name in lower case
Verify ILOM Power Up Configuration
Verify Hardware and Firmware on Database and Storage Servers
Verify InfiniBand Cable Connection Quality
Verify Ethernet Cable Connection Quality
Verify InfiniBand Fabric Topology (verify-topology)
Verify key InfiniBand fabric error counters are not present
Verify InfiniBand switch software version is 1.3.3-2 or higher
Verify InfiniBand subnet manager is running on an InfiniBand switch
Disable InfiniBand subnet manager service where subnet manager master should never run
Verify key parameters in the InfiniBand switch /etc/opensm/opensm.conf file
Verify There Are No Memory (ECC) Errors
Verify celldisk configuration on disk drives
Verify celldisk configuration on flash memory devices
Verify there are no griddisks configured on flash memory devices
Verify griddisk count matches across all storage servers where a given prefix name exists
Verify griddisk ASM status
Verify that griddisks are distributed as expected across celldisks
Verify the percent of available celldisk space used by the griddisks
Verify Database Server ZFS RAID Configuration
Verify InfiniBand is the Private Network for Oracle Clusterware Communication
Verify InfiniBand Address Resolution Protocol (ARP) Configuration on Database Servers
Verify Oracle RAC Databases use RDS Protocol over InfiniBand Network.
Verify Database and ASM instances use same SPFILE
Verify Berkeley Database location for Cloned GI homes
Configure Storage Server alerts to be sent via email
Configure NTP and Timezone on the InfiniBand switches
Configure NTP slew_always settings as SMF property for Solaris
Verify NUMA Configuration
Enable Xeon Turbo Boost
Verify Exadata Smart Flash Log is Created
Verify Exadata Smart Flash Cache is Created
Verify Exadata Smart Flash Cache status is "normal"
Verify Master (Rack) Serial Number is Set
Verify Management Network Interface (eth0) is on a Separate Subnet
Verify RAID disk controller CacheVault capacitor condition
Verify RAID Disk Controller Battery Condition
Verify Ambient Air Temperature
Verify operating system hugepages count satisfies total SGA requirements
Verify MaxStartups 100 in /etc/ssh/sshd_config on all database servers
Verify all datafiles have AUTOEXTEND attribute ON
Verify all BIGFILE tablespaces have non-default MAXBYTES values set
Ensure Temporary Tablespace is correctly defined
Enable portmap service if app requires it
Enable proper services on database nodes to use NFS
Be Careful when Combining the InfiniBand Network across Clusters and Database Machines
Set fast_start_mttr_target=300 to optimize run time performance of writes
Enable auditd on database servers
Verify AUD$ and FGA_LOG$ tables use Automatic Segment Space Management
Use dbca templates provided for current best practices
Updating database node OEL packages to match the cell
Disable cell level flash caching for grid disks that don't need it when using Write Back Flash Cache
Gather system statistics in Exadata mode if needed
Verify Hidden Database Initialization Parameter Usage
Verify BDB location for Cloned GI homes
Verify Shared Servers do not perform serial full table scans
Verify Write Back Flash Cache minimum version requirements
Verify bundle patch version installed matches bundle patch version registered in database
Verify database server file systems have "Maximum mount count" = "-1"
Verify database server file systems have "Check interval" = "0"
Verify Automated Service Request (ASR) configuration
Verify ZFS File System User and Group Quotas are configured

Verify the file /.updfrm_exact does not exist
Verify the vm.min_free_kbytes configuration
Validate key sysctl.conf parameters on database servers
Remove "fix_control=32" from dbfs mount options
Set Linux kernel log buffer size to 1MB
Verify IP routing configuration on DB nodes
Verify there are no .fuse_hidden files under the dbfs mount
Verify that the SDP over IB option "sdp_apm_enable(d)" is set to "0"
Verify /etc/oratab
Verify consistent software and configuration across nodes
Verify all database and storage servers time server configuration
Verify Sar files have read permissions for non-root user
Verify that the patch for bug 16618055 is applied
Verify the Name Service Cache Daemon (NSCD) is Running
Verify kernels and initrd in /boot/grub/grub.conf are available on the system
Verify basic Logical Volume(LVM) system devices configuration
Ensure db_unique_name is unique across the enterprise
Verify average ping times to DNS nameserver
Verify Running-config and Startup-config are the same on the Cisco switch
Validate SSH is installed and configured on Cisco management switch
Verify Database Memory Allocation is not Greater than Physical Memory Installed on Database node
Verify Cluster Verification Utility(CVU) Output Directory Contents Consume < 500MB of Disk Space
Verify active system values match those defined in configuration file "cell.conf"
Verify that CRS_LIMIT_NPROC is greater than 65535 and not "UNLIMITED"
Verify TCP Segmentation Offload (TSO) is set to off
Check alerthistory for stateful alerts not cleared
Check alerthistory for non-test open stateless alerts
Verify clusterware state is "Normal"
Verify the grid Infrastructure management database (MGMTDB) does not use hugepages
Verify the "localhost" alias is pingable
Verify bundle patch version installed matches bundle patch version registered in database

Verify database is not in DST upgrade state
Verify there are no failed diskgroup rebalance operations
Verify the CRS_HOME is properly locked
Verify storage server data (non-system) disks have no partitions
Verify db_unique_name is used in I/O Resource Management (IORM) interdatabase plans
Verify Datafiles are Placed on Diskgroups consisting of griddisks with cachingPolicy = DEFAULT
Verify all datafiles are placed on griddisks that are cached on flash disks
Validate key sysctl.conf parameters on database servers
Detect duplicate files in /etc/*init* directories
Verify Database Server Quorum Disks configuration
Verify Oracle Clusterware files are placed appropriately
Verify "_reconnect_to_cell_attempts=9" on database servers which access X6 storage servers
Verify passwordless SSH connectivity for Enterprise Manager (EM) agent owner userid to target component userids
Check /EXAVMIMAGES on dom0s for possible over allocation by sparse files
Verify active kernel version matches expected version for installed Exadata Image
Verify Storage Server user "CELLDIAG" exists
Verify installed rpm(s) kernel type match the active kernel version
Verify Flex ASM Cardinality is set to "ALL"
Verify "downdelay" is set correctly for bonded client interfaces
Verify ExaWatcher is executing
Verify non-Default services are created for all Pluggable Databases
Verify Grid Infrastructure Management Database (MGMTDB) configuration
Verify Automatic Storage Management Cluster File System (ACFS) file systems do not contain critical database files
Verify the ownership and permissions of the "oradism" file
Verify the SYSTEM, SYSAUX, USERS and TEMP tablespaces are of type bigfile
Verify the storage servers in use configuration matches across the cluster
Verify "asm_power_limit" is greater than zero
Verify the recommended patches for Adaptive features are installed
Verify initialization parameter cluster_database_instances is at the default value
Verify the database server NVME device configuration
Verify that Automatic Storage Management Cluster File System (ACFS) uses 4K metadata block size
Evaluate Automated Maintenance Tasks configuration
Verify proper ACFS drivers are installed for Spectre v2 mitigation
Verify Exafusion Memory Lock Configuration
Verify there are no unhealthy InfiniBand switch sensors
Refer to MOS 1682501.1 if non-Exadata components are in use on the InfiniBand fabric
Verify the ib_sdp module is not loaded into the kernel
Verify all voting disks are online
Verify available ksplice fixes are installed
Archived Best Practices
Revision History

Primary and standby databases should NOT reside on the same IB Fabric

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | N/A | X2-2(4170), X2-2, X2-8, X4-2 | Linux | 11.2.x + | 11.2.x +

Benefit / Impact:
To properly protect the primary databases residing on the "primary" Exadata Database Machine, the physical standby database requires fault isolation from IB switch maintenance issues,
IB switch failures and software issues, RDS bugs and timeouts or any issue resulting from a complete IB fabric failure. To protect the standby from these failures that impact the primary's
availability, we highly recommend that at least one viable standby database resides on a separate IB fabric.

If the primary and standby reside on the same IB fabric, both primary and standby systems can be unavailable due to a bug causing an IB fabric failure.
Action / Repair:
The primary and at least one viable standby database must not reside on the same inter-racked Exadata Database Machine. The communication between the primary and standby
Exadata Database Machines must use GigE or 10GigE. The trade-off is lower network bandwidth. The higher network bandwidth is desirable for standby database instantiation
(should only be done first time) but that requirement is eliminated for post-failover operations when flashback database is enabled.

Use hostname and domain name in lower case

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | N/A | X2-2(4170), X2-2, X2-8, X4-2 | Linux, Solaris | 11.2.x + | 11.2.x +

Benefit / Impact:
Using lowercase will avoid known deployment time issues.
OneCommand deployment will fail in step 16 if this is not done. This will abort the installation with:
"ERROR: unable to locate file to check for string 'Configure Oracle Grid Infrastructure for a Cluster ... succeeded' #Step 16#"
Action / Repair:
As a best practice, use lower case for hostnames and domain names.
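The rule above can be checked before deployment with a few lines of shell. This is a minimal sketch, not part of OneCommand: the function name `is_lower_case` is hypothetical, and the check simply compares a name with its lower-cased form.

```shell
#!/bin/sh
# Hypothetical helper (not part of OneCommand): succeeds only when the
# supplied name contains no upper-case characters.
is_lower_case() {
  name="$1"
  lowered=$(printf '%s' "$name" | tr '[:upper:]' '[:lower:]')
  [ "$name" = "$lowered" ]
}

# Check the fully qualified name of the current host, if it can be determined.
fqdn=$(hostname -f 2>/dev/null || hostname 2>/dev/null || echo unknown)
if is_lower_case "$fqdn"
then
  echo "SUCCESS: $fqdn is lower case"
else
  echo "FAILURE: $fqdn contains upper-case characters"
fi
```

The same function can be applied to every hostname and domain name planned for the deployment, not only to the local host.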

Verify ILOM Power Up Configuration

Priority | Alert Level | Date | Owner | Status | Scope | Bug(s)
Critical | FAIL | 11/11/12 | <Name> | Production | Exadata, SSC | 14281920 - exachk
DB Version | DB Role | Engineered System | Exadata Version | OS & Version | Validation Tool Version | TBD
N/A | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4- - 11
Linux x86-64 UEK5.8
exachk 2.2.2 
Benefit / Impact:
Verifying the ILOM power up configuration helps to ensure that one or more servers boot as quickly as possible after a power interruption.
Not verifying the ILOM power up configuration may result in unexpected server boot behavior after a power interruption.
Action / Repair:
To verify the ILOM power up configuration, as the root userid enter the following command on each database and storage server:
if [ -x /usr/bin/ipmitool ]
then
  ipmitool sunoem cli force "show /SP/policy" | grep -i power
else
  /opt/ipmitool/bin/ipmitool sunoem cli force "show /SP/policy" | grep -i power
fi
The output varies by Exadata software version and should be similar to:
Exadata software version or higher:
Exadata software version or lower:
If the output is not as expected, as the root userid use the ipmitool "set /SP/policy" command. For example:
# ipmitool sunoem cli force "set /SP/policy HOST_AUTO_POWER_ON=enabled"
Connected. Use ^D to exit.
-> set /SP/policy HOST_AUTO_POWER_ON=enabled
Set 'HOST_AUTO_POWER_ON' to 'enabled'
-> Session closed

Verify Hardware and Firmware on Database and Storage Servers

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | N/A | X2-2(4170), X2-2, X2-8, X4-2 | Linux | 11.2.x + | 11.2.x +
Benefit / Impact:
The Oracle Exadata Database Machine is tightly integrated, and verifying the hardware and firmware before the Oracle Exadata Database Machine is placed into or returned to
production status can avoid problems related to the hardware or firmware modifications.
The impact for these verification steps is minimal.
If the hardware and firmware are not validated, inconsistencies between database and storage servers can lead to problems and outages.
Action / Repair:
To verify the hardware and firmware configuration for a database server, execute the following command as the "root" userid:
The output will contain a line similar to the following:
[SUCCESS] The hardware and firmware profile matches one of the supported profile 
If any result other than "SUCCESS" is returned, investigate and correct the condition.
To verify the hardware and firmware configuration for a storage server, execute the following "cellcli" command as the "celladmin" userid (the "cellmonitor" userid is limited to viewing objects):
CellCLI> alter cell validate configuration
The output will be similar to:
Cell <cell> successfully altered 
If any result other than "successfully altered" is returned, investigate and correct the condition.
NOTE: CheckHWnFWProfile is also executed at each boot of a database server.

NOTE: "alter cell validate configuration" is also executed once a day on a storage server by the MS process and the result is written into the storage server alert history.

Verify InfiniBand Cable Connection Quality

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | N/A | X2-2(4170), X2-2, X2-8, X4-2 | Linux | 11.2.x + | 11.2.x +
Benefit / Impact:
InfiniBand cables require proper connections for optimal efficiency. Verifying the InfiniBand cable connection quality helps to ensure that the InfiniBand network operates at optimal efficiency.
There is minimal impact to verify InfiniBand cable connection quality.
InfiniBand cables that are not properly connected may negotiate to a lower speed, work intermittently, or fail.
Action / Repair:
Execute the following command as the "root" userid on all database and storage servers:

for ib_cable in `ls /sys/class/net | grep ^ib`; do printf "$ib_cable: "; cat /sys/class/net/$ib_cable/carrier; done 
The output should look similar to:
ib0: 1
ib1: 1
If anything other than "1" is reported, investigate that cable connection.


Execute the following command as the "root" userid on all database servers running Solaris:
dladm show-ib | grep -v LINK | sed -e 's/  */ /g' -e 's/ *//' | awk '{print $1":", $5}'| sort
The output should be similar to:
ib0: up
ib1: up 
If anything other than "up" is reported, investigate that cable connection.
NOTE: Storage servers should report 2 connections. X2-2(4170) and X2-2 database servers should report 2 connections. X2-8 database servers should report 8 connections.
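The connection counts in the NOTE above can be compared against the Linux command output automatically. The following is a minimal sketch under stated assumptions: `check_ib_links` is a hypothetical helper name, and it expects the "ibN: <carrier>" lines produced by the carrier loop on standard input.

```shell
#!/bin/sh
# Hypothetical helper: reads "ibN: <carrier>" lines on stdin and compares the
# number of links reporting "1" with the expected connection count in $1.
check_ib_links() {
  expected="$1"
  # grep -c prints 0 and exits non-zero when nothing matches; ignore that status.
  up=$(grep -c '^ib[0-9][0-9]*: 1 *$' || true)
  if [ "$up" -eq "$expected" ]
  then
    echo "SUCCESS: $up of $expected InfiniBand links report carrier 1"
  else
    echo "FAILURE: $up of $expected InfiniBand links report carrier 1"
    return 1
  fi
}

# Example with captured output; 2 is the expected count for a storage server.
printf 'ib0: 1\nib1: 1\n' | check_ib_links 2
```

On an X2-8 database server the expected count passed to the helper would be 8 instead of 2.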

Verify Ethernet Cable Connection Quality

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | N/A | X2-2(4170), X2-2, X2-8, X4-2 | Linux | 11.2.x + | 11.2.x +
Benefit / Impact:
Ethernet cables require proper connections for optimal efficiency. Verifying the Ethernet cable connection quality helps to ensure that the Ethernet network operates at optimal efficiency.
There is minimal impact to verify Ethernet cable connection quality.
Ethernet cables that are not properly connected may negotiate to a lower speed, work intermittently, or fail.
Action / Repair:
Execute the following command as the root userid on all database and storage servers:
for cable in `ls /sys/class/net | grep ^eth`; do printf "$cable: "; cat /sys/class/net/$cable/carrier; done 
The output should look similar to:
eth0: 1
eth1: cat: /sys/class/net/eth1/carrier: Invalid argument
eth2: cat: /sys/class/net/eth2/carrier: Invalid argument
eth3: cat: /sys/class/net/eth3/carrier: Invalid argument
eth4: 1
eth5: 1 
"Invalid argument" usually indicates the device has not been configured and is not in use. If a device reports "0", investigate that cable connection.
NOTE: Within machine types, the output of this command will vary by customer depending on how the customer chooses to configure the available ethernet cards.
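The three possible results described above ("1", "0", and "Invalid argument") can be labeled automatically. This is a sketch only: `classify_eth_links` is a hypothetical helper, and it assumes the command output is captured with standard error merged in (for example by appending `2>&1` to the loop).

```shell
#!/bin/sh
# Hypothetical helper: reads the carrier loop output on stdin (stderr merged
# in) and prints one label per interface.
classify_eth_links() {
  awk -F': ' '
    /Invalid argument/ { print $1 ": not configured"; next }
    $2 ~ /^1 *$/       { print $1 ": link up"; next }
    $2 ~ /^0 *$/       { print $1 ": investigate cable"; next }
                       { print $1 ": unexpected value" }
  '
}

# Example with captured output for three interfaces.
printf 'eth0: 1\neth1: cat: /sys/class/net/eth1/carrier: Invalid argument\neth4: 0\n' |
  classify_eth_links
```

Only the interfaces labeled "investigate cable" need attention; which interfaces are expected to be unconfigured depends on how the customer uses the available Ethernet cards.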

Verify the InfiniBand Fabric Topology (verify-topology)

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System
Critical | WARN | 09/05/18 | <Name> | Production | Exadata - Physical, Exadata - Management Domain | ALL | 20144798 - exachk
DB Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
N/A | N/A | N/A | N/A | ALL | Linux | exachk 18.4.0 | N/A
Benefit / Impact:
Verifying that the InfiniBand network is configured with the correct topology for an Oracle Exadata Database Machine helps to ensure that the InfiniBand network operates at maximum efficiency.
An incorrect InfiniBand topology will cause the InfiniBand network to operate at degraded efficiency, intermittently, or fail to operate.
Action / Repair:
To verify the InfiniBand Fabric Topology, execute the following code set as the "root" userid on one database server in the Exadata environment:
# VT_OUTPUT is assumed to already hold the output of the verify-topology tool.
VT_ERRORS=$(echo "$VT_OUTPUT" | egrep ERROR)
VT_WARNINGS=$(echo "$VT_OUTPUT" | egrep WARNING)
if [ -n "$VT_ERRORS" ]
then
  echo -e "FAILURE: verify-topology returned one or more errors (and perhaps warnings).\nDETAILS:\n$VT_OUTPUT"
elif [ -n "$VT_WARNINGS" ]
then
  echo -e "WARNING: verify-topology returned one or more warnings.\nDETAILS:\n$VT_OUTPUT"
else
  echo -e "SUCCESS: verify-topology returned no errors or warnings."
fi
The expected output is:
SUCCESS: verify-topology returned no errors or warnings.
An example of a "FAILURE:" message:
FAILURE: verify-topology returned one or more errors (and perhaps warnings).
   [ DB Machine Infiniband Cabling Topology Verification Tool. ]
Every node is connected to two leaf switches in a single rack.......................................................[FAILED]
Node randomcel06 (Guid: 21280001f00464 ) is connected to just one leaf switch randomsw-ib2(Guid: 2128f57723a0a0 )
Error found in following rack
<output truncated>
An example of a "WARNING:" message:
WARNING: verify-topology returned one or more warnings.

   [ DB Machine Infiniband Cabling Topology Verification Tool ]
        [Version IBD VER 2.b ]

[WARNING] - Non-Exadata nodes detected! Please ensure this is OK
Approximating classification into cells and db hosts

Software UPGRADE required for the tool to be accurate

Looking at 1 rack(s).....
<output truncated>
If anything other than "SUCCESS:" is reported, investigate and correct the underlying fault(s).

Verify key InfiniBand fabric error counters are not present

Priority | Alert Level | Date | Owner | Status | Engineered System | Bug(s)
Critical | WARN | 09/28/16 | <Name> | Production | Exadata-Management Domain, Exadata-Physical, SSC, Exalogic |
DB Version | DB Role | Engineered System Platform | Exadata Version | OS & Version | Validation Tool Version | TBD
N/A | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X6-2, X6-8 | 11.2.x+ | Linux x86-64, Solaris - 11 | |

Benefit / Impact:

Verifying key InfiniBand fabric error counters are not present helps to maintain the InfiniBand fabric at peak efficiency.
The impact of verifying key InfiniBand fabric error counters are not present is minimal. The impact of correcting key InfiniBand fabric error counters varies depending upon the root cause of the specific error counter present, and cannot be estimated here.

If key InfiniBand fabric error counters are present, the fabric may be running in degraded condition or lack redundancy.
NOTE: Uncorrected symbol errors increase the risk of node evictions and application outages.

Action / Repair:
To verify key InfiniBand fabric error counters are not present, execute the following command set as the "root" userid on one database server:
NOTE: This will not work in the user domain of a virtualized environment.
if [[ -d /proc/xen && ! -f /proc/xen/capabilities ]]
then
    echo -e "\nThis check will not run in a user domain of a virtualized environment. Execute this check in the management domain.\n"
else
    RAW_DATA=$(ibqueryerrors | egrep 'SymbolError|LinkDowned|RcvErrors|RcvRemotePhys|LinkIntegrityErrors');
    CRITICAL_DATA=$(echo "$RAW_DATA" | egrep 'SymbolError|RcvErrors');
    WARNING_DATA=$(echo "$RAW_DATA" | egrep -v 'SymbolError|RcvErrors');
    if [ -z "$RAW_DATA" ]
    then
        echo -e "SUCCESS: Key InfiniBand fabric error counters were not found"
    elif [ $(echo "$RAW_DATA" | egrep 'SymbolError|RcvErrors' | wc -l) -gt 0 ]
    then
        echo -e "FAILURE: receive errors or symbol errors or both were found:\n\nCounters found:\n"
        echo -e "$RAW_DATA"
    else
        echo -e "WARNING: Key InfiniBand fabric error counters were found\n\nCounters Found:\n";
        echo -e "$RAW_DATA"
    fi
fi

The expected output should be:
SUCCESS: Key InfiniBand fabric error counters were not found

- OR -

This check will not run in a user domain of a virtualized environment. Execute this check in the management domain.

Example of a FAILURE result:
FAILURE: receive errors or symbol errors or both were found:
Counters found:
GUID 0x10e00001451161 port 1: [SymbolErrorCounter == 1367] [PortRcvErrors == 1367]
GUID 0x10e08027b8a0a0 port ALL: [SymbolErrorCounter == 54679] [LinkErrorRecoveryCounter == 76]
<output truncated>
GUID 0x21280001fca219 port 1: [LinkDownedCounter == 1]
GUID 0x21280001fca21a port 2: [LinkDownedCounter == 1]
<output truncated>
Example of a WARNING result:
WARNING: Key InfiniBand fabric error counters were found

GUID 0x10e00001886289 port 1: [LinkDownedCounter == 1] [PortXmitDiscards == 272] [PortXmitWait == 2021116]
GUID 0x10e0802617a0a0 port ALL: [LinkErrorRecoveryCounter == 63]
GUID 0x10e0802617a0a0 port 1: [LinkErrorRecoveryCounter == 10]
GUID 0x10e0802617a0a0 port 2: [LinkErrorRecoveryCounter == 11]
<output truncated>
In general, if the output is not "SUCCESS...", follow the diagnostic guidance in the following documents:
Special Notes on Symbol errors:

Symbol errors create a much higher risk of node evictions if the error rate is too high. On the InfiniBand switches, there is a mechanism that will automatically down a port if the error rate becomes too high. On the database and storage servers, there is no such mechanism at this time, so it is recommended to examine the Symbol error rate manually, using ExaWatcher data.

NOTE: In the following example, all data pertaining to InfiniBand switches has been filtered out for brevity.

As the "root" userid, the following example demonstrates how to examine the Symbol error rate using ExaWatcher.

1) From the manual output, make note of the GUIDs with SymbolErrorCounter present:

FAILURE: receive errors or symbol errors or both were found:
  Counters found:   
<output truncated>   
GUID 0x10e00001451161 port 1: [SymbolErrorCounter == 1123] [PortRcvErrors == 1123] [PortXmitWait == 230121020]   
<output truncated>

2) Use the following command to identify the server with the symbol errors present:

[root@randomadm01 ~]# ibqueryerrors -G 0x10e00001451161 | head -1
Errors for "randomadm02 S, HCA-4"

3) Log onto the database server identified in the command above, randomadm02.

4) Change to the ExaWatcher directory for IB hca information (the default is in use here):

# cd /opt/oracle.ExaWatcher/archive/IBCardInfo.ExaWatcher

5) Using the port identification provided in 1), use the following command to condense (remove "0" entries from) all relevant available ExaWatcher data:

[root@randomadm02 IBCardInfo.ExaWatcher]# cat <(bzcat *.bz2) <(cat *.dat) | egrep "port 1" -A23 | egrep SymbolError | grep -v '0[[:blank:][:cntrl:]]*$' | sort -k1.2,10 -k2.1,8
<output truncated>            
[09/13/2016 02:38:18] SymbolErrorCounter           999                    1
[09/13/2016 02:43:20] SymbolErrorCounter           1030                   31             
 <output truncated>           
[09/13/2016 17:28:56] SymbolErrorCounter           1062                   1
[09/13/2016 17:39:00] SymbolErrorCounter           1085                   23
[09/13/2016 17:59:10] SymbolErrorCounter           1100                   5
<output truncated>

6) Calculate the symbol error rate per minute. By default, ExaWatcher data intervals are 5 minutes, but that can be changed. Using these two lines:

[09/13/2016 17:28:56] SymbolErrorCounter           1062                   1               
[09/13/2016 17:39:00] SymbolErrorCounter           1085                   23 

The delta between 17:28 and 17:39 is "23". The time interval is 10 minutes, so 23 / 10 is 2.3 symbol errors per minute.

NOTE ESPECIALLY!! If the symbol error rate is consistently greater than 2 per minute, investigate for root cause and take corrective action!
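The arithmetic in step 6 can be sketched as a small helper. The name `symbol_error_rate` is hypothetical; the sketch assumes both ExaWatcher lines belong to the same port and the same day, and it divides by the exact elapsed time rather than the rounded 10 minutes used in the worked example.

```shell
#!/bin/sh
# Hypothetical helper: given two ExaWatcher SymbolErrorCounter lines for the
# same port (assumed to be from the same day), print errors per minute.
symbol_error_rate() {
  printf '%s\n%s\n' "$1" "$2" | awk '
    {
      # Field 2 looks like "17:28:56]"; field 4 is the cumulative counter.
      split($2, t, ":")
      secs[NR]  = t[1] * 3600 + t[2] * 60 + (t[3] + 0)
      count[NR] = $4
    }
    END {
      minutes = (secs[2] - secs[1]) / 60
      if (minutes <= 0) { print "invalid interval"; exit 1 }
      printf "%.1f\n", (count[2] - count[1]) / minutes
    }'
}

first='[09/13/2016 17:28:56] SymbolErrorCounter           1062                   1'
second='[09/13/2016 17:39:00] SymbolErrorCounter           1085                   23'
rate=$(symbol_error_rate "$first" "$second")
echo "Symbol errors per minute: $rate"

# Compare against the guideline of 2 per minute stated above.
if awk -v r="$rate" 'BEGIN { exit !(r > 2) }'
then
  echo "Above 2 per minute: investigate if this rate persists"
fi
```

A single interval above the guideline is not conclusive on its own; the recommendation applies when the rate is consistently above 2 per minute across several intervals.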

NOTE: The InfiniBand fabric error counters should be cleared and validated after any maintenance activity.

NOTE: The InfiniBand fabric error counters are cumulative and the errors may have occurred at any time in the past. This check is the result at one point in time, and cannot advise anything about history or an error rate.

NOTE: This check should not be considered complete validation of the InfiniBand fabric. Even if this check indicates success, there may still be issues on the InfiniBand fabric caused by other, rarer InfiniBand fabric error counters being present. If there are or appear to be issues with the InfiniBand fabric while this check passes, perform a full evaluation of the "ibqueryerrors" command output and the output of other commands such as "ibdiagnet".

NOTE: Depending upon the Exadata version, the key InfiniBand fabric error counters have different names. In the following list, the older version of the counter name is shown in square brackets.

Key InfiniBand fabric error counters list:
SymbolErrorCounter [SymbolErrors]
LinkErrorRecoveryCounter [LinkRecovers]
LinkDownedCounter [LinkDowned]
PortRcvErrors [RcvErrors]
PortRcvRemotePhysicalErrors [RcvRemotePhysErrors]
LocalLinkIntegrityErrors [LinkIntegrityErrors]
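
When results from systems at different Exadata versions are compared, it can help to map the older counter names onto the newer ones. The following is a minimal sketch built directly from the list above; `normalize_counter_name` is a hypothetical helper name.

```shell
#!/bin/sh
# Hypothetical helper: maps an older InfiniBand counter name to its newer
# equivalent, using the pairs listed above. Other names pass through unchanged.
normalize_counter_name() {
  case "$1" in
    SymbolErrors)        echo "SymbolErrorCounter" ;;
    LinkRecovers)        echo "LinkErrorRecoveryCounter" ;;
    LinkDowned)          echo "LinkDownedCounter" ;;
    RcvErrors)           echo "PortRcvErrors" ;;
    RcvRemotePhysErrors) echo "PortRcvRemotePhysicalErrors" ;;
    LinkIntegrityErrors) echo "LocalLinkIntegrityErrors" ;;
    *)                   echo "$1" ;;
  esac
}

normalize_counter_name SymbolErrors
normalize_counter_name PortRcvErrors
```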
NOTE: Some InfiniBand fabric error counters (for example, "SymbolErrorCounter [SymbolErrors]", "PortRcvErrors [RcvErrors]") can increment when nodes are rebooted. Small values for these InfiniBand fabric error counters which are less than the "LinkDownedCounter [LinkDowned]" counters are generally not a problem. The "LinkDownedCounter [LinkDowned]" counters indicate the number of times the port has gone down (usually for valid reasons, such as a node reboot) and are not typically an error indicator by themselves.

NOTE: Links reporting high, persistent error rates (especially "SymbolErrorCounter [SymbolErrors]", "LinkErrorRecoveryCounter [LinkRecovers]", "PortRcvErrors [RcvErrors]", "LocalLinkIntegrityErrors [LinkIntegrityErrors]") often indicate a bad or loose cable or port issues.


Verify InfiniBand switch software version is 1.3.3-2 or higher

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | 11/01/11 | X2-2(4170), X2-2, X2-8, X4-2 | Linux, [WIP:VW]Solaris | 11.2.x + | 11.2.x +

Benefit / Impact:
The impact of verifying that the InfiniBand switch software is at version 1.3.3-2 or higher is minimal. The impact of upgrading the InfiniBand switches to 1.3.3-2 varies depending upon the upgrade method
chosen and your current InfiniBand switch software level.
InfiniBand switch software version 1.3.3-2 fixes several potential InfiniBand fabric stability issues. Remaining on an InfiniBand switch software version below 1.3.3-2 raises the risk of experiencing a potential outage.
Action / Repair:
To verify the InfiniBand switch software version, log onto the InfiniBand switch and execute the following command as the "root" userid:
version | head -1 | cut -d" " -f5
The output should be similar to:
If the output is not 1.3.3-2 or higher, upgrade the InfiniBand switch software to at least version 1.3.3-2.
NOTE: Patch 12373676 provides InfiniBand software version 1.3.3-2 and instructions.
NOTE: Upgrading to 1.3.3-2 may be performed as a rolling upgrade without an outage. The InfiniBand switch software is not dependent upon any other components in the Oracle Exadata Database Machine
and may be upgraded at any time.
NOTE: If your InfiniBand switch is at software version 1.0.1-1, it will need to first be upgraded to 1.1.3-1 or 1.1.3-2 before it can be upgraded to 1.3.3-2. The InfiniBand switch software cannot be upgraded
directly from 1.0.1-1 to 1.3.3-2.
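
The "1.3.3-2 or higher" comparison can be scripted rather than judged by eye. This is a sketch under stated assumptions: `version_ge` is a hypothetical helper that compares the numeric fields of two version strings after splitting on "." and "-", and the `current` value below is an example rather than real switch output.

```shell
#!/bin/sh
# Hypothetical helper: succeeds when version $1 is the same as or newer than
# version $2. Fields are compared numerically after splitting on "." and "-".
version_ge() {
  awk -v have="$1" -v need="$2" 'BEGIN {
    n = split(have, h, /[.-]/)
    m = split(need, r, /[.-]/)
    max = (n > m) ? n : m
    for (i = 1; i <= max; i++) {
      if ((h[i] + 0) > (r[i] + 0)) exit 0
      if ((h[i] + 0) < (r[i] + 0)) exit 1
    }
    exit 0
  }'
}

# Example value; on a switch this would come from the version command above.
current="1.1.3-2"
if version_ge "$current" "1.3.3-2"
then
  echo "SUCCESS: InfiniBand switch software $current is at 1.3.3-2 or higher"
else
  echo "FAILURE: InfiniBand switch software $current is below 1.3.3-2"
fi
```

A plain string comparison would rank 1.10.0 below 1.3.3, which is why the fields are compared as numbers.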

Verify the Master Subnet Manager is running on an InfiniBand switch

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System
Critical | FAIL | 11/28/18 | <Name> | Development | Exadata - Physical, Exadata - Management Domain | ALL | 28862740 - exachk
DB/GI Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
N/A | N/A | N/A | N/A | ALL | Linux | exachk 18.4.0 | N/A
Benefit / Impact:
Having the Master Subnet Manager reside in the correct location improves the stability, availability and performance of the InfiniBand fabric. The impact of verifying the Master Subnet Manager is running on an InfiniBand switch is minimal. The impact of moving the Master Subnet Manager varies depending upon where it is currently executing and to where it will be relocated.
If the Master Subnet Manager is not running on an InfiniBand switch, the InfiniBand fabric may crash during certain fabric management transitions.
Action / Repair:
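To verify the Master Subnet Manager is located on an InfiniBand switch, execute the following command set as the "root" userid on a database server. The variables SUBNET_MGR_MSTR_OUTPUT and IBSWITCHES_OUTPUT are assumed to already hold the output of the "sminfo" and "ibswitches" commands:
SUBNET_MGR_MSTR_GID=$(echo "$SUBNET_MGR_MSTR_OUTPUT" | cut -d" " -f7 | cut -c3-16)
for IB_NODE_GID in $(echo "$IBSWITCHES_OUTPUT" | cut -c14-27)
do
  [ "$SUBNET_MGR_MSTR_GID" = "$IB_NODE_GID" ] && SUBNET_MGR_MSTR_LOC_SWITCH=$(echo "$IBSWITCHES_OUTPUT" | grep "$IB_NODE_GID")
done
[ -n "$SUBNET_MGR_MSTR_LOC_SWITCH" ] &&
  echo -e "SUCCESS: the Master Subnet Manager is executing on InfiniBand switch:\n$(echo "$SUBNET_MGR_MSTR_LOC_SWITCH")" ||
  echo -e "FAILURE: the Master Subnet Manager does not appear to be executing on an InfiniBand switch:\n$(echo "$SUBNET_MGR_MSTR_OUTPUT")"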

The output should be similar to:

SUCCESS: the Master Subnet Manager is executing on InfiniBand switch:
Switch  : 0x002128469b03a0a9 ports 36 "SUN DCS 36P QDR randomsw-iba0 <IP>" enhanced port 0 lid 1 lmc 0

Example of a "FAILURE" result:

FAILURE: the Master Subnet Manager does not appear to be executing on an InfiniBand switch:
sminfo: sm lid 3 sm guid 0x10e0cdce81a0a9, activity count 3362634 priority 8 state 3 SMINFO_MASTER

If the result is "FAILURE", investigate the guid provided, relocate the Master Subnet Manager to a correct InfiniBand switch, and prevent the Subnet Manager from starting on the component where the Master Subnet Manager was found to be executing.
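
The column offsets used by the command set above depend on the exact layout of the command output. As an alternative, the following sketch matches on the GUID text itself. It is an illustration only: `find_master_switch` is a hypothetical helper, and it assumes the GUID in the first argument follows the text "guid 0x" as in the "sminfo" line shown above.

```shell
#!/bin/sh
# Hypothetical helper: prints the switch line whose GUID matches the GUID in
# the subnet manager line, and fails if no switch matches.
# $1 is the subnet manager line; $2 is the list of switch lines.
find_master_switch() {
  # Take the hex digits after "guid 0x", dropping any leading zeros.
  sm_guid=$(printf '%s\n' "$1" |
    sed -n 's/.*guid 0x0*\([0-9a-fA-F][0-9a-fA-F]*\).*/\1/p')
  [ -n "$sm_guid" ] || return 1
  # Switch GUIDs are zero padded, so allow optional zeros after "0x".
  printf '%s\n' "$2" | grep -i "0x0*${sm_guid} "
}

# Sample lines taken from the SUCCESS and FAILURE examples above.
switches='Switch  : 0x002128469b03a0a9 ports 36 "SUN DCS 36P QDR randomsw-iba0 <IP>" enhanced port 0 lid 1 lmc 0'
sm_line='sminfo: sm lid 3 sm guid 0x10e0cdce81a0a9, activity count 3362634 priority 8 state 3 SMINFO_MASTER'

if find_master_switch "$sm_line" "$switches" >/dev/null
then
  echo "SUCCESS: the Master Subnet Manager is executing on an InfiniBand switch"
else
  echo "FAILURE: the Master Subnet Manager GUID does not match any InfiniBand switch"
fi
```

With the sample lines shown, the GUIDs differ, so the sketch reports FAILURE, which matches the FAILURE example above.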

  1. The InfiniBand network can have more than one Subnet Manager, but only one Subnet Manager is active at a time. The active Subnet Manager is the Master Subnet Manager. The other Subnet Managers are the Standby Subnet Managers. If a Master Subnet Manager is shut down or fails, then a Standby Subnet Manager automatically becomes the Master Subnet Manager.
  2. There are typically several Standby Subnet Managers waiting to take over should the current Master Subnet Manager either fail or be manually moved to some other component with an available Standby Subnet Manager. Only run Subnet Managers on the InfiniBand switches specified for use in Oracle Exadata Database Machine, Oracle Exalogic Elastic Cloud, Oracle Big Data Appliance, and Oracle SuperCluster. Running Subnet Manager on any other device is not supported.
  3. For pure multirack Exadata deployments with fewer than 4 racks, the Subnet Manager should run on all spine and leaf InfiniBand switches. For deployments with 4 or more Exadata racks, the Subnet Manager should run only on spine InfiniBand switches. For additional configuration information, please see section "4.6.7 Understanding the Network Subnet Manager Master" of the "Exadata Database Machine Maintenance Guide".
  4. For InfiniBand fabric configurations that involve a mix of different Oracle Engineered Systems, please refer to: MOS note 1682501.1
  5. Moving the Master Subnet Manager is sometimes required during maintenance and patching operations. For additional guidance on maintaining the Master Subnet Manager, please see section "4.6 Maintaining the InfiniBand Network" of the "Exadata Database Machine Maintenance Guide".
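
The placement rule in item 3 can be sketched as a small shell helper. This is an illustration only: the function name sm_placement is hypothetical, and it covers pure Exadata multirack deployments, not fabrics that mix different Engineered Systems.

```shell
# Hypothetical helper: print where the Subnet Manager should run
# for a pure Exadata multirack deployment with the given rack count.
sm_placement() {
  racks=$1
  if [ "$racks" -lt 4 ]; then
    echo "spine and leaf switches"
  else
    echo "spine switches only"
  fi
}

sm_placement 2
sm_placement 4
```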


  Verify the Subnet Manager is properly disabled

Priority: Critical
Alert Level: FAIL
Date: 11/28/18
Owner: <Name>
Status: Development
Engineered System: Exadata - Physical, Exadata - Management Domain
Engineered System Platform: ALL
Bug(s): 28768896 - exachk, 14534296 - exachk, 16270663 - exachk, 16795289 - exachk
DB/GI Version: N/A
DB Type: N/A
DB Role: N/A
DB Mode: N/A
Exadata Version: ALL
OS & Version: Linux
Validation Tool Version: exachk 18.4.0
MAA Scorecard Section: N/A
Benefit / Impact:
NOTE: The Subnet Manager should only execute on InfiniBand switches. It should be disabled on any other component attached to an InfiniBand fabric.

Having the Subnet Manager executing in the correct locations improves the stability, availability and performance of the InfiniBand fabric. The impact of verifying that the Subnet Manager is disabled on components where the Master Subnet Manager should never reside is minimal. The impact of disabling the Subnet Manager varies depending upon the component type where it is found to be incorrectly executing, and whether or not the Master Subnet Manager is incorrectly executing on that component.

Unexpected behavior, such as connectivity or performance loss, can occur if the Subnet Manager is executing on an unexpected component in the InfiniBand fabric.
Action / Repair:
To verify the Subnet Manager is disabled on components where the Master Subnet Manager should never reside, execute the following command set as the "root" userid on all database and storage servers:
COMMAND_OUTPUT=$(ps -ef | grep -i [o]pensm)
if [ -n "$COMMAND_OUTPUT" ]; then
  echo -e "FAILURE: the Subnet Manager is executing.\nDETAILS:\n$COMMAND_OUTPUT"
else
  echo -e "SUCCESS: the Subnet Manager is not executing."
fi

The expected output is:

SUCCESS: the Subnet Manager is not executing.

Example of a "FAILURE" output:

FAILURE: the Subnet Manager is executing.
DETAILS:
root      2627     1  0 Mar24 ?        12:14:31 /usr/sbin/opensm --daemon

If the result is "FAILURE", investigate why the Subnet Manager is executing, relocate the Master Subnet Manager if necessary, and prevent the Subnet Manager from starting in the future.

  1. The command set provided is for Oracle Exadata Database Machines only. If there are non-Exadata components residing on the InfiniBand fabric (e.g., a media server), refer to the provided documentation for that component.
  2. There are typically several Standby Subnet Managers waiting to take over should the current Master Subnet Manager either fail or be manually moved to some other component with an available Standby Subnet Manager. Only run Subnet Managers on the InfiniBand switches specified for use in Oracle Exadata Database Machine, Oracle Exalogic Elastic Cloud, Oracle Big Data Appliance, and Oracle SuperCluster. Running Subnet Manager on any other device is not supported.
  3. For pure multirack Exadata deployments with fewer than 4 racks, the Subnet Manager should run on all spine and leaf InfiniBand switches. For deployments with 4 or more Exadata racks, the Subnet Manager should run only on spine InfiniBand switches. For additional configuration information, please see section "4.6.7 Understanding the Network Subnet Manager Master" of the "Exadata Database Machine Maintenance Guide".
  4. For InfiniBand fabric configurations that involve a mix of different Oracle Engineered Systems, please refer to: MOS note 1682501.1
  5. Moving the Master Subnet Manager is sometimes required during maintenance and patching operations. For additional guidance on maintaining the Master Subnet Manager, please see section "4.6 Maintaining the InfiniBand Network" of the "Exadata Database Machine Maintenance Guide".
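
The check above relies on the pattern [o]pensm, which matches the text "opensm" but not the literal string "[o]pensm", so the grep process itself never appears in the result. A minimal sketch of that behaviour follows, assuming "ps -ef" style text is supplied on standard input. The function name opensm_status and the second sample line are hypothetical.

```shell
# Classify "ps -ef" style text read from standard input.
# The pattern is quoted so the shell cannot expand it as a file glob.
opensm_status() {
  found=$(grep -i '[o]pensm')
  if [ -n "$found" ]; then
    printf 'FAILURE: the Subnet Manager is executing.\nDETAILS:\n%s\n' "$found"
  else
    echo "SUCCESS: the Subnet Manager is not executing."
  fi
}

# The first line is the opensm daemon; the second is a grep command line,
# which the bracketed pattern does not match.
printf '%s\n' \
  'root      2627     1  0 Mar24 ?        12:14:31 /usr/sbin/opensm --daemon' \
  'root      9001  8999  0 10:00 pts/0    00:00:00 grep -i [o]pensm' | opensm_status
```

Only the daemon line is reported under DETAILS; the grep command line is ignored.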

Verify There Are No Memory (ECC) Errors

Alert Level
Engineered System
Exadata - Physical,
Exadata - Management Domain,
DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X3-2, X3-8,
X4-2, X4-8, X5-2, X5-8, X6-2, X6-8
Solaris - 11
Linux x86-64
exachk 2.2.4
Benefit / Impact:
Memory modules that have corrected Memory Errors (ECC) can show degraded performance, IPMI driver timeouts, and BMC error messages in /var/log/messages file.
Correcting the condition restores optimal performance.
The impact of checking for memory ECC errors is slight. Correction will likely require node downtime for hardware diagnostics or repair.
If not corrected, the faulty memory will lead to performance degradation and other errors.
Action / Repair:
To verify there are no memory (ECC) errors, run the following commands as the "root" userid on all database and storage servers:
IPMI_COMMAND=ipmitool   # assumption: default to the ipmitool found on the PATH
if [ -x /usr/bin/ipmitool ]; then
  IPMI_COMMAND=/usr/bin/ipmitool
fi
ECC_OUTPUT=$($IPMI_COMMAND sel list | grep Memory | grep ECC)
if [ -z "$ECC_OUTPUT" ]; then
  echo -e "SUCCESS: No memory ECC errors were found.\nECC list:\n\n$ECC_OUTPUT"
else
  echo -e "FAILURE: Memory ECC errors were found.\nECC list:\n\n$ECC_OUTPUT"
fi

The expected output should be:
SUCCESS: No memory ECC errors were found.
ECC list:
Example of a FAILURE result:
FAILURE: Memory ECC errors were found.
ECC list:

 24f | 09/16/2016 | 09:32:59 | Memory #0x53 | Correctable ECC | Asserted
If any errors are reported, take the following corrective actions in order:
1) Reseat the DIMM.
2) Open an SR for hardware replacement.
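
When several ECC events are reported, it can help to see which memory device they belong to before reseating a DIMM. The following is a minimal sketch, assuming the event list has the pipe-separated layout shown in the FAILURE example. The function name ecc_summary is hypothetical, and the second and third sample lines are invented for illustration.

```shell
# Summarize ECC events per memory device from
# "ipmitool sel list" style text read from standard input.
ecc_summary() {
  grep Memory | grep ECC |
    awk -F'|' '{ gsub(/^ +| +$/, "", $4); count[$4]++ }
               END { for (dev in count) print dev ": " count[dev] }' |
    sort
}

printf '%s\n' \
  ' 24f | 09/16/2016 | 09:32:59 | Memory #0x53 | Correctable ECC | Asserted' \
  ' 250 | 09/16/2016 | 09:40:12 | Memory #0x53 | Correctable ECC | Asserted' \
  ' 251 | 09/17/2016 | 01:02:03 | Memory #0x54 | Correctable ECC | Asserted' | ecc_summary
```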

Verify celldisk configuration on disk drives

Priority: Critical
Added: 12/06/11
Machine Type: X2-2(4170), X2-2, X2-8, X4-2
OS Type: Linux, Solaris
Exadata Version: 11.2.x +
Oracle Version: 11.2.x +
Benefit / Impact:
The definition and maintenance of storage server celldisks is critical for optimal performance and outage avoidance.
The impact of verifying the basic storage server celldisk configuration is minimal. Correcting any abnormalities is dependent upon the reason for the anomaly, so the impact cannot be estimated here.
If the basic storage server celldisk configuration is not verified, poor performance or unexpected outages may occur.
Action / Repair:
To verify the basic storage server celldisk configuration on disk drives, execute the following command as the "celladmin" user on each storage server:
cellcli -e "list celldisk where disktype=harddisk and status=normal" | wc -l

The output should be:

12
If the output is not as expected, investigate the condition and take corrective action based upon the root cause of the unexpected result.

NOTE: On a storage server configured according to Oracle best practices, there should be 12 celldisks on disk drives with a status of "normal".
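
The comparison against the expected count can be sketched as follows. This is a minimal illustration, assuming each input line ends with the celldisk status as its last field. The function name check_normal_celldisks and the sample lines are hypothetical.

```shell
# Count input lines whose last field is "normal" and compare the
# result with the expected number passed as the first argument.
check_normal_celldisks() {
  expected=$1
  actual=$(awk '$NF == "normal" { n++ } END { print n + 0 }')
  if [ "$actual" -eq "$expected" ]; then
    echo "SUCCESS: $actual celldisks have a status of normal."
  else
    echo "FAILURE: expected $expected celldisks with a status of normal, found $actual."
  fi
}

printf '%s\n' 'CD_00_cell01 normal' 'CD_01_cell01 normal' 'CD_02_cell01 not present' |
  check_normal_celldisks 12
```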

Verify celldisk configuration on flash memory devices

Priority: Critical
Alert Level: FAIL
Date: 11/15/2017
Owner: <Name>
Status: Production
Engineered System: Exadata
Bug(s): 27119016 - exachk, 24514400 - exachk
DB Version: N/A
DB Role: N/A
Engineered System: X2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8, X7-2, X7-8
Exadata Version: 11.2+
OS & Version: Linux x86-64
Validation Tool Version: exachk

Benefit / Impact:
The definition and maintenance of storage server celldisks is critical for optimal performance and outage avoidance. The number of celldisks configured on flash memory devices varies by hardware version. Each celldisk configured on flash memory devices should have a status of "normal".
The impact of verifying the celldisk configuration on flash memory devices is minimal. The impact of correcting any anomalies is dependent upon the reason for the anomaly and cannot be estimated here.
If the celldisk configuration on flash memory devices is not verified, poor performance or unexpected outages may occur.
Action / Repair:
To verify the celldisk configuration on flash memory devices, execute the following command as the "root" userid on each storage server:
cellcli -e "list celldisk where disktype=flashdisk and status=normal" | wc -l
The output should be similar to the following and match one of the rows in the "Celldisk on Flash Memory Devices Mapping Table":
Celldisk on Flash Memory Devices Mapping Table

System Description   Common Name    Disk Type   Number of Devices
X4275                X2-2(4170)     MIXED       16
X4270 M2             X2-2, X2-8     MIXED       16
X4270 M3             X3-2, X3-8     MIXED       16
X4-2L                X4-2           MIXED       16
X5-2L                X5-2, X5-8     MIXED
X5-2L                X5-2, X5-8     FLASH
X6-2L                X6-2, X6-8     MIXED
X6-2L                X6-2, X6-8     FLASH
X7-2L                X7-2, X7-8     MIXED
X7-2L                X7-2, X7-8     FLASH

If the output is not as expected, execute the following command as the "root" userid:
cellcli -e "list celldisk where disktype=flashdisk and status!=normal"
Perform your root cause analysis and corrective actions based upon the key words returned in the "status" field. For additional information, please reference the following:
The "Maintaining Flash Disks" section of "Oracle® Exadata Database Machine, Owner's Guide 11g Release 2 (11.2), E13874-24"
Troubleshooting guide for Sick or underperforming storage cell/Performance Issue (Doc ID 1348736.1)
Troubleshooting guide for Underperforming FlashDisks (Doc ID 1348938.1)

Verify there are no griddisks configured on flash memory devices

Alert Level
Engineered System
Exadata - Physical,
Exadata - Management Domain,
BDA, Exalogic, Exalytics, SSC, ZDLRA
DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8
Linux x86-64
Benefit / Impact:
The definition and maintenance of storage server griddisks is critical for optimal performance and outage avoidance.
The impact of verifying the storage server griddisk configuration is minimal. Correcting any abnormalities is dependent upon the reason for the anomaly, so the impact cannot be estimated here.
If the storage server griddisk configuration is not verified, poor performance or unexpected outages may occur.
Action / Repair:
To verify there are no storage server griddisks configured on flash memory devices, execute the following command as the "celladmin" user on each storage server:
cellcli -e "list griddisk where disktype=flashdisk" | wc -l
The output should be:
0
If the output is not as expected, investigate the condition and take corrective action based upon the root cause of the unexpected result.
Experience has shown that the Oracle recommended Best Practice of using all available flash device space for Smart Flash Log and Smart Flash Cache provides the highest overall performance benefit with lowest maintenance overhead for an Oracle Exadata Database Machine.

In some very rare cases for certain highly write-intensive applications, there may be some performance benefit to configuring grid disks onto the flash devices for datafile writes only. With the release of the Smart Flash Log feature, redo logs should never be placed on flash grid disks. Smart Flash Log leverages both hard disks and flash devices with intelligent caching to achieve the fastest possible redo write performance, optimizations which are lost if redo logs are simply placed on flash grid disks.

The space available to Smart Flash Cache and Smart Flash Log is reduced by the amount of space allocated to the grid disks deployed on flash devices. The usable space in the flash grid disk group is either half or one-third of the space allocated for grid disks on flash devices, depending on whether the flash grid disks are configured with ASM normal or high redundancy.

If after thorough performance and recovery testing, a customer chooses to deploy grid disks on flash devices, it would be a supported, but not Best Practice, configuration.
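
The cost of flash grid disks can be worked through with simple arithmetic: ASM normal redundancy keeps two copies of the data and high redundancy keeps three, so the usable space is the allocated space divided by two or three. A minimal sketch, with a hypothetical function name usable_flash_gb and an illustrative allocation of 3000 GB:

```shell
# usable_flash_gb ALLOCATED_GB REDUNDANCY
# Print the usable space (integer GB) of a flash grid disk group.
usable_flash_gb() {
  case $2 in
    normal) copies=2 ;;
    high)   copies=3 ;;
    *) echo "unknown redundancy: $2" >&2; return 1 ;;
  esac
  echo $(( $1 / copies ))
}

usable_flash_gb 3000 normal
usable_flash_gb 3000 high
```

In this example, 3000 GB taken from Smart Flash Cache and Smart Flash Log yields 1500 GB of usable space with normal redundancy and 1000 GB with high redundancy.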

Verify griddisk count matches across all storage servers where a given prefix name exists

Priority: Critical
Added: 12/06/11
Machine Type: X2-2(4170), X2-2, X2-8, X4-2
OS Type: Linux, Solaris
Exadata Version: 11.2.x +
Oracle Version: 11.2.x +
Benefit / Impact:
The definition and maintenance of storage server griddisks is critical for optimal performance and outage avoidance.
The impact of verifying the basic storage server griddisk configuration is minimal. Correcting any abnormalities is dependent upon the reason for the anomaly, so the impact cannot be estimated here.
If the storage server griddisk configuration as designed is not verified, poor performance or unexpected outages may occur.
Action / Repair:
To verify the storage server griddisk count matches across all storage servers where a given prefix name exists, execute the following command as the "root" userid on the database server from which the onecommand script was executed during initial deployment:

for GD_PREFIX in `dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root "cellcli -e list griddisk attributes name" | cut -d" " -f2 | gawk -F "_CD_" '{print $1}' | sort -u`; do
  GD_PREFIX_RESULT=`dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root "cellcli -e list griddisk where name like \'$GD_PREFIX\_.*\' | wc -l" | cut -d" " -f2 | sort -u | wc -l`;
  if [ $GD_PREFIX_RESULT = 1 ]; then
    echo -e "$GD_PREFIX: SUCCESS"   # message text is an assumption
  else
    echo -e "$GD_PREFIX: FAILURE"   # message text is an assumption
    dcli -g /opt/oracle.SupportTools/onecommand/cell_group -l root "cellcli -e list griddisk where name like \'$GD_PREFIX\_.*\' | wc -l";
  fi
done

The output should be similar to:


If the output is not as expected, investigate the condition and take corrective action based upon the root cause of the unexpected result.

NOTE: On a storage server configured according to Oracle best practices, the total number of griddisks per storage server for a given prefix name (e.g., DATA) should match across all storage servers where the given prefix name exists.

NOTE: Not all storage servers are required to have all prefix names in use. This is possible where for security reasons a customer has segregated the storage servers, is using a data lifecycle management methodology, or an Oracle Storage Expansion Rack is in use. For example, when an Oracle Storage Expansion Rack is in use for data lifecycle management, those storage servers will likely have griddisks with unique names that differ from the griddisk names used on the storage servers that contain real time data, yet all griddisks are visible to the same cluster.

NOTE: This command requires that SSH equivalence exists for the "root" userid from the database server upon which it is executed to all storage servers in use by the cluster.
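
The core of the check is that the per-cell griddisk counts for one prefix collapse to a single distinct value. A minimal sketch of that step, assuming dcli style input lines of the form "host: count". The function name counts_match and the host names are hypothetical.

```shell
# Read "host: count" lines from standard input and report whether
# every host returned the same count.
counts_match() {
  distinct=$(awk '{ print $2 }' | sort -u | wc -l | tr -d ' ')
  if [ "$distinct" -eq 1 ]; then
    echo "SUCCESS"
  else
    echo "FAILURE"
  fi
}

printf '%s\n' 'cell01: 12' 'cell02: 12' 'cell03: 12' | counts_match
printf '%s\n' 'cell01: 12' 'cell02: 11' 'cell03: 12' | counts_match
```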

Verify griddisk ASM status

Priority: Critical
Added: 12/06/11
Machine Type: X2-2(4170), X2-2, X2-8, X4-2
OS Type: Linux, Solaris
Exadata Version: 11.2.x +
Oracle Version: 11.2.x +
Benefit / Impact:
The definition and maintenance of storage server griddisks is critical for optimal performance and outage avoidance.
The impact of verifying the storage server griddisk configuration is minimal. Correcting any abnormalities is dependent upon the reason for the anomaly, so the impact cannot be estimated here.
If the storage server griddisk configuration as designed is not verified, poor performance or unexpected outages may occur.
Action / Repair:
To verify the storage server griddisk ASM status, execute the following command as the "celladmin" user on each storage server:
ASM_STAT_RESLT=`cellcli -e "list griddisk attributes name,status, asmmodestatus,asmdeactivationoutcome" | egrep -v ".*\<active\>.*\<ONLINE\>.*\<Yes\>" | wc -l`;
if [ $ASM_STAT_RESLT = 0 ]; then
  echo -e "\nSUCCESS\n"
else
  echo -e "\nFAILURE:";
  cellcli -e "list griddisk attributes name,status, asmmodestatus,asmdeactivationoutcome" | egrep -v ".*\<active\>.*\<ONLINE\>.*\<Yes\>";
  echo -e "\n";
fi

The output should be:

SUCCESS

If the output is not as expected, investigate the condition and take corrective action based upon the root cause of the unexpected result.

NOTE: On a storage server configured according to Oracle best practices, all griddisks should have "status" of "active", "asmmodestatus" of "ONLINE" and "asmdeactivationoutcome" of "Yes".
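
The filter used in this check depends on the word-boundary markers \< and \>, a GNU grep extension. Without them, a status of "inactive" would be accepted because it contains "active". A minimal sketch, with a hypothetical function name unhealthy_griddisks and invented sample lines:

```shell
# Print only the griddisk lines that are NOT active / ONLINE / Yes.
unhealthy_griddisks() {
  grep -Ev '\<active\>.*\<ONLINE\>.*\<Yes\>'
}

printf '%s\n' \
  'DATA_CD_00_cell01  active    ONLINE   Yes' \
  'DATA_CD_01_cell01  inactive  ONLINE   Yes' \
  'DATA_CD_02_cell01  active    OFFLINE  Yes' | unhealthy_griddisks
```

The first sample line is filtered out as healthy; the other two are printed.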

Verify that griddisks are distributed as expected across celldisks

Priority: Critical
Alert Level: FAIL
Date: 10/11/17
Owner: <Name>
Status: Production
Engineered System: Exadata - Physical, Exadata - Management Domain
Engineered System Platform: ALL
Bug(s): Bug 26651266 - exachk
DB/GI VersionDB TypeDB RoleDB ModeExadata VersionOS & VersionValidation Tool VersionMAA Scorecard Section

Benefit / Impact:
The definition and maintenance of storage server griddisks is critical for optimal performance and outage avoidance.
The impact of verifying the storage server griddisk configuration is minimal. Correcting any abnormalities is dependent upon the reason for the anomaly, so the impact cannot be estimated here.
If the storage server griddisk configuration as designed is not verified, poor performance or unexpected outages may occur.
Action / Repair:
NOTE: The recommended best practice is to have each griddisk distributed across all celldisks. For older versions of Exadata storage server software and hardware, the griddisks "SYSTEM" or "DBFS_DG" had a slightly different distribution, and the code below correctly accounts for those cases.
To verify that griddisks are distributed as expected across celldisks, execute the following command as the "root" userid on each storage server:
RAW_CELLDISK=$(cellcli -e "list celldisk attributes name" | sed -e 's/^[ \t]*//')
RAW_GRIDDISK=$(cellcli -e "list griddisk attributes name" | sed -e 's/^[ \t]*//')
if [ `echo -e $RAW_CELLDISK | grep CD | wc -l` -ge 1 ]; then
  PARSED_CELLDISK=$(echo -e "$RAW_CELLDISK" | grep CD)
else
  PARSED_CELLDISK=$(echo -e "$RAW_CELLDISK" | grep FD)   # assumption: flash-only cells name celldisks FD_*
fi
CELLDISK_COUNT=$(echo "$PARSED_CELLDISK" | wc -l)
if [ `echo -e $RAW_GRIDDISK | grep CD | wc -l` -ge 1 ]; then
  SHORT_GD_NAME_ARRAY=$(echo -e "$RAW_GRIDDISK" | awk -F "_CD_" '{print $1}' | sort -u)
else
  SHORT_GD_NAME_ARRAY=$(echo -e "$RAW_GRIDDISK" | awk -F "_FD_" '{print $1}' | sort -u)
fi
RETURN_RESULT=0
for GD_SHORT_NAME in $SHORT_GD_NAME_ARRAY; do
  # assumption: SYSTEM and DBFS_DG griddisks are absent from two celldisks
  if [ `echo $GD_SHORT_NAME | egrep -ic 'system|dbfs_dg'` -ge 1 ]; then
    GD_COUNT=$(expr `echo "$RAW_GRIDDISK" | grep $GD_SHORT_NAME | wc -l` + 2)
  else
    GD_COUNT=$(echo "$RAW_GRIDDISK" | grep $GD_SHORT_NAME | wc -l)
  fi
  if [ $GD_COUNT -ne $CELLDISK_COUNT ]; then
    OUTPUT_ARRAY+=`echo -e "\n$GD_SHORT_NAME: FAILURE:\tGriddisk count:  $GD_COUNT\tCelldisk count:  $CELLDISK_COUNT"`
    RETURN_RESULT=1
  fi
done
if [ $RETURN_RESULT -eq 0 ]; then
    echo -e "SUCCESS: All griddisks are distributed as expected across celldisks."
else
    echo -e -n "FAILURE: One or more griddisks are not distributed as expected across celldisks. Details:"
    echo -e "${OUTPUT_ARRAY[@]}"
fi
The expected output should be:
SUCCESS: All griddisks are distributed as expected across celldisks.
Example of a "FAILURE" result:
FAILURE: One or more griddisks are not distributed as expected across celldisks. Details:
C_DATA:  FAILURE:       Griddisk count:  7      Celldisk count:  8
If the output is not as expected, investigate the condition and take corrective action based upon the root cause of the unexpected result.

Verify the percent of available celldisk space used by the griddisks

Priority: Critical
Alert Level: INFO
Date: 11/09/16
Owner: <Name>
Status: Production
Engineered System: Exadata - Physical, Exadata - Management Domain
DB Version: N/A
DB Role: N/A
Engineered System Platform: X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8
Exadata Version: 11.2+
OS & Version: Linux x86-64
Validation Tool Version: exachk
Benefit / Impact:
The impact of verifying the percent of available celldisk space used by the griddisks is minimal.
If the percent of available celldisk space used by the griddisks is not verified, an unexpected configuration change may be missed.
Action / Repair:
To verify the percent of available celldisk space used by the griddisks, execute the following command set as the "root" userid on each storage server:
ALLFLASHCELL=$(cellcli -e "list cell attributes makemodel"|egrep  -ic 'ALLFLASH|EXTREME_FLASH');
RAW_GRIDDISK_SIZE=$(cellcli -e "list griddisk attributes size");
TOTAL_GRIDDISK_SIZE=$(echo "$RAW_GRIDDISK_SIZE" | sed 's/\s//g'|awk '/G$/ { print $0 } /T$/ { size=substr($0, 0, length($0)-1); size=size*1024; print size "" "G";}'|awk '{ SUM += $1} END { print SUM}');
if [ $ALLFLASHCELL -eq 0 ]; then
  RAW_CELLDISK_SIZE=$(cellcli -e "list celldisk attributes size where disktype=harddisk");
else
  RAW_CELLDISK_SIZE=$(cellcli -e "list celldisk attributes size where disktype=flashdisk");
fi
TOTAL_CELLDISK_SIZE=$(echo "$RAW_CELLDISK_SIZE" | sed 's/\s//g'|awk '/G$/ { print $0 } /T$/ { size=substr($0, 0, length($0)-1); size=size*1024; print size "" "G";}'| awk '{ SUM += $1} END { print SUM}');
GRIDDISK_CELLDISK_PCT=$(echo $TOTAL_GRIDDISK_SIZE $TOTAL_CELLDISK_SIZE | awk '{ printf("%d", ($1/$2)*100) }');
echo -e "INFO:  The percent of available celldisk space used by the griddisks is: $GRIDDISK_CELLDISK_PCT\nThe total griddisk size found is: $TOTAL_GRIDDISK_SIZE\nThe total celldisk size found is: $TOTAL_CELLDISK_SIZE";
The expected output will be similar to:
INFO:  The percent of available celldisk space used by the griddisks is: 99
The total griddisk size found is: 87818.7
The total celldisk size found is: 87819.3
If the output is not as expected for a given known configuration, investigate and take corrective action based upon the root cause of the unexpected result.

NOTE: On a storage server not in an Oracle Virtual Machine environment configured according to Oracle best practices, the percent utilization will typically be 99 or higher for spinning disk and between 94 and 95 for Extreme Flash. The lower percentage of utilization for Extreme Flash is because the griddisks, Flash Log, and Flash Cache are all built on the same flash hardware.

NOTE: In an Oracle Virtual Machine environment, it is not unusual for the percentage of available celldisk space used by the griddisks to be in the mid-60s. This is due in part to the fact that the DBFS griddisk is not created by default, and in part to user requirements to reserve free space for future use. For example:
INFO:  The percent of available celldisk space used by the griddisks is: 63
The total griddisk size found is: 4236
The total celldisk size found is: 6636.06
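
The arithmetic behind the reported figures can be reproduced in isolation. The following is a minimal sketch, not the check itself: sizes ending in T are converted to gigabytes by multiplying by 1024, and the percentage is truncated to an integer as printf("%d") does. The function names total_gb and pct_used are hypothetical.

```shell
# Sum sizes given one per line as "<number>G" or "<number>T"; result in GB.
total_gb() {
  awk '{ n = $1 + 0; if ($1 ~ /T$/) n *= 1024; sum += n } END { print sum + 0 }'
}

# Percent of celldisk space used by griddisks, truncated to an integer.
pct_used() {
  awk -v g="$1" -v c="$2" 'BEGIN { printf("%d\n", (g / c) * 100) }'
}

pct_used 87818.7 87819.3
pct_used 4236 6636.06
printf '528G\n1.5T\n' | total_gb
```

The first two calls reproduce the 99 and 63 shown in the examples above. The truncation explains why 87818.7 of 87819.3 is reported as 99 rather than 100.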

 Verify Database Server ZFS RAID Configuration

Priority: Critical
Added: 01/27/12
Machine Type: X2-2, X2-8, X4-2
OS Type: Solaris
Exadata Version: 11.2.x +
Oracle Version: 11.2.x +
Benefit / Impact:
For a database server running Solaris deployed according to Oracle standards, there will be two ZFS RAID-1 pools, named "rpool" and "data". Each mirror in the pool contains two disk drives. For an X2-2, there is one mirror for each name. For an X2-8, there is one mirror for "rpool" and 3 for "data". Verifying the database server ZFS RAID configuration helps to avoid a possible performance impact, or an outage.
The impact of validating the ZFS RAID configuration is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.
Not verifying the ZFS RAID configuration increases the chance of a performance degradation or an outage.
Action / Repair:
To verify the database server ZFS RAID configuration, execute the following command as the "root" userid:
/opt/oracle.SupportTools/ | ggrep mirror -A3
The output will be similar to:
------------------- mirror-0 ---------------------
16:5 c1t2d0s0 rpool
16:4 c1t1d0s0 rpool
------------------- mirror-0 ---------------------
16:6 c1t3d0 data
16:7 c1t4d0 data
------------------- mirror-2 ---------------------
16:0 c1t5d0 data
16:2 c1t6d0 data
------------------- mirror-1 ---------------------
16:3 c1t0d0 data
16:1 c1t7d0 data
For an X2-2, the expected output is one pool named "rpool", and one named "data", each comprised of 1 mirror with 2 disk drives.
For an X2-8, the expected output is one pool named "rpool", comprised of 1 mirror with 2 disk drives, and one pool named "data" comprised of 3 mirrors each with 2 disk drives.
If the reported output differs, investigate and correct the condition.

Verify InfiniBand is the Private Network for Oracle Clusterware Communication

Priority: Critical
Added: N/A
Machine Type: X2-2(4170), X2-2, X2-8, X4-2
OS Type: Linux
Exadata Version: 11.2.x +
Oracle Version: 11.2.x +
Benefit / Impact:
The InfiniBand network in an Oracle Exadata Database Machine provides superior performance and throughput characteristics that allow Oracle Clusterware to operate at optimal efficiency.
The overhead for these verification steps is minimal.
If the InfiniBand network is not used for Oracle Clusterware communication, performance will be sub-optimal.
Action / Repair:
The InfiniBand network is preconfigured on the storage servers. Perform the following on the database servers:
Verify the InfiniBand network is the private network used for Oracle Clusterware communication with the following command:
$GI_HOME/bin/oifcfg getif -type cluster_interconnect
For X2-2 the output should be similar to:
bondib0 global cluster_interconnect
For X2-8 the output should be similar to:
bondib0 global cluster_interconnect
bondib1 global cluster_interconnect 
bondib2 global cluster_interconnect 
bondib3 global cluster_interconnect
If the InfiniBand network is not the private network used for Oracle Clusterware communication, configure it following the instructions in MOS Note 283684.1, "How to Modify Private Network Interface in 11.2 Grid Infrastructure".
NOTE: It is important to ensure that your public interface is properly marked as public and not private. This can be checked with the oifcfg getif command. If it is inadvertently marked private, you can get errors such as "OS system dependent operation:bind failed with status" and "OS failure message: Cannot assign requested address".
It can be corrected with a command like oifcfg setif -global eth0/<public IP address>:public
In each database, verify that it is using the private IB interconnect with the following query:
SQL> select name,ip_address from v$cluster_interconnects;
--------------- ----------------
Or in the database alert log you can look for the following message:
Cluster communication is configured to use the following interface(s) for this instance
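
The oifcfg check above can be automated by flagging any cluster_interconnect entry that is not an InfiniBand interface. A minimal sketch follows, assuming InfiniBand interfaces are named bondibN or ibN and that the interface name is the first field of each line. The function name non_ib_interconnects and the eth1 sample line are hypothetical.

```shell
# Read "oifcfg getif" style lines from standard input and print the name
# of every cluster_interconnect interface that does not look like InfiniBand.
non_ib_interconnects() {
  awk '/cluster_interconnect/ && $1 !~ /^(bondib|ib)[0-9]+$/ { print $1 }'
}

printf '%s\n' \
  'bondib0 global cluster_interconnect' \
  'eth1 global cluster_interconnect' | non_ib_interconnects
```

An empty result means every private interconnect entry is on InfiniBand.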

Verify InfiniBand Address Resolution Protocol (ARP) Configuration on Database Servers

PriorityAlert LevelDateOwnerStatusEngineered System
CriticalFAIL7/13/16 <NameProduction StatusExadata - Physical,
Exadata - Management Domain,
Exadata - User Domain
DB VersionDB RoleEngineered System PlatformExadata VersionOS VersionValidation Tool Version
N/AN/AX2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-811. x86-64exachk
Benefit / Impact: There are specific ARP configurations required for Real Application Clusters (RAC) to work correctly that vary between an active/passive or active/active configuration.
For an active/passive configuration, the settings for all IB interfaces should be:
  accept_local = 1
  rp_filter = 0
For an active/active configuration, the settings for all IB interfaces should be:
  accept_local = 1
  rp_filter = 0
  arp_announce = 2 (8 socket only!)
  - AND the three single attributes -
  net.ipv4.conf.all.rp_filter = 0
  net.ipv4.conf.default.rp_filter = 0
  net.ipv4.conf.all.accept_local = 1
The impact of verifying the ARP configuration is minimal. Correcting a configuration requires editing "/etc/sysctl.conf" and restarting the interface(s).
Incorrect ARP configurations may prevent RAC from starting, or result in dropped packets and inconsistent RAC operation.
Action / Repair:
To verify the InfiniBand interface ARP settings for a database server, use the following command as the "root" userid:
RAW_OUTPUT=$(sysctl -a)
RF_OUTPUT=$(echo "$RAW_OUTPUT" | egrep -i "\.ib|bondib" | egrep -i "\.rp_filter")
AL_OUTPUT=$(echo "$RAW_OUTPUT" | egrep -i "\.ib|bondib" | egrep -i "\.accept_local")
AL_RSLT=1; RF_RSLT=1; NICIAA_RSLT=1 #assume failure until each check passes
if [ `echo "$RAW_OUTPUT" | grep -i bondib | wc -l` -ge 1 ]
then #active/passive case
  if [[ `echo "$AL_OUTPUT" | cut -d" " -f3 | sort -u | wc -l` -eq 1 && `echo "$AL_OUTPUT" | cut -d" " -f3 | sort -u | head -1` -eq 1 ]]
  then
    AL_RSLT=0 #all AL same value and value is 1
  fi
  if [[ `echo "$RF_OUTPUT" | cut -d" " -f3 | sort -u | wc -l` -eq 1 && `echo "$RF_OUTPUT" | cut -d" " -f3 | sort -u | head -1` -eq 0 ]]
  then
    RF_RSLT=0 #all RF same value and value is 0
  fi
  if [[ $AL_RSLT -eq 0 && $RF_RSLT -eq 0 ]]
  then
    echo -e "Success:  The active/passive ARP configuration is as recommended:\n"
  else
    echo -e "Failure:  The active/passive ARP configuration is not as recommended:\n"
  fi
  echo -e "$AL_OUTPUT\n\n$RF_OUTPUT"
else #active/active case
  NICARF_OUTPUT=$(echo "$RAW_OUTPUT" | egrep -i "net.ipv4.conf.all.rp_filter")
  NICDRF_OUTPUT=$(echo "$RAW_OUTPUT" | egrep -i "net.ipv4.conf.default.rp_filter")
  NICAAL_OUTPUT=$(echo "$RAW_OUTPUT" | egrep -i "net.ipv4.conf.all.accept_local")
  NICARF_RSLT=$(echo "$NICARF_OUTPUT" | cut -d" " -f3)
  NICDRF_RSLT=$(echo "$NICDRF_OUTPUT" | cut -d" " -f3)
  NICAAL_RSLT=$(echo "$NICAAL_OUTPUT" | cut -d" " -f3)
  IB_INTRFCE_CNT=$(echo "$RAW_OUTPUT" | egrep "\.ib.\." | cut -d"." -f4 | sort -u | wc -l)
  if [[ `echo "$AL_OUTPUT" | cut -d" " -f3 | sort -u | wc -l` -eq 1 && `echo "$AL_OUTPUT" | cut -d" " -f3 | sort -u | head -1` -eq 1 ]]
  then
    AL_RSLT=0 #all AL same value and value is 1
  fi
  if [[ `echo "$RF_OUTPUT" | cut -d" " -f3 | sort -u | wc -l` -eq 1 && `echo "$RF_OUTPUT" | cut -d" " -f3 | sort -u | head -1` -eq 0 ]]
  then
    RF_RSLT=0 #all RF same value and value is 0
  fi
  if [ $IB_INTRFCE_CNT -eq 2 ] # 2 socket case
  then
    if [[ $AL_RSLT -eq 0 && $RF_RSLT -eq 0 && $NICARF_RSLT -eq 0 && $NICDRF_RSLT -eq 0 && $NICAAL_RSLT -eq 1 ]]
    then
      echo -e "Success:  The active/active ARP configuration is as recommended:\n"
    else
      echo -e "Failure:  The active/active ARP configuration is not as recommended:\n"
    fi
    echo -e "$AL_OUTPUT\n\n$RF_OUTPUT\n\n$NICARF_OUTPUT\n$NICDRF_OUTPUT\n$NICAAL_OUTPUT"
  else # 8 socket case
    NICIAA_OUTPUT=$(echo "$RAW_OUTPUT" | egrep "\.ib.\." | egrep arp_announce)
    if [[ `echo "$NICIAA_OUTPUT" | cut -d" " -f3 | sort -u | wc -l` -eq 1 && `echo "$NICIAA_OUTPUT" | cut -d" " -f3 | sort -u | head -1` -eq 2 ]]
    then
      NICIAA_RSLT=0 #all arp_announce same value and value is 2
    fi
    if [[ $AL_RSLT -eq 0 && $RF_RSLT -eq 0 && $NICIAA_RSLT -eq 0 && $NICARF_RSLT -eq 0 && $NICDRF_RSLT -eq 0 && $NICAAL_RSLT -eq 1 ]]
    then
      echo -e "Success:  The active/active ARP configuration is as recommended:\n"
    else
      echo -e "Failure:  The active/active ARP configuration is not as recommended:\n"
    fi
    echo -e "$AL_OUTPUT\n\n$RF_OUTPUT\n\n$NICIAA_OUTPUT\n\n$NICARF_OUTPUT\n$NICDRF_OUTPUT\n$NICAAL_OUTPUT"
  fi
fi

The expected output should be similar to:

Success: The active/passive ARP configuration is as recommended:
net.ipv4.conf.ib0.accept_local = 1
net.ipv4.conf.ib1.accept_local = 1
net.ipv4.conf.bondib0.accept_local = 1
net.ipv4.conf.ib0.rp_filter = 0
net.ipv4.conf.ib1.rp_filter = 0
net.ipv4.conf.bondib0.rp_filter = 0
- OR -
Success: The active/active ARP configuration is as recommended:
net.ipv4.conf.ib0.accept_local = 1
net.ipv4.conf.ib1.accept_local = 1
net.ipv4.conf.ib0.rp_filter = 0
net.ipv4.conf.ib1.rp_filter = 0
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.accept_local = 1
- OR -
Success: The active/active ARP configuration is as recommended:
net.ipv4.conf.ib0.accept_local = 1
<output truncated>
net.ipv4.conf.ib7.accept_local = 1
net.ipv4.conf.ib0.rp_filter = 0
<output truncated>
net.ipv4.conf.ib7.rp_filter = 0
net.ipv4.conf.ib0.arp_announce = 2
<output truncated>
net.ipv4.conf.ib7.arp_announce = 2
net.ipv4.conf.all.rp_filter = 0
net.ipv4.conf.default.rp_filter = 0
net.ipv4.conf.all.accept_local = 1
If a "Failure: ..." message appears, investigate for root cause, make the necessary edits to "/etc/sysctl.conf", and restart the interface(s).
NOTE: These recommendations are for the InfiniBand interfaces on database servers only! They do not apply to the Ethernet interfaces on the database servers. No changes are permitted on the storage servers.
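
For the active/passive case, the per-interface entries that would be expected in "/etc/sysctl.conf" can be generated from a list of interface names. This is a minimal sketch: the function name ib_arp_settings is hypothetical, the interface names are examples, and the output should be reviewed against the recommendations above before any file is edited.

```shell
# Print the recommended active/passive ARP settings for each
# InfiniBand interface name given as an argument.
ib_arp_settings() {
  for ifc in "$@"; do
    echo "net.ipv4.conf.$ifc.accept_local = 1"
    echo "net.ipv4.conf.$ifc.rp_filter = 0"
  done
}

ib_arp_settings ib0 ib1 bondib0
```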

Verify Oracle RAC Databases use RDS Protocol over InfiniBand Network.

PriorityAlert LevelDateOwnerStatusEngineered System     Bug(s)        
CriticalFAIL03/01/2017 <Name>ProductionExadata - Physical,
Exadata - User Domain,
25490898 - exachk
24958292 - exachk
Reference: 23039723
DB VersionDB RoleEngineered System PlatformExadata VersionOS & VersionValidation Tool VersionTBD,
Standby, ASM
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8, SL6AllLinux x86-64,
Sparc Linux
Benefit / Impact:
The RDS protocol over InfiniBand provides superior performance because it avoids additional memory buffering operations when moving data from process memory to the network interface for IO operations.
This includes both IO operations between the Oracle instance and the storage servers, as well as instance to instance block transfers via Cache Fusion.
There is minimal impact to verify that the RDS protocol is in use. Implementing the RDS protocol over InfiniBand requires an outage to relink the Oracle software.
If the Oracle RAC databases do not use RDS protocol over the InfiniBand network, IO operations will be sub-optimal.
Action / Repair:
To verify the RDS protocol is in use by a given Oracle instance, set the ORACLE_HOME and LD_LIBRARY_PATH variables properly for the instance and execute the following command as the oracle userid
on each database server where the instance is running:
The output should be:
Note: For Oracle software versions below, the skgxpinfo command is not present. For, you can copy over skgxpinfo to the proper path in your environment from an
available environment and execute it against the database home(s) using the provided command.
Note: An alternative check (regardless of Oracle software version) is to scan each instance's alert log (must contain a startup sequence!) for the following line:

Cluster communication is configured to use the following interface(s)for this instance cluster interconnect IPC version:Oracle RDS/IP (generic)
If the instance is not using the RDS protocol over InfiniBand, relink the Oracle binary using the following commands (with variables properly defined for each home being linked):

  • (as oracle) Shut down any processes using the Oracle binary
  • If and only if relinking the grid infrastructure home, then (as root) GRID_HOME/crs/install/ -unlock
  • (as oracle) cd $ORACLE_HOME/rdbms/lib
  • (as oracle) make -f ipc_rds ioracle
  • If and only if relinking the Grid Infrastructure home, then (as root) GRID_HOME/crs/install/ -patch
Note: Avoid using the relink all command due to various issues. Use the make commands provided.
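The alternative alert log check described above can be scripted. This is a sketch under stated assumptions: the function name ipc_protocol_is_rds is hypothetical, the alert log path is supplied by the caller, and the "IPC version" text is assumed to appear on a single line as in the sample. Only the most recent startup sequence is considered.

```shell
# ipc_protocol_is_rds ALERT_LOG: hypothetical helper.
# Succeeds if the latest "IPC version" line in the alert log names Oracle RDS/IP.
ipc_protocol_is_rds() {
  # Keep only the last matching line, i.e. the most recent instance startup.
  last=$(grep 'IPC version' "$1" | tail -n 1)
  case "$last" in
    *'Oracle RDS/IP'*) echo "SUCCESS: $last"; return 0 ;;
    '')                echo "FAILURE: no startup sequence found in $1"; return 1 ;;
    *)                 echo "FAILURE: $last"; return 1 ;;
  esac
}
```

Run it once for each instance's alert log on every database server.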

Verify Database and ASM instances use same SPFILE

PriorityAddedMachine TypeOS TypeExadata VersionOracle Version
CriticalMarch 2013AllAllAllAll
Benefit / Impact:
All instances for a particular database or ASM cluster should be using the same spfile. Making changes to databases and ASM instances needs to be done in a reliable and consistent way across all instances.
Multiple 'sources of truth' can cause confusion and possibly unintended values being set.
Action / Repair:
Verify what spfile is used across all instances of one particular ASM or database cluster. If multiple spfiles for one database are found, provide a recommendation to consolidate them into one.
Scope includes all machine types, os types and db versions
SQL> select name, value from gv$parameter where name = 'spfile';

NAME                           VALUE
------------------------------ ------------------------------------------------------------
spfile                         +DATA/racone/spfileracone.ora

The value for pfile should be empty:
SQL> select name, value from gv$parameter where name = 'pfile';
no rows selected
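When the query returns one row per instance, the comparison can be automated. The sketch below assumes the VALUE column has already been extracted to one line per instance (how that is done is left to the caller); the function name check_single_spfile is hypothetical.

```shell
# check_single_spfile: hypothetical helper.
# Reads one spfile value per line on stdin and succeeds only if
# exactly one distinct, non-empty value is present.
check_single_spfile() {
  # Trim surrounding whitespace, drop blank lines, keep distinct values.
  distinct=$(sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//' -e '/^$/d' | sort -u)
  count=$(printf '%s\n' "$distinct" | awk 'NF { n++ } END { print n + 0 }')
  if [ "$count" -eq 1 ]; then
    echo "SUCCESS: all instances use $distinct"
  else
    echo "FAILURE: found $count distinct spfile values"
    return 1
  fi
}
```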

Verify Berkeley Database location for Cloned GI homes

PriorityAddedMachine TypeOS TypeExadata VersionOracle Version
CriticalMarch 2013X2-2(4170), X2-2, X2-8, X4-2Linux, Solaris11.2.x +11.2.x +
Benefit / Impact
After cloning a Grid Home, the Berkeley Database configuration file ($GI_HOME/crf/admin/crf<node>.ora) in the new home should not point to the
previous GI home from which it was cloned. During previous patch set updates, Berkeley Database configuration files were found still pointing to the
'before (previously cloned from) home'. Because of an invalid cloning procedure, the Berkeley Database location of the 'new home' was not updated during
the out of place bundle patching procedure.
Berkeley Database configurations that still point to the old GI home will cause GI upgrades to fail. Error messages are written to the $GRID_HOME/log/crflogd/crflogdOUT.log logfile.
Action / Repair:
# cat $GI_HOME/crf/admin/crf`hostname -s`.ora | grep CRFHOME | grep $GI_HOME | wc -l 

# cat $GI_HOME/crf/admin/crf`hostname -s`.ora | grep BDBLOC | egrep "default|$GI_HOME" | wc -l

For each of the above commands, if '1' is not returned, the CRFHOME or BDBLOC entry in the crf.ora file has the wrong reference to the GI_HOME.
To solve this, manually edit $GI_HOME/crf/admin/crf<node>.ora in the cloned Grid Infrastructure Home and change the values for BDBLOC and CRFHOME
so that neither points to the previous GI_HOME, only to the current home. The same change needs to be done on all nodes in the cluster.
It is recommended to set BDBLOC to "default". This needs to be done prior to the upgrade.
Reference: 1485970.1 / 14168708
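The two checks above can be combined into one function that reports which entry is wrong. This is a sketch only: check_crf_file is a hypothetical name, and the file path and GI home are passed in by the caller. Like the commands above, it uses simple substring matching.

```shell
# check_crf_file FILE GI_HOME: hypothetical helper.
# Verifies that CRFHOME references GI_HOME and that BDBLOC is either
# "default" or references GI_HOME.
check_crf_file() {
  crf_file=$1
  gi_home=$2
  status=0
  if ! grep CRFHOME "$crf_file" | grep -qF "$gi_home"; then
    echo "FAILURE: CRFHOME does not reference $gi_home"
    status=1
  fi
  if ! grep BDBLOC "$crf_file" | grep -qF -e default -e "$gi_home"; then
    echo "FAILURE: BDBLOC is neither default nor under $gi_home"
    status=1
  fi
  [ "$status" -eq 0 ] && echo "SUCCESS: $crf_file references $gi_home"
  return "$status"
}
```

For example: check_crf_file "$GI_HOME/crf/admin/crf$(hostname -s).ora" "$GI_HOME"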

Configure Storage Server alerts to be sent via email

PriorityAddedMachine TypeOS TypeExadata VersionOracle Version
CriticalN/AX2-2(4170), X2-2, X2-8, X4-2Linux11.2.x +11.2.x +
Benefit / Impact:
Oracle Exadata Storage Servers can send various levels of alerts and clear messages via email or snmp, or both. Sending these messages via email at a minimum helps to ensure that a problem is detected and corrected.
There is little impact to storage server operation to send these messages via email.
If the storage servers are not configured to send alerts and clear messages via email at a minimum, there is an increased risk of a problem not being detected in a timely manner.
Action / Repair:
Use the following cellcli command to validate the email configuration by sending a test email:
alter cell validate mail; 
The output will be similar to:
Cell slcc09cel01 successfully altered 
If the output is not successful, configure a storage server to send email alerts using the following cellcli command (tailored to your environment):
ALTER CELL smtpServer='', -
smtpFromAddr='', -
smtpToAddr='', -
smtpFrom='Exadata cell', -
smtpPort='<port for mail server>', -
smtpUseSSL='TRUE', -
notificationPolicy='critical,warning,clear', -
notificationMethod='mail';

NOTE: The recommended best practice to monitor an Oracle Exadata Database Machine is with Oracle Enterprise Manager (OEM) and the suite of OEM plugins developed for the Oracle Exadata Database Machine.
Please reference My Oracle Support (MOS) Note 1110675.1 for details.

Configure NTP and Timezone on the InfiniBand switches

PriorityAddedMachine TypeOS TypeExadata VersionOracle Version
CriticalN/AX2-2(4170), X2-2, X2-8, X4-2Linux11.2.x +11.2.x +
Benefit / Impact:
Synchronized timestamps are important to switch operation and message logging, both within an InfiniBand switch and between the InfiniBand switches. There is little impact to correctly configure the switches.
If the InfiniBand switches are not correctly configured, there is a risk of improper operation and disjoint message timestamping.
Action / Repair:
The InfiniBand switches should be properly configured during the initial deployment process. If for some reason they were not, please consult the "Configuring Sun Datacenter InfiniBand Switch 36 Switch"
section of the "Oracle® Exadata Database Machine Owner's Guide, 11g Release 2 (11.2)".

Configure NTP slew_always settings as SMF property for Solaris

PriorityAddedMachine TypeOS TypeExadata VersionOracle Version
CriticalN/AX2-2, X2-8, X4-2Solaris11.2.x + +
Benefit / Impact:
Configuring NTP slew settings as an SMF property ensures that time is managed identically on all systems, which prevents timing issues that may impact availability. This also helps in problem analysis
and prevents error messages in the system log about an incorrect NTP setting.
Not having a working NTP configuration using SMF will result in different time settings on the nodes. This may impact stability and makes problem analysis difficult.
"syntax error in /etc/inet/ntp.conf line 95, ignored"
Action / Repair:
As a best practice the ntp configuration setting slew_always should be configured as an SMF setting. After setting slew_always in SMF the other setting 'disable pll' is not required anymore.
On Solaris 11 Express and Solaris 11, neither setting should exist in ntp.conf.

Enable Xeon Turbo Boost

PriorityAlert LevelDateOwnerStatusScopeBug(s)
DB VersionDB RoleEngineered SystemExadata VersionOS & VersionValidation Tool VersionTBD
N/AN/AX4-2, X4-811. TBD 
Benefit / Impact:
Xeon Turbo Boost automatically allows processor cores to run faster than their rated frequency if operating below power, current, and temperature specification limits, which may result in better performance for some applications. Turbo Boost is supported on X4 systems only.
Action / Repair:
Verify your system is using X4-based hardware using the dmidecode command:
# dmidecode -s system-product-name
The output on an X4-based database server is "SUN SERVER X4-2". The output on an X4-based storage server is "SUN SERVER X4-2L".
Verify Turbo Boost is enabled on X4 database and storage servers using the following command:
# ubiosconfig export all -E | fgrep Turbo_Mode
Turbo Boost is enabled if the output is the following:
Turbo Boost is disabled if the output is the following:
If Turbo Boost is disabled, then enable it (on X4 systems only) by following the instructions in MOS Document 1487339.1, Issue 1.6 - Enable the Xeon Turbo Boost mode for X4 storage and database servers.
NOTE: Although it is possible to enable Turbo Boost on X3-based Exadata hardware, it is not supported.

Verify NUMA Configuration

PriorityAddedMachine TypeOS TypeExadata VersionOracle Version
CriticalN/AX2-2(4170), X2-2, X2-8, X4-2Linux11.2.x +11.2.x +
Benefit / Impact:
X2-2 Database servers in Oracle Exadata Database Machine by default are booted with operating system NUMA support enabled. Commands that manipulate large files without using direct I/O on ext3 file systems
will cause low memory conditions on the NUMA node (Xeon 5500 processor) currently running the process.
By turning NUMA off, a potential local node low memory condition and subsequent performance drop is avoided.
X2-8 Database servers should have NUMA on
The impact of turning NUMA off is minimal.
Once local node memory is depleted, system performance as a whole will be severely impacted.
Action / Repair:
Follow the instructions in MOS Note 1053332.1 to turn NUMA off in the kernel for database servers.
NOTE: NUMA is configured to be off in the storage servers and should not be changed.

Verify Exadata Smart Flash Log is Created

PriorityAlert LevelDateOwnerStatusScope
CriticalFAIL03/05/2013<Name> ProductionExadata, SSC, Exalogic
DB VersionDB RoleEngineered SystemExadata VersionOS & VersionValidation Tool Version
N/AN/AX2-2(4170), X2-2, X2-8, X3-2, EIGHTH, X3-8, X4- - 11
Linux x86-64 UEK5.8
exachk 2.2.1
Benefit / Impact:
When created, Exadata Smart Flash Log uses 512MB of flash memory per storage server by default to help minimize redo log write latency.

The impact of verifying that Exadata Smart Flash Log is created is minimal.
Without Exadata Smart Flash Log, the LGWR process may be delayed causing longer "log file parallel write" and "log file sync" waits.
Action / Repair:
To verify that Exadata Smart Flash Log is created, execute the following cellcli command as the "celladmin" user on each storage server:
list flashlog attributes size,status
The output should be similar to:
 512M normal
If the size is not as expected, Exadata Smart Flash Log may not be created, or there may be a hardware issue, or there may be a configuration issue.
It is extremely important that the root cause for the size not being as expected is understood before attempting corrective action. Because Smart Flash Log and Smart Flash Cache share the same physical memory structure on the storage servers, both are likely to be impacted by hardware failures, for example. Corrective action is also impacted by whether or not Write Back Flash Cache is in use, and solutions for the same root cause may vary if Write Back Flash Cache is in use.
After determining the root cause, refer to the Database Machine Owner's Guide and the Exadata Software User's Guide for the appropriate corrective action steps.
Because they share the same storage server physical flash memory, there is a space usage relationship between Exadata Smart Flash Log and Exadata Smart Flash Cache. Exadata Smart Flash Log should be created before Exadata Smart Flash Cache, because the default configuration for Exadata Smart Flash Cache will use all available storage server flash memory. If Exadata Smart Flash Cache already exists, a subsequent attempt to create Exadata Smart Flash Log will fail because all the available storage server flash memory is in use.

NOTE: Exadata Smart Flash Log is created by default with Exadata Storage Server Software version and above.
NOTE: Exadata Smart Flash Log will be used by Oracle software Bundle Patch 9 (or higher) or The recommended Oracle software version levels are Bundle Patch 11 (or higher) or Bundle Patch 1 (or higher).
NOTE: The default Exadata Smart Flash Log size of 512MB is the recommended value.
NOTE: See also "Configure Storage Server Flash Memory as Exadata Smart Flash Cache"
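The expected flash log output can be checked with a small filter. The sketch below assumes the two-column "size status" format shown above and the default 512M size; check_flashlog is a hypothetical name.

```shell
# check_flashlog: hypothetical helper.
# Reads the output of "list flashlog attributes size,status" on stdin and
# succeeds only if every listed flash log is 512M with status normal.
check_flashlog() {
  awk '
    NF == 0 { next }                      # ignore blank lines
    { seen = 1 }
    $1 != "512M" || $2 != "normal" { bad = 1 }
    END {
      if (!seen) { print "FAILURE: no flash log is listed"; exit 1 }
      if (bad)   { print "FAILURE: unexpected flash log size or status"; exit 1 }
      print "SUCCESS: flash log is 512M and normal"
    }'
}
```

On a storage server: cellcli -e "list flashlog attributes size,status" | check_flashlog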

Verify Exadata Smart Flash Cache is Created

PriorityAlert LevelDateOwnerStatusScopeBug(s)
CriticalFAILupdated 10/11/17<Name>ProductionExadata - Physical,
Exadata - Management Domain,
SSC, Exalogic
<26637216>- exachk
<24514430>- exachk
<23063691>- exachk
<22344656>- exachk
<18691846>- exachk 
DB VersionDB RoleEngineered SystemExadata VersionOS & VersionValidation Tool VersionTBD
Linux x86-64
Benefit / Impact:
For the vast majority of situations, maximum performance is achieved by configuring the storage server flash memory as cache, allowing the Exadata software to determine the content of the cache.
The impact of configuring storage server flash memory as cache at initial deployment is minimal. If there are already grid disks configured in the flash memory, consideration must be given as to the relocation of the data when converting the flash memory back to cache.
Not configuring the storage server flash memory as cache may result in a degradation of overall performance.
Action / Repair:
To confirm all storage server flash memory is configured as smart flash cache, execute the command shown below:
cellcli -e "list flashcache detail" | grep size
The output will be similar to:
 size: 5.82122802734375T
Starting with Exadata software version, for an environment deployed according to Oracle standards, with the storage server "flashlog" feature in use at the default size of 512M, the size of the storage server "flashcache" should match one of the entries in this table:

Smart Flash Cache Expected Size Table
System Description | Common Name | Cache Size with Smart Flash Log | Cache Size without Smart Flash Log | Cache Size with flashCacheCompression and Smart Flash Log | Cache Size with flashCacheCompression and no Smart Flash Log
FCC not available on this hardware | FCC not available on this hardware
X4270 M2 | X2-2, X2-8 | 0.356201171875T
FCC not available on this hardware | FCC not available on this hardware
X4270 M3 | X3-2, X3-8 | 1.453857421875T
X4270 M3 | EIGHTH | 0.7266845703125T
X5-2L | X5-2 | 5.82122802734375T | 5.82171630859375T | FCC not available on this hardware | FCC not available on this hardware
X5-2L | EIGHTH | 2.910369873046875T | 2.910858154296875T | FCC not available on this hardware | FCC not available on this hardware
X6-2L | X6-2, X6-8 | 11.64312744140625T | 11.64361572265625T | FCC not available on this hardware | FCC not available on this hardware
X6-2L | EIGHTH | 5.821319580078125T | 5.821807861328125T | FCC not available on this hardware | FCC not available on this hardware
X7-2L (all flash) | X7-2 | 2.3287353515625T | 2.3287353515625T | FCC-NA | FCC-NA
If the size is not as expected, some of the storage server flash memory may be configured as grid disks, or there may be a hardware issue, or there may be a configuration issue.
It is extremely important that the root cause for the size not being as expected is understood before attempting corrective action. Because Smart Flash Log and Smart Flash Cache share the same physical memory structure on the storage servers, both are likely to be impacted by hardware failures, for example. Corrective action is also impacted by whether or not Write Back Flash Cache is in use, and solutions for the same root cause may vary if Write Back Flash Cache is in use.
After determining the root cause, refer to the Database Machine Owner's Guide and the Exadata Software User's Guide for the appropriate corrective action steps.
NOTE: While not configuring the Exadata Smart Flash Log is permitted, it is recommended that the Exadata Smart Flash Log be configured. If a decision is made not to create the Exadata Smart Flash Log, the expected size for the Smart Flash Cache is shown in column "Cache Size without Smart Flash Log" and "Cache Size with flashCacheCompression and no Smart Flash Log".
NOTE: On storage servers that use only flash memory devices (no spinning disks), the Exadata Smart Flash Cache size is the same whether or not Exadata Smart Flash Log is created. Therefore, the order in which Exadata Smart Flash Log and Exadata Smart Flash Cache are created does not matter.
NOTE: See also "Verify Exadata Smart Flash Log is Created".
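Comparing the reported size against the table entry for a given platform can be scripted. In this sketch the expected value is passed in by the caller, taken from the table row that matches the hardware. The function name flashcache_size_matches is hypothetical, and the "size:" label is assumed to appear as in the sample output.

```shell
# flashcache_size_matches EXPECTED: hypothetical helper.
# Reads "list flashcache detail" output on stdin and compares the first
# "size:" value with EXPECTED (for example 5.82122802734375T).
flashcache_size_matches() {
  expected=$1
  actual=$(awk '$1 == "size:" { print $2; exit }')
  if [ "$actual" = "$expected" ]; then
    echo "SUCCESS: flash cache size is $actual"
  else
    echo "FAILURE: flash cache size is '$actual', expected $expected"
    return 1
  fi
}
```

For an X5-2 storage server with Smart Flash Log in use: cellcli -e "list flashcache detail" | flashcache_size_matches 5.82122802734375T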

Verify Exadata Smart Flash Cache status is "normal"

PriorityAlert LevelDateOwnerStatusEngineered System
CriticalFAIL10/13/15<Name> ProductionExadata-Physical,
Exadata-Management Domain,
DB VersionDB RoleEngineered System PlatformExadata VersionOS & VersionValidation Tool Version
N/AN/AX2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X5-2,X5-8ALLLinux x86-64 el5uek,
Linux x86-64 el6uek

Benefit / Impact:
Verifying that the Exadata Smart Flash Cache status is "normal" helps to avoid a performance degradation.
The impact of verifying that the Exadata Smart Flash Cache status is "normal" is minimal. The impact of restoring the Exadata Smart Flash Cache status to "normal" varies, depending upon the reason for the abnormality, and cannot be estimated here.
If the Exadata Smart Flash Cache status is not "normal", a performance degradation is likely.
Action / Repair:
To verify that the Exadata Smart Flash Cache status is "normal", as the root userid on each storage server, execute the following command set:
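CACHE_STATE=$(cellcli -e "list flashcache attributes status");
# $CACHE_STATE is deliberately left unquoted so the shell strips any whitespace around the cellcli output
if [ $CACHE_STATE = "normal" ]
then
echo -e SUCCESS: the Exadata Smart Flash Cache state is: $CACHE_STATE;
else
echo -e FAILURE: the Exadata Smart Flash Cache state is: $CACHE_STATE;
fi;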
The expected output is:
SUCCESS: the Exadata Smart Flash Cache state is: normal
If the output is not as expected, investigate for root cause and correct the discovered cause.
NOTE: If the word "degraded" appears in the output, investigate the hardware condition as a memory module may have failed.
NOTE: If the word "flushed" appears in the output, a cache flush command was issued and was not subsequently cancelled. For example:
FAILURE: the Exadata Smart Flash Cache state is: normal - flushed
In this condition, the Exadata Smart Flash Cache is not in use for cache operations of any type!
To cancel a flash cache flush operation, as the root userid on the storage server with the issue, execute the following command:
cellcli -e "alter flashcache all cancel flush"
The output should be:
Flash cache randomcel05_FLASHCACHE altered successfully

Verify Master (Rack) Serial Number is Set

PriorityAddedMachine TypeOS TypeExadata VersionOracle Version
Critical03/02/11X2-2(4170), X2-2, X2-8, X4-2Linux11.2.x +11.2.x +

Setting the Master Serial Number (MSN) (aka Rack Serial Number) assists Oracle Support Services to resolve entitlement issues which may arise. The MSN is listed on a label on the front and the rear of the chassis
but is not electronically readable unless this value is set.
The impact to set the MSN is minimal.
Not having the MSN set for the system may hinder entitlement when opening Service Requests.
Use the following command as the "root" userid to verify that all the MSN's are set correctly and match on all servers:
ipmitool sunoem cli "show /SP system_identifier" | grep "system_identifier ="
The output should resemble one of the following:
For X2-2(4170):
system_identifier = Sun Oracle Database Machine xxxxAKyyyy
For X2-2:
system_identifier = Exadata Database Machine X2-2 xxxxAKyyyy
For X2-8:
system_identifier = Exadata Database Machine X2-8 xxxxAKyyyy
(MSN's will be in one of two formats: either 4 numbers, the letters 'AK', then 4 more numbers or letters A-F; or the letters 'AK' followed by 8 numbers or letters A-F)
On any server where the MSN is not set correctly, use the following command as the "root" userid to set it:
ipmitool sunoem cli 'set /SP system_identifier="text_identifier_string serial_number"'
Where "text_identifier_string" is one of:
For X2-2(4170): "Sun Oracle Database Machine"
For X2-2: "Exadata Database Machine X2-2"
For X2-8: "Exadata Database Machine X2-8"
and "serial_number" is the MSN from the label attached to the rack.

NOTE: The label with the Master Serial Number is located on the top left side wall (viewed from rear) inside the rack on the rear of the chassis.
NOTE: In the command to set the Master Serial Number there is a space between the "text_identifier_string" and the "serial_number".
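The two MSN formats described above translate directly into a pattern match. The following sketch uses hypothetical function names (msn_from_identifier, valid_msn) and assumes the MSN is the last word of the system_identifier line and that the letters A-F are upper case.

```shell
# msn_from_identifier LINE: hypothetical helper.
# Prints the last word of a "system_identifier = ..." line.
msn_from_identifier() {
  printf '%s\n' "$1" | awk '{ print $NF }'
}

# valid_msn MSN: hypothetical helper.
# Succeeds for 4 digits + AK + 4 characters from 0-9A-F,
# or AK + 8 characters from 0-9A-F.
valid_msn() {
  printf '%s\n' "$1" | grep -Eq '^([0-9]{4}AK[0-9A-F]{4}|AK[0-9A-F]{8})$'
}
```

The line to pass to msn_from_identifier is the output of the ipmitool command shown earlier in this check.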

Verify Management Network Interface (eth0) is on a Separate Subnet

PriorityAddedMachine TypeOS TypeExadata VersionOracle Version
Critical03/02/11X2-2(4170), X2-2, X2-8, X4-2Linux11.2.x +11.2.x +
It is a requirement that the management network be on a different non-overlapping sub-net than the InfiniBand network and the client access network. This is necessary for better network security, better client access
bandwidths, and for Auto Service Request (ASR) to work correctly.
The management network consists of the eth0 network interface in the database and storage servers, the ILOM network interfaces of the database and storage servers, and the Ethernet management interfaces of the
InfiniBand switches and PDUs.
Having the management network on the same subnet as the client access network will reduce network security, potentially restrict the client access bandwidth to/from the Database Machine to a single 1GbE link,
and will prevent ASR from working correctly.
To verify that the management network interface (eth0) is on a separate network from other network interfaces, execute the following command as the "root" userid on both storage and database servers:
grep -i network /etc/sysconfig/network-scripts/ifcfg* | cut -f5 -d"/" | grep -v "#"
The output will be similar to:

The expected result is that the network values are different. If they are not, investigate and correct the condition.
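Checking that the network values differ can be automated. This sketch assumes each line of the command output ends in "NETWORK=<value>" (for example "ifcfg-eth0:NETWORK=10.204.74.0"). That line format, the sample addresses, and the function name check_distinct_networks are assumptions, not taken from this document.

```shell
# check_distinct_networks: hypothetical helper.
# Reads lines ending in "NETWORK=<value>" on stdin and fails if any
# network value occurs more than once.
check_distinct_networks() {
  # Keep the text after the last "=", then list values that repeat.
  dups=$(sed -e 's/.*=//' | sort | uniq -d)
  if [ -z "$dups" ]; then
    echo "SUCCESS: every interface is on a different network"
  else
    echo "FAILURE: network shared by more than one interface: $dups"
    return 1
  fi
}
```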

Verify RAID disk controller CacheVault capacitor condition

PriorityAlert LevelDateOwnerStatusEngineered SystemEngineered System
CriticalFAIL08/08/18 ProductionSSC, Exadata - Physical,
Exadata - Management Domain
X5-2, X5-8, X6-2, X6-8, X7-228438875 - exachk
27495768 - exachk
22911250 - exachk
DB VersionDB TypeDB RoleDB ModeExadata VersionOS & VersionValidation Tool VersionMAA Scorecard Section
N/AN/AN/AN/A18.1.0 or higherLinuxexachk 18.4.0N/A

The CacheVault capacitor loses its ability to support cache over time. Verifying the CacheVault capacitor condition helps to reasonably time proactive replacement.
The impact of verifying the CacheVault capacitor condition is minimal. Replacing the CacheVault will require downtime for the impacted server.
A failed CacheVault capacitor will put the RAID controller into WriteThrough mode which significantly impacts write I/O performance.
NOTE: This check is not applicable to Extreme Flash Oracle Exadata Storage Servers nor X7-8 Oracle Exadata Database Servers as they contain no conventional disk drives!
Execute the following command as the "root" userid on all storage and database servers:
RAW_OUTPUT=$(/opt/MegaRAID/storcli/storcli64 /c0/cv show all)
# SUCCESS only when exactly one line starting with "State" reports Optimal
if [[ $(echo "$RAW_OUTPUT" | egrep -i "^state" | egrep -ic optimal)  -eq 1 ]]
then
  echo -e "SUCCESS: raid controller CacheVault condition is optimal."
else
  echo -e "FAILURE: raid controller CacheVault condition is not optimal.  Details:\n\n$RAW_OUTPUT"
fi
The expected output should be:
SUCCESS: raid controller CacheVault condition is optimal.
If the output is a "FAILURE" message, upload the detailed information provided into a hardware service request for component replacement.

Verify RAID Disk Controller Battery Condition

PriorityAlert LevelDateOwnerStatusEngineered SystemEngineered System
CriticalFAIL08/01/18<Name> ProductionSSC, Exadata - Physical,
Exadata - Management Domain
X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8, X7-228280123 - exachk
27502799 - exachk
DB VersionDB TypeDB RoleDB ModeExadata VersionOS & VersionValidation Tool VersionMAA Scorecard Section
N/AN/AN/AN/A18.1.0 or higherLinuxexachk 18.4.0N/A
Maintaining optimal condition maximizes RAID controller battery life.
The impact of verifying RAID controller battery condition is minimal. The impact of correcting a non-optimal condition varies, and may include a server shutdown to replace batteries.
A non-optimal battery condition may place the RAID controller into WriteThrough mode which significantly impacts write I/O performance.
NOTE: This check is not applicable to Extreme Flash Oracle Exadata Storage Servers nor X7-8 Oracle Exadata Database Servers as they contain no conventional disk drives!
To verify the RAID controller battery condition, execute the following command as the "root" userid on all database and storage servers:
RAW_OUTPUT=$(/opt/MegaRAID/storcli/storcli64 /c0/bbu show all)
# SUCCESS only when exactly one "Battery State" line reports Optimal
if [[ $(echo "$RAW_OUTPUT" | egrep -i "battery state" | egrep -ic optimal)  -eq 1 ]]
then
  echo -e "SUCCESS: raid controller battery condition is optimal."
else
  echo -e "FAILURE: raid controller battery condition is not optimal.  Details:\n\n$RAW_OUTPUT"
fi
The expected output should be similar to:
SUCCESS: raid controller battery condition is optimal.
If the output is a "FAILURE" message, upload the detailed information provided into a hardware service request for component replacement.

Verify Ambient Air Temperature

 Alert LevelDateOwnerStatusEngineered SystemBug(s)
CriticalFail03/16/16<Name> ProductionExadata - Physical,
Exadata - Management Domain
DB VersionDB RoleEngineered System PlatformExadata VersionOS & VersionValidation Tool VersionTBD
N/AN/AX2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8AllLinux x86-64exachk 
Benefit / Impact:
Maintaining ambient air temperature conditions within design specification for an Oracle Exadata Database Machine helps to achieve maximum efficiency and targeted component service lifetimes.
The impact of verifying the ambient air temperature is minimal. The impact of correcting ambient air temperatures outside of design specification range will vary depending upon the root cause of the issue.

Ambient air temperatures outside the design specification range affect all components within the chassis of an Oracle Exadata Database Machine, possibly manifesting performance problems and shortened service lifetimes.
Action / Repair:
To verify the ambient air temperature, execute the following command set as the "root" userid on each storage and database server:
AMBIENT_TEMP=$(ipmitool sunoem cli "show /SYS/T_AMB" | grep value | sed -e 's/^[ \t]*//;s/[ \t]*$//' | cut -d" " -f3);
# Strip the decimal point (27.250 becomes 27250) so the range can be compared as integers
if [[ $(echo "${AMBIENT_TEMP//./}") -ge 5000 && $(echo "${AMBIENT_TEMP//./}") -le 32000 ]]
then
echo "SUCCESS: Ambient air temperature is within the range of 5 to 32 degrees Centigrade: $AMBIENT_TEMP";
else
echo -e "FAILURE: Ambient air temperature is outside the range of 5 to 32 degrees Centigrade: $AMBIENT_TEMP";
fi;
The output should be similar to:
SUCCESS: Ambient air temperature is within the range of 5 to 32 degrees Centigrade: 27.250

If the ambient air temperature is not within the recommended range, investigate for root cause and take appropriate corrective action.
NOTE: Since there is no one sensor in the physical rack for overall ambient temperature of the data center air, this check reads the ambient temperature from each storage and database server.

Verify Platform Configuration and Initialization Parameters for Consolidation

Platform Consolidation Considerations

Consolidation Parameters Reference Table

Critical, 08/02/11
Benefit / Impact: Experience and testing have shown that certain database initialization parameter settings should use the following formulas for platform consolidation. By using these formulas as recommended, known
problems may be avoided and performance maximized.
The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear
understanding of the performance impact.
Risk: If the operating system and database parameters are not set as recommended, a variety of issues may be encountered that can lead to system and database instability.
Action / Repair: To verify the database initialization parameters, use the following guidance:
The following are important platform level considerations in a consolidated environment.
  • Operating System Configuration Recommendations
  • Hugepages, when set, should equal the sum of shared memory from all databases, see MOS Note 401749.1 for precise computations and see MOS Note 361323.1
    for a description of Hugepages. Hugepages is generally required if "PageTables" in /proc/meminfo is > 2% of physical memory
    • Benefits: Memory savings. Prevent cases of paging and swapping when not configured.
    • Tradeoffs: Hugepages must be set correctly and adjusted whenever another instance is added or dropped, or when SGA sizes change.
    • As of to disable hugepages on an instance set parameter "use_large_pages=false" 
    • Note that as of the onecommand version that supports BP9, hugepages is automatically configured upon deployment. The vm.nr_hugepages value
      may need to be adjusted if an instance's memory parameters are changed after initial deployment
  • Amount of locked memory - 75% of physical memory
  • Number of Shared Memory Identifiers  - set greater than the number of databases
  • Size of Shared Memory Segments - OS setting for max size = 85% of physical memory
  • Number of semaphores - sum of processes cannot exceed the maximum number of semaphores. On Linux, the max can be obtained with cat /proc/sys/kernel/sem | awk '{print $2}'.
    The number of semaphores on the system should not be so high that running the maximum number of Oracle processes causes performance problems.
  • Number of semaphores in a semaphore set: The number of semaphores in a semaphore set must be at least as high as the largest
    value for the processes parameter in all databases. On Linux, the maximum number of semaphores per set can be obtained with cat /proc/sys/kernel/sem | awk '{print $1}'; the maximum number of semaphore sets is the fourth field (awk '{print $4}').
  • Applications with similar SLA requirements are best suited to co-exist in a consolidated environment together. Do not mix mission critical applications with non mission critical applications in the same consolidated environment. Do not mix production and test/dev databases in the same environment.
  • It is possible to "over-subscribe" an application resource requirements in a consolidated environment as long as the other applications "under-subscribe" at that time. The exception
    to this is mission critical applications. Do not "over-subscribe" in a consolidated environment that contains mission critical applications. Oracle Resource Manager can be used to
    manage varying degrees of IO and CPU requirements within one database and across databases. Within one database, Oracle Resource Manager can also manage parallel query processing.
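The 2% PageTables guideline in the list above can be evaluated from /proc/meminfo. This is a minimal sketch in which MemTotal stands in for physical memory; the function name pagetables_exceed_2pct is hypothetical.

```shell
# pagetables_exceed_2pct: hypothetical helper.
# Reads /proc/meminfo-style text on stdin, prints PageTables as a percentage
# of MemTotal, and exits 0 when it is above 2% (Hugepages generally required),
# 1 when it is not, and 2 when MemTotal is missing.
pagetables_exceed_2pct() {
  awk '
    $1 == "MemTotal:"   { total = $2 }
    $1 == "PageTables:" { pt = $2 }
    END {
      if (total == 0) { print "FAILURE: MemTotal not found"; exit 2 }
      printf "PageTables is %.2f%% of physical memory\n", 100 * pt / total
      exit ((100 * pt > 2 * total) ? 0 : 1)
    }'
}
```

On a database server: pagetables_exceed_2pct < /proc/meminfo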

Consolidation Parameters Reference Table

The performance related recommendations provide guidance to maintain highest stability without sacrificing performance. Changing these performance settings can be done after careful performance
evaluation and clear understanding of the performance impact.
This parameter consolidation health check table is a general reference for environments. This is not a hard prerequisite for a consolidated environment, rather a guideline used to establish
the formulas, maximum values, and notes below. It should suffice for most customers, but if you do not qualify for this formula, the table below can be used as a reference solely for important
parameters that must be considered. These values are per node.

Parameter: sga_target / pga_aggregate_target
Formula:
OLTP: Sum of all sga_target and pga_aggregate_target for all databases < 75% of physical memory
DW/BI: Sum of sga_target + (pga_aggregate_target x 3) < 75% of physical memory
Both OLTP and DW/BI: Sum of sga_target + pga_aggregate_limit < 75% of physical memory
Max value: 75% of total memory
Notes: Check the aforementioned formula. Exceeding the recommended memory usage can potentially cause performance problems. It is important to also ensure that the value computed from the formula is sufficient for the application using the associated database.
The pga_aggregate_target setting does not enforce a maximum PGA usage. For some data warehouse and BI applications, 3 x the specified target has been observed. For OLTP applications, the spill over is much less. The 25% room provides insurance from any additional spill over and for non-SGA/PGA memory allocation. Process memory and non-memory allocations can add up to 1-5 MB/process in some cases. Monitoring application and system memory utilization is required to ensure there is sufficient memory throughout your workload/business cycles. Oracle recommends at least 5% memory free at all times.
In 12c, the new parameter pga_aggregate_limit was introduced. It enforces a maximum PGA usage, so the specified parameter value should be used in calculations. pga_aggregate_limit is derived from pga_aggregate_target and defaults to the greater of 2 GB or 2 times the pga_aggregate_target setting.
DBM Machine Type | Memory Available | Oracle Memory Target
DBM V2 | 72 GB | 54 GB
X2-2 | 96 GB | 60.8 GB; can be expanded to 144 GB
X2-8 | 1 TB | 768 GB
X3-2 | 256 GB | 192 GB
X3-8 | 2 TB | 1536 GB
X4-2 | 512 GB | 384 GB
X4-8 | 6 TB | 4608 GB
X5-2 | 1 TB | 768 GB

Parameter: cpu_count
Formula:
For mission critical applications: Sum of cpu_count of all databases <= 75% x Total CPUs
For light-weight CPU usage applications: sum(CPU_COUNT) <= 3 x CPUs
For CPU intensive applications: sum(CPU_COUNT) <= Total CPUs
Max value: Refer to the formulas in the previous column
Notes: Rules of thumb:
1. Leverage CPU_COUNT and instance caging for platform consolidation (e.g. managing multiple databases within Exadata DBM). They are particularly helpful in preventing processes and jobs from over-consuming target CPU resources.
2. Most light weight applications are idle and consume < 3 CPUs.
3. Large reporting/DW/BI and some OLTP applications ("CPU intensive applications") can easily consume all the CPU, so they need to be bounded with instance caging and resource management.
4. For consolidating mission critical applications, the recommendation is not to over-subscribe CPU resources, to maximize stability and performance consistency.
For additional guidance and precautions, refer to <Doc ID 1362445.1>
Exadata DBM | # Cores | # CPUs
DBM V2 | 8 cores | 16 CPUs
X2-2 | 12 cores | 24 CPUs
X2-8 | 64 cores | 128 CPUs

Parameter: resource_manager_plan
Formula: NA
Max value: NA
Notes: Ensure this is enabled. A good starting value is 'default_plan'.

Parameter: processes
Formula: Sum of processes of all databases < max
Max value: Number of semaphores on the system
Notes: Check formula. Alert if > max.
Alert if # Active Processes > 4 x CPUs.
Sum (all processes for all instances) < 21K

Parameter: Parallel parameters
Formula: Automatic
Notes: Adjusting the CPU_COUNT parameter for platform consolidation or resource management will automatically update the PARALLEL_MAX_SERVERS and PARALLEL_SERVERS_TARGET parameter values, provided these are not explicitly specified in the parameter file.

Parameter: db_recovery_file_dest_size
Formula: Sum of db_recovery_file_dest_size <= Fast Recovery Area
Max value: Size of usable Fast Recovery Area
Notes: Check formula. Usable FRA space subtracts the space consumed by other files, such as online log files in the case of RECO being the only high redundancy diskgroup.
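
The memory, CPU, and process formulas in the reference table reduce to simple per-node arithmetic. The following is a minimal sketch in Python, not part of the original note: the function names, the profile keys, and the reading of "21K" as 21000 are illustrative assumptions, and the inputs are expected to be the per-node sums gathered by the administrator.

```python
def memory_within_guideline(physical_gb, sga_gb, pga_target_gb,
                            pga_multiplier=1, limit=0.75):
    """SGA plus (multiplier x PGA target) must stay below limit x physical memory.

    Use pga_multiplier=3 for DW/BI workloads, as described in the notes.
    """
    required = sga_gb + pga_multiplier * pga_target_gb
    return required < limit * physical_gb


# Multipliers of total CPUs, taken from the cpu_count formulas in the table.
# The dictionary keys are hypothetical labels chosen for this sketch.
CPU_FACTORS = {"mission_critical": 0.75, "light": 3.0, "cpu_intensive": 1.0}


def cpu_count_within_guideline(cpu_counts, total_cpus, profile):
    """Sum of cpu_count across databases must not exceed the profile's limit."""
    return sum(cpu_counts) <= CPU_FACTORS[profile] * total_cpus


def parse_kernel_sem(text):
    """Parse /proc/sys/kernel/sem, whose fields are SEMMSL SEMMNS SEMOPM SEMMNI."""
    semmsl, semmns, semopm, semmni = (int(field) for field in text.split())
    return {"semmsl": semmsl, "semmns": semmns,
            "semopm": semopm, "semmni": semmni}


def processes_within_guideline(processes_per_db, sem, hard_cap=21000):
    """Check the processes sums against system semaphores and the 21K ceiling.

    hard_cap=21000 is an assumed interpretation of "21K" in the table.
    """
    total = sum(processes_per_db)
    return (total < sem["semmns"]
            and total < hard_cap
            and max(processes_per_db) <= sem["semmsl"])
```

For example, on an X3-2 node with 256 GB of memory, a DW/BI consolidation with 120 GB of total SGA and 20 GB of total PGA target needs 180 GB under the 3x rule, which fits below the 192 GB target listed in the table.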

Verify operating system hugepages count satisfies total SGA requirements

Alert Level | Engineered System
 | Exadata - Physical, Exadata - Management Domain

DB Version | DB Role | Engineered System Platform | Exadata Version | OS & Version | Validation Tool Version
 | | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2 | | Linux x86-64 el5uek, Linux x86-64 el6uek |

Benefit / Impact:

Properly configuring operating system hugepages on Linux and setting the database initialization parameter "use_large_pages" to "only" results in more efficient use of memory and reduced paging.

The impact of validating that the total current hugepages are greater than or equal to estimated requirements for all currently active SGAs is minimal. The impact of corrective actions will vary depending on the specific configuration, and because the hugepages pool must be contiguous, it is recommended to reboot the database server.


The risk of not correctly configuring operating system hugepages in advance of setting the database initialization parameter "use_large_pages" to "only" is that if not enough huge pages are configured, some databases will not start after you have set the parameter.

Action / Repair:

PREREQUISITE: All database instances that are supposed to run concurrently on a database server must be up and running for this check to be accurate.

To verify that the total number of configured hugepages is greater than or equal to the estimated requirements of all currently active SGAs using large pages, as the root user copy the following block of commands to a shell script (e.g., under /tmp/) and execute it.


#!/bin/bash
# Compare configured hugepages with the estimate for all active shared memory segments.
TOTAL_HUGEPAGES=$(grep HugePages_Total /proc/meminfo | cut -d":" -f 2 | sed -e 's/^[ \t]*//;s/[ \t]*$//')
HPG_SZ=$(grep Hugepagesize /proc/meminfo | awk '{print $2}')
NUM_PG=0
MGMT_PID=$(/usr/bin/pgrep -f mdb_pmon_)
if [ $? -eq 0 ]; then
MGMT_SEGIDS=$(grep SYSV /proc/${MGMT_PID}/maps | awk '{print $5}' | uniq)
else
MGMT_PID=0
fi
IPCARR=($(ipcs -m | grep "^0x" | awk '{ print $2":"$5}'))
for SEGIDBYTES in "${IPCARR[@]}"
do
SEG_ID=$(echo $SEGIDBYTES | cut -d":" -f1)
SEG_BYTES=$(echo $SEGIDBYTES | cut -d":" -f2)
if [[ $MGMT_PID -eq 0 || ! "$MGMT_SEGIDS" =~ "$SEG_ID" ]]; then
MIN_PG=$(echo "$SEG_BYTES/($HPG_SZ*1024)" | bc -q)
if [ $MIN_PG -gt 0 ]; then
NUM_PG=$(echo "$NUM_PG+$MIN_PG+1" | bc -q)
fi
fi
done
if [ $TOTAL_HUGEPAGES -ge $NUM_PG ]
then echo -e "\nSUCCESS: Total current hugepages ($TOTAL_HUGEPAGES) are greater than or equal to"
echo -e " estimated requirements for all currently active SGAs ($NUM_PG).\n"
else echo -e "\nFAILURE: Total current hugepages ($TOTAL_HUGEPAGES) should be greater than or equal to"
echo -e " estimated requirements for all currently active SGAs ($NUM_PG).\n"
fi

The output should be similar to:

SUCCESS:  Total current hugepages (13004) are greater than or equal to         
                 estimated requirements for all currently active SGAs (632).

If the output is not "SUCCESS", investigate and correct the condition.
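
The estimate computed by the script is plain integer arithmetic over the shared memory segment sizes. The following Python sketch mirrors the MIN_PG and NUM_PG calculations only; it is an illustration, not a replacement for the script, and the function name and sample values are assumptions.

```python
def estimate_hugepages(segment_bytes, hugepage_kb):
    """Estimate the hugepages needed for a list of shared memory segment sizes.

    Mirrors the script's arithmetic: each segment needs its whole number of
    pages (integer division) plus one, and segments smaller than one
    hugepage are ignored.
    """
    page_bytes = hugepage_kb * 1024
    total = 0
    for size in segment_bytes:
        whole_pages = size // page_bytes
        if whole_pages > 0:
            total += whole_pages + 1
    return total
```

With 2048 kB hugepages, a 20 MiB segment contributes 11 pages and a 4 KiB segment contributes none, so `estimate_hugepages([4096, 20971520], 2048)` returns 11. The check passes when HugePages_Total is at least this value.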

NOTE: Please refer to My Oracle Support notes MOS 401749.1, 361323.1, and 1392497.1 for additional details on configuring hugepages.

NOTE: If you have not reviewed notes 401749.1, 361323.1, and 1392497.1 and followed their guidance BEFORE using the database parameter "use_large_pages=only", this check will pass the environment but you will still not be able to start instances once the configured pool of operating system hugepages has been consumed by instance startups. If that should happen, you will need to change the "use_large_pages" initialization parameter to one of the other values, restart the instance, and follow the instructions in notes 401749.1 and 361323.1. The brute force alternative is to increase the huge page count until the newest instance will start, and then adjust the huge page count after you can see the estimated requirements for all currently active SGAs.

NOTE: While it is possible to modify the number of hugepages in active memory in the running kernel, it is not recommended for two reasons:
1) The hugepages pool must be contiguous, and it may not be possible to find enough contiguous pages to meet a request in the running kernel active memory.
2) Setting the value in the kernel configuration files and rebooting ensures the expected number of hugepages is properly configured and available. Misconfigurations in this area can impact server availability so following this operational best practice prevents an unexpected outage caused by user error.

Verify "MaxStartups 100" in /etc/ssh/sshd_config on all database servers

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | 03/21/12 | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | Linux | 11.2.0.3+ | +

Benefit / Impact:
Configuring "MaxStartups 100" helps to avoid the risk of certain cluster operations failing for clusters containing more than 10 database servers.
Cluster operations examples include installing or upgrading the grid infrastructure, and adding a cluster node.
The impact of verifying "MaxStartups 100" is minimal. The impact of correcting the setting is moderate, requiring a restart of the sshd service.
With "MaxStartups" configured at the default value (10), certain cluster operations for clusters containing more than 10 database servers may fail.
For example, if the Oracle Universal Installer (OUI) calls the Cluster Verification Utility (CVU) and CVU starts ssh sessions across all nodes
concurrently, the operation fails because more than 10 concurrent ssh connections are required.
Action / Repair:
To verify that "MaxStartups 100" is set in /etc/ssh/sshd_config file, execute the following command as the "root" userid on the node where was executed:
dcli -g /opt/oracle.SupportTools/onecommand/dbs_group -l root "egrep -i maxstartups /etc/ssh/sshd_config"
The output should be similar to:
randomdb01: MaxStartups 100
<output truncated>
randomdb16: MaxStartups 100
If the output is not as expected, as the root userid on each database server, edit the sshd_config file to include "MaxStartups 100" and restart the ssh service with the "service sshd restart" command.
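
On clusters with many database servers, it can help to scan the dcli output for hosts that deviate rather than reading it by eye. The following is a minimal Python sketch, assuming the output has been captured as text in the "host: line" form shown above; the function name is hypothetical.

```python
def hosts_missing_maxstartups(dcli_output, expected="100"):
    """Return the sorted hosts whose active MaxStartups value is not `expected`.

    Commented-out entries such as "#MaxStartups 10" do not count as set.
    Hosts that returned no line at all do not appear in the dcli output and
    therefore cannot be detected here.
    """
    seen = {}
    for line in dcli_output.splitlines():
        host, sep, rest = line.partition(":")
        if not sep:
            continue  # not a "host: text" line, e.g. "<output truncated>"
        host = host.strip()
        fields = rest.split()
        if len(fields) >= 2 and fields[0].lower() == "maxstartups":
            seen[host] = fields[1]
        else:
            seen.setdefault(host, None)
    return sorted(h for h, value in seen.items() if value != expected)
```

An empty result means every host that reported a line has an active "MaxStartups 100" entry.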

Verify all datafiles have "AUTOEXTEND" attribute "ON"

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | Linux, Solaris | 11.2.x + | 11.2.x +

Benefit / Impact
The benefit of having "AUTOEXTEND" on is that applications may avoid out of space errors.
The impact of verifying that the "AUTOEXTEND" attribute is "ON" is minimal. The impact of setting "AUTOEXTEND" to "ON" varies depending upon if it is done during database creation, file addition to a tablespace, or added to an existing file.

The risk of running out of space in either the tablespace or diskgroup varies by application and cannot be quantified here. A tablespace that runs out
of space will interfere with an application, and a diskgroup running out of space could impact the entire database as well as ASM operations (e.g., rebalance operations).

Action / Repair

To obtain a list of data files and temp files that are not set to "AUTOEXTEND", enter the following sqlplus commands logged into the database as sysdba:
select file_id, file_name, tablespace_name from dba_data_files where autoextensible <> 'YES';
select file_id, file_name, tablespace_name from dba_temp_files where autoextensible <> 'YES'; 
The output should be:
no rows selected
If any rows are returned, investigate and correct the condition.

NOTE: Configuring "AUTOEXTEND" to "ON" requires comparing space utilization growth projections at the tablespace level to space available in the diskgroups to permit the expected
projected growth while retaining sufficient storage space in reserve to account for ASM rebalance operations that occur either as a result of planned operations or component failure.
The resulting growth targets are implemented with the "MAXSIZE" attribute that should always be used in conjunction with the "AUTOEXTEND" attribute. The "MAXSIZE" settings should
allow for projected growth while minimizing the prospect of depleting a disk group. The "MAXSIZE" settings will vary by customer and a blanket recommendation cannot be given here.

NOTE: When configuring a file for "AUTOEXTEND" to "ON", the size specified for the "NEXT" attribute should cover all disks in the diskgroup to optimize balance. For example,
with a 4MB AU size and 168 disks, the size of the "NEXT" attribute should be a multiple of 672M (4*168).
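
The "NEXT" sizing rule is a multiplication of the allocation unit size by the disk count. A short Python sketch of that arithmetic follows; the function names are illustrative and not part of any Oracle tooling.

```python
def next_increment_mb(au_size_mb, disk_count, multiple=1):
    """Smallest NEXT size (in MB) that touches every disk once, times `multiple`."""
    return au_size_mb * disk_count * multiple


def is_balanced_next(next_mb, au_size_mb, disk_count):
    """True when next_mb is a positive multiple of AU size x number of disks."""
    return next_mb > 0 and next_mb % (au_size_mb * disk_count) == 0
```

For the example in the note, `next_increment_mb(4, 168)` gives 672, and a NEXT of 1344M also satisfies the rule while 1000M does not.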

Enable portmap service if app requires it

By default, the portmap service is not enabled on the database nodes and it is required for things such as NFS. If needed, enable and start it using the following with dcli across required nodes:
chkconfig --level 345 portmap on
service portmap start

Enable proper services on database nodes to use NFS

In addition to the portmap service previously explained, the nfslock service must also be enabled and running to use NFS on database nodes. Below is a working example, showing the errors that will be encountered with
various utilities if it is not set up correctly. MOS Note 359515.1 can also be referenced.
SQL> create tablespace nfs_test_on_nfs datafile '/shared/dscbbg02/users/user/nfs_test/nfs_test_on_nfs.dbf' size 16M;
create tablespace nfs_test_on_nfs datafile '/shared/dscbbg02/users/user/nfs_test/nfs_test_on_nfs.dbf' size 16M
ERROR at line 1:
ORA-01119: error in creating database file
ORA-27086: unable to lock file - already in use
Linux-x86_64 Error: 37: No locks available
Additional information: 10
Elapsed: 00:00:30.08
SQL> create tablespace nfs_test datafile '+D/user/datafile/nfs_test.dbf' size 16M;
Tablespace created.
SQL> create table nfs_test(n not null) tablespace nfs_test as select rownum from dual connect by rownum < 1e5 + 1;
Table created.
SQL> alter tablespace nfs_test read only;
Tablespace altered.
SQL> create directory nfs_test as '/shared/dscbbg02/users/user/nfs_test';
Directory created.
SQL> create table nfs_test_x organization external(type oracle_datapump default directory nfs_test location('nfs_test.dp')) as select * from nfs_test;
create table nfs_test_x organization external(type oracle_datapump default directory nfs_test location('nfs_test.dp')) as select * from nfs_test
ERROR at line 1:
ORA-29913: error in executing ODCIEXTTABLEPOPULATE callout
ORA-31641: unable to create dump file
ORA-27086: unable to lock file - already in use
Linux-x86_64 Error: 37: No locks available
Additional information: 10
Elapsed: 00:00:31.17
$ expdp userid=scott/tiger parfile=nfs_test.par
Export: Release - Production on Wed Jun 2 10:44:51 2010
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
Connected to: Oracle Database 11g Enterprise Edition Release - 64bit Production
With the Partitioning, Real Application Clusters, Automatic Storage Management, OLAP,
Data Mining and Real Application Testing options
ORA-39001: invalid argument value
ORA-39000: bad dump file specification
ORA-31641: unable to create dump file "/shared/dscbbg02/users/user/nfs_test/nfs_test.dmp"
ORA-27086: unable to lock file - already in use
Linux-x86_64 Error: 37: No locks available
Additional information: 10
RMAN works:
$ rman target=/
Recovery Manager: Release - Production on Wed Jun 2 10:46:40 2010
Copyright (c) 1982, 2009, Oracle and/or its affiliates. All rights reserved.
connected to target database: USER (DBID=3710096878)
RMAN> backup as copy datafile '+D/user/datafile/nfs_test.dbf' format '/shared/dscbbg02/users/user/nfs_test/nfs_test.dbf';
Starting backup at 20100602104700
using target database control file instead of recovery catalog
allocated channel: ORA_DISK_1
channel ORA_DISK_1: SID=204 device type=DISK
channel ORA_DISK_1: starting datafile copy
input datafile file number=00007 name=+D/user/datafile/nfs_test.dbf
output file name=/shared/dscbbg02/users/user/nfs_test/nfs_test.dbf tag=TAG
channel ORA_DISK_1: datafile copy complete, elapsed time: 00:00:01
Finished backup at 20100602104702
The solution is to ensure that the nfslock service (aka rpc.statd) is running:
# service nfslock status
rpc.statd (pid 10795) is running...
The service should also be enabled via chkconfig so that it starts at boot.

Be Careful when Combining the InfiniBand Network across Clusters and Database Machines

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
 | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | Linux | 11.2.x + | 11.2.x +
If you want multiple database machines to run as separate environments yet be connected through the InfiniBand network, please be aware of the following items especially when the database machines
were deployed as separate environments.
The cell name, cell disk name, grid disk name, ASM diskgroup name, and ASM failgroup name should be unique to help avoid accidental damage during maintenance operations. For example, do not have a
diskgroup named DATA on both database machines; call them DATA_DM01 and DATA_DM02 instead.

IP Addresses

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
 | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | Linux | 11.2.x + | 11.2.x +
All nodes on the InfiniBand network must have a unique IP address. When an Oracle Database Machine is deployed, the default InfiniBand network is 192.168.10.x and we start with
If you used the default IP address on each Database Machine, you will have duplicate IP addresses. You must modify the IP addresses on one of the machines before re-configuring the InfiniBand Network.
Ensure any additional equipment ordered from Oracle is marked for an Oracle Exadata Database Machine and the hardware engineer is using the correct Multi-rack Cabling when the physical InfiniBand network is modified.
After the hardware engineer has modified the network, ensure that the network is working correctly by running verify-topology and infinicheck. Infinicheck will create load on the system and should not be run when
there is active workload on the system. Note: Infinicheck will need an input file of all IP addresses on the network.
For example, create a temporary file in /tmp that contains all cells for both database machines. Pass this file to the infinicheck command using the -c option. Also pass the -b option.
#cd /opt/oracle.SupportTools/ibdiagtools
#./verify-topology -t fattree
#./infinicheck -c /tmp/combined_cellip.ora -b
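
Before the InfiniBand networks are joined, the address lists of both machines can be compared for collisions. The following Python sketch assumes the InfiniBand addresses of each machine have already been collected into plain lists of strings; how they are collected, and the function name, are left as assumptions.

```python
from collections import Counter


def duplicate_addresses(*address_lists):
    """Return the sorted IP addresses that occur more than once across all lists."""
    counts = Counter(
        ip.strip()
        for addresses in address_lists
        for ip in addresses
        if ip.strip()
    )
    return sorted(ip for ip, n in counts.items() if n > 1)
```

Any address returned must be changed on one of the machines before the networks are combined.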


Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
 | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | Linux, Solaris | 11.2.x + | 11.2.x +
The cellip.ora file in each database node of each cluster should only reference cells in use by that respective cluster.

Set fast_start_mttr_target=300 to optimize run time performance of writes

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
 | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8 | Linux | 11.2.x + | 11.2.x +
The deployment default for fast_start_mttr_target as of 12/22/2010 is 60. To optimize run time performance for write/redo generation intensive workloads, increase fast_start_mttr_target to 300.
This will reduce checkpoint writes from DBWR processes, making more room for LGWR IO. The trade-off is that instance recovery will run longer, so if instance recovery is more important than performance,
then keep fast_start_mttr_target low. Also keep in mind that an application with inadequately sized redo logs will likely not see an effect from this change due to frequent log switches.
Considerations for direct writes in a data warehouse type of application: Even though direct operations aren't using the buffer cache, fast_start_mttr_target is very effective at controlling crash recovery time because
it ensures adequate checkpointing for the few buffers that are resident (ex: undo segment headers). fast_start_mttr_target should be set to the desired RTO (Recovery Time Objective) while still maintaining performance SLAs.

Enable auditd on database servers

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
 | N/A | X2-2(4170), X2-2, X2-8, X4-2 | Linux | 11.2.x + | 11.2.x +
On database servers, when auditing is configured, as is done automatically by applying convenience pack or higher, the audit records are logged in /var/log/messages if the auditd service is not running.
Logging these records to /var/log/messages may cause more frequent rotation of the messages file, which may result in losing historical data more quickly than necessary or desired. By enabling auditd, audit records
are sent to /var/log/audit/audit.log, which is rotated and managed separately using settings in /etc/audit/auditd.conf.

The best practice is to run the auditd service whenever auditing is configured during kernel bootup by setting audit=1 on the kernel line in /boot/grub/grub.conf, as shown here:

title Trying_LABEL_DBSYS
root (hd0,0)
kernel /vmlinuz-2.6.18- root=LABEL=DBSYS ro bootarea=dbsys loglevel=7 panic=60 debug rhgb audit=1 numa=off console=ttyS0,115200n8 console=tty1 crashkernel=128M@16M
initrd /initrd-2.6.18-
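
Whether every boot entry carries audit=1 can be checked by scanning the kernel lines of grub.conf. The following is a small Python sketch that works on the file contents passed in as text; the function name is hypothetical and the sketch only inspects lines that begin with "kernel".

```python
def kernel_lines_without_audit(grub_conf_text):
    """Return the kernel lines of a grub.conf that lack the audit=1 parameter."""
    missing = []
    for line in grub_conf_text.splitlines():
        stripped = line.strip()
        if stripped.startswith("kernel ") and "audit=1" not in stripped.split():
            missing.append(stripped)
    return missing
```

An empty list means every kernel line in the supplied text sets audit=1.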

To configure auditd to be enabled, run the following commands as root on each database server:
chkconfig auditd on
chkconfig --list auditd
auditd 0:off 1:off 2:on 3:on 4:on 5:on 6:off
service auditd start
service auditd status
auditd (pid 32582) is running...

Verify AUD$ and FGA_LOG$ tables use Automatic Segment Space Management

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | 02/27/2012 | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | Linux, Solaris | 11.2.x + | 11.2.x +
Benefit / Impact:
With AUDIT_TRAIL set for database (AUDIT_TRAIL=db), and the AUD$ and FGA_LOG$ tables located in a dictionary segment space managed SYSTEM tablespace, "gc" wait events are sometimes observed
during heavy periods of database logon activity. Testing has shown that under such conditions, placing the AUD$ and FGA_LOG$ tables in the SYSAUX tablespace, which uses automatic segment space management,
reduces the space related wait events.
The impact of verifying that the AUD$ and FGA_LOG$ tables are in the SYSAUX tablespace is low. Moving them if they are not located in the SYSAUX tablespace does not require an outage, but should be done during a
scheduled maintenance period or slow audit record generation window.
If the AUD$ and FGA_LOG$ tables are not verified to use automatic segment space management, there is a risk of a performance slowdown during periods of high database login activity.
Action / Repair:
To verify the segment space management policy currently in use by the AUD$ and FGA_LOG$ tables, use the following Sqlplus command:

select t.table_name,ts.segment_space_management from dba_tables t, dba_tablespaces ts where ts.tablespace_name = t.tablespace_name and t.table_name in ('AUD$','FGA_LOG$');

The output should be similar to:
TABLE_NAME                     SEGMEN
------------------------------ ------
FGA_LOG$                       AUTO
AUD$                           AUTO
If one or both of the AUD$ or FGA_LOG$ tables return "MANUAL", use the DBMS_AUDIT_MGMT package to move them to the SYSAUX tablespace:
BEGIN
DBMS_AUDIT_MGMT.set_audit_trail_location(audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_AUD_STD, --this moves table AUD$
audit_trail_location_value => 'SYSAUX');
END;
/
BEGIN
DBMS_AUDIT_MGMT.set_audit_trail_location(audit_trail_type => DBMS_AUDIT_MGMT.AUDIT_TRAIL_FGA_STD, --this moves table FGA_LOG$
audit_trail_location_value => 'SYSAUX');
END;
/
The output should be similar to:
PL/SQL procedure successfully completed. 
If the output is not as above, investigate and correct the condition.
NOTE: This "DBMS_AUDIT_MGMT.set_audit_trail_location" command should be executed as part of the dbca template post processing scripts. For existing databases, the command can also be executed,
but because it moves the AUD$ and FGA_LOG$ tables using the "alter table ... move" command, it should be executed at a "quiet" time.

Use dbca templates provided for current best practices

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
 | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | Linux, Solaris | 11.2.x + | 11.2.x +
Benefit / Impact:
Starting with onecommand v, dbca templates with built in best practices are provided at deployment time for OLTP, DW/BI, and DBFS.
The database created at deployment time uses one of these templates. If other databases are created, the templates should be used to ensure
current database configuration best practices are implemented. If custom scripts are used to create databases, the templates can be used as a reference for those custom scripts.
Not adhering to best practices can lead to unnecessary outages and performance problems.
Action / Repair:
Run health check to assess diffs with current best practices. Check configuration assistant logs for template use.

Updating database node OEL packages to match the cell

MOS Note 1284070.1 provides a working example of updating the db host OEL packages to match those on the cell.

Disable cell level flash caching for grid disks that don't need it when using Write Back Flash Cache

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
n/a | August 2012 | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | Linux | 11.2.3.2+ | 11.2.x +
Benefit / Impact
When using Write Back Flash Cache, disabling caching for grid disks that don't need it frees up cache space for more important objects.
The classic use-case for this is grid disks in the RECO diskgroup. Note that Exadata already has intelligence to not cache objects that
don't need it, but this extends that to the grid disk level in a Write Back Flash Cache configuration.
Cache pollution (less caching benefit) leading to performance impact.
Action / Repair:
The following cellcli command displays the cell caching mode. It should be "WriteBack" for this best practice.
list cell attributes flashCacheMode
The following cellcli command displays the caching mode for all grid disks on a cell. A cachingPolicy of "none" indicates caching is turned off for that particular grid disk.
list griddisk attributes name,cachingPolicy
To disable caching for a particular griddisk, first flush the cache data for that grid disk, and then set the cachingPolicy attribute to "none" as illustrated in the cellcli commands below:
alter griddisk <grid disk name> flush
alter griddisk <grid disk name> cachingPolicy="none"
If caching needs to be enabled again after these steps, first cancel the prior flush, and then set the cachingPolicy attribute back to "default" as illustrated in the cellcli commands below:
alter griddisk <grid disk name> cancel flush
alter griddisk <grid disk name> cachingPolicy="default"
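
To find candidates for this change, the output of the "list griddisk attributes name,cachingPolicy" command can be filtered by grid disk prefix. The Python sketch below assumes the output has been captured as text with one whitespace-separated name and policy per line; the function name and the sample grid disk names in the usage note are hypothetical.

```python
def griddisks_with_policy(cellcli_output, prefix, policy="default"):
    """Return grid disk names starting with `prefix` whose cachingPolicy is `policy`."""
    matches = []
    for line in cellcli_output.splitlines():
        fields = line.split()
        if len(fields) == 2 and fields[0].startswith(prefix) and fields[1] == policy:
            matches.append(fields[0])
    return matches
```

For instance, calling it with the prefix "RECO" lists the RECO grid disks that are still cached and could be switched to "none".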

Gather system statistics in Exadata mode if needed

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
n/a | August 2012 | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | Linux | 11.2.x + | BP18 and BP8
Benefit / Impact
Gathering Exadata specific system statistics ensures the optimizer is aware of Exadata scan speed. Accurately accounting for the speed of scan operations will ensure the optimizer chooses an optimal execution plan in an Exadata environment. The following command gathers Exadata specific system statistics:
exec dbms_stats.gather_system_stats('EXADATA');
Note this best practice is not a general recommendation to gather system statistics in Exadata mode for all Exadata environments. For existing customers
who have acceptable performance with their current execution plans, do not gather system statistics in Exadata mode. For existing customers whose cardinality
estimates are accurate, but whose optimizer overestimates the cost of a full table scan where the full scan performs better, gather system
statistics in Exadata mode. For new applications where the impact can be assessed from the beginning, and dealt with easily if there is a problem, gather system statistics in Exadata mode.
Lack of Exadata specific stats can lead to less performant optimizer plans.
Action / Repair:
To see if Exadata specific optimizer stats have been gathered, run the following query on a system with at least BP18 or BP8 Oracle software. If PVAL1 returns null or is not set, Exadata specific stats have not been gathered.
select pname, PVAL1 from aux_stats$ where pname='MBRC';

Verify Hidden Database Initialization Parameter Usage

Priority | Alert Level | Date | Owner | Status | Engineered System | Bug(s)
Critical | FAIL | 08/01/18 | <Name> | Production | Exadata - Physical, Exadata - User Domain | 26638705 - exachk, 28321838 - exachk, 26638705 - exachk, 26136659 - exachk, 25143408 - exachk
DB VersionDB Role Engineered System PlatformExadata VersionOS & Version Validation Tool Version TBD
ALL 11.2.3.+ Linux x86-64 exachk, 18.3.0  

Benefit / Impact
Hidden database initialization parameters are typically set as a workaround to solve a specific problem, and should be removed once a system has been upgraded to a version level that contains the fix for the specific problem. Often they are not removed during the upgrade process to the version level that contains the correct fix. Verifying the hidden database initialization parameter usage helps avoid hidden parameters being used any longer than necessary.
Use of hidden ASM or database initialization parameters not recommended by Oracle development in an Exadata environment can cause instability, performance problems, corruptions, and crashes.
Action / Repair:
To verify the hidden database initialization parameter usage in each ASM and database instance, execute the following sqlplus command as the owner of the respective home with the environment properly set to access the instance:
select name,value from v$parameter where substr(name,1,1)='_';
NOTE: v$parameter only contains hidden parameters that have been changed from the default, which are the ones of interest here.
The expected output should be a list of any hidden parameters in use that have been changed from the default value, similar to:
_enable_NUMA_support  FALSE
There should be no hidden parameters in use that are not shown in the "Generally Acceptable Hidden Parameters Table":
Generally Acceptable Hidden Parameters Table
_file_size_increase_increment | 2143289344 | <= BP11 | ALL | Database | Enables more performant RMAN backup allocation sizes.
_enable_NUMA_support Set _enable_NUMA_support=TRUE for all hardware generation 8-socket database servers (Note - applies to non-OVM only - OVM is not supported on 8-socket servers).

Set _enable_NUMA_support=TRUE for X5 and later 2-socket database servers deployed as non-OVM.

In all other cases do not explicitly set _enable_NUMA_support.
< ALL Database For any Exadata system using Database or higher, do not explicitly set _enable_NUMA_support (includes all hardware generations, 2-socket, 8-socket, non-OVM, and OVM). _enable_NUMA_support setting is automatically configured by the database.

For any Exadata system using Database or lower, reference the recommended setting in the Value column of this row.
_asm_resyncckpt ONLY ALL ASM Turns off resync checkpointing 
_smm_auto_max_io_size | 1024 | 12.1 and lower | ALL | Database | This permits 1MB IOs for hash joins that spill to disk, which can increase performance up to 40% due to increased throughput. These performance increases can prevent the need to move TEMP to flash.

Internal only note: this will no longer be needed when bug 20925115 is fixed. 
_parallel_adaptive_max_users | 2 | 12.1 and higher | ALL | Database | Check to ensure the value is not more than the recommended value. Setting it higher than the recommended value can deplete memory and impact performance.*

Parameter PARALLEL_MAX_SERVERS is evaluated based on the below calculation method:
PARALLEL_MAX_SERVERS = PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 5

Parameter PARALLEL_SERVERS_TARGET is evaluated based on the below calculation method:
PARALLEL_SERVERS_TARGET = PARALLEL_THREADS_PER_CPU * CPU_COUNT * concurrent_parallel_users * 2

_PARALLEL_ADAPTIVE_MAX_USERS provides the value of concurrent_parallel_users in the calculation. The value of this parameter is set to 4 in most cases, which would result in a higher than recommended maximum number of parallel servers; therefore the recommended value is 2.

PARALLEL_MAX_SERVERS would be calculated as below assuming cpu_count is set to all available CPUs:
X2-2: 1 * 24 * 2 * 5 = 240
X6-2: 1 * 88 * 2 * 5 = 880
X2-8: 1 * 128 * 2 * 5 = 1280
X6-8: 1 * 288 * 2 * 5 = 2880 
_assm_segment_repair_bg | FALSE | 12.2 and higher | ALL | Database | Work-around for bug 23734075
_asm_max_connected_clients | Dynamically changes | 12.2 and 18.1 ONLY | ALL | ASM | Used internally; removed in release 19c
_backup_disk_bufcnt | 64 | 12.1 and lower | ALL | Database | Only when ZFS based backups are in use
_backup_disk_bufsz | 1048576 | 12.1 and lower | ALL | Database | Only when ZFS based backups are in use
_backup_file_bufcnt | 64 | 12.1 and lower | ALL | Database | Only when ZFS based backups are in use
_backup_file_bufsz | 1048576 | 12.1 and lower | ALL | Database | Only when ZFS based backups are in use

1) For additional ZFS based backup configuration information, please see: Oracle ZFS Storage: FAQ: Exadata RMAN Backup with The Oracle ZFS Storage Appliance (Doc ID 1354980.1)
2) This best practice check does not include any application specific hidden parameters. If an application in use requires hidden parameters that are failed by this best practice, refer to the proper documentation for the application version in use. If the extra hidden parameters are correct, then ignore the failures reported for those specific parameters.

For Oracle E-Business Suite, please see: Database Initialization Parameters for Oracle E-Business Suite Release 12 (Doc ID 396009.1)
For Siebel CRM Application, please see: Performance Tuning Guidelines for Siebel CRM Application on Oracle Database (Doc ID 2077227.2)
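
The PARALLEL_MAX_SERVERS figures listed for _parallel_adaptive_max_users above all follow the same product. A minimal sketch of that arithmetic, assuming the four factors are a threads-per-CPU value (1 in the listed figures, taken here to be PARALLEL_THREADS_PER_CPU), cpu_count, concurrent_parallel_users and a constant 5; the function name and argument order are illustrative only:

```shell
# Hypothetical helper: reproduces the products listed above.
# usage: parallel_max_servers <cpu_count> [concurrent_parallel_users] [threads_per_cpu]
parallel_max_servers() {
  local cpu_count=$1
  local users=${2:-2}      # recommended _parallel_adaptive_max_users value
  local threads=${3:-1}    # first factor in the listed figures (assumed PARALLEL_THREADS_PER_CPU)
  echo $(( threads * cpu_count * users * 5 ))
}

parallel_max_servers 24      # X2-2 figure: prints 240
parallel_max_servers 88 4    # X6-2 cpu_count with the usual setting of 4: prints 1760
```

Comparing the second call with the X6-2 figure of 880 shows why leaving the parameter at 4 doubles the parallel server ceiling.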

Verify BDB location for Cloned GI homes

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
n/a | August 2012 | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | Linux, Solaris | 11.2.x + | 11.2.x +
Benefit / Impact
After cloning a Grid Home, the $GI_HOME/crf/admin/crf<node>.ora configuration file in the new home still has the BDB location pointing to the GI home from which it was cloned.
GI Upgrade to 11203 from 11201 and 11202 can fail
Error messages in $GRID_HOME/log/crflogd/crflogdOUT.log logfile
Action / Repair:
Manually edit $GI_HOME/crf/admin/crf<node>.ora in the cloned Grid Infrastructure Home and change the values for BDBLOC and CRFHOME.
This same change needs to be done on all nodes in the cluster to the file referenced above if it exists. Reference: 1485970.1 / 14168708

Verify Shared Servers do not perform serial full table scans

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Warn | September 2012 | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | Linux | 11.2.x + | 11.2.x +

Benefit / Impact
As an Oracle kernel design decision, shared servers are intended to perform quick transactions and therefore do not issue serial (non PQ) direct reads. Consequently, shared servers do not perform serial (non PQ) Exadata smart scans.
The impact of verifying that shared servers are not doing serial full table scans is minimal. Modifying the shared server environment to avoid shared server serial full table scans varies by configuration and application behavior, so the impact cannot be estimated here.
Shared servers doing serial full table scans in an Exadata environment lead to a performance impact due to the loss of Exadata smart scans.
Action / Repair:
To verify shared servers are not in use, execute the following SQL query as the "oracle" userid:

SQL> select NAME,value from v$parameter where name='shared_servers';
The expected output is:
--------------- ------------------------------
shared_servers 0
If the output is not "0", use the following command as the "oracle" userid with properly defined environment variables and check the output for "SHARED" configurations:

$ORACLE_HOME/bin/lsnrctl service
If shared servers are confirmed to be present, check for serial full table scans performed by them. If shared servers performing serial full table
scans are found, the shared server environment and application behavior should be modified to favor the normal Oracle foreground processes so that
serial direct reads and Exadata smart scans can be used.

Verify Write Back Flash Cache minimum version requirements

Priority | Alert Level | Date | Owner | Status | Scope | Bug(s)
Critical | FAIL | 02/06/13 | <Name> | Development | Exadata, SSC | 16012455 - exachk
DB Version | DB Role | Engineered System | Exadata Version | OS & Version | Validation Tool Version | TBD
BP9+ | ASM | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2
All eng systems with Exadata Storage - 11
Linux x86-64
exachk 2.2.1 
Benefit / Impact
Oracle Write Back Flash Cache requires Oracle version Bundle Patch 9 (BP9) in the Grid Infrastructure ORACLE_HOME or higher and Exadata version or higher.
Oracle BP9 or higher in the Grid Infrastructure ORACLE_HOME enables the resilvering feature, which drastically reduces the time required to restore redundancy after a flash disk (FDOM) failure.
Exadata software has critical optimizations and fixes (e.g. fix for bug 16232581) to fully take advantage of Exadata Write Back Flash Cache.
Without BP9 in the Grid Infrastructure ORACLE_HOME, disks cached by the failed DOM will be dropped and added which significantly extends the repair time.
Without the fixes in Exadata cell, IO errors and possible data corruptions may appear for very large IO intensive workloads when using Write Back Flash Cache.
Action / Repair:
To check if Write Back Flash Cache is in use, run the following cellcli command on all storage servers and check for 'WriteBack'
CellCLI> list cell attributes flashCacheMode
WriteBack
To check the Grid Infrastructure ORACLE_HOME for BP9 or above, run the following command from the Grid Infrastructure ORACLE_HOME as the oracle userid:
$ $ORACLE_HOME/OPatch/opatch lspatches
The output should be similar to:
14307915;DISKMON PATCH FOR EXADATA (NOV 2012 - : (14307915) 
14275572;CRS PATCH FOR EXADATA (NOV 2012 - : (14275572) 
14662263;DATABASE PATCH FOR EXADATA (NOV 2012 - : (14662263)
In this case, patch 14275572 is applied, which is BP12, and therefore the proper fixes are in place.
If the Oracle version is less than BP9, upgrade to BP9 or higher.
To check the Exadata software version, execute the following command as the root userid on all storage servers:
imageinfo -version
The output should be similar to:
If the Exadata software version is less than, upgrade to or higher.
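
Because the flashCacheMode check above has to be repeated on every storage server, it can help to collect the results and list only the cells that are not in WriteBack mode. A minimal sketch, assuming the collected output has one "<cell>: <mode>" line per storage server (for example when the CellCLI command is run through a parallel shell such as dcli); the function name and the sample cell names are hypothetical:

```shell
# Hypothetical helper: list cells whose reported flashCacheMode is not WriteBack.
# Reads "<cell>: <mode>" lines on stdin (the prefix format is an assumption).
cells_not_writeback() {
  awk '{ cell = $1; sub(/:$/, "", cell); if ($2 != "WriteBack") print cell }'
}

# Illustrative input only; the cell names are made up
printf '%s\n' 'cel01: WriteBack' 'cel02: WriteThrough' 'cel03: WriteBack' | cells_not_writeback
```

An empty result means every listed cell reported WriteBack.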

Verify bundle patch version installed matches bundle patch version registered in database

Priority | Alert Level | Date | Owner | Status | Scope
Critical | FAIL | 11/04/15 | <Name> | Production | Exadata, Exalogic, SSC
DB Version | DB Role | Engineered System | Exadata Version | OS & Version | Validation Tool Version
>=, X2-2, X2-8, X3-2, X3-8, X4-2, X5-2, X5-8 | 11.2.x + | Linux, Solaris | exachk
Benefit / Impact:
Crosschecking the software bundle patch version installed with the bundle patch registered in the database to make sure they match ensures software correctness and stability. If a bundle patch is being installed in a Data Guard configuration in a standby-first manner where the SQL portion of the bundle patch is not installed inside the database until the primary and all standby software homes have the same version installed, then this crosscheck is expected to fail until both the binary and SQL portion of the bundle patch application is fully installed.
Risk: Incomplete bug fixes, software instability, and unexpected behavior.
Action / Repair:
To verify that the bundle patch version installed matches bundle patch version registered in database, as the oracle home owner for the primary database, and with ORACLE_SID and ORACLE_HOME properly set, execute the following command:
opatch_bp=$($ORACLE_HOME/OPatch/opatch lspatches 2>/dev/null|grep -iwv javavm|grep -wi database|head -1|awk -F';' '{print $1}');
database_bp_status=$(echo -e "set heading off feedback off timing off \n select ACTION, STATUS from (select * from dba_registry_sqlpatch where PATCH_ID = $opatch_bp order by action_time desc) where rownum=1;"|$ORACLE_HOME/bin/sqlplus -s " / as sysdba" | sed -e '/^ *$/d');
database_bp_status=$(echo $database_bp_status);
if [ "$database_bp_status" == "APPLY SUCCESS" ];
then
echo "SUCCESS: Bundle patch installed in the database matches the software home and is installed successfully.";
else
echo "FAILURE: Bundle patch installed in the database does not match the software home, or is installed with errors.";
fi;
The output should be similar to:
SUCCESS: Bundle patch installed in the database matches the software home and is installed successfully.
If FAILURE is reported, then investigate and correct the discrepancy.
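
The patch number compared against dba_registry_sqlpatch comes from the filter chain in the opatch_bp assignment above. A small sketch of that chain on its own, fed with lines in "<patch id>;<description>" form; the function name is hypothetical and the descriptions are shortened for illustration:

```shell
# Same filter chain as the opatch_bp assignment above, reading lspatches-style text on stdin:
# drop JAVAVM entries, keep DATABASE entries, take the first one, print its patch id.
first_database_patch_id() {
  grep -iwv javavm | grep -wi database | head -1 | awk -F';' '{print $1}'
}

first_database_patch_id <<'EOF'
14307915;DISKMON PATCH FOR EXADATA
14275572;CRS PATCH FOR EXADATA
14662263;DATABASE PATCH FOR EXADATA
EOF
```

With this input the function prints 14662263, the only entry whose description contains the word DATABASE.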

NOTE: For versions less than, please see this archived best practice:
Verify bundle patch version installed matches bundle patch version registered in database (ARCHIVE)

Verify database server file systems have "Maximum mount count" = "-1"

Priority | Alert Level | Date | Owner | Status | Engineered System | Bug(s)
Critical | FAIL | 03/16/16 | <Name> | Production | Exadata - Physical, Exadata - Management Domain, Exadata - User Domain
DB Version | DB Role | Engineered System | Exadata Version | OS & Version | Validation Tool Version | TBD
N/A | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, EIGHTH, X4-2, X4-8, X5-2, X5-8 | 11. x86-64 | exachk

Benefit / Impact:
A filesystem will be checked for consistency (fsck) after the number of times it is mounted exceeds the "Maximum mount count" setting, typically at reboot time. On a database server, the "Maximum mount count" is set to "-1" by default.
Verifying that the database server file systems all have "Maximum mount count" set to "-1" helps to avoid an unexpectedly long reboot sequence as an fsck of the file system completes. The impact of verifying the database server file systems "Maximum mount count" is minimal. The impact of changing the "Maximum mount count" value is minimal as it can be changed dynamically.

A database server reboot may take an unexpectedly long time as an fsck operation completes, potentially extending an outage or maintenance window.
Action / Repair:
To verify the database server disk devices maximum mount count configuration, execute the following command as the "root" userid on all database servers:
LVM_IN_USE=$(parted -ls 2>/dev/null | egrep -i lvm | wc -l);
if [ "$LVM_IN_USE" -ge 1 ]; then
  FS_COMMAND=tune2fs # physical, domU case
  test -f /proc/xen/capabilities && grep -q "control_d" /proc/xen/capabilities && FS_COMMAND=tune4fs # dom0 case
  LOGICAL_VOLUME_ARRAY=$(lvscan | cut -d"'" -f2);
  MNT_CNT_CHK_RSLT=0;
  # Flag any ext3/ext4 logical volume whose "Maximum mount count" is not -1
  for INDIVIDUAL_LOGICAL_VOLUME in $LOGICAL_VOLUME_ARRAY; do
    if [ "$(file -sL $INDIVIDUAL_LOGICAL_VOLUME | egrep -wc "ext3|ext4" 2> /dev/null)" -eq 1 ] &&
       [ "$($FS_COMMAND -l $INDIVIDUAL_LOGICAL_VOLUME | egrep "^Maximum mount" | cut -d ":" -f 2 | sed -e 's/^[ \t]*//')" -ne "-1" ]; then
      MNT_CNT_CHK_RSLT=1;
    fi;
  done;
  if [ "$MNT_CNT_CHK_RSLT" -eq "0" ]; then
    echo -e "\nSUCCESS: All database server logical volumes found with filesystems had \"Maximum mount count\" equal to -1";
  else
    echo -e "\nFAILURE: One or more database server logical volumes found with filesystems had \"Maximum mount count\" not equal to -1";
  fi;
  # List the value found on each checked logical volume
  for INDIVIDUAL_LOGICAL_VOLUME in $LOGICAL_VOLUME_ARRAY; do
    if [ "$(file -sL $INDIVIDUAL_LOGICAL_VOLUME | egrep -wc "ext3|ext4" 2> /dev/null)" -eq 1 ]; then
      echo "$INDIVIDUAL_LOGICAL_VOLUME: $($FS_COMMAND -l $INDIVIDUAL_LOGICAL_VOLUME | egrep "^Maximum mount" | cut -d ":" -f 2 | sed -e 's/^[ \t]*//')";
    fi;
  done;
else
  export SWAP_DEVICE=$(swapon -s | grep -v Filename | cut -d" " -f1)
  export PARTITIONED_DEVICE_ARRAY=$(fdisk -l 2>/dev/null | egrep ^/dev | egrep -v $SWAP_DEVICE | cut -d" " -f1);
  export MNT_CNT_CHK_RSLT=0;
  for INDIVIDUAL_PARTITIONED_DEVICE in $PARTITIONED_DEVICE_ARRAY; do
    if [ "$(tune2fs -l $INDIVIDUAL_PARTITIONED_DEVICE | egrep "^Maximum mount" | cut -d ":" -f 2 | sed -e 's/^[ \t]*//')" -ne "-1" ]; then
      MNT_CNT_CHK_RSLT=1;
    fi;
  done;
  if [ "$MNT_CNT_CHK_RSLT" -eq "0" ]; then
    echo -e "\nSUCCESS: All database server partitioned devices (other than swap) found had \"Maximum mount count\" equal to -1";
  else
    echo -e "\nFAILURE: One or more database partitioned devices (other than swap) found had \"Maximum mount count\" not equal to -1";
  fi;
  for INDIVIDUAL_PARTITIONED_DEVICE in $PARTITIONED_DEVICE_ARRAY; do
    echo "$INDIVIDUAL_PARTITIONED_DEVICE: $(tune2fs -l $INDIVIDUAL_PARTITIONED_DEVICE | egrep "^Maximum mount" | cut -d ":" -f 2 | sed -e 's/^[ \t]*//')";
  done;
fi;
The output should be similar to:
SUCCESS: All database server logical volumes found (other than swap) and the boot device had "Maximum mount count" equal to -1
Boot Device /dev/sda1: -1
/dev/VGExaDb/LVDbSys1: -1
/dev/VGExaDb/LVDbOra1: -1
/dev/VGExaDb/LVDbSys2: -1
- OR -
SUCCESS: All database server partitioned devices (other than swap) found had "Maximum mount count" equal to -1
/dev/sda1: -1
/dev/sda3: -1
If the output is not as expected, you can change the "Maximum mount count" value as the "root" userid using the appropriate command for your environment ("tune2fs" or "tune4fs") on the database server for either partitioned or logical volume devices. Only the device name portion of the command differs. For example, if the appropriate command for your environment is "tune2fs":
# tune2fs -c -1 /dev/mapper/VGExaDb-LVDbOra1
tune2fs 1.39 (29-May-2006)
Setting maximal mount count to -1
NOTE: fsck should be periodically executed as part of the regular maintenance schedule for an Oracle Exadata Database Machine, where the timing is controlled by the customer. This check only verifies that the timing of the run should be controlled and not unexpected.

NOTE: In Exadata versions,, and, the database server may reset "Maximum mount count" to 27 and "Check interval" to 15552000 for some devices upon reboot. This is due to a change introduced in bug 14223777. The recommended fix is to upgrade to or higher.

Verify database server file systems have "Check interval" = "0"

Priority | Alert Level | Date | Owner | Status | Engineered System | Bug(s)
Critical | FAIL | 03/16/16 | <Name> | Production | Exadata - Physical, Exadata - Management Domain, Exadata - User Domain
DB Version | DB Role | Engineered System | Exadata Version | OS & Version | Validation Tool Version | TBD
N/A | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, EIGHTH, X4-2, X4-8, X5-2, X5-8 | 11. x86-64 | exachk

Benefit / Impact:
A filesystem will be checked for consistency (fsck) after the elapsed time from the last fsck run exceeds the "Check interval" setting, typically at reboot time. On a database server, the "Check interval" is set to "0" by default.
Verifying that the database server file systems all have the "Check interval" set to "0" helps to avoid an unexpectedly long reboot sequence as an fsck of the file system completes. The impact of verifying the database server file system "Check interval" is minimal. The impact of changing the file system "Check interval" value is minimal as it can be changed dynamically.

A database server reboot may take an unexpectedly long time as an fsck operation completes, potentially extending an outage or maintenance window.
Action / Repair:
To verify the database server disk devices check interval configuration, execute the following command as the "root" userid on all database servers:
LVM_IN_USE=$(parted -ls 2>/dev/null | egrep -i lvm | wc -l);
if [ "$LVM_IN_USE" -ge 1 ]; then
  FS_COMMAND=tune2fs # physical, domU case
  test -f /proc/xen/capabilities && grep -q "control_d" /proc/xen/capabilities && FS_COMMAND=tune4fs # dom0 case
  LOGICAL_VOLUME_ARRAY=$(lvscan | cut -d"'" -f2);
  LVM_CHECK_INTERVAL_RSLT=0;
  for INDIVIDUAL_LOGICAL_VOLUME in $LOGICAL_VOLUME_ARRAY; do
    if [ "$(file -sL $INDIVIDUAL_LOGICAL_VOLUME | egrep -wc "ext3|ext4" 2> /dev/null)" -eq 1 ] &&
       [ "$($FS_COMMAND -l $INDIVIDUAL_LOGICAL_VOLUME | grep "Check interval:" | awk '{print $3}')" -ne "0" ]; then
      LVM_CHECK_INTERVAL_RSLT=1;
    fi;
  done;
  if [ "$LVM_CHECK_INTERVAL_RSLT" -eq "0" ]; then
    echo -e "\nSUCCESS: All database server logical volumes found with filesystems had \"Check interval\" equal to 0";
  else
    echo -e "\nFAILURE: One or more database server logical volumes found with filesystems had \"Check interval\" not equal to 0";
  fi;
  for INDIVIDUAL_LOGICAL_VOLUME in $LOGICAL_VOLUME_ARRAY; do
    if [ "$(file -sL $INDIVIDUAL_LOGICAL_VOLUME | egrep -wc "ext3|ext4" 2> /dev/null)" -eq 1 ]; then
      echo "$INDIVIDUAL_LOGICAL_VOLUME: $($FS_COMMAND -l $INDIVIDUAL_LOGICAL_VOLUME | grep "Check interval:" | awk '{print $3}')";
    fi;
  done;
else
  export SWAP_DEVICE=$(swapon -s | grep -v Filename | cut -d" " -f1)
  export PARTITIONED_DEVICE_ARRAY=$(fdisk -l 2>/dev/null | egrep ^/dev | egrep -v $SWAP_DEVICE | cut -d" " -f1);
  export PRTN_CHECK_INTERVAL_RSLT=0;
  for INDIVIDUAL_PARTITIONED_DEVICE in $PARTITIONED_DEVICE_ARRAY; do
    if [ "$(tune2fs -l $INDIVIDUAL_PARTITIONED_DEVICE | grep "Check interval:" | awk '{print $3}')" -ne "0" ]; then
      PRTN_CHECK_INTERVAL_RSLT=1;
    fi;
  done;
  if [ "$PRTN_CHECK_INTERVAL_RSLT" -eq "0" ]; then
    echo -e "\nSUCCESS: All database server partitioned devices (other than swap) found had \"Check interval\" equal to 0";
  else
    echo -e "\nFAILURE: One or more database partitioned devices (other than swap) found had \"Check interval\" not equal to 0";
  fi;
  for INDIVIDUAL_PARTITIONED_DEVICE in $PARTITIONED_DEVICE_ARRAY; do
    echo "$INDIVIDUAL_PARTITIONED_DEVICE: $(tune2fs -l $INDIVIDUAL_PARTITIONED_DEVICE | grep "Check interval:" | awk '{print $3}')";
  done;
fi;
The output should be similar to:
SUCCESS: All database server disk devices found (other than swap) and the boot device had "Check interval" equal to 0
Boot Device /dev/sda1: 0
/dev/VGExaDb/LVDbSys1: 0
/dev/VGExaDb/LVDbOra1: 0
/dev/VGExaDb/LVDbSys2: 0
- OR -
SUCCESS: All database server partitioned devices (other than swap) found had "Check interval" equal to 0
/dev/cciss/c0d0p1: 0
/dev/cciss/c0d0p3: 0
If the output is not as expected, you can change the "Check interval" value as the "root" userid using the appropriate command for your environment ("tune2fs" or "tune4fs") on the database server for either partitioned or logical volume devices. Only the device name portion of the command differs. For example, if the appropriate command for your environment is "tune2fs":
# tune2fs -i 0 /dev/VGExaDb/LVDbOra1
tune2fs 1.39 (29-May-2006)
Setting interval between checks to 0 seconds
NOTE: fsck should be periodically executed as part of the regular maintenance schedule for an Oracle Exadata Database Machine, where the timing is controlled by the customer. This check only verifies that the timing of the run should be controlled and not unexpected.

NOTE: In Exadata versions,, and, the database server may reset "Maximum mount count" to 27 and "Check interval" to 15552000 for some devices upon reboot. This is due to a change introduced in bug 14223777. The recommended fix is to upgrade to or higher.
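
Both the "Maximum mount count" and the "Check interval" checks above read a single field from the filesystem listing. A minimal sketch of that extraction as one reusable function, assuming "Field name: value" lines as printed by "tune2fs -l"; the function name and the sample text are illustrative only:

```shell
# Hypothetical helper: print the first word of the value for one field
# from "tune2fs -l"-style text on stdin.
# usage: tune_field "<field name>"
tune_field() {
  awk -F':' -v name="$1" '$1 == name { split($2, a, " "); print a[1] }'
}

# Sample text in the same layout as the listing (values made up for the example)
sample='Maximum mount count:      -1
Check interval:           0 (<none>)'
echo "$sample" | tune_field "Maximum mount count"   # prints -1
echo "$sample" | tune_field "Check interval"        # prints 0
```

On a database server the same function could be fed directly, for example: tune2fs -l /dev/VGExaDb/LVDbOra1 | tune_field "Check interval"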

Verify Automated Service Request (ASR) configuration

Priority | Alert Level | Date | Owner | Status | Scope
Critical | FAIL | 11/11/12 | <Name> | Development | Exadata, SSC, Exalogic
DB Version | DB Role | Engineered System | Exadata Version | OS & Version | Validation Tool Version
N/A | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4- - 11
Linux x86-64
exachk 2.1.6
Benefit / Impact:
Verifying the Automated Service Request (ASR) configuration is necessary to ensure that an Oracle Exadata Database Machine can automatically open an Oracle support Service Request when a qualifying condition is detected.
The impact of verifying the ASR configuration is minimal. The impact of correcting deficiencies found varies by the corrective action required, and cannot be estimated here.
Risk: If the ASR configuration is not correct, service requests will not be correctly opened automatically when a qualifying condition is detected, leading to delays in correcting the qualifying condition.
Action / Repair:
There are two methods to verify that the ASR configuration is correct:
1) Read and follow the instructions in My Oracle Support Doc ID 1450112.1, which provides the asrexacheck script to verify the ASR configuration.
2) Download and execute the latest exachk from My Oracle Support Doc ID 1070954.1, which includes the asrexacheck script.
Refer to the output of the asrexacheck script, or the "Systemwide Automatic Service request (ASR) healthcheck" section of the exachk HTML report, for findings and corrective actions.

Verify ZFS File System User and Group Quotas are configured

Priority | Alert Level | Date | Owner | Status | Scope
Critical | WARN | 3/1/2013 | <Name> | Review | Exadata, SSC
DB Version | DB Role | Engineered System | Exadata Version | OS & Version | Validation Tool Version
N/A | N/A | X2-2(4170), X2-2, X3-2, X4- + | Solaris - 11 | exachk 2.2.0
Benefit / Impact:
Filesystem quotas enable control of filesystem space to users and groups. Especially on systems where the grid infrastructure and RDBMS software are managed through separate OS users, restrictions on space consumption are helpful to ensure that system stability and application availability are maximized.
Without quotas, filesystems can fill up and application availability can be impacted. When quotas are used, soft limits enable warnings when the quota limits approach and hard limits keep the filesystem from filling to ensure that the system remains stable.
Action / Repair:
To verify ZFS file system user and group quotas are configured, as the "root" userid on all Solaris database servers, perform the following commands:
# zfs get userquota@oracle data/u01
NAME      PROPERTY          VALUE  SOURCE
data/u01  userquota@oracle  none   local
# zfs get groupquota@oinstall data/u01
NAME      PROPERTY             VALUE  SOURCE
data/u01  groupquota@oinstall  none   local
NOTE: a value of "none" means quotas have not yet been created.

NOTE: This procedure only applies to Solaris database servers in Exadata database machine. No changes are permitted on Exadata storage cells. For instructions on how to implement ZFS quotas on Exadata, please refer to Chapter 7 of the Database Machine Owners Guide - "Resetting the Quota of a ZFS Storage Pool File System"

Verify the file /.updfrm_exact does not exist

Priority | Alert Level | Date | Owner | Status | Scope | Bug(s)
Critical | FAIL | 04/02/2014 | <Name> | Production | Exadata, SSC, Exalogic | 18746642 - exachk
DB Version | DB Role | Engineered System | Exadata Version | OS & Version | Validation Tool Version | TBD
N/A | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | All | All | exachk 2.2.5
Benefit / Impact:
To work around a firmware patching issue in an earlier Exadata release, the file /.updfrm_exact had to be manually created. This file should only be created temporarily during patching at the direction of Oracle Support, and should be removed immediately after patching is complete.
The impact of verifying the existence of the file /.updfrm_exact and removing it is minimal.
If /.updfrm_exact exists, a manual firmware upgrade may be inadvertently rolled back when the server is next rebooted.
Action / Repair:
To verify that the file /.updfrm_exact does not exist, as the root userid on all database and storage servers, execute the following command:
bash -c '[ -f /.updfrm_exact ] && echo "FAIL: /.updfrm_exact exists"'
The output should be empty.
If the output is similar to the following:
randomdb01: FAIL: /.updfrm_exact exists
then remove the file /.updfrm_exact with the following command executed as the root userid:
 rm -f /.updfrm_exact
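
The check above prints nothing when the file is absent, which can be hard to tell apart from a check that never ran. A minimal sketch of a variant that reports both outcomes; the function name and the optional path argument are hypothetical additions for illustration:

```shell
# Hypothetical wrapper around the same -f test, printing a line in both cases.
# usage: check_marker_absent [path]   (defaults to /.updfrm_exact)
check_marker_absent() {
  local marker=${1:-/.updfrm_exact}
  if [ -f "$marker" ]; then
    echo "FAIL: $marker exists"
    return 1
  fi
  echo "PASS: $marker does not exist"
}
```

Calling check_marker_absent with no argument tests /.updfrm_exact; the non-zero return status on failure lets a wrapper script stop before a reboot.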

Verify the vm.min_free_kbytes configuration

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System
Critical | FAIL | 04/10/19 | <Name> | Production | Exadata - Physical, Exadata - Management Domain, Exadata - User Domain, RA | ALL | 29604454 - exachk, 27679610 - exachk, 26308040 - exachk
DB/GI Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
N/A | N/A | N/A | N/A | ALL | Linux | exachk 19.3.0 | N/A
Benefit / Impact:
Maintaining vm.min_free_kbytes as recommended helps a Linux system to reclaim memory faster. For a database server with 1 NUMA node, the minimum value is 524288 KB (512 MB). For database servers with more than 1 NUMA node, the minimum value is the number of NUMA nodes multiplied by 524288 KB.
The impact of verifying the vm.min_free_kbytes configuration is minimal. The impact of adjusting vm.min_free_kbytes should include a reboot to verify the configuration is correctly configured and retained during the boot cycle.
NOTE: It is possible, but NOT recommended, especially for a system already under memory pressure, to modify the setting interactively.
Risk: Exposure to unexpected node eviction and reboot.
Action / Repair:
To verify the vm.min_free_kbytes configuration, as the "root" userid on each database server, execute the following command set:
MIN_FREE_KBYTES_SYSCTL=$(egrep ^vm.min_free_kbytes /etc/sysctl.conf | awk '{print $3}');
MIN_FREE_KBYTES_MEMORY=$(cat /proc/sys/vm/min_free_kbytes);
RAW_NUMA_DATA=$(numactl -s | egrep ^cpubind | awk '{$1=$1;print}')
FIELD=$(expr $(echo "$RAW_NUMA_DATA" | tr -cd ' ' | wc -c) + 1)
NUMA_NODE_COUNT=$(expr $(echo "$RAW_NUMA_DATA" | cut -d " " -f$FIELD) + 1)
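if [[ $NUMA_NODE_COUNT = 1 ]]
then
  MINIMUM_SIZE=524288
else
  MINIMUM_SIZE=$(expr $NUMA_NODE_COUNT '*' 524288)
fi;
# Collect the four values once so both result messages can print them
DETAIL=$(
echo -e "NUMA node count:   $NUMA_NODE_COUNT";
echo -e "minimum size:      $MINIMUM_SIZE";
echo -e "in sysctl.conf:    $MIN_FREE_KBYTES_SYSCTL";
echo -e "in active memory:  $MIN_FREE_KBYTES_MEMORY";
)
if [[ $MIN_FREE_KBYTES_SYSCTL -ge $MINIMUM_SIZE && $MIN_FREE_KBYTES_MEMORY -ge $MINIMUM_SIZE ]]
then
  echo -e "\nSUCCESS: vm.min_free_kbytes is set as recommended:\n$DETAIL";
else
  echo -e "\nFAILURE: vm.min_free_kbytes is not set as recommended:\n$DETAIL";
fi;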
The output should be similar to:
SUCCESS: vm.min_free_kbytes is set as recommended:
NUMA node count:   8
minimum size:      4194304
in sysctl.conf:    4194304
in active memory:  4194304
-- OR --
SUCCESS: vm.min_free_kbytes is set as recommended:
NUMA node count:   2
minimum size:      1048576
in sysctl.conf:    1048576
in active memory:  1048576
-- OR --
SUCCESS: vm.min_free_kbytes is set as recommended:
NUMA node count:   1
minimum size:      524288
in sysctl.conf:    524288
in active memory:  524288
Example of a "FAILURE" result:
FAILURE: vm.min_free_kbytes is not set as recommended:
NUMA node count:   8
minimum size:      4194304
in sysctl.conf:    1048576
in active memory:  2097152
NOTE: In the above "FAILURE" example, it appears the sysctl.conf file setting is too low, and then the active kernel setting was expanded but still too low, and neither is close to the recommended minimum value.
If the output is a "FAILURE" result, investigate and take corrective action. Corrective action should include setting the minimum recommended vm.min_free_kbytes value for the given NUMA configuration in sysctl.conf and rebooting the database server.
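
The sizing rule used in this check can be worked through without touching a live system. A minimal sketch, assuming a setting passes when both the sysctl.conf value and the active value are at least 524288 KB per NUMA node; the function names are hypothetical:

```shell
# Hypothetical helpers mirroring the rule above: 524288 KB per NUMA node.
# usage: min_free_kbytes_floor <numa_nodes>
min_free_kbytes_floor() {
  echo $(( $1 * 524288 ))
}

# usage: min_free_kbytes_ok <numa_nodes> <sysctl_conf_value> <active_value>
min_free_kbytes_ok() {
  local floor
  floor=$(min_free_kbytes_floor "$1")
  [ "$2" -ge "$floor" ] && [ "$3" -ge "$floor" ]
}

min_free_kbytes_floor 8                                  # prints 4194304
min_free_kbytes_ok 8 1048576 2097152 || echo "FAILURE"   # values from the FAILURE example
```

The second call uses the figures from the "FAILURE" example above: both values are below the 8-node floor of 4194304, so the check fails.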

Validate key sysctl.conf parameters on database servers

Priority | Alert Level | Date | Owner | Status | Scope | Bug(s)
DB Version | DB Role | Engineered System | Exadata Version | OS & Version | Validation Tool Version | TBD
N/A | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | All | Linux
Benefit / Impact:
Kernel parameter settings in /etc/sysctl.conf are applied to the kernel automatically at boot time and manually via the sysctl utility at runtime. The semantics of each kernel parameter are known only to the kernel, so the sysctl utility passes all values directly to the kernel with minimal processing and validation. Invalid values can be misinterpreted by the kernel, leading to unexpected results. For certain key parameters, such invalid values can have an immediate and critical impact on the system. Invalid values stored in /etc/sysctl.conf at boot time can prevent the system from booting, making it difficult to identify and correct the problem. Validating the format of some key parameters periodically or after changes to sysctl.conf can prevent unexpected outages due to human error.
Applying improperly formatted values to kernel parameters can render a system unusable.
Action / Repair: Run the command "awk -f check_sysctl.awk /etc/sysctl.conf" and correct any parameters reported to be formatted incorrectly. The contents of check_sysctl.awk are shown below:
# Notes:
# - The purpose of this script is to check certain kernel parameters in
# /etc/sysctl.conf that could prevent the server from booting if set
# incorrectly.
# - This script is only capable of checking the validity of the *syntax*
# of these parameters, but is not capable of assessing whether the
# values themselves are correct or optimal.
# - This script does not attempt to check all parameters in sysctl.conf.
# It only checks parameters which have been observed to cause severe
# impact on server stability.
# Revision history:
# 08-May-2014 - initial version
# 28-May-2014 - vm.nr_hugepages must be < 100% of physical memory
# 24-Jun-2014 - add corrective action guidance

BEGIN {
 BEGIN_memtotal_bytes()
 errcnt = 0
}

END {
 if( !errcnt ) { print "All sysctl.conf formatting checks succeeded" }
 exit errcnt
}

# This function reads MemTotal and Hugepagesize from /proc/meminfo and
# computes how many hugepages would consume all of physical memory
function BEGIN_memtotal_bytes() {
 if( NR ) {
  exit -1
 }

 cmd = "grep MemTotal /proc/meminfo"
 if( 1 != (cmd | getline) ) {
  close( cmd )
  exit -1
 }
 else if( 3 != NF || $3 != "kB" ) {
  print "Unexpected /proc/meminfo format"
  exit -1
 }
 close( cmd )
 memtotal_bytes = $2 * 1024

 cmd = "grep Hugepagesize /proc/meminfo"
 if( 1 != (cmd | getline) ) {
  # Hugepagesize not reported: fall back to 2 MB pages
  hugepage_size = 2048 * 1024
 }
 else if( 3 != NF || $3 != "kB" ) {
  print "Unexpected /proc/meminfo format"
  exit -1
 }
 else {
  hugepage_size = $2 * 1024;
 }
 close( cmd )

 memtotal_hugepages = memtotal_bytes / hugepage_size
}

# This function extracts the value portion of the setting with whitespace
# before and after trimmed, as sysctl does
function extract_value( localval ) {
 localval = gensub( /^[^=]*=[[:space:]]*/, "", 1 )
 localval = gensub( /[[:space:]]*$/, "", 1, localval)
 return localval;
}

# This function verifies that the specified value consists entirely of
# numeric digits 0-9
function check_decimal_int( v ) {
 if( v !~ /^[[:digit:]]*$/ ) { return 0 }
 return 1;
}

# Check for comments first and skip to the next line if found
/^[[:space:]]*[#;]/ {
 next
}

/vm\.nr_hugepages/ {
 valstr = extract_value()
 if( !check_decimal_int(valstr) ) {
  print "Invalid hugepages line: '" $0 "'"
  print "ACTION: A valid hugepages line should look similar to the following example,"
  print " with no additional comments or other characters:"
  print ""
  print " vm.nr_hugepages = 10000"
  print ""
  errcnt++
 }

 # Add 0 to valstr to force it to numeric type. Otherwise
 # subsequent comparisons will use string comparisons,
 # which won't yield expected results
 valnum = 0 + valstr
 if( valnum >= memtotal_hugepages ) {
  print "Hugepages value '" valnum "' is larger than physical memory"
  print "ACTION: Reduce the hugepages value to something much less than the total size of"
  print " physical RAM in the server. For this server, a value of " memtotal_hugepages
  print " would consume all of physical RAM, and would prevent the server from"
  print " booting. Please refer to MOS Note 401749.1 for guidance on choosing"
  print " an appropriate value for this server."
  errcnt++
 }
}
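
The format rule that the awk script enforces on the vm.nr_hugepages line (digits only after the "=", with no trailing comment) can also be tried on individual lines. A minimal shell sketch of that one rule; the function name is hypothetical and, unlike check_decimal_int, it also rejects an empty value:

```shell
# Hypothetical helper: succeed only when the text after "=" is made of digits alone.
# usage: hugepages_line_ok "<sysctl.conf line>"
hugepages_line_ok() {
  local value=${1#*=}    # drop everything up to and including the first "="
  value=$(echo "$value" | sed -e 's/^[[:space:]]*//' -e 's/[[:space:]]*$//')
  case "$value" in
    ''|*[!0-9]*) return 1 ;;   # empty, or contains a non-digit character
    *) return 0 ;;
  esac
}

hugepages_line_ok "vm.nr_hugepages = 10000" && echo "valid"
hugepages_line_ok "vm.nr_hugepages = 10000 # tuned" || echo "invalid"
```

The second line fails because the trailing comment becomes part of the value, which is the kind of entry the ACTION text above warns against.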

Remove "fix_control=32" from dbfs mount options

Priority | Alert Level | Date | Owner | Status | Scope
Critical | None | 5/2/2013 | <Name> | | Exadata
DB Version | DB Role | Engineered System | Exadata Version | OS & Version | Validation Tool Version
, X2-2, X2-8, X3-2, X3-8, SSC, X4-2 | All | Linux x86-64 UEK5.8, SPARC Solaris 11
Benefit / Impact:
DBFS is designed to use an async statfs to handle the need of getting the filesystem info. Bug #13340960 added an extra mount option of "fix_control=32",
which allowed statfs to be done asynchronously due to a timeout issue. If patch 13340960 is already applied, it's recommended to remove "fix_control=32".
Bug 13340960 is fixed in BP5 and higher.
Risk: The statfs behavior changes if the mount option "fix_control=32" is not removed.
Action / Repair:
1) Check on Exadata compute node(s) if DBFS is mounted with "fix_control=32";
On Linux:
# ps -ef | grep -E 'dbfs_client' | grep -E 'fix_control'
On Solaris:
# ps -ef | grep dbfs_client 
# pargs <pid> - from dbfs_client above

2) Check whether the fix for bug 13340960 is installed or BP5+ is applied to the RDBMS Oracle home:
$RDBMS/OPatch/opatch lspatches 
3) Make sure you are using the latest script from note: Configuring DBFS on Oracle Database Machine [ID 1054431.1]

Set Linux kernel log buffer size to 1MB

Priority | Alert Level | Date | Owner | Status | Scope | Bug(s)
Critical | WARN | 7/31/13 | <Name> | | Exadata | 17250965
DB Version | DB Role | Engineered System | Exadata Version | OS & Version | Validation Tool Version | TBD
N/A | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | All | Linux
Benefit / Impact:
Set the kernel command line parameter "log_buf_len=1m" in /boot/grub/grub.conf to increase the size of the kernel's internal log buffer. This will help ensure all messages from the kernel's boot sequence can be captured to /var/log/messages by syslogd/klogd.
This is primarily a concern only on larger servers like the Sun Server X2-8, where the large number of hardware components causes the kernel to produce a larger volume of messages than the internal log buffer can hold during the boot sequence.
The default size of the kernel's internal log buffer is not large enough to hold all messages from the entire boot sequence on some large hardware models.
Without this change, some messages from the kernel's boot sequence may be lost before they can be captured to /var/log/messages, which may make it difficult to diagnose some system issues.
Action / Repair:
Edit /boot/grub/grub.conf and add "log_buf_len=1m" (excluding quotes) to each kernel command line entry, as in the following example:
title Oracle Linux Server (2.6.32-400.21.1.el5uek)
root (hd0,0)
kernel /vmlinuz-2.6.32-400.21.1.el5uek root=LABEL=DBSYS ro bootarea=dbsys loglevel=7 panic=60 debug rhgb console=ttyS0,115200n8 console=tty1 crashkernel=512M bootfrom=BOOT audit=1 processor.max_cstate=1 log_buf_len=1m
initrd /initrd-2.6.32-400.21.1.el5uek.img 
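
Since the parameter has to be present on each kernel command line entry, it is easy to miss one when several kernels are listed. A minimal sketch that prints the kernel lines still lacking the parameter, assuming entries begin with the word "kernel" as in the example above; the function name is hypothetical:

```shell
# Hypothetical helper: print kernel lines of a grub.conf-style file that lack log_buf_len.
# Returns non-zero when no such line is found (that is, when every entry already has it).
# usage: kernel_lines_missing_log_buf_len <file>
kernel_lines_missing_log_buf_len() {
  grep -E '^[[:space:]]*kernel[[:space:]]' "$1" | grep -v 'log_buf_len='
}
```

For example, kernel_lines_missing_log_buf_len /boot/grub/grub.conf prints nothing once every entry has been updated.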

Verify IP routing configuration on DB nodes

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System
Critical | WARN | 05/31/17 | <Name> | Production | RA, Exadata - Physical, Exadata - Management Domain | ALL | Bug 26138002 - exachk; Related to: Bug 17723513
DB Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
N/A | N/A | N/A | N/A | N/A | Linux | exachk | N/A
Benefit / Impact:
The default IP routing configuration on Exadata database nodes has changed over time so that the latest configuration works well in all environments, but due to a kernel bug in kernels pre-2.6.31, the older configurations only worked in some cases. Since the configurations aren't changed during Exadata software upgrades, legacy configurations should be updated to avoid issues during future upgrades.
If the Linux routing configuration is not updated before the kernel is upgraded from pre-2.6.31 (Exadata pre- to Exadata software version or later, it is likely that routing/network issues will surface following the upgrade. The required changes (or potential changes) are outlined in MOS note 1306154.1.
Action / Repair:
To verify whether the routing configuration requires updating, execute the following as any userid on a database server:
cd /etc/sysconfig/network-scripts
. ./network-functions
# find all the interfaces besides loopback.  ignore aliases, alternative configurations, and editor backup files
interfaces=$(ls ifcfg* | grep -v -e ifcfg-ib -e ifcfg-bondib | LANG=C sed -e "$__sed_discard_ignored_files" -e '/\(ifcfg-lo$\|:\|ifcfg-.*-range\)/d' -e '/ifcfg-[A-Za-z0-9#\._-]\+$/ { s/^ifcfg-//g;s/[0-9]/ &/}' | LANG=C sort -k 1,1 -k 2n | LANG=C sed 's/ //')

for i in $interfaces
do
    unset SLAVE
    unset IPADDR
    unset NETWORK
    unset CNT
    unset NETMASK
    unset RNT
    unset IPV6ADDR

    . /etc/sysconfig/network-scripts/ifcfg-$i
    AGREE=`/bin/grep ^SLAVE= ifcfg-$i | /bin/cut -d= -f2`
  if [ "$AGREE" == "yes" ]
    then echo " NOTICE: Slave Interfaces ($i) do not have rule or route files"
    else
# IPv4 check
      if [ -z "$IPADDR" ]
        then echo " NOTICE: $i is not configured for IPv4"
        else
          if [ -z "$NETWORK" ]
            then NETWORK=`/bin/ipcalc $IPADDR $NETMASK -n | /bin/cut -d= -f2`
          fi
# check the rule file exists and has the two rules that apply (to and from)
          if [ ! -f rule-$i ]
            then echo "FAILURE: Need to create the rule configuration for rule-$i per 1306154.1"
            else
              CNT=`/sbin/ip rule list | /bin/grep -e "$NETWORK" -e "$IPADDR" -e GATEWAY | wc -l`
              if [ $CNT -lt 2 ]
                then echo "FAILURE: Need to update rule configuration for rule-$i per 1306154.1"
                else echo "   PASS: rule-$i is configured with rules."
              fi
          fi
# check the route file exists and has the proper route
          if [ ! -f route-$i ]
            then echo "FAILURE: Need to create the route configuration for route-$i per 1306154.1"
            else
              RNT=`/sbin/ip route list table all | /bin/grep "$NETWORK" | grep -v local | wc -l`
              if [ $RNT -lt 2 ]
                then echo "FAILURE: Need to update route configuration for route-$i per 1306154.1"
                else echo "   PASS: route-$i is configured with routes."
              fi
          fi
      fi
# IPv6 check
      if [ -z "$IPV6ADDR" ]
        then echo " NOTICE: $i is not configured for IPv6"
        else
          if [ -z "$NETWORK" ]
            then
              NETWORK=`echo $IPV6ADDR | /bin/cut -d: -f1,2,3,4`
          fi
# check the rule file exists and has the two rules that apply (to and from)
          if [ ! -f rule6-$i ]
            then echo "FAILURE: Need to create the rule configuration for rule6-$i per 1306154.1"
            else
              CNT=`/sbin/ip -6 rule list | /bin/grep "$NETWORK" | wc -l`
              if [ $CNT -lt 2 ]
                then echo "FAILURE: Need to update rule configuration for rule6-$i per 1306154.1"
                else echo "   PASS: rule6-$i is configured with rules."
              fi
          fi
# check the route file exists and has the proper route
          if [ ! -f route6-$i ]
            then echo "FAILURE: Need to create the route configuration for route6-$i per 1306154.1"
            else
              RNT=`/sbin/ip -6 route list table all | /bin/grep "$NETWORK" | grep -v local | grep table | wc -l`
              if [ $RNT -lt 2 ]
                then echo "FAILURE: Need to update route configuration for route6-$i per 1306154.1"
                else echo "   PASS: route6-$i is configured with routes."
              fi
          fi
      fi
  fi
done

The expected result will be similar to:
   PASS: rule-bondeth0 is configured with rules.
   PASS: route-bondeth0 is configured with routes.
 NOTICE: bondeth0 is not configured for IPv6
   PASS: rule-eth0 is configured with rules.
   PASS: route-eth0 is configured with routes.
 NOTICE: eth0 is not configured for IPv6
 NOTICE: eth1 is not configured for IPv4
 NOTICE: eth1 is not configured for IPv6
 NOTICE: eth2 is not configured for IPv4
 NOTICE: eth2 is not configured for IPv6
 NOTICE: eth3 is not configured for IPv4
 NOTICE: eth3 is not configured for IPv6
 NOTICE: Slave Interfaces (eth4) do not have rule or route files
 NOTICE: Slave Interfaces (eth5) do not have rule or route files

Example of a "FAILURE" result:
   PASS: rule-bondeth0 is configured with rules.
FAILURE: Need to create the route configuration for route-bondeth0 per 1306154.1
 NOTICE: bondeth0 is not configured for IPv6
   PASS: rule-eth0 is configured with rules.
   PASS: route-eth0 is configured with routes.
 NOTICE: eth0 is not configured for IPv6
 NOTICE: eth1 is not configured for IPv4
 NOTICE: eth1 is not configured for IPv6
 NOTICE: eth2 is not configured for IPv4
 NOTICE: eth2 is not configured for IPv6
 NOTICE: eth3 is not configured for IPv4
 NOTICE: eth3 is not configured for IPv6
 NOTICE: Slave Interfaces (eth4) do not have rule or route files
 NOTICE: Slave Interfaces (eth5) do not have rule or route files

NOTE: If any "FAILURE:" results are returned, follow the guidance provided in the message.


Priority | Alert Level | Date | Owner | Status | Scope | Bug(s)
Critical | WARNING | 12/4/2013 | <Name> | Production | Exadata, SSC, Exalogic | 17159324
DB VersionDB RoleEngineered SystemExadata VersionOS & VersionValidation Tool VersionTBD
N/AN/AX2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-811. - 11
Linux x86-64 UEK5.8
Benefit / Impact:
Setting this in the DB Home will prevent a connection over SQL*Plus from timing out.
If this is not set, the SQL*Net connection held by RMAN can time out while the database is backed up over HTTP.
Action / Repair:
To verify that the parameter is set, look in ${ORACLE_HOME}/network/admin/sqlnet.ora
The output should be similar to

Verify there are no .fuse_hidden files under the dbfs mount

Alert Level
DB Version
DB Role
Engineered System
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2
exachk TBD

Benefit / Impact:
Verifying the existence of .fuse_hidden files located under the dbfs mount point indicates whether a recommended bug fix is needed. The impact of verifying the existence of these files is minimal.
This problem is specific to fuse on OEL5 (which uses a 2.7.4-based version).
When a file is opened under the dbfs mount and later removed while a process still holds the file descriptor, the fuse library may not unlink it correctly, leaving .fuse_hidden files under the dbfs mount. These files can accumulate, causing slow performance for simple filesystem commands such as "ls". Also, the number of files can grow quite large, taking up unnecessary space.
Action / Repair:
It is recommended to perform these actions during your next planned maintenance window, as dbfs will need to be restarted.
These instructions apply to environments that configured DBFS using MOS note 1054431.1.
1) While dbfs is mounted, manually delete any existing .fuse_hidden files under the dbfs mount as the patch does not clear these.
2) Stop and unmount dbfs:
$GI/bin/crsctl stop res <dbfs_mount> 
3) Obtain and install the new fuse rpms related to bug:17401424 from Oracle's public Yum Server
4) Verify the new rpm is installed <fuse-libs-2.7.4->:
# rpm -qa|grep fuse
5) Start and remount dbfs:
$GI/bin/crsctl start res <dbfs_mount> 
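
Before step 1, and again after the patch is in place, it can be useful to see how many .fuse_hidden files exist. The following is a minimal sketch, assuming the dbfs mount point is passed as an argument; count_fuse_hidden is a hypothetical helper name.

```shell
# Sketch: count .fuse_hidden* files under a DBFS mount point.
# count_fuse_hidden is a hypothetical helper; pass your own mount point.
count_fuse_hidden() {
  local mount_point="$1"
  [ -d "$mount_point" ] || { echo "ERROR: $mount_point is not a directory" >&2; return 2; }
  # -xdev keeps find on the file system it started on; errors from
  # unreadable directories are discarded so that only the count is printed.
  find "$mount_point" -xdev -type f -name '.fuse_hidden*' 2>/dev/null | wc -l | tr -d ' '
}
```

For example, `count_fuse_hidden <dbfs_mount_point>` prints 0 when no such files remain. On a large dbfs file system the traversal itself can be slow, so it is best run outside peak hours.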

Verify that the SDP over IB option "sdp_apm_enable(d)" is set to "0"

Alert Level
Engineered System
Exadata-Physical, Exadata-Management Domain,
Exadata-user Domain, SSC, Exalogic

DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2
Linux x86-64 el5uek
Linux x86-64 el6uek

Benefit / Impact:
The impact of verifying that the SDP over IB option "sdp_apm_enable" is set to "0" is minimal. To set the option, a reboot is recommended to make sure the configuration file syntax is correct.
If the SDP over IB option "sdp_apm_enable" is not set to "0" on all Exadata database servers and clients that communicate with each other using SDP, either the client or database server side of the connection request will eventually hang.
NOTE: While the original issue was reported in environments where Exalogic application servers were accessing an Oracle Exadata Database Machine using SDP, ANY client requesting a connection using SDP with Automatic Path Migration (APM) enabled to an Oracle Exadata Database Machine will cause the connection to hang on the database server. exachk cannot tell from querying an Oracle Exadata Database Machine if there is, or ever will be, an end user application accessing the database servers via SDP. The Best Practice recommendation for stability is therefore to turn off APM on all Oracle Exadata Database Machines and any clients that may seek to establish an SDP connection with them.
Action / Repair:
To verify that the SDP over IB option "sdp_apm_enable" is set to "0" in the proper configuration file and the running kernel, execute the following command as the "root" userid on all database servers.
# MODULE and OPTION values are taken from the sample output shown after this script
MODULE="ib_sdp";
OPTION="sdp_apm_enable";
unset IB_SDP_FILE;
if ! /sbin/lsmod | grep -q "^${MODULE}[[:space:]]"; then
  echo "Module ${MODULE} is not loaded, so ${OPTION} will not be checked";
else
  echo "Module ${MODULE} is loaded, so ${OPTION} will be checked";
  KERNEL_TYPE=$(uname -r | cut -d"." -f6);
  if [ "$KERNEL_TYPE" = "el5uek" ]; then
    IB_SDP_FILE="/etc/modprobe.conf";
  elif [ "$KERNEL_TYPE" = "el6uek" ]; then
    # ASSUMPTION: this path was lost from the original text; confirm it for your release
    IB_SDP_FILE="/etc/modprobe.d/exadata.conf";
  else
    echo -e "ERROR: unable to determine IB_SDP_FILE: $KERNEL_TYPE";
  fi;
  if [ -n "$IB_SDP_FILE" ]; then
    IB_SDP_OUTPUT_FILE=$(egrep "ib_sdp" $IB_SDP_FILE);
    if [ -s /sys/module/ib_sdp/parameters/sdp_apm_enable ]; then
      IB_SDP_OUTPUT_KERNEL_RSLT=$(cat /sys/module/ib_sdp/parameters/sdp_apm_enable);
    else
      IB_SDP_OUTPUT_KERNEL_RSLT="/sys/module/ib_sdp/parameters/sdp_apm_enable not found";
    fi;
    if [[ `echo "$IB_SDP_OUTPUT_FILE" | egrep "sdp_apm_enable*.=0" | wc -l | sed -e 's/^[ \t]*//'` = 1 && `echo "$IB_SDP_OUTPUT_FILE" | wc -l | sed -e 's/^[ \t]*//'` = 1 ]]; then
      echo -e "SUCCESS: sdp_apm_enable is set to 0 in $IB_SDP_FILE and running kernel.";
      echo -e "$IB_SDP_FILE: $IB_SDP_OUTPUT_FILE";
      echo -e "Running Kernel: $IB_SDP_OUTPUT_KERNEL_RSLT";
    else
      echo -e "FAILURE: sdp_apm_enable should be set to 0 in $IB_SDP_FILE and running kernel.";
      echo -e "$IB_SDP_FILE: $IB_SDP_OUTPUT_FILE";
      echo -e "Running Kernel: $IB_SDP_OUTPUT_KERNEL_RSLT";
    fi;
  fi;
fi;

The output should be similar to:
Module ib_sdp is not loaded, so sdp_apm_enable will not be checked
- OR -
Module ib_sdp is loaded, so sdp_apm_enable will be checked
SUCCESS: sdp_apm_enable is set to 0 in /etc/modprobe.conf and running kernel.
/etc/modprobe.conf: options ib_sdp sdp_zcopy_thresh=0 recv_poll=0 sdp_apm_enable=0
Running Kernel: 0
If the output is not as expected, investigate the configuration for root cause and make appropriate corrections.
NOTE: The 11.x and 12.x series are separate code lines, which is why there are two entries under "Exadata Version". Above the versions listed in "Exadata Version", APM is off by default in the Linux kernel, but it can still be manually activated.
NOTE: For additional guidance on configuring sdp_apm_enable, please see "SDP Connection in inter-connected Exalogic and Exadata stopped working (Doc ID 1588546.1)"

Verify /etc/oratab

Alert Level

DB Version
DB Role
Engineered System
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2
Solaris - 11
Linux x86-64 UEK5.8
exachk 2.2.5

Benefit / Impact:
Validating the oratab contents guards against invalid entries that make automation difficult.
Stale or invalid entries in oratab take away the ability to automate tasks such as relinking Oracle homes.
Action / Repair:
Verify that the following are true for /etc/oratab:
  • all directories point to real locations with the $ORACLE_HOME/bin/oracle binary in place
  • only one GI home exists
  • one and only one +ASM entry exists
  • the +ASM entry matches the GI home, which has the $ORACLE_HOME/bin/crsd.bin binary
A quick script with 5 basic checks is made available here. The script was written quickly and serves only as an example of what we are trying to accomplish.
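
The checks listed above can be sketched in shell as follows. This is not the script referred to above; it is a minimal illustration, assuming the standard SID:ORACLE_HOME:flag line format of oratab. check_oratab is a hypothetical helper name, and the sketch does not attempt the "only one GI home" check.

```shell
# Sketch of basic oratab checks. check_oratab is a hypothetical helper name.
check_oratab() {
  local oratab="${1:-/etc/oratab}"
  local rc=0 asm_count=0 sid home rest
  [ -r "$oratab" ] || { echo "ERROR: cannot read $oratab"; return 2; }
  # oratab lines have the form SID:ORACLE_HOME:<Y|N>; skip comments and blank lines
  while IFS=: read -r sid home rest; do
    case "$sid" in ''|'#'*) continue ;; esac
    # Every home should be a real location with the oracle binary in place
    if [ ! -x "$home/bin/oracle" ]; then
      echo "FAILURE: $sid: no executable $home/bin/oracle"; rc=1
    fi
    case "$sid" in
      +ASM*)
        asm_count=$((asm_count + 1))
        # The +ASM entry should point at the GI home, identified by crsd.bin
        if [ ! -x "$home/bin/crsd.bin" ]; then
          echo "FAILURE: $sid: $home is not a GI home (no bin/crsd.bin)"; rc=1
        fi ;;
    esac
  done < "$oratab"
  if [ "$asm_count" -ne 1 ]; then
    echo "FAILURE: expected exactly one +ASM entry, found $asm_count"; rc=1
  fi
  [ "$rc" -eq 0 ] && echo "SUCCESS: oratab entries look valid"
  return "$rc"
}
```

Calling `check_oratab` with no argument reads /etc/oratab; a path can be passed to test a copy of the file first.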

Verify consistent software and configuration across nodes

Alert Level
See bug list in linked section below.
DB Version
DB Role
Engineered System
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2
exachk various

Benefit / Impact:
Consistent software and configuration across nodes increases stability and performance, and facilitates problem diagnosis.
Inconsistent software and configuration across nodes can cause crashes and performance degradation, and can make problem diagnosis difficult.
Action / Repair:
Recommended consistency checks are provided at the following location:
Exadata Best Practices Cross Node Consistency

Verify all database and storage servers time server configuration

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System | Bug(s)
Critical | CRITICAL | 05/01/19 | <Name> | Production | Exadata - Physical, Exadata - Management Domain, Exadata - User Domain, SSC | ALL | 29605287 - exachk, 29031050 - exachk, 27262264 - exachk, 24696447 - exachk

DB/GI Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
N/A | N/A | N/A | N/A | ALL | Linux, Sparc | exachk 19.3.0 | N/A
Benefit / Impact:
Verifying all database and storage servers time server configurations are as expected can help avoid issues such as impaired performance or node eviction.
The impact of verifying all database and storage servers time server configuration is minimal. The impact of making corrections varies depending upon the root cause of the difference.
Significant time drift on database and storage servers may cause unexpected storage server crashes or database server node evictions.
Action / Repair:
NOTE: This check will only pass if the following are all true on each database or storage server:
1) There are one or more time servers specified in the configuration file (/etc/chrony.conf or /etc/ntp.conf).
2) Each storage or database server is synched with one of the set of available time sources in the configuration file.
3) The maximum time drift for each storage or database server from the synched time source reported is less than or equal to 1 second.
To verify all database and storage servers time server configuration, run exachk and review the provided report.
The expected output in the exachk report should be as follows:
In the "Cluster Wide" section of the report, the overall result should be "PASS":
PASS   All database and storage servers time server configuration is as expected  Cluster Wide   View
In the "View" detail section of the report for this check the expected output should be similar to:
Status on Cluster Wide:
PASS => Time services are properly configured


SUCCESS: time services are properly configured.
In the "View" detail section of the report for this check a "FAILURE" example will be similar to:
FAILURE: time services are not properly configured.  Details:

randomadm05:    FAILURE:      server count:  1        synched server in conf:  1      timedrift: 2
randomceladm07: FAILURE:      server count:  0        synched server in conf:  1      timedrift: 0
randomceladm08: FAILURE:      server count:  1        synched server in conf:  0      timedrift: 0
NOTE: A "FAILURE" result prints the gathered data from the cluster to help identify the issue.
NOTE: This configuration failed because
1) randomadm05 timedrift is too high.
2) randomceladm07 has no servers defined in the configuration file.
3) randomceladm08 is not synchronized to a server defined in the configuration file.
If the result is not as expected, investigate for root cause and take appropriate corrective action.
NOTE: If, after corrective actions are completed, you wish to run this one check without a full exachk run, execute the following command as the "root" userid in the directory in which exachk was installed:
./exachk -check 85C96EAB566F8F13E053D498EB0AE6F1,85C9BA643125E253E053D598EB0A6D07,85CEDB9B0FBF1262E053D298EB0A29F9
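
The three pass conditions listed in the note above can be illustrated with a short sketch that classifies the values shown in a "FAILURE" detail line. This is not how exachk implements the check; classify_time_check is a hypothetical helper, the argument order is an assumption, and the time drift is assumed to be a whole number of seconds as in the sample output.

```shell
# Sketch: apply the three pass conditions to one server's gathered values.
# classify_time_check is a hypothetical helper; arguments are
# host, server count, synched server count, and time drift in whole seconds.
classify_time_check() {
  local host="$1" server_count="$2" synched_count="$3" drift_seconds="$4"
  if [ "$server_count" -lt 1 ]; then
    echo "$host: FAILURE: no time servers defined in the configuration file"
  elif [ "$synched_count" -lt 1 ]; then
    echo "$host: FAILURE: not synchronized to a server defined in the configuration file"
  elif [ "$drift_seconds" -gt 1 ]; then
    echo "$host: FAILURE: time drift of ${drift_seconds}s exceeds 1 second"
  else
    echo "$host: PASS"
  fi
}
```

For example, `classify_time_check randomadm05 1 1 2` reports the time drift failure described for that host in the example above.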

Verify Sar files have read permissions for non-root user

Alert Level
DB Version
DB Role
Engineered System
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2
Solaris - 11
Linux x86-64 UEK5.8

Benefit / Impact:
With read permissions set, non-root users, including EM, are able to monitor System Activity Report (sar) files.
Without read permissions, non-root users, including EM, are unable to monitor sar files.
Action / Repair:
To verify that read permissions are set for the sar files, execute the script below.
##### begin script

# The sed expression extracts the 8th character of the mode string, which is the "other" read bit
if [ "`stat -c %A /var/log/sa/sa* | awk 'END{print}' | sed 's/.......\(.\).\+/\1/'`" != "r" ]
then
 echo "Sar files do not have the proper read permission set for non-root users. To correct, issue this command as root: chmod o+r /var/log/sa/* "
else
 echo "Sar file permissions are correct and no further action is needed."
fi
#### end script

Verify that the patch for bug 16618055 is applied

Alert Level

DB Version
DB Role
Engineered System
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2
exachk 2.2.5

Benefit / Impact:
Applying the patch for bug 16618055 allows recovery to utilize ASYNC I/O, providing greater recovery performance and a shorter Recovery Time Objective.
The impact of verifying that the patch for bug 16618055 is applied is minimal. The impact of applying the patch for bug 16618055 varies by method.
Without the patch for bug 16618055 applied, recovery uses SYNC I/O for all log and block read operations which causes slower recovery slave performance and a longer Recovery Time Objective.
Action / Repair:
To verify that the patch for bug 16618055 is applied, as the owner of each RDBMS home, with the environment properly configured, execute the following command for each RDBMS home:

$ORACLE_HOME/OPatch/opatch lsinventory -bugs_fixed|egrep -w '^16618055|^Bug|Patch'|grep -v Installer

The output should be similar to:

Bug Fixed by Installed at Description
16618055 18642122 Fri Jun 13 11:32:22 PDT 2014 SLOW REDO APPLY ON EXADATA DUE TO SYNC IOS

If the appropriate patch is not already applied, and the database software version is and the Bundle Patch applied is less than Bundle Patch 8 then you must apply the patch for bug 16618055 to the appropriate database home.

NOTE: For additional detail, please see My Oracle Support note "ASYNC IO In Exadata Not Working (Doc ID 1642088.1)".

Verify the Name Service Cache Daemon (NSCD) is Running

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System
Critical | FAIL | 07/17/17 | <Name> | Production | Exadata - Physical, Exadata - Management Domain, Exadata - User Domain

DB/GI Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version
Benefit / Impact:
Verifying the NSCD configuration ensures the correct configuration when providing cache for the most common name service requests, like passwords, groups, hosts.
The impact of verifying the NSCD configuration is minimal. While configuring and starting the NSCD can be done without a reboot, a reboot is recommended to prove the configuration is correct and survives a boot procedure.
NOTE: The recommended NSCD attribute values vary depending upon whether or not the System Security Service Daemon (SSSD) is also in use.
When the NSCD and SSSD daemons are running together, an incorrect configuration could cause processes to use the incorrect cache service. Typical problems include CRS start failure due to an invalid password and new connections to the database suddenly failing with invalid password errors (ORA-1031, ORA-1017), among others.
Action / Repair:
To verify the NSCD is properly configured, as the root userid on each database server, execute the following code:
NSCD_SERVICE_DATA=$(service nscd status 2>&1)
SSSD_SERVICE_DATA=$(service sssd status 2>&1)
NSCD_AUTOSTART_DATA=$(chkconfig --list nscd 2>&1 | sed -e 's/  */ /g' -e 's/ *//')
NSCD_AUTOSTART_CONFIGURED=$(echo $NSCD_AUTOSTART_DATA |awk '{if ($0 ~ /3:on/ || $0 ~ /5:on/) {print "1";exit 1}else{print "0";exit 0}}')
if [ -r /etc/nscd.conf ]
then
  NSCD_FILE_DATA=$(egrep "enable-cache" /etc/nscd.conf | grep -v "#" | awk '{print $2 ": " $3}')
else
  NSCD_FILE_DATA=$(ls -l /etc/nscd.conf 2>&1)
fi
NSCD_MEMORY_DATA=$(for CACHE_NAME in passwd group hosts services netgroup; do echo -e "$CACHE_NAME: `nscd -g 2>/dev/null | egrep -w "$CACHE_NAME" -A3 | egrep "is enabled" | cut -dc -f1 | sed -e 's/  */ /g' -e 's/ *//'`"; done)
NSCD_SERVICE_STATUS=$(echo $NSCD_SERVICE_DATA | grep running | wc -l)
SSSD_SERVICE_STATUS=$(echo $SSSD_SERVICE_DATA | grep running | wc -l)
NSCD_FILE_DATA_SHORT=$(echo "$NSCD_FILE_DATA" | awk '{print $2}' | tr -d " \t\n\r")
NSCD_MEMORY_DATA_SHORT=$(echo "$NSCD_MEMORY_DATA" | awk '{print $2}' | tr -d " \t\n\r")
if [ $SSSD_SERVICE_STATUS -eq 0 ] # only NSCD
then
  NSCD_EXPECTED="yesyesyesyesno"
else # NSCD and SSSD
  NSCD_EXPECTED="yesnononono"
fi
# ASSUMPTION: the original pass condition was lost from this text. This version passes when
# NSCD is running, autostart is configured, and /etc/nscd.conf holds the expected values.
# The in-memory data is printed for review only.
if [ $NSCD_SERVICE_STATUS -eq 1 ] && [ "$NSCD_AUTOSTART_CONFIGURED" = "1" ] && [ "$NSCD_FILE_DATA_SHORT" == "$NSCD_EXPECTED" ]
then
  echo -e "SUCCESS:  The Name Service Cache Daemon (NSCD) configuration is correct:\n"
  echo -e "NSCD service data:       $NSCD_SERVICE_DATA\n"
  echo -e "SSSD service data:       $SSSD_SERVICE_DATA\n"
  echo -e "NSCD autostart data:     $NSCD_AUTOSTART_DATA\n"
  echo -e "NSCD file data:\n$NSCD_FILE_DATA\n"
  echo -e "NSCD memory data:\n$NSCD_MEMORY_DATA\n"
else
  echo -e "FAILURE:  The Name Service Cache Daemon (NSCD) configuration is not correct:\n"
  echo -e "NSCD service data:       $NSCD_SERVICE_DATA\n"
  echo -e "SSSD service data:       $SSSD_SERVICE_DATA\n"
  echo -e "NSCD autostart data:     $NSCD_AUTOSTART_DATA\n"
  echo -e "NSCD file data:\n$NSCD_FILE_DATA\n"
  echo -e "NSCD memory data:\n$NSCD_MEMORY_DATA\n"
fi
The expected output should be similar to:
SUCCESS:  The Name Service Cache Daemon (NSCD) configuration is correct:

NSCD service data:       nscd (pid 69150) is running...
SSSD service data:       sssd: unrecognized service
NSCD autostart data:     nscd  0:off   1:off   2:on    3:on    4:on    5:on    6:off

NSCD file data:
passwd: yes
group: yes
hosts: yes
services: yes
netgroup: no

NSCD memory data:
passwd: yes 
group: yes 
hosts: yes 
services: yes 
netgroup: no 

-- OR --
SUCCESS:  The Name Service Cache Daemon (NSCD) configuration is correct:

NSCD service data:       nscd (pid 69150) is running...

SSSD service data:       sssd (pid 91505) is running...

NSCD autostart data:     nscd   0:off   1:off   2:on    3:on    4:on    5:on    6:off

NSCD file data:
passwd: yes
group: no
hosts: no
services: no
netgroup: no

NSCD memory data:
passwd: yes 
group: no 
hosts: no
services: no
netgroup: no 
If the output is not as expected take the following actions as the root userid:

1) If the NSCD is not set for autostart, enable the NSCD to autostart on reboots:
chkconfig --level 35 nscd on
NOTE: The autostart levels vary by Exadata Storage Server Software version; at least levels 3 and 5 should be set.
2) The entries for the /etc/nscd.conf file depend upon whether or not SSSD is in use with NSCD. For NSCD without SSSD, the following entries should be present in the /etc/nscd.conf file:
        enable-cache            passwd          yes
        enable-cache            group           yes
        enable-cache            hosts           yes
        enable-cache            services        yes
        enable-cache            netgroup        no
For NSCD with SSSD, the following entries should be present in the /etc/nscd.conf file:
        enable-cache            passwd          yes
        enable-cache            group           no
        enable-cache            hosts           no
        enable-cache            services        no
        enable-cache            netgroup        no
If the values are not as expected, modify the /etc/nscd.conf file.

NOTE: the /etc/nscd.conf file can be edited with the "vi" editor.
NOTE: these attributes are spread throughout the /etc/nscd.conf file, at the head of other attributes that pertain to each cache. They are not grouped together. For example:
        enable-cache            services        yes
        positive-time-to-live   services        28800
        negative-time-to-live   services        20
        suggested-size          services        211
        check-files             services        yes
        persistent              services        yes
        shared                  services        yes
        max-db-size             services        33554432

3) It is recommended to reboot the database server to ensure that the configuration is correct and is persistent across the reboot process.

4) If a reboot is not immediately possible, as a workaround, the service may be started or restarted manually:
service nscd start
Starting nscd:                                             [  OK  ]
- OR -
service nscd restart
Stopping nscd:                                             [  OK  ]
Starting nscd:                                             [  OK  ]
For additional guidance on NSCD, please see:
Oracle® Grid Infrastructure Installation Guide 11g Release 2 (11.2) for Linux
Oracle® Grid Infrastructure Installation Guide 12c Release 1 (12.1) for Linux

Verify kernels and initrd in /boot/grub/grub.conf are available on the system

Alert Level
DB Version
DB Role
Engineered System
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2
Linux x86-64

Benefit / Impact:
The impact of verifying that the kernel and initrd listed in grub.conf are actually available on the system is minimal. When a kernel or initrd file is unavailable, the user should either remove the corresponding entry from grub.conf (if possible) or install the appropriate files in the correct location (recommended).
If entries in grub.conf refer to kernel and initrd files that are not installed on the system, the next reboot may fail. The system will 'hang' in the bootloader.
Action / Repair:
To verify that the entries in grub.conf match what is installed, use an approach like the following pseudocode:
for each 'title' in /boot/grub/grub.conf 
 get the value for 'kernel' without other arguments; check if the file is found on disk in /boot; raise an alert when not found 
 get the value for 'initrd' without other arguments; check if the file is found on disk in /boot; raise an alert when not found 
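
The pseudocode above can be sketched in shell as follows. This is a minimal illustration, assuming a GRUB legacy configuration in which kernel and initrd paths are relative to a separate /boot partition (for example "kernel /vmlinuz-<version>") and carry no device prefix; check_grub_files is a hypothetical helper name.

```shell
# Sketch: report kernel and initrd files named in a GRUB legacy config
# that are missing on disk. check_grub_files is a hypothetical helper;
# boot_dir defaults to /boot.
check_grub_files() {
  local grub_conf="${1:-/boot/grub/grub.conf}" boot_dir="${2:-/boot}"
  local rc=0 keyword path rest
  [ -r "$grub_conf" ] || { echo "ERROR: cannot read $grub_conf"; return 2; }
  # read splits each line into the keyword, the file path, and the remaining arguments
  while read -r keyword path rest; do
    case "$keyword" in
      kernel|initrd)
        # Strip the leading "/" so the path resolves under boot_dir
        if [ ! -f "$boot_dir/${path#/}" ]; then
          echo "ALERT: $keyword file $boot_dir/${path#/} not found"
          rc=1
        fi ;;
    esac
  done < "$grub_conf"
  [ "$rc" -eq 0 ] && echo "SUCCESS: all kernel and initrd files are present"
  return "$rc"
}
```

Running `check_grub_files` with no arguments checks /boot/grub/grub.conf against /boot. Commented-out entries are ignored because their first word is not exactly "kernel" or "initrd".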

Verify basic Logical Volume(LVM) system devices configuration

Alert Level
Engineered System
Exadata - Physical,
Exadata - Management Domain

DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8
Linux x86-64

Benefit / Impact:
The impact of verifying that the basic Logical Volume(LVM) system devices configuration is correct is minimal. The impact of correcting any abnormalities depends upon the specific abnormality.
If the basic Logical Volume(LVM) system devices configuration is not correct, there may be risk of patching interruption or unexpected downtime.
Action / Repair:
The basic Logical Volume(LVM) system devices configuration varies by Exadata software version level and hardware type. exachk runs the appropriate checks based on Exadata software version levels and hardware type. To validate the basic Logical Volume(LVM) system devices configuration, run exachk and review the provided report.
The expected output in the exachk report should be as follows:
In the "Findings Passed" summary section of the report, the overall result should be "PASS":
PASS OS Check Basic Logical Volume(LVM) system devices configuration meets recommendations. All Database Servers View
In the "View" detail section of the report for each individual database server:
(*) PASS: This is an LV (Logical Volume) enabled system
(*) PASS: LVDbSys1 should reside in Volume Group (VG) VGExaDb.
(*) PASS: LVDbSys2 should reside in Volume Group (VG) VGExaDb.
(*) PASS: Minimum number of LVDbSys LV's
(*) PASS: Maximum number of LVDbSys LV's
(*) PASS: LVDbSys LV minimum size of /dev/mapper/VGExaDb-LVDbSys2
(*) PASS: LVDbSys LV size
(*) PASS: LVDbSys inactive LV minimum size of /dev/mapper/VGExaDb-LVDbSys1
(*) PASS: Inactive LVDbSys LV's not mounted
(*) PASS: Enough free space found for snapshot
(*) PASS: No filesystem label issues for DBSYS
(*) PASS: No reclaimdisk issues found
(*) PASS: No active lvm snapshots found
If the items reported are not all "PASS", investigate the root cause and take appropriate corrective action.

Ensure db_unique_name is unique across the enterprise

Alert Level

DB Version
DB Role
Engineered System
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2
Solaris - 11
Linux x86-64 UEK5.8

Benefit / Impact:
db_unique_name is used extensively in many Clusterware, RDBMS, and Exadata code layers. Uniqueness is enforced within clusters but not across clusters. Ensuring db_unique_name is unique across clusters, especially those that are sharing the same Exadata storage, ensures that all code layers that use it work properly.
Having databases with the same db_unique_name across different Real Application Clusters that share the same Exadata storage causes unexpected behavior such as database isolation, crashes, or failures to start.
Action / Repair:
The following is an example of a sqlplus command that checks whether db_unique_name has been explicitly set:
SQL> select isdefault from v$parameter where name ='db_unique_name';

If the output is "FALSE", then someone has explicitly set db_unique_name and not let it default to the value of db_name.
If the output is "TRUE", then db_unique_name is set to its default value, i.e. the same as db_name.
Oracle recommends that db_unique_name is unique across a customer's Oracle enterprise. exachk running on a given Real Application Cluster cannot check all values across a customer's enterprise. This exachk check assumes that "FALSE" means specific care has been taken to ensure uniqueness across the customer's enterprise and is considered the "PASS" condition. "TRUE" is assumed to imply that enterprise uniqueness may not have been considered and is the "FAIL" condition.
NOTE: the corrective action is to ensure all databases have a unique name across the customer's Oracle enterprise, especially those accessing the same Exadata storage. If every database is confirmed to have a unique name without setting db_unique_name universally, then this exachk check may be disabled or ignored.

Verify average ping times to DNS nameserver

Alert Level

DB Version
DB Role
Engineered System
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, EIGHTH, X3-8, X4-2
Solaris - 11
Linux x86-64 UEK5.8

Benefit / Impact:
Secure Shell (SSH) remote login procedures require communication between the remote target device and the DNS nameserver. Minimal average ping times to the DNS nameserver improve SSH login times and help to avoid problems such as timeouts or failed connection attempts.
The impact of verifying average ping times to the DNS nameserver is minimal. The impact required to minimize average ping times to the DNS nameserver varies by configuration and cannot be estimated here.
Long ping times between remote SSH targets and the active DNS server may cause remote login failures, performance issues, or dropped application connections.
Action / Repair:
To verify average ping times to DNS nameserver, enter the following command set as the "root" userid on each database server, storage server, and InfiniBand switch:
HOST_NAME=$(hostname);   # ASSUMPTION: this definition is not shown in the original text
OS_TYPE=$(uname -s);     # ASSUMPTION: this definition is not shown in the original text
if [ -s /usr/local/bin/version ]
then
 # InfiniBand switch: use the first nameserver in /etc/resolv.conf
 DNS_SERVER=$(grep -o '[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}\.[0-9]\{1,3\}' /etc/resolv.conf | head -1);
else
 DNS_SERVER=$(nslookup $HOST_NAME | head -1 | cut -d: -f2 | sed -e 's/^[ \t]*//');
fi;
if [ "$OS_TYPE" = "Linux" ]
then
 PING_COMM="ping -c10 $DNS_SERVER";
else
 PING_COMM="ping -s $DNS_SERVER 56 5";
fi;
AVG_PING_TIME=$($PING_COMM | egrep avg | cut -d"/" -f5);
TRNC_AVG_PING_TIME=$(echo $AVG_PING_TIME | cut -d"." -f1);
if [ "$TRNC_AVG_PING_TIME" -le "3" ];
then
 echo -e "SUCCESS: Average ping times to DNS nameserver should not be negatively impacting SSH operations: $AVG_PING_TIME";
 echo -e "Active DNS Server IP: $DNS_SERVER\n";
else
 echo -e "WARNING: Average ping times to DNS nameserver MAY be negatively impacting SSH operations: $AVG_PING_TIME";
 echo -e "Active DNS Server IP: $DNS_SERVER\n";
fi;
The output should be similar to the following:
SUCCESS: Average ping times to DNS nameserver should not be negatively impacting SSH operations: 3.255
Active DNS Server IP: 111.222.333.444
If the result is a "WARNING", first repeat the command set several times at different intervals to determine if the results are consistent. The command set is one spot check for ten pings. The environment could normally have a short delay and an execution just happened to catch a period of poor response, or it could normally have a long delay and an execution just happened to catch a period of good response. If the results are consistent, determine the root cause and take appropriate corrective action.
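The repeated sampling described above could be sketched as follows. This is an illustration only, not part of the documented check: the function names, the PING_CMD override, and the default sample count and interval are hypothetical, the 3 ms threshold simply mirrors the command set above, and only the Linux form of the ping command is covered.

```shell
#!/bin/bash
# Sketch only: function names, PING_CMD, and the defaults are illustrative assumptions.

# Read ping output on stdin and print the average round-trip time (ms),
# using the same "avg" summary-line parsing as the command set above.
avg_from_ping_output() {
  grep -E 'avg' | cut -d"/" -f5
}

# classify_avg AVG_MS [THRESHOLD_MS]
# Truncate the average to whole milliseconds and compare it with the threshold.
classify_avg() {
  local avg="$1" threshold="${2:-3}"
  local truncated="${avg%%.*}"
  if [ "${truncated:-0}" -le "$threshold" ]; then
    echo "SUCCESS"
  else
    echo "WARNING"
  fi
}

# sample_dns_ping DNS_SERVER [SAMPLES] [INTERVAL_SECONDS]
# Take several spot checks spaced apart in time and print one line per sample.
sample_dns_ping() {
  local server="$1" samples="${2:-5}" interval="${3:-300}" i=1 avg
  while [ "$i" -le "$samples" ]; do
    # PING_CMD defaults to the Linux form used above (ten pings per sample).
    avg=$(${PING_CMD:-ping -c10} "$server" 2>/dev/null | avg_from_ping_output)
    if [ -n "$avg" ]; then
      echo "$(date '+%F %T') sample $i: $avg ms $(classify_avg "$avg")"
    else
      echo "$(date '+%F %T') sample $i: no ping summary returned"
    fi
    if [ "$i" -lt "$samples" ]; then sleep "$interval"; fi
    i=$((i + 1))
  done
}
```

For example, sample_dns_ping "$DNS_SERVER" 6 600 would take six samples ten minutes apart. If most samples report WARNING, the delay is likely the normal state of the environment rather than a transient condition.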

NOTE: The result of this command set is a reflection of how DNS is implemented in the environment and not evidence in itself of a defect in the Oracle Exadata Database Machine.

NOTE: A "WARNING" result does not prove that a delay is causing SSH connectivity problems in the environment. A "WARNING" result should always be evaluated in conjunction with a review of SSH connectivity issues in the environment. If there are other SSH connectivity issues present, evaluate if reducing or stabilizing the average ping times to the DNS nameserver may correct the issues.

NOTE: As with many other network performance metrics, the average ping times to DNS nameserver should be "minimal". However, it is possible that any given environment may return a result that exceeds the threshold used in this command set, yet it is satisfactory given the overall environment characteristics and lack of other related problems. IF NO OTHER PROBLEMS related to DNS exist other than this command set returning a "WARNING", and the numbers reported are acceptable after a "baseline" for the given environment has been established by repeated sampling, then the documented procedures for bypassing this check in exachk may be implemented.

NOTE: Due to the differences in available commands for the InfiniBand switch, the command set assumes the first "nameserver" in /etc/resolv.conf is the "active" DNS server.

NOTE: The use of the Name Service Cache Daemon (NSCD) may also mitigate the effects of long average ping times to DNS nameserver. For more information see: Verify the Name Service Cache Daemon (NSCD) is Running

Verify Running-config and Startup-config are the same on the Cisco switch

Alert Level
Exadata, SSC, Exalogic
DB Version
DB Role
Engineered System
Exadata Version
OS & Version
Validation Tool Version
X2-2, X2-8, X3-2, X3-8, X4-2
cat4500-IPBASEK9-M, Version 15.0(2)SG8

Benefit / Impact:
To keep the switch running the same configuration after it reboots, it is a best practice to keep the running-config the same as the startup-config.
There is a risk of management network issues after a reboot if the startup-config contains pre-install defaults, or other customizations made by the Customer, that differ from the running-config.
Action / Repair:
Compare the startup-config and the running-config. The simplest way to do this is to capture the output from the switch and run diff on the capture files.

Capture output of an ssh session, use the tee command to create the log file:
unixhost ~ > ssh admin@randomsw-adm0 2>&1 | tee /tmp/running.out
From that new connection, go into enable, set the terminal so it does not pause its output, and show the running configuration (not the "all")
randomsw-adm0> enable
randomsw-adm0# terminal length 0 <- this causes the output to not pause
randomsw-adm0# show running-config all
randomsw-adm0# exit
Now, do the same to check the startup configuration:
unixhost ~ >ssh admin@randomsw-adm0 2>&1 | tee /tmp/start.out
From that new connection, go into enable, set the terminal so it does not pause its output, and show the startup configuration (not the "all")
randomsw-adm0> enable
randomsw-adm0# terminal length 0 <- this causes the output to not pause
randomsw-adm0# show startup-config all
randomsw-adm0# exit
Modify the two files by removing the lines before the version number:
Version 15.0
and the last entry from the show command:
This modification will make the file format more suited for the diff command:
unixhost ~ > diff /tmp/start.out /tmp/running.out > /tmp/diff.out
The two files should have identical parameters. In my first attempt to validate the two configs, I saved running to startup and then output both. The diff was:
< spanning-tree uplinkfast max-update-rate 444318408
> spanning-tree uplinkfast max-update-rate 444318920
It seems that no matter how many times I copy running to startup, this value still differs by a small amount. Your switch may behave the same way, so examine the diff.out file to determine whether each reported difference is significant.
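The file preparation and comparison described above could be sketched as follows. This is a hedged illustration: the function names are hypothetical, and the sketch assumes the captured configuration contains a line beginning with "version" (in any letter case). It only trims the lines before that version line; any trailing session lines still need to be removed as described above.

```shell
#!/bin/bash
# Sketch only: function names are illustrative assumptions.

# trim_capture FILE
# Print FILE starting at the first line that begins with "version" (any case),
# dropping the login banner and command echo captured before it.
# Carriage returns left by the terminal session are stripped.
trim_capture() {
  tr -d '\r' < "$1" | awk 'found || tolower($0) ~ /^version / { found = 1; print }'
}

# compare_captures STARTUP_CAPTURE RUNNING_CAPTURE
# Diff the trimmed captures; the exit status is that of diff (0 means identical).
compare_captures() {
  local a b rc
  a=$(mktemp) || return 2
  b=$(mktemp) || { rm -f "$a"; return 2; }
  trim_capture "$1" > "$a"
  trim_capture "$2" > "$b"
  diff "$a" "$b"
  rc=$?
  rm -f "$a" "$b"
  return $rc
}
```

With the capture files used above, this would be run as compare_captures /tmp/start.out /tmp/running.out > /tmp/diff.out.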

To make running and startup the same, go into the switch and then into the enable mode:
randomsw-adm0> enable
randomsw-adm0# copy running-config startup-config all
Destination filename [startup-config]?
Compressed configuration from 75923 bytes to 22210 bytes[OK]
To protect this setup, you should also copy the new config to a backup on the switch itself and to an external tftp server:
randomsw-adm0# copy running-config bootflash:cisco4948-ip-confg-before
Destination filename [cisco4948-ip-confg-before]?

13815 bytes copied in 1.376 secs (10040 bytes/sec)
Now to the external tftp server:
randomsw-adm0#copy running-config tftp
Address or name of remote host []? random-tftp-1
Destination filename [randomsw-adm0-confg]? cisco4948-ip-confg-before
13815 bytes copied in 1.564 secs (8833 bytes/sec)

Validate SSH is installed and configured on Cisco management switch

Alert Level
Exadata, SSC, Exalogic
DB Version
DB Role
Engineered System
Exadata Version
OS & Version
Validation Tool Version
X2-2, X2-8, X3-2, X3-8, X4-2

Benefit / Impact: Telnet has no security and should be avoided. Early versions of the Cisco Internetwork Operating System (IOS) for the Catalyst 4948 only had telnet available. Note 1415044.1 describes how to get the version of the IOS and how to configure SSH. This check validates that SSH is enabled, shows how to configure it, and shows how to restrict the number of simultaneous sessions into the switch.
By using telnet, one risks a network sniffer obtaining the administrative and enable passwords. Once these passwords are obtained, it is trivial to breach the switch and disable administrative access to Exadata. Depending on how this switch is integrated into a Customer's network, infiltration into the Customer's network becomes a possibility.

The versions which do not contain SSH are those with only the IP Base image, which have IPBASE alone in the image name. For instance, Cat4500-IPBASE-M will not have SSH while Cat4500-IPBASEK9-M will have SSH in it.
Action / Repair:
The following was done on this version of the Cisco IOS which contains SSH (as it is cat4500-IPBASEK9-M).

Cisco IOS Software, Catalyst 4500 L3 Switch Software (cat4500-IPBASEK9-M), Version 15.0(2)SG8, RELEASE SOFTWARE (fc2)
One must first start a session into the switch. Once there, go into "enable" mode. One will notice the prompt change from ">" to "#" to represent the enable session.

Find if SSH is enabled on the switch.

randomsw-adm0#show ip ssh
SSH Enabled - version 2.0
Authentication timeout: 60 secs; Authentication retries: 3
Validate the SSH configuration:

randomsw-adm0#show running-config all | include transport
 no destination transport-method http
 destination transport-method email
 transport preferred none
 transport preferred telnet
 transport input telnet
 transport output telnet
 transport preferred none
 transport input none
 transport output none
In this case SSH is not listed so it is not configured to be used. A configuration that passes looks like this:

randomsw-adm0#show running-config all | include transport
 no destination transport-method http
 destination transport-method email
 transport preferred none
 transport preferred ssh
 transport input ssh
 transport output ssh
 transport preferred none
 transport input none
 transport output none
Validate that the startup configuration is the same as the running.

randomsw-adm0#show startup-config | include transport
 no destination transport-method http
 destination transport-method email
 transport preferred none
 transport preferred ssh
 transport input ssh
 transport output ssh
 transport preferred none
 transport input none
 transport output none
In this case they match. If further validation is needed, one will have to capture the running configuration and the startup configuration and compare them.
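When the running or startup configuration has been captured to a file, the transport check above could be automated along these lines. This is a minimal sketch under stated assumptions: the function names are hypothetical, and the input file is assumed to hold output from "show running-config all" or "show startup-config" captured on a Unix host.

```shell
#!/bin/bash
# Sketch only: function names are illustrative assumptions.

# telnet_transport_lines FILE
# Print every "transport input|output|preferred" line that still allows telnet.
telnet_transport_lines() {
  grep -E '^[[:space:]]*transport (input|output|preferred) .*telnet' "$1"
}

# check_transport FILE
# Report SUCCESS when no telnet transport entries remain, FAILURE otherwise.
check_transport() {
  if telnet_transport_lines "$1" >/dev/null; then
    echo "FAILURE: telnet is still configured as a transport:"
    telnet_transport_lines "$1"
    return 1
  fi
  echo "SUCCESS: no telnet transport entries found"
}
```

Running check_transport against both capture files gives a quick indication of whether either configuration still permits telnet.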
If SSH is not enabled and there still are telnet entries in the above output, then the system needs to be configured for SSH. The first step is to discover how many simultaneous sessions are available.

randomsw-adm0#show line
Tty Typ Tx/Rx A Modem Roty AccO AccI Uses Noise Overruns Int
 0 CTY - - - - - 0 0 0/0 -
 1 VTY - - - - - 66 0 0/0 -
 2 VTY - - - - - 20 0 0/0 -
 3 VTY - - - - - 6 0 0/0 -
 4 VTY - - - - - 0 0 0/0 -
 5 VTY - - - - - 0 0 0/0 -
There can be up to 16 VTY lines in this version of the IOS, so the list you see might be longer. This allows up to 16 simultaneous telnet/SSH sessions into the switch. Normally this is not a good idea, so this document assumes only five sessions are needed and disables the rest. VTY lines are numbered from 0, so the steps below configure vty 0 through vty 4 for SSH and disable vty 5 through vty 15. The CTY line (Tty 0 in the output above) is the serial console port on the back of the switch.

randomsw-adm0#configure terminal
Enter configuration commands, one per line. End with CNTL/Z.

randomsw-adm0(config)#line vty 0 4
randomsw-adm0(config-line)#transport preferred ssh
randomsw-adm0(config-line)#transport input none
randomsw-adm0(config-line)#transport input ssh
randomsw-adm0(config-line)#transport output none
randomsw-adm0(config-line)#transport output ssh
randomsw-adm0(config)#line vty 5 15
randomsw-adm0(config-line)#transport preferred none
randomsw-adm0(config-line)#transport input none
randomsw-adm0(config-line)#transport output none
randomsw-adm0(config-line)#end

randomsw-adm0#show line vty 0 | include transport
 Allowed input transports are ssh.
 Allowed output transports are ssh.
 Preferred transport is ssh.
randomsw-adm0#show line vty 1 | include transport
 Allowed input transports are ssh.
 Allowed output transports are ssh.
 Preferred transport is ssh.
randomsw-adm0#show line vty 2 | include transport
 Allowed input transports are ssh.
 Allowed output transports are ssh.
 Preferred transport is ssh.
randomsw-adm0#show line vty 3 | include transport
 Allowed input transports are ssh.
 Allowed output transports are ssh.
 Preferred transport is ssh.
randomsw-adm0#show line vty 4 | include transport
 Allowed input transports are ssh.
 Allowed output transports are ssh.
 Preferred transport is ssh.
randomsw-adm0#show line vty 5 | include transport
 Allowed input transports are none.
 Allowed output transports are none.
 Preferred transport is none.
The rest of the "show line vty #" commands will show all transport options set to none. Because they are set to none, you will only be able to have up to five SSH sessions. You will also not be able to get a telnet session on any of the vty lines. We will test this in later steps.
We now need to save the running configuration to the startup configuration so that these changes persist across a reboot.

randomsw-adm0#copy running-config startup-config all
Destination filename [startup-config]?
Now that you have exited from the session to the switch, it is time to test that the configuration is really working. First try telnetting to the switch:

user@host ~ >telnet randomsw-adm0
Trying 111.222.333.444...
telnet: connect to address 111.222.333.444: Connection refused
telnet: Unable to connect to remote host: Connection refused
Now try SSH:

user@host ~ >ssh admin@randomsw-adm0
Warning: untrusted X11 forwarding setup failed: xauth key data not generated
Warning: No xauth data; using fake authentication data for X11 forwarding.
To test the simultaneous connection restriction, keep opening SSH sessions (without exiting from them) until you get a Connection refused error. Once you get that error, you have discovered how many simultaneous SSH sessions are possible. Then, while keeping those SSH sessions open, try to telnet into the switch. If you do not get a Connection refused error, the switch is still open to telnet and the configuration above needs to be troubleshot.

Verify Database Memory Allocation is not Greater than Physical Memory Installed on Database node

Alert Level

DB Version
DB Role
Engineered System
Exadata Version
OS & Version
Validation Tool Version

Benefit / Impact:
Database memory allocation should never be greater than the physical memory installed on a database node. Over allocating memory can cause memory swapping which will negatively impact performance.
Database performance can be significantly impacted by over allocating memory.
Action / Repair:
Generate a collection of all of the running databases in the environment. This must be done on a per-node basis as databases may not have instances running on all nodes. In a loop, connect to each database and query gv$parameter and ensure all database instances are using USE_LARGE_PAGES = ONLY.
If any instance does not have USE_LARGE_PAGES = ONLY set, FAIL with a message similar to the following and stop processing:
It is highly recommended that you use hugepages in the Linux environment (link to BP for USE_LARGE_PAGES). We have found at least one instance without USE_LARGE_PAGES = ONLY and thus cannot with absolute accuracy calculate actual memory utilization.
If all instances PASS the previous check, calculate PGA memory allocation in use by each database instance (this includes ASM and MGMTDB instances).
  • When accessing the ASM instance, at this time PGA_AGGREGATE_LIMIT is not used, so in all cases for ASM retrieve the PGA_AGGREGATE_TARGET
    SQL> select value*3 from v$parameter where name='pga_aggregate_target';
  • If the database version is or higher, retrieve the PGA_AGGREGATE_LIMIT and add it to the PGA total. Note that in 12c, PGA_AGGREGATE_LIMIT is derived from PGA_AGGREGATE_TARGET and defaults to the greater of 2 GB or 2 times the setting of PGA_AGGREGATE_TARGET.
    SQL> select value from v$parameter where name='pga_aggregate_limit';
    VALUE
    --------------------------------------------------------------------------------
    3221225472
  • If the database version is earlier than, retrieve the PGA_AGGREGATE_TARGET * 3 and add it to the PGA total. Note that PGA_AGGREGATE_TARGET can actually consume memory up to 3 times the setting for the parameter.
    SQL> select value*3 from v$parameter where name='pga_aggregate_target';
    VALUE*3
    --------------------------------------------------------------------------------
    4831838208
  • Determine the amount of memory being used by HugePages
    $ cat /proc/meminfo|grep Huge
    HugePages_Total: 256000
    HugePages_Free: 234587
    HugePages_Rsvd: 67
    HugePages_Surp: 0
    Hugepagesize: 2048 kB

    Memory being used by HugePages is HugePages_Total * Hugepagesize (Hugepagesize is in kB, so multiply by 1024 for bytes)

    $ bc -q
    256000 * 2048 * 1024
    536870912000
  • Determine the memory available on the node for PGA
    $ cat /proc/meminfo |grep MemTotal |awk '{print $2 * 1024}'
    1083965984768

    Subtract the memory allocated for HugePages (gathered above)

    $ bc -q
    1083965984768 - 536870912000
    547095072768
If the PGA database instance memory total is greater than the memory available on the node for PGA, provide a FAILURE message similar to: "Database PGA allocation of <PGA memory total> is greater than the memory available for PGA <memory available on the node for PGA> on this node. Please change memory allocations by reducing PGA_AGGREGATE_TARGET as appropriate in one or more databases until PGA memory allocation is less than memory available for PGA."

This last item should be scripted so that we can provide it as part of the best practices page for customers to run outside of exachk.
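A minimal sketch of how the final comparison might be scripted is shown below. It is not the exachk implementation: the function names are hypothetical, the PGA total is assumed to have been summed separately from the SQL queries above and is passed in as bytes, and the optional file argument exists only so that the logic can be exercised against a saved copy of /proc/meminfo.

```shell
#!/bin/bash
# Sketch only: function names and the optional FILE argument are illustrative assumptions.

# meminfo_value FIELD [FILE]
# Print the numeric value of one field from /proc/meminfo (or a saved copy).
meminfo_value() {
  awk -v key="$1:" '$1 == key { print $2 }' "${2:-/proc/meminfo}"
}

# pga_available_bytes [FILE]
# MemTotal minus the memory set aside for HugePages, converted from kB to bytes.
pga_available_bytes() {
  local file="${1:-/proc/meminfo}" mem_total_kb huge_total huge_size_kb
  mem_total_kb=$(meminfo_value MemTotal "$file")
  huge_total=$(meminfo_value HugePages_Total "$file")
  huge_size_kb=$(meminfo_value Hugepagesize "$file")
  echo $(( (mem_total_kb - huge_total * huge_size_kb) * 1024 ))
}

# check_pga_total PGA_TOTAL_BYTES [FILE]
# Compare the summed PGA allocation with the memory available for PGA.
check_pga_total() {
  local pga_total="$1" available
  available=$(pga_available_bytes "${2:-/proc/meminfo}")
  if [ "$pga_total" -gt "$available" ]; then
    echo "FAILURE: Database PGA allocation of $pga_total is greater than the memory available for PGA $available on this node."
    return 1
  fi
  echo "SUCCESS: Database PGA allocation of $pga_total is within the memory available for PGA $available on this node."
}
```

With the example values shown above (256000 HugePages of 2048 kB and 1083965984768 bytes of total memory), pga_available_bytes returns 547095072768.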

Verify Cluster Verification Utility(CVU) Output Directory Contents Consume < 500MB of Disk Space

Alert Level
Exadata, SSC

DB Version
DB Role
Engineered System
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5
Solaris - 11
Linux x86-64 UEK5.8

Benefit / Impact:
Beginning with Oracle version, the CVU is configured by default to run and generate an XML output file every 6 hours (360 minutes). These files, and occasionally CVU command text output files, are stored in the output directory. If not monitored, the files in the CVU output directory could eventually exhaust the available disk space. Currently, there is no effective purging of these files, but this is expected to be addressed in a future release of CVU.
The benefit of verifying that the CVU output directory contents consume < 500MB of disk space is that an outage due to depleted disk space is avoided. The impact of the verification is small; the impact of reducing disk space consumption depends upon the chosen remediation strategy.
Not verifying that the CVU output directory contents consume < 500MB of disk space increases the risk of a cluster instance crash or other failures related to a file system running out of space.
Action / Repair:
To verify that the CVU output directory contents consume < 500MB of disk space, as the RDBMS home owner, and with the environment properly set, execute the following command set on each database server:
 # DEFAULT_LOCATION must already be set to the CVU output directory
 if [ -d "$DEFAULT_LOCATION" ]; then
 CVU_SPACE_USED=$(du -sm $DEFAULT_LOCATION | awk '{ print $1}')
 if [ $CVU_SPACE_USED -le "500" ]
 then echo -e "SUCCESS: Automated CVU check output consumes <= 500MB of disk space: "$CVU_SPACE_USED"MB"
 else echo -e "WARNING: Automated CVU check output consumes > 500MB of disk space: "$CVU_SPACE_USED"MB"
 fi
 else
 echo -e "WARNING: There seems to be some issue with $DEFAULT_LOCATION"
 fi
The expected output should be similar to:
SUCCESS: Automated CVU check output consumes <= 500MB of disk space: 224MB
If the output is "WARNING", these are the recommended corrective options:
1) Manually purge the accumulated files from all database servers on a schedule that suits your retention and space usage requirements. Do not just delete all files.

2) Lengthen the interval at which the automated CVU check executes:

As the RDBMS home owner, with the environment properly set, and with CVU enabled and running, execute the following command set on a database server:
[oracle@randomadm03 ~]$ srvctl modify cvu -checkinterval 720
[oracle@randomadm03 ~]$ srvctl config cvu
CVU is configured to run once every 720 minutes
CVU is enabled.
CVU is individually enabled on nodes: 
CVU is individually disabled on nodes: 
NOTE: the "modify" command does not return any output confirmation. Follow up with the "config" command.
NOTE: The interval change takes effect without restarting the CVU.
NOTE: The CVU process only runs on one database server, but the files accumulate on all database servers.
For additional information see: "Oracle® Real Application Clusters Administration and Deployment Guide 12c Release 1 (12.1) E48838-10"
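For corrective option 1 above, a cautious way to approach a manual purge is to list candidate files by age before removing anything. The sketch below is an illustration only: the function name and the 30-day default retention are assumptions, and the directory argument is whatever CVU output directory applies in the environment.

```shell
#!/bin/bash
# Sketch only: the function name and the 30-day default are illustrative assumptions.

# list_old_cvu_files DIR [DAYS]
# Print regular files under DIR that were last modified more than DAYS days ago.
# Nothing is deleted; the list is meant to be reviewed first.
list_old_cvu_files() {
  find "$1" -type f -mtime +"${2:-30}" -print
}
```

Once the list has been reviewed against the retention requirements, the listed files can be removed on each database server, leaving the more recent output in place.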

Verify active system values match those defined in configuration file "cell.conf"

BDA, Exadata, Exalogic, Exalytics, SSC, ZDLRA

DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8
Linux x86-64

Benefit / Impact:
The impact of verifying that active system values match those defined in configuration file "cell.conf" is minimal.
Changing the options defined in configuration file "cell.conf" directly in the active kernel may impact availability.
A run time kernel configuration that does not match the values defined in the configuration file "cell.conf" may result in an outage or unexpected issues during the next boot.
Action / Repair:
Note: Modifications to the Oracle Exadata Storage Server hardware or software are not supported. Only the documented network interfaces on the Oracle Exadata Storage Server should be used for all connectivity including management and storage traffic. Additional network interfaces should not be used.

NOTE: Always follow the recommended procedures to make changes on an Exadata system, and use a reboot to verify that the changes are persistent in order to avoid unexpected issues during a reboot.

NOTE: ipconf validation restarts the cellwall service, which resets the storage server to the default configuration. If manual changes have been made, even though such configuration is not permitted, the manual configuration will be lost when the cellwall service is restarted.

NOTE: The "ipconf" command performs a number of cross-checks. The length of time to execute varies by Exadata version, environment complexity, and system load. Newer versions of Exadata software have longer execution times due to more cross-checking, as do more complex environments. Internal testing has taken up to 60 seconds. Please make sure the command is truly hung before terminating it.
To verify that active system values match those defined in configuration file "cell.conf", as the "root" userid execute the following command set only on each storage server:
IPCONF_RAW_OUTPUT=$(/opt/oracle.cellos/ipconf -verify -semantic -at-runtime -check-consistency -verbose 2>/dev/null);
IPCONF_RESULT=$(echo "$IPCONF_RAW_OUTPUT" | egrep "Consistency check PASSED" | wc -l);
IPCONF_SUMMARY=$(echo "$IPCONF_RAW_OUTPUT" | tail -1);  # last line of the ipconf output (assumed)
if [ $IPCONF_RESULT = "1" ]
then
    echo -e "SUCCESS: $IPCONF_SUMMARY\n"
else
    echo -e "FAILURE: $IPCONF_SUMMARY\n"
    echo -e "`echo -e "$IPCONF_RAW_OUTPUT" | grep FAILED`"
fi
The expected output is:
SUCCESS: [Info]: Consistency check PASSED
If the result is not as expected, the detailed output data will be echoed back after the "FAILURE" message. For example:
FAILURE: Info. Consistency check FAILED

ILOM timezone 00:21:28:A5:1B:BC found in /usr/share/zoneinfo                                      : FAILED
ILOM timezone America/Denver matches 00:21:28:A5:1B:BC from Exadata configuration file            : FAILED
Info. Consistency check FAILED
Review the data and take corrective action based upon the specific configuration items that did not pass.

Verify that CRS_LIMIT_NPROC is greater than 65535 and not "UNLIMITED"

Alert Level
Engineered System
Exadata-User Domain, Exadata-Physical, SSC, Exalogic
DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2
Solaris - 11
Linux x86-64 el5uek
Linux x86-64 el6uek

Benefit / Impact:
Verifying that CRS_LIMIT_NPROC is greater than 65535 and not "UNLIMITED" avoids node eviction and potential cluster crashes due to insufficient resources, and it helps avoid a possible denial of service attack.
The impact of verifying that CRS_LIMIT_NPROC is greater than 65535 and not "UNLIMITED" is minimal. The impact of correcting CRS_LIMIT_NPROC should include a restart of the clusterware to ensure the setting is as expected after a restart.
Without verifying that CRS_LIMIT_NPROC is greater than 65535 and not "UNLIMITED" there is a risk of node eviction and potential cluster crashes due to insufficient resources, and a possible denial of service attack avenue.
Action / Repair:
To verify that CRS_LIMIT_NPROC is greater than 65535 and not "UNLIMITED", execute the following command set as the grid owner userid with the environment properly set on each of the database servers or each user domain of a virtualized environment:
export MIN_VAL=65535;
CONFG_CRS_LIMIT_NPROC=$(grep -w CRS_LIMIT_NPROC $CRS_HOME/crs/install/s_crsconfig_`hostname -s`_env.txt|grep -v ^#|cut -d= -f2);
if [ "`echo $CONFG_CRS_LIMIT_NPROC | tr -s '[:upper:]' '[:lower:]'`" = "unlimited" ]
then
 echo "WARNING: CRS_LIMIT_NPROC should be set to a value greater than or equal to $MIN_VAL, but not \"UNLIMITED\": $CONFG_CRS_LIMIT_NPROC.";
elif [ "$CONFG_CRS_LIMIT_NPROC" -ge "$MIN_VAL" ]
then
 echo "SUCCESS: CRS_LIMIT_NPROC is set to a value greater than or equal to $MIN_VAL, but not \"UNLIMITED\": $CONFG_CRS_LIMIT_NPROC.";
else
 echo "FAILURE: CRS_LIMIT_NPROC is set to a value less than $MIN_VAL: $CONFG_CRS_LIMIT_NPROC.";
fi
The expected output should be:
SUCCESS: CRS_LIMIT_NPROC is set to a value greater than or equal to 65535, but not "UNLIMITED": 65536.
Example of a FAILURE result:
FAILURE: CRS_LIMIT_NPROC is set to a value less than 65535: 16384.
Example of a WARNING result:
WARNING: CRS_LIMIT_NPROC should be set to a value greater than or equal to 65535, but not "UNLIMITED": UnliMITed.
If the result is not "SUCCESS", determine the root cause and correct the cause.

For example, to correct the "FAILURE" example provided, as the owner userid of the grid infrastructure on the database server or user domain that produced the failure, edit the file $CRS_HOME/crs/install/s_crsconfig_`hostname -s`_env.txt with the "vi" editor and set this line:
CRS_LIMIT_NPROC=65535
as a minimum acceptable value. The limit name is typically in upper case. If thorough testing indicates a larger value should be used, the value can be set to any value within the recommended range. After you have closed the file and verified the value, restart the clusterware.

Verify TCP Segmentation Offload (TSO) is set to off

Alert Level
Engineered System
Exadata-Physical, Exadata-User Domain, Exalogic

DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X5-2,
Linux x86-64 el6uek

Benefit / Impact:
The impact of verifying that the TSO option for IB and bonded IB interfaces is set to "off" is minimal. With the chosen implementation (updating a configuration file), a reboot is required to make the setting effective.
If the TSO option is not set to "off", cluster node evictions can occur.
NOTE: Starting TSO function is disabled by the kernel. This does not apply to Exadata releases other than those mentioned.
Action / Repair:
To verify that the TSO option is set to "off" in the run time configuration, execute the following command as the "root" userid on all database servers where the exadata image version is >= and <= on Exadata physical and domU deployments (not dom0)
get_ib_interfaces ()
{
 local -i ret_val=0
 local interface_list=''
 if [ ! -e /opt/oracle.cellos/ORACLE_CELL_NODE ]; then
 ActiveInterfaces=$(/sbin/ip link show up | awk '/[\t ]+bondib/ {print $2}' | sed -e 's/:$//' | grep -v eth | sort)
 ActiveInterfaces1=$(/sbin/ip link show up | awk '/[\t ]+ib/ {print $2}' | sed -e 's/:$//' | grep -v eth | sort)
 ActiveInterfaces2=$(/sbin/ip link show up | awk '/[\t ]+bond/ {print $2}' | sed -e 's/:$//' | grep -v eth | sort)
 for Interface in ${ActiveInterfaces} ${ActiveInterfaces1} ${ActiveInterfaces2}; do
 interface_list="${Interface} ${interface_list}"
 done
 interface_list=`echo $interface_list| xargs -n1 | sort -u | xargs`
 fi
 echo "$interface_list"
 return $ret_val
}
gettso ()
{
 local tso=UNDEFINED
 local -i ret_val=0
 local interfaces=`get_ib_interfaces | tail -1`
 # Report and stop when no candidate interfaces were found
 if [ -z "$interfaces" ]; then
 echo "`date '+%F %T %z'` [INFO] No ib interfaces need this work around."
 return $ret_val
 fi
 for Interface in $interfaces; do
 tso=$(/sbin/ethtool --show-offload $Interface | awk '(/tcp-segmentation-offload:/){print $NF}')
 if [ "$tso" == 'off' ]; then
 echo -e "SUCCESS: ${Interface}: tcp-segmentation-offload: set to off"
 else
 echo -e "FAILURE: ${Interface}: tcp-segmentation-offload: not set to off"
 fi
 done
 return $ret_val
}
gettso
The output should be similar to:
SUCCESS: bondib0: tcp-segmentation-offload: set to off
SUCCESS: ib0: tcp-segmentation-offload: set to off
SUCCESS: ib1: tcp-segmentation-offload: set to off 
- OR -
2015-05-27 10:49:01 -0500 [INFO] No ib interfaces need this work around.
If the output is not as expected, add the option ETHTOOL_OPTS="-K <ibdev> tso off" to the configuration files. Shut down the stack, then as root execute "ifdown <ibdev>" followed by "ifup <ibdev>" (where <ibdev> is ib0, ib1 or bondib0), and then restart the stack. For the majority of two socket database servers, these files are:

  • /etc/sysconfig/network-scripts/ifcfg-bondib0
  • /etc/sysconfig/network-scripts/ifcfg-ib0
  • /etc/sysconfig/network-scripts/ifcfg-ib1

NOTE: For older compute nodes, the file is: /etc/sysconfig/network-scripts/ifcfg-bond0
NOTE: Eight socket database servers may have additional bonded interfaces in use, with additional configuration files.
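To confirm that the persistent setting is present in the configuration files listed above, a check along the following lines could be used. It is a hedged sketch: the function names are hypothetical, and it only inspects the files for an ETHTOOL_OPTS line of the form shown above; it does not query the running interfaces.

```shell
#!/bin/bash
# Sketch only: function names are illustrative assumptions.

# has_tso_off_option FILE
# Succeed if FILE sets ETHTOOL_OPTS with a "-K <device> tso off" request.
has_tso_off_option() {
  grep -Eq '^[[:space:]]*ETHTOOL_OPTS=.*-K[[:space:]]+[^[:space:]]+[[:space:]]+tso[[:space:]]+off' "$1"
}

# report_tso_config FILE...
# Print one result line per configuration file; return 1 if any existing file lacks the option.
report_tso_config() {
  local file rc=0
  for file in "$@"; do
    if [ ! -f "$file" ]; then
      echo "INFO: $file not present"
    elif has_tso_off_option "$file"; then
      echo "SUCCESS: $file requests tso off"
    else
      echo "FAILURE: $file does not request tso off"
      rc=1
    fi
  done
  return $rc
}
```

For example: report_tso_config /etc/sysconfig/network-scripts/ifcfg-bondib0 /etc/sysconfig/network-scripts/ifcfg-ib0 /etc/sysconfig/network-scripts/ifcfg-ib1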

Check alerthistory for stateful alerts not cleared

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System
Critical | FAIL | 06/19/19 | <Name> | Production | Exadata - Physical, Exadata - Management Domain | ALL
27848031 - exachk
26651210 - exachk
21299782 - exachk
DB/GI Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
N/A | N/A | N/A | N/A | ALL | Linux | exachk 19.3.0 | N/A
Benefit / Impact:
There are two types of alerts maintained in the alerthistory of a storage or database server, stateful and stateless.
A stateful alert is usually associated with a transient condition, often hardware related, and it will clear itself after that transient condition is corrected. These alerts age out of the alerthistory after 7 days (default time) once they are set to clear.
The benefit of checking for stateful alerts that have not been cleared is faster problem resolution. The impact of correcting any stateful alert that has not been cleared depends upon each individual alert.
Failure to investigate a stateful alert that has not been cleared may result in significant impact, which varies by the particular alert.
Action / Repair:
To verify there are no stateful alerts that have not been cleared, as the root userid on each storage and database server execute the following commands:
unset SID
unset ACTION
unset COMMAND_NAME
if [ $(egrep -i node.type /opt/oracle.cellos/cell.conf | grep -i db | wc -l) -eq 1 ]
  then NODE_TYPE=db
  else NODE_TYPE=cell
fi
IMAGE_VERSION=$(imageinfo -version |tr -d '.'|cut -c1-6)
if [ $NODE_TYPE = "cell" ]
  then COMMAND_NAME=cellcli
else
  if [ $IMAGE_VERSION -ge 121211 ]
    then COMMAND_NAME=dbmcli
  fi
fi
if [ -n "$COMMAND_NAME" ]
then
  NAME_ARRAY=$($COMMAND_NAME -e "list alerthistory attributes name where alerttype=stateful and endtime=null" | sed -e 's/^[ \t]*//');
  if [ -z "$NAME_ARRAY" ]
  then
    echo -e "SUCCESS: there are no stateful alerts that have not been cleared."
  else
    for INDIVIDUAL_NAME in $NAME_ARRAY
    do
      NAME_RECORD=$($COMMAND_NAME -e "list alerthistory attributes alertsequenceid,severity,alertMessage,alertAction where name=$INDIVIDUAL_NAME" | tr -s "\t")
      SID=$(echo "$NAME_RECORD" | cut -f2 | tr -s " " | sed -e 's/^[[:space:]]*//')
      SEVERITY=$(echo "$NAME_RECORD" | cut -f3 | tr -s " " | sed -e 's/^[[:space:]]*//')
      MESSAGE=$(echo "$NAME_RECORD" | cut -f4 | tr -s " " | sed -e 's/^[[:space:]]*//')
      ACTION=$(echo "$NAME_RECORD" | cut -f5 | tr -s " " | sed -e 's/^[[:space:]]*//')
      OUTPUT_ARRAY+=$(echo -e "\n";echo -e "SID:\t\t$SID";echo -e "NAME:\t\t$INDIVIDUAL_NAME";echo -e "SEVERITY:\t$SEVERITY";echo -e "MESSAGE:\t$MESSAGE";echo -e "ACTION:\t\t$ACTION")
    done
    echo -e -n "FAILURE: there are one or more stateful alerts that have not been cleared. Details:"
    echo -e "${OUTPUT_ARRAY[@]}"
  fi
else
  echo "alerthistory is not available on database servers at image versions below $NODE_TYPE $IMAGE_VERSION"
fi
The output should be similar to:
SUCCESS: there are no stateful alerts that have not been cleared.
- OR -
alerthistory is not available on database servers at image versions below db 112322
Example of a FAILURE result:
FAILURE: there are one or more stateful alerts that have not been cleared. Details:

SID:            1
NAME:           1_2
SEVERITY:       critical
MESSAGE:        A IO subsystem component is suspected of causing a fault with a 100% certainty. Component Name : /SYS/MB/RISER3/PCIE3 Fault class : Fault message :
ACTION:         For additional information, please refer to This alert occurred while the Management Server was not available and is being sent out on restart of the Management Server. Note the event time may reflect the time when the alert was detected by the Management Server, not the time when the fault occurred. Diagnostic package is attached. It is also accessible at /opt/oracle/dbserver/dbms/deploy/log/scam07adm07_2014_08_11T17_40_33_1_2.tar.bz2

SID:            2
NAME:           2_1
SEVERITY:       critical
MESSAGE:        A processor component is suspected of causing a fault with a 100% certainty. Component Name : /SYS/MB/P0 Fault class : Fault message :
ACTION:         For additional information, please refer to This alert occurred while the Management Server was not available and is being sent out on restart of the Management Server. Note the event time may reflect the time when the alert was detected by the Management Server, not the time when the fault occurred.

If the output is not as expected, examine the full details for each alert that has not been cleared and follow the recommendations.

Check alerthistory for non-test open stateless alerts

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System
Critical | FAIL | 06/19/19 | Vern Wagman | Production | Exadata - Physical, Exadata - Management Domain | ALL
27848031 - exachk
26651210 - exachk
21299794 - exachk
DB/GI Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
N/A | N/A | N/A | N/A | ALL | Linux | exachk 19.3.0 | N/A
Benefit / Impact:
There are two types of alerts maintained in the alerthistory of a storage or database server: stateful and stateless.
A stateless alert is not cleared automatically. It will not age out of the alerthistory until the alert is manually investigated and the "examinedby" field is manually set to a non-null value, typically the name of the person who reviewed the stateless alert and corrected or otherwise acted upon the information provided.
The benefit of checking for non-test open stateless alerts is faster problem resolution. The impact of correcting any stateless alert that has not been cleared depends upon each individual alert.
Failure to investigate a stateless non-test alert that has not been cleared may result in significant impact, which varies by the particular alert.
Action / Repair:
To verify there are no non-test open stateless alerts, as the root userid on each storage and database server execute the following commands:
unset COMMAND_NAME NAME_ARRAY OUTPUT_ARRAY
unset SID
unset ACTION
# Determine whether this is a database server (db) or a storage server (cell)
if [ $(egrep -i node.type /opt/oracle.cellos/cell.conf | grep -i db | wc -l) -eq 1 ]
  then NODE_TYPE=db
  else NODE_TYPE=cell
fi
IMAGE_VERSION=$(imageinfo -version |tr -d '.'|cut -c1-6)
# Storage servers use cellcli; database servers use dbmcli when the image version supports it
if [ $NODE_TYPE = "cell" ]
  then COMMAND_NAME=cellcli
else
  if [ $IMAGE_VERSION -ge 121211 ]
    then COMMAND_NAME=dbmcli
  fi
fi
if [ -n "$COMMAND_NAME" ]
then
  NAME_ARRAY=$($COMMAND_NAME -e list alerthistory attributes name where alerttype=stateless and examinedby=\'\' | grep -viw test | sed -e 's/^[ \t]*//' | cut -d" " -f1);
  if [ -z "$NAME_ARRAY" ]
  then
    echo -e "SUCCESS: there are no non-test open stateless alerts."
  else
    for INDIVIDUAL_NAME in $NAME_ARRAY
    do
      NAME_RECORD=$($COMMAND_NAME -e "list alerthistory attributes alertsequenceid,severity,alertMessage,alertAction where name=$INDIVIDUAL_NAME" | tr -s "\t")
      SID=$(echo "$NAME_RECORD" | cut -f2 | tr -s " " | sed -e 's/^[[:space:]]*//')
      SEVERITY=$(echo "$NAME_RECORD" | cut -f3 | tr -s " " | sed -e 's/^[[:space:]]*//')
      MESSAGE=$(echo "$NAME_RECORD" | cut -f4 | tr -s " " | sed -e 's/^[[:space:]]*//')
      ACTION=$(echo "$NAME_RECORD" | cut -f5 | tr -s " " | sed -e 's/^[[:space:]]*//')
      OUTPUT_ARRAY+=$(echo -e "\n";echo -e "SID:\t\t$SID";echo -e "NAME:\t\t$INDIVIDUAL_NAME";echo -e "SEVERITY:\t$SEVERITY";echo -e "MESSAGE:\t$MESSAGE";echo -e "ACTION:\t\t$ACTION")
    done
    echo -e -n "FAILURE: there are one or more non-test open stateless alerts that have not been cleared. Details:"
    echo -e "${OUTPUT_ARRAY[@]}"
  fi
else
  echo "alerthistory is not available on database servers at image versions below $NODE_TYPE $IMAGE_VERSION"
fi
The output should be similar to:
SUCCESS: there are no non-test open stateless alerts.
- OR -
alerthistory is not available on database servers at image versions below db 112322
If the output is not as expected, examine the full details for each name that has not been cleared and follow the recommendations.
Example of a FAILURE result:
FAILURE: there are one or more non-test open stateless alerts that have not been cleared. Details:

SID:            1
NAME:           1
SEVERITY:       critical
MESSAGE:        Critical interrupt detected: . Power cycle forced.
ACTION:         Informational. Diagnostic package is attached. It is also accessible at /opt/oracle/dbserver/dbms/deploy/log/slcc32adm05_2017_10_03T07_14_53_1.tar.bz2
When the underlying issue for a given name is resolved, manually set the "examinedby" field with a command similar to the following (command name is either cellcli or dbmcli, depending upon whether a storage or database server is involved):
CellCLI> alter alerthistory 1 examinedby="jdoe"
Alert 1 successfully altered
Where jdoe is the name of the person who verified the cause of the stateless alert no longer exists, and the number is the name of the stateless alert. Note that double quotes are used around the value to be set, but not the name of the stateless alert.
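The command set above decides between cellcli and dbmcli by reducing the image version to a six-digit key and comparing it numerically against 121211. The following is a minimal sketch of that reduction on its own, so it can be tried without an Exadata server. The helper names image_version_key and version_at_least are hypothetical, and the version strings are illustrative only.

```shell
# Reduce a dotted image version to the six-digit key used by the check.
# image_version_key is a hypothetical helper name, not an Exadata utility.
image_version_key() {
  printf '%s\n' "$1" | tr -d '.' | cut -c1-6
}

# Succeeds when the version key ($1 reduced) is at or above the minimum key ($2).
version_at_least() {
  [ "$(image_version_key "$1")" -ge "$2" ]
}

# Illustrative version strings, not taken from a real system.
image_version_key "12.1.2.1.1.150316"
version_at_least "11.2.3.2.2.140522" 121211 || echo "below 121211"
```

Because the key is simply the first six characters after the dots are removed, this comparison assumes each of the leading version components is a single digit after the first one; a version with a two-digit component in those positions would shift the key.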

Verify clusterware state is "Normal"

Alert Level
Engineered System
Exadata-User Domain,

DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
11.2.+, 12.1.+
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X5-2

Benefit / Impact:

The impact of verifying that the clusterware state is "Normal" is minimal. The impact of returning the clusterware state to normal varies depending upon the clusterware state found, and the root cause that led to the found clusterware state.

    NOTE: The clusterware state, unless an upgrade or patching exercise is in progress, should always be "Normal".


Outside of an active upgrade or patching exercise, having cluster nodes with clusterware states other than "Normal" can lead to problems with disk rebalances, dropping griddisks, and other maintenance operations.

    NOTE: The following operations cannot be performed while the clusterware is in some form of "Rolling" state:

        User invoked disk operations (ex: add, drop, replace, online, offline, undrop, resize, expel)
        Create/Drop Diskgroup
        Voting File Creation/Deletion
        Advancing compatibility
        SP file parameter add/change/remove
        Create/Drop ADVM volume

    NOTE: Outside of an active upgrade or patching exercise, having different cluster nodes report a mix of states, particularly "In Rolling Patch" and "In Rolling Upgrade" is an indication of an incomplete or incorrect upgrade or patching exercise!

Action / Repair:

To verify the clusterware state, execute the following command set as the owner of the clusterware home with the environment properly set to access the ASM instance on each database server:

CLUSTER_STATE=$($ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<EOF
set head off lines 80 feedback off timing off serveroutput on
EOF
)
if [ `echo $CLUSTER_STATE | wc -w` = 1 ]; then
  if [ $CLUSTER_STATE = "Normal" ]; then
      echo -e SUCCESS: the clusterware state is: $CLUSTER_STATE;
  else
      echo -e FAILURE: the clusterware state is: $CLUSTER_STATE;
  fi
else
  echo -e FAILURE: the clusterware state is: $CLUSTER_STATE;
fi

The expected output should be:

SUCCESS: the clusterware state is: Normal

If the output is not as expected, investigate the root cause and correct the condition.
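The decision logic of the command set above (exactly one word returned, and that word is "Normal") can be sketched in isolation as follows. This is only an illustration of the classification step under the assumption that the raw query output is passed in as a single string; the function name classify_cluster_state is hypothetical and the sketch does not query ASM.

```shell
# Classify raw query output as SUCCESS or FAILURE.
# classify_cluster_state is a hypothetical helper name.
classify_cluster_state() {
  # Unquoted expansion is deliberate: it splits on whitespace so that
  # leading blanks and newlines from the query output are discarded.
  set -- $1
  if [ "$#" -eq 1 ] && [ "$1" = "Normal" ]; then
    echo "SUCCESS: the clusterware state is: $1"
  else
    echo "FAILURE: the clusterware state is: $*"
  fi
}

classify_cluster_state "  Normal "
classify_cluster_state "In Rolling Patch"
```

Any multi-word state, such as the "In Rolling Patch" state mentioned earlier, as well as empty output, falls through to the FAILURE branch.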

Verify the grid Infrastructure management database (MGMTDB) does not use hugepages
Alert Level
Engineered System
Exadata - Physical,
Exadata - User Domain,

DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
>= 12.1
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8
Linux x86-64 el5uek
Linux x86-64 el6uek
Benefit / Impact:
MGMTDB can start on any node within the cluster which makes the configuration and allocation of hugepages more difficult. Verifying that MGMTDB doesn't use hugepages helps to avoid instance start failures because not enough huge pages are available.
The impact of verifying MGMTDB does not use hugepages is minimal. Configuring MGMTDB to not use hugepages requires an instance restart.
If MGMTDB is configured to use hugepages and it starts on a database server where MGMTDB's use of hugepages has not been considered, other database instances may fail to start because not enough hugepages are available, or MGMTDB itself may not acquire hugepages when it fails over to a different database server.
Action / Repair:
To verify MGMTDB does not use hugepages, as the root userid on the database server where MGMTDB is running, execute the following command set:
# Main
v_pmon_pid=$(ps -ef | grep pmon | grep '\-MGMTDB' | awk ' { print $2 } ')
# If we have a value continue, else exit - MGMTDB may not be running here.
if [ "${v_pmon_pid}" != '' ]
then
  # Check value we found is a number
  expr ${v_pmon_pid} + 1 > /dev/null 2>&1
  if [ $? -eq 0 ]
  then
    v_hugep_count=$(grep -a -s huge /proc/${v_pmon_pid}/numa_maps 2>/dev/null | grep -a -s dirty | wc -l)
    if [ ${v_hugep_count} -gt 0 ]
    then
      v_logger_msg="MGMTDB should not be running with hugepages"
      echo -e "\nFAILURE: ${v_logger_msg}"
    else
      v_logger_msg="MGMTDB is not running with hugepages"
      echo -e "\nSUCCESS: ${v_logger_msg}"
    fi
  else
    v_logger_msg="Unable to find pmon pid for MGMTDB unable to detect if MGMTDB runs with hugepages or not"
    echo -e "\nFAILURE: ${v_logger_msg}"
  fi
fi
The expected output will be similar to:
SUCCESS: MGMTDB is not running with hugepages
If the output is 'FAILURE', execute the following steps to deconfigure hugepages for MGMTDB as owner of the Grid Infrastructure with Oracle home set to the grid Home and Oracle Sid to -MGMTDB:
[oracle@dbm01 ~]$ sqlplus / as sysdba
SQL> alter system set use_large_pages=FALSE scope=spfile;
[oracle@dbm01 ~]$ srvctl stop mgmtdb -o immediate
[oracle@dbm01 ~]$ srvctl start mgmtdb
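The detection step in the check above counts the lines of the pmon process's numa_maps file that are flagged both "huge" and "dirty". A minimal sketch of that counting step, written as a function that takes the file path as an argument so it can be exercised against a saved copy of the file, might look like this. The function name count_dirty_hugepage_mappings is hypothetical.

```shell
# Count mappings marked both "huge" and "dirty" in a numa_maps-style file.
# count_dirty_hugepage_mappings is a hypothetical helper name.
count_dirty_hugepage_mappings() {
  # $1: path to the file, for example /proc/<pid>/numa_maps
  # grep -c exits non-zero when the count is 0, so force a zero exit status.
  grep -a -s huge "$1" | grep -a -c dirty || true
}
```

A result greater than zero corresponds to the FAILURE branch of the check, and zero corresponds to SUCCESS.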

Verify the "localhost" alias is pingable

Alert Level
Engineered System
Exadata - Physical,
Exadata - User Domain,
Exadata - Management Domain,

DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8
Linux x86-64 el5uek
Linux x86-64 el6uek
Solaris 11
Benefit / Impact:
Many scripts and programs, including patching utilities, rely on the "localhost" alias. Verifying the "localhost" alias is pingable helps avoid operational issues or incorrect patch applications.
The impact of verifying the "localhost" alias is pingable is minimal. Changing the "localhost" alias definition does not require a reboot or network restart.
If the "localhost" alias is not pingable, operational issues or incorrect patch applications may result.
Action / Repair:
To verify the "localhost" alias is pingable, as the "root" userid on each storage server, database server, and InfiniBand switch, execute the following command set (IPv4 or IPv6 compatible):
v_cmd[0]="ping -c1 localhost"
v_netw_ipv6=$(grep ^NETWORKING_IPV6 /etc/sysconfig/network | awk -F "=" ' { print $2 } ')
if [ "${v_netw_ipv6}" == "yes" ]
then
  # ipv6 detected also check for ip6-localhost
  v_cmd[1]="ping6 -c1 ip6-localhost"
fi
# Main
v_index=0
while [ $v_index -lt ${#v_cmd[*]} ]
do
  v_localhostname=$(echo ${v_cmd[$v_index]} | awk ' { print $3 } ')
  ${v_cmd[$v_index]} > /dev/null 2>&1
  if [ $? != 0 ]
  then
    v_logger_msg="${v_localhostname} is not pingable by name"
    echo -e "\nFAILURE: ${v_logger_msg}"
  else
    v_logger_msg="${v_localhostname} is pingable by name"
    echo -e "\nSUCCESS: ${v_logger_msg}"
  fi
  v_index=$((v_index + 1))
done
The expected output should be similar to:
SUCCESS: localhost is pingable by name
- OR -
SUCCESS: ip6-localhost is pingable by name
If the output is 'FAILURE' then manually edit /etc/hosts and test to make sure the "localhost" alias definition is a valid entry.
IPv4 example:
127.0.0.1 localhost.localdomain localhost
IPv6 example:
127.0.0.1 localhost.localdomain localhost
::1 ip6-localhost.localdomain ip6-localhost
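A failed ping does not by itself show whether the alias is missing from /etc/hosts. As a complementary sketch, the following function checks whether a hosts-style file defines a given alias on any non-comment line. The function name hosts_file_has_alias is hypothetical, and the check only inspects the file; it does not replace the ping test.

```shell
# Return success when a hosts-style file defines the given alias.
# hosts_file_has_alias is a hypothetical helper name.
hosts_file_has_alias() {
  # $1: path to the hosts file, $2: alias to look for
  awk -v alias="$2" '
    /^[[:space:]]*#/ { next }             # skip full-line comments
    {
      sub(/#.*/, "")                      # drop trailing comments
      for (i = 2; i <= NF; i++)           # field 1 is the address
        if ($i == alias) found = 1
    }
    END { exit found ? 0 : 1 }
  ' "$1"
}

# Example call against the live file:
#   hosts_file_has_alias /etc/hosts localhost && echo "alias defined"
```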
Verify bundle patch version installed matches bundle patch version registered in database
Alert Level
Exadata, Exalogic, SSC

DB Version
DB Role
Engineered System
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X5-2, X5-8
11.2.x +
Linux, Solaris
Benefit / Impact:
Crosschecking the software bundle patch version installed with the bundle patch registered in the database to make sure they match ensures software correctness and stability. If a bundle patch is being installed in a Data Guard configuration in a standby-first manner where the SQL portion of the bundle patch is not installed inside the database until the primary and all standby software homes have the same version installed, then this crosscheck is expected to fail until both the binary and SQL portion of the bundle patch application is fully installed.
A mismatch between the installed and registered bundle patch versions may result in incomplete bug fixes, software instability, and unexpected behavior.
Action / Repair:
To verify that the bundle patch version installed matches bundle patch version registered in database, as the oracle home owner for the primary database, and with ORACLE_SID and ORACLE_HOME properly set, execute the following command:
opatch_bp=$($ORACLE_HOME/OPatch/opatch lspatches 2>/dev/null|grep -iwv javavm|grep -wi database|head -1|awk -F';' '{print $1}');
database_bp_status=$(echo -e "set heading off feedback off timing off \n select ACTION, STATUS from (select * from dba_registry_sqlpatch where PATCH_ID = $opatch_bp order by action_time desc) where rownum=1;"|$ORACLE_HOME/bin/sqlplus -s " / as sysdba" | sed -e '/^ *$/d');
# Collapse runs of whitespace so the value can be compared as "APPLY SUCCESS"
database_bp_status=$(echo $database_bp_status);
if [ "$database_bp_status" == "APPLY SUCCESS" ];
then
echo "SUCCESS: Bundle patch installed in the database matches the software home and is installed successfully.";
else
echo "FAILURE: Bundle patch installed in the database does not match the software home, or is installed with errors.";
fi
The output should be similar to:
SUCCESS: Bundle patch installed in the database matches the software home and is installed successfully.
If FAILURE is reported, then investigate and correct the discrepancy.
NOTE: For versions less than, please see this archived best practice: Verify bundle patch version installed matches bundle patch version registered in database
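The first step of the check above extracts the patch number from "opatch lspatches" output, which lists one patch per line as a number and a description separated by a semicolon. A minimal sketch of that filter, reading the listing from standard input so it can be tried on saved output, is shown below. The function name first_database_patch_id is hypothetical, and the patch numbers and descriptions in the sample are made up for illustration.

```shell
# Print the patch number of the first non-JavaVM "database" patch line.
# first_database_patch_id is a hypothetical helper name.
first_database_patch_id() {
  grep -iwv javavm | grep -wi database | head -1 | awk -F';' '{print $1}'
}

# Made-up listing in "number;description" form.
printf '%s\n' \
  '11111111;Oracle JavaVM Component Database PSU (illustrative)' \
  '22222222;OCW Patch Set Update (illustrative)' \
  '33333333;Database Bundle Patch (illustrative)' |
  first_database_patch_id
```

The JavaVM line is excluded even though it contains the word "Database", which is why the JavaVM filter runs first.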
Verify database is not in DST upgrade state
Alert Level
Engineered System
Exadata - Physical,
Exadata - User Domain,
Exadata - Management Domain,
DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X5-2, X5-8
11.2.x +
Linux x86-64 el5uek
Linux x86-64 el6uek
Solaris 11
Benefit / Impact:
When the DB timezone is in upgrade mode or inconsistent mode, I/Os issued from DB nodes to cell nodes will not go through smart scan and hence block I/O or passthru will take place instead. This results in cell nodes shipping all blocks rather than blocks of interest (filtered) to the database for qualified scans.
Smart scan will be disabled or do passthru and can cause potential performance issues. If the I/O size is huge it might saturate the RDS traffic and impact the RDA service times along with database performance.
Action / Repair:
To check whether database DST_UPGRADE_STATE is set to anything other than the normal value NONE, as the owner of the oracle home for a given database and with the environment set to access that database, execute the following command set:
DST_UPGRADE_STATE_VALUE=$($ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<EOF
set head off lines 80 feedback off timing off serveroutput on
select upper(property_value) from sys.database_properties where property_name = 'DST_UPGRADE_STATE';
EOF
)
if [ "$(echo $DST_UPGRADE_STATE_VALUE)" = "NONE" ]; then
echo -e "SUCCESS: DB is not in DST upgrade state. \"DST_UPGRADE_STATE\" column value = "$DST_UPGRADE_STATE_VALUE""
else
echo -e "FAILURE: DB is in DST upgrade state. \"DST_UPGRADE_STATE\" column value = "$DST_UPGRADE_STATE_VALUE""
fi
The expected output should be similar to:
SUCCESS: DB is not in DST upgrade state. "DST_UPGRADE_STATE" column value =  NONE
NOTE: Oracle recommends that the database not be in DST upgrade state under normal operations. Refer to MOS Doc ID 1583297.1 for fixing or closing the DST upgrade state. If DST_UPGRADE_STATE is UPGRADE, PREPARE or DATAPUMP, then a prepare or upgrade window, or an on-demand or Data Pump job loading a secondary time zone data file, may be active. A failed or terminated Data Pump job can also leave DST_UPGRADE_STATE set to DATAPUMP(1), which should be fixed. This check can also fail while an active Data Pump job is loading a secondary time zone file.
Verify there are no failed diskgroup rebalance operations
Alert Level
Engineered System
Exadata - Physical, Exadata - User Domain, SSC

DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8
Linux x86-64 el5uek
Linux x86-64 el6uek
Solaris 11
Benefit / Impact:
Verifying there are no failed diskgroup rebalance operations helps to ensure that all diskgroups have the chosen redundancy. The impact of correcting any failed diskgroup rebalance operations depends upon the error responsible for the failure.
A failed diskgroup rebalance operation could leave the diskgroup without the proper redundancy, exposing the diskgroup to a loss of data if another partner disk fails.
Action / Repair:
To verify there are no failed diskgroup rebalance operations, as the owner of the grid home and with the environment set to access one ASM instance, execute the following command set:
REBALANCE_ERROR=$($ORACLE_HOME/bin/sqlplus -s "/ as sysasm" <<EOF
set head off pagesize 0 timing off serveroutput on feedback off
select group_number,error_code from gv\$asm_operation where error_code is not null and upper(state) not in ('DONE','WAIT','RUN');
EOF
)
if [ -z "$(echo $REBALANCE_ERROR | tr -d ' \t\n\r\f')" ]; then
echo -e "\nSUCCESS: There were no failed rebalance operations found.\n"
else
echo -e "\nFAILURE: Failed rebalance operations were found:\n"
echo "$REBALANCE_ERROR"
fi
The output should be similar to:
SUCCESS: There were no failed rebalance operations found.
If the output is not "SUCCESS...", investigate the reported errors and correct appropriately.
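The check above treats the query result as "no failed operations" only when nothing remains after all whitespace is stripped. That test can be sketched separately, assuming the raw query output is passed in as one string. The function name report_rebalance_errors is hypothetical, and the sample error row is illustrative.

```shell
# Report SUCCESS when the query output is empty apart from whitespace.
# report_rebalance_errors is a hypothetical helper name.
report_rebalance_errors() {
  # $1: raw query output (group_number and error_code rows, or nothing)
  if [ -z "$(printf '%s' "$1" | tr -d ' \t\n\r\f')" ]; then
    echo "SUCCESS: There were no failed rebalance operations found."
  else
    echo "FAILURE: Failed rebalance operations were found:"
    printf '%s\n' "$1"
  fi
}

report_rebalance_errors "   "
report_rebalance_errors "1 ORA-15041"
```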
Verify the CRS_HOME is properly locked
Alert Level
Engineered System
Exadata - Physical,
Exadata - User Domain,

DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, EIGHTH, X4-2, X4-8, X5-2, X5-8
Linux x86-64
Benefit / Impact:
The CRS_HOME should be locked properly after patching.
The CRS_HOME not being locked properly may result in permissions being wrongly set as well as files not being instantiated.
Action / Repair:
To verify the CRS_HOME is properly locked, as the "root" userid on each database server execute the following command set:
export CRS_HOME=$(awk -F: '/^\+ASM[0-9].*/{printf "%s\n", $2}' /etc/oratab)
CRS_CHECK=$(stat -c %U $CRS_HOME);
if [[ $CRS_CHECK == "root" ]];
then echo -e "SUCCESS:CRS Home is locked.";
else echo -e "WARN:CRS Home is NOT locked."
fi
The expected output should be:
SUCCESS:CRS Home is locked.
If the output is not "SUCCESS...", open an SR and work with Oracle Support to determine the root cause and proper corrective action.
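The check above depends on finding the Grid Infrastructure home in /etc/oratab, where each entry has the form SID:HOME:FLAG and the ASM entry starts with "+ASM" followed by a digit. The following sketch isolates that lookup so it can be tested against a copy of the file. The function name asm_home_from_oratab is hypothetical, and the paths in the usage comment are illustrative.

```shell
# Print the home directory of the first +ASMn entry in an oratab-style file.
# asm_home_from_oratab is a hypothetical helper name.
asm_home_from_oratab() {
  # $1: path to the oratab file. "[+]" matches a literal plus sign.
  awk -F: '/^[+]ASM[0-9]/ { print $2; exit }' "$1"
}

# Example call:
#   asm_home_from_oratab /etc/oratab
# For a line such as "+ASM1:/u01/app/grid:N" this prints "/u01/app/grid".
```

If the function prints nothing, no "+ASM" entry was found and the ownership test on the home directory cannot be meaningful.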

Verify storage server data (non-system) disks have no partitions
PriorityAlert LevelDateOwnerStatusEngineered SystemBug(s)
CriticalFAIL01/27/2016<Name>ProductionExadata - Physical,
Exadata - Management Domain,
SSC, Exalogic, Exalytics,
DB VersionDB RoleEngineered System PlatformExadata VersionOS & VersionValidation Tool VersionTBD
N/AN/AX2-2(4170), X2-2, X2-8, X3-2, EIGHTH, X3-8, X4-2, X5- x86-64exachk
Benefit / Impact:
Verifying that storage server data (non-system) disks have no partitions helps avoid an outage or data loss.
The impact of verifying that storage server data (non-system) disks have no partitions is minimal. The impact of correcting storage server data (non-system) disks that have partitions varies according to the reason for the partitions and the state of the device, and cannot be estimated here.
During a storage server reboot, for storage server data (non-system) disks that have partitions, the partitions may become not visible to the operating system, and therefore unusable.
Action / Repair:
To verify that storage server data (non-system) disks have no partitions, as the "root" userid, execute the following command set on each storage server:
unset report_command  
SYS_DISKS=`cellcli -x -e "list lun attributes deviceName where isSystemLun = TRUE"`     
SYS_DISK0=`echo $SYS_DISKS|cut -f1 -d' '`     
SYS_DISK1=`echo $SYS_DISKS|cut -f2 -d' '`     
if [ -z "$OSS_SCRIPTS_HOME" ]; then     
   report_command=$(echo "$report_command\nEnvironment variable OSS_SCRIPTS_HOME is not defined")     
if [ ! -f $DISK_DEV ]; then     
   report_command=$(echo "$report_command\nFile $DISK_DEV does not exists")     
    DATA_DISKS=`$DISK_DEV 2 |grep -v $SYS_DISK0 |grep -v $SYS_DISK1`     
    for disk in $DATA_DISKS; do     
       if [ $size -eq 9 ]; then     
       parted -s $disk print 1>&2 >/dev/null     
       if [ $? -eq 0 ]; then     
          failDiskCount=`expr $failDiskCount + 1`     
    if [ $failDiskCount -eq 0 ]; then     
      report_command=$(echo "$report_command\nAll data disks have no partitions")     
      report_command=$(echo "$report_command\nThe following disks have partitions:")     
      report_command=$(echo "$report_command\n ${disks[@]}")     
      report_command=$(echo "$report_command\nAssociated griddisks needs to be removed from diskgroups")     
      report_command=$(echo "$report_command\nRebalance should complete before replacing/reformatting this device.")     
echo -e "$report_command"
The expected output should be:
All data disks have no partitions

If data disks with partitions are discovered, they will be echoed back. If the output is not as expected, investigate for root cause and take appropriate corrective action.
NOTE: For additional information, please see: Exadata: Problems introduced when replacing a physical disk having a foreign partition table (Doc ID 1965314.1).

Verify db_unique_name is used in I/O Resource Management (IORM) interdatabase plans
PriorityAlert LevelDateOwnerStatusEngineered SystemBug(s)
CriticalWARN02/24/2016<Name>ProductionExadata - Physical,
Exadata - User Domain,
SSC, Exalogic, Exalytics,
DB VersionDB RoleEngineered System PlatformExadata VersionOS & VersionValidation Tool VersionTBD
11g, 12cPrimary
Physical Standby
X2-2(4170), X2-2, X2-8, X3-2, EIGHTH, X3-8, X4-2, X5- - 11
Linux x86-64
Benefit / Impact:
Starting with Oracle Exadata Storage Server software version, IORM will no longer support using "db_name" in the inter-database IORM plan directive if the directive does not contain the "role" attribute. Existing customers who may be using "db_name" need to be alerted to this change.
NOTE: even though the effective version level for this change is, this check should be performed on versions prior to that, and the situation resolved, to avoid any issues immediately after the upgrade.
If the inter-database IORM plan is not updated to use "db_unique_name", IORM may not manage that database as defined in the plan since the mapping will not be correct. DB, PDB and CG metrics for that database will also be impacted.
Action / Repair:
To determine if an existing IORM interdatabase plan requires modification, repeat the following process for all databases:
As the "root" userid on one storage server accessed by the target database, check if an interdatabase plan has been configured. If the count is non-zero, an interdatabase plan has been configured.
cellcli -e "list iormplan attributes dbplan detail" | grep "name=" | wc -l
NOTE: If no IORM interdatabase plan is configured, no further checking is required.
As the database home owner userid, execute the following to determine if "db_name" is distinct from "db_unique_name":
$ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<EOF
set head off lines 80 feedback off timing off serveroutput on
select VALUE from v\$parameter where name = 'db_name' and VALUE != (select VALUE from v\$parameter where name = 'db_unique_name');
EOF

NOTE: if no rows are returned, "db_name" is not distinct and no further checking is required.
If an IORM interdatabase plan is configured and the "db_name" is distinct, as the "root" userid on one storage server accessed by the target database, execute the following (correctly substituting the target database db_name value) to query the IORM plan and check if it contains any directive using the "db_name" without the "role" attribute:
cellcli -e "list iormplan attributes dbplan detail" | grep -i "name=<target database db_name value>" | grep -v "role=" | wc -l
If the number of lines returned is non-zero, the interdatabase IORM plan directive needs to be updated to use the target database "db_unique_name" value.
NOTE: Also review "Ensure db_unique_name is unique across the enterprise".
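The final filter in the procedure above counts plan directives that name the database but carry no "role" attribute. A minimal sketch of that filter, reading plan text from standard input so it can be tried on saved output, is shown below. The function name count_directives_without_role is hypothetical, and the sample directives are illustrative rather than copied from a real plan.

```shell
# Count directives for a database name that have no "role=" attribute.
# count_directives_without_role is a hypothetical helper name.
count_directives_without_role() {
  # $1: db_name value; plan text is read from standard input.
  # grep -c exits non-zero when the count is 0, so force a zero exit status.
  grep -i "name=$1" | grep -vc "role=" || true
}

# Illustrative plan text.
printf '%s\n' \
  'dbPlan: name=sales,level=1,allocation=60' \
  '        name=sales,level=1,allocation=20,role=standby' \
  '        name=other,level=2,allocation=100' |
  count_directives_without_role sales
```

As in the original command, the match is a substring match, so a db_name that is a prefix of another database's name would also be counted; review the matching lines when the count is non-zero.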

Verify Datafiles are Placed on Diskgroups consisting of griddisks with cachingPolicy = DEFAULT
Alert LevelDateOwnerStatus
Engineered System
CriticalWARNING08/04/2015      <Name>ProductionExadata, AVM 
DB VersionDB RoleEngineered System PlatformExadata VersionOS & VersionValidation Tool Version
11.2.x+N/AV2, X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-811.2.3.2+Linux x86-64 UEK5.8
Benefit / Impact:
Datafiles should be placed in diskgroups consisting of griddisks with their cachingPolicy set to DEFAULT. The cachingPolicy attribute determines if flashcache is used for blocks stored on the griddisk. When cachingPolicy is set to DEFAULT, then flashcache is used; when cachingPolicy is set to NONE, then flashcache will not be used for any blocks stored on the griddisk. Per Oracle best practices, Exadata is configured with cachingPolicy set to NONE for griddisks in the RECO diskgroup and set to DEFAULT (to use flashcache) for the DATA diskgroup. Oracle does not recommend storing datafiles in the RECO diskgroup or any other diskgroup that has its cachingPolicy set to NONE.
You will not get the benefit of flashcache and may see greater I/O and related waits and/or higher hard disk utilization than is expected.
Action / Repair:
First, determine if you have placed datafiles onto a diskgroup that has its cachingPolicy set to NONE. Do this by creating a small script as follows and set its execute permission; in this example the script is called "" :
# $1 = cell to check
CELL=$1
CELL_CP=$( ssh root@$CELL cellcli -e list griddisk attributes name, cachingpolicy,asmDiskGroupName where cachingpolicy=NONE  | awk '{print $3}'  | sort -u )
if [ -n "$CELL_CP" ]; then
   for i in $( echo -e $CELL_CP )
   do
      file_part=$( echo -e "+$i%" )
      RETVAL1=`sqlplus -silent / as sysdba <<EOF
      set linesize 250 pagesize 10000 feedback off heading off echo off show off verify off
      set serveroutput on
      var vDG varchar2(2000)
      begin
         :vDG := '$file_part';
      end;
      /
      select count(1) from v\\\$datafile where name like :vDG;
EOF
`
      RETVAL="$(echo $RETVAL1 |tr '\n' ' ')"
      if [ "$RETVAL" -gt "0" ]; then
         echo "FAIL : There are $RETVAL datfiles stored on griddisks in the $CELL_CP diskgroup with cachingPolicy=none"
         exit 1
      else
         echo "SUCCESS : There are NO datafiles stored on griddisks with cachingPolicy=none "
      fi
   done
else
   echo "SUCCESS : There are NO griddisks with cachingPolicy=none "
fi
Set your shell environment to the ORACLE_HOME, ORACLE_SID, etc to allow sqlplus to log on and then run the script against a single cell by calling it like this:
$ ./ exacel01
The expected output should be similar to:
SUCCESS : There are NO datafiles stored on griddisks with cachingPolicy=none
If any cell has an output of "FAIL ...", the corrective action is to review which files are on the diskgroup reported by the script and ensure their placement in that diskgroup was intentional. The following query will show the specific datafiles:
select name from v$datafile where name like '<DISKGROUP LISTED IN THE COMMAND OUTPUT>%';
For example, if command returned the following:
FAIL : There are 2  datfiles stored on griddisks in the RECOC1 diskgroup with cachingPolicy=none
the diskgroup to use in the query is +RECOC1, and the query would be:
select name from v$datafile where name like '+RECOC1%';
The script should be executed across all cells and repeated for each database instance you're interested in checking. If you have a list of cells stored in a file such as /home/oracle/cell_group, you can check all of the cells like this:
 for c in $( cat /home/oracle/cell_group ); do
     echo "Now checking cell $c ...";
     ./ $c;
 done
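The script above first reduces the griddisk listing to a unique list of diskgroup names and then turns each name into a LIKE pattern for the v$datafile query. Those two steps can be sketched without a cell or a database as follows, assuming the listing has three whitespace-separated columns (griddisk name, cachingPolicy, diskgroup name). The function names uncached_diskgroups and datafile_like_pattern are hypothetical, and the griddisk names in the sample are illustrative.

```shell
# Print the unique diskgroup names (third column) from a griddisk listing.
# uncached_diskgroups is a hypothetical helper name.
uncached_diskgroups() {
  awk '$3 != "" { print $3 }' | sort -u
}

# Build the LIKE pattern used to match datafiles in a diskgroup.
# datafile_like_pattern is a hypothetical helper name.
datafile_like_pattern() {
  printf '+%s%%\n' "$1"
}

# Illustrative listing: name, cachingPolicy, asmDiskGroupName.
printf '%s\n' \
  'RECOC1_CD_00_exacel01 none RECOC1' \
  'RECOC1_CD_01_exacel01 none RECOC1' \
  'DBFS_DG_CD_02_exacel01 none DBFS_DG' |
  uncached_diskgroups

datafile_like_pattern RECOC1
```

Each diskgroup name printed by the first function would be checked in turn, which is why the script loops over the list before running the query.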

Verify all datafiles are placed on griddisks that are cached on flash disks

PriorityAlert LevelDateOwnerStatusEngineered System
CriticalWARNING02/18/2016<Name>ProductionExadata - Physical,
Exadata - User Domain,
DB VersionDB RoleEngineered System PlatformExadata VersionOS & VersionValidation Tool Version
11.2.x+N/AV2, X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5- x86-64exachk
Benefit / Impact:
Datafiles should be placed in diskgroups consisting of griddisks whose cachedBy attribute is set to a list of flash disks. The cachedBy attribute determines if flashcache is used for blocks stored on the griddisk. When cachedBy is set to a list of flash disks, then flashcache is used; when cachedBy is not set, then flashcache will not be used for any blocks stored on the griddisk. Per Oracle best practices, Exadata is configured with cachedBy set to NULL for griddisks in the RECO diskgroup and set to the list of flash disks (to use flashcache) for the DATA diskgroup. Oracle does not recommend storing datafiles in the RECO diskgroup or any other diskgroup that has one or more of its griddisks with cachedBy unset.
You will not get the benefit of flashcache and may see greater I/O and related waits and/or higher hard disk utilization than is expected.
Action / Repair:
First, determine if you have placed datafiles onto a diskgroup that has cachedBy unset. Do this by creating a small script as follows and set its execute permission; in this example the script is called "" :
# $1 = cell to check
CELL=$1
FLASH_MODE=$( ssh root@$CELL cellcli -e  'list cell attributes flashCacheMode' | grep -i -c writeback )
CELL_CBY=$( ssh root@$CELL cellcli -e 'list griddisk attributes name,cachedby,asmDiskGroupName where cachedby\=\"\" ' | awk '{print $2}'  | sort -u )

if [ -n "$CELL_CBY" ] && [ "$FLASH_MODE" -eq "1" ]; then

   for i in $( echo -e $CELL_CBY )
   do
      echo "Diskgroup ${i} has griddisks with unset CachedBy attributes....checking if any datafiles are present... "

      file_part=$( echo -e "+$i%" )

      RETVAL1=`sqlplus -silent / as sysdba <<EOF
      set linesize 250 pagesize 10000 feedback off heading off echo off show off verify off
      set serveroutput on
      var vDG varchar2(2000)
      begin
         :vDG := '$file_part';
      end;
      /
      select count(1) from v\\\$datafile where name like :vDG;
EOF
`

      RETVAL="$(echo $RETVAL1 |tr '\n' ' ')"

      if [ "$RETVAL" -gt "0" ]; then
         echo "FAIL : There are $RETVAL datafiles stored on griddisks in the ${i} diskgroup that are not cached by flash (have cachedBy attribute unset for at least one griddisk)"
      else
         echo "SUCCESS : There are NO datafiles stored on griddisks with cachedBy unset in the ${i} diskgroup "
      fi
   done

else
   if  [ "$FLASH_MODE" -eq "1" ]; then
       echo "SUCCESS :  There are NO datafiles stored on griddisks with cachedBy unset "
   else
       echo "SUCCESS :  Cell is in WRITETHROUGH flashcache mode - test does not apply."
   fi
fi

Set your shell environment to the ORACLE_HOME, ORACLE_SID, etc to allow sqlplus to log on and then run the script against a single cell by calling it like this:

$ ./ exacel01

The expected output when a cell is in WriteBack flashcache mode should be:

SUCCESS :  There are NO datafiles stored on griddisks with cachedBy unset 

The expected output when a cell is in WriteThrough flashcache mode should be:

SUCCESS :  Cell is in WRITETHROUGH flashcache mode - test does not apply.

If any cell has an output of "FAIL ...", the corrective action is to review which files are on the diskgroup reported by the script and ensure their placement in that diskgroup was intentional. The following query will show the specific datafiles:

select name from v$datafile where name like '<DISKGROUP LISTED IN THE COMMAND OUTPUT>%';

For example, if command returned the following:

FAIL : There are 3  datafiles stored on griddisks in the RECOC1 diskgroup that are not cached by flash (have cachedBy attribute unset for at least one griddisk)

the diskgroup to use in the query is +RECOC1, and the query would be:

select name from v$datafile where name like '+RECOC1%';

The script should be executed across all cells and repeated for each database instance you're interested in checking. If you have a list of cells stored in a file such as /home/oracle/cell_group, you can check all of the cells like this:

 for c in $( cat /home/oracle/cell_group ); do echo "Now checking cell $c ..."; ./ $c; done 
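The one-line loop above prints each cell's result but leaves it to the reader to spot failures. A minimal sketch of a wrapper that also collects the cells reporting FAIL is shown here. The function name check_all_cells and the idea of passing the check command as an argument are illustrative assumptions, not part of the original script.

```shell
#!/bin/bash
# Sketch only: run a per-cell check command for every cell listed in a file
# and report which cells printed a line starting with "FAIL".
# check_all_cells is a hypothetical helper name.
#   $1 = file with one cell hostname per line
#   $2 = command (script path or function) that accepts a cell name
check_all_cells() {
    local cell_file="$1" check_cmd="$2" cell output failed=""
    while read -r cell; do
        [ -z "$cell" ] && continue                  # skip blank lines
        # stdin is redirected so the check cannot consume the cell list
        output=$("$check_cmd" "$cell" 2>&1 < /dev/null)
        echo "=== $cell ==="
        echo "$output"
        if echo "$output" | grep -q "^FAIL"; then
            failed="$failed $cell"
        fi
    done < "$cell_file"
    if [ -n "$failed" ]; then
        echo "Cells reporting FAIL:$failed"
        return 1
    fi
    echo "No cell reported FAIL"
    return 0
}
```

It would be called with the cell list and the path to the check script, for example check_all_cells /home/oracle/cell_group followed by the script path, and returns a non-zero status if any cell failed.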

Validate key sysctl.conf parameters on database servers
Alert Level
Engineered System
Exadata - Physical,
Exadata - Management Domain,
Exadata - User Domain
DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8
Benefit / Impact:
Kernel parameter settings in /etc/sysctl.conf are applied to the kernel automatically at boot time and manually via the sysctl utility at runtime. The semantics of each kernel parameter are known only to the kernel, so the sysctl utility passes all values directly to the kernel with minimal processing and validation. Invalid values can be misinterpreted by the kernel, leading to unexpected results. For certain key parameters, such invalid values can have an immediate and critical impact on the system. Invalid values stored in /etc/sysctl.conf at boot time can prevent the system from booting, making it difficult to identify and correct the problem. Validating the format of some key parameters periodically or after changes to sysctl.conf can prevent unexpected outages due to human error.
Applying improperly formatted or incorrect value settings to kernel parameters can render a system unusable.
Action / Repair:
Key sysctl.conf parameters on database servers vary by Exadata software version level, hardware type, and whether or not virtualization is used. exachk runs the appropriate checks based upon the discovered environment configuration. To validate key sysctl.conf parameters on database servers, run exachk and review the provided report.
The expected output in the exachk report should be as follows:

In the "Findings Passed" summary section of the report, the overall result should be "PASS":
PASS   OS Check   sysctl.conf parameters on database servers are configured as recommended   All Database Servers   View
In the "View" detail section of the report for each individual database server:
Status on randomadm01:
PASS => sysctl.conf parameters on database servers are configured as recommended 
All sysctl.conf formatting checks succeeded
If there are issues discovered, the overall result will be "FAIL" and more information will be listed in the "View" detail section. Investigate the reported issues for root cause and take appropriate corrective action.
NOTE: If after corrective actions are completed, you wish to run just this review manually without a full exachk run, as the "root" userid in the directory in which exachk was installed, execute the following:
./exachk -check 018D274D1212689AE05313C0E50AB893
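exachk performs the actual parameter checks, and the exact rules it applies are not listed here. As an illustration of what a formatting check might look like, the following sketch flags any line in a sysctl.conf-style file that is neither blank, a comment, nor of the form key = value. The function name and the accepted key pattern are assumptions made for this example.

```shell
#!/bin/bash
# Sketch only: flag lines that are neither blank, comments (# or ;),
# nor "key = value" assignments. This is not the exachk implementation.
#   $1 = file to check (defaults to /etc/sysctl.conf)
check_sysctl_format() {
    local file="${1:-/etc/sysctl.conf}" bad
    bad=$(grep -n -v -E '^[[:space:]]*([#;].*)?$|^[[:space:]]*[A-Za-z0-9_./-]+[[:space:]]*=[[:space:]]*[^[:space:]].*$' "$file")
    if [ -n "$bad" ]; then
        echo "FAILURE: improperly formatted lines in $file:"
        echo "$bad"
        return 1
    fi
    echo "SUCCESS: all lines in $file are formatted as key = value"
}
```

Running it against a copy of the file before applying changes gives a quick sanity check. It validates only the line format, not whether the values are appropriate for the system.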

Detect duplicate files in /etc/*init* directories

Priority | Alert Level | Date | Owner | Status | Engineered System | Bug(s)
Critical | Warning | 04/06/16 | <Name> | Production | Exadata - Physical, Exadata - User Domain, Exadata - Management Domain |
DB Version | DB Role | Engineered System Platform | Exadata Version | OS & Version | Validation Tool Version | TBD
n/a | n/a | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8 | All | Linux x86-64 | exachk |
Benefit / Impact:
Administrators sometimes back up the contents of /etc/init before updating a database node.
This can create directories with names such as /etc/init122_old that contain duplicates of startup files that already exist in /etc/init.
Ensuring that no duplicate startup files exist helps prevent boot failures.
The impact of verifying the /etc/*init* contents is minimal. The impact of correcting duplicate contents is zero.
At boot time the operating system traverses all directories in /etc whose names start with "init" to execute startup scripts. Duplicate files can cause startup scripts to be executed multiple times, which causes the boot process to fail.
Action / Repair:
Execute the following command as the "root" userid on all database servers:
v_dupe_cnt=$(find  /etc/*init* -type f -exec basename {} \;  | sort | uniq -c | grep -v  "^[ \t]*1 " | wc -l);
if [ $v_dupe_cnt -gt 0 ]
then
  echo -e "FAILURE:  Duplicate content found in /etc/init* directories";  
else
  echo -e  "SUCCESS:  No duplicate content found in /etc/init* directories"; 
fi
The expected output should be:
SUCCESS:  No duplicate content found in /etc/init* directories 
A "FAILURE" message would be as follows:
FAILURE: Duplicate content found in /etc/init* directories 
If output is a "FAILURE" message, run the following command to identify the duplicate files. Remove (or move) the duplicate files found in the /etc/*init* directories to another location (out of /etc):
find /etc/*init* -type f -exec basename {} \;  | sort | uniq -c | grep -v "^[ \t]*1 " 
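The command above lists only the duplicated file names. To decide which copy to move, it can help to see the full path of every copy. The following sketch prints them. The function name list_duplicate_init_files is an assumption, and the directories are passed as arguments so the function can be tried on a test tree first.

```shell
#!/bin/bash
# Sketch only: print the full path of every file whose basename occurs
# more than once under the given directories.
# Assumes file names without embedded newlines.
list_duplicate_init_files() {
    find "$@" -type f 2>/dev/null |
    awk -F/ '{ count[$NF]++; paths[$NF] = paths[$NF] "\n  " $0 }
             END { for (name in count)
                       if (count[name] > 1)
                           printf "%s (%d copies):%s\n", name, count[name], paths[name] }'
}
```

On a database server it would be called as list_duplicate_init_files /etc/*init*. No output means no duplicates were found.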

Verify Database Server Quorum Disks configuration

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System
Critical | FAIL | 05/29/19 | <Name> | Production | Exadata - Physical, Exadata - User Domain, SSC | ALL
Bug(s): 28496580 - exachk, 27274882 - exachk, 25306232 - exachk, 23065735 - exachk, 27067655 - OEDA
DB/GI Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
and UP | ASM | N/A | N/A | ALL | Linux, Sparc | exachk 19.3.0 | N/A
Benefit / Impact:
The configuration of Quorum Disks for any High Redundancy diskgroup using fewer than five failgroups provides the following benefits:
  • When storing voting disks, it protects the Grid Infrastructure in the event of a double partner storage failure, or when an Exadata storage server is offline for planned maintenance and a partner storage failure follows.
  • Diskgroups that are expanded to use a higher number of failgroups and subsequently shrunk to fewer than five failgroups avoid being dismounted during planned or unplanned maintenance. This is due to changes introduced in bug 26199003.
  • Without this feature, voting files are stored in a normal redundancy diskgroup on Exadata racks with fewer than 5 storage servers, which makes the Grid Infrastructure vulnerable to a cluster outage if multiple voting disks are inaccessible.
  • Diskgroups used in a flex configuration (expanding/shrinking) are otherwise exposed to being dismounted during planned or unplanned maintenance.
Action / Repair:

NOTE: This check will only pass if the following are all true:
1) /opt/oracle.SupportTools/quorumdiskmgr exists on the db nodes
2) The GI BP version is above
3) At least one HIGH redundancy diskgroup exists
4) Quorum disks on DB nodes are implemented when there are less than 5 storage cells in the high redundancy disk group.
5) All HIGH redundancy diskgroups contain quorum disks
6) If the number of cells is greater than or equal to 5, all the voting files are in the cells
NOTE WELL: For a complete picture, please also reference: Verify all voting disks are online
To verify the database server quorum disks configuration, run exachk and review the provided report.
The expected output in the exachk report should be as follows:
The overall result should be "PASS" or "WARNING" or "FAIL":
In the "View" detail section of the report for this check the expected output should be similar to:

Voting File redundancy check Passed

##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   11ccca4125424fb1bfec2180a22e24cb (/dev/exadata_quorum/QD_DATAC1_SCAQAE05ADM01VM01) [DATAC1]
 2. ONLINE   5da7f33dc5f64f64bfb2b756787a6b48 (o/; [DATAC1]
 3. ONLINE   1eefa3ec1ebc4fd3bf8933ca0c587e13 (o/; [DATAC1]
 4. ONLINE   6d65ea6de3eb4fcebf3e7984d62d51b9 (/dev/exadata_quorum/QD_DATAC1_SCAQAE05ADM02VM01) [DATAC1]
 5. ONLINE   de0d94da4fc94f57bf2a12dbc46a3603 (o/; [DATAC1]
Located 5 voting disk(s).
In the "View" detail section of the report for this check a "WARNING" example will be similar to:

A database server quorum disk configuration is not applicable to this system because no high redundancy diskgroups were found.
High redundancy is a MAA best practice.  
For details, see

##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   5da7f33dc5f64f64bfb43434787a6b48 (o/; [RECOC1]
 2. ONLINE   1eefa3ec1dffe4d3bf8933ca0c587e13 (o/; [RECOC1]
 3. ONLINE   de0d94da4fc94f52332dr2dbc46a3603 (o/; [RECOC1]
Located 3 voting disk(s).
In the "View" detail section of the report for this check a "FAILURE" example will be similar to:

A database server quorum disk configuration is applicable to this system.
But an optimal Quorum disk setup is not found as seen below.
An optimal quorum disk setup should include 2 quorum disks along with 5 voting files, with 2 of the voting files placed on the 2 quorum disks and the 3 remaining voting files on 3 different cells.

##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   5da7f33dc5f64f64bfb43434787a6b48 (o/; [RECOC1]
 2. ONLINE   1eefa3ec1dffe4d3bf8933ca0c587e13 (o/; [RECOC1]
 3. ONLINE   de0d94da4fc94f52332dr2dbc46a3603 (o/; [RECOC1]
Located 3 voting disk(s).
If the result is a "FAILURE..." message, follow the steps provided to add database server quorum disks in the "Adding Quorum Disks to Database Servers" section of the "Oracle® Exadata Database Machine Maintenance Guide".
NOTE: If, after corrective actions are completed, you wish to run this one check without a full exachk run, execute the following command as the "root" userid in the directory in which exachk was installed:
./exachk -check 339FE456FBDC3549E0530D98EB0AD21F
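The optimal layout described in the failure text (five voting files, two on database server quorum disks and three on storage cells) can be checked mechanically from voting disk listings like those shown above. The sketch below reads such a listing on standard input. It assumes quorum disk paths start with (/dev/exadata_quorum/ and cell-hosted files start with (o/, as in the sample output. The function name is hypothetical, and this is not the exachk logic.

```shell
#!/bin/bash
# Sketch only: classify ONLINE voting files from a listing on stdin
# and compare the counts with the optimal layout for systems that
# have fewer than 5 storage cells.
summarize_voting_files() {
    awk '
        $2 == "ONLINE" {
            total++
            if ($4 ~ /^\(\/dev\/exadata_quorum\//) quorum++
            else if ($4 ~ /^\(o\//)                 cell++
        }
        END {
            printf "total=%d quorum=%d cell=%d\n", total, quorum, cell
            if (total == 5 && quorum == 2 && cell == 3) print "OPTIMAL"
            else                                         print "NOT OPTIMAL"
        }'
}
```

The listing format matches the output of crsctl query css votedisk, so that command's output can be piped into the function. Header and summary lines are ignored because their second field is not ONLINE.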
Verify Oracle Clusterware files are placed appropriately
Alert Level
Engineered System
Exadata - Physical,
Exadata - User Domain
DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
Any supported version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8
Any supported version
Linux x86-64

Benefit / Impact:
Oracle Clusterware files should always be placed in a high redundancy diskgroup, with the exception of voting files in the following cases:
i) For environments with fewer than 5 storage cells and running any Exadata software release prior to, the voting files need to be placed in a normal redundancy diskgroup.
ii) For environments with fewer than 5 storage cells, running any Exadata software release or above and running any Oracle Grid Infrastructure version prior to, the voting files need to be placed in a normal redundancy diskgroup.
Oracle Clusterware files placed on a normal redundancy diskgroup are exposed to the risk of being lost in the event of diskgroup failures due to a double partner storage failure. Having the Clusterware files on a high redundancy diskgroup mitigates this risk. The voting files are the only Clusterware files that must be stored in a normal redundancy diskgroup under the two conditions mentioned above. However, even if the voting files are lost due to a double partner storage failure under those two conditions, they can be easily recreated, unlike all other Clusterware files, which must be restored from backups.
Action / Repair:
Execute the script provided below as the Grid Infrastructure owner to check if the Clusterware files are placed appropriately.
#!/bin/bash
#
# Purpose: Check the placement of Oracle Clusterware files
#
# Control flow, flag assignments and function names that are not shown in the
# original listing are reconstructed here and should be treated as assumptions.
# OCRRecLink, ASMspRecLinks and ASMpwRecLink are expected to hold
# documentation links; their values are not shown in this listing.

## Function declarations
export GRID_HOME=$(grep ^"+ASM" /etc/oratab|awk -F ":" '{print $2}')
export ORACLE_SID=$(grep ^"+ASM" /etc/oratab|awk -F ":" '{print $1}')

usage()
{
echo "Usage: [-o check|report] [-h]";
}

check_placement()
{
exit_code=0
# Is there at least one HIGH redundancy diskgroup?
HighRedExists=$($GRID_HOME/bin/asmcmd lsdg --suppressheader|awk '{print $2}'|grep -q HIGH && echo "1")
if [ "$HighRedExists"x == "x" ]
then
HighRedExists=0
fi
# OCR location
OCRHighRedundancy=1
OCRdgName=$($GRID_HOME/bin/ocrcheck|grep "Device/File Name"|awk -F":" '{print $2}'|awk -F"+" '{print $2}')
OCRDGRedundancy=$($GRID_HOME/bin/asmcmd lsdg --suppressheader $OCRdgName|awk '{print $2}'|grep -q HIGH && echo "1")
if [ "$OCRDGRedundancy"x == "x" ]
then
OCRHighRedundancy=0
OCRRec="Please relocate the OCR to a high redundancy diskgroup using $GRID_HOME/bin/ocrconfig as described in the link below\n"
fi
# ASM spfile location
ASMspfileHighRed=1
ASMspfiledgName=$($GRID_HOME/bin/asmcmd spget|awk -F"/" '{print $1}'|awk -F"+" '{print $2}')
ASMspfileDGRed=$($GRID_HOME/bin/asmcmd lsdg --suppressheader $ASMspfiledgName|awk '{print $2}'|grep -q HIGH && echo "1")
if [ "$ASMspfileDGRed"x == "x" ]
then
ASMspfileHighRed=0
ASMspRec="Please relocate the ASM spfile to a high redundancy diskgroup using '$GRID_HOME/bin/asmcmd spcopy -u' as described in the link below.\nAfter relocating the spfile, if possible restart the Grid Infrastructure in a rolling manner.\nIf a rolling grid infrastructure restart is not permitted, repeat the steps for relocating the spfile to the high redundancy diskgroup every time an initialization parameter modification to the ASM spfile is required until the Grid Infrastructure is restarted in a rolling manner.\n"
fi
# ASM password file location
ASMpwfileHighRed=1
ASMpwfiledgName=$($GRID_HOME/bin/srvctl config asm|grep "Password"|awk -F":" '{print $2}'|awk -F"/" '{print $1}'|awk -F"+" '{print $2}')
ASMpwfileDGRed=$($GRID_HOME/bin/asmcmd lsdg --suppressheader $ASMpwfiledgName|awk '{print $2}'|grep -q HIGH && echo "1")
if [ "$ASMpwfileDGRed"x == "x" ]
then
ASMpwfileHighRed=0
ASMpwRec="Please relocate the ASM passwordfile to a high redundancy diskgroup using '$GRID_HOME/bin/asmcmd pwmove' as described in the link below.\n"
fi
Dgsfound=$($GRID_HOME/bin/asmcmd lsdg |awk -F"/" '{print $1}'|awk '{print $13,$2}')
# The same detail lines are reported whether the check passes or fails
repCmdOutput0="The Diskgroups found are \n============================\n $Dgsfound\n"
repCmdOutput1="$(echo "OCR is stored in :" $OCRdgName)\n"
repCmdOutput2="$(echo "ASM spfile is stored in :" $ASMspfiledgName)\n"
repCmdOutput3="$(echo "ASM password file is stored in :" $ASMpwfiledgName)\n"
if [[ $HighRedExists -eq 1 ]]
then
if [[ $OCRHighRedundancy -eq 0 ]] || [[ $ASMspfileHighRed -eq 0 ]] || [[ $ASMpwfileHighRed -eq 0 ]]
then
exit_code=1
repText="\nClusterware files placement check failed. \nThe clusterware files are not all placed in a high redundancy diskgroup.\n"
else
repText="\nClusterware files placement check passed\n"
fi
else
# No high redundancy diskgroup exists
repText="\nClusterware files placement check passed\n"
fi
}

print_report()
{
echo -e $repText
echo -e "$repCmdOutput0"
echo -e "$repCmdOutput1"
echo -e "$repCmdOutput2"
echo -e "$repCmdOutput3"
if [ $exit_code -ne 0 ]
then
[ -z "$OCRRec" ] || echo -e "$OCRRec\n$OCRRecLink"
[ -z "$ASMspRec" ] || echo -e "$ASMspRec\n$ASMspRecLinks"
[ -z "$ASMpwRec" ] || echo -e "$ASMpwRec\n$ASMpwRecLink"
fi
}

## Main
NumArgs=$#
if [ $NumArgs -lt 1 ]
then
echo "Invalid or missing command line arguments..."
usage
exit 1
fi
while getopts "o:h" opt;
do
case "${opt}" in
h) usage;
exit 0
;;
o) swch=${OPTARG}
;;
*) echo "Invalid or missing command line arguments..."
exit 1
;;
esac
done
check_placement
if [ "$swch" == "check" ]
then
echo $exit_code
elif [ "$swch" == "report" ]
then
print_report
else
echo "Invalid or missing command line arguments..."
exit 1
fi
The expected output is:
SUCCESS: Clusterware files placement check passed
- OR -
WARNING: Clusterware files placement check failed. The clusterware files are not all placed in a high redundancy diskgroup.
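The script decides whether a file is well placed by looking up the redundancy type of its diskgroup in asmcmd lsdg output. That lookup can be sketched on its own as shown below. The function name is an assumption. It takes the diskgroup name from the last column rather than a fixed column number, so it does not depend on how many columns a given release prints.

```shell
#!/bin/bash
# Sketch only: read "asmcmd lsdg --suppressheader" output on stdin and
# print the redundancy type (second column) of the named diskgroup.
# The diskgroup name is the last column, with its trailing "/" removed.
# Returns non-zero if the diskgroup is not listed.
dg_redundancy() {
    awk -v dg="$1" '{ name = $NF; sub(/\/$/, "", name) }
                    name == dg { print $2; found = 1 }
                    END { exit found ? 0 : 1 }'
}
```

For example, piping $GRID_HOME/bin/asmcmd lsdg --suppressheader into dg_redundancy with the OCR diskgroup name as its argument prints HIGH, NORMAL, or EXTERN for that diskgroup.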

Verify "_reconnect_to_cell_attempts=9" on database servers which access X6 storage servers
Alert Level
Engineered System
Exadata - User Domain,
Exadata - Physical,
<23713702>- exachk
<23713702>- exachk
DB Version
DB Role
Engineered System
Exadata Version
OS & Version
Validation Tool Version
X2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8
Linux x86-64,
Benefit / Impact:
For optimal high availability, the cellinit.ora parameter file on database servers which access X6 storage servers must contain "_reconnect_to_cell_attempts=9".
The impact of verifying this setting is minimal. The impact of adding the parameter to the cellinit.ora file on the database servers is minimal, but after the parameter is added on the database side, the cell server process (CELLSRV) on each X6 storage server must be restarted to activate the change.
If the cellinit.ora parameter file on database servers which access X6 storage servers does not contain "_reconnect_to_cell_attempts=9", brownout duration may be lengthened.
Action / Repair:
EXAchk runs the appropriate validation based upon the discovered environment configuration. Run EXAchk and review the provided report.
The expected output in the EXAchk report should be as follows:
In the "Findings Passed" summary section of the report, the overall result should be "PASS":
PASS OS Check _reconnect_to_cell_attempts parameter in cellinit.ora is set to recommended value All Database Servers View
In the "View" detail section of the report for each individual database server:
Status on randomadm01:
PASS => _reconnect_to_cell_attempts parameter in cellinit.ora is set to recommended value
If the parameter is not set as expected, the overall result will be "FAIL" and more information will be listed in the "View" detail section.
To correct a "FAIL" result, do:
1) As the "root" userid on each database server that requires correction, edit the cellinit.ora file with vi and add "_reconnect_to_cell_attempts=9".
2) As the "root" userid on each storage server that communicates with the database servers in 1), restart the cell server process.
NOTE: If after corrective actions are completed, you wish to run just this verification without a full EXAchk run, as the "root" userid in the directory in which EXAchk was installed, execute the following:
./exachk -check 39E9CC7370B42BF6E0530E98EB0AC7A5
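For a quick manual check between EXAchk runs, the parameter can be looked for directly in the file. The sketch below takes the path to cellinit.ora as an argument because the file location is not stated in this section. The function name is an assumption, and this is not the EXAchk check itself.

```shell
#!/bin/bash
# Sketch only: report whether _reconnect_to_cell_attempts=9 is present
# (ignoring surrounding whitespace) in the given cellinit.ora file.
#   $1 = path to cellinit.ora
check_reconnect_attempts() {
    local file="$1"
    if grep -q -E '^[[:space:]]*_reconnect_to_cell_attempts[[:space:]]*=[[:space:]]*9[[:space:]]*$' "$file"; then
        echo "PASS: _reconnect_to_cell_attempts=9 found in $file"
    else
        echo "FAIL: _reconnect_to_cell_attempts=9 not found in $file"
        return 1
    fi
}
```

The pattern requires the value to be exactly 9, so a line such as _reconnect_to_cell_attempts=90 is reported as a failure.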

Verify passwordless SSH connectivity for Enterprise Manager (EM) agent owner userid to target component userids
Alert Level
Engineered System
Exadata - Physical,
Exadata - Management Domain,
Exadata - User Domain
DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8
Linux x86-64
Benefit / Impact:
EM agent monitoring requires passwordless SSH connectivity between the userid running the EM agent on each database server where an EM agent is running and specific userids for each target component that particular EM agent is monitoring. Component replacement or other maintenance work may destroy the passwordless SSH configuration and cause monitoring to fail.
Users would not be notified if there are issues on the EM target components.
Action / Repair:
To verify that the necessary passwordless SSH exists, do the following on each database server where an EM agent is running:
1. Determine which database servers have EM agents installed using the EM console.
2. For each EM agent, determine the components for which it is responsible to monitor in the agent home page of the EM console.
3. Log in to each database server where an EM agent is running as the operating system userid that launched the EM agent and execute the following for each monitored component determined in 1) and 2):

For a database server EM target:
ssh -o 'PreferredAuthentications=publickey' <AGENT OS USERID>@<Database_Server_Name> "echo Success"
For a storage server EM target:
ssh -o 'PreferredAuthentications=publickey' cellmonitor@<Storage_Server_Name> "echo Success"
For an InfiniBand switch EM target:
ssh -o 'PreferredAuthentications=publickey' nm2user@<IB_Switch_Name> "echo Success"
For a Cisco switch EM target:
ssh -o 'PreferredAuthentications=publickey' admin@<Cisco_Switch_Name> "echo Success"
For each component, the expected output should be:
Success
If "Permission denied (publickey,gssapi-with-mic,password)" is returned, then the SSH configuration is not correct.

For a database server:
4. To correct the "Permission denied..." case:
a. Check to see if /home/oracle/.ssh/id_dsa and files exist on the affected agent host. If either file does not exist follow the steps in: Enterprise Manager Oracle Exadata Database Machine Getting Started Guide, Chapter 8: Troubleshooting the Exadata Plug-in, Section: Establish SSH Connectivity
b. If so, append the contents of /home/oracle/.ssh/ on the compute node host to /home/oracle/.ssh on the affected database server(s).
c. Ensure the permission on /home/oracle/.ssh/authorized_keys is set to 600 and owned by the oracle user

For a storage server:
4. To correct the "Permission denied..." case:
a. Check to see if /home/oracle/.ssh/id_dsa and files exist on the affected agent host. If either file does not exist follow the steps in: Enterprise Manager Oracle Exadata Database Machine Getting Started Guide, Chapter 8: Troubleshooting the Exadata Plug-in, Section: Establish SSH Connectivity
b. If so append the contents of /home/oracle/.ssh/ on the agent host to /home/cellmonitor/.ssh on the affected storage server(s).
c. Ensure the permission on /home/cellmonitor/.ssh/authorized_keys is set to 600 and owned by the cellmonitor user

For an InfiniBand switch:
4. To correct the "Permission denied..." case:
a. Check to see if /home/oracle/.ssh/id_dsa and files exist on the affected agent host. If either file does not exist follow the steps in: Enterprise Manager Oracle Exadata Database Machine Getting Started Guide, Chapter 8: Troubleshooting the Exadata Plug-in, Section: Establish SSH Connectivity
b. If so, append the contents of /home/oracle/.ssh/ on the agent host to /home/nm2user/.ssh/authorized_keys on the affected IB switch(es).
c. Ensure the permission on /home/nm2user/.ssh/authorized_keys is set to 600 and owned by the nm2user user

For the Cisco switch:
4. To correct the "Permission denied..." case:
a. Check to see if /home/oracle/.ssh/id_dsa and files exist on the affected agent host. If either file does not exist follow the steps in: Enterprise Manager Oracle Exadata Database Machine Getting Started Guide, Chapter 8: Troubleshooting the Exadata Plug-in, Section: Establish SSH Connectivity

Log in to the switch as admin and issue the commands below to add keys

Switch hostname>enable
Switch hostname#configure terminal
Switch hostname(config)#ip ssh pubkey-chain
Switch hostname(conf-ssh-pubkey)#username admin
Switch hostname(conf-ssh-pubkey-user)#key-string
Switch hostname(conf-ssh-pubkey-data)#< Enter your keyfile contents here >
Switch hostname(conf-ssh-pubkey-data)#< Enter your keyfile contents here >

** The key may need to be entered on multiple lines as the maximum line length is 254 characters.

Now exit the switch
Switch hostname(conf-ssh-pubkey-data)#exit
Switch hostname(conf-ssh-pubkey-user)#exit
Switch hostname(conf-ssh-pubkey)#exit
Switch hostname(config)#exit
Switch hostname#exit
5. Repeat step 3 and verify connectivity
If some message other than "Success" or "Permission denied...." is returned, investigate for root cause based on the message keywords and take corrective action.
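The per-target commands in step 3 differ only in the remote userid. A minimal sketch that wraps them is shown below. The function names, the target type keywords, and the use of BatchMode (so that a broken key setup fails instead of prompting for a password) are choices made for this example rather than part of the documented procedure.

```shell
#!/bin/bash
# Sketch only. em_target_user maps a target type to the userid used in step 3.
#   $1 = target type: db | cell | ibswitch | cisco (hypothetical keywords)
#   $2 = agent OS userid (only used for db targets)
em_target_user() {
    case "$1" in
        db)       echo "$2" ;;
        cell)     echo "cellmonitor" ;;
        ibswitch) echo "nm2user" ;;
        cisco)    echo "admin" ;;
        *)        echo "unknown target type: $1" >&2; return 1 ;;
    esac
}

# em_ssh_check TYPE HOST [AGENT_USER]: runs the connectivity test for one target.
# AGENT_USER defaults to the current user.
em_ssh_check() {
    local user
    user=$(em_target_user "$1" "${3:-$(id -un)}") || return 1
    ssh -o 'PreferredAuthentications=publickey' -o 'BatchMode=yes' \
        "$user@$2" "echo Success"
}
```

Run as the agent owner, em_ssh_check cell followed by a storage server name should print Success when the key exchange is in place.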
Check /EXAVMIMAGES on dom0s for possible over allocation by sparse files
Alert Level
Engineered System
Engineered System
Exadata - Management Domain
Bug 25688952 - Exachk
Bug 25520385 - Exachk
DB Version
DB Type
DB Role
DB Mode
Exadata Version
OS & Version
Validation Tool Version
MAA Scorecard Section
To use dom0 disk space efficiently, two space saving techniques are used for disk image files in /EXAVMIMAGES: sparse files and reflinks. Sparse files do not allocate blocks on disk for empty space. OCFS2 reflinks allow disk image copies to share blocks on disk until one of the copies changes, at which time a new block on disk is allocated. The result of these space saving features is that the amount of disk space consumed is less than the apparent size of the user domain disk image files reported by the "du -sS --apparent-size " command. However, as a user domain is used and files are changed, created, and removed, the disk space consumed from the /EXAVMIMAGES file system will continually grow while the actual space used by disk image files could remain the same. This check warns when the total apparent size of all files in /EXAVMIMAGES exceeds the size of the file system.
Impact: The impact of this check is minimal
A failure does not occur when the apparent size exceeds the size of the /EXAVMIMAGES file system. This may be normal in many environments that benefit heavily from sparse files and reflinks. However, over time as changes are made to user domain disks (e.g. by applying Exadata, Grid Infrastructure, or Database patches), allocated space in the /EXAVMIMAGES file system increases. If the allocated space reaches the /EXAVMIMAGES file system size in dom0, then an out of space error will occur within the user domain, even though df output within the user domain shows there is available space. This can cause unpredictable behavior, such as unbootable user domains, or corrupted files that were being changed at the time the out of space error occurred.
Action/Repair: Execute the script as root on a dom0.
To validate /EXAVMIMAGES on dom0s for possible over allocation by sparse files, run exachk and review the provided report.
The expected output in the exachk report should be as follows:

In the "Findings Passed" summary section of the report, the overall result should be "PASS":
PASS   OS Check   /EXAVMIMAGES on dom0s has enough free space   All Database Servers   View
In the "View" detail section of the report for each individual database server:
Status on randomadm01:
PASS => /EXAVMIMAGES on dom0s has enough free space


/EXAVMIMAGES space has not been over allocated and the space usage is under the threshold.
If there are issues discovered, the overall result will be "FAIL" and more information will be listed in the "View" detail section. Investigate the reported issues for root cause and take appropriate corrective action.
NOTE: If after corrective actions are completed, you wish to run this one check without a full exachk run execute the following command as the "root" userid in the directory in which exachk was installed:
./exachk -check 3F15EA417EBB5C15E0530A98EB0A8124
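The comparison this check makes can be sketched as follows: sum the apparent size of everything under the directory and compare it with the total size of the file system that holds it. The function name and output wording are assumptions, and exachk may apply additional thresholds that are not described here.

```shell
#!/bin/bash
# Sketch only: compare total apparent file size with file system size.
# Relies on GNU du and df options.
#   $1 = directory to check (defaults to /EXAVMIMAGES)
check_apparent_size() {
    local dir="${1:-/EXAVMIMAGES}" apparent fs_size
    apparent=$(du -s --apparent-size --block-size=1 "$dir" | awk '{print $1}')
    fs_size=$(df -P --block-size=1 "$dir" | awk 'NR==2 {print $2}')
    echo "Apparent size: $apparent bytes"
    echo "File system size: $fs_size bytes"
    if [ "$apparent" -gt "$fs_size" ]; then
        echo "WARNING: apparent size exceeds the size of the file system"
        return 1
    fi
    echo "PASS: apparent size does not exceed the size of the file system"
}
```

A WARNING result here is a prompt to look at actual allocation with df, since the apparent size alone does not mean the file system is full.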

  Verify active kernel version matches expected version for installed Exadata Image

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System
Critical | FAIL | 11/28/18 | <Name> | Production | Exadata - Physical, Exadata - Management Domain, Exadata - User Domain | ALL
Bug(s): 28826182 - exachk, 26337714 - exachk
DB/GI Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
N/A | N/A | N/A | N/A | 12. or higher | Linux | exachk 18.1.4 | N/A
Benefit / Impact:
Beginning with Exadata version, the "imageinfo" command includes data on the active kernel version and the expected kernel version for the installed version of the Exadata image. The active and expected kernel versions should match.
Having an active kernel version that does not match the expected version could adversely impact upgrade operations.
Action / Repair:
To verify active kernel version matches expected version for the installed Exadata image, as the "root" userid on each database server, execute the following command set:
RAW_DATA=$(imageinfo | egrep "Kernel|kernel")
ACTIVE_KERNEL_VERSION=$(echo "$RAW_DATA" | egrep "Kernel" | cut -d":" -f2 | cut -d"#" -f1 | tr -d '[[:space:]]')
EXPECTED_KERNEL_VERSION=$(echo "$RAW_DATA" | egrep "kernel" | cut -d":" -f2 | tr -d '[[:space:]]')
# Byte offset of ".el" within each version string
AKV_OFFSET=$(echo "$ACTIVE_KERNEL_VERSION" | egrep -b -o "\.el" | cut -d":" -f1)
EKV_OFFSET=$(echo "$EXPECTED_KERNEL_VERSION" | egrep -b -o "\.el" | cut -d":" -f1)
# Assumed reconstruction: keep the part of each version that precedes ".el"
ACTIVE_KERNEL_VERSION_SHORT=${ACTIVE_KERNEL_VERSION:0:$AKV_OFFSET}
EXPECTED_KERNEL_VERSION_SHORT=${EXPECTED_KERNEL_VERSION:0:$EKV_OFFSET}
if [ "$ACTIVE_KERNEL_VERSION_SHORT" == "$EXPECTED_KERNEL_VERSION_SHORT" ]
then
     echo -e "SUCCESS: The kernel versions match:\n"
     echo -e "Active kernel version:\t\t$ACTIVE_KERNEL_VERSION_SHORT"
     echo -e "Expected kernel version:\t$EXPECTED_KERNEL_VERSION_SHORT"
else
     echo -e "FAILURE: The kernel versions should match:\n"
     echo -e "Active kernel version:\t\t$ACTIVE_KERNEL_VERSION_SHORT"
     echo -e "Expected kernel version:\t$EXPECTED_KERNEL_VERSION_SHORT" 
fi
The expected output should be similar to:
SUCCESS: The kernel versions match:

Active kernel version:          2.6.39-400.284.1
Expected kernel version:        2.6.39-400.284.1
Example of a "FAILURE" message:
FAILURE: The kernel versions should match:

Active kernel version:          2.6.39-400.284.1
Expected kernel version:        2.6.39-500.284.1
If a "FAILURE: ..." message appears, corrective actions will depend upon the kernel versions and the reasons for which the mismatch was introduced. Please open an SR for diagnostic and corrective assistance.
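The comparison in the command set works on the portion of each kernel version that precedes ".el". A compact way to express that trimming with bash parameter expansion is sketched below. The function names are illustrative, and the version strings in the usage note are made up for the example.

```shell
#!/bin/bash
# Sketch only: strip everything from the first ".el" onward.
short_kernel_version() {
    echo "${1%%.el*}"
}

# Succeeds when two full kernel version strings have the same short form.
kernel_versions_match() {
    [ "$(short_kernel_version "$1")" = "$(short_kernel_version "$2")" ]
}
```

For instance, short_kernel_version 2.6.39-400.284.1.el6uek.x86_64 prints 2.6.39-400.284.1, and kernel_versions_match returns success for that string compared with 2.6.39-400.284.1.el6uek.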

Verify Storage Server user "CELLDIAG" exists

Priority | Alert Level | Date | Owner | Status | Engineered System | Bug(s)
Critical | FAIL | 10/26/16 | <Name> | Production | Exadata - Physical, Exadata - Management Domain | 25520477 - exachk, 24958292 - exachk, Reference: 23039723
DB Version | DB Role | Engineered System Platform | Exadata Version | OS & Version | Validation Tool Version | TBD
N/A | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8 | 12. | x86-64 | exachk TBD |
Benefit / Impact:
Beginning with Exadata Storage Server Software version, the storage server user "CELLDIAG" is created during deployment which allows access to diagnostics without using a more privileged user. The benefit of creating and using the "CELLDIAG" user is improved security. The impact of verifying that the "CELLDIAG" user is created is minimal, as is the impact of creating the user if it does not exist.
Not creating and using the storage server user "CELLDIAG" fails to utilize a security improvement.
Action / Repair:
To verify the storage server user "CELLDIAG" exists, as the "root" userid on each storage server, execute the following command set:
USER=`cellcli -e "list user where name = 'CELLDIAG'"`
RET=$?
if [ $RET -eq 0 -a -n "$USER" ]; 
then
  echo "SUCCESS: CELLDIAG user exists"
else
   echo "FAILURE: CELLDIAG user does not exist"
fi
The expected output should be similar to:
SUCCESS: CELLDIAG user exists
Example of a "FAILURE" message (there is no output from the command--the absence of the CELLDIAG output is the failure condition):
FAILURE: CELLDIAG user does not exist
If a "FAILURE: ..." message appears, create the user and role on each cell in cellcli using commands like these:
create user CELLDIAG password="SomeGood42Password";  
create role celldiagrole;  
grant privilege create on diagpack to role celldiagrole;  
grant privilege list on diagpack to role celldiagrole;  
grant privilege download on diagpack to role celldiagrole;  
grant role celldiagrole to user CELLDIAG;
NOTE: The "CELLDIAG" user is created during the Exadata Storage Server Software version or higher deployment process. It is not created during an upgrade from an older release.

NOTE: the user detail for a properly configured "CELLDIAG" userid should look like:
CellCLI> list user CELLDIAG detail
         name:                   CELLDIAG
         roles:                  role=celldiagrole
                                 object=diagpack, verb=create, attributes=all attributes, options=all options
                                 object=diagpack, verb=download, attributes=all attributes, options=all options
                                 object=diagpack, verb=list, attributes=all attributes, options=all options

NOTE: Creation of the "CELLDIAG" storage server user is not mandatory. The automatic diagnostic gathering process continues to function without it and the packaged diagnostics are accessed using one of the other storage server users.

  Verify installed rpm(s) kernel type match the active kernel version

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System
 |  |  |  | Production | Exadata - Physical, Exadata - User Domain | ALL
Bug(s): 28740049 - exachk, 26396389 - exachk
DB/GI Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
N/A | N/A | N/A | N/A | ALL | Linux | exachk 18.4.0 | N/A
Benefit / Impact:
Verifying installed rpm(s) kernel type match the active kernel version helps avoid update failures due to dependency conflicts between older rpm versions and newer versions being installed. The impact of verifying that installed rpm(s) kernel type match the active kernel version is minimal. The impact of correction depends upon why the mismatched rpm(s) was/were installed and cannot be estimated here.
If installed rpm(s) kernel type do not match the active kernel, there may be update interruptions caused by dependency conflicts between older rpm versions and newer versions being installed.
To verify the installed rpm(s) kernel type match the active kernel version, execute the following code as the "root" userid on each database server:
UNAME_DATA=$(uname -r)
# Locate the "el" tag in the kernel release and keep it plus the digit that follows (for example "el6")
START=$(echo "$UNAME_DATA" | awk 'END{print index($0,"el")}')
END=$(expr $START + 2)
KERNEL_TYPE=$(echo "$UNAME_DATA" | cut -c${START}-${END})
# Any rpm built for a different Enterprise Linux release counts as a mismatch
case "$KERNEL_TYPE" in
  el7) MISMATCHED_RPMS=$(rpm -aq | egrep "\.el5|\.el6") ;;
  el6) MISMATCHED_RPMS=$(rpm -aq | egrep "\.el5|\.el7") ;;
  el5) MISMATCHED_RPMS=$(rpm -aq | egrep "\.el6|\.el7") ;;
  *)   ERROR_MESSAGE=$(echo "Unrecognized kernel type:  $KERNEL_TYPE") ;;
esac
if [ -n "$ERROR_MESSAGE" ]
then
  echo -e "\nFAILURE:  $ERROR_MESSAGE"
else
  if [ -n "$MISMATCHED_RPMS" ]
  then    MISMATCH_COUNT=$(echo "$MISMATCHED_RPMS" | wc -l)
  else    MISMATCH_COUNT=0
  fi
  if [ -z "$MISMATCHED_RPMS" ]
  then
    echo -e "\nSUCCESS:  There were no mismatched rpms found.\n\nKernel type:\t\t$KERNEL_TYPE\nMismatch count:\t\t$MISMATCH_COUNT"
  else
    echo -e "\nFAILURE:  One or more mismatched rpms were found.\n\nKernel type:\t\t$KERNEL_TYPE\nMismatch count:\t\t$MISMATCH_COUNT\nMismatched rpms:\n$MISMATCHED_RPMS"
  fi
fi

The expected output should be similar to:

SUCCESS:  There were no mismatched rpms found.

Kernel type:            el6
Mismatch count:         0

Examples of "FAILURE" results:

FAILURE:  One or more mismatched rpms were found.

Kernel type:            el5
Mismatch count:         37   
Mismatched rpms:
<output truncated>

FAILURE:  Unrecognized kernel type:  25.el

If the output is not "SUCCESS", investigate for root cause and take corrective action based on root cause findings. 
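
The kernel type detection above depends on the character position of "el" in the kernel release string. As a minimal sketch of an alternative, assuming the release string always carries a tag of the form "el" followed by digits, the tag can be matched by pattern instead. The function name kernel_type_of is hypothetical and is not part of exachk or of the original check.

```shell
#!/bin/bash
# Hypothetical helper (not part of exachk): print the Enterprise Linux tag
# (el5, el6, el7, ...) found in a kernel release string such as "uname -r" output.
# Prints nothing when no such tag is present.
kernel_type_of() {
  echo "$1" | grep -o 'el[0-9][0-9]*' | head -n 1
}

# Report the tag for the running kernel.
kernel_type_of "$(uname -r)"
```

An empty result corresponds to the "Unrecognized kernel type" branch of the check.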

Verify Flex ASM Cardinality is set to "ALL"

Alert Level
Engineered System
Exadata - Physical,
Exadata - User Domain
- exachk
DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8
Linux x86-64
Benefit / Impact:
By default, Flex ASM cardinality is set to 3. The impact of verifying that Flex ASM Cardinality is set to "ALL" is minimal. The impact of setting the Flex ASM cardinality to "ALL" from a lower value is minimal and can be done online; ASM will bring up the additional instances required to fulfill the cardinality setting.
Not having Flex ASM cardinality set to "ALL" could result in a higher number of client (DB) connections on some ASM instances and may result in longer client reconnection times should an ASM instance crash.
Action / Repair:
To verify Flex ASM Cardinality is set to "ALL", as the Oracle home owner userid with the environment properly set, execute the following command set on one database server in the cluster where an ASM instance is executing:
RAW_DATA=$($ORACLE_HOME/bin/srvctl config asm -detail)
FLEX_MODE=$($ORACLE_HOME/bin/asmcmd showclustermode | cut -d" " -f6)
if [ "$FLEX_MODE" = "disabled" ]
then
  echo -e "INFO: ASM is not in Flex mode: $FLEX_MODE, check not executed."
else
  CARDINALITY=$(echo "$RAW_DATA" | grep count | cut -d" " -f4)
  if [ "$CARDINALITY" = "ALL" ]
  then
    echo -e "SUCCESS: Flex ASM cardinality is set to: $CARDINALITY."
  else
    echo -e "FAILURE: Flex ASM cardinality is set to: $CARDINALITY.\n\n$RAW_DATA"
  fi
fi
The expected output should be:
SUCCESS: Flex ASM cardinality is set to: ALL.
-- OR --
INFO:  ASM is not in Flex mode: disabled, check not executed. 
Example of a "FAILURE" message:
FAILURE: Flex ASM cardinality is set to: 3.

ASM home: <CRS home> 
Password file: +DBFS_DG/orapwASM 
Backup of Password file:  
ASM listener: LISTENER ASM is enabled. 
ASM is individually enabled on nodes:  
ASM is individually disabled on nodes:  
ASM instance count: 3 Cluster 
If a "FAILURE: ..." message appears, adjust the Flex ASM cardinality to "ALL" using the following command:
srvctl modify asm -count ALL
After making the change to ASM cardinality, verify that each node has an ASM instance running using the following command:
$ srvctl status asm -detail | grep "is running"
ASM is running on exadb06,exadb05,exadb08,exadb07,exadb02,exadb01,exadb04,exadb03
ASM instance +ASM2 is running on node exadb02  
ASM instance +ASM1 is running on node exadb01  
ASM instance +ASM4 is running on node exadb04  
ASM instance +ASM3 is running on node exadb03  
ASM instance +ASM5 is running on node exadb05  
ASM instance +ASM6 is running on node exadb06  
ASM instance +ASM7 is running on node exadb07  
ASM instance +ASM8 is running on node exadb08
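
The check above picks the cardinality out of the "srvctl config asm -detail" output by field position. As a minimal sketch, assuming the output contains a line that begins with "ASM instance count:" as in the FAILURE example, the value can also be extracted by label. The function name asm_instance_count is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read "srvctl config asm -detail" style text on stdin and
# print the first word that follows the "ASM instance count:" label.
asm_instance_count() {
  awk -F': *' '/^ASM instance count:/ {split($2, f, " "); print f[1]; exit}'
}

# Intended use on a database server (requires a Grid Infrastructure environment):
#   $ORACLE_HOME/bin/srvctl config asm -detail | asm_instance_count
```

A result of "ALL" corresponds to the SUCCESS case; any number corresponds to the FAILURE case.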

Verify "downdelay" is correctly set for bonded client interfaces

Alert Level
Engineered System
Exadata - Physical,
Exadata - Management Domain
Bug 25520669 - exachk
   Bug 25144261 - exachk
DB Version
DB Role
Engineered System Platform
Exadata Version
OS & Version
Validation Tool Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8
Linux x86-64
exachk TBD
Benefit / Impact:
When using the default "downdelay" settings, an undesired VIP failover or brownout may be seen depending upon the timing of a single client network interface failure. To avoid this possibility, the "downdelay" parameter of the client network interface should be set to 2000 when using active-backup mode bonding and to 200 when using LACP mode bonding.
The impact of verifying "downdelay" attributes for bonded client interfaces is minimal. The recommended corrective action includes a reboot.
Not verifying "downdelay" attributes for bonded client interfaces increases the risk of unwanted VIP failover or brownouts in the event of a single client network interface failure.
To verify "downdelay" attributes for bonded client interfaces, as the root userid execute the script below on each database server:

#!/bin/bash
##############################################################################
#                                                                            #
#  Purpose: Check downdelay is set appropriately for the bonded interfaces   #
#                                                                            #
##############################################################################
## Variable declarations
NumArgs=$#
downDelayActiveBackup=2000
downDelayLACP=200
exit_code=0
downdelayFailMsg=""
downdelayPassMsg="Down delay correctly set to correct value(s) for all bonded interfaces"
## Function Definitions
usage()
{
  echo "Usage: [-o check|report] [-h]";
}
## Validate the command line arguments
if [ $NumArgs -lt 1 ]
then
  echo "Invalid or missing command line arguments..."
  exit 1
fi
while getopts "o:h" opt;
do
  case "${opt}" in
    o) swch=${OPTARG} ;;
    h) usage;
       exit 0 ;;
    *) echo "Invalid or missing command line arguments..."
       exit 1 ;;
  esac
done
if [ "$swch" != "check" ] && [ "$swch" != "report" ]
then
  echo "Invalid or missing command line arguments..."
  exit 1
fi
## Compare the down delay of each bonded client interface with the recommended value
while read bonintf
do
  bondingType=$(grep "^Bonding Mode:" $bonintf|awk -F ":" '{print $2}')
  downdelaySet=$(grep "^Down Delay (ms):" $bonintf|awk '{print $4}')
  if [ "${bondingType}" == " fault-tolerance (active-backup)" ]
  then
    if [ $downdelaySet -ne $downDelayActiveBackup ]
    then
      downdelayFailMsgTmp="Down delay not set to 2000 for the active-backup bonded interface $(echo $bonintf|awk -F"/" '{print $NF}')"
      downdelayFailMsg=$(printf "$downdelayFailMsgTmp\n$downdelayFailMsg")
      exit_code=1
    fi
  elif [ "${bondingType}" == " IEEE 802.3ad Dynamic link aggregation" ]
  then
    if [ $downdelaySet -ne $downDelayLACP ]
    then
      downdelayFailMsgTmp="Down delay not set to 200 for the LACP bonded interface $(echo $bonintf|awk -F"/" '{print $NF}')"
      downdelayFailMsg=$(printf "$downdelayFailMsgTmp\n$downdelayFailMsg")
      exit_code=1
    fi
  fi
done << EOF
$(ls -1 /proc/net/bonding/bondeth*)
EOF
## "check" prints only the exit code; "report" prints the result message
if [ "$swch" = "check" ]
then
  echo $exit_code
elif [ $exit_code -eq 0 ]
then
  printf "\n$downdelayPassMsg\n"
else
  printf "\n$downdelayFailMsg\n"
fi
The expected output should be:
Down delay correctly set to correct value(s) for all bonded interfaces
Example of a failure:
Down delay not set to 2000 for the active-backup bonded interface bondeth0
If failures are reported, as the root userid on the database server which has the failure, execute the following command followed by a reboot:
For active-backup mode - sed -i 's/downdelay=<existing value>/downdelay=2000/' /etc/sysconfig/network-scripts/ifcfg-<client network interface name>
For LACP mode - sed -i 's/downdelay=<existing value>/downdelay=200/' /etc/sysconfig/network-scripts/ifcfg-<client network interface name>
NOTE: It is possible to temporarily set the value in the active kernel as the root userid using this command:
echo 2000 > /sys/class/net/<client network interface name>/bonding/downdelay - For active-backup bonding
echo 200 > /sys/class/net/<client network interface name>/bonding/downdelay - For LACP bonding
However, this will not survive a reboot. The "sed" command followed by a reboot is the preferred method.
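
The rule stated above (2000 for active-backup bonding, 200 for LACP bonding) can be sketched as a small lookup plus a per-file comparison. This is a minimal illustration, assuming the status file has the same "Bonding Mode:" and "Down Delay (ms):" lines as /proc/net/bonding/<client network interface name>; the function names expected_downdelay and check_bond_file are hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: print the recommended downdelay (ms) for a bonding mode
# string as it appears after "Bonding Mode:"; return 1 for any other mode.
expected_downdelay() {
  case "$1" in
    *"(active-backup)"*) echo 2000 ;;
    *"802.3ad"*)         echo 200 ;;
    *)                   return 1 ;;
  esac
}

# Hypothetical helper: compare one bonding status file with the recommendation.
check_bond_file() {
  local file=$1 mode actual expected
  mode=$(sed -n 's/^Bonding Mode: *//p' "$file")
  actual=$(awk '/^Down Delay \(ms\):/ {print $4}' "$file")
  expected=$(expected_downdelay "$mode") || { echo "SKIP: $file ($mode)"; return 0; }
  if [ "$actual" -eq "$expected" ]; then
    echo "OK: $file downdelay=$actual"
  else
    echo "MISMATCH: $file downdelay=$actual expected=$expected"
    return 1
  fi
}

# Intended use on a database server:
#   for f in /proc/net/bonding/bondeth*; do check_bond_file "$f"; done
```

Bonding modes other than the two covered by this check are reported as skipped rather than failed.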
Verify ExaWatcher is executing

PriorityAlert LevelDate OwnerStatusEngineered SystemBug(s)
CriticalFAIL02/15/17<Name> ProductionExadata - Physical,
Exadata - User Domain,
Bug 25543623 - exachk
DB VersionDB RoleEngineered System PlatformExadata VersionOS & VersionValidation Tool VersionTBD, X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8, SL611. x86-64,
Sparc Linux

Benefit / Impact:
ExaWatcher collects data on key metrics for both database and storage servers, which can be used for both troubleshooting and performance analysis. There is minimal impact to verify that ExaWatcher is executing, or from starting ExaWatcher if it is not executing.
If ExaWatcher is not executing, valuable data for analysis is not collected.
Action / Repair:
To verify that ExaWatcher is executing, as the "root" userid execute the following command set on each database and storage server in the cluster:
NUM_OF_EXAWATCHERS=$(ps -ef | grep -i exawatcher | grep -v grep | wc -l)
if [[ $NUM_OF_EXAWATCHERS -gt 0 ]]
then
  echo -e "SUCCESS: ExaWatcher is executing.  Number of processes: $NUM_OF_EXAWATCHERS"
else
  echo -e "FAILURE: ExaWatcher is not executing.  Number of processes: $NUM_OF_EXAWATCHERS"
fi
The output should be similar to:
SUCCESS: ExaWatcher is executing.  Number of processes: 15
NOTE: The number of processes may vary depending upon the site-specific configuration.
If ExaWatcher is not executing, please refer to the "System Diagnostics Data Gathering with sosreports and Oracle ExaWatcher" section of the "Oracle® Exadata Storage Server Software User's Guide" that is for your specific installed version of Oracle Exadata Storage Server software.
Verify non-Default services are created for all Pluggable Databases
Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System Platform | Bug(s)
Critical | WARN | 02/15/17 | Frank Kobylanski | Production | Exadata - Physical, Exadata - Management Domain, Exadata - User Domain | ALL | Bug 25520385 - exachk
DB VersionDB TypeDB RoleDB ModeExadata VersionOS & VersionValidation Tool VersionMAA Scorecard Section
Benefit / Impact:
Oracle recommends that non-default services should be created for application and end user access to pluggable databases (PDBs). This provides access control along with automated opening of the PDB as part of container database (CDB) startup.
PDBs may not open automatically at instance startup and applications and users may have access to PDBs through default services at inappropriate times.
Action / Repair:
Note that only PDBs that are open and not in MIGRATE/UPGRADE mode will be checked. Since a PDB may not be open on all instances the following script should be executed on each instance of each CDB.
To verify that all PDBs in a CDB have at least one non-default service created for them, as the CDB ownerid on each database server:
                      1. Set your environment for a CDB
                      2. Run the script below
Repeat steps 1 and 2 for each CDB running on the database server, then move onto the next database server.
PDB_SERVICES=$($ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<EOF
set head off lines 80 feedback off timing off serveroutput on
select name from v\$pdbs p
where p.name not in ('PDB\$SEED','CDB\$ROOT')
 and p.open_mode not in ('MOUNTED','MIGRATE')
 and p.name not in (select s.pdb from containers(service\$) s
                    where bitand(s.flags,128) != 128
                     and deletion_date is null
                     and s.name != ('SYS.SCHEDULER\$_EVENT_QUEUE')
                     and s.name not like ('SYS\$%'));
EOF
)
if [ `echo $PDB_SERVICES| grep ORA- | wc -w` = 0 ]
then
  if [ `echo $PDB_SERVICES| wc -w` = 0 ]
  then
      echo -e SUCCESS: all open PDBs have non-default services defined or there are no open PDBs;
  else
      echo -e WARNING: the following open PDBs do not have non-default services defined: $PDB_SERVICES;
  fi
else
  echo -e WARNING: Issues were detected while trying to access the database: $PDB_SERVICES;
fi
If all PDBs that can be checked have non-default services defined, the following will be returned:
SUCCESS: all open PDBs have non-default services defined or there are no open PDBs
If there are PDBs found that do not have non-default services defined for them, a message similar to the following will be returned.
WARNING: the following open PDBs do not have non-default services defined: TESTPDB4 TESTPDB2 TESTPDB3 TESTPDB5 TESTPDB1
To resolve the warning, create services for these PDBs using either:
                    1. srvctl in Grid Infrastructure or Oracle Restart based environments
                    2. The DBMS_SERVICE.CREATE_SERVICE procedure in environments where srvctl is not available.
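
For the srvctl route, the following is a minimal dry-run sketch that prints one "srvctl add service" command per PDB named in the WARNING message so the commands can be reviewed before anything is executed. The function name print_add_service_cmds, the database name "mydb" and the "_svc" service name suffix are hypothetical placeholders. Only the -db, -service and -pdb options are shown; other options, such as instance placement, depend on the environment and are left out here.

```shell
#!/bin/bash
# Hypothetical dry-run helper: print (do not execute) one "srvctl add service"
# command for each PDB name given after the database unique name.
# The "<pdb>_svc" naming scheme is an illustrative assumption only.
print_add_service_cmds() {
  local db=$1; shift
  local pdb svc
  for pdb in "$@"; do
    svc=$(echo "${pdb}_svc" | tr '[:upper:]' '[:lower:]')
    echo "srvctl add service -db $db -service $svc -pdb $pdb"
  done
}

# "mydb" is a placeholder for the database unique name of the CDB.
print_add_service_cmds mydb TESTPDB1 TESTPDB2
```

Each printed command still has to be adapted to the site's naming standards and then followed by starting the new service.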
Verify Automatic Storage Management Cluster File System (ACFS) file systems do not contain critical database files

PriorityAlert LevelDateOwnerStatusEngineered SystemEngineered System
Critical FAIL 08/14/19 Irfan Alvi Production Exadata - Physical,
Exadata - User Domain 
ALL 29411526 - exachk
26268345 - exachk
26143661 - OEDA 
DB/GI VersionDB TypeDB RoleDB ModeExadata VersionOS & VersionValidation Tool VersionMAA Scorecard Section or higher ASM N/A N/A ALL Linux X86-64 exachk 19.3.0 N/A 
Benefit / Impact:
ACFS disk groups created on Exadata should not contain any critical database files to isolate operational maintenance and configuration changes.
The impact of verifying (ACFS) file systems do not contain critical database files is minimal and can be done online.
The impact of moving critical database files out of ACFS disk groups varies by the type of file involved, and cannot be estimated here.

NOTE: For more information on ACFS use cases and recommended disk group attributes on Exadata, please see: Oracle ACFS Support on Oracle Exadata Database Machine (Linux only) (Doc ID 1929629.1)
Any ACFS maintenance or configuration change could potentially impact the availability of database files residing on the same disk group as ACFS.
Action / Repair:
To verify (ACFS) file systems do not contain critical database files, run exachk and review the provided report.
The expected output in the exachk report should be as follows:
In the "View" detail section of the report for this check the expected output should be similar to:
Example of a "FAILURE" message: Output in the exachk report
In the "View" detail section of the report for this check a "FAILURE" example will be similar to:
If a "FAILURE: ..." message appears, either relocate ACFS to a new dedicated disk group following How to Relocate an ACFS Filesystem to Another Diskgroup in Exadata (Doc ID 2133396.1) MOS note or move the database files out of the ACFS disk group.
NOTE: If after corrective actions are completed, you wish to run just this check manually without a full exachk run, as the "root" userid in the directory where exachk was installed, execute the following:
 Verify the ownership and permissions of the "oradism" file
PriorityAlert LevelDateOwnerStatusEngineered SystemEngineered System
CriticalFAIL07/12/17<Name> ProductionExadata - Physical,
Exadata - User Domain
DB/GI VersionDB TypeDB RoleDB ModeExadata VersionOS & VersionValidation Tool Version
Benefit / Impact:
Maintaining the correct ownership and permissions of the "oradism" file is essential for the proper operation of Direct NFS and achieving the highest possible throughput. The file should be owned by the "root" userid and have the setuid bit enabled in the permissions mask. The impact of validating file ownership and permission is minimal. Changing the file ownership and permissions requires a restart of the Oracle stack running out of the adjusted $ORACLE_HOME.
If the ownership and permissions of the "oradism" file are not correct, the performance of Direct NFS will be severely impacted.
Action / Repair:
To verify the ownership and permissions of the "oradism" file, as the appropriate oracle home owner userid on each database server, execute the following command set on each $ORACLE_HOME:
OWNER_USERID=$(ls -l $ORACLE_HOME/bin/oradism |awk '{print $3}')
# The fourth character of the permission string is "s" when the setuid bit is set
SETUID_BIT=$(ls -l $ORACLE_HOME/bin/oradism | cut -c4)
DETAIL=$(echo -e "owner userid:\t$OWNER_USERID\nsetuid bit:\t$SETUID_BIT")
if [[ $OWNER_USERID = "root" && $SETUID_BIT = "s" ]]
then
  echo -e "SUCCESS: \"oradism\" file is correctly configured:\n$DETAIL"
else
  echo -e "FAILURE: \"oradism\" file is not correctly configured:\n$DETAIL"
fi
The output should be similar to:
SUCCESS: "oradism" file is correctly configured:
owner userid:   root
setuid bit:     s
Examples of "FAILURE" results:
FAILURE: "oradism" file is not correctly configured:
owner userid:   root
setuid bit:     x

FAILURE: "oradism" file is not correctly configured:
owner userid:   oracle
setuid bit:     x
If the output is a "FAILURE" result, investigate and take corrective action.
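
The same two conditions (owner is "root", setuid bit is "s") can be read in one call with GNU stat instead of parsing "ls -l" twice. This is a minimal sketch; the function name oradism_mode_ok is hypothetical, and the example only runs when $ORACLE_HOME/bin/oradism exists.

```shell
#!/bin/bash
# Hypothetical helper: succeed only when the owner is "root" and the fourth
# character of the permission string (owner execute position) is "s".
oradism_mode_ok() {
  local owner=$1 perms=$2
  [ "$owner" = "root" ] && [ "${perms:3:1}" = "s" ]
}

# Example against the real file (GNU stat: %U = owner name, %A = permission string).
if [ -n "$ORACLE_HOME" ] && [ -e "$ORACLE_HOME/bin/oradism" ]; then
  if oradism_mode_ok $(stat -c '%U %A' "$ORACLE_HOME/bin/oradism"); then
    echo "SUCCESS: \"oradism\" file is correctly configured"
  else
    echo "FAILURE: \"oradism\" file is not correctly configured"
  fi
fi
```

An uppercase "S" in that position (setuid without execute permission) is treated as a failure, which matches the comparison used in the check above.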

Verify the SYSTEM, SYSAUX, USERS and TEMP tablespaces are of type bigfile

PriorityAlert LevelDateOwnerStatusEngineered SystemEngineered System
CriticalFAIL07/12/17<Name>ProductionExadata - Physical,
Exadata - User Domain
DB/GI VersionDB TypeDB RoleDB ModeExadata VersionOS & VersionValidation Tool Version
Benefit / Impact:
Configuring the SYSTEM, SYSAUX, USERS, and TEMP tablespaces to be of type bigfile simplifies maintenance and operations which involve these tablespaces. The impact of verifying the SYSTEM, SYSAUX, USERS, and TEMP tablespaces are of type bigfile is minimal.
If the SYSTEM, SYSAUX, USERS, and TEMP tablespaces are not of type bigfile, maintenance operations are more complicated and a tablespace is more likely to run out of free space.
Action / Repair:
To verify the SYSTEM, SYSAUX, USERS, and TEMP tablespaces are of type bigfile, as the ORACLE_HOME owner userid on one database server in the cluster, execute the following command set once for each database running out of a given ORACLE_HOME, with the environment properly configured to access each given database:
BIGFILE_DATA=$($ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<EOF
set newpage none lines 80 feedback off timing off serveroutput on
SELECT tablespace_name, bigfile FROM dba_tablespaces
WHERE tablespace_name in ('SYSTEM', 'SYSAUX', 'USERS', 'TEMP');
EOF
)
if [ `echo "$BIGFILE_DATA" | grep -ic "NO"` -gt 0 ]
then
     echo -e "FAILURE: One or more of SYSTEM, SYSAUX, USERS, TEMP tablespaces are not of type bigfile:\n\n$BIGFILE_DATA"
else
     echo -e "SUCCESS: SYSTEM, SYSAUX, USERS, TEMP tablespaces are of type bigfile:\n\n$BIGFILE_DATA"
fi
The output should be similar to:
SUCCESS: SYSTEM, SYSAUX, USERS, TEMP tablespaces are of type bigfile:

TABLESPACE_NAME                BIG
------------------------------ ---
SYSTEM                         YES
SYSAUX                         YES
TEMP                           YES
USERS                          YES
Examples of a "FAILURE" result:
FAILURE: One or more of SYSTEM, SYSAUX, USERS, TEMP tablespaces are not of type bigfile:

TABLESPACE_NAME                BIG
------------------------------ ---
SYSTEM                         NO
SYSAUX                         NO
TEMP                           NO
USERS                          NO
If the output is a "FAILURE" result, investigate and take corrective action.
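
When the result is a "FAILURE", it can help to list only the tablespaces that need attention. The following is a minimal sketch, assuming the two-column layout shown in the examples above (tablespace name, then YES or NO); the function name non_bigfile_tablespaces is hypothetical.

```shell
#!/bin/bash
# Hypothetical helper: read "TABLESPACE_NAME BIG" rows on stdin and print the
# name of every tablespace whose second column is NO.
non_bigfile_tablespaces() {
  awk 'toupper($2) == "NO" {print $1}'
}

# Intended use with the variable populated by the check above:
#   echo "$BIGFILE_DATA" | non_bigfile_tablespaces
```

Header and separator rows are ignored because their second column is neither YES nor NO.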

Verify the storage servers in use configuration matches across the cluster

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System Platform | Bug(s)
Critical | FAIL | 12/19/18 | <Name> | Production | Exadata - Physical, Exadata - User Domain | ALL | Bug 29061438 - exachk, Bug 27541151 - exachk, Bug 26365216 - exachk

DB/GI Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
N/A | N/A | N/A | N/A | ALL | Linux | exachk 18.5.0 | N/A
Benefit / Impact:
Verifying the storage servers in use configuration matches across the cluster can prevent potential issues ranging from impaired performance to a node eviction.
The impact of verifying the storage servers in use configuration matches across the cluster is minimal. The impact of making corrections varies depending upon the root cause of the difference.
If the storage servers in use configuration does not match across the cluster, there is risk of impaired performance, node eviction, and perhaps data loss with multiple hardware failures over time.
Action / Repair:
NOTE: This check will only pass if the following are both true:
1) For each database server, the md5sum for the cellip.ora file matches the md5sum from the list of storage servers accessed by kfod.
2) The md5sum from 1) matches across the cluster.
To verify the storage servers in use configuration matches across the cluster, run exachk and review the provided report.
The expected output in the exachk report should be as follows:
In the "Cluster Wide" section of the report, the overall result should be "PASS":
PASS   Cluster Wide Check   The storage servers in use configuration matches across the cluster   Cluster Wide   View
In the "View" detail section of the report for this check the expected output should be similar to:
SUCCESS: The storage servers in use configuration matches:
DBSRVR:                 <Host Name>
DBSRVR_CELLIP_MD5SUM:   d2144e88f4249a5d267691b85ed2ae49
DBSRVR_KFOD_MD5SUM:     d2144e88f4249a5d267691b85ed2ae49
DBSRVR_BASE_MD5SUM:     d2144e88f4249a5d267691b85ed2ae49
DBSRVR:                 <Host Name>
DBSRVR_CELLIP_MD5SUM:   d2144e88f4249a5d267691b85ed2ae49
DBSRVR_KFOD_MD5SUM:     d2144e88f4249a5d267691b85ed2ae49
DBSRVR_BASE_MD5SUM:     d2144e88f4249a5d267691b85ed2ae49
A "FAILURE" example:
In the "Cluster Wide" section of the report, the overall result will be "FAIL":
FAIL    Cluster Wide Check   The storage servers in use configuration should match across the cluster   Cluster Wide   View
In the "View" detail section of the report for this check a "FAILURE" example will be similar to:
FAILURE: The storage servers in use configuration does not match:
DBSRVR:                 randomadm01vm01
DBSRVR_CELLIP_MD5SUM:   acd6ad6d153ea1ec1ecf9a5aa19cf4a7
DBSRVR_KFOD_MD5SUM:     d41d8cd98f00b204e9800998ecf8427e
DBSRVR_BASE_MD5SUM:     acd6ad6d153ea1ec1ecf9a5aa19cf4a7
DBSRVR:                 randomadm02vm01
DBSRVR_CELLIP_MD5SUM:   acd6ad6d153ea1ec1ecf9a5aa19cf4a7
DBSRVR_KFOD_MD5SUM:     d41d8cd98f00b204e9800998ecf8427e
DBSRVR_BASE_MD5SUM:     acd6ad6d153ea1ec1ecf9a5aa19cf4a7
NOTE: In the "FAILURE:" example, the md5sum for the results reported from kfod on the running system does not match the cellip.ora md5sum.
If the result is not as expected, investigate for root cause and take appropriate corrective action.
NOTE: If after corrective actions are completed, you wish to run this one check without a full exachk run execute the following command as the "root" userid in the directory in which exachk was installed:
./exachk -check 5D6AC87BF4669BF2E053D498EB0AFC19,5D691B1A8146F67CE053D398EB0A8822
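
The comparison described in the NOTE above can be illustrated with a minimal sketch that hashes a list of storage server entries independent of their order. How exachk normalizes the cellip.ora and kfod lists before hashing is not described here, so the sorting and blank-line removal below are assumptions, and the function name list_md5 is hypothetical. One fact worth knowing when reading the report: d41d8cd98f00b204e9800998ecf8427e, the DBSRVR_KFOD_MD5SUM value in the "FAILURE" example, is the MD5 digest of empty input, which suggests the kfod list was empty in that example.

```shell
#!/bin/bash
# Hypothetical helper: print the MD5 digest of the lines read on stdin after
# dropping blank lines and sorting, so the same set of entries gives the same
# digest regardless of order.
list_md5() {
  grep -v '^[[:space:]]*$' | sort -u | md5sum | cut -d' ' -f1
}

# An empty list produces the digest of empty input.
printf '' | list_md5
```

Two database servers whose lists produce different digests, or a server whose digest equals the empty-input value, are the cases to investigate first.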

Verify "asm_power_limit" is greater than zero

PriorityAlert LevelDateOwnerStatusEngineered SystemEngineered System
CriticalCRITICAL07/26/17<Name>ProductionExadata - Physical,
Exadata - User Domain
DB/GI VersionDB TypeDB RoleDB ModeExadata VersionOS & VersionValidation Tool Version
Benefit / Impact:
Setting "asm_power_limit=0" disables rebalance operations. Verifying that "asm_power_limit" is greater than zero confirms that rebalance operations are enabled. The impact of verifying that "asm_power_limit" is greater than zero is minimal, as is the impact of setting it to a value greater than zero.
NOTE: Changing the default value via the initialization parameter "asm_power_limit" is not the same as changing the power for an actively running rebalance operation.
"asm_power_limit=0" disables rebalance operations, which can lead to data loss in the event of multiple hardware failures over time.
Action / Repair:
To verify "asm_power_limit" is greater than zero, as the grid home owner userid, execute the following command set once for each ASM instance with the environment properly configured to access that given instance:
NOTE: This code will not execute properly if executed on a database server in a flex ASM environment where an ASM instance is not running.
ASMPL_PARAM_DATA=$($ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<EOF
set newpage none heading off lines 80 feedback off timing off serveroutput on
select value from v\$parameter where name = 'asm_power_limit';
EOF
)
ASMPL_QUEUE_DATA=$($ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<EOF
set newpage none heading off lines 80 feedback off timing off serveroutput on
select count(*) from gv\$asm_operation where power=0 or actual=0;
EOF
)
if [[ $ASMPL_PARAM_DATA -gt 0 && $ASMPL_QUEUE_DATA -eq 0 ]]
then
  echo -e "SUCCESS: \"asm_power_limit\" is set to $ASMPL_PARAM_DATA and there are no rebalance operations in gv\$asm_operation with the attribute POWER or ACTUAL = 0"
else
  echo -e "FAILURE:"
  if [ $ASMPL_PARAM_DATA -eq 0 ]
  then
    echo -e "The initialization parameter \"asm_power_limit\" is set to zero"
  fi
  if [ $ASMPL_QUEUE_DATA -gt 0 ]
  then
    echo -e "There are rebalance operation(s) in gv\$asm_operation with the attribute POWER or ACTUAL = 0"
  fi
fi
The output should be similar to:
SUCCESS: "asm_power_limit" is set to 32 and there are no rebalance operations in gv$asm_operation with the attribute POWER or ACTUAL = 0
Examples of "FAILURE" results:
FAILURE:
The initialization parameter "asm_power_limit" is set to zero
There are rebalance operation(s) in gv$asm_operation with the attribute POWER or ACTUAL = 0

FAILURE:
The initialization parameter "asm_power_limit" is set to zero

FAILURE:
There are rebalance operation(s) in gv$asm_operation with the attribute POWER or ACTUAL = 0

If the output is a "FAILURE" result, investigate and take corrective action.
Verify the recommended patches for Adaptive features are installed
PriorityAlert LevelDateOwnerStatusEngineered SystemEngineered System
Critical INFO 06/05/19 <Name> Production Exadata - Physical,
Exadata - User Domain 
Exadata 29849595 - exachk
26681554 - exachk 
DB/GI VersionDB TypeDB RoleDB ModeExadata VersionOS & VersionValidation Tool VersionMAA Scorecard Section only Normal, CDB, PDB Primary, Physical Standby Open ALL Linux exachk 19.3.0 N/A 
Benefit / Impact:
Adaptive features are a set of capabilities that enable the optimizer to make run-time adjustments to execution plans and to adjust plans for future executions based on the results of previous executions. For Oracle version only, to maximize performance and reliability it is recommended that the default configuration for 12.2.x be used. Installing patches 22652097 and 21171382 configures those defaults.
Without patches 22652097 and 21171382 Oracle version may experience poor performance and potential instability.
Action / Repair:
To verify the recommended patches for Adaptive features are installed, as the owner userid of a given Oracle home, and with the environment set to access that Oracle home on each database server, execute the following code set:
opatch_return_code=$($ORACLE_HOME/OPatch/opatch lsinventory -oh $ORACLE_HOME -local >/dev/null 2>&1;echo $?)
# Fall back to the inventory XML file when opatch cannot list the inventory
if [ $opatch_return_code -eq 0 ]
then
  RAW_LSPATCHES=$($ORACLE_HOME/OPatch/opatch lsinventory -oh $ORACLE_HOME -local -bugs_fixed 2>&1)
else
  RAW_LSPATCHES=$(cat $ORACLE_HOME/inventory/ContentsXML/comps.xml);
fi
IS_22652097_PRESENT=$(echo "$RAW_LSPATCHES" | grep -wc 22652097)
IS_21171382_PRESENT=$(echo "$RAW_LSPATCHES" | grep -wc 21171382)
if [[ $IS_22652097_PRESENT -eq 1 && $IS_21171382_PRESENT -eq 1 ]]
then
  echo -e "SUCCESS: patches 22652097 and 21171382 are installed in $ORACLE_HOME"
else
  echo -e "INFO: patches 22652097 and 21171382 are not installed in $ORACLE_HOME"
fi
The expected output should be:
SUCCESS: patches 22652097 and 21171382 are installed in /u01/app/oracle/product/
Example of an "INFO:" result:
INFO: patches 22652097 and 21171382 are not installed in /u01/app/oracle/product/
If the output is not as expected, install the recommended patches.
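
The "INFO:" message does not say which of the two patches is absent. As a minimal sketch, the inventory text can be scanned once per patch number and only the missing numbers printed. The function name missing_patches is hypothetical; the opatch command in the usage comment is the one used by the check above.

```shell
#!/bin/bash
# Hypothetical helper: read inventory text on stdin and print every patch
# number given as an argument that does not appear in it as a whole word.
missing_patches() {
  local text id
  text=$(cat)
  for id in "$@"; do
    echo "$text" | grep -qw "$id" || echo "$id"
  done
}

# Intended use with the environment set for the Oracle home being checked:
#   $ORACLE_HOME/OPatch/opatch lsinventory -oh $ORACLE_HOME -local -bugs_fixed 2>&1 | missing_patches 22652097 21171382
```

No output means both patch numbers were found. Unlike the check above, this sketch does not depend on how many inventory lines mention a given number.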

Verify initialization parameter cluster_database_instances is at the default value
Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System Platform | Bug(s)
Critical | FAIL | 11/08/2017 | <Name> | Production | Exadata - physical, Exadata - User Domain | ALL | Bug 27055638 - Exachk, Bug 26844705 - base
GI/DB VersionDB TypeDB RoleDB ModeExadata VersionOS & Version Validation Tool VersionMAA Scorecard Section
 < 19.1 ALLALLOPENALLLinuxexachk
Benefit / Impact:
cluster_database_instances should not be changed from the default value for performance and stability. The impact of verifying initialization parameter cluster_database_instances is at the default value is minimal. The impact of removing a set value should include a database restart to make sure the change survives database shutdown and startup.
If cluster_database_instances is modified from the default, dynamic remastering can be impacted potentially causing poor performance or stability.
Action / Repair:
To verify cluster_database_instances is at the default value, as the owner of the oracle home for a given database and with the environment set to access that database, execute the following command set:
ISDEFAULT_VALUE=$($ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<EOF
set head off lines 80 feedback off timing off serveroutput on
select upper(isdefault) from v\$parameter where name ='cluster_database_instances';
EOF
)
if [ `echo "$ISDEFAULT_VALUE" | grep -c "TRUE"` -gt 0 ] ; then
     echo -e "SUCCESS: cluster_database_instances is at the default value"
else
     echo -e "FAILURE: cluster_database_instances should be at the default value: \"isdefault\" column value = "$ISDEFAULT_VALUE""
fi

The expected output should be:
SUCCESS: cluster_database_instances is at the default value
Example of a "FAILURE" result:
FAILURE: cluster_database_instances should be at the default value: "isdefault" column value = FALSE

To correct a failure condition, with the environment properly set to access the target database, unset the cluster_database_instances database parameter using:
SQL> alter system reset cluster_database_instances scope=spfile sid='*';
Restart the instance and verify the change survives startup and shutdown.

Verify the database server NVME device configuration

PriorityAlert LevelDateOwnerStatusEngineered SystemEngineered System PlatformBug(s)
CriticalFAIL11/29/2017<Name> ProductionX7-8ExadataBug 27123748 - exachk
DB/GI VersionDB TypeDB RoleDB ModeExadata VersionOS & VersionValidation Tool VersionMAA Scorecard Section

Benefit / Impact:
Proper configuration of NVME devices is necessary for reliable and efficient operation of a database server. The impact of verifying the database server NVME device configuration is minimal. The impact of making any required corrections or adjustment varies depending upon the root issue, and cannot be estimated here.
An improper NVME device configuration could lead to unreliable operation, poor performance, or impact upgrade operations.
Action / Repair:
NOTE: This check will pass on a database server only if the following are both true:
1) There are four NVME devices discovered.
2) Every device has a status of "normal".
To verify the database server NVME device configuration, as the "root" userid, execute the following code set on each database server:

RAW_OUTPUT=$(dbmcli -e "list physicaldisk attributes name,status")
# Is count correct?
if [ $(echo "$RAW_OUTPUT" | wc -l) -eq 4 ]
then COUNT_CORRECT=1
else COUNT_CORRECT=0
fi
# Is the status normal?
if [ $(echo "$RAW_OUTPUT" | awk '{print $2}' | grep -icv normal) -eq 0 ]
then STATUS_NORMAL=1
else STATUS_NORMAL=0
fi
# Analyze:
if [[ $(echo $COUNT_CORRECT) -eq 1 && $(echo $STATUS_NORMAL) -eq 1 ]]
then
  echo "SUCCESS: The NVME device configuration is correct."
else
  echo -e "FAILURE: The NVME device configuration is not correct.\nDetails:\n$RAW_OUTPUT"
fi
The expected output should be:
SUCCESS: The NVME device configuration is correct.
Example of a "FAILURE:" result:
FAILURE: The NVME device configuration is not correct.
Details:
         FLASH_15_1      failed - dropped for replacement
         FLASH_15_2      failed - dropped for replacement
         FLASH_1_1       normal
         FLASH_1_2       normal
NOTE: The "FAILURE:" example is such because two devices have failed and been dropped.
If the output is not as expected, determine the root cause and take the appropriate corrective action.
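
When the check reports "FAILURE:", it can help to isolate only the devices that are not in the "normal" state. The following is a minimal sketch and not part of the check itself; the function name list_abnormal_nvme is hypothetical, and it assumes the two-column name/status layout shown in the example above.

```shell
# Hypothetical helper: print only the devices whose status is not "normal".
# Expects the text produced by: dbmcli -e "list physicaldisk attributes name,status"
list_abnormal_nvme () {
  # Field 1 is the device name; field 2 is the first word of the status.
  echo "$1" | awk 'NF >= 2 && tolower($2) != "normal" {print $1}'
}

# Sample data copied from the "FAILURE:" example in this note.
SAMPLE_OUTPUT='         FLASH_15_1      failed - dropped for replacement
         FLASH_15_2      failed - dropped for replacement
         FLASH_1_1       normal
         FLASH_1_2       normal'

list_abnormal_nvme "$SAMPLE_OUTPUT"
```

On a database server, the live data would be passed in the same way, for example list_abnormal_nvme "$RAW_OUTPUT".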

Verify that Automatic Storage Management Cluster File System (ACFS) uses 4K metadata block size

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System Platform | Bug(s)
Critical | FAIL | 01/17/18 | <Name> | Production | Exadata - Physical, Exadata - User Domain | ALL | Bug 27298631 - exachk, Bug 27403057 - OEDA
DB/GI Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
or higher | ASM | N/A | N/A | ALL | Linux | exachk 18.2.0 | N/A



Benefit / Impact:
Starting with Grid Infrastructure, Oracle ACFS supports I/O requests in multiples of 4K logical sector sizes as well as continued support for 512-byte logical sector size I/O requests. The size of the metadata blocks is not set directly, but derived from the logical sector size. Using a 4k metadata block size helps improve performance and stability.
On ACFS file systems where the metadata block size is not 4K, applications that frequently access large numbers of files stored on the ACFS file system can experience severely degraded performance, and possibly a storage server outage.
Action / Repair:
To verify that the Automatic Storage Management Cluster File System (ACFS) uses 4K metadata block size, on one database server in the cluster as the owner userid of the Grid home, and with the environment set to access the ASM instance on that database server, execute the following code:

# acfs check metadata block size
# ORACLE_HOME should be the Grid Infrastructure ORACLE_HOME
isacfsused=$(asmcmd volinfo --all|sed -e 's/ //g'|head -n 1)

if [ "$isacfsused" = 'novolumesfound' ] ; then
   echo -e "ACFS is not used"
   exit 1
fi

version=$(acfsutil info fs|grep 'ACFS Version'|sort -u|awk -F: '{print $2}'|awk -F. '{print $1$2}')

if [ $version -lt 122 ] ; then
   echo -e "WARNING: This check only is valid when GI version is 12.2 or higher"
   exit 1
fi

for  vol in $(acfsutil info fs|egrep 'metadata block size|primary volume'|awk -F: '{print $1"="$2}'|sed -e 's/ //g')
do
   attr=$(echo $vol |awk -F= '{print $1}')
   attrval=$(echo $vol |awk -F= '{print $2}')

   # Remember whether the current file system reports a 512-byte metadata block size
   if  [ $attr = 'metadatablocksize' ] ; then
     if  [ $attrval -eq 512 ] ; then
       NO4KMETABLK=1
     else NO4KMETABLK=0
     fi
   # Record the primary volume of every file system flagged above
   elif  [ $attr = 'primaryvolume' ]  &&  [ $NO4KMETABLK -eq 1 ]  ;  then
         ACFSNO4K=(${ACFSNO4K[@]} $attrval)
   fi
done

if [  ${#ACFSNO4K[@]} -eq 0 ] ; then
   printf "%s \n"  "SUCCESS: ALL the ACFS filesystem are using metadata block size 4096"
else
   printf "%s \n"  "WARNING: There are ACFS filesystem  NOT USING  metadata block size 4096"
   printf "\t %s \n" "The list of the primary volume is: "
   printf "\t %s \n" "${ACFSNO4K[@]}"
   printf "\t %s \n" "To get the complete details of each filesystem, please execute command acfsutil info fs"
fi
The expected output should be:
SUCCESS: ALL the ACFS filesystem are using metadata block size 4096
Example of a "FAILURE" result:
WARNING: There are ACFS filesystem  NOT USING  metadata block size 4096 
    The list of the primary volume is:  
    To get the complete details, please execute command acfsutil info fs 
An ACFS file system created using Grid Infrastructure or higher, by default will use metadata block size 4k.
An ACFS file system created using an earlier Grid Infrastructure version requires the ACFS volume to be reformatted, using the following steps:
  • Create a backup of the filesystem
  • Deregister (if required) the file system using acfsutil registry -d command
  • Dismount the filesystem
  • Remove the file system using acfsutil rmfs command
  • Reformat the volume using mkfs -t acfs -i 4096 <dev path> command
  • Mount the file system
  • Restore the files
  • Optionally register the file system using acfsutil registry command.
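
The sequence above can be sketched as a dry run that only prints the commands for review and executes nothing. This is an illustration under stated assumptions: the function name, device path, and mount point are hypothetical placeholders, the backup and restore method is site specific, and the exact umount, mount, and "acfsutil registry -a" forms should be confirmed against the ACFS documentation for the installed release.

```shell
# Dry-run sketch: prints the commands for review and executes nothing.
# DEV and MNT are hypothetical placeholders for the ACFS volume device and mount point.
print_acfs_reformat_plan () {
  local DEV=$1 MNT=$2
  echo "# back up the contents of $MNT (method is site specific)"
  echo "acfsutil registry -d $MNT"
  echo "umount $MNT"
  echo "acfsutil rmfs $DEV"
  echo "mkfs -t acfs -i 4096 $DEV"
  echo "mount -t acfs $DEV $MNT"
  echo "# restore the files into $MNT"
  echo "acfsutil registry -a $DEV $MNT"
}

print_acfs_reformat_plan /dev/asm/examplevol-123 /acfs/example
```

Printing the plan first allows the device and mount point to be checked against "acfsutil info fs" before any destructive step is run.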

Evaluate Automated Maintenance Tasks configuration

Priority | Alert Level | Date | Owner | Status | Engineered Systems | Engineered System Platform | Bug(s)
Critical | WARN | 01/31/18 | <Name> | Development | SSC, Exadata - Physical, Exadata - User Domain | ALL | Bug 27471238 - exachk
DB Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
11.2 or higher | ALL | ALL | ALL | ALL | Linux, Solaris | exachk 18.2.0 | N/A

Benefit / Impact:
Some automated maintenance tasks are enabled by default with default settings at database creation time. It is recommended that these automated tasks be allowed to run, but that they are reviewed and adjusted if necessary to provide the most benefit for a given environment's workload. Benefits are provided by improving the overall efficiency of an environment, and also from not having the automated maintenance tasks themselves negatively impact the environment's specific workload.
Leaving automated maintenance tasks at their default values, or disabling them completely may significantly impact a given environment's specific workload performance.
Action / Repair:
To see basic information on automated maintenance tasks, as the owner of the oracle home for a given database and with the environment set to access that database, execute the following command set:
FORMATTED_OUTPUT=$($ORACLE_HOME/bin/sqlplus -s "/ as sysdba" <<EOF
set newpage none head off lines 80 feedback off timing off serveroutput on
select client_name,status from DBA_AUTOTASK_CLIENT;
EOF
)
ENABLED_COUNT=$(echo "$FORMATTED_OUTPUT" | egrep -ic enabled)
# Assumed comparison: every returned row must report ENABLED
TOTAL_COUNT=$(echo "$FORMATTED_OUTPUT" | egrep -c '[^[:space:]]')
if [ $ENABLED_COUNT -eq $TOTAL_COUNT ]
then
  echo -e "INFO: all automated maintenance tasks are enabled."
  echo -e "Please review configuration appropriateness for this environment."
else
  echo -e "WARNING: one or more automated maintenance tasks are not enabled."
  echo -e "Please enable all and review configuration appropriateness for this environment.\nDetails:\n$FORMATTED_OUTPUT" 
fi
The expected output should be similar to: 
INFO: all automated maintenance tasks are enabled.
Please review configuration appropriateness for this environment.
Example of a "WARNING" result: 
WARNING: one or more automated maintenance tasks are not enabled.
Please enable all and review configuration appropriateness for this environment.
sql tuning advisor                                               ENABLED
auto optimizer stats collection                                  ENABLED
auto space advisor                                               DISABLED
Oracle recommends that Oracle-supplied automated maintenance tasks be utilized and tuned for each individual database and its associated workload.
For more information, please see:
Database Administrator's Guide, 11g Release 2, Managing Automated Database Maintenance Tasks
Database Administrator's Guide, 12c Release 1, Managing Automated Database Maintenance Tasks
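
To act on a "WARNING" result, the rows that are not ENABLED can be turned into enable statements for review. The following is a minimal sketch, assuming the two-column client_name/status layout shown in the example; the function name is hypothetical, and the generated DBMS_AUTO_TASK_ADMIN.ENABLE calls should be reviewed before being run in SQL*Plus.

```shell
# Hypothetical helper: print one enable call per automated maintenance task that is not ENABLED.
# Expects the client_name/status rows returned from DBA_AUTOTASK_CLIENT.
print_autotask_enable_sql () {
  echo "$1" | awk '
    NF >= 2 && toupper($NF) != "ENABLED" {
      name = $0
      sub(/[ \t]+[^ \t]+[ \t]*$/, "", name)   # drop the trailing status column
      sub(/^[ \t]+/, "", name)                # trim leading blanks
      printf "exec DBMS_AUTO_TASK_ADMIN.ENABLE(client_name => \047%s\047, operation => NULL, window_name => NULL);\n", name
    }'
}

# Sample data copied from the "WARNING" example in this note.
SAMPLE_OUTPUT='sql tuning advisor                                               ENABLED
auto optimizer stats collection                                  ENABLED
auto space advisor                                               DISABLED'

print_autotask_enable_sql "$SAMPLE_OUTPUT"
```

For the sample data, only the "auto space advisor" client produces a statement.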

Verify proper ACFS drivers are installed for Spectre v2 mitigation

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System Platform | Bug(s)
Critical | FAIL | 05/08/2018 | <Name> | Production | Exadata - Physical, Exadata - Management Domain | ALL | Bug 27989056 - exachk
DB Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
N/A | N/A | N/A | N/A | ALL | Linux | exachk 18.2.0 | N/A
Benefit / Impact:
On Exadata database servers that have an Exadata version installed that provides mitigation for Spectre v2 vulnerability, proper ACFS drivers or other customer-installed kernel drivers must be installed in order for the proper Spectre v2 mitigation to be used.
The impact of verification is minimal. Installing proper ACFS drivers requires Clusterware restart. The impact of installing proper customer-installed kernel drivers cannot be estimated here.
Not using the proper ACFS drivers or other customer-installed kernel drivers can prevent the desired Spectre v2 mitigation, which can lead to reduced performance.
Action / Repair:
To verify proper ACFS drivers are installed for Spectre v2 mitigation, execute the following command set as the "root" userid on all database servers:
# CPU model numbers (/proc/cpuinfo)
# V2:26 X2-2:44 X2-8:46 X2-8M2:47 X3:45 X4:62 X5:63 X6:79 X7:85
thisModel=$(egrep "^model[[:space:]]*:" /proc/cpuinfo | sort -u | awk '{print $NF}')
# Assumed pattern: the X6-and-older model numbers from the list above
modelsUseRetpoline='^(26|44|46|47|45|62|63|79)$'
# kernels without spectrev2 mitigation will not have this file
if [[ ! -e /sys/devices/system/cpu/vulnerabilities/spectre_v2 ]]; then
 echo "WARNING: System is not capable of Spectre v2 mitigation. See minimum version requirements in MOS document 2356385.1."
else
 v2mitigation=$(cat /sys/devices/system/cpu/vulnerabilities/spectre_v2)
 wantRetpoline=no
 # dom0 should use retpoline for all hardware
 # X6 and older should use retpoline
 if (  [[ -e /proc/xen/capabilities ]] && grep -q 'control_d' /proc/xen/capabilities ) || \
    echo "$thisModel" | egrep -q "$modelsUseRetpoline"; then
  wantRetpoline=yes
 fi
 if [[ $wantRetpoline == yes ]]; then
  if ! echo $v2mitigation | grep -qi retpoline; then
   echo "FAIL: Spectre v2 mitigation is expected to be retpoline, but is not."
   if dmesg | grep -q 'Disabling Spectre v2 mitigation retpoline'; then
    echo "Spectre v2 mitigation retpoline was disabled after system boot."
    # look for modules not compiled with retpoline
    badmodules=$(dmesg | grep 'loading module not compiled with retpoline compiler' | awk -F '[]:]' '{print $2}' | tr '\012' ' ')
    echo "Modules loaded not compiled with retpoline compiler: $badmodules. These modules must be updated."
    if [[ $badmodules =~ oracleoks ]]; then
     echo "oracleoks module will be updated by installing updated ACFS drivers. See MOS document 2356385.1."
    fi
   fi
  else
   echo "SUCCESS: Spectre v2 mitigation is using $v2mitigation"
  fi
 else
  echo "SUCCESS: Spectre v2 mitigation is using $v2mitigation"
 fi
fi
The expected output is one of the following, depending on which mitigation the system uses:
SUCCESS: Spectre v2 mitigation is using Mitigation: Full generic retpoline, IBRS_FW, IBPB
SUCCESS: Spectre v2 mitigation is using Mitigation: IBRS, IBRS_FW, IBPB
Example of a "WARNING" result:
WARNING: System is not capable of Spectre v2 mitigation.  See minimum version requirements in MOS document 2356385.1.
In the above "WARNING" example, the system should be upgraded per the MOS note.
Example of a "FAIL" result:
FAIL: Spectre v2 mitigation is expected to be retpoline, but is not.  Spectre v2 mitigation retpoline was disabled after system boot.  Modules loaded not compiled with retpoline compiler: oracleoks. These modules must be updated.  oracleoks module will be updated by installing updated ACFS drivers.  See: MOS document 2356385.1.
In the above "FAIL" example, the system was expected to be using retpoline mitigation for Spectre v2, but was not. The system initially booted with retpoline mitigation, which was then disabled when a kernel module not compiled with a retpoline compiler was loaded.

Verify Exafusion Memory Lock Configuration

Priority | Alert Level | Date | Owner | Status | Engineered System | Bug(s)
Critical | FAIL | 06/27/18 | <Name> | Production | Exadata - ALL | Bug 23253697 - exachk
DB Version | DB Role | Engineered System Platform | Exadata Version | OS & Version | Validation Tool Version
ALL | N/A | ALL | ALL | Linux X86-64 | TBD

Benefit / Impact:
Having memlock set correctly is required for a successful upgrade to releases 12.2 and higher, and also to prevent ORA- errors associated with IPC context initialization. The impact of verifying the Exafusion memory lock configuration is minimal. Following any modifications to the limits.conf settings, a logout/login is required for the OS user to ensure the changes take effect.

NOTE: The memlock settings should be correct according to script recommendations regardless of whether Exafusion is actually being used or not (it is enabled by default in 12.2).
Instance startup will fail, and/or clients will fail to connect if memlock settings are insufficient.
Action / Repair:
To verify Exafusion memory lock configuration, on each database server, as the owner userid of each unique RDBMS home, place the following code into a script and execute it.
#!/bin/bash
#      Parse limits settings under /etc/security and produce a FAILURE if the
#      required memlock settings for Exadata are missing.
#      If non-standard settings are found, produce a FAILURE if the configured
#      limits are below the minimum requirement, else produce a WARNING.
#    amorimur    04/09/18 - Creation

# Settings. ASSUMPTION: the original assignments are not shown in this note;
# the values below follow the messages and the example output of this check.
LIMITSFILE=/etc/security/limits.conf
LIMITSDDIR=/etc/security/limits.d
MINLIMIT=32768
RDBMS_OWNER=$(id -un)
REFFILE_BASE=$(mktemp)
REFFILE=$(mktemp)
MYFILE=$(mktemp)
TMPFILE=$(mktemp)
SUCCESS=1
DEBUG=0

# Parse the given memlock setting string and see if it is satisfactory
check_memlock () {
  local L=$*

  # Error if we don't see the correct format
  if [ $(echo "$L" | wc -w) -ne 5 ] ; then
    echo "FAILURE: Invalid entry found ($L)"
    SUCCESS=0
  else
    # The oracle user must have an unlimited limit
    local LUSR=$(echo "$L" | sed 's/\*/all_users/g' | awk '{print $1}')
    if [ $LUSR = $RDBMS_OWNER ] ; then
      local LVAL=$(echo "$L" | awk '{print $4}')
      if [ $LVAL != 'unlimited' ] ; then
        echo "FAILURE: $RDBMS_OWNER must have an unlimited setting ($L)"
        SUCCESS=0
      fi
    # All others must have the minimum limit
    # Even if the limit settings are satisfactory, print a warning for all of these non-standard entries
    else
      local LVAL=$(echo "$L" | awk '{print $4}' | sed "s/unlimited/$MINLIMIT/g")

      if [ $LVAL -lt $MINLIMIT ] ; then
        echo "FAILURE: Found the following entry with memlock limit less than $MINLIMIT ($L)"
      else
        echo "WARNING: Found a non-standard memlock limit entry ($L)"
      fi
      SUCCESS=0
    fi
  fi
}

# Check the limits.conf file
# See if the file exists & is readable
if [ -r $LIMITSFILE ] ; then
  # Generate a reference file
  cat <<! >> $REFFILE_BASE
* soft memlock $MINLIMIT
* hard memlock $MINLIMIT
$RDBMS_OWNER soft memlock unlimited
$RDBMS_OWNER hard memlock unlimited
!

  # Sort the contents
  sort $REFFILE_BASE > $REFFILE
  # Extract the limits.conf settings on this system, exclude comments, and sort (duplicates are ok)
  grep memlock $LIMITSFILE | egrep 'soft|hard' | awk '{print $1, $2, $3, $4}' | grep -v ^# | sort | uniq > $MYFILE
  # Find settings missing on this system, missing settings will produce a FAILURE
  comm -23 $REFFILE $MYFILE > $TMPFILE
  if [ -s $TMPFILE ] ; then
    echo "FAILURE: the following required memlock settings are missing in $LIMITSFILE"
    echo "------"
    cat $TMPFILE
    echo "------"
    SUCCESS=0
  fi
  # Find non-standard settings on this system
  # A FAILURE is raised when the memlock setting is below the minimum requirement, otherwise a WARNING is raised
  comm -13 $REFFILE $MYFILE > $TMPFILE
  if [ -s $TMPFILE ] ; then
    # Parse results one by one
    while read L ; do
      check_memlock "$L file:$LIMITSFILE"
    done < $TMPFILE
  fi
  # Debug
  if [ $DEBUG -eq 1 -a $SUCCESS -ne 1 ] ; then
    echo "-----"
    echo "Debug: reference file"
    cat $REFFILE
    echo "-----"
    echo "Debug: local file"
    cat $MYFILE
    echo "-----"
  fi
else
  echo "FAILURE: Unable to open $LIMITSFILE for reading"
  SUCCESS=0
fi
# Check for memlock settings under limits.d
for F in $(grep -rl memlock $LIMITSDDIR/* 2>/dev/null) ; do
  grep memlock $F | egrep 'soft|hard' | awk '{print $1, $2, $3, $4}' | grep -v ^# > $TMPFILE

  if [ -s $TMPFILE ] ; then

    # Parse results one by one (quoted so that a leading "*" is not expanded by the shell)
    while read L ; do
      check_memlock "$L file:$F"
    done < $TMPFILE
  fi
done
# Clean up
rm -f $REFFILE_BASE $REFFILE $MYFILE $TMPFILE

# Success
if [ $SUCCESS -eq 1 ] ; then
  echo "SUCCESS: Memlock settings meet the Oracle best practices"
fi
The expected output is:
SUCCESS: Memlock settings meet the Oracle best practices
Example of a "FAILURE" result:
FAILURE: the following required memlock settings are missing in /etc/security/limits.conf
* hard memlock 32768
oracle hard memlock unlimited
oracle soft memlock unlimited
* soft memlock 32768
WARNING: Found a non-standard memlock limit entry (grid hard memlock 237778560 file:/etc/security/limits.conf)
WARNING: Found a non-standard memlock limit entry (grid soft memlock 237778560 file:/etc/security/limits.conf)
FAILURE: oracle must have an unlimited setting (oracle hard memlock 237778560 file:/etc/security/limits.conf)
FAILURE: oracle must have an unlimited setting (oracle soft memlock 237778560 file:/etc/security/limits.conf)
If a "FAILURE" or "WARNING" message appears, make the necessary edits to "/etc/security/limits.conf" and files under "/etc/security/limits.d/" as directed.
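
As an aid when making those edits, the four standard entries the check looks for can be printed for a given owner. This is a minimal sketch: the function name is hypothetical, and the owner name and minimum value passed in below are taken from the example output above rather than from any particular system.

```shell
# Hypothetical helper: print the standard memlock entries for a given RDBMS home owner.
# OWNER and MINLIMIT are supplied by the caller; the values used below come from the example output.
print_required_memlock_entries () {
  local OWNER=$1 MINLIMIT=$2
  cat <<EOF
* soft memlock $MINLIMIT
* hard memlock $MINLIMIT
$OWNER soft memlock unlimited
$OWNER hard memlock unlimited
EOF
}

print_required_memlock_entries oracle 32768
```

After the file is edited and the user has logged out and back in, running "ulimit -l" as that user reports the limit actually in effect.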

Verify there are no unhealthy InfiniBand switch sensors

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System Platform | Bug(s)
Critical | FAIL | 08/08/18 | <Name> | Production | Exadata - Physical, Exadata - Management Domain | ALL | Bug 28279223 - exachk
DB Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
N/A | N/A | N/A | N/A | ALL | Linux | exachk 18.4.0 | N/A
Benefit / Impact:
For maximum functionality and alert notifications, all InfiniBand switch sensors should be functioning properly. The impact of verifying there are no unhealthy InfiniBand switch sensors is minimal. The impact of correcting failed sensors varies by failed component.
InfiniBand switch functionality may be reduced depending upon which components have failed.
Action / Repair:
To verify there are no unhealthy InfiniBand switch sensors, as the "root" userid on each InfiniBand switch execute the following code set:
# RAW_OUTPUT must already hold the switch sensor report (the command that fills it is not shown here)
if [ $(echo "$RAW_OUTPUT" | egrep -ic "WARNING|FAILURE") -eq 0 ]; then
  echo -e "SUCCESS: there are no unhealthy InfiniBand switch sensors"
else
  echo -e "FAILURE: there are one or more unhealthy InfiniBand switch sensors.  Details:\n\n$RAW_OUTPUT"
fi
The expected output is the following:
SUCCESS: there are no unhealthy InfiniBand switch sensors
Example of a FAIL result:
FAILURE: there are one or more unhealthy InfiniBand switch sensors.  Details:

WARNING PSU 1 present AC Loss
FAILURE - 1 sensors NOT OK
Corrective actions vary depending upon the failed component. Refer to the appropriate switch documentation, and if necessary open an SR for assistance.

Refer to MOS 1682501.1 if non-Exadata components are in use on the InfiniBand fabric

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System Platform | Bug(s)
Critical | WARN | 09/05/18 | <Name> | Production | Exadata - Physical, Exadata - Management Domain | ALL | Bug 28108851 - exachk
DB Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
N/A | N/A | N/A | N/A | ALL | Linux | exachk 18.4.0 | N/A
Benefit / Impact:
If non-Exadata components are in use on the same InfiniBand fabric as an Exadata environment, then there are additional configuration considerations between the components. Verifying these additional considerations helps to ensure the InfiniBand fabric is stable and performs well.
Not referring to MOS 1682501.1 can result in potential InfiniBand fabric instability and poor performance which may cause components in the Exadata environment to crash. Problems during patching can also occur.
Action / Repair:
To determine if non-Exadata components are discovered on the InfiniBand fabric execute the following code set as the "root" userid on one database server in the Exadata environment:
# VT_OUTPUT (the topology verification output) and SPAN (the number of lines to display) are set by code not shown here
DETECTED_LINE_NUMBER=$(echo "$VT_OUTPUT" | egrep -ni "detected and ignored" | cut -d":" -f1)
NONEXADATA_OUTPUT=$(echo "$VT_OUTPUT" | egrep -i "detected and ignored" -A $SPAN | grep -v "^Detected")
if [ -z "$DETECTED_LINE_NUMBER" ]; then
  echo -e "SUCCESS: There were no non-Exadata InfiniBand components discovered."
else
  echo -e "WARNING: One or more non-Exadata InfiniBand components were discovered:\n\n$NONEXADATA_OUTPUT"
fi

The expected output is the following:

SUCCESS: There were no non-Exadata InfiniBand components discovered.
Example of a "WARNING" result:

WARNING: One or more non-Exadata IB components were discovered:
Ca      : 0x0010e0605308c000 ports 2 "SUN IB QDR GW switch <host>-sw-ib2  Bridge 0"
Ca      : 0x0010e0605308c040 ports 2 "SUN IB QDR GW switch <host>-sw-ib2  Bridge 1"
Ca      : 0x0010e00001757140 ports 2 "<host>-bda10-adm BDA xx.xx.xx.200 HCA-1"
Ca      : 0x0010e0000178e640 ports 2 "<host>-bda09-adm BDA xx.xx.xx.199 HCA-1"
Ca      : 0x0010e0000187b6e8 ports 2 "<host>-bda12 BDA HCA-1"
Ca      : 0x0010e00001757ad0 ports 2 "<host>-bda11-adm BDA xx.xx.xx.201 HCA-1"
Ca      : 0x0010e00001878808 ports 2 "<host>-bda13 BDA HCA-1"
Ca      : 0x0010e000017723d0 ports 2 "<host>-bda14 BDA HCA-1"
Ca      : 0x0010e0000187a638 ports 2 "<host>-bda15 BDA HCA-1"
Ca      : 0x0010e00001757050 ports 2 "<host>-bda16 BDA HCA-1"
Ca      : 0x0010e00001757090 ports 2 "<host>-bda17 BDA HCA-1"
Ca      : 0x0010e0000178e5f0 ports 2 "<host>-bda18-adm BDA xx.xx.xx.152 HCA-1"
Ca      : 0x0010e0000178e600 ports 2 "<host>-bda08-adm BDA xx.xx.xx.198 HCA-1"
Ca      : 0x0010e00001756fa0 ports 2 "<host>-bda07-adm BDA HCA-1"
Ca      : 0x0010e00001757070 ports 2 "<host>-bda05-adm BDA xx.xx.xx.195 HCA-1"
Ca      : 0x0010e000017573d0 ports 2 "<host>-bda06-adm BDA xx.xx.xx.196 HCA-1"
Ca      : 0x0010e000017572a0 ports 2 "<host>-bda03-adm BDA xx.xx.xx.193 HCA-1"
Ca      : 0x0010e00001756fb0 ports 2 "<host>-bda04-adm BDA xx.xx.xx.194 HCA-1"
Ca      : 0x0010e0000174f0e0 ports 2 "<host>-bda01-adm BDA xx.xx.xx.191 HCA-1"
Ca      : 0x0010e0000174e170 ports 2 "<host>-bda02-adm BDA xx.xx.xx.192 HCA-1"
Ca      : 0x0010e0602e08c000 ports 2 "SUN IB QDR GW switch <host>-sw-ib3  Bridge 0"
Ca      : 0x0010e0602e08c040 ports 2 "SUN IB QDR GW switch <host>-sw-ib3  Bridge 1"
If a "WARNING" result is returned, please refer to: Setting up the Subnet Manager in a multi-rack cabling configuration containing Exalogic/Big Data Appliance and Exadata/SuperCluster (Doc ID 1682501.1)
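
When many components are reported, a short list of just the node descriptions is easier to compare against the site inventory. The following is a minimal sketch, assuming the "Ca : ... "description"" line layout shown in the example above; the function name is hypothetical.

```shell
# Hypothetical helper: print the quoted node description from each "Ca :" line.
list_ca_descriptions () {
  # With a double quote as the field separator, field 2 is the text between the quotes.
  echo "$1" | awk -F'"' '/^Ca[ \t]*:/ {print $2}'
}

# Sample data copied from the "WARNING" example in this note.
SAMPLE_OUTPUT='Ca      : 0x0010e0605308c000 ports 2 "SUN IB QDR GW switch <host>-sw-ib2  Bridge 0"
Ca      : 0x0010e00001757140 ports 2 "<host>-bda10-adm BDA xx.xx.xx.200 HCA-1"
Ca      : 0x0010e0000187b6e8 ports 2 "<host>-bda12 BDA HCA-1"'

list_ca_descriptions "$SAMPLE_OUTPUT"
```

On a live system, the text held in NONEXADATA_OUTPUT would be passed in place of the sample data.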

Verify the ib_sdp module is not loaded into the kernel

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System Platform | Bug(s)
Critical | FAIL | 02/20/19 | <Name> | Production | Exadata - Physical, Exadata - Management Domain | ALL | Bug 29157366 - exachk
DB Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
N/A | N/A | N/A | N/A | ALL | Linux | exachk 19.1.0 | N/A
Benefit / Impact:
The Socket Direct Protocol (SDP) developed by the OpenFabric Enterprise Distribution (OFED) group Mellanox is no longer supported. There are open issues with SDP and operating system stability that will not be resolved.
For performance and stability, the ib_sdp module should not be loaded into the kernel. The impact of verifying the ib_sdp module is not loaded into the kernel is minimal. Modifying a system to not load the ib_sdp module requires a reboot.

NOTE: for Exadata versions or greater, the ib_sdp module should not be loaded into the kernel.
NOTE: for Exadata versions 12.1.x.x.x or lower, it is recommended the ib_sdp module not be loaded into the kernel. However, if the ib_sdp module is loaded against this recommendation, then the option "sdp_apm_enable" must be set to "0". While the original Automatic Path Migration (APM) issue was reported when Exalogic application servers were accessing an Oracle Exadata Database Machine using SDP, ANY client requesting a connection using SDP with APM enabled to an Oracle Exadata Database Machine will eventually cause the connection to hang on the database server.
System instability, poor performance, and potential node evictions are likely if the ib_sdp module is loaded into the kernel.

Action / Repair:
To verify the ib_sdp module is not loaded, as the "root" userid on each database server execute the following code set:
EXADATA_VERSION=$(imageinfo -version | cut -d"." -f1-5 | tr -d .)
LSMOD_DATA=$(/sbin/lsmod | egrep -i ^ib_sdp)
if [[ $EXADATA_VERSION -ge 122000 ]]
then
  if [ -z "$LSMOD_DATA" ] 
  then
    echo -e "SUCCESS: The ib_sdp module is not loaded into the kernel"
  else
    echo -e "FAILURE: The ib_sdp module is loaded into the kernel.  Details:\n$LSMOD_DATA"
  fi
else
  if [ -z "$LSMOD_DATA" ] 
  then
    echo "SUCCESS: The ib_sdp module is not loaded into the kernel"
  else
    CODE_LINE=$(echo $EXADATA_VERSION | cut -c1-2)
    KERNEL_TYPE=$(uname -r | cut -d"." -f6)
    # /etc/modprobe.conf matches the FAILURE example in this note; the el6uek location is an ASSUMPTION
    if [ $KERNEL_TYPE = "el5uek" ]
    then IB_SDP_FILE=/etc/modprobe.conf
    elif [ $KERNEL_TYPE = "el6uek" ]
    then IB_SDP_FILE=/etc/modprobe.d/exadata.conf
    else
      echo -e "ERROR: unable to determine IB_SDP_FILE: $KERNEL_TYPE"
    fi
    IB_SDP_FILE_OUTPUT=$(egrep "ib_sdp" $IB_SDP_FILE)
    if [ -s /sys/module/ib_sdp/parameters/sdp_apm_enable ]
    then
      IB_SDP_KERNEL_OUTPUT_RSLT=$(cat /sys/module/ib_sdp/parameters/sdp_apm_enable)
    else
      IB_SDP_KERNEL_OUTPUT_RSLT="/sys/module/ib_sdp/parameters/sdp_apm_enable not found"
    fi
    if [[ $CODE_LINE -eq 11 && $EXADATA_VERSION -lt 112331 || $CODE_LINE -eq 12 && $EXADATA_VERSION -lt 121111 ]]
    then
      if [ $(echo "$IB_SDP_FILE_OUTPUT" | egrep "sdp_apm_enable*.=0" | wc -l) -eq 1 ]    
      then IB_SDP_FILE_OUTPUT_RSLT=0
      else IB_SDP_FILE_OUTPUT_RSLT=1
      fi
      if [[ "$IB_SDP_FILE_OUTPUT_RSLT" = 0 && "$IB_SDP_KERNEL_OUTPUT_RSLT" = 0 ]]
      then
        echo -e "SUCCESS: ib_sdp is loaded and sdp_apm_enable is set to 0 in $IB_SDP_FILE and running kernel."
        echo -e "$IB_SDP_FILE:  $IB_SDP_FILE_OUTPUT"
        echo -e "Running Kernel:  $IB_SDP_KERNEL_OUTPUT_RSLT"
      else
        echo -e "FAILURE: ib_sdp is loaded and sdp_apm_enable should be set to 0 in $IB_SDP_FILE and running kernel."
        echo -e "$IB_SDP_FILE: $IB_SDP_FILE_OUTPUT"
        echo -e "Running Kernel:  $IB_SDP_KERNEL_OUTPUT_RSLT"
      fi
    else
      if [ $(echo "$IB_SDP_FILE_OUTPUT" | egrep "sdp_apm_enable*.=0" | wc -l) -eq 0 ]    
      then IB_SDP_FILE_OUTPUT_RSLT=0
      else IB_SDP_FILE_OUTPUT_RSLT=1
      fi
      if [[ "$IB_SDP_FILE_OUTPUT_RSLT" = 0 && "$IB_SDP_KERNEL_OUTPUT_RSLT" = 0 ]]
      then
        echo -e "SUCCESS: ib_sdp is loaded and sdp_apm_enable is not set in $IB_SDP_FILE and is set to \"0\" in the running kernel."
        echo -e "$IB_SDP_FILE:  $IB_SDP_FILE_OUTPUT"
        echo -e "Running Kernel:  $IB_SDP_KERNEL_OUTPUT_RSLT"
      else
        echo -e "FAILURE: ib_sdp is loaded and sdp_apm_enable should not be set in $IB_SDP_FILE and should be \"0\" in the running kernel."
        echo -e "$IB_SDP_FILE: $IB_SDP_FILE_OUTPUT"
        echo -e "Running Kernel:  $IB_SDP_KERNEL_OUTPUT_RSLT"
      fi
    fi
  fi
fi
The expected output is the following:
SUCCESS: The ib_sdp module is not loaded into the kernel
Example of a FAIL result:
FAILURE: ib_sdp is loaded and sdp_apm_enable should be set to 0 in /etc/modprobe.conf and running kernel.
Running Kernel:  0
NOTE: To correct a "FAILURE" result, place the text "SDP_LOAD=no" into the file "/etc/rdma/rdma.conf" and reboot the database server. 
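
Before scheduling the reboot, it can be worth confirming that the setting is actually present in the file. The following is a minimal sketch; the function name is hypothetical, and the demonstration runs against a temporary file with made-up contents rather than the real "/etc/rdma/rdma.conf".

```shell
# Hypothetical helper: succeed only if the given file contains an uncommented SDP_LOAD=no line.
sdp_load_disabled () {
  grep -Eq '^SDP_LOAD=no[[:space:]]*$' "$1"
}

# Demonstration against a temporary file (illustrative contents only).
CONF=$(mktemp)
printf 'IPOIB_LOAD=yes\nSDP_LOAD=no\n' > "$CONF"
if sdp_load_disabled "$CONF"; then
  echo "SDP_LOAD=no is set"
else
  echo "SDP_LOAD=no is missing"
fi
rm -f "$CONF"
```

On a database server the call would be sdp_load_disabled /etc/rdma/rdma.conf.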
Verify all voting disks are online
Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System Platform | Bug(s)
Critical | FAIL | 05/29/19 | Vern Wagman | Production | Exadata - Physical, Exadata - User Domain | ALL | 29779386 - exachk
DB/GI Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
or higher | ASM | N/A | N/A | N/A | Linux | exachk 19.3.0 | N/A
Benefit / Impact:
Voting disks help ensure a stable cluster. The impact of verifying all voting disks are online is minimal. The impact of bringing a given voting disk back online depends upon the reason why it went offline, and cannot be estimated here.
Not having all expected voting disks online increases the risk of node eviction or cluster crash.
Action / Repair:
To verify all voting disks are online, as the grid home owner userid, and with CRS_HOME and SID set to access the ASM instance, execute the following code on one database server in the cluster:
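VOTEDISK_OUTPUT=$($CRS_HOME/bin/crsctl query css votedisk)
LOCATED_COUNT=$(echo "$VOTEDISK_OUTPUT" | egrep "^Located" | cut -d" " -f2)
ONLINE_COUNT=$(echo "$VOTEDISK_OUTPUT" | egrep -c "^ *[0-9]+\. +ONLINE ")  # assumed: count of disks reported ONLINE
if [ "$LOCATED_COUNT" -eq "$ONLINE_COUNT" ]; then
  echo -e "SUCCESS: all voting disks are online."
else
  echo -e "FAILURE: not all voting disks are online.\nDETAILS:\n$VOTEDISK_OUTPUT"
fi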
The expected output should be: 
SUCCESS: all voting disks are online.
Example of a "FAILURE" case: 
FAILURE: not all voting disks are online.
##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   a07c741f08194f71bf7f4d14c7d67a15 (/dev/exadata_quorum/QD_DATAC1_RANDOM05ADM05) [DATAC1]
 2. ONLINE   d1327820402f4f2fbffca97cbdef72d7 (/dev/exadata_quorum/QD_DATAC1_RANDOM05ADM06) [DATAC1]
 3. ONLINE   748b53cfb1a64f6cbff0f71de2de89b3 (o/; [DATAC1]
 4. ONLINE   5fbc672724094f82bfcd4ea220ab824a (o/; [DATAC1]
 5. OFFLINE   e9efd3be40ad4f64bfd034233f3e37d3 (o/; [DATAC1]
If a "FAILURE" result is returned, investigate to determine root cause and take appropriate corrective action.
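
To narrow the investigation, the voting disks that are not ONLINE can be extracted from the details. The following is a minimal sketch, assuming the numbered row layout shown in the example above; the function name is hypothetical, and the sample rows are copied from this note as they appear.

```shell
# Hypothetical helper: print the File Universal Id of every voting disk that is not ONLINE.
list_offline_votedisks () {
  # Data rows start with a sequence number such as "5."; field 2 is the state, field 3 the id.
  echo "$1" | awk '$1 ~ /^[0-9]+\.$/ && $2 != "ONLINE" {print $3}'
}

# Sample data copied from the "FAILURE" example in this note.
SAMPLE_OUTPUT='##  STATE    File Universal Id                File Name Disk group
--  -----    -----------------                --------- ---------
 1. ONLINE   a07c741f08194f71bf7f4d14c7d67a15 (/dev/exadata_quorum/QD_DATAC1_RANDOM05ADM05) [DATAC1]
 5. OFFLINE   e9efd3be40ad4f64bfd034233f3e37d3 (o/; [DATAC1]'

list_offline_votedisks "$SAMPLE_OUTPUT"
```

For the sample data this prints the single id e9efd3be40ad4f64bfd034233f3e37d3.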
Verify available ksplice fixes are installed
Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System Platform | Bug(s)
Critical | FAIL | 08/14/19 | Doug Utzig | Production | Exadata - Physical, Exadata - Management Domain, Exadata - User Domain, RA | ALL | 30185190 - exachk
DB Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
ALL | ALL | ALL | ALL | >=, >= | Linux | exachk 19.3.0 | N/A
Benefit / Impact:
On Exadata systems some Oracle Linux operating system updates are delivered via ksplice. All available ksplice updates should be installed to ensure issues fixed in the installed Exadata release are not encountered.
Not having all available ksplice updates installed can lead to unexpected behavior caused by encountering issues that are expected to be fixed in the installed Exadata release. The risk of checking that all available ksplice updates are installed is minimal.
Action / Repair:
To verify all available ksplice updates are installed run the following command set as the root user on each storage and database server in the cluster:
The expected output is the following:
-- OR --
Example of a FAIL result:
If there are available ksplice updates not installed then run uptrack-install as the root user, as follows:

The Flash 20 card supports ESM lifetime to enable proactive replacement before failure.
The impact of verifying that the ESM lifetime is within specification is minimal. Replacing an ESM requires a storage server outage. The database and application may remain available if the appropriate grid disks are properly inactivated before and activated after the storage server outage. Refer to MOS Note 1188080.1 and "Shutting Down Exadata Storage Server" in Chapter 7 of "Oracle® Exadata Database Machine Owner's Guide 11g Release 2 (11.2) E13874-14" for additional details.
Failure of the ESM will put the Flash 20 card in WriteThrough mode which has a high impact on performance.
To verify the ESM lifetime value, use the following command on the storage servers:
for RISER in RISER1/PCIE1 RISER1/PCIE4 RISER2/PCIE2 RISER2/PCIE5; do ipmitool sunoem cli "show /SYS/MB/$RISER/F20CARD/UPTIME"; done | grep value -A4

The output will be similar to:
 value = 3382.350 Hours
 upper_nonrecov_threshold = 17500.000 Hours
 upper_critical_threshold = 17200.000 Hours
 upper_noncritical_threshold = 16800.000 Hours
 lower_noncritical_threshold = N/A
 -- <output truncated>

If the "value" reported exceeds the "upper_noncritical_threshold" reported, schedule a replacement of the relevant ESM.
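
The comparison described above can be sketched as a small filter over the ipmitool output. This is an illustration only: the function name is hypothetical, it assumes each "value" line is followed by its own "upper_noncritical_threshold" line as in the sample, and it skips thresholds reported as "N/A".

```shell
# Hypothetical helper: compare each "value" with the "upper_noncritical_threshold" that follows it.
check_esm_lifetime () {
  echo "$1" | awk '
    $1 == "value" { value = $3 }
    $1 == "upper_noncritical_threshold" && $3 != "N/A" {
      if (value + 0 > $3 + 0) {
        print "REPLACE: " value " hours exceeds " $3
      } else {
        print "OK: " value " hours is within " $3
      }
    }'
}

# Sample data copied from the example output in this note.
SAMPLE_OUTPUT=' value = 3382.350 Hours
 upper_nonrecov_threshold = 17500.000 Hours
 upper_critical_threshold = 17200.000 Hours
 upper_noncritical_threshold = 16800.000 Hours
 lower_noncritical_threshold = N/A'

check_esm_lifetime "$SAMPLE_OUTPUT"
```

One result line is printed per card, so output covering all four risers yields up to four lines.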
NOTE: There is a bug in ILOM firmware version which may report "Invalid target..." for "RISER1/PCIE4". If that happens, consult your site maintenance records to verify the age of the ESM.

NOTE: For Aura II (F20 M2) cards, the CPLD reports the End of Life indication on the F20 M2 cards, so the thresholds for UPTIME sensor are not needed. The threshold values are replaced with "N/A". The ILOM will fault the system when it's time to replace the F20 M2's ESM. Beginning with 2.1.3, exachk does not execute this check on F20 M2 cards. Beginning with 2.1.5, exachk posts a message in the html report detail that the card is an F20M2 model and the check is not applicable.

Verify Database Server Disk Controller Configuration (ARCHIVE)

Archive Date: 10/01/12
Archive Reason: Beginning with the configuration of the database server disk drives was changed to have all available disk drives in a RAID-5 configuration with no hot spare.

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | N/A | X2-2(4170), X2-2, X2-8 | Linux | 11.2.x + | 11.2.x +

Benefit / Impact:
For X2-2, there are 4 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 3 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
For X2-8, there are 8 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 7 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
The impact of validating the RAID devices is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.
Not verifying the RAID devices increases the chance of a performance degradation or an outage.
Action / Repair:
To verify the database server disk controller configuration, use the following command:
/opt/MegaRAID/MegaCli/MegaCli64 AdpAllInfo -aALL | grep "Device Present" -A 8 

For X2-2, the output will be similar to:

 Device Present
 Virtual Drives : 1
 Degraded : 0
 Offline : 0
 Physical Devices : 5
 Disks : 4
 Critical Disks : 0
 Failed Disks : 0 

The expected output is 1 virtual drive, none degraded or offline, 5 physical devices (controller + 4 disks), 4 disks, and no critical or failed disks.
For X2-8, the output will be similar to:
 Device Present
 Virtual Drives : 1
 Degraded : 0
 Offline : 0
 Physical Devices :11
 Disks : 8
 Critical Disks : 0
 Failed Disks : 0 

The expected output is 1 virtual drive, none degraded or offline, 11 physical devices (1 controller + 8 disks + 2 SAS2 expansion ports), 8 disks, and no critical or failed disks.
On X2-8, there is a SAS2 expander on each NEM, which takes in the 8 ports from the Niwot REM and expands them out to both the 8 physical drive slots through the midplane and the 2 external SAS2 expansion ports on each NEM (these devices are reported in the MegaRAID FW event log).
If the reported output differs, investigate and correct the condition.
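The comparison against the expected values can be scripted. The following is a minimal sketch, not an Oracle-supplied tool: the function name check_device_present is hypothetical, and it assumes the "Device Present" summary has the label/value layout shown above. It reads the MegaCli output on standard input and takes the expected "Physical Devices" and "Disks" counts as arguments (5 and 4 for X2-2, 11 and 8 for X2-8).

```shell
#!/bin/sh
# Hypothetical helper (not part of MegaCli or exachk).
# Usage: <Device Present summary> | check_device_present PHYSICAL_DEVICES DISKS
check_device_present() {
    awk -F: -v phys="$1" -v disks="$2" '
        BEGIN { bad = 0 }
        # Report a label that is missing or whose value differs from "want".
        function expect(key, want) {
            if (!(key in seen) || seen[key] + 0 != want + 0) {
                printf "FAIL: %s is \"%s\", expected %s\n", key, seen[key], want
                bad = 1
            }
        }
        {
            # Split each line at the colon and trim blanks on both sides.
            key = $1; val = $2
            gsub(/^[ \t]+|[ \t]+$/, "", key)
            gsub(/^[ \t]+|[ \t]+$/, "", val)
            seen[key] = val
        }
        END {
            expect("Virtual Drives", 1)
            expect("Degraded", 0)
            expect("Offline", 0)
            expect("Physical Devices", phys)
            expect("Disks", disks)
            expect("Critical Disks", 0)
            expect("Failed Disks", 0)
            if (!bad) print "PASS"
            exit bad
        }'
}
```

For example, on an X2-2 database server the function could be fed as follows:
/opt/MegaRAID/MegaCli/MegaCli64 AdpAllInfo -aALL | grep "Device Present" -A 8 | check_device_present 5 4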

Verify Database Server Virtual Drive Configuration (ARCHIVE)

Archive Date: 10/01/12
Archive Reason: Beginning with the configuration of the database server disk drives was changed to have all available disk drives in a RAID-5 configuration with no hot spare.

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | N/A | X2-2(4170), X2-2, X2-8 | Linux | 11.2.x + | 11.2.x +

Benefit / Impact:
For X2-2, there are 4 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 3 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
For X2-8, there are 8 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 7 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
The impact of validating the virtual drives is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.
Not verifying the virtual drives increases the chance of a performance degradation or an outage.
Action / Repair:
To verify the database server virtual drive configuration, use the following command:
/opt/MegaRAID/MegaCli/MegaCli64 CfgDsply -aALL | grep "Virtual Drive:";/opt/MegaRAID/MegaCli/MegaCli64 CfgDsply -aALL | grep "Number Of Drives";/opt/MegaRAID/MegaCli/MegaCli64 CfgDsply -aALL | grep "^State" 

For X2-2 the output should be similar to:

Virtual Drive: 0 (Target Id: 0)
Number Of Drives : 3
State : Optimal

The expected result is that the virtual device has 3 drives and a state of optimal.
For X2-8, the output should be similar to:
Virtual Drive: 0 (Target Id: 0) 
Number Of Drives : 7 
State : Optimal 

The expected result is that the virtual device has 7 drives and a state of optimal.
If the reported output differs, investigate and correct the condition.
NOTE: The virtual device number reported may vary depending upon configuration and version levels.
NOTE: If a bare metal restore procedure is performed on a database server without using the "dualboot=no" configuration, that database server may be left with three virtual devices for X2-2 and seven for X2-8. Please see My Oracle Support note 1323309.1 for additional information and correction instructions.
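As an illustration only, the three filtered lines can be checked mechanically. This sketch assumes the filtered CfgDsply output has exactly the form shown above; check_virtual_drive is a hypothetical name, and the expected drive count (3 for X2-2, 7 for X2-8) is passed as an argument. More than one "Virtual Drive:" line, as can happen after a bare metal restore without "dualboot=no", is reported as a failure.

```shell
#!/bin/sh
# Hypothetical helper (not part of MegaCli or exachk).
# Usage: <filtered CfgDsply output> | check_virtual_drive EXPECTED_DRIVES
check_virtual_drive() {
    vd_expected="$1"
    vd_input=$(cat)
    # Count the virtual drives that were reported.
    vd_count=$(printf '%s\n' "$vd_input" | grep -c '^Virtual Drive:')
    # Take the first drive count and the first state that were reported.
    vd_drives=$(printf '%s\n' "$vd_input" |
        sed -n 's/^Number Of Drives *: *\([0-9][0-9]*\).*/\1/p' | head -n 1)
    vd_state=$(printf '%s\n' "$vd_input" |
        sed -n 's/^State *: *\([A-Za-z ]*[A-Za-z]\).*/\1/p' | head -n 1)
    if [ "$vd_count" -eq 1 ] && [ "$vd_drives" = "$vd_expected" ] &&
       [ "$vd_state" = "Optimal" ]; then
        echo "PASS"
    else
        echo "FAIL: virtual drives=$vd_count, drives=${vd_drives:-?} (expected $vd_expected), state=${vd_state:-?}"
        return 1
    fi
}
```

The function expects the combined output of the three grep commands above on standard input, for example "... | check_virtual_drive 3" on an X2-2 database server.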

Verify Database Server Physical Drive Configuration (ARCHIVE)

Archive Date: 10/01/12
Archive Reason: Beginning with the configuration of the database server disk drives was changed to have all available disk drives in a RAID-5 configuration with no hot spare.

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | N/A | X2-2(4170), X2-2, X2-8 | Linux | 11.2.x + | 11.2.x +

Benefit / Impact:
For X2-2, there are 4 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 3 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
For X2-8, there are 8 disk drives in a database server controlled by an LSI MegaRAID SAS 9261-8i disk controller. The disks are configured RAID-5 with 7 disks in the RAID set and 1 disk as a hot spare. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
The impact of validating the physical drives is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.
Not verifying the physical drives increases the chance of a performance degradation or an outage.
Action / Repair:
To verify the database server physical drive configuration, use the following command:
/opt/MegaRAID/MegaCli/MegaCli64 PDList -aALL | grep "Firmware state"

The output for X2-2 will be similar to:
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Hotspare, Spun down

There should be three lines of output showing a state of "Online, Spun Up", and one line showing a state of "Hotspare, Spun down". The ordering of the output lines is not significant and may vary based upon a given database server's physical drive replacement history.
The output for X2-8 will be similar to:
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Hotspare, Spun down

There should be seven lines of output showing a state of "Online, Spun Up", and one line showing a state of "Hotspare, Spun down". The ordering of the output lines is not significant and may vary based upon a given database server's physical drive replacement history.
If the reported output differs, investigate and correct the condition.
NOTE: Modified 03/21/12
Occasionally in normal operation, the "Hotspare" physical drive may be brought to a state of "Online, Spun Up". Thirty minutes (default) after the operation that brought the drive to "Online, Spun Up" has completed, the drive should spin down due to the powersaving feature. There is no harm for the drive to be "Online, Spun Up" if there are no other errors reported in the disk drive configuration checks.

For additional information, please reference My Oracle Support note "Exadata: Hot Spares Not Spinning Down (Doc ID 1403613.1)"
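Counting the "Firmware state" lines by hand is error prone on an X2-8, so a small tally can help. The following is a sketch under stated assumptions: check_pd_states is a hypothetical name, the input is the filtered PDList output shown above, and the expected numbers of online drives and hot spares are passed as arguments (3 and 1 for X2-2, 7 and 1 for X2-8). A hot spare that has temporarily changed state, as described in the note above, is not treated specially and still needs manual interpretation.

```shell
#!/bin/sh
# Hypothetical helper (not part of MegaCli or exachk).
# Usage: <filtered PDList output> | check_pd_states ONLINE HOTSPARES
check_pd_states() {
    awk -v want_online="$1" -v want_spare="$2" '
        /^Firmware state: Online, Spun Up/ { online++ }
        /^Firmware state: Hotspare/        { spare++ }
        /^Firmware state:/                 { total++ }
        END {
            # Every reported drive must be either online or a hot spare.
            if (online + 0 == want_online + 0 &&
                spare + 0 == want_spare + 0 &&
                total + 0 == online + spare) {
                print "PASS"
            } else {
                printf "FAIL: %d online, %d hot spare, %d other\n",
                       online, spare, total - online - spare
                exit 1
            }
        }'
}
```

For example: /opt/MegaRAID/MegaCli/MegaCli64 PDList -aALL | grep "Firmware state" | check_pd_states 3 1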

Verify Peripheral Component Interconnect (PCI) Bridges are Configured for Generation II on Storage Servers (ARCHIVE)

Archive Date: 10/24/12
Archive Reason: Beginning with the X4270 M3 storage servers shipped with the X3-2 and X3-8 database machines, there is a different PCI architecture and this issue is not relevant to the new hardware.

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | 09/13/11 | X2-2(4170), X2-2, X2-8 | Linux, Solaris | 11.2.x + | 11.2.x +

Benefit / Impact:
The storage server PCI bridges (19:0.0 and 27:0.0) should be configured for generation II for maximum performance.
There is minimal impact to verify the PCI Bridges configuration.
If the PCI bridges are not configured for generation II, performance will be sub-optimal.
Action / Repair:
To verify the current PCI bridges configuration, execute the following command as the root userid on all storage servers:
for BUS_NUM in 19:0.0 27:0.0; do echo $BUS_NUM `lspci -xxx -s $BUS_NUM | grep ^50 | cut -d" " -f4`; done

The output should be similar to:
19:0.0 82
27:0.0 82

If any of the storage server PCI bridges do not return "82", there are three possible corrective actions:
If the value returned is "81", you may upgrade to Exadata storage server software version or greater, or refer to MOS note 1351559.1.
If neither the value "81" nor "82" is returned, contact Oracle Support for further assistance.
NOTE: PCI Bridge generation I will return the value "81".
[NOTE: INTERNAL ONLY - manual instructions are also listed in exachk bug 12756149.]
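The interpretation of the returned values can be expressed as a small filter. This is an illustrative sketch only: interpret_pci_bridges is a hypothetical name, and it assumes input lines of the form "<bus> <value>" as produced by the loop above. It maps "82" to generation II, "81" to generation I, and anything else to the "contact Oracle Support" case, and returns a non-zero status if any bridge is not at "82".

```shell
#!/bin/sh
# Hypothetical helper (not part of the Exadata software).
# Usage: <output of the lspci loop> | interpret_pci_bridges
interpret_pci_bridges() {
    rc=0
    while read -r bus value; do
        case "$value" in
            82) echo "$bus: generation II (expected)" ;;
            81) echo "$bus: generation I (upgrade or see MOS note 1351559.1)"
                rc=1 ;;
            *)  echo "$bus: unexpected value '$value' (contact Oracle Support)"
                rc=1 ;;
        esac
    done
    return $rc
}
```

The output of the "for BUS_NUM in ..." loop above can be piped directly into interpret_pci_bridges.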

Verify Database Server Disk Controller Configuration (ARCHIVE)

Archive Date: 03/06/13
Archive Reason: Beginning with the Exadata software version, the reclamation of the hotspare device mandated in, was made optional for those customers upgrading from a version below directly to or higher.

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | 10/1/2012 | X2-2(4170), X2-2, X2-8, X3-2, X3-8 | Linux | 11. + | 11.2.x +

Benefit / Impact:
An X3-2 or X2-2 database server contains 4 disk drives in a RAID-5 configuration. An X3-8 or X2-8 database server contains 8 disk drives in a RAID-5 configuration. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
The impact of validating the RAID devices is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.
Not verifying the RAID devices increases the chance of a performance degradation or an outage.
Action / Repair:
To verify the database server disk controller configuration, use the following command:
/opt/MegaRAID/MegaCli/MegaCli64 AdpAllInfo -aALL | grep "Device Present" -A 8 

For an X3-2 or X2-2 database server, the output will be similar to:

 Device Present
 Virtual Drives : 1
 Degraded : 0
 Offline : 0
 Physical Devices : 5
 Disks : 4
 Critical Disks : 0
 Failed Disks : 0 

The expected output is 1 virtual drive, none degraded or offline, 5 physical devices (controller + 4 disks), 4 disks, and no critical or failed disks.
For an X3-8 or X2-8 database server, the output will be similar to:
 Device Present
 Virtual Drives : 1
 Degraded : 0
 Offline : 0
 Physical Devices :11
 Disks : 8
 Critical Disks : 0
 Failed Disks : 0 

The expected output is 1 virtual drive, none degraded or offline, 11 physical devices (1 controller + 2 SAS2 expansion ports + 8 disks), 8 disks, and no critical or failed disks.
If the reported output differs, investigate and correct the condition.

NOTE: If additional virtual drives or a "hot spare" are present, it may be that the reclaimdisks procedure was not executed at deployment time or that a bare metal restore procedure was performed without using the "dualboot=no" qualifier. Please refer to the "Reclaiming Disks for the Linux Operating System" section of "Oracle® Exadata Database Machine Owner's Guide, 11g Release 2 (11.2)".

Verify Database Server Virtual Drive Configuration (ARCHIVE)

Archive Date: 03/06/13
Archive Reason: Beginning with the Exadata software version, the reclamation of the hotspare device mandated in, was made optional for those customers upgrading from a version below directly to or higher.

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | 10/1/2012 | X2-2(4170), X2-2, X2-8, X3-2, X3-8 | Linux | 11. + | 11.2.x +

Benefit / Impact:
An X3-2 or X2-2 database server contains 4 disk drives in a RAID-5 configuration. An X3-8 or X2-8 database server contains 8 disk drives in a RAID-5 configuration. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
The impact of validating the virtual drives is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.
Not verifying the virtual drives increases the chance of a performance degradation or an outage.
Action / Repair:
To verify the database server virtual drive configuration, use the following command:
/opt/MegaRAID/MegaCli/MegaCli64 CfgDsply -aALL | grep "Virtual Drive:";/opt/MegaRAID/MegaCli/MegaCli64 CfgDsply -aALL | grep "Number Of Drives";/opt/MegaRAID/MegaCli/MegaCli64 CfgDsply -aALL | grep "^State" 

For an X3-2 or X2-2 database server, the output will be similar to:
Virtual Drive: 0 (Target Id: 0)
Number Of Drives : 4
State : Optimal

The expected result is that the virtual device has 4 drives and a state of optimal.
For an X3-8 or X2-8 database server, the output will be similar to:
Virtual Drive: 0 (Target Id: 0) 
Number Of Drives : 8 
State : Optimal 

The expected result is that the virtual device has 8 drives and a state of optimal.
If the reported output differs, investigate and correct the condition.
NOTE: The virtual device number reported may vary depending upon configuration and version levels.

NOTE: If additional virtual drives or a "hot spare" are present, it may be that the reclaimdisks procedure was not executed at deployment time or that a bare metal restore procedure was performed without using the "dualboot=no" qualifier. Please refer to the "Reclaiming Disks for the Linux Operating System" section of "Oracle® Exadata Database Machine Owner's Guide, 11g Release 2 (11.2)".

NOTE: If the database server was upgraded to or higher, this check may fail because the reported number of drives is "3" or "7". Please see the "Known Issues" #5 "Hotspare removed for compute nodes" in My Oracle Support note 1468877.1 for corrective action.

Verify Database Server Physical Drive Configuration (ARCHIVE)

Archive Date: 03/06/13
Archive Reason: Beginning with the Exadata software version, the reclamation of the hotspare device mandated in, was made optional for those customers upgrading from a version below directly to or higher.

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | 10/1/2012 | X2-2(4170), X2-2, X2-8, X3-2, X3-8 | Linux | 11. + | 11.2.x +

Benefit / Impact:
An X3-2 or X2-2 database server contains 4 disk drives in a RAID-5 configuration. An X3-8 or X2-8 database server contains 8 disk drives in a RAID-5 configuration. There is 1 virtual drive created across the RAID set. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
The impact of validating the physical drives is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.
Not verifying the physical drives increases the chance of a performance degradation or an outage.
Action / Repair:
To verify the database server physical drive configuration, use the following command:
/opt/MegaRAID/MegaCli/MegaCli64 PDList -aALL | grep "Firmware state"

For an X3-2 or X2-2 database server, the output will be similar to:
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up

There should be 4 lines of output showing a state of "Online, Spun Up".
For an X3-8 or X2-8 database server, the output will be similar to:
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up 
Firmware state: Online, Spun Up

There should be 8 lines of output showing a state of "Online, Spun Up".
If the reported output differs, investigate and correct the condition.
NOTE: If additional virtual drives or a "hot spare" are present, it may be that the reclaimdisks procedure was not executed at deployment time or that a bare metal restore procedure was performed without using the "dualboot=no" qualifier. Please refer to the "Reclaiming Disks for the Linux Operating System" section of "Oracle® Exadata Database Machine Owner's Guide, 11g Release 2 (11.2)".

NOTE: If the database server was upgraded to or higher, this check may fail because one of the devices shows a state of: "Unconfigured(good), Spun Up". Please see the "Known Issues" #5 "Hotspare removed for compute nodes" in My Oracle Support note 1468877.1 for corrective action.

Verify processor.max_cstate=1 on database servers

Archive Date: 03/13/13
Archive Reason: Beginning with the Exadata software version fresh install or upgrade to, the ILOM version went to and this issue was resolved. This also does not apply to the current X3 series hardware.

Priority | Alert Level | Date | Owner | Status | Scope | Bug(s)
Critical | FAIL | 04/17/12 | Dan Norris | Production | Exadata | 14153949 - exachk
DB Version | DB Role | Engineered System | Exadata Version | OS & Version | Validation Tool Version | TBD
N/A | N/A | X2-2 | 11.2.x+ (ILOM < | - 11, Linux x86-64 UEK5.8 | exachk 2.2.2 |

Benefit / Impact:
The benefit of these settings is avoiding uncorrectable memory errors related to the deep C state features on Nehalem processors.
NOTE: Fresh images or higher automatically include these fixes. Systems upgraded from older original images should be manually upgraded by following the upgrade notes.

Without the proper configuration settings, memory errors may be reported.
Action / Repair:
If the database server has been upgraded to version or higher, it should be running ILOM version which includes fix for CR 7036024. Once that fix is installed, the kernel parameter is no longer required as the ILOM/BIOS incorporates the fix directly. Rather than checking for an image version, the proper check should be against the ILOM version directly.
To verify that processor.max_cstate=1 is set where required, as the "root" userid execute the following code on each database server:
##### begin script
UNAME_S=`/bin/uname -s`
DMIDECODE=`/usr/sbin/dmidecode -s system-product-name`
### this fixes weirdness with the way dmidecode returns its data
DMIDECODE=`echo $DMIDECODE`
### minimum ILOM version that includes the fix, written as four zero-padded
### two-digit fields (a version a.b.c.d becomes "0a0b0c0d"); the value is not
### given in this text and must be set before running
TARGET_ILOM_VER_X4170=""
### check basic requirements
if [ -z "$TARGET_ILOM_VER_X4170" ]; then
 echo "Set TARGET_ILOM_VER_X4170 before running this check, exiting"
elif [ "$UNAME_S" = "Linux" -a "$DMIDECODE" = "SUN FIRE X4170 M2 SERVER" ]; then
 ### verify the ILOM version - if or newer, can exit
 ILOM_VER=`ipmitool sunoem cli version | grep firmware | egrep -v 'build number|date:' | awk '{print $3}'`
 ILOM_VER1=`echo $ILOM_VER | awk -F. '{print $1}'`
 ILOM_VER2=`echo $ILOM_VER | awk -F. '{print $2}'`
 ILOM_VER3=`echo $ILOM_VER | awk -F. '{print $3}'`
 ILOM_VER4=`echo $ILOM_VER | awk -F. '{print $4}'`
 ### zero-pad single-digit fields so the versions compare as plain numbers
 if [ "$ILOM_VER1" -le 9 ]; then ILOM_VER1="0$ILOM_VER1"; fi
 if [ "$ILOM_VER2" -le 9 ]; then ILOM_VER2="0$ILOM_VER2"; fi
 if [ "$ILOM_VER3" -le 9 ]; then ILOM_VER3="0$ILOM_VER3"; fi
 if [ "$ILOM_VER4" -le 9 ]; then ILOM_VER4="0$ILOM_VER4"; fi
 LOCALVER="$ILOM_VER1$ILOM_VER2$ILOM_VER3$ILOM_VER4"
 if [ $TARGET_ILOM_VER_X4170 -gt $LOCALVER ]; then
  ### now we need to check for the parameter in /proc/cmdline
  PARAM_PRESENT=`grep processor.max_cstate=1 /proc/cmdline | wc -l`
  if [ $PARAM_PRESENT -eq 1 ]; then
   ### don't have fix via ILOM version, but have cmdline param
   echo "PASSED due to cmdline param"
  else
   ### don't have fix via ILOM, don't have fix via kernel cmdline param, failed check
   echo "FAILED"
  fi
 else
  ### already have the minimum ILOM version, so passed the check
  echo "PASSED due to minimum ILOM version"
 fi
else
 echo "This check is only for Linux-based X4170 M2 database servers, exiting"
fi
#### end script

The expected output is anything other than "FAILED".
To correct a "FAILED" condition:
1) Upgrade to newer versions of Exadata Software not impacted by this issue.
2) If an upgrade is not possible, to configure the proper settings, the kernel boot option "processor.max_cstate=1" should be added to the /boot/grub/grub.conf file on the "kernel" line so that it looks like this:
kernel /vmlinuz-2.6.18- root=LABEL=DBSYS ro bootarea=dbsys loglevel=7 panic=60 debug rhgb numa=off console=ttyS0,115200n8 console=tty1 crashkernel=128M@16M audit=1 processor.max_cstate=1 nomce

After this change, a system reboot is required to pick up the new setting.

Verify Software on Storage Servers ( (ARCHIVE)

Archive Date: 06/26/13
Archive Reason: Beginning with the Exadata software version fresh install or upgrade to, has been desupported by Exadata development.

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | N/A | X2-2(4170), X2-2, X2-8 | Linux, Solaris | 11.2.x + | 11.2.x +

Benefit / Impact:
Verifying the software configuration after initial deployment, upgrades, or patching and before the Oracle Exadata Database Machine is placed into or returned to production status can avoid problems related to the software modifications.
The overhead for these verification steps is minimal.
If the software is not validated, inconsistencies can lead to problems and outages.
Action / Repair:
To verify the storage server software configuration execute the following command as the root userid:
/opt/oracle.SupportTools/ -c 

The output will be similar to:
[INFO] SUCCESS: Meets requirements of operating platform and installed software for 
[INFO] below listed releases and patches of Exadata and of corresponding Database. 
[INFO] Check does NOT verify correctness of configuration for installed software.

Exadata: OracleDatabase: 

If any result other than "SUCCESS" is returned, investigate and correct the condition.
ravindra.dani: This is not always correct for database hosts. The software checker is only useful on freshly imaged database nodes. Also, this check is going to be retired by This check should not be run on the cells, and though it is not folded into cellcli, say at validate config, it should be.

Verify Software on InfiniBand Switches (

Archive Date: 06/26/13
Archive Reason: Beginning with the Exadata software version fresh install or upgrade to, has been desupported by Exadata development.

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | N/A | X2-2(4170), X2-2, X2-8 | Linux, Solaris | 11.2.x + | 11.2.x +

Benefit / Impact:
Verifying the software configuration after initial deployment, upgrades, or patching and before the Oracle Exadata Database Machine is placed into or returned to production status can avoid problems related to the software modifications.
The overhead for these verification steps is minimal.
If the software is not validated, problems may occur when the machine is utilized.
Action / Repair:
The commands required to verify the InfiniBand switches software configuration vary slightly by the physical configuration of the Oracle Exadata Database Machine. The key difference is whether or not the physical configuration includes a designated spine switch.
To verify the InfiniBand switches software configuration for an X2-8, a full rack Oracle Exadata Database Machine X2-2, or a late production model half rack Oracle Exadata Database Machine X2-2, with a designated spine switch properly configured per the "Oracle Exadata Database Machine Owner's Guide 11g Release 2 (11.2) E13874-15" with "sm_priority=8", and the name "RanDomsw-ib1", execute the following command as the "root" userid on one of the database servers:
/opt/oracle.SupportTools/ -I IS_SPINERanDomsw-ib1,RanDomsw-ib3,RanDomsw-ib2 

Where "RanDomsw-ib1, RanDomsw-ib3, and RanDomsw-ib2" are the switch names returned by the "ibswitches" command.
NOTE: There is no space between the "IS_SPINE" qualifier and the name of the designated spine switch.

The output will be similar to:
Checking if switch RanDomsw-ib1 is pingable...
Checking if switch RanDomsw-ib3 is pingable...
Checking if switch RanDomsw-ib2 is pingable...
Use the default password for all switches? (y/n) [n]: y
[INFO] SUCCESS Switch RanDomsw-ib1 has correct software and firmware version:
 SWVer: 1.3.3-2
[INFO] SUCCESS Switch RanDomsw-ib1 has correct opensm configuration:
 controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=8 

[INFO] SUCCESS Switch RanDomsw-ib3 has correct software and firmware version:
 SWVer: 1.3.3-2
[INFO] SUCCESS Switch RanDomsw-ib3 has correct opensm configuration:
 controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5 

[INFO] SUCCESS Switch RanDomsw-ib2 has correct software and firmware version:
 SWVer: 1.3.3-2
[INFO] SUCCESS Switch RanDomsw-ib2 has correct opensm configuration:
 controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5 

[INFO] SUCCESS All switches have correct software and firmware version:
 SWVer: 1.3.3-2
[INFO] SUCCESS All switches have correct opensm configuration:
 controlled_handover=TRUE polling_retry_number=5 routing_engine=ftree sminfo_polling_timeout=1000 sm_priority=5 for non spine and 8 for spine switch5 

To verify the InfiniBand switches software configuration for an early production model half rack Oracle Exadata Database Machine X2-2 (may not have shipped with a designated spine switch), or a quarter rack Oracle Exadata Database Machine X2-2 properly configured per the "Oracle Exadata Database Machine Owner's Guide 11g Release 2 (11.2) E13874-15", execute the following command as the "root" userid on one of the database servers:
/opt/oracle.SupportTools/ -I RanDomsw-ib3,RanDomsw-ib2 

Where "RanDomsw-ib3 and RanDomsw-ib2" are the switch names returned by the "ibswitches" command.
The output will be similar to the output for the first command, but there will be no references to a spine switch and all switches will have "sm_priority" of 5.
In either case, the expected output is "SUCCESS". If anything else is returned, investigate and correct the condition.
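The per-switch "sm_priority" values in the output above (8 for the designated spine switch, 5 for all other switches) can also be checked with a short filter. This is a sketch only: check_sm_priority and the role labels "spine" and "nonspine" are hypothetical names chosen for the example, and the input is assumed to be one per-switch opensm configuration line in the form shown above.

```shell
#!/bin/sh
# Hypothetical helper (not part of the Exadata support tools).
# Usage: <opensm configuration line> | check_sm_priority spine|nonspine
check_sm_priority() {
    case "$1" in
        spine) sm_want=8 ;;   # designated spine switch
        *)     sm_want=5 ;;   # all other switches
    esac
    # Put each key=value pair on its own line and pick out sm_priority.
    sm_got=$(tr ' ' '\n' |
        sed -n 's/^sm_priority=\([0-9][0-9]*\)$/\1/p' | head -n 1)
    if [ "$sm_got" = "$sm_want" ]; then
        echo "PASS"
    else
        echo "FAIL: sm_priority=${sm_got:-unset}, expected $sm_want"
        return 1
    fi
}
```

For example, the configuration line reported for the spine switch would be passed to "check_sm_priority spine", and the lines for the remaining switches to "check_sm_priority nonspine".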

Verify storage server network configuration with ipconf (ARCHIVE)

Archive Date: 05/13/15
Archive Reason: This storage-server-only check was replaced by "Verify active system values match those defined in configuration file 'cell.conf'", which executes on both storage and database servers with a broader scope.

Priority | Alert Level | Date | Owner | Status | Scope | Bug(s)
Critical | FAIL | 05-Mar-2013 | Doug Utzig | Production | Exadata, SSC |
DB Version | DB Role | Engineered System | Exadata Version | OS & Version | Validation Tool Version | TBD
N/A | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | All | n/a | |

Benefit / Impact:
Exadata Storage Server network configuration is maintained in both operating system level configuration files and in Exadata-specific configuration files. The configuration defined in the two sets of files must match. To ensure proper configuration and consistency, network configuration changes to an Exadata Storage Server must be performed with the ipconf utility, as documented in the Oracle Exadata Storage Server Software User's Guide.
The impact of verifying that storage server network configuration is correct and consistent is minimal.
If operating system level configuration files and Exadata-specific configuration files are inconsistent, then maintenance activities like software patching may fail, or previous configuration may be restored without warning.
Action / Repair:
To verify operating system level configuration files and Exadata-specific configuration files are consistent, run the following ipconf command on storage servers:
# /usr/local/bin/ipconf -verify -semantic 

The output should be similar to:
Verifying of Exadata configuration file /opt/oracle.cellos/cell.conf
Done. Configuration file /opt/oracle.cellos/cell.conf passed all verification checks

If the output reports FAILED for any check, investigate to find the root cause, and then use only the ipconf utility to make the necessary corrections to the storage server network configuration. Refer to the Oracle Exadata Storage Server Software User's Guide for details of the ipconf utility.

ASM Instance Initialization Parameters (ARCHIVE)

Archive Date: 05/08/15
Archive Reason: is fully desupported. Please see: "Release Schedule of Current Database Releases (Doc ID 742060.1)"
Priority: Critical
Benefit / Impact: Experience and testing have shown that certain ASM initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these ASM initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are specific to the ASM instances. Unless otherwise specified, the value is for both 2 socket and 8 socket Database Machines. The impact of setting these parameters is minimal.
Risk: If the ASM initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.
Action / Repair: To verify the ASM initialization parameters, compare the values in your environment against the table below (* = default value):

Parameter | Recommended Value | Priority | Notes
cluster_interconnects | Bondib0 IP address for 2 socket servers; colon delimited Bondib* IP addresses for 8 socket servers | 1 | This is used to avoid the Clusterware HAIP address, as its use is not supported on Exadata (the only exception being with RAC One Node)
asm_power_limit | 4 | 1 | This is the Exadata default, to mitigate application performance impact during ASM rebalance. Please evaluate application performance impact before using a higher ASM_POWER_LIMIT.
memory_target | 1040M | 1 | This avoids issues with to upgrade. This is the default setting for Exadata.
processes | For < 10 instances per node: 50 * (DB instances per node + 1). For >= 10 instances per node: (50 * MIN(db_instances_per_node + 1, 11)) + (10 * MAX(db_instances_per_node - 10, 0)) | 1 | This avoids issues observed when ASM hits max # of processes. NOTE: "instances" means "non-ASM" instances. [Internal] Note that bug 11842806 can cause excessive connections that even a properly configured processes parameter can't handle, so the fix should be applied.

Correct any Priority 1 parameter that is not set as recommended. Evaluate and correct any Priority 2 parameter that is not set as recommended.
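As a worked example of the "processes" recommendation in the table above, the two cases can be computed with one expression, because for fewer than 10 instances the MIN term reduces to (instances + 1) and the MAX term to 0. The function name asm_processes is hypothetical and used only for this sketch; its argument is the number of non-ASM database instances per node.

```shell
#!/bin/sh
# Hypothetical helper: recommended ASM "processes" value for a given number
# of non-ASM database instances per node, following the table formula.
asm_processes() {
    n="$1"
    # MIN(n + 1, 11)
    capped=$(( $n + 1 ))
    [ "$capped" -gt 11 ] && capped=11
    # MAX(n - 10, 0)
    extra=$(( $n - 10 ))
    [ "$extra" -lt 0 ] && extra=0
    echo $(( 50 * capped + 10 * extra ))
}

asm_processes 4    # 50 * (4 + 1)       -> 250
asm_processes 15   # 50 * 11 + 10 * 5   -> 600
```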

Verify Common Instance Database Initialization Parameters (ARCHIVE)

Archive Date: 08/22/12
Archive Reason: This section was created to account for database initialization parameters that become deprecated at various release levels.
Critical, 08/02/11
Benefit / Impact: Experience and testing have shown that certain database initialization parameters should be set at specific values. These are the best practice values set at deployment time. By setting these database initialization parameters as recommended, known problems may be avoided and performance maximized. The parameters are common to all database instances. The impact of setting these parameters is minimal. The performance related settings provide guidance to maintain highest stability without sacrificing performance. Changing the default performance settings can be done after careful performance evaluation and clear understanding of the performance impact.
Risk: If the database initialization parameters are not set as recommended, a variety of issues may be encountered, depending upon which initialization parameter is not set as recommended, and the actual set value.
Action / Repair: To verify the database initialization parameters, compare the values in your environment against the table below (* = default value):

Parameter | Recommended Value | Priority | Notes
_lm_rcvr_hang_allow_time | 140 | 1 | This parameter protects from corner case timeouts lower in the stack and prevents instance evictions. Archive Reason: Deprecated with or higher boundary. exachk bug 14526144
_kill_diagnostics_timeout | 140 | 1 | This parameter protects from corner case timeouts lower in the stack and prevents instance evictions. Archive Reason: Deprecated with or higher boundary. exachk bug 14526155

Verify RAID Controller Battery Condition (ARCHIVE)
Archive Date: 04/06/16
Archive Reason: This check became obsolete with the release of X5 series hardware
Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | 03/02/11 | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2, X4-8 | Linux, Solaris | 11.2.x + | 11.2.x +
[Bug(s): 11828407 (Storage Server), 11832924 (EM Storage Server Plugin), 11832981 (EM Agent)]
The RAID controller battery loses its ability to support cache over time. Verifying the battery charge and condition allows proactive battery replacement.
The impact of verifying the RAID controller battery condition is minimal.
A failed RAID controller battery will put the RAID controller into WriteThrough mode which significantly impacts write I/O performance.
Execute the following command as the "root" userid on all servers:
if [ -x /opt/MegaRAID/MegaCli/MegaCli64 ]
then
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 | egrep "Full Charge|Max Error|BatteryType" | sort | head -3
else
/opt/MegaRAID/MegaCli -AdpBbuCmd -a0 | egrep "Full Charge|Max Error|BatteryType" | sort | head -3
fi
The output will be similar to:
BatteryType: iBBU08
Full Charge Capacity: 1272 mAh
Max Error: 0 %
Proactive battery replacement should be performed within 60 days for any batteries that meet the following criteria:
1) "Full Charge Capacity" less than or equal to 800 mAh and "Max Error" less than 10%.
Immediately replace any batteries that meet either of the following criteria:
1) "Max Error" is 10% or greater (battery deemed unreliable regardless of "Full Charge Capacity" reading)
2) "Full Charge Capacity" less than 674 mAh regardless of "Max Error" reading
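The replacement criteria above can be expressed as a small decision function. This is only a sketch: the function name classify_bbu and the three result labels are illustrative and not part of any Oracle tool, and both readings are assumed to have been reduced to plain integers first.

```shell
# Classify a RAID controller battery from the two readings discussed above.
# $1 = "Full Charge Capacity" in mAh, $2 = "Max Error" in percent (integers).
# The function name and the result labels are illustrative only.
classify_bbu() {
    local capacity_mah=$1 max_error_pct=$2
    if [ "$max_error_pct" -ge 10 ] || [ "$capacity_mah" -lt 674 ]; then
        # Either "immediate replacement" criterion is met.
        echo "REPLACE_IMMEDIATELY"
    elif [ "$capacity_mah" -le 800 ]; then
        # Max Error is below 10% here, so this is the 60-day case.
        echo "REPLACE_WITHIN_60_DAYS"
    else
        echo "OK"
    fi
}

# Readings taken from the sample output above.
classify_bbu 1272 0
```

The immediate-replacement test runs first so that a battery meeting both sets of criteria is never reported as a 60-day case.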
[NOTE: The complete reference guide for LSI disk controller batteries used in Exadata can be found in MOS 1329989.1 (INTERNAL ONLY)]

Verify all "BIGFILE" tablespaces have non-default "MAXBYTES" values set (ARCHIVE)

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | 11-Nov-2011 | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | Linux, [WIP:VW]Solaris | 11.2.x + | 11.2.x +

Benefit / Impact
"MAXBYTES" is the SQL attribute that expresses the "MAXSIZE" value that is used in the DDL command to set "AUTOEXTEND" to "ON". By default,
for a bigfile tablespace, the value is "3.5184E+13", or "35184372064256". The benefit of having "MAXBYTES" set at a non-default value for
"BIGFILE" tablespaces is that a runaway operation or heavy simultaneous use (e.g., temp tablespace) cannot take up all the space in a diskgroup.

The impact of verifying that "MAXBYTES" is set to a non-default value is minimal. The impact of setting the "MAXSIZE" attribute to a non-default value varies depending upon whether it is done during database creation, during file addition to a tablespace, or on an existing file.
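The default "MAXBYTES" figure quoted above can be sanity-checked with shell arithmetic. The sketch assumes an 8192-byte database block size and a ceiling of 2^32 - 3 blocks per bigfile datafile. Neither number is stated in this note; they are used here only because their product reproduces the quoted value exactly.

```shell
# Assumptions (not stated in this note): 8 KB blocks, 2^32 - 3 blocks per file.
block_size=8192
max_blocks=$(( (1 << 32) - 3 ))
default_maxbytes=$(( max_blocks * block_size ))
echo "$default_maxbytes"
```

With a different block size the default would differ, so a check that hard-codes 35184372064256 may need adjusting for databases that do not use 8 KB blocks.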


The risk of running out of space in a diskgroup varies by application and cannot be quantified here. A diskgroup running out of space may impact the entire database as well as ASM operations (e.g., rebalance operations).

Action / Repair

To obtain a list of file numbers and bigfile tablespaces that have the "MAXBYTES" attribute at the default value, enter the following sqlplus command logged into the database as sysdba:
select file_id, a.tablespace_name, autoextensible, maxbytes
from (select file_id, tablespace_name, autoextensible, maxbytes from dba_data_files where autoextensible='YES' and maxbytes = 35184372064256) a, 
(select tablespace_name from dba_tablespaces where bigfile='YES') b
where a.tablespace_name = b.tablespace_name;
select file_id,a.tablespace_name, autoextensible, maxbytes
from (select file_id, tablespace_name, autoextensible, maxbytes from dba_temp_files where autoextensible='YES' and maxbytes = 35184372064256) a, 
(select tablespace_name from dba_tablespaces where bigfile='YES') b
where a.tablespace_name = b.tablespace_name;
The output should be:
no rows returned 
If you see output similar to:
---------- ------------------------------ --- ----------
1 TEMP YES 3.5184E+13
3 UNDOTBS1 YES 3.5184E+13
4 UNDOTBS2 YES 3.5184E+13
Investigate and correct the condition.

Ensure Temporary Tablespace is correctly defined (ARCHIVE)

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
 | N/A | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | Linux | 11.2.x + | 11.2.x +
The temporary tablespace should be
  1. A BigFile Tablespace
  2. Located in DATA or RECO, whichever one is not HIGH redundancy
  3. Sized 32GB Initially
  4. Configured with AutoExtend on at 4GB
  5. Configured with a Max Size defined to limit out of control growth.
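As an illustration of the five points above, the following sketch prints a DDL statement that satisfies them. The helper name temp_tablespace_ddl, the tablespace name TEMP, the diskgroup name DATA, and the 512G ceiling are all placeholders chosen for the example. Pick the diskgroup that is not HIGH redundancy and a MAXSIZE that suits your environment.

```shell
# Print DDL for a temporary tablespace that follows the list above.
# $1 = tablespace name, $2 = ASM diskgroup, $3 = MAXSIZE ceiling.
# All three arguments are caller-supplied placeholders (hypothetical helper).
temp_tablespace_ddl() {
    local name=$1 diskgroup=$2 maxsize=$3
    cat <<EOF
CREATE BIGFILE TEMPORARY TABLESPACE ${name}
  TEMPFILE '+${diskgroup}' SIZE 32G
  AUTOEXTEND ON NEXT 4G MAXSIZE ${maxsize};
EOF
}

# Example only: review the generated statement before running it in sqlplus.
temp_tablespace_ddl TEMP DATA 512G
```

Generating the statement as text, rather than executing it, leaves room to review the diskgroup and size choices before anything is changed.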

Verify "" is not executing

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System Platform | Bug(s)
Critical | WARN | 04/26/17 | Dib Chatterjee, Jaime Figueroa | Production | RA, Exadata - Physical, Exadata - User Domain | ALL | bug 27376516 - Exachk, bug 25960055 - OEDA, bug 25955127 - Exachk
DB/GI Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
 |  |  |  |  | Linux | exachk 18.2.0 | N/A


Benefit / Impact:

Starting with version, by default the Cluster Health Monitor (CHM) framework executes continuously the file "/u01/app/". Under certain conditions, this script executes the "pstack" command against key grid infrastructure processes. The output of "pstack" can be useful for diagnosing grid infrastructure issues, but the "pstack" command execution and locking can lead these key grid infrastructure processes to hang (especially ocssd) which can trigger node reboots. It is recommended that "" not execute continuously, and that the "pstack" command is only used when other diagnostics indicate a benefit.
The impact of verifying that "" is not executing is minimal, as is the impact of stopping its execution.
Continuously executing "" may lead to node reboots that might have otherwise been avoided.
Action / Repair:
To verify that "" is not executing, as the owner userid of the grid home, and with the environment properly set to access the grid home, execute the following code set on each database server:


function chkdiagsnap
{
    if [ $DIAGSNAP_EXECUTING -gt 0 ]
    then
        echo -e "WARNING: \"\" is executing on this database server.  Recommendation is to stop the process:\nDetails: $DIAGSNAP_OUTPUT"
    else
        echo -e "SUCCESS: \"\" is not executing on this database server.\n"
    fi
    exit 0
}

function repair
{
  $CRS_HOME/bin/oclumon manage -disable diagsnap
}

DIAGSNAP_OUTPUT=$(ps -ef | grep $CRS_HOME | grep diagsnap | grep -v grep)
DIAGSNAP_EXECUTING=$(echo "$DIAGSNAP_OUTPUT" | grep -c diagsnap)
# Run the check; call "repair" instead to disable diagsnap.
chkdiagsnap

The expected output is:
SUCCESS: "" is not executing on this database server.
Example of a "WARNING:" result:
WARNING: "" is executing on this database server:
Details: root     386456 378366  0 Apr03 ?        00:30:17 /u01/app/ /u01/app/ start
NOTE: If a "WARNING:" result is returned, to stop the file "" from executing, as the owner userid of the grid home, and with the environment variables properly set, execute the following command:
$CRS_HOME/bin/oclumon manage -disable diagsnap

Verify memlock is 90% of phys ram when huge pages are enabled
Owner: Rene Kundersma
Engineered System: X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2
OS & Version: Linux x86-64

Benefit / Impact:
Oracle recommends that the maximum locked memory be at least 90 percent of the installed physical memory when huge pages are enabled. Refer to the operating system documentation or issue the command ''man limits.conf'' for details. The impact of verifying this value is minimal. Also see
Incorrect resource settings can cause instability and performance problems.
Action / Repair:
Obtain the hard and soft values for memlock from /etc/security/limits.conf and verify that each is at least 90% of physical memory. If hugepages are configured (which they should be) and a value is less than 90% of physical memory, update the memlock values.
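A minimal sketch of the 90% comparison follows. It assumes the memlock values in limits.conf and the MemTotal figure in /proc/meminfo are both expressed in kilobytes. The helper names memlock_ok and check_memlock_file are hypothetical, and entries under /etc/security/limits.d are not examined.

```shell
# Succeed when memlock_kb is at least 90% of memtotal_kb.
# Both values are in kilobytes; "unlimited" always passes.
memlock_ok() {
    local memtotal_kb=$1 memlock_kb=$2
    [ "$memlock_kb" = "unlimited" ] && return 0
    # Integer form of: memlock_kb >= 0.9 * memtotal_kb
    [ $(( memlock_kb * 10 )) -ge $(( memtotal_kb * 9 )) ]
}

# Report on every memlock entry in a limits.conf-style file (hypothetical helper).
# $1 = path to the file, $2 = physical memory in kilobytes.
check_memlock_file() {
    local limits_file=$1 memtotal_kb=$2 domain type item value
    while read -r domain type item value; do
        [ "$item" = "memlock" ] || continue
        case $domain in \#*) continue ;; esac   # skip commented-out entries
        if memlock_ok "$memtotal_kb" "$value"; then
            echo "OK: $domain $type memlock $value"
        else
            echo "WARNING: $domain $type memlock $value is below 90% of $memtotal_kb kB"
        fi
    done < "$limits_file"
}
```

On a database server this could be invoked as: check_memlock_file /etc/security/limits.conf "$(awk '/^MemTotal:/ {print $2}' /proc/meminfo)"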

Verify RAID Controller Battery Temperature

Machine Type | OS Type | Exadata Version | Oracle Version
X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | Linux, Solaris | 11.2.x + | 11.2.x +

Maintaining proper temperature ranges maximizes RAID controller battery life.
The impact of verifying RAID controller battery temperature is minimal.
A reported temperature of 60C or higher causes the battery to suspend charging until the temperature drops and shortens the service life of the battery, causing it to fail prematurely and put the RAID controller into WriteThrough mode, which significantly impacts write I/O performance.
To verify the RAID controller battery temperature, execute the following command as the "root" userid on all servers:
if [ -x /opt/MegaRAID/MegaCli/MegaCli64 ]
then
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 -nolog| grep BatteryType;
/opt/MegaRAID/MegaCli/MegaCli64 -AdpBbuCmd -a0 -nolog | grep -i temper;
else
/opt/MegaRAID/MegaCli -AdpBbuCmd -a0 -nolog| grep BatteryType;
/opt/MegaRAID/MegaCli -AdpBbuCmd -a0 -nolog| grep -i temper;
fi
The output will be similar to:
BatteryType: iBBU08
Temperature: 38 C
Temperature : OK
Over Temperature : No
If the battery temperature is equal to or greater than 55C, investigate and correct the environmental conditions.
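The two thresholds mentioned in this section (55C to investigate, 60C at which charging is suspended) can be applied to the command output with a short filter. This is a sketch under the assumption that the reading appears on a line of the form "Temperature: 38 C", as in the sample above. The function names and result labels are illustrative.

```shell
# Extract the numeric reading from a line such as "Temperature: 38 C".
# Lines like "Temperature : OK" (space before the colon) are ignored.
parse_bbu_temp() {
    awk -F: '/^Temperature:/ { gsub(/[^0-9]/, "", $2); print $2; exit }'
}

# Map a temperature in degrees C to one of three illustrative labels.
bbu_temp_status() {
    local temp_c=$1
    if [ "$temp_c" -ge 60 ]; then
        echo "CHARGING_SUSPENDED"
    elif [ "$temp_c" -ge 55 ]; then
        echo "INVESTIGATE"
    else
        echo "OK"
    fi
}

# Replay the sample output shown above.
sample_temp=$(printf 'BatteryType: iBBU08\nTemperature: 38 C\nTemperature : OK\n' | parse_bbu_temp)
bbu_temp_status "$sample_temp"
```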

NOTE: Replace Battery Module after 3 Year service life assuming the battery temperature has not exceeded 55C. If the temperature has exceeded 55C (battery temp shall not exceed 60C), replace the battery every 2 years.
[NOTE: The complete reference guide for LSI disk controller batteries used in Exadata can be found in MOS Unpublished Note 1329989.1 (INTERNAL ONLY)]

Verify Database Server Disk Controller Configuration

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System Platform | Bug(s)
Critical | FAIL | 03/17/18 | Dib Chatterjee | Production | Exadata - Physical, Exadata - Management Domain | X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8, X7-2 | Bug 27525145 - exachk, Bug 26775963 - exachk, Bug 24533088 - exachk, Bug 20557656 - exachk
DB Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
 |  |  |  |  | Linux | exachk 18.2.0 | N/A
Benefit / Impact:
The recommended configuration for a newly deployed (or upgraded from database server varies according to the hardware type and Exadata software version. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
The impact of verifying the database server disk controller configuration is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.
Not verifying the database server disk controller configuration increases the chance of a performance degradation or an outage.
Action / Repair:
exachk contains all the logic necessary to identify the various correct configurations. To verify the database server disk controller configuration, run exachk and evaluate the results.
To manually verify the database server disk controller configuration, execute the following command set as the "root" userid on each database server or the management domain of a virtualized environment:
NOTE: This check is not applicable to X7-8 Oracle Exadata Database Servers as they contain no conventional disk drives!
if [[ -d /proc/xen && ! -f /proc/xen/capabilities ]]
then
  echo -e "\nThis check will not run in a user domain of a virtualized environment.  Execute this check in the management domain.\n"
else
  if [ -x /opt/MegaRAID/storcli/storcli64 ]
  then
    export CMD=/opt/MegaRAID/storcli/storcli64
  else
    export CMD=/opt/MegaRAID/MegaCli/MegaCli64
  fi
  RAW_OUTPUT=$($CMD AdpAllInfo -aALL -nolog | grep "Device Present" -A 8);
  echo -e "The database server disk controller configuration found is:\n\n$RAW_OUTPUT";
fi
The output will be similar to:
                Device Present
  Virtual Drives    : 1
    Degraded        : 0
    Offline         : 0
  Physical Devices  : 5
    Disks           : 4
    Critical Disks  : 0
    Failed Disks    : 0  
The output should match one of the combinations of entries in this table:

Database Server Disk Controller Configurations
X2-2(4170), X2-2 < 
X2-8 11 < 
X2-2(4170), X2-2, X3-2, X4-2, X5-2, X6-2, X7-2 >= 
X5-2, X6-2, X7-2 (Disk Expansion Kit) >= 
X2-8, X3-8 11 >= 
X4-8 >= 
X5-8, X6-8 >= 
NOTE: The Disk Expansion Kit is only applicable to X5-2, X6-2, and X7-2 database servers.
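Independently of the expected drive counts, the "Device Present" output can be scanned for fault counters. The sketch below assumes that any non-zero "Degraded", "Offline", "Critical Disks", or "Failed Disks" value deserves investigation. That reading is an assumption based on the sample output above, and the function name is hypothetical.

```shell
# Read "Device Present" output on stdin and print every fault counter
# whose value is not 0, as "<counter>=<value>". Prints nothing when clean.
nonzero_fault_counters() {
    awk -F: '/Degraded|Offline|Critical Disks|Failed Disks/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $1)   # trim the counter name
        gsub(/[ \t]/, "", $2)             # strip padding around the value
        if ($2 != "0") print $1 "=" $2
    }'
}

# Example with one failed disk (made-up input for illustration).
printf '  Virtual Drives    : 1\n    Degraded        : 0\n    Failed Disks    : 1  \n' | nonzero_fault_counters
```

In practice the input would be the RAW_OUTPUT variable captured by the command set above, for example: echo "$RAW_OUTPUT" | nonzero_fault_counters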

Verify Database Server Virtual Drive Configuration

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System Platform | Bug(s)
Critical | FAIL | 03/07/18 | Dib Chatterjee | Production | Exadata - Management Domain | X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8, X7-2 | Bug 27533289 - exachk, Bug 26775963 - exachk, Bug 24533222 - exachk, Bug 20557656 - exachk
DB Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
 |  |  |  |  | Linux | exachk 18.2.0 | N/A
Benefit / Impact:
The recommended configuration for a newly deployed (or upgraded from database server varies according to the hardware type and Exadata software version. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
The impact of verifying the database server virtual drive configuration is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.
Not verifying the virtual drives increases the chance of a performance degradation or an outage.
Action / Repair:
exachk contains all the logic necessary to identify the various correct configurations. To verify the database server virtual drive configuration, run exachk and evaluate the results.
To manually verify the database server virtual drive configuration, execute the following command set as the "root" userid on each database server or the management domain of a virtualized environment:
NOTE: This check is not applicable to X7-8 Oracle Exadata Database Servers as they contain no conventional disk drives!
if [[ -d /proc/xen && ! -f /proc/xen/capabilities ]]
then
  echo -e "\nThis check will not run in a user domain of a virtualized environment.  Execute this check in the management domain.\n"
else
  if [ -x /opt/MegaRAID/storcli/storcli64 ]
  then
    export CMD=/opt/MegaRAID/storcli/storcli64
  else
    export CMD=/opt/MegaRAID/MegaCli/MegaCli64
  fi
  RAW_OUTPUT=$($CMD CfgDsply -aALL -nolog | egrep "Virtual Drive:|Number Of Drives|^State");
  echo -e "The database server virtual drive configuration found is:\n\n$RAW_OUTPUT";
fi
The output will be similar to:
Virtual Drive: 0 (Target Id: 0)
Number Of Drives    : 4
State               : Optimal
The output should match one of the combinations of entries in this table:

Database Server Virtual Drive Configurations
Number of
Virtual Drives
StateNumber of
Physical Drives
X2-2(4170), X2-2 Optimal < 
X2-8 Optimal < 
X2-2(4170), X2-2, X3-2, X4-2, X5-2, X6-2, X7-2 Optimal >= 
X5-2, X6-2, X7-2 (Disk Expansion Kit) Optimal >= 
X2-8, X3-8 Optimal >= 
X4-8 Optimal >= 
X5-8, X6-8 Optimal >= 
NOTE: The virtual device number reported may vary depending upon configuration and version levels.
NOTE: The Disk Expansion Kit is only applicable to X5-2, X6-2, and X7-2 database servers.
NOTE: If additional virtual drives are present, it may be that the procedure to reclaimdisks was not executed at deployment time or that a bare metal restore procedure was performed without using the "dualboot=no" qualifier. Please refer to the "Reclaiming Disks for the Linux Operating System" section of "Oracle® Exadata Database Machine Owner's Guide, 11g Release 2 (11.2)". See also "Verify Database Server Physical Drive Configuration".

NOTE: If the database server was upgraded to, this check may fail because the reported number of drives is "3" or "7". Please see the "Known Issues" #5 "Hotspare removed for compute nodes" in My Oracle Support note 1468877.1 for corrective action.
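Every configuration listed in this section expects a virtual drive state of "Optimal", so one quick filter is to list any "State" line that reports something else. The sketch below works on output in the format shown in the sample above. The function name is hypothetical.

```shell
# Read CfgDsply-style output on stdin and print every "State" value
# that is not "Optimal". Prints nothing when all virtual drives are Optimal.
non_optimal_states() {
    awk -F: '/^State/ {
        gsub(/^[ \t]+|[ \t]+$/, "", $2)   # trim padding around the value
        if ($2 != "Optimal") print "State=" $2
    }'
}

# Example with a degraded virtual drive (made-up input for illustration).
printf 'Virtual Drive: 0 (Target Id: 0)\nNumber Of Drives    : 4\nState               : Degraded\n' | non_optimal_states
```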

Verify Database Server Physical Drive Configuration

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System Platform | Bug(s)
Critical | FAIL | 03/07/2018 | Dib Chatterjee | Production | Exadata - Physical, Exadata - Management Domain | X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8, X7-2 | Bug 27533421 - exachk, Bug 26775963 - exachk, Bug 24533293 - exachk, Bug 20557656 - exachk
DB Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
 |  |  |  |  | Linux | exachk 18.2.0 | N/A
Benefit / Impact:
The recommended configuration for a newly deployed (or upgraded from database server varies according to the hardware type and Exadata software version. Verifying the status of the database server RAID devices helps to avoid a possible performance impact, or an outage.
The impact of verifying the database server physical drive configuration is minimal. The impact of corrective actions will vary depending on the specific issue uncovered, and may range from simple reconfiguration to an outage.
Not verifying the physical drives increases the chance of a performance degradation or an outage.
Action / Repair:
exachk contains all the logic necessary to identify the various correct configurations. To verify the database server physical drive configuration, run exachk and evaluate the results.
To manually verify the database server physical drive configuration, execute the following command set as the "root" userid on each database server or the management domain of a virtualized environment:
NOTE: This check is not applicable to X7-8 Oracle Exadata Database Servers as they contain no conventional disk drives!
if [[ -d /proc/xen && ! -f /proc/xen/capabilities ]]
then
  echo -e "\nThis check will not run in a user domain of a virtualized environment.  Execute this check in the management domain.\n"
else
  if [ -x /opt/MegaRAID/storcli/storcli64 ]
  then
    export CMD=/opt/MegaRAID/storcli/storcli64
  else
    export CMD=/opt/MegaRAID/MegaCli/MegaCli64
  fi
  RAW_OUTPUT=$($CMD PDList -aALL -nolog | grep "Firmware state");
  echo -e "The database server physical drive configuration found is:\n\n$RAW_OUTPUT";
fi
The output will be similar to:

Recommended Configuration 

The database server physical drive configuration found is:

Firmware state: Online, Spun Up 
<output truncated for brevity> 
Firmware state: Online, Spun Up
The output should match one of the combinations of entries in this table:

Database Server Physical Drive Configurations
OnlineSpun UpHotspareSpun DownExadata
X2-2(4170), X2-2 < 
X2-8 < 
X2-2(4170), X2-2, X3-2, X4-2, X5-2, X6-2, X7-2 >= 
X5-2, X6-2, X7-2 (Disk Expansion Kit) >= 
X2-8, X3-8 >= 
X4-8 >= 
X5-8, X6-8 >= 
If the reported output differs, investigate and correct the condition.
NOTE: The Disk Expansion Kit is only applicable to X5-2, X6-2, and X7-2 database servers.
NOTE: If the database server was upgraded to, this check may fail because one of the devices shows a state of: "Unconfigured(good), Spun Up". Please see the "Known Issues" #5 "Hotspare removed for compute nodes" in My Oracle Support note 1468877.1 for corrective action.

 Alternate Configuration

For an X2-2(4170), X2-2, or X2-8 database server which is running an Exadata software version lower than that is being upgraded to an Exadata software version of or higher, an alternate configuration is permitted. The alternate configuration for an X2-2(4170) or X2-2 uses 3 disks in the RAID set with 1 disk as a hot spare. The alternate configuration for an X2-8 uses 7 disks in the RAID set with 1 disk as a hot spare.
The output should be similar to:
Firmware state: Online, Spun Up 
<output truncated for brevity>
Firmware state: Hotspare, Spun down
For an X2-2(4170) or X2-2, the expected output should contain three lines of output showing a state of "Online, Spun Up", and one line showing a state of "Hotspare, Spun down". For an X2-8, the expected output should contain seven lines of output showing a state of "Online, Spun Up", and one line showing a state of "Hotspare, Spun down". In either case, the ordering of the output lines is not significant and may vary based upon a given database server's physical drive replacement history.
If the reported output differs, investigate and correct the condition.
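Because the ordering of the "Firmware state" lines is not significant, counting them is enough to compare against the alternate configuration described above (3 online plus 1 hot spare for an X2-2(4170) or X2-2, 7 online plus 1 hot spare for an X2-8). A minimal sketch, with a hypothetical function name:

```shell
# Read "Firmware state" lines on stdin and print "<online> <hotspare>" counts.
count_drive_states() {
    awk '/Firmware state: Online, Spun Up/ { online++ }
         /Firmware state: Hotspare/        { hotspare++ }
         END { print online + 0, hotspare + 0 }'
}

# Example: the X2-2 alternate configuration.
printf 'Firmware state: Online, Spun Up\nFirmware state: Online, Spun Up\nFirmware state: Online, Spun Up\nFirmware state: Hotspare, Spun down\n' | count_drive_states
```

The hot spare pattern deliberately ignores the spin state, since a hot spare may temporarily be spun up in normal operation.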
NOTE (modified 03/21/12): Occasionally in normal operation, the "Hotspare" physical drive may be brought to a state of "Online, Spun Up". Thirty minutes (default) after the operation that brought the drive to "Online, Spun Up" has completed, the drive should spin down due to the powersaving feature. There is no harm in the drive being "Online, Spun Up" if no other errors are reported in the disk drive configuration checks.

For additional information, please reference My Oracle Support note "Exadata: Hot Spares Not Spinning Down (Doc ID 1403613.1)"

Verify database server disk controllers use writeback cache

Priority | Alert Level | Date | Owner | Status | Engineered System | Engineered System Platform | Bug(s)
Critical | FAIL | 03/07/18 |  | Production | Exadata - Physical, Exadata - Management Domain | X2-2, X2-8, X3-2, X3-8, X4-2, X4-8, X5-2, X5-8, X6-2, X6-8, X7-2 | Bug 27523948 - exachk
DB Version | DB Type | DB Role | DB Mode | Exadata Version | OS & Version | Validation Tool Version | MAA Scorecard Section
N/A | N/A | N/A | N/A | ALL | Linux | exachk 18.2.0 | N/A
Benefit / Impact:
Database servers use an internal RAID controller with a battery-backed cache to host local filesystems. For maximum performance when writing I/O to local disks, the battery-backed cache should be in "WriteBack" mode.
The impact of configuring the battery-backed cache in "WriteBack" mode is minimal.
Not configuring the battery-backed cache in "WriteBack" mode will result in degraded performance when writing I/O to the local database server disks.
Action / Repair:
To verify that the disk controller battery-backed cache is in "WriteBack" mode, run the following set of commands as the "root" userid on all database servers:
NOTE: This check is not applicable to X7-8 Oracle Exadata Database Servers as they contain no conventional disk drives!
if [ -x /opt/MegaRAID/storcli/storcli64 ]
then
  export CMD=/opt/MegaRAID/storcli/storcli64
else
  export CMD=/opt/MegaRAID/MegaCli/MegaCli64
fi
RAW_OUTPUT=$($CMD -CfgDsply -a0 -nolog | egrep -i "Virtual Drive:|Current Cache Policy:" | grep -v Number | sed 'N;s/\n/ /')
NON_WRITEBACK=$(echo -n "$RAW_OUTPUT" | grep -vi writeback)
if [ -z "$NON_WRITEBACK" ]
then
  echo -e "SUCCESS: All virtual drives have \"Current Cache Policy\" set to \"WriteBack\"."
else
  echo -e "FAILURE: One or more virtual drives do not have \"Current Cache Policy\" set to \"WriteBack\".  Details:\n\n$NON_WRITEBACK"
fi
The output should be:
SUCCESS: All virtual drives have "Current Cache Policy" set to "WriteBack".
Example of a "FAILURE:" result:
FAILURE: One or more virtual drives do not have "Current Cache Policy" set to "WriteBack".  Details:

Virtual Drive: 0 (Target Id: 1) Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU
If the battery-backed cache is not in "WriteBack" mode, run these commands as the "root" userid on the affected database server to place the battery-backed cache into "WriteBack" mode:
if [ -x /opt/MegaRAID/storcli/storcli64 ]
then
  export CMD=/opt/MegaRAID/storcli/storcli64
else
  export CMD=/opt/MegaRAID/MegaCli/MegaCli64
fi
$CMD -LDSetProp WB  -Lall  -a0 -nolog
$CMD -LDSetProp NoCachedBadBBU -Lall  -a0 -nolog
$CMD -LDSetProp NORA -Lall  -a0 -nolog
$CMD -LDSetProp Direct -Lall  -a0 -nolog
NOTE: No settings should be modified on Exadata storage cells. The mode described above applies only to database servers in an Exadata database machine.
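The verification commands in this section rely on sed 'N;s/\n/ /' to join each "Virtual Drive:" line with the "Current Cache Policy:" line that follows it, so that a single grep -vi writeback can report the offending drive together with its policy. The sketch below replays that pipeline on canned text. The two-drive sample is made up for illustration, modelled on the failure example above.

```shell
# Canned input: two virtual drives, the second one in WriteThrough mode.
sample='Virtual Drive: 0 (Target Id: 0)
Current Cache Policy: WriteBack, ReadAheadNone, Direct, No Write Cache if Bad BBU
Virtual Drive: 1 (Target Id: 1)
Current Cache Policy: WriteThrough, ReadAheadNone, Direct, No Write Cache if Bad BBU'

# Join every pair of lines into one, as the check in this section does.
paired=$(echo "$sample" | sed 'N;s/\n/ /')

# Keep only the joined lines that do not mention WriteBack.
non_writeback=$(echo "$paired" | grep -vi writeback)
echo "$non_writeback"
```

The pairing only works when the filtered input alternates strictly between the two kinds of line, which is presumably why the original pipeline also removes lines containing "Number".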

Verify that "Disk Cache Policy" is set to "Disabled"

Priority | Added | Machine Type | OS Type | Exadata Version | Oracle Version
Critical | 06/13/11 | X2-2(4170), X2-2, X2-8, X3-2, X3-8, X4-2 | Linux | 11.2.x + | 11.2.x +

Benefit / Impact:
"Disk Cache Policy" is set to "Disabled" by default at imaging time and should not be changed because the cache created by setting "Disk Cache Policy" to "Enabled" is not battery backed. It is possible that a replacement drive
has the disk cache policy enabled, so it is a good idea to check this setting after replacing a drive.
The impact of verifying that "Disk Cache Policy" is set to "Disabled" is minimal. The impact of suddenly losing power with "Disk Cache Policy" set to anything other than "Disabled" will vary according to each specific case, and cannot be estimated here.
If the "Disk Cache Policy" is not "Disabled", there is a risk of data loss in the event of a sudden power loss because the cache created by "Disk Cache Policy" is not backed up by a battery.
Action / Repair:
To verify that "Disk Cache Policy" is set to "Disabled" on all servers, use the following command as the "root" userid on the first database server in the cluster:
unset TMP_RSLT;
TMP_RSLT=$(dcli -g /opt/oracle.SupportTools/onecommand/all_group -l root "if [ -x /opt/MegaRAID/MegaCli/MegaCli64 ]; then /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aALL -nolog; else /opt/MegaRAID/MegaCli -LdPdInfo -aALL -nolog; fi;" | grep -i 'Disk Cache Policy' | grep -v Disabled | wc -l)
if [ $TMP_RSLT = 0 ]
then
echo -e "\nSUCCESS\n"
else
echo -e "\nFAILURE:";
dcli -g /opt/oracle.SupportTools/onecommand/all_group -l root "if [ -x /opt/MegaRAID/MegaCli/MegaCli64 ]; then /opt/MegaRAID/MegaCli/MegaCli64 -LdPdInfo -aALL -nolog; else /opt/MegaRAID/MegaCli -LdPdInfo -aALL -nolog; fi;" | grep -i 'Disk Cache Policy' | grep -v Disabled;
echo -e "\n";
fi
The output should be:
SUCCESS
If anything other than "SUCCESS" is returned, identify the LUN(s) in question and reset the "Disk Cache Policy" to "Disabled" using the following commands as the "root" userid on the server that reported the issue (where Lx= the lun in question, for example: L2):
if [ -x /opt/MegaRAID/MegaCli/MegaCli64 ]
then
export TMP_CMD=/opt/MegaRAID/MegaCli/MegaCli64
else
export TMP_CMD=/opt/MegaRAID/MegaCli
fi
$TMP_CMD -LDSetProp -DisDskCache -Lx -a0 -nolog
Note: The "Disk Cache Policy" is completely separate from the disk controller caching mode of "WriteBack". Do not confuse the two. The cache created by "WriteBack" cache mode is battery-backed; the cache created by "Disk Cache Policy" is not!

Verify service exachkcfg autostart status
Priority | Alert Level | Date | Owner | Status | Scope | Bug(s)
Critical | FAIL | 05/14/2014 | <Name> | Production | Exadata, SSC, Exalogic | 18735585 - exachk
DB VersionDB RoleEngineered SystemExadata VersionOS & Version Validation Tool Version TBD
N/AN/AX2-2(4170), X2-2, X2-8, X3-2, X3-8, X4- x86-64exachk 2.2.5 
Benefit / Impact:
Verifying the exachkcfg service autostart status helps to avoid an unexpected modification attempt and a possibly lengthened boot sequence. The impact of verifying the exachkcfg service autostart status is minimal.
On either a database or storage server, a required maintenance operation or an incorrect configuration change might be missed.
Action / Repair:
To verify the exachkcfg service autostart status, execute the following command as the "root" userid on all storage and database servers:
chkconfig --list exachkcfg;
The output should be similar to:
exachkcfg 0:off 1:off 2:off 3:on 4:off 5:off 6:off
For either a database or storage server, run level 3 should be "on" (3:on).
It should be rare to find this not set as expected. Should a correction be required, as the root userid, use the "chkconfig --level" command. For example, to set the run level "3" for exachkcfg to "on" for a database server with exadata image version >=
[root@randomdb03 ~]# chkconfig --level 3 exachkcfg on
For another example, to set the run level "3" for exachkcfg to "off" for a database server with exadata image version <
[root@randomdb03 ~]# chkconfig --level 3 exachkcfg off

NOTE: At exadata image versions below, on a database server all run levels should be set to "off", and on a storage server, at least one run level should be set to "on" (number varies by exadata software version).
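The run level test described above can be scripted by inspecting one line of "chkconfig --list" output. This is a minimal sketch. The function name runlevel_is_on is hypothetical, and the input is assumed to use the "N:on" / "N:off" fields shown in the sample output.

```shell
# Succeed when a "chkconfig --list" line shows the given run level as "on".
# $1 = one line of chkconfig output, $2 = run level number.
runlevel_is_on() {
    local chkconfig_line=$1 level=$2
    # chkconfig separates fields with tabs; normalise to spaces and pad both
    # ends so that the last field also matches the " N:on " pattern.
    case " $(echo "$chkconfig_line" | tr '\t' ' ') " in
        *" ${level}:on "*) return 0 ;;
        *) return 1 ;;
    esac
}

# Example using the sample output line from this section.
if runlevel_is_on "exachkcfg 0:off 1:off 2:off 3:on 4:off 5:off 6:off" 3; then
    echo "run level 3 is on"
fi
```

On a live server the first argument would typically be "$(chkconfig --list exachkcfg)".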

Check alerthistory for test open stateless alerts

Engineered System: Exadata - Physical, Exadata - Management Domain
Bug(s): 26651210 - exachk, 21299854 - exachk

Benefit / Impact
There are two types of alerts maintained in the alerthistory of a storage or database server, stateful and stateless.
Stateless alerts are not cleared automatically. They do not age out of the alerthistory until the alert is manually investigated and the "examinedby" field is manually set to a non-null value, typically the name of the person who reviewed the stateless alert and corrected or otherwise acted upon the information provided.
The benefit of checking for test open stateless alerts is a less cluttered alerthistory. The impact of acknowledging any test open stateless alert is minimal.
Unnecessary test alerts maintained in the alerthistory.
Action / Repair:
To verify there are no test open stateless alerts, as the root userid on each storage and database server execute the following commands:
unset SID
unset ACTION
if [ `egrep -i node.type /opt/oracle.cellos/cell.conf | grep -i db | wc -l` -eq 1 ]
  then NODE_TYPE=db
  else NODE_TYPE=cell
fi
IMAGE_VERSION=$(imageinfo -version |tr -d '.'|cut -c1-6)
if [ $NODE_TYPE = "cell" ]
  then COMMAND_NAME=cellcli
else
  if [ $IMAGE_VERSION -ge 121211 ]
    then COMMAND_NAME=dbmcli
  fi
fi
if [ -n "$COMMAND_NAME" ]
then
  NAME_ARRAY=$($COMMAND_NAME -e list alerthistory attributes name,alertmessage where alerttype=stateless and examinedby=\'\' | grep -iw test | sed -e 's/^[ \t]*//' | cut -d" " -f1);
  if [ -z "$NAME_ARRAY" ]
  then
    echo -e "SUCCESS: there are no test open stateless alerts."
  else
    for INDIVIDUAL_NAME in $NAME_ARRAY
    do
      NAME_RECORD=$($COMMAND_NAME -e "list alerthistory attributes alertsequenceid,severity,alertMessage,alertAction where name=$INDIVIDUAL_NAME" | awk '{$2=$2};1')
      SID=$(echo "$NAME_RECORD" | cut -d" " -f1)
      SEVERITY=$(echo "$NAME_RECORD" | cut -d" " -f2)
      MESSAGE=$(echo "$NAME_RECORD" | cut -d'"' -f2)
      ACTION=$(echo "$NAME_RECORD" | cut -d'"' -f4)
      OUTPUT_ARRAY+=$(echo -e "\n";echo -e "SID:\t\t$SID";echo -e "NAME:\t\t$INDIVIDUAL_NAME";echo -e "SEVERITY:\t$SEVERITY";echo -e "MESSAGE:\t$MESSAGE";echo -e "ACTION:\t\t$ACTION")
    done
    echo -e -n "FAILURE: there are one or more test open stateless alerts that have not been cleared. Details:"
    echo -e "${OUTPUT_ARRAY[@]}"
  fi
else
  echo "alerthistory is not available on database servers at image versions below $NODE_TYPE $IMAGE_VERSION"
fi
The output should be similar to:
SUCCESS: there are no test open stateless alerts.
- OR -
alerthistory is not available on database servers at image versions below db 112322
If the output is not as expected, examine the full details for each name that has not been cleared and follow the recommendations.
Example of a FAILURE result:
FAILURE: there are one or more test open stateless alerts that have not been cleared. Details:
SID:            2
NAME:           2
SEVERITY:       info
MESSAGE:        "This is a test trap"
To acknowledge a test open stateless alert, manually set the "examinedby" field with a command similar to the following (command name is either cellcli or dbmcli, depending upon whether a storage or database server is involved):
CellCLI> alter alerthistory 2 examinedby="jdoe"
Alert 2 successfully altered
Where jdoe is the name of the person who verified the test open stateless alert, and the number is the name of the stateless alert. Note that double quotes are used around the value to be set, but not the name of the stateless alert.
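The check in this section decides whether dbmcli is usable by reducing the image version to its first six digits and comparing the result numerically against 121211. The sketch below isolates that step. The function name and the version strings passed to it are hypothetical examples, not output captured from a real system.

```shell
# Reduce a dotted image version string to its first six digits,
# mirroring the "tr -d '.' | cut -c1-6" step used in this section.
image_version_key() {
    echo "$1" | tr -d '.' | cut -c1-6
}

# Hypothetical version string in the dotted style printed by imageinfo.
key=$(image_version_key "12.1.2.1.1.150316")
echo "$key"

# The same numeric comparison the check performs for database servers.
if [ "$key" -ge 121211 ]; then
    echo "dbmcli expected to be available"
fi
```

Because the key is simply the first six digits, a version component with two digits would shift the later components, so the comparison is only meaningful for single-digit components such as those in the example.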

 Revision History

Nov 23 2016Hidden Parameters  Table MAA Nov 23 2016
Nov 16 2016MAA Nov 16 2016  Verify There Are No Memory (ECC) Errors
Oct 5 2016Check /EXAVMIMAGES on dom0s for possible over allocation by sparse files
Oct 5 2016Verify InfiniBand Address Resolution Protocol (ARP) Configuration on Database Servers
Aug 22 2016Verify "_reconnect_to_cell_attempts=9" on database servers which access X6 storage servers
April 6 2016Detect duplicate files in /etc/*init* directories
Verify Initialization parameters and diskgroup attributes
Verify RAID disk controller Cache Valur Capacitor condition
verify Exadata Smart Flash Cache is created
March 23 2016Verify Ambient Air Temperature – improved existing section
March 16 2016Verify database server file systems have "Maximum mount count" = "-1"
Verify database server file systems have "Check interval" = "0"
February 17 2016Verify Datafiles are Placed on Diskgroups consisting of griddisks with unset cachedBy attribute– updated to only check when flashcache in WriteBack mode
February 16 2016Adding Validate key sysctl.conf parameters on database servers
February 5 2016Adding:
Verify storage server data (non-system) disks have no partitions
Verify db_unique_name is used in I/O Resource Management (IORM) interdatabase plans
Verify Datafiles are Placed on Diskgroups consisting of griddisks with cachingPolicy = DEFAULT
Verify Datafiles are Placed on Diskgroups consisting of griddisks with unset cachedBy attribute
January 26 2016 Adding X4-8/X5-2 to the list of supported platforms
Feb 10 2017 Consolidation Parameters Reference Table – updates to parallel parameters row, and removal of unneeded Exadata platform specific resource references
Mar 01 2017 Verify "downdelay" is correctly set for bonded client interfaces – improved with more checks
Verify Storage Server user "CELLDIAG" exists – improved with prompt for password
Mar 14 2017 (1) Verify RDS Protocol over InfiniBand Network is used – existing section improved
(2) Verify all Database and Storage Servers are synchronized with the same NTP server – existing section improved
Mar 27 2017 (1) Check /EXAVMIMAGES on dom0s for possible over allocation by sparse files – converted to new style using exachk -check
Apr 4 2017 (1) Verify ExaWatcher is executing
(2) Verify non-Default services are created for all Pluggable Databases
Apr 27 2017 (1) Verify "" is not executing
Jun 7 2017 (1) Verify Hidden Initialization Parameter Usage – updated version for _parallel_adaptive_max_users to include 12.2
(2) Verify IP routing configuration on database servers
(3) Verify Grid Infrastructure Management Database (MGMTDB) configuration
Jun 29 2017 (1) Verify Automatic Storage Management Cluster File System (ACFS) is on a separate Disk Group
July 12 2017 (1) Ensure Temporary Tablespace is correctly defined (ARCHIVE) – archived; confirmed deployment template has key attributes that are still valid today
(2) Verify ASM Diskgroup Attributes for 12.2.0.x – new
(3) Verify the SYSTEM, SYSAUX, USERS and TEMP tablespaces are of type bigfile – new
(4) Verify ASM Diskgroup Attributes for 12.1.0.x – updated to have “>=” for repair timers
(5) Verify the ownership and permissions of the "oradism" file – updated to execute as software owner instead of root
(6) Verify all "BIGFILE" tablespaces have non-default "MAXBYTES" values set (ARCHIVE) – archived; relying on other tools like EM to handle the problem this was originally created to solve
July 19 2017 
July 26 2017 
Sep 9 2017  (1)
Verify Hidden Initialization Parameter Usage – added _asm_max_connected_clients as acceptable in
Oct 10 2017 (1) Verify the recommended patches for Adaptive features are installed
(2) Verify that griddisks are distributed as expected across celldisks
(3) Verify Exadata Smart Flash Cache is Created
(4) Verify Database Server Disk Controller Configuration
(5) Verify Database Server Virtual Drive Configuration
Oct 28 2017 
(1) Verify Database Server Physical Drive Configuration
(2) Verify Grid Infrastructure Management Database (MGMTDB) configuration (ARCHIVE) – archived
Nov 22 2017  
(1) Verify that griddisks are distributed as expected across celldisks – update; added exception for griddisk RA prefix “CATALOG”
(2) Check alerthistory for non-test open stateless alerts & Check alerthistory for test open stateless alerts – update; improved formatting
Dec 1 2017
(1) Verify that griddisks are distributed as expected across celldisks – update; added exception for griddisk RA prefix “CATALOG”
(2) Check alerthistory for non-test open stateless alerts & Check alerthistory for test open stateless alerts – update; improved formatting
(3) Verify initialization parameter cluster_database_instances is at the default value
(4) Verify the database server NVME device configuration
(5) Verify celldisk configuration on flash memory devices
Jan 25 2018
(1) Verify "" is not executing - update; added repair operation
(2) Verify all Database and Storage Servers are synchronized with the same NTP server – update; retrofitted for exachk
(3) Verify that Automatic Storage Management Cluster File System (ACFS) uses 4K metadata block size
(4) Verify database server quorum disks configuration
Mar 08 2018
(1) Verify RAID disk controller CacheVault capacitor condition
(2) Modified - Verify the storage servers in use configuration matches across the cluster
(3) Modified - Verify database server disk controllers use writeback cache
(4) Verify Database Server Virtual Drive Configuration
(5) Verify Database Server Physical Drive Configuration
(6) Verify active system values match those defined in configuration file "cell.conf"
Mar 21 2018 
(1) Check cell BIOS state for restore pending status (ARCHIVE) – archived
Apr 21 2018
(1) Evaluate Automated Maintenance Tasks configuration -new BP added.
May 15 2018
(1) Verify proper ACFS drivers are installed for Spectre v2 mitigation
Jun 7 2018 
(1) Verify "" is not executing (ARCHIVE) – archived; we have coverage in critical issue DB41
(2) Verify memlock is 90% of phys ram when huge pages are enabled (ARCHIVE) – archived; orachk will retain memlock check for hugepages
Jun 28 2018
(1) Verify Exafusion Memory Lock Configuration
Jul 13 2018
(1) included release 18c for _asm_max_connected_clients
Aug 14 2018
(1) Verify Hidden Initialization Parameter Usage - update; consolidated all recommendations around hidden parameters into one section
(2) Verify there are no unhealthy InfiniBand switch sensors
(3) Verify RAID disk controller CacheVault capacitor condition
(4) Verify RAID Disk Controller Battery Condition
(5) Verify RAID Controller Battery Temperature (ARCHIVE)
Sep 26 2018
(1) Verify the InfiniBand Fabric Topology (verify-topology)
(2) Refer to MOS 1682501.1 if non-Exadata components are in use on the InfiniBand fabric
(3) Verify Database Server Disk Controller Configuration (ARCHIVE) - will not run in 18.1 and higher
(4) Verify Database Server Virtual Drive Configuration (ARCHIVE) - will not run in 18.1 and higher
(5) Verify Database Server Physical Drive Configuration (ARCHIVE) - will not run in 18.1 and higher
(6) Verify Common Instance Database Initialization Parameters for 12.1.0.x & Verify Common Instance Database Initialization Parameters for – expand existing audit_trail and control_files checks
Sep 27 2018
(1) Verify database server disk controllers use "WriteBack" cache (ARCHIVE) – no longer needed in Exadata 18.1 and higher
(2) Verify that "Disk Cache Policy" is set to "Disabled" (ARCHIVE) – no longer needed in Exadata 18.1 and higher
(3) Verify service exachkcfg autostart status (ARCHIVE) – no longer needed in Exadata 19.1 and higher
Oct 3 2018
(1) Verify Hidden Initialization Parameter Usage – update; adjusted _backup_disk_bufcnt, _backup_disk_bufsz, _backup_file_bufcnt, _backup_file_bufsz to only be checked with database version 12.1 and lower
Dec 18 2018
(1) Verify active kernel version matches expected version for installed Exadata Image -- OL7 support added
(2) Verify installed rpm(s) kernel type match the active kernel version -- OL7 support added
(3) Verify the Master Subnet Manager is running on an InfiniBand switch -- OL7 support added
(4) Verify the Subnet Manager is properly disabled -- OL7 support disabled.
Feb 13 2018
(1) Verify the storage servers in use configuration matches across the cluster
Apr 20 2019
(1) Verify the ib_sdp module is not loaded into the kernel
May 03 2019
(1) Verify Hidden Initialization Parameter Usage - update; improved wording for _enable_numa_support
(2) Verify the vm.min_free_kbytes configuration - update; improved logic making it numa aware and increasing value accordingly
(3) Verify all database and storage servers time server configuration - update to cover mixed ntp/chrony case
Jul 11 2019
(1) Verify all voting disks are online & Verify database server quorum disks configuration - improved existing sections
(2) Verify all database and storage servers time server configuration - update to cover mixed ntp/chrony case
(3) Verify Automatic Storage Management Cluster File System (ACFS) file systems do not contain critical database files- improved existing section
(4) Verify the recommended patches for Adaptive features are installed- improved existing section
(5) Check alerthistory for stateful alerts not cleared - improved existing section & Check alerthistory for non-test open stateless alerts - improved existing section
(6) Check alerthistory for test open stateless alerts (ARCHIVE)
Sep 18 2019
(1) Verify available ksplice fixes are installed
(2) Verify Automatic Storage Management Cluster File System (ACFS) file systems do not contain critical database files - Improved the existing section 



NOTE:1351559.1 - IDT switch on the PCI riser has a problem resulting in occasional loss of connectivity to pair of flash cards on the cells
NOTE:401749.1 - Oracle Linux: Shell Script to Calculate Values Recommended Linux HugePages / HugeTLB Configuration
NOTE:1284070.1 - Updating key software components on database hosts to match those on the cells
NOTE:1298957.1 - Manage Audit File Directory Growth with cron
NOTE:1286796.1 - rp_filter for multiple private interconnects and Linux Kernel 2.6.32+
NOTE:359515.1 - Mount Options for Oracle files for RAC databases and Clusterware when used with NFS on NAS devices
NOTE:1351036.1 - How to Validate and Fix Proper ASM Failure Group Configuration on Oracle Exadata Database Machine
NOTE:1188080.1 - Steps to shut down or reboot an Exadata storage cell without affecting ASM

Security Checklist: 10 Basic Steps to Make Your Database Secure from Attacks (Doc ID 1545816.1)

In this Document
 Step 1:  Change passwords for SYS and SYSTEM 
 Step 2:  Lock, expire, and change passwords for default or unused accounts
 Step 3:  Restrict access to the Oracle home and installation files
 Step 4:  Review database user privileges
 Step 5:  Revoke privileges from PUBLIC where not necessary
 Step 6:  Protect the data dictionary from unauthorized users
 Step 7:  Set security related parameters to their recommended values
 a. remote_os_authent = false
 b. sec_case_sensitive_logon = true
 c. global_names = true
 d. unset parameter utl_file_dir
 Step 8:  Protect listener and network connections
 Automatic instance registration and CVE-2012-1675
 Encrypt sqlnet connections using network encryption.
 Step 9:  Protect the database host
 Step 10:  Check Oracle websites for Security Alerts and critical patches
 Other Items to Consider
 Further Reading
 Online Discussion (My Oracle Support Community)


Oracle Database - Enterprise Edition - Version to [Release 8.1.7 to 12.1]
Oracle Database - Standard Edition - Version to [Release 8.1.7 to 12.1]
Information in this document applies to any platform.


This article provides a quick checklist to help enforce database security.
It serves as a starting point to help DBAs address basic security risks, and provides pointers to further reading and additional discussion.



Step 1:  Change passwords for SYS and SYSTEM 

After database creation, if you used the default passwords for SYS and SYSTEM, then change the passwords for these administrative users immediately.
Note 1051982.6 - How to Change SYS and SYSTEM Passwords

If Database Vault is enabled, ALTER USER functionality is affected.  See also: What to Expect After Installing Database Vault
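
As a minimal sketch of the password change described in this step, the statements below can be run from SQL*Plus while connected with administrative privileges. The passwords shown are placeholders chosen for illustration, not recommended values, and behaviour with Database Vault enabled is not covered here.

```sql
-- Placeholder passwords: replace them with values that meet your own policy.
ALTER USER sys    IDENTIFIED BY "Repl4ce_Me#1";
ALTER USER system IDENTIFIED BY "Repl4ce_Me#2";

-- Alternative in SQL*Plus: the PASSWORD command prompts for the new
-- password, so it does not appear in the command text or in spool files.
PASSWORD system
```

The SQL*Plus PASSWORD command is usually preferable on shared terminals because the new password is never echoed.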

Step 2:  Lock, expire, and change passwords for default or unused accounts

The number of default accounts varies, depending on the products one chooses to install.
For a list of Oracle products and the accounts created for these products, refer to:

Note 472937.1 - Information On Installed Database Components and Schemas

It is safer to lock accounts, and/or change their default passwords, than to drop them (immediately).  For example, one can DROP USER SCOTT without issue, but one might not want to DROP USER CTXSYS if Oracle Context may be used in the future.  Components may also have dependencies, such as Database Vault (with Oracle Label Security).

For production environments, do not use default passwords for any administrative accounts, including SYSMAN and DBSNMP.
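
The following sketch shows one way to find and lock default or unused accounts. It assumes a session with access to the DBA_ views; CTXSYS is used only because the text above mentions it, and DBA_USERS_WITH_DEFPWD is available in Oracle 11g and later only.

```sql
-- List every account with its current status; default or unused
-- accounts that are still OPEN are candidates for locking.
SELECT username, account_status
  FROM dba_users
 ORDER BY username;

-- 11g and later: accounts that still use their default password.
SELECT username
  FROM dba_users_with_defpwd
 ORDER BY username;

-- Lock an account and expire its password instead of dropping it.
ALTER USER ctxsys PASSWORD EXPIRE ACCOUNT LOCK;
```

Locking rather than dropping keeps the schema available if the component is needed later.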

Step 3:  Restrict access to the Oracle home and installation files

The Oracle account should own all Oracle system files and installation files.  The OSDBA operating system user group should have privileges on all Oracle system and installation files.  No user outside the OSDBA group should have write access on any files or directories in the Oracle installation.

Oracle also recommends restricting symbolic links.  Ensure that when providing a path or file to the database, neither the file nor any part of the path is modifiable by an untrusted user.  The file and all components of the path should be owned by the DBA or some trusted account, such as root.

(9i only)  Secure access to trace files.  The ALTER SESSION privilege can produce trace files which may show sensitive data such as literal password changes.
Note 210317.1 - ALERT: ALTER SESSION privilege can dump trace files with possibly sensitive data

Step 4:  Review database user privileges

Practice the principle of least privilege: Users should be given only those privileges that are actually required to efficiently perform their jobs.

Note 1347470.1 - Master Note For Privileges And Roles
Note 1020286.6 - Script to Create View to Show All User Privs
Note 1050267.6 - SCRIPT: Script to show table privileges for users and roles
Note 1020176.6 - SCRIPT: Script to Generate object privilege GRANTS
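
Before running the scripts in the notes above, a quick first review of privileges can be sketched with two dictionary queries. The exclusion list of grantees is an assumption for illustration and should be adapted to the accounts that legitimately hold these privileges in your database.

```sql
-- Grantees holding powerful "ANY" system privileges.
-- The NOT IN list is illustrative only.
SELECT grantee, privilege, admin_option
  FROM dba_sys_privs
 WHERE privilege LIKE '%ANY%'
   AND grantee NOT IN ('SYS', 'SYSTEM', 'DBA')
 ORDER BY grantee, privilege;

-- Users and roles that have been granted the DBA role.
SELECT grantee, granted_role, admin_option
  FROM dba_role_privs
 WHERE granted_role = 'DBA'
 ORDER BY grantee;
```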

Step 5:  Revoke privileges from PUBLIC where not necessary

The PUBLIC role is automatically assumed by every database user account.  By default, it has no privileges granted to it, but it does have numerous grants, mostly to Java objects.  Because all users have the PUBLIC role, any database user can exercise privileges that are granted to this role.

Security administrators and database users should grant a privilege or role to PUBLIC only if every database user requires the privilege or role.

Note 247093.1 - Be Cautious When Revoking Privileges Granted to PUBLIC
Note 234551.1 - PUBLIC Is it a User, a Role, a User Group, a Privilege ?
Note 390225.1 - Execute Privileges Are Reset For Public After Applying Patchset
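
A minimal sketch of how grants to PUBLIC might be reviewed is shown below. The package names in the filter are assumptions picked for illustration; which grants can safely be revoked depends on the application, so read Note 247093.1 before revoking anything.

```sql
-- EXECUTE grants to PUBLIC on a few network and file packages
-- (the package list is an illustrative assumption).
SELECT owner, table_name, privilege
  FROM dba_tab_privs
 WHERE grantee = 'PUBLIC'
   AND privilege = 'EXECUTE'
   AND table_name IN ('UTL_FILE', 'UTL_TCP', 'UTL_SMTP', 'UTL_HTTP')
 ORDER BY owner, table_name;

-- Example revoke, to be run only after confirming nothing depends on it.
REVOKE EXECUTE ON sys.utl_http FROM PUBLIC;
```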

Step 6:  Protect the data dictionary from unauthorized users

Oracle recommends preventing users from using the "ANY" system privileges on the data dictionary.
Ensure that O7_DICTIONARY_ACCESSIBILITY is set to FALSE.  (This is the default on versions higher than Oracle 8i.)
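
The setting can be checked, and corrected if needed, along the following lines. This is a sketch that assumes the instance uses an spfile; the parameter is static, so a change takes effect only after the instance is restarted.

```sql
-- Check the current value.
SELECT name, value
  FROM v$parameter
 WHERE name = 'o7_dictionary_accessibility';

-- Static parameter: record the change in the spfile, then restart.
ALTER SYSTEM SET o7_dictionary_accessibility = FALSE SCOPE = SPFILE;
```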


Step 7:  Set security related parameters to their recommended values

a. remote_os_authent = false
Setting this parameter to FALSE does not mean that users cannot connect remotely.  It simply means that the database will not trust that the client has already been authenticated, and will perform authentication checks accordingly.

Furthermore, this parameter has been deprecated as of Oracle 11g:
Note 456001.1 - ORA-32004: obsolete and/or deprecated parameter(s) specified: remote_os_authent
b. sec_case_sensitive_logon = true
This enables case-sensitive passwords and is also tied to a more secure password hash algorithm. For further reading, see:
Note 429465.1 - 11g R1 New Feature : Case Sensitive Passwords and Strong User Authentication
As of version parameter sec_case_sensitive_logon is deprecated and its default value is TRUE, this means that if you set the parameter to FALSE, this error will be reported:
ORA-32004: obsolete or deprecated parameter(s)  specified for RDBMS instance
c. global_names = true
This parameter enforces domain checking. For more information, see:
Note 957432.1 - Health Check Alert: Consider setting GLOBAL_NAMES to TRUE
d. unset parameter utl_file_dir
Instead of using the utl_file_dir parameter, have application developers use DIRECTORY objects to mediate access to OS files. For more information, see:
Note 196939.1 - Using CREATE DIRECTORY Instead of UTL_FILE_DIR init.ora Parameter
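
The four parameters in this step can be reviewed and adjusted roughly as follows. This is a sketch that assumes an spfile is in use; the directory name, path, and grantee (app_out_dir, /u01/app/oracle/app_out, app_user) are hypothetical.

```sql
-- Review the current values.
SELECT name, value, isdefault
  FROM v$parameter
 WHERE name IN ('remote_os_authent', 'sec_case_sensitive_logon',
                'global_names', 'utl_file_dir')
 ORDER BY name;

-- global_names can be changed while the instance is running.
ALTER SYSTEM SET global_names = TRUE SCOPE = BOTH;

-- remote_os_authent is static: set it in the spfile and restart.
ALTER SYSTEM SET remote_os_authent = FALSE SCOPE = SPFILE;

-- Remove utl_file_dir from the spfile (valid only if it is set there).
ALTER SYSTEM RESET utl_file_dir SCOPE = SPFILE SID = '*';

-- Replace utl_file_dir with a DIRECTORY object (hypothetical names).
CREATE OR REPLACE DIRECTORY app_out_dir AS '/u01/app/oracle/app_out';
GRANT READ, WRITE ON DIRECTORY app_out_dir TO app_user;
```

A DIRECTORY object lets access be granted per user and per directory, whereas utl_file_dir applies to every database user at once.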

Step 8:  Protect listener and network connections

Because the listener acts as the database gateway to the network, it is important to consider listener security as well.
Refer to:  "Guidelines for Securing the Network Connection"

Note 1328725.1 - Deprecation of Listener Password in Oracle Database 11g Release 2
Note 260986.1 - Setting Listener Passwords With an Oracle 10g or Newer Listener
Note 364388.1 - How To Network Secure Your Oracle Database Listener in Intranet / Internet
Automatic instance registration and CVE-2012-1675
The CVE-2012-1675 vulnerability concerns the security of automatic instance registration. It was more a security researcher's proof of concept for hijacking database connections than a practical attack method. For affected versions, implement the recommendations in the CVE-2012-1675 advisory, which authenticate the database to the listener using a wallet and certificates (COST). This mechanism is general, so it also works in higher versions. Recent Oracle versions also have a feature called VNCR (Valid Node Checking for Registration), which secures listener registration without needing TCPS. For non-RAC databases this needs no additional configuration, because local databases are allowed to register with a listener on the same host by default. For RAC, however, you want all instances to cross-register with all listeners on the cluster for load balancing and similar purposes; in that case see:
Note 1914282.1 How to Enable VNCR on RAC Database to Register only Local Instances.
If you prefer the COST method, you can still use Note 1340831.1 for Oracle Database deployments that use Oracle Real Application Clusters (RAC).
Encrypt sqlnet connections using network encryption.

Consider encrypting network traffic between clients, databases, and application servers.  For an introduction to Oracle network encryption, see "Configuring Network Data Encryption and Integrity". Now that the Network Encryption feature is no longer tied to the license for the Advanced Security Option, there is no reason not to implement at least native network encryption for Oracle client / server connections.
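
Once native network encryption has been configured, a session can check whether it is actually in effect. The query below is a sketch of one common way to do this; the exact banner text varies by version, so treat the interpretation in the comments as an assumption to confirm on your release.

```sql
-- Show the network service banners for the current session.
-- When native encryption is active, one banner typically names the
-- negotiated algorithm (for example an AES encryption service adapter);
-- if only generic "Encryption service" lines appear, the connection
-- is probably not encrypted.
SELECT network_service_banner
  FROM v$session_connect_info
 WHERE sid = TO_NUMBER(SYS_CONTEXT('USERENV', 'SID'));
```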

To mitigate a number of recent SSL / TLS vulnerabilities, consider adding the following parameters to both sqlnet.ora and listener.ora:


This will have the following effect on secured connections originating or terminating from the database or oracle listener:
 - disable ssl v3 and thus cut-off any vulnerability in this deprecated protocol
 - by explicitly configuring only a limited number of cipher suites disable the use of RC4 and the dreaded export ciphers.

Step 9:  Protect the database host

Run Oracle databases behind at least one corporate firewall; do not open holes in the firewall (such as by opening port 1521 for listener connections from the Internet).  If such remote access is required, consider implementing a third-party VPN solution, to integrate a remote client network securely within the corporate intranet.

Oracle also offers a dedicated database firewall as part of Oracle Audit Vault and Database Firewall, which can be used to further protect the database, particularly against SQL Injection attacks.

Both UNIX and Windows platforms provide a variety of operating system services, most of which are not necessary for most deployments.  Such services include FTP, TFTP, TELNET, and so forth.   Be sure to close both the UDP and TCP ports for each service that is being disabled.  Disabling one type of port, and not the other, does not make the operating system more secure.

Always apply all relevant and current security patches for the operating system.

Step 10:  Check Oracle websites for Security Alerts and critical patches

Review the Oracle Technology Network page on Critical Patch Updates (also known as Security Patch Updates) and Security Alerts:

Visit My Oracle Support for details on available and upcoming security-related patches:

Note 1454618.1 - Quick Reference to Patch Numbers for Database PSU, SPU(CPU), Bundle Patches and Patchsets
Note 1074055.1 - Security Vulnerability FAQ for Oracle Database and Fusion Middleware Products
Prior to October 2012, Security Patch Update (SPU) patches were called Critical Patch Update (CPU) patches.
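
To compare the installed patch level with what the notes above list, the patch history recorded inside the database can be queried. This is a sketch; DBA_REGISTRY_SQLPATCH exists only in Oracle 12.1 and later, and patches applied to the Oracle home should still be confirmed with the OPatch inventory.

```sql
-- PSU / SPU (CPU) and upgrade actions recorded in the database.
SELECT action_time, action, version, comments
  FROM dba_registry_history
 ORDER BY action_time;

-- 12.1 and later: SQL patches applied by datapatch.
SELECT patch_id, action, status, action_time, description
  FROM dba_registry_sqlpatch
 ORDER BY action_time;
```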

Other Items to Consider

  • Oracle recommends that basic password management rules (such as password length, history, complexity, and so forth) as provided by the database be applied to all user passwords and that all users be required to change their passwords periodically.
    Refer to:  Note 114930.1 - Oracle Password Management Policy
  • The Oracle database assumes certain users are trusted.  Do some people have (administrative) passwords who actually do not need them?
  • Set up monitoring to watch for suspicious activity.  At the database level, decide how one wants to audit user activity, and configure auditing accordingly.
    Refer to:  Note 1299033.1 - Master Note for Oracle Database Auditing
  • Physical security is vital as well.  The server may be safely locked away in the datacenter, but what about backup tapes?  If individuals have access to backups, these persons can have an entire system at their disposal, for analysis, attack, or both.  Tapes are often stored in an insecure fashion; the same can be said for export files.  Consider as well the eventual retirement of hard disk drives.
    The use of Transparent Data Encryption, or TDE (available with the Advanced Security Option) can help mitigate risks associated with physical storage.
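
The password management and monitoring items above can be sketched with a profile and a basic audit setting. The profile name, the user name, and every limit value are hypothetical and should be replaced by whatever your own policy requires; the AUDIT statement uses traditional auditing and has an effect only when the audit_trail parameter is enabled.

```sql
-- Hypothetical profile with password aging, reuse, and lockout rules.
CREATE PROFILE app_user_profile LIMIT
  FAILED_LOGIN_ATTEMPTS 5
  PASSWORD_LOCK_TIME    1
  PASSWORD_LIFE_TIME    90
  PASSWORD_GRACE_TIME   7
  PASSWORD_REUSE_TIME   365
  PASSWORD_REUSE_MAX    10;

-- Assign the profile to a (hypothetical) user.
ALTER USER app_user PROFILE app_user_profile;

-- Record failed logon attempts as a starting point for monitoring.
AUDIT SESSION WHENEVER NOT SUCCESSFUL;
```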

Further Reading

Online Discussion (My Oracle Support Community)

Draw on the experiences of industry professionals at the Database Security Products Community:
Security is a hot topic, and many are working towards this common goal.  The Community provides a place to share your questions and comments with your peers.
Oracle Support also monitors the Community and contributes to these discussions as well.

Master Note For Oracle Virtual Private Database ( VPD / FGAC / RLS ) (Doc ID 1352641.1)

In this Document
 Oracle Virtual Private Database Concepts and Overview
 Oracle Virtual Private Database Configuration and Administration
 Row level VPD
 Column level VPD
 Column masking VPD
 Oracle Virtual Private Database HOWTOs
 Oracle Virtual Private Database Troubleshooting
 Oracle Database Fine Grained Access Control Documentation
 Using My Oracle Support Effectively


Oracle Database - Enterprise Edition - Version to [Release 8.1.7 to 12.1]
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Oracle Database Exadata Cloud Machine - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Information in this document applies to any platform.
***Checked for relevance on 21-Feb-2013***


This Master Note is intended to provide an index and references to the most frequently used My Oracle Support Notes with respect to Oracle Virtual Private Database. This Master Note is subdivided into categories to allow for easy access and reference to notes that are applicable to your area of interest.
NOTE: In the images and/or the document content below, the user information and data used represents fictitious data from the Oracle sample schema(s) or Public Documentation delivered with an Oracle database product. Any similarity to actual persons, living or dead, is purely coincidental and not intended in any manner.


This document is meant for use as a guide by those who are configuring or managing/troubleshooting the Oracle Virtual Private Database.


Oracle Virtual Private Database Concepts and Overview

Oracle Virtual Private Database (VPD) allows you to create security policies that control access at the row and column level. These security policies are enforced by the database rather than by an application, which means that using a different application will not bypass the security policy.

Oracle dynamically and transparently adds a WHERE clause (predicate) to a SQL statement that is executed against the object (table, view, or synonym) to which a VPD policy is applied. The predicate is returned by a custom function that implements the security policy. It is your responsibility to write this function correctly so that it returns the expected predicates in the various scenarios.

To make the implementation of a security policy easier, you have the option of using an application context within a fine-grained access control (FGAC) function. Virtual Private Database (VPD) is the term used for the combination of fine-grained access control (FGAC) with application contexts.
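
The combination of an application context with an FGAC function can be sketched as follows. All object names here (emp_ctx, emp_ctx_pkg, set_deptno, pol_emp_ctx) are hypothetical and are not taken from the notes referenced in this document; creating the context requires the CREATE ANY CONTEXT privilege.

```sql
-- Application context that only the trusted package emp_ctx_pkg may set.
CREATE OR REPLACE CONTEXT emp_ctx USING emp_ctx_pkg;

CREATE OR REPLACE PACKAGE emp_ctx_pkg AS
  PROCEDURE set_deptno(p_deptno IN NUMBER);
END emp_ctx_pkg;
/

CREATE OR REPLACE PACKAGE BODY emp_ctx_pkg AS
  PROCEDURE set_deptno(p_deptno IN NUMBER) IS
  BEGIN
    -- Store the department number as a session-level context attribute.
    DBMS_SESSION.SET_CONTEXT('emp_ctx', 'deptno', TO_CHAR(p_deptno));
  END set_deptno;
END emp_ctx_pkg;
/

-- Policy function: the predicate reads the context at query time.
-- If the attribute was never set, SYS_CONTEXT returns NULL and the
-- comparison matches no rows.
CREATE OR REPLACE FUNCTION pol_emp_ctx(obj_owner IN VARCHAR2,
                                       obj_name  IN VARCHAR2)
  RETURN VARCHAR2
IS
BEGIN
  RETURN 'deptno = TO_NUMBER(SYS_CONTEXT(''emp_ctx'', ''deptno''))';
END;
/
```

Because the predicate text is the same for every user, this style keeps per-user logic out of the policy function; the application, or a logon trigger, calls emp_ctx_pkg.set_deptno once per session.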
The VPD policies can be applied to the following statements:


Note: VPD policies do not restrict DDL statements in any way.

Note: Database users who were granted the EXEMPT ACCESS POLICY privilege, either directly or through a database role, and users who connect as SYSDBA are exempt from Oracle Virtual Private Database enforcement.
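
To see who is exempt through the privilege mentioned in the note above, the system privilege grants can be queried. This is a minimal sketch that assumes access to the DBA_ views.

```sql
-- Users and roles that bypass VPD policies through EXEMPT ACCESS POLICY.
SELECT grantee
  FROM dba_sys_privs
 WHERE privilege = 'EXEMPT ACCESS POLICY'
 ORDER BY grantee;
```

If a role appears in the result, DBA_ROLE_PRIVS shows which users hold that role.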

Oracle Virtual Private Database Configuration and Administration

The VPD policies are created and maintained via the package DBMS_RLS.

Row level VPD

Let's assume that we want to allow database users KING and QUEEN to see only the rows pertaining to a certain department in USER1.EMP. We can create the following (very simple) policy function, which establishes the departments these users have access to:

-- Policy function: returns the predicate that VPD appends for the invoking user.
-- Any user other than KING or QUEEN gets '1=2', which matches no rows.
create or replace function pol_emp(obj_owner in varchar2, obj_name in varchar2) return varchar2
is
   deptno number;
   predicate varchar2(200);
begin
   predicate := '1=2';

   if SYS_CONTEXT('userenv','POLICY_INVOKER')= 'KING' then
     predicate := 'deptno='||10;
   end if;

   if SYS_CONTEXT('userenv','POLICY_INVOKER') = 'QUEEN' then
     predicate := 'deptno='||20;
   end if;

   return predicate;
end;
/

We then create a policy for access to table USER1.EMP, specifying the policy function, how the policy should work, and so on:

begin
  dbms_rls.add_policy(object_schema => 'USER1',
    object_name => 'emp',
    policy_name => 'secure_emp',
    policy_function => 'pol_emp',
    statement_types => 'SELECT');
end;
/

When user KING performs the following query:


... the VPD policy dynamically appends the WHERE clause (predicate) returned by the above function to the statement:

WHERE deptno = 10;
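
The effect can be illustrated with a hypothetical query. The column names ename and deptno are assumptions based on the usual EMP sample table layout, and the second statement only shows what the database effectively executes; it is not something KING types.

```sql
-- Issued by KING:
SELECT ename, deptno FROM user1.emp;

-- Effectively executed once the policy predicate has been appended:
SELECT ename, deptno FROM user1.emp WHERE deptno=10;
```

A user other than KING or QUEEN gets the default predicate 1=2 from the policy function, so the same query returns no rows.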

You can see more examples of using VPD in the following notes:

Note 67977.1   - Oracle8i Fine Grained Access Control - Worked Examples
Note 281829.1 - Evolution of Fine Grain Access Control FGAC Feature From 8i To 10g

Column level VPD

In some cases only certain columns are sensitive, so access needs to be restricted only when these columns are queried or modified. When the sensitive column name is specified with the sec_relevant_cols parameter of the DBMS_RLS.ADD_POLICY procedure, the security policy is applied (and the number of returned rows is reduced) whenever the column is referenced. The policy creation procedure should specify the sec_relevant_cols parameter:

begin
  dbms_rls.add_policy(object_schema => 'USER1',
    object_name => 'emp',
    policy_name => 'secure_emp',
    policy_function => 'pol_emp',
    statement_types => 'SELECT',
    sec_relevant_cols => 'comm');
end;
/

The policy created with the above statement will not rewrite the query (will not add a predicate) if column COMM is not referenced explicitly or implicitly (using the * wildcard). See Note 250795.1 for more details.
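
The difference can be sketched with three hypothetical queries against USER1.EMP, issued by a user who is subject to the policy. The column names ename, sal, and comm are assumed from the usual EMP sample table layout.

```sql
-- COMM is not referenced: no predicate is added, all rows are returned.
SELECT ename, sal FROM user1.emp;

-- COMM is referenced explicitly: the policy predicate is added.
SELECT ename, comm FROM user1.emp;

-- COMM is referenced implicitly through the wildcard: predicate is added.
SELECT * FROM user1.emp;
```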

Column masking VPD

This is a variant of Column level VPD which allows you to specify that the VPD policy should hide (mask) the sensitive data rather than removing entire rows from the result set. One can obtain this behaviour by specifying the sensitive column names with the sec_relevant_cols parameter and by setting parameter sec_relevant_cols_opt to DBMS_RLS.ALL_ROWS:

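begin
  dbms_rls.add_policy(object_schema => 'USER1',
    object_name => 'emp',
    policy_name => 'secure_emp',
    policy_function => 'pol_emp',
    statement_types => 'SELECT',
    sec_relevant_cols => 'comm,sal',
    sec_relevant_cols_opt => dbms_rls.all_rows);
end;
/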

When columns COMM or SAL are referenced in a query their values will be replaced with NULLs for those rows that are excluded by the predicate.

Views [ALL|DBA|USER]_POLICIES contain the information needed to determine which policy function(s) will be applied when a statement is issued. The source of the policy function can be reviewed in views [ALL|DBA|USER]_SOURCE:
SQL> select * from dba_policies where object_owner='USER1' and object_name='EMP';


SQL> set linesize 200
SQL> select text from dba_source where name='POL_EMP';

function pol_emp(obj_owner in varchar2, obj_name in varchar2) return varchar2
is
deptno number;
predicate varchar2(200);
begin
predicate := '1=2';

if SYS_CONTEXT('userenv','POLICY_INVOKER')= 'KING' then
predicate := 'deptno='||10;
end if;

if SYS_CONTEXT('userenv','POLICY_INVOKER') = 'QUEEN' then
predicate := 'deptno='||20;
end if;
return predicate;
end;

16 rows selected.
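
Because SELECT * on DBA_POLICIES produces very wide output, a narrower query is often easier to read. The following sketch selects only the columns that identify the policy, its function, the statement types it covers, and whether it is enabled; the SQL*Plus COLUMN settings are formatting suggestions only.

```sql
SET LINESIZE 200
COLUMN object_owner FORMAT A12
COLUMN object_name  FORMAT A12
COLUMN policy_name  FORMAT A15
COLUMN pf_owner     FORMAT A12
COLUMN function     FORMAT A15

-- One row per policy defined on USER1.EMP.
SELECT object_owner, object_name, policy_name,
       pf_owner, function,
       sel, ins, upd, del, enable
  FROM dba_policies
 WHERE object_owner = 'USER1'
   AND object_name  = 'EMP';
```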

Oracle Virtual Private Database HOWTOs

Note 67977.1   - Oracle8i Fine Grained Access Control - Worked Examples
Note 281829.1 - Evolution of Fine Grain Access Control FGAC Feature From 8i To 10g
Note 967042.1 - How to Investigate Query Performance Regressions Caused by VPD (FGAC) Predicates?
Note 250795.1 - 10g: Policy Enforced Only When the Relevant Column is Queried in Any Way
Note 281970.1 - 10g: Enhancement on STATIC_POLICY with POLICY_TYPE Behaviors in DBMS_RLS.ADD_POLICY Procedure
Note 315687.1 - 10g: What Is INDEX statement_type Used For In By DBMS_RLS Policies ?
Note 119335.1 - How To Solve the Problem of Circular Row Level Policies
Note 174799.1 - How to Bypass Fine-Grained Security Enforcement
Note 69573.1   - How to Determine Active Context (DBMS_SESSION.LIST_CONTEXT)
Note 162914.1 - How to Skip Tables when Exporting a User or an Entire Database
Note 99250.1   - Understanding Fine-Grained Access Control (DBMS_RLS) on INSERT
Note 174368.1 - Policies on Synonyms
Note 125511.1 - How to Generate Tracing when using Fine Grained Access Control
Note 155477.1 - Parameter DIRECT: Conventional Path Export Versus Direct Path Export
Note 187239.1 - Execution plan may change when you use Fine Grained Access Control (FGAC)
Note 1637312.1 - Example: How to Move a RLS Policy Between Databases Using Datapump

Oracle Virtual Private Database Troubleshooting

Note 386755.1 - How To Implement A VPD Policy Working With Materialized Views to Avoid ORA-30372
Note 69401.1   - How to resolve ORA-28110 or ORA-28112 on SELECT or DML
Note 100130.1 - ORA-1031 when setting Attribute via DBMS_SESSION.SET_CONTEXT
Note 331862.1 - ORA-28113 when a Policy Predicate is Fetched from a Context
Note 113970.1 - SELECT Statement Hangs when using Fine Grained Access Control
Note 168056.1 - Select on Table With Policy Defined on it Fails With ORA-28110
Note 175658.1 - RLS Policy Function Appears to Run in a New Session
Note 277606.1 - How to Prevent EXP-00079 or EXP-00080 Warning (Data in Table xxx is Protected) During Export
Note 567521.1 - ORA-28112: Select on a Table with FGAC Policy Enabled
Note 130652.1 - A policy does not work as defined, though UPDATE_CHECK is set to TRUE
Note 117058.1 - ORA-439 When Trying to Use DBMS_RLS
Note 179379.1 - Querying Against a Partitioned Table With FGAC Fails With ORA-01762
Note 158187.1 - Create Materialized View Fails With ORA-30372
Note 172423.1 - ORA-12015 when Creating Materialized View with Defined Fine Grain Access Control
Note 153978.1 - Oracle9i Export of Table with Row Level Security Aborts with ORA-1406 and EXP-0
Note 219911.1 - Fine Grained Access Control Feature Is Not Available In the Oracle Server Standard Edition
Note 250094.1 - How to Know the Exact Cause of an ORA-28113 Error After Setting a FGAC Policy
Note 278577.1 - FGAC Policy Causes Ora-00903 When Using A Function With UNION Operator And PK On Function Tables
Note 293301.1 - ORA-14136 When Exchanging Partition With a Table That Has a RLS Policy Enabled
Note 312030.1 - DBMS_OUTPUT.PUT_LINE Fires Multiple Times From FGAC Policy Function
Note 361345.1 - Ora-3001: "Unimplemented Feature" On Query Using "WITH" and FGAC
Note 422480.1 - ORA-39181: Only Partial Table Data Exported Due To Fine Grain Access Control
Note 1090749.1 - Dependent Objects Gets Invalidated When Policy Is Added Or Dropped
Note 782462.1 - ORA-28113 Policy Predicate Has Error Even When The VPD Function is Flawless
Note 2199556.1 - ORA-28113 & ORA-00904 On A 12c Database with VPD, FGA And Extended Statistics (Virtual Columns)

Script to Capture Role Grants (Doc ID 18079.1)

Script For Capturing Role Grants
Product Name, Product Version: Oracle Server Enterprise Edition, Versions 8.1.7, 9.0.1, 9.2.0, 10.1, 10.2, 11.1, 11.2
Date Created: 29-Oct-2002
Checked for relevance: 02-Apr-2007, 04-Jun-2013
Use sqlplus, connect AS SYSDBA.

PROOFREAD THIS SCRIPT BEFORE USING IT! Due to differences in the way text 
editors, e-mail packages, and operating systems handle text formatting (spaces, 
tabs, and carriage returns), this script may not be in an executable state
when you first receive it. Check over the script to ensure that errors of
this type are corrected.

The following script, once run, generates a second script
containing the grant statements for all roles granted to
users and to other roles within the database. Review the
remarks carefully before running this script.

Note:13615.1   Roles and Privileges Administration and Restrictions 
Note:180028.1  Set up a Secure Access to Application Data within a Database: DBAs, Schemas and Users

REM This script must be run by a user with the DBA role.
REM This script is intended to run with Oracle versions 7.3.X, 8.0.X, and 8.1.X.
REM Running this script will in turn create a script of all the grants
REM of roles to users and other roles.  This created script, grant_roles.sql,
REM must be run by a user with the DBA role.
REM Since role grants are not dependent on the schema that issued the grant,
REM the grant_roles.sql script will not issue the grant of a role by the
REM original grantor.  All grants will be issued by the user specified when
REM running this script.
REM NOTE:  Grants made to 'SYS' are not captured.
REM Only preliminary testing of this script was performed.  Be sure to test
REM it completely before relying on it.

set verify off
set feedback off
set termout off
set echo off;
set pagesize 0

set termout on
select 'Creating role grant script...' from dual;
set termout off

spool grant_roles.sql

REM Build one GRANT statement per row of DBA_ROLE_PRIVS, skipping grants made to SYS.
select 'GRANT ' || lower(granted_role) || ' TO ' || lower(grantee) ||
       decode(admin_option,'YES',' WITH ADMIN OPTION;',';')
  from sys.dba_role_privs
  where grantee != 'SYS'
order by grantee;

spool off


Sample Output
GRANT dba TO clubmom;                                                           
GRANT resource TO clubmom;                                                      
GRANT connect TO ctxsys;                                                        
GRANT resource TO ctxsys;                                                       
GRANT dba TO darcy;                                                             
GRANT delete_catalog_role TO dba WITH ADMIN OPTION;                             
GRANT execute_catalog_role TO dba WITH ADMIN OPTION;                            
GRANT exp_full_database TO dba;                                                 
GRANT imp_full_database TO dba;                                                 
GRANT java_admin TO dba;                                                        
GRANT plustrace TO dba WITH ADMIN OPTION;                                       
GRANT select_catalog_role TO dba WITH ADMIN OPTION;                             
GRANT connect TO dbsnmp;                                                        
GRANT resource TO dbsnmp;                                                       
GRANT snmpagent TO dbsnmp;                                                      
GRANT dba TO user1;                                                            
GRANT connect TO developer;    
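
The generated grant_roles.sql can then be replayed from SQL*Plus. The following is only a minimal sketch, assuming the script is in the current directory and has already been proofread. The log file name grant_roles.log is a hypothetical choice, and the closing query is an optional sanity check that is not part of the original note.

```sql
-- Re-enable screen output, which the generating script turned off.
set termout on
set feedback on
set echo on

-- Assumption: grant_roles.log is an illustrative name for the run log.
spool grant_roles.log
@grant_roles.sql
spool off

-- Optional check: count role grants per grantee (SYS excluded, as in the
-- generating query) so the totals can be compared with the source database.
select grantee, count(*) as role_count
  from sys.dba_role_privs
 where grantee != 'SYS'
 group by grantee
 order by grantee;
```

Because every grant is issued by the connected user rather than the original grantor, connect as a user with the DBA role before replaying the script.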




Database Options/Management Packs Usage Reporting for Oracle Databases 11.2 and later (Doc ID 1317265.1)

  Database Options/Management Packs Usage Report You can determine whether an option is currently in use in a database by running options_pa...