APPLIES TO:
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Backup Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Information in this document applies to any platform.

GOAL
The goal of this note is to provide steps to evaluate network bandwidth, along with experiments that will help tune the operating system, Oracle Net, or RMAN parallelism (the last is only relevant for RMAN operations).
NOTE: Redo compression and encryption are out of scope for this document; however, each can have an impact on transfer rates and transport lag, and each should be tested prior to implementation. It is best to evaluate with and without compression and encryption and compare the performance differences. The overhead is usually attributable to the additional work and time needed to compress or encrypt redo before sending it, and to decompress or decrypt it after it is received.
SOLUTION

Installation and Usage of oratcptest

NOTE: While Oracle has tested the oratcptest tool to validate that it works as intended, users must test any Oracle-provided utility in a lower environment before using it in production, in order to validate that it performs as intended in their environment.

oratcptest can be used as a general-purpose tool for measuring network bandwidth and latency. However, oratcptest was designed specifically to help customers assess the network resources that will be used by Data Guard redo transport, GoldenGate, RMAN backup and restore, migration, Data Guard instantiation, and database remote clone. You can control the test behavior by specifying various options, such as the transport mode, test duration, statistics interval, socket buffer size, and number of connections. A basic invocation is shown below.
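For illustration, a minimal server/client pair (host name and port are placeholders; the option forms match those used throughout this note):

On the target (server) host:
$ java -jar oratcptest.jar -server -port=<port number>

On the source (client) host:
$ java -jar oratcptest.jar <target host> -port=<port number> -mode=async -duration=60s -interval=10s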
NOTE: This tool, like any Oracle network streaming transport, can simulate efficient network packet transfers from the source host to the target host. Throughput can be 100 MB/sec or higher depending on the available network bandwidth between the source and target servers and the tool options invoked. Use caution if any other critical applications share the same network.
Determining Optimal Socket Buffer Size

NOTE: Use the socket buffer size determined by this process for all subsequent testing.

Setting socket buffer sizes on a per-connection basis (rather than globally) can conserve memory on both the primary and standby. However, a per-connection value cannot exceed the operating system's maximum socket buffer size. On Linux, the net.core.rmem_max and net.ipv4.tcp_rmem kernel parameters set the operating system receive socket buffer maximum sizes, which, in turn, set the TCP window size for that connection (net.core.wmem_max and net.ipv4.tcp_wmem are the corresponding send-side parameters). The net.core.rmem_max parameter controls the maximum setting of all protocol buffer sizes, while the net.ipv4.tcp_rmem parameter is the one Oracle Net will use as it opens an IPv4 socket; the net.ipv4.tcp_rmem maximum cannot override the net.core.rmem_max limit. Instructions for tuning these parameters are given below.

Bandwidth-delay product (BDP) is the product of the network link capacity of a channel (bandwidth) and its round-trip time, or latency. The minimum recommended value for socket buffer sizes is 3*BDP, especially for a high-latency, high-bandwidth network. Use oratcptest to tune the socket buffer sizes.

The tcp_rmem setting represents the TCP window size: how much data can be sent without a TCP acknowledgement. TCP windowing is used to make data transfer more efficient. Consider what happens when a packet from that window is dropped. A normal TCP configuration detects the loss at the receiving side and sends a TCP acknowledgement (ACK) packet to the sender with the sequence number of the last packet of the window that was received. The sender will not respond to that ACK packet until it finishes sending the complete window. Since the sender of the ACK (the receiver of the window) still has not received the lost packet, its TCP ACK timer fires, and multiple duplicate ACK packets are sent. This is a normal sign of a dropped packet within a large TCP window. Once the sender receives the ACK identifying which packet was lost, there are two ways it can respond. One is to resend the entire window; since most TCP windows are small, this does not normally pose a performance problem, but with 16MB+ windows it would. The more efficient response is to configure the TCP stack to use Selective Acknowledgements (SACK). Some operating systems enable SACK by default while others do not, and firewalls between the sender and receiver can disable it. SACK is negotiated during the TCP three-way handshake: if both sides support it, SACK is used; if either side does not support it, or a firewall does not support it or blocks it, it is not used. Attached is a script (Linux only) that snoops the network either for SACK packets or for TCP options showing that SACK is supported. This simple script can be run on a server to see whether SACK packets are detected: SACK_detect.sh

Determine Optimal Socket Buffer Size

1. Set the maximum operating system send and receive buffer sizes to an initial size of 16MB. On Linux, as root, first check the current values:

# cat /proc/sys/net/core/rmem_max
/proc/sys/net/core/rmem_max:4194304
# cat /proc/sys/net/ipv4/tcp_rmem
/proc/sys/net/ipv4/tcp_rmem:4096 87380 4194304

Set the maximum buffer sizes to 16MB:

# sysctl -w net.core.rmem_max=16777216
net.core.rmem_max = 16777216
# sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_rmem = 4096 87380 16777216
# sysctl -w net.core.wmem_max=16777216
net.core.wmem_max = 16777216
# sysctl -w net.ipv4.tcp_wmem="4096 16384 16777216"
net.ipv4.tcp_wmem = 4096 16384 16777216
NOTE: The values for the net.ipv4.tcp_wmem and net.ipv4.tcp_rmem parameters must be quoted. The first number is the minimum size used when the whole system comes under memory pressure. The second number is the default size used if a socket is opened without requesting a specific buffer size. The third number is the maximum amount of memory that can be allocated to the buffer; it cannot exceed net.core.wmem_max (or net.core.rmem_max on the receive side). If the Oracle Net parameter RECV_BUF_SIZE is used, it can be set up to the net.ipv4.tcp_rmem maximum.

NOTE: Any firewall or other pass-through server on the route between the source and target can reduce the effective socket buffer size of the endpoints. Be sure to increase the size of the socket buffers on these servers as well.

NOTE: Increasing these values increases system memory usage.

NOTE: Changes made with sysctl are not permanent. Update the /etc/sysctl.conf file to persist these changes through machine restarts. You will update the configuration file once the proper setting is determined.

2. Test socket buffer sizes with oratcptest. Starting with 2MB socket buffers, test the socket buffer sizes using commands like the following.

NOTE: It is helpful to find the maximum network throughput with one connection using different socket buffer sizes.

Server (standby):
$ java -jar oratcptest.jar -server -port=<port number> -sockbuf=2097152

Client (primary):
$ java -jar oratcptest.jar <standby server> -port=<port number> -mode=async -duration=120s -interval=20s -sockbuf=2097152

[Requesting a test]
        Message payload        = 1 Mbyte
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = 2 Mbytes
        Transport mode         = ASYNC
        Disk write             = NO
        Statistics interval    = 20 seconds
        Test duration          = 2 minutes
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(11:39:16) The server is ready.
                    Throughput
(11:39:36)      71.322 Mbytes/s
(11:39:56)      71.376 Mbytes/s
(11:40:16)      72.104 Mbytes/s
(11:40:36)      79.332 Mbytes/s
(11:40:56)      76.426 Mbytes/s
(11:41:16)      68.713 Mbytes/s
(11:41:16) Test finished.
        Socket send buffer = 2097152
        Avg. throughput    = 73.209 Mbytes/s
Now that you have a baseline with a 2MB buffer, increase the socket buffer size to 4MB to assess any gain in throughput from a larger buffer.

Server (standby):
$ java -jar oratcptest.jar -server -port=<port number> -sockbuf=4194304

Client (primary):
$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=async -duration=60s -sockbuf=4194304

[Requesting a test]
        Message payload        = 1 Mbyte
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = 4 Mbytes
        Transport mode         = ASYNC
        Disk write             = NO
        Statistics interval    = 10 seconds
        Test duration          = 1 minute
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(11:15:06) The server is ready.
                    Throughput
(11:15:16)     113.089 Mbytes/s
(11:15:26)     113.185 Mbytes/s
(11:15:36)     113.169 Mbytes/s
(11:15:46)     113.169 Mbytes/s
(11:15:56)     113.168 Mbytes/s
(11:16:06)     113.171 Mbytes/s
(11:16:06) Test finished.
        Socket send buffer = 4 Mbytes
        Avg. throughput    = 113.149 Mbytes/s

The above example shows a large improvement in throughput with the larger socket buffer size. Continue to increase the socket buffer sizes until you no longer see any improvement. Increase the operating system socket buffer maximums (see step 1) before exceeding 16MB with oratcptest.

Configuring Operating System Maximum Buffer Size Limits

When you have determined the optimal size for the socket buffers, make the settings permanent. On Linux, set them in /etc/sysctl.conf so that the changes persist through node reboots.

# vi /etc/sysctl.conf
#net.core.rmem_max = 4194304   <-- Comment out the existing value
net.core.rmem_max = 8388608    <-- Replace with the new value
#net.core.wmem_max = 2097152   <-- Comment out the existing value
net.core.wmem_max = 8388608    <-- Replace with the new value
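If you also raised the net.ipv4.tcp_rmem and net.ipv4.tcp_wmem maximums at run time in step 1 (rather than relying only on the Oracle Net RECV_BUF_SIZE/SEND_BUF_SIZE parameters), persist those as well and reload the settings. A minimal sketch, assuming 8MB was found to be optimal (the minimum and default values shown are typical Linux defaults):

# vi /etc/sysctl.conf
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 16384 8388608

# sysctl -p    <-- Reload /etc/sysctl.conf so the changes take effect without a reboot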
Scenario 1 - Understand the existing network and evaluate tuning options prior to database migration, Data Guard deployment, or RMAN operations for a large database

NOTE: This process will put load on the network between the source and target.

Determine existing bandwidth between source and target using parallelism

RMAN backup/recovery, Data Guard instantiation, and database migration strategies all utilize parallelism to maximize the transfer rate between the source and the target. oratcptest can measure the aggregate bandwidth of multiple connections using the num_conn parameter.

NOTE: Data Guard redo transport is a single shipping process per instance and is covered in a later section.

Initially, determine the bandwidth from a single node of the source to a single node of the target. Then repeat that process from multiple nodes of RAC systems, using concurrent oratcptest commands where necessary.

Determine Bandwidth for a Single Node

1. Start an oratcptest listener on the target:

$ java -jar oratcptest.jar -server [<IP ADDRESS or VIP in the case of Clusterware>] -port=<port number>
Note: In Exadata, the admin network through which a user connects may have limited bandwidth compared to the client network or the VIP on which the listener runs. Be sure to set the IP address in these cases; otherwise the listener will be placed on the admin network. Running lsnrctl status will report the IP address of the local listener.

2. Execute oratcptest from the source using two (2) connections:

$ java -jar oratcptest.jar <target IP address> -port=<port number> -duration=60s -interval=10s -mode=async [-output=<results file>] -num_conn=2

(21:07:36) The server is ready.

3. Repeat step 2, increasing the num_conn parameter by two (2), until aggregate throughput no longer increases for three consecutive runs. For instance, if 12, 14, and 16 connections achieve approximately the same aggregate throughput, stop.

Determine Concurrent Bandwidth of a Cluster

Note: Skip this step if the source and target are not Oracle RAC or some other cluster.

To determine the full bandwidth across all servers in a RAC cluster, repeat the process above using all nodes concurrently and sum the outputs of all processes, e.g., node 1 of the source to node 1 of the target, node 2 of the source to node 2 of the target, and so on. (A concurrent-driver sketch appears in Scenario 3 below.)

Scenario 2 - Post Data Guard deployment: experiencing a transport lag with ASYNC transport

Configuring Redo Transport for Optimal Network Performance

The factors that influence redo transport performance differ for ASYNC and SYNC transport because their protocols differ. The one critical variable that must be known before any tuning can begin is the required network bandwidth.

Determine Required Network Bandwidth

To determine the required network bandwidth for a given Data Guard configuration, you must determine the primary database redo generation rate. You could use the Automatic Workload Repository (AWR) tool to find the redo generation rate; however, AWR snapshots are often taken at 30- to 60-minute intervals, which can dilute the peak redo rate that often occurs over shorter periods. For a more accurate picture of peak redo rates during a period of time, compile the redo rates for each redo log using the following query:

SQL> SELECT THREAD#, SEQUENCE#, BLOCKS*BLOCK_SIZE/1024/1024 MB,
       (NEXT_TIME-FIRST_TIME)*86400 SEC,
       (BLOCKS*BLOCK_SIZE/1024/1024)/((NEXT_TIME-FIRST_TIME)*86400) "MB/S"
     FROM V$ARCHIVED_LOG
     WHERE ((NEXT_TIME-FIRST_TIME)*86400<>0)
       AND FIRST_TIME BETWEEN TO_DATE('2015/01/15 08:00:00','YYYY/MM/DD HH24:MI:SS')
                          AND TO_DATE('2015/01/15 11:00:00','YYYY/MM/DD HH24:MI:SS')
       AND DEST_ID=2
     ORDER BY FIRST_TIME;

You should see output like the following:

   THREAD#  SEQUENCE#         MB        SEC       MB/s
---------- ---------- ---------- ---------- ----------
         2       2291 29366.1963        831  35.338383
         1       2565 29365.6553        781 37.6000708
         2       2292 29359.3403        537  54.672887
         1       2566 29407.8296        813 36.1719921
         2       2293 29389.7012        678 43.3476418
         2       2294 29325.2217       1236 23.7259075
         1       2567 11407.3379       2658 4.29169973
         2       2295 29452.4648        477 61.7452093
         2       2296 29359.4458        954 30.7751004
         2       2297 29311.3638        586 50.0193921
         1       2568 3867.44092       5510 .701894903

The query output above indicates that you must accommodate a peak redo rate of just under 62 MB/sec/node. In general, we recommend adding 30% on top of the peak rate to account for spikes. In this case, that results in a peak redo rate of about 80 MB/sec/node.

NOTE: To find the peak redo rate, choose times during the highest level of processing, such as end of quarter, end of year, etc.
NOTE: Frequent log switches (every 5 minutes or less) can induce a standby apply lag due to log switch overhead on the standby. Ensure that the logs are sized so that switches do not occur more often than every 5 minutes during peak rates; a query for checking switch frequency follows.
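As referenced in the note above, a hedged query to check how often log switches occur per hour (V$LOG_HISTORY is standard; adjust the time window as needed):

SQL> SELECT TO_CHAR(FIRST_TIME,'YYYY/MM/DD HH24') HOUR, COUNT(*) SWITCHES
     FROM V$LOG_HISTORY
     WHERE FIRST_TIME > SYSDATE - 7
     GROUP BY TO_CHAR(FIRST_TIME,'YYYY/MM/DD HH24')
     ORDER BY 1;

More than 12 switches in an hour indicates switches occurring more often than every 5 minutes during that hour.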
With the required bandwidth for the application understood, you can attempt to tune transport appropriately.

Assessing Network Performance for Data Guard Asynchronous (ASYNC) Redo Transport

Use oratcptest to assess whether you have the required bandwidth to keep up with peak redo rates for ASYNC transport. When tuning ASYNC transport:
NOTE: Always tune to a goal. Once that goal is reached, further tuning is unnecessary.

Determine Available Network Bandwidth for ASYNC Transport

Once you have determined the peak redo rate, you can measure the network bandwidth with the oratcptest tool to see whether the network can sustain that peak rate. For example, on one of the standby hosts, start the oratcptest server process:

$ java -jar oratcptest.jar -server -port=<port number>

On one of the primary hosts, use oratcptest to connect to the server process on the standby host and transfer a fixed amount of data. For example, if Data Guard transport will use ASYNC, run a command similar to the following to determine the maximum throughput the network can provide:

$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=async -duration=120s -interval=20s

(11:09:47) The server is ready.

The above example shows that an average network throughput of 113 MB/sec is enough to keep up with the database peak redo rate of 80 MB/sec.

NOTE: If throughput exceeds the peak rate of the application, there is no need to continue tuning.
Configuring Socket Buffer Sizes Used by Data Guard Asynchronous Redo Transport

Optimal system socket buffer sizes were determined and set previously. For Oracle Net to use the optimal socket buffer size, the parameters RECV_BUF_SIZE and SEND_BUF_SIZE must be set accordingly, in addition to the operating system parameters you set in the previous step. You can set the parameters for all connections in the sqlnet.ora file on each side of the configuration (primary and standby):

RECV_BUF_SIZE=8388608
SEND_BUF_SIZE=8388608

However, to minimize memory usage by processes other than Data Guard redo transport, we recommend that you isolate the increased buffer usage to the Oracle Net alias used for redo transport, as well as to the server-side listener. The following example shows the send and receive socket buffer sizes set as description attributes for a particular connect descriptor in tnsnames.ora:

<standby TNS name> =
 (DESCRIPTION=
   (SEND_BUF_SIZE=8388608)
   (RECV_BUF_SIZE=8388608)
   (ADDRESS=(PROTOCOL=tcp)(HOST=<stby_host>)(PORT=<PORT>))
   (CONNECT_DATA=
     (SERVICE_NAME=<standby service name>)))

These settings should be applied to both the primary and standby TNS descriptors. The socket buffer size parameters must be configured with the same values for all of the databases in a Data Guard configuration. On the standby (receiving) side, you can set them in either the sqlnet.ora or listener.ora file. In the listener.ora file, you can specify the buffer space parameters either for a particular protocol address or for a description, as in the following example. After making these changes, reload the listener (for example, with lsnrctl reload).

<listener name> =
 (DESCRIPTION=
   (SEND_BUF_SIZE=8388608)
   (RECV_BUF_SIZE=8388608)
   (ADDRESS=(PROTOCOL=tcp)(HOST=<stby_host>)(PORT=<PORT>)))

To achieve the same throughput when the database roles are reversed following a switchover, configure the same settings in the opposite direction: set them in the Oracle Net alias that the standby database will use to ship redo when it is the primary, as well as in the primary's listener.ora.
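For example, a mirror-image alias for shipping redo in the reverse direction might look like the following (a sketch; the host and service names are placeholders):

<primary TNS name> =
 (DESCRIPTION=
   (SEND_BUF_SIZE=8388608)
   (RECV_BUF_SIZE=8388608)
   (ADDRESS=(PROTOCOL=tcp)(HOST=<primary_host>)(PORT=<PORT>))
   (CONNECT_DATA=
     (SERVICE_NAME=<primary service name>)))

Apply the same buffer attributes in the primary's listener.ora description, as shown above for the standby.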
Scenario 3: Determine Maximum Network Bandwidth or Evaluate Potential RMAN Operation Throughput

A common point of confusion is the expectation that ASYNC redo transport can send at up to the maximum network bandwidth; however, redo transport itself uses one process per destination. RMAN operations such as backup, restore, duplicate, or recover-from-service commands, on the other hand, can leverage many processes (e.g., RMAN channels on one node or across many RAC nodes). Determining the maximum available network bandwidth, and the level of parallelism needed to achieve it, can be helpful in understanding your network and in tuning RMAN commands. To do this, you can leverage the num_conn oratcptest parameter or issue the oratcptest command concurrently across RAC nodes.
For example, after determining that 8MB buffer sizes provide optimal network send throughput with num_conn=1 (the default), repeat with different numbers of connections (parallelism) until no additional throughput is realized. (A loop that automates this sweep is sketched after these examples.)

Parallelism=1
$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=async -duration=60s -sockbuf=8388608 -num_conn=1

Parallelism=2
$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=async -duration=60s -sockbuf=8388608 -num_conn=2

…

Parallelism=n
$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=async -duration=60s -sockbuf=8388608 -num_conn=n
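A minimal shell sketch of the sweep (the host, port, and the connection counts tried are placeholders; each run's average throughput is reported by oratcptest itself):

#!/bin/bash
# Sweep num_conn to find the parallelism at which aggregate throughput plateaus.
HOST=test.server.address.com
PORT=<port number>
for N in 1 2 4 6 8 10 12 14 16; do
    echo "=== num_conn=$N ==="
    java -jar oratcptest.jar $HOST -port=$PORT -mode=async \
         -duration=60s -sockbuf=8388608 -num_conn=$N
done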
Now repeat concurrently across all RAC nodes. For example, with two RAC nodes at a parallelism of n per node, run the commands concurrently (a driver sketch follows):

HostA:
$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=async -duration=60s -sockbuf=8388608 -num_conn=n

HostB:
$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=async -duration=60s -sockbuf=8388608 -num_conn=n
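A hedged driver sketch for launching the clients concurrently from several source nodes via ssh (this assumes passwordless ssh, that oratcptest.jar is in place on each node, and that a separate oratcptest server is listening on the target for each port listed; sum the reported averages for the aggregate):

#!/bin/bash
# Launch concurrent oratcptest clients from each source node, one target port per node.
TARGET=test.server.address.com
NODES=(hostA hostB)
PORTS=(<port1> <port2>)
for i in "${!NODES[@]}"; do
    ssh "${NODES[$i]}" "java -jar oratcptest.jar $TARGET -port=${PORTS[$i]} \
        -mode=async -duration=60s -sockbuf=8388608 -num_conn=n" &
done
wait   # Sum the "Avg. throughput" lines from each client's output for the aggregate.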
Note: RMAN compression and encryption may impact overall throughput.

Scenario 4 - Post deployment: tuning transaction response time with SYNC transport

Assessing Network Performance for Data Guard Synchronous Redo Transport

Synchronous redo transport requires that a primary database transaction wait for confirmation from the standby that redo has been received and written to disk (to a standby redo log file) before commit success is signaled to the application. Network latency is the single largest inhibitor of SYNC transport performance, and it must be consistently low to ensure minimal impact on response time and throughput for OLTP applications. Because of the impact of latency, the options for tuning transport are limited, so focus on assessing the feasibility of SYNC transport over a given network link.
Setting Oracle Net SDU for SYNC Transport

Oracle Net encapsulates data into buffers the size of the session data unit (SDU) before sending the data across the network, and with Oracle Net Services you can influence data transfer by adjusting the SDU size. Adjusting the size of the SDU buffers can improve performance, network utilization, and memory consumption. Oracle internal testing has shown that setting the SDU to its maximum value improves SYNC transport performance (the maximum is 65535 bytes in releases prior to 12.2, and 2MB, i.e. 2097152 bytes, in 12.2 and later).

Setting SDU for Oracle RAC
SDU cannot be set in the TCP endpoint for SCAN/node listeners, but SDU can be changed using the global parameter DEFAULT_SDU_SIZE in the sqlnet.ora file:

DEFAULT_SDU_SIZE=2097152

Setting SDU for Non-Oracle RAC
You can set SDU on a per-connection basis using the SDU parameter in the local naming configuration file (tnsnames.ora) and the listener configuration file (listener.ora):

tnsnames.ora:
<net_service_name> =
 (DESCRIPTION=
   (SDU=65535)
   (ADDRESS_LIST=
     (ADDRESS=(PROTOCOL=tcp)(HOST=<hostname>)(PORT=<PORT>)))
   (CONNECT_DATA=
     (SERVICE_NAME=<service name>)))

listener.ora:
<listener name> =
 (DESCRIPTION=
   (SDU=65535)
   (ADDRESS=(PROTOCOL=tcp)(HOST=<hostname>)(PORT=<PORT>)))
NOTE: ASYNC transport uses a streaming protocol, and increasing the SDU size from the default has no performance benefit for it.

NOTE: When the SDU sizes of the client and server differ, the lower of the two values is used.
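One hedged way to confirm the negotiated SDU, assuming temporary client-side tracing is acceptable in your environment, is to enable Oracle Net tracing and search the generated trace file for the negotiated value:

# sqlnet.ora on the client, temporarily:
TRACE_LEVEL_CLIENT=16
TRACE_DIRECTORY_CLIENT=/tmp

$ grep -i "sdu" /tmp/<client trace file>.trc    <-- Look for the negotiated SDU in the connection negotiation entries

Remove the trace settings afterward, as level 16 tracing is verbose.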
Use oratcptest to Assess SYNC Transport Performance

With oratcptest, SYNC writes can be simulated over the network in order to determine bandwidth and latency. To do this accurately, the average redo write size is needed.

Determine Redo Write Size
The log writer (LGWR) redo write size translates to the packet size written to the network. You can determine the average redo write size from the metrics "total redo size" and "total redo writes" (redo size divided by redo writes) in an AWR report taken during the peak redo rate.
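As a rough alternative between AWR snapshots, a hedged sketch using the cumulative V$SYSSTAT counters (these are totals since instance startup, so for a peak window compute the difference between two samples taken at the start and end of the window):

SQL> SELECT ROUND(
       (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'redo size') /
       (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'redo writes')
     ) AVG_REDO_WRITE_BYTES
     FROM DUAL;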
Suppose the average redo write size works out to about 8k, as in this example. The redo write size varies depending on the workload and commit time. As the time to commit increases, the amount of redo waiting for the next write increases, thus increasing the next write size. Because SYNC transport increases the time to commit, you can expect the redo write size to increase as well; the degree of increase depends on the latency between the primary and standby. Therefore, metrics taken from an ASYNC configuration are only a starting point, and this process should be repeated after SYNC has been enabled for a period of time.

Run tests with oratcptest
In addition to providing the average write size, you can also have the oratcptest server process write each network message to the same disk location where the standby redo logs will be placed.

NOTE: ASM is currently not supported as the write location.

Given that the average redo write size in the example is 8k, and assuming the standby redo logs will be placed on /u01/oraredo, the server command to issue would be:

$ java -jar oratcptest.jar -server -port=<port number> -file=/u01/oraredo/oratcp.tmp

On the sending side, issue the following client command to send 8k messages with SYNC writes:

$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=sync -duration=120s -interval=20s -length=8k -write

[Requesting a test]
        Message payload = 8 kbytes

The lower throughput is a result of the latency of the network round trip plus the write to disk. The round trip is a necessity with SYNC transport, but the write to disk can be addressed as described in the following section.

NOTE: SYNC transport with higher round-trip latency (> 5ms) can significantly impact application response time and throughput for OLTP applications. In the same environment, batch jobs or DML operations may see less impact on overall elapsed time if sufficient network bandwidth is available.

Implement FASTSYNC
As of Oracle 12c, Data Guard FASTSYNC can improve the round-trip time of a SYNC remote write by acknowledging the write once the redo is written to memory, instead of waiting for the write to disk to complete. Whether you see a benefit from FASTSYNC depends on the speed of the disk at the standby database. Enable FASTSYNC by setting the Data Guard Broker property LogXptMode to 'FASTSYNC', or by setting SYNC NOAFFIRM directly in the log_archive_dest_n parameter when the Broker is not used:

DGMGRL> edit database standby set property LogXptMode='FASTSYNC';

OR

SQL> alter system set log_archive_dest_2='service=<standby net service name> SYNC NOAFFIRM db_unique_name=<standby unique name> net_timeout=8 valid_for=(online_logfile,all_roles)';
Test the benefits of FASTSYNC with oratcptest by running SYNC mode without the -write option.

Server (standby):
$ java -jar oratcptest.jar -server -port=<port number>

Client (primary):
$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=sync -duration=120s -interval=20s -length=8k

[Requesting a test]
NOTE: As the redo write size increases, throughput and latency both increase. Therefore, it is important to repeat these tests with the actual redo write sizes from metrics collected during SYNC redo transport.
Increase Socket Buffer Size

Socket buffers do not have the same impact on SYNC transport as they do on ASYNC; however, increased buffer sizes can help resolve gaps in redo following a standby database outage. Using the previously determined socket buffer size is recommended, but a setting of 3 * bandwidth-delay product (BDP) can be used as well. For example, if the bandwidth measured in the asynchronous tests is 622 Mbits/sec and the round-trip latency is 30 ms:

BDP   = 622,000,000 bits/sec (bandwidth) / 8 x 0.030 sec (latency) = 2,332,500 bytes
3xBDP = 6,997,500 bytes

Set the Linux kernel parameters net.core.rmem_max and net.core.wmem_max to this value as described above in 'Configuring Operating System Maximum Buffer Size Limits'.
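A one-line sanity check of the arithmetic in the shell (bandwidth in bits/sec, latency in milliseconds; both values are from the example above):

$ echo $(( 622000000 / 8 * 30 / 1000 * 3 ))
6997500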
Command Options

When ASYNC mode is chosen, only bandwidth is measured, and the default message length (1MB) should suffice for a bandwidth calculation. The bandwidth is measured from an application point of view: it is calculated from the beginning of one message send to the start of the next message send. Using the message size and this time interval, bandwidth is calculated. The reported average bandwidth is the average of all measurements made during the statistics interval.

Latency is calculated by oratcptest when SYNC mode is selected. The calculation is based on the time interval from the start of a message send at the client to the application acknowledgement the client receives from the server. The statistics interval is used to calculate the average latency across all sent and acknowledged messages; more than one message send occurs during the statistics interval, and oratcptest tracks the time interval between every message send and its acknowledgement. This is application latency and includes the lower network protocol latencies. If the -file and -write parameters are used, the latency also includes the server's write to disk. Because oratcptest measures from the start of the message write to the receipt of the acknowledgement message, latency normally increases as the size of the message increases.
Server options
The available server options include those used in this note: -server, -port, -sockbuf, and -file.

Client options
The available client options include those used in this note: -port, -mode, -duration, -interval, -length, -num_conn, -sockbuf, -output, and -write.
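The tool itself can likely display its full option list; a hedged example, assuming the attached version supports a help flag:

$ java -jar oratcptest.jar -help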
Attachments:
- SACK_detect.sh - Script to detect whether SACK is seen (4.3 KB)
- oratcptest.jar - oratcptest (27.38 KB)