APPLIES TO:
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Cloud Exadata Service - Version N/A and later
Oracle Database Cloud Schema Service - Version N/A and later
Oracle Database Backup Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Information in this document applies to any platform.

GOAL
The goal of this note is to provide steps to evaluate network bandwidth, along with experiments that will help tune the operating system, Oracle Net, or RMAN parallelism (the last is only relevant for RMAN operations).
NOTE: Redo compression and encryption are out of scope for this document; however, each can have an impact on transfer rates and transport lag, and each should be tested prior to implementation. It is best to evaluate with and without compression and encryption and compare the performance differences. The overhead is usually attributable to the additional work and time needed to compress or encrypt redo before sending it, and to decompress or decrypt it after it is received.
SOLUTION

Installation and Usage of oratcptest

NOTE: While Oracle has tested the oratcptest tool to validate that it works as intended, users must test any Oracle-provided utility in a lower environment before using it in production, in order to validate that it performs as intended in their environment.

oratcptest can be used as a general-purpose tool for measuring network bandwidth and latency. However, oratcptest was designed specifically to help customers assess the network resources that will be used by Data Guard redo transport, GoldenGate, RMAN backup and restore, migration, Data Guard instantiation, and database remote clone. You can control the test behavior by specifying various options, such as the transport mode, test duration, statistics interval, socket buffer size, and number of connections. A basic invocation is shown below.
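For illustration, a minimal server/client pair (host name and port are placeholders; the option forms match those used throughout this note):

On the target (server) host:
$ java -jar oratcptest.jar -server -port=<port number>

On the source (client) host:
$ java -jar oratcptest.jar <target host> -port=<port number> -mode=async -duration=60s -interval=10s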
NOTE: This tool, like any Oracle network streaming transport, can simulate efficient network packet transfers from the source host to the target host. Throughput can be 100 MB/sec or higher depending on the available network bandwidth between the source and target servers and the tool options invoked. Use caution if any other critical applications share the same network.
Determining Optimal Socket Buffer Size

NOTE: Use the socket buffer size determined by this process for all subsequent testing.

Setting socket buffer sizes on a per-connection basis (rather than globally) can conserve memory on both the primary and standby. However, a per-connection value cannot exceed the operating system's maximum socket buffer size. On Linux, the net.core.rmem_max and net.ipv4.tcp_rmem kernel parameters set the operating system receive socket buffer maximum sizes, which, in turn, set the TCP window size for that connection (net.core.wmem_max and net.ipv4.tcp_wmem are the corresponding send-side parameters). The net.core.rmem_max parameter controls the maximum setting of all protocol buffer sizes, while the net.ipv4.tcp_rmem parameter is the one Oracle Net will use as it opens an IPv4 socket; the net.ipv4.tcp_rmem maximum cannot override the net.core.rmem_max limit. Instructions for tuning these parameters are given below.

Bandwidth-delay product (BDP) is the product of the network link capacity of a channel (bandwidth) and its round-trip time, or latency. The minimum recommended value for socket buffer sizes is 3*BDP, especially for a high-latency, high-bandwidth network. Use oratcptest to tune the socket buffer sizes.

The tcp_rmem setting represents the TCP window size: how much data can be sent without a TCP acknowledgement. TCP windowing is used to make data transfer more efficient. Consider what happens when a packet from that window is dropped. A normal TCP configuration detects the loss at the receiving side and sends a TCP acknowledgement (ACK) packet to the sender with the sequence number of the last packet of the window that was received. The sender will not respond to that ACK packet until it finishes sending the complete window. Since the sender of the ACK (the receiver of the window) still has not received the lost packet, its TCP ACK timer fires, and multiple duplicate ACK packets are sent. This is a normal sign of a dropped packet within a large TCP window. Once the sender receives the ACK identifying which packet was lost, there are two ways it can respond. One is to resend the entire window; since most TCP windows are small, this does not normally pose a performance problem, but with 16MB+ windows it would. The more efficient response is to configure the TCP stack to use Selective Acknowledgements (SACK). Some operating systems enable SACK by default while others do not, and firewalls between the sender and receiver can disable it. SACK is negotiated during the TCP three-way handshake: if both sides support it, SACK is used; if either side does not support it, or a firewall does not support it or blocks it, it is not used. Attached is a script (Linux only) that snoops the network either for SACK packets or for TCP options showing that SACK is supported. This simple script can be run on a server to see whether SACK packets are detected: SACK_detect.sh

Determine Optimal Socket Buffer Size

1. Set the maximum operating system send and receive buffer sizes to an initial size of 16MB. On Linux, as root, first check the current values:

# cat /proc/sys/net/core/rmem_max
/proc/sys/net/core/rmem_max:4194304
# cat /proc/sys/net/ipv4/tcp_rmem
/proc/sys/net/ipv4/tcp_rmem:4096 87380 4194304

Set the maximum buffer sizes to 16MB:

# sysctl -w net.core.rmem_max=16777216
net.core.rmem_max = 16777216
# sysctl -w net.ipv4.tcp_rmem="4096 87380 16777216"
net.ipv4.tcp_rmem = 4096 87380 16777216
# sysctl -w net.core.wmem_max=16777216
net.core.wmem_max = 16777216
# sysctl -w net.ipv4.tcp_wmem="4096 16384 16777216"
net.ipv4.tcp_wmem = 4096 16384 16777216
NOTE: The values for the net.ipv4.tcp_wmem and net.ipv4.tcp_rmem parameters must be quoted. The first number is the minimum size used when the whole system comes under memory pressure. The second number is the default size used if a socket is opened without requesting a specific buffer size. The third number is the maximum amount of memory that can be allocated to the buffer; it cannot exceed net.core.wmem_max (or net.core.rmem_max on the receive side). If the Oracle Net parameter RECV_BUF_SIZE is used, it can be set up to the net.ipv4.tcp_rmem maximum.

NOTE: Any firewall or other pass-through server on the route between the source and target can reduce the effective socket buffer size of the endpoints. Be sure to increase the size of the socket buffers on these servers as well.

NOTE: Increasing these values increases system memory usage.

NOTE: Changes made with sysctl are not permanent. Update the /etc/sysctl.conf file to persist these changes through machine restarts. You will update the configuration file once the proper setting is determined.

2. Test socket buffer sizes with oratcptest. Starting with 2MB socket buffers, test the socket buffer sizes using commands like the following.

NOTE: It is helpful to find the maximum network throughput with one connection using different socket buffer sizes.

Server (standby):
$ java -jar oratcptest.jar -server -port=<port number> -sockbuf=2097152

Client (primary):
$ java -jar oratcptest.jar <standby server> -port=<port number> -mode=async -duration=120s -interval=20s -sockbuf=2097152

[Requesting a test]
        Message payload        = 1 Mbyte
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = 2 Mbytes
        Transport mode         = ASYNC
        Disk write             = NO
        Statistics interval    = 20 seconds
        Test duration          = 2 minutes
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(11:39:16) The server is ready.
                    Throughput
(11:39:36)      71.322 Mbytes/s
(11:39:56)      71.376 Mbytes/s
(11:40:16)      72.104 Mbytes/s
(11:40:36)      79.332 Mbytes/s
(11:40:56)      76.426 Mbytes/s
(11:41:16)      68.713 Mbytes/s
(11:41:16) Test finished.
        Socket send buffer = 2097152
        Avg. throughput    = 73.209 Mbytes/s
Now that you have a baseline with a 2MB buffer, increase the socket buffer size to 4MB to assess any gain in throughput from a larger buffer.

Server (standby):
$ java -jar oratcptest.jar -server -port=<port number> -sockbuf=4194304

Client (primary):
$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=async -duration=60s -sockbuf=4194304

[Requesting a test]
        Message payload        = 1 Mbyte
        Payload content type   = RANDOM
        Delay between messages = NO
        Number of connections  = 1
        Socket send buffer     = 4 Mbytes
        Transport mode         = ASYNC
        Disk write             = NO
        Statistics interval    = 10 seconds
        Test duration          = 1 minute
        Test frequency         = NO
        Network Timeout        = NO
        (1 Mbyte = 1024x1024 bytes)

(11:15:06) The server is ready.
                    Throughput
(11:15:16)     113.089 Mbytes/s
(11:15:26)     113.185 Mbytes/s
(11:15:36)     113.169 Mbytes/s
(11:15:46)     113.169 Mbytes/s
(11:15:56)     113.168 Mbytes/s
(11:16:06)     113.171 Mbytes/s
(11:16:06) Test finished.
        Socket send buffer = 4 Mbytes
        Avg. throughput    = 113.149 Mbytes/s

The above example shows a large improvement in throughput with the larger socket buffer size. Continue to increase the socket buffer sizes until you no longer see any improvement. Increase the operating system socket buffer maximums (see step 1) before exceeding 16MB with oratcptest.

Configuring Operating System Maximum Buffer Size Limits

When you have determined the optimal size for the socket buffers, make the settings permanent. On Linux, set them in /etc/sysctl.conf so that the changes persist through node reboots.

# vi /etc/sysctl.conf
#net.core.rmem_max = 4194304   <-- Comment out the existing value
net.core.rmem_max = 8388608    <-- Replace with the new value
#net.core.wmem_max = 2097152   <-- Comment out the existing value
net.core.wmem_max = 8388608    <-- Replace with the new value
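If you also raised the net.ipv4.tcp_rmem and net.ipv4.tcp_wmem maximums at run time in step 1 (rather than relying only on the Oracle Net RECV_BUF_SIZE/SEND_BUF_SIZE parameters), persist those as well and reload the settings. A minimal sketch, assuming 8MB was found to be optimal (the minimum and default values shown are typical Linux defaults):

# vi /etc/sysctl.conf
net.ipv4.tcp_rmem = 4096 87380 8388608
net.ipv4.tcp_wmem = 4096 16384 8388608

# sysctl -p    <-- Reload /etc/sysctl.conf so the changes take effect without a reboot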
Scenario 1 - Understand the existing network and evaluate tuning options prior to database migration, Data Guard deployment, or RMAN operations for a large database

NOTE: This process will put load on the network between the source and target.

Determine existing bandwidth between source and target using parallelism

RMAN backup/recovery, Data Guard instantiation, and database migration strategies all utilize parallelism to maximize the transfer rate between the source and the target. oratcptest can measure the aggregate bandwidth of multiple connections using the num_conn parameter.

NOTE: Data Guard redo transport is a single shipping process per instance and is covered in a later section.

Initially, determine the bandwidth from a single node of the source to a single node of the target. Then repeat that process from multiple nodes of RAC systems, using concurrent oratcptest commands where necessary.

Determine Bandwidth for a Single Node

1. Start an oratcptest listener on the target:

$ java -jar oratcptest.jar -server [<IP ADDRESS or VIP in the case of Clusterware>] -port=<port number>
Note: In Exadata, the admin network through which a user connects may have limited bandwidth compared to the client network or the VIP on which the listener runs. Be sure to set the IP address in these cases; otherwise the listener will be placed on the admin network. Running lsnrctl status will report the IP address of the local listener.

2. Execute oratcptest from the source using two (2) connections:

$ java -jar oratcptest.jar <target IP address> -port=<port number> -duration=60s -interval=10s -mode=async [-output=<results file>] -num_conn=2

(21:07:36) The server is ready.

3. Repeat step 2, increasing the num_conn parameter by two (2), until aggregate throughput no longer increases for three consecutive runs. For instance, if 12, 14, and 16 connections achieve approximately the same aggregate throughput, stop.

Determine Concurrent Bandwidth of a Cluster

Note: Skip this step if the source and target are not Oracle RAC or some other cluster.

To determine the full bandwidth across all servers in a RAC cluster, repeat the process above using all nodes concurrently and sum the outputs of all processes, e.g., node 1 of the source to node 1 of the target, node 2 of the source to node 2 of the target, and so on. (A concurrent-driver sketch appears in Scenario 3 below.)

Scenario 2 - Post Data Guard deployment: experiencing a transport lag with ASYNC transport

Configuring Redo Transport for Optimal Network Performance

The factors that influence redo transport performance differ for ASYNC and SYNC transport because their protocols differ. The one critical variable that must be known before any tuning can begin is the required network bandwidth.

Determine Required Network Bandwidth

To determine the required network bandwidth for a given Data Guard configuration, you must determine the primary database redo generation rate. You could use the Automatic Workload Repository (AWR) tool to find the redo generation rate; however, AWR snapshots are often taken at 30- to 60-minute intervals, which can dilute the peak redo rate that often occurs over shorter periods. For a more accurate picture of peak redo rates during a period of time, compile the redo rates for each redo log using the following query:

SQL> SELECT THREAD#, SEQUENCE#, BLOCKS*BLOCK_SIZE/1024/1024 MB,
       (NEXT_TIME-FIRST_TIME)*86400 SEC,
       (BLOCKS*BLOCK_SIZE/1024/1024)/((NEXT_TIME-FIRST_TIME)*86400) "MB/S"
     FROM V$ARCHIVED_LOG
     WHERE ((NEXT_TIME-FIRST_TIME)*86400<>0)
       AND FIRST_TIME BETWEEN TO_DATE('2015/01/15 08:00:00','YYYY/MM/DD HH24:MI:SS')
                          AND TO_DATE('2015/01/15 11:00:00','YYYY/MM/DD HH24:MI:SS')
       AND DEST_ID=2
     ORDER BY FIRST_TIME;

You should see output like the following:

   THREAD#  SEQUENCE#         MB        SEC       MB/s
---------- ---------- ---------- ---------- ----------
         2       2291 29366.1963        831  35.338383
         1       2565 29365.6553        781 37.6000708
         2       2292 29359.3403        537  54.672887
         1       2566 29407.8296        813 36.1719921
         2       2293 29389.7012        678 43.3476418
         2       2294 29325.2217       1236 23.7259075
         1       2567 11407.3379       2658 4.29169973
         2       2295 29452.4648        477 61.7452093
         2       2296 29359.4458        954 30.7751004
         2       2297 29311.3638        586 50.0193921
         1       2568 3867.44092       5510 .701894903

The query output above indicates that you must accommodate a peak redo rate of just under 62 MB/sec/node. In general, we recommend adding 30% on top of the peak rate to account for spikes. In this case, that results in a peak redo rate of about 80 MB/sec/node.

NOTE: To find the peak redo rate, choose times during the highest level of processing, such as end of quarter, end of year, etc.
NOTE: Frequent log switches (every 5 minutes or less) can induce a standby apply lag due to log switch overhead on the standby. Ensure that the logs are sized so that switches do not occur more often than every 5 minutes during peak rates; a query for checking switch frequency follows.
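As referenced in the note above, a hedged query to check how often log switches occur per hour (V$LOG_HISTORY is standard; adjust the time window as needed):

SQL> SELECT TO_CHAR(FIRST_TIME,'YYYY/MM/DD HH24') HOUR, COUNT(*) SWITCHES
     FROM V$LOG_HISTORY
     WHERE FIRST_TIME > SYSDATE - 7
     GROUP BY TO_CHAR(FIRST_TIME,'YYYY/MM/DD HH24')
     ORDER BY 1;

More than 12 switches in an hour indicates switches occurring more often than every 5 minutes during that hour.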
With the required bandwidth for the application understood, you can attempt to tune transport appropriately.

Assessing Network Performance for Data Guard Asynchronous (ASYNC) Redo Transport

Use oratcptest to assess whether you have the required bandwidth to keep up with peak redo rates for ASYNC transport. When tuning ASYNC transport:
NOTE: Always tune to a goal. Once that goal is reached, further tuning is unnecessary.

Determine Available Network Bandwidth for ASYNC Transport

Once you have determined the peak redo rate, you can measure the network bandwidth with the oratcptest tool to see whether the network can sustain that peak rate. For example, on one of the standby hosts, start the oratcptest server process:

$ java -jar oratcptest.jar -server -port=<port number>

On one of the primary hosts, use oratcptest to connect to the server process on the standby host and transfer a fixed amount of data. For example, if Data Guard transport will use ASYNC, run a command similar to the following to determine the maximum throughput the network can provide:

$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=async -duration=120s -interval=20s

(11:09:47) The server is ready.

The above example shows that an average network throughput of 113 MB/sec is enough to keep up with the database peak redo rate of 80 MB/sec.

NOTE: If throughput exceeds the peak rate of the application, there is no need to continue tuning.
Configuring Socket Buffer Sizes Used by Data Guard Asynchronous Redo Transport

Optimal system socket buffer sizes were determined and set previously. For Oracle Net to use the optimal socket buffer size, the parameters RECV_BUF_SIZE and SEND_BUF_SIZE must be set accordingly, in addition to the operating system parameters you set in the previous step. You can set the parameters for all connections in the sqlnet.ora file on each side of the configuration (primary and standby):

RECV_BUF_SIZE=8388608
SEND_BUF_SIZE=8388608

However, to minimize memory usage by processes other than Data Guard redo transport, we recommend that you isolate the increased buffer usage to the Oracle Net alias used for redo transport, as well as to the server-side listener. The following example shows the send and receive socket buffer sizes set as description attributes for a particular connect descriptor in tnsnames.ora:

<standby TNS name> =
 (DESCRIPTION=
   (SEND_BUF_SIZE=8388608)
   (RECV_BUF_SIZE=8388608)
   (ADDRESS=(PROTOCOL=tcp)(HOST=<stby_host>)(PORT=<PORT>))
   (CONNECT_DATA=
     (SERVICE_NAME=<standby service name>)))

These settings should be applied to both the primary and standby TNS descriptors. The socket buffer size parameters must be configured with the same values for all of the databases in a Data Guard configuration. On the standby (receiving) side, you can set them in either the sqlnet.ora or listener.ora file. In the listener.ora file, you can specify the buffer space parameters either for a particular protocol address or for a description, as in the following example. After making these changes, reload the listener (for example, with lsnrctl reload).

<listener name> =
 (DESCRIPTION=
   (SEND_BUF_SIZE=8388608)
   (RECV_BUF_SIZE=8388608)
   (ADDRESS=(PROTOCOL=tcp)(HOST=<stby_host>)(PORT=<PORT>)))

To achieve the same throughput when the database roles are reversed following a switchover, configure the same settings in the opposite direction: set them in the Oracle Net alias that the standby database will use to ship redo when it is the primary, as well as in the primary's listener.ora.
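For example, a mirror-image alias for shipping redo in the reverse direction might look like the following (a sketch; the host and service names are placeholders):

<primary TNS name> =
 (DESCRIPTION=
   (SEND_BUF_SIZE=8388608)
   (RECV_BUF_SIZE=8388608)
   (ADDRESS=(PROTOCOL=tcp)(HOST=<primary_host>)(PORT=<PORT>))
   (CONNECT_DATA=
     (SERVICE_NAME=<primary service name>)))

Apply the same buffer attributes in the primary's listener.ora description, as shown above for the standby.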
Scenario 3: Determine Maximum Network Bandwidth or Evaluate Potential RMAN Operation Throughput

A common point of confusion is the expectation that ASYNC redo transport can send at up to the maximum network bandwidth; however, redo transport itself uses one process per destination. RMAN operations such as backup, restore, duplicate, or recover-from-service commands, on the other hand, can leverage many processes (e.g., RMAN channels on one node or across many RAC nodes). Determining the maximum available network bandwidth, and the level of parallelism needed to achieve it, can be helpful in understanding your network and in tuning RMAN commands. To do this, you can leverage the num_conn oratcptest parameter or issue the oratcptest command concurrently across RAC nodes.
For example, after determining that 8MB buffer sizes provide optimal network send throughput with num_conn=1 (the default), repeat with different numbers of connections (parallelism) until no additional throughput is realized. (A loop that automates this sweep is sketched after these examples.)

Parallelism=1
$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=async -duration=60s -sockbuf=8388608 -num_conn=1

Parallelism=2
$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=async -duration=60s -sockbuf=8388608 -num_conn=2

…

Parallelism=n
$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=async -duration=60s -sockbuf=8388608 -num_conn=n
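A minimal shell sketch of the sweep (the host, port, and the connection counts tried are placeholders; each run's average throughput is reported by oratcptest itself):

#!/bin/bash
# Sweep num_conn to find the parallelism at which aggregate throughput plateaus.
HOST=test.server.address.com
PORT=<port number>
for N in 1 2 4 6 8 10 12 14 16; do
    echo "=== num_conn=$N ==="
    java -jar oratcptest.jar $HOST -port=$PORT -mode=async \
         -duration=60s -sockbuf=8388608 -num_conn=$N
done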
Now repeat concurrently across all RAC nodes. For example, with two RAC nodes at a parallelism of n per node, run the commands concurrently (a driver sketch follows):

HostA:
$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=async -duration=60s -sockbuf=8388608 -num_conn=n

HostB:
$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=async -duration=60s -sockbuf=8388608 -num_conn=n
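A hedged driver sketch for launching the clients concurrently from several source nodes via ssh (this assumes passwordless ssh, that oratcptest.jar is in place on each node, and that a separate oratcptest server is listening on the target for each port listed; sum the reported averages for the aggregate):

#!/bin/bash
# Launch concurrent oratcptest clients from each source node, one target port per node.
TARGET=test.server.address.com
NODES=(hostA hostB)
PORTS=(<port1> <port2>)
for i in "${!NODES[@]}"; do
    ssh "${NODES[$i]}" "java -jar oratcptest.jar $TARGET -port=${PORTS[$i]} \
        -mode=async -duration=60s -sockbuf=8388608 -num_conn=n" &
done
wait   # Sum the "Avg. throughput" lines from each client's output for the aggregate.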
Note: RMAN compression and encryption may impact overall throughput.

Scenario 4 - Post deployment: tuning transaction response time with SYNC transport

Assessing Network Performance for Data Guard Synchronous Redo Transport

Synchronous redo transport requires that a primary database transaction wait for confirmation from the standby that redo has been received and written to disk (to a standby redo log file) before commit success is signaled to the application. Network latency is the single largest inhibitor of SYNC transport performance, and it must be consistently low to ensure minimal impact on response time and throughput for OLTP applications. Because of the impact of latency, the options for tuning transport are limited, so focus on assessing the feasibility of SYNC transport over a given network link.
Setting Oracle Net SDU for SYNC Transport

Oracle Net encapsulates data into buffers the size of the session data unit (SDU) before sending the data across the network, and with Oracle Net Services you can influence data transfer by adjusting the SDU size. Adjusting the size of the SDU buffers can improve performance, network utilization, and memory consumption. Oracle internal testing has shown that setting the SDU to its maximum value improves SYNC transport performance (the maximum is 65535 bytes in releases prior to 12.2, and 2MB, i.e. 2097152 bytes, in 12.2 and later).

Setting SDU for Oracle RAC
SDU cannot be set in the TCP endpoint for SCAN/node listeners, but SDU can be changed using the global parameter DEFAULT_SDU_SIZE in the sqlnet.ora file:

DEFAULT_SDU_SIZE=2097152

Setting SDU for Non-Oracle RAC
You can set SDU on a per-connection basis using the SDU parameter in the local naming configuration file (tnsnames.ora) and the listener configuration file (listener.ora):

tnsnames.ora:
<net_service_name> =
 (DESCRIPTION=
   (SDU=65535)
   (ADDRESS_LIST=
     (ADDRESS=(PROTOCOL=tcp)(HOST=<hostname>)(PORT=<PORT>)))
   (CONNECT_DATA=
     (SERVICE_NAME=<service name>)))

listener.ora:
<listener name> =
 (DESCRIPTION=
   (SDU=65535)
   (ADDRESS=(PROTOCOL=tcp)(HOST=<hostname>)(PORT=<PORT>)))
NOTE: ASYNC transport uses a streaming protocol, and increasing the SDU size from the default has no performance benefit for it.

NOTE: When the SDU sizes of the client and server differ, the lower of the two values is used.
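One hedged way to confirm the negotiated SDU, assuming temporary client-side tracing is acceptable in your environment, is to enable Oracle Net tracing and search the generated trace file for the negotiated value:

# sqlnet.ora on the client, temporarily:
TRACE_LEVEL_CLIENT=16
TRACE_DIRECTORY_CLIENT=/tmp

$ grep -i "sdu" /tmp/<client trace file>.trc    <-- Look for the negotiated SDU in the connection negotiation entries

Remove the trace settings afterward, as level 16 tracing is verbose.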
Use oratcptest to Assess SYNC Transport Performance

With oratcptest, SYNC writes can be simulated over the network in order to determine bandwidth and latency. To do this accurately, the average redo write size is needed.

Determine Redo Write Size
The log writer (LGWR) redo write size translates to the packet size written to the network. You can determine the average redo write size from the metrics "total redo size" and "total redo writes" (redo size divided by redo writes) in an AWR report taken during the peak redo rate.
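As a rough alternative between AWR snapshots, a hedged sketch using the cumulative V$SYSSTAT counters (these are totals since instance startup, so for a peak window compute the difference between two samples taken at the start and end of the window):

SQL> SELECT ROUND(
       (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'redo size') /
       (SELECT VALUE FROM V$SYSSTAT WHERE NAME = 'redo writes')
     ) AVG_REDO_WRITE_BYTES
     FROM DUAL;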
Suppose the average redo write size works out to about 8k, as in this example. The redo write size varies depending on the workload and commit time. As the time to commit increases, the amount of redo waiting for the next write increases, thus increasing the next write size. Because SYNC transport increases the time to commit, you can expect the redo write size to increase as well; the degree of increase depends on the latency between the primary and standby. Therefore, metrics taken from an ASYNC configuration are only a starting point, and this process should be repeated after SYNC has been enabled for a period of time.

Run tests with oratcptest
In addition to providing the average write size, you can also have the oratcptest server process write each network message to the same disk location where the standby redo logs will be placed.

NOTE: ASM is currently not supported as the write location.

Given that the average redo write size in the example is 8k, and assuming the standby redo logs will be placed on /u01/oraredo, the server command to issue would be:

$ java -jar oratcptest.jar -server -port=<port number> -file=/u01/oraredo/oratcp.tmp

On the sending side, issue the following client command to send 8k messages with SYNC writes:

$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=sync -duration=120s -interval=20s -length=8k -write

[Requesting a test]
        Message payload = 8 kbytes

The lower throughput is a result of the latency of the network round trip plus the write to disk. The round trip is a necessity with SYNC transport, but the write to disk can be addressed as described in the following section.

NOTE: SYNC transport with higher round-trip latency (> 5ms) can significantly impact application response time and throughput for OLTP applications. In the same environment, batch jobs or DML operations may see less impact on overall elapsed time if sufficient network bandwidth is available.

Implement FASTSYNC
As of Oracle 12c, Data Guard FASTSYNC can improve the round-trip time of a SYNC remote write by acknowledging the write once the redo is written to memory, instead of waiting for the write to disk to complete. Whether you see a benefit from FASTSYNC depends on the speed of the disk at the standby database. Enable FASTSYNC by setting the Data Guard Broker property LogXptMode to 'FASTSYNC', or by setting SYNC NOAFFIRM directly in the log_archive_dest_n parameter when the Broker is not used:

DGMGRL> edit database standby set property LogXptMode='FASTSYNC';

OR

SQL> alter system set log_archive_dest_2='service=<standby net service name> SYNC NOAFFIRM db_unique_name=<standby unique name> net_timeout=8 valid_for=(online_logfile,all_roles)';
Test the benefits of FASTSYNC with oratcptest by running SYNC mode without the -write option.

Server (standby):
$ java -jar oratcptest.jar -server -port=<port number>

Client (primary):
$ java -jar oratcptest.jar test.server.address.com -port=<port number> -mode=sync -duration=120s -interval=20s -length=8k

[Requesting a test]
NOTE: As the redo write size increases, throughput and latency both increase. Therefore, it is important to repeat these tests with the actual redo write sizes from metrics collected during SYNC redo transport.
Increase Socket Buffer Size

Socket buffers do not have the same impact on SYNC transport as they do on ASYNC; however, increased buffer sizes can help resolve gaps in redo following a standby database outage. Using the previously determined socket buffer size is recommended, but a setting of 3 * bandwidth-delay product (BDP) can be used as well. For example, if the bandwidth measured in the asynchronous tests is 622 Mbits/sec and the round-trip latency is 30 ms:

BDP   = 622,000,000 bits/sec (bandwidth) / 8 x 0.030 sec (latency) = 2,332,500 bytes
3xBDP = 6,997,500 bytes

Set the Linux kernel parameters net.core.rmem_max and net.core.wmem_max to this value as described above in 'Configuring Operating System Maximum Buffer Size Limits'.
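A one-line sanity check of the arithmetic in the shell (bandwidth in bits/sec, latency in milliseconds; both values are from the example above):

$ echo $(( 622000000 / 8 * 30 / 1000 * 3 ))
6997500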
Command Options

When ASYNC mode is chosen, only bandwidth is measured, and the default message length (1MB) should suffice for a bandwidth calculation. The bandwidth is measured from an application point of view: it is calculated from the beginning of one message send to the start of the next message send. Using the message size and this time interval, bandwidth is calculated. The reported average bandwidth is the average of all measurements made during the statistics interval.

Latency is calculated by oratcptest when SYNC mode is selected. The calculation is based on the time interval from the start of a message send at the client to the application acknowledgement the client receives from the server. The statistics interval is used to calculate the average latency across all sent and acknowledged messages; more than one message send occurs during the statistics interval, and oratcptest tracks the time interval between every message send and its acknowledgement. This is application latency and includes the lower network protocol latencies. If the -file and -write parameters are used, the latency also includes the server's write to disk. Because oratcptest measures from the start of the message write to the receipt of the acknowledgement message, latency normally increases as the size of the message increases.
Server options
The available server options include those used in this note: -server, -port, -sockbuf, and -file.

Client options
The available client options include those used in this note: -port, -mode, -duration, -interval, -length, -num_conn, -sockbuf, -output, and -write.
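The tool itself can likely display its full option list; a hedged example, assuming the attached version supports a help flag:

$ java -jar oratcptest.jar -help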
Attachments:
- SACK_detect.sh - Script to detect whether SACK is seen (4.3 KB)
- oratcptest.jar - oratcptest (27.38 KB)