Best Practices and Recommendations for RAC databases with SGA size over 100GB

admin · 发表于 2021-8-6 16:02:19

Applies to:  Oracle Database Backup Service - Version N/A and later
Oracle Database Cloud Schema Service - Version N/A and later
Gen 1 Exadata Cloud at Customer (Oracle Exadata Database Cloud Machine) - Version N/A and later
Oracle Cloud Infrastructure - Database Service - Version N/A and later
Oracle Database Exadata Express Cloud Service - Version N/A and later
Information in this document applies to any platform.
PurposeThe goal of this note is to provide best practices and recommendations to users of Oracle Real Application Clusters (RAC) databases using very large SGA (e.g. 100GB) per instance (note that RAC assumes homogeneously sized SGAs across the cluster). This document is compiled and maintained based on Oracle's experience with its global RAC customer base.

This is not meant to replace or supplant the Oracle Documentation set, but rather, it is meant as a supplement to the same. It is imperative that the Oracle Documentation be read, understood, and referenced to provide answers to any questions that may not be clearly addressed by this note.

All recommendations should be carefully reviewed by your own operations group and should only be implemented if the potential gain as measured against the associated risk warrants implementation. Risk assessments can only be made with a detailed knowledge of the system, application, and business environment.

As every customer environment is unique, the success of any Oracle Database implementation, including implementations of Oracle RAC, is predicated on a successful test environment. Oracle Support has identified 100 GB as a baseline for large SGA's that would benefit from the recommendations provided in this note. However, this is just a baseline, and it is possible for similar(but smaller) SGA's to benefit from these recommendations. It is thus imperative that any recommendations from this note are thoroughly tested and validated using a testing environment that is a replica of the target production environment before being implemented in the production environment to ensure that there is no negative impact associated with the recommendations that are made

ScopeThis article applies to all new and existing RAC implementations.
This is for RAC databases only as most of the parameters listed in here are for RAC Database only.
DetailsNote that the recommendations presented in this note are a result of the experience from working on databases with SGA of 1 TB and 2.6 TB.
However, the databases with SGA of 100GB and 300GB also benefited from the recommendations
Also, some recommendation is removed for 18.1 and above, so check if the recommendation is applicable to your database.
Note: ORAchk 18.2 and above can be used to validate the proper settings for Large SGA Databases (those documented in this MOS Document.  Though the check is available within ORAchk 18.2, it is always recommended to use the latest version of ORAchk which is available via Document <span1268927.2> to ensure you are receiving the most up-to-date information.

Download latest AHF. Refer ti Autonomous Health Framework (AHF) - Including TFA and ORAchk/EXAchk Document 2550798.1

init.ora parameters:

a.     Set _lm_sync_timeout to 1200 (this recommendation is valid only for databases that are12.2 and lower)
      Setting this will prevent some timeouts during reconfiguration and DRM. It's a static parameter and rolling restart is supported.

b.    Set shared_pool_size to 15% or larger of the total SGA size.
      For example, if SGA size is 1 TB, the shared pool size should be at least 150 GB. It's a dynamic parameter.

c.    Set _gc_policy_minimum to 15000
      There is no need to set _gc_policy_minimum if DRM is disabled by setting _gc_policy_time = 0. _gc_policy_minimum is a dynamic parameter, _gc_policy_time is a static parameter and rolling restart is not supported. To disable DRM, instead of _gc_policy_time, _lm_drm_disable should be used as it's dynamic.

d.     Set _lm_tickets to 5000 (this recommendation is valid only for databases that are12.2 and lower)
      Default is 1000. Allocating more tickets (used for sending messages) avoids issues where we ran out of tickets during the reconfiguration. It's a static parameter and rolling restart is supported. When increasing the parameter, rolling restart is fine but a cold restart can be necessary when decreasing.

e.    Set gcs_server_processes to the twice the default number of lms processes that are allocated. (this recommendation is valid only for databases that are12.2 and lower)
   The fix is also included in the 12.2.0.1 JUL 2018 database RU, so this does apply to the database that is running on 12.2.0.1 JUL 2018 or higher.
      The default number of lms processes depends on the number of CPUs/cores that the server has,
      so please refer to the gcs_server_processes init.ora parameter section in the Oracle Database Reference Guide
      for the default number of lms processes for your server.  Please make sure that the total number of lms processes
      of all databases on the server is less than the total number of CPUs/cores on the server.  Please refer to the Document 558185.1
      It's a static parameter and rolling restart is supported.

Increasing _lm_sync_timeout avoids instance evictions when some reconfiguration steps take longer than expected due to the large SGA.  Setting _lm_sync_timeout also changes the default setting of _lm_rcfg_timeout that is set to 3 times _lm_sync_timeout.  This will prevent some steps from timing out altogether.

_lm_tickets allocates the number of flow control tickets that lms uses to send messages.  The default setting is 1000. Allocating more tickets (used for sending messages) avoids issues such as seen in BUG#16088176.  Refer to the bug 16088176 for more detail.  The suggestion is to set _lm_tickets to 8000 for databases with large sga even if the patch for the bug _lm_tickets is applied.
Ticket allocation has been made dynamic in 19c, so we no longer need the statically-defined limit value in 19c.

Setting _gc_policy_minimum to 15000 or larger makes DRM to occur much less frequently.  Setting this is preferable than disabling the DRM for most databases.

The following bug fixed allocation issues for lms processes, so setting gcs_server_processes in 18c and 19c are not needed>
BUG 26992964 - LOWER LARGE SGA THRESHOLD FOR SPAWNING MORE LMS'S TO 100GB FROM 256GB

Removed the following recommendation on 8/7/2017
Set _ksmg_granule_size to 134217728
Setting this will cut down the time needed to locate the resource for a data block. It's a static parameter and rolling restart is supported.As a result of removing the above recommendation, the following explanation is also removed for the internal only section:
_ksmg_granule_size allocate more dlm buckets and thus reduce the number of resources hashed to one bucket and therefore cut down the time needed for dlm to locate a resource in a specific bucket. Currently the number of dlm resource buckets is limited by_ksmg_granule_size, which defaults to 32 MB. By increasing the granule size to the above (128 MB) we can allocate more dlm buckets and thus reduce the number of resources hashed to one bucket and therefore cut down the time needed for dlm to locate a resource in a specific bucket.

[This section is not visible to customers.]

Setting up hugepages is a general recommendation for all Linux users, but using hugepages is particularly more important for database that has large SGA.
In other words, setting up hugepages when SGA is large is a critical recommendation.

For other platforms, consider using large pages if possible.

Following patches are recommended:
11.2.0.3.5 DB PSU or above is highly recommended to address known issues with large SGA sizes.

For 11.2.0.4 customers: applying the following bug fixes can helpBUG 12747740 - RAC PERF: NODE JOIN RECONFIGURATION (PCMREPLAY) DOES NOT SCALE WITH MORE LMS'S
BUG 14193240 - LMS SIGNALED ORA-600[KGHLKREM1] DURING BEEHIVE LOAD
BUG 16392068 - MSGQ: LMS0 HITS ORA-600 [KJBMPOCR:DSB]
BUG 17232014 - INITIAL ALLOCATION FOR KJBR&KJBL ARE TOO LOW W/ LARGE CACHES DUE TO UB4 OVERFLOW
BUG 17257445 - RAC PERF: DRM OPTIMIZATION (BUG 14558880) SHOULD ALSO WORK FOR RECONFIGURATION
BUG 17314971 - RAC PERF: RM/PT LATCH REDUCTION FOR RCFG (17257445) SHOULD BE ENABLED FOR SYNC7

For SGA that is larger than 4 TB and for Linux platform,
BUG 18780342 - LINUX SUPPORT FOR > 4TB SGA

Only for Oracle Linux platform, have the customer get the kernel fix for the bug that is desirable but not required. The fix is not available for Redhat and otherLinux.
bug 18820745 - [8382D914] MM, HUGETLB: IMPROVE PAGE-FAULT SCALABILITY

Have the customer open a SR to Oracle Linux Support to get the above kernel fix (V2.6.39-400.220.0)
Refer to the internal description of the bug

[This section is not visible to customers.]

		自动登录	找回密码
密码			立即注册

[性能优化] Best Practices and Recommendations for RAC databases with SGA size over 100GB

浏览过的版块