Reading “RAC and Oracle Clusterware Best Practices and Starter Kit (Windows)”

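While reading the note below recently, I realized that even on the Windows platform there are quite a lot of things to watch out for and configure with Oracle RAC. See the following “best practice” note.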

Jul 13, 2012 BULLETIN PUBLISHED 1

In this Document

Purpose
Scope
Details
RAC and Oracle Clusterware Best Practices and Starter Kit (Platform Independent)
RAC Platform Specific Starter Kits and Best Practices
RAC on Windows Step by Step Installation Instructions
RAC on Windows Best Practices
OS Configuration Considerations
Network Considerations
Storage Considerations
Hardware/Vendor Specific Considerations
Oracle Software Considerations
Installation
General
References

 

Applies to:

Oracle Server – Enterprise Edition – Version 10.2.0.1 to 11.2.0.1.0 [Release 10.2 to 11.2]
Microsoft Windows x64 (64-bit)
Microsoft Windows (32-bit)
Microsoft Windows Itanium (64-bit)

Purpose

The goal of the Oracle Real Application Clusters (RAC) series of Best Practice and Starter Kit notes is to provide customers with quick knowledge transfer of generic and platform specific best practices for implementing, upgrading and maintaining an Oracle RAC system. This document is compiled and maintained based on Oracle’s experience with its global RAC customer base.

This Starter Kit is not meant to replace or supplant the Oracle Documentation set, but rather, it is meant as a supplement to the same. It is imperative that the Oracle Documentation be read, understood, and referenced to provide answers to any questions that may not be clearly addressed by this Starter Kit.

All recommendations should be carefully reviewed by your own operations group and should only be implemented if the potential gain as measured against the associated risk warrants implementation. Risk assessments can only be made with a detailed knowledge of the system, application, and business environment.

As every customer environment is unique, the success of any Oracle Database implementation, including implementations of Oracle RAC, is predicated on a successful test environment. It is thus imperative that any recommendations from this Starter Kit are thoroughly tested and validated using a testing environment that is a replica of the target production environment before being implemented in the production environment to ensure that there is no negative impact associated with the recommendations that are made.

Scope

// Scope: this applies not only to RAC installation but also to later upgrades.

This article applies to all new and existing RAC implementations as well as RAC upgrades.

Details

RAC and Oracle Clusterware Best Practices and Starter Kit (Platform Independent)

The following document focuses on RAC and Oracle Clusterware Best Practices that are applicable to all platforms including a white paper on available RAC System Load Testing Tools and RAC System Test Plan outlines for 10gR2 & 11gR1 and 11gR2:

Document 810394.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Platform Independent)

 

RAC Platform Specific Starter Kits and Best Practices

The following notes contain detailed platform specific best practices including Step-By-Step installation cookbooks (downloadable in PDF format):

Document 811306.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Linux)
Document 811280.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Solaris)
Document 811271.1 RAC and Oracle Clusterware Best Practices and Starter Kit (Windows)
Document 811293.1 RAC and Oracle Clusterware Best Practices and Starter Kit (AIX)
Document 811303.1 RAC and Oracle Clusterware Best Practices and Starter Kit (HP-UX)

 

RAC on Windows Step by Step Installation Instructions

Click here for a Step By Step guide for installing Oracle RAC 10gR2 on Windows (2003 and 2008)
Click here for a Step By Step guide for installing Oracle RAC 11gR1 on Windows (2003 and 2008)
Click here for a Step By Step guide for installing Oracle RAC 11gR2 on Windows (2003 and 2008)

RAC on Windows Best Practices

The Best Practices in this section are specific to the Windows Platform. That said, it is essential that the Platform Independent Best Practices found in Document 810394.1 be reviewed in addition to the content provided in this Document.

OS Configuration Considerations

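// Anti-virus software is one thing to watch out for: you may spend ages debugging only to discover that one of Oracle's own files has been moved into quarantine, which is very frustrating.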

  • Disable Anti-Virus software running on cluster nodes before and for the entire duration of the installation of Oracle on Windows (be mindful of manual reboots during this time).  Anti-Virus software may of course be re-enabled following the installation but the following should be excluded from being scanned:
    • Oracle Software directories
    • OCFS filesystems
    • ACFS filesystems
    • Network scanning of the private interconnect
  • Disable the Windows firewall.  If the Windows firewall must be re-enabled, it MUST NOT be configured for the private network.
  • // Firewalls and disk encryption will both get in your way.
  • The use of Disk Encryption software on RAC servers is highly discouraged and is to be avoided as Disk encryption software has been known to cause problems during multi-node installation and patching.
  • For Windows 2008 systems modify the elevation prompt behavior for administrators to “Elevate without prompting”:
    1. Open a command prompt and type 'secpol.msc' to launch the Security Policy Console management utility.
    2. From the Local Security Settings console tree, click Local Policies, and then Security Options
    3. Scroll down to and double-click User Account Control: Behavior of the elevation prompt for administrators.
    4. From the drop-down menu, select: "Elevate without prompting (tasks requesting elevation will automatically run as elevated without prompting the administrator)"
    5. Click OK to confirm the changes
  • Ensure that the Administrators group has the ability to manage auditing and security logs:
    1. Open a command prompt and type 'secpol.msc' to launch the Security Policy Console management utility.
    2. Click on 'Local Policies'
    3. Click on 'User Rights Assignment'
    4. Locate and double click the 'Manage auditing and security log' in the listing of User Rights Assignments.
    5. If the Administrators group is NOT listed in the 'Local Security Settings' tab, add the group now.
    6. Click OK to save the changes (if changes were made)
  • Set /USEPMTIMER in the boot.ini to prevent excessive LMD and LMS trace generation and to prevent connectivity issues as described in Document 437101.1.
  • // This is the first time I have heard of the desktop heap; it is related to many memory exhaustion problems.
  • Increase the size of the default Non-Interactive Desktop Heap to 1MB to prevent instability due to Desktop Heap exhaustion.  Information on how to increase this value can be found in Document 744125.1 and Microsoft Knowledge Base Article KB947246.   It is advised that you consult with Microsoft for further tuning of the Non-Interactive Desktop Heap beyond 1MB.
  • // The official documentation also recommends setting ORACLE_HOME in a cmd window, at the session level, rather than in the system-wide environment variables.
  • Do not set ORACLE_HOME as an environment variable in Oracle (RAC on) Windows environments; doing so can cause undesired behavior during upgrade (for example, listeners starting under the wrong ORACLE_HOME). If the variable is needed (for example, when running opatch), set it in a command prompt window. Reference Document 969581.1.
  • // In the system's advanced settings, best performance needs to be set to “Programs”.
  • Windows 2000 and 2003 systems should be optimized for Memory Usage of Programs not System Caching (not an option in 2008):
    Start -> Settings -> Control Panel -> System -> Advanced -> Performance -> Memory Usage: Adjust for best performance of -> Programs instead of System Caching
  • Run Perfmon to monitor CPU, Memory, Network, Disk IO Rates – To aid in troubleshooting, configure Perfmon to monitor these OS statistics and to generate binary log files (.BLG). Instructions for implementing this change can be found on the Microsoft support website using the following link: http://support.microsoft.com/kb/146005
  • Download and install Debugging Tools for Windows (containing, among others, adplus and windbg) on each node of your RAC on Windows cluster.  These tools can be an invaluable resource when troubleshooting complex issues.  The downloads and instructions for implementation can be found on the Download and Install Debugging Tools for Windows MSDN website.
  • Download and familiarize the DBA team with useful Sysinternals Windows utilities such as Process Explorer.  These utilities are available on the Microsoft Sysinternals Website.
  • Keep memory allocation under 80%. We recommend shooting for 75% allocated, that is, more than 20-25% free. This will allow for ample memory needed for Windows OS operations (including collection of physical memory dumps if required).
  • There is a general requirement for Oracle RAC to synchronize the time on all nodes.  If the Windows Time Service is being used, modify the Windows Time service settings to avoid large jumps in time and allow the time to gradually match with the reference time. Restart the Windows Time service after you complete this task.
    1. Open a command prompt (as the Admin user) and type 'regedit'.
    2. Within the registry editor locate the HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\W32Time\Config key.
    3. Set the value for MaxPosPhaseCorrection = 600
    4. Set the value for MaxNegPhaseCorrection = 600
    5. Set the value for MaxAllowedPhaseOffset = 600
    Note that with these values set, if there is a time discrepancy between a cluster node and the reference node that is greater than 10 minutes, time will not be adjusted and a message will be logged (to the Windows Event Viewer).
    * Consider also setting the ‘updateinterval’ parameter. Reference: http://technet.microsoft.com/en-us/library/cc773263%28v=ws.10%29.aspx
    ** Consider also setting ‘Maximum Tolerance for Computer Clock Synchronization’ (Reference: http://technet.microsoft.com/en-us/library/cc779260%28v=WS.10%29.aspx). Note that Kerberos Authentication will fail if the time difference between the nodes is greater than the ‘Maximum Tolerance for Computer Clock Synchronization’.

Note: With 11gR2, Cluster Time Synchronization Daemon (CTSSD) can be used in place of Windows Time Service. CTSSD will synchronize time with a reference node in the cluster when Windows Time Service is not found to be configured. Should you require synchronization from an external time source, you must use Windows Time Service, which will cause CTSSD to run in “observer” mode. However, if Windows Time Service is running, then it must be configured as shown above.
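
The 600-second limits from the Windows Time Service steps above can be modeled in a few lines. The following Python sketch only illustrates the rule as the note states it (a discrepancy greater than 10 minutes is not adjusted and is logged instead). The function names are hypothetical, and the code neither reads the registry nor talks to the Windows Time service.

```python
# Sketch of the rule described in the note: with the W32Time values set to
# 600, an offset of more than 600 seconds is not corrected.
# The function names are illustrative, not part of any Oracle or Windows API.

MAX_PHASE_CORRECTION_SECONDS = 600  # MaxPosPhaseCorrection / MaxNegPhaseCorrection


def will_be_adjusted(offset_seconds, limit=MAX_PHASE_CORRECTION_SECONDS):
    """Return True if a clock offset (in seconds) is within the limit."""
    return abs(offset_seconds) <= limit


def describe(offset_seconds):
    """Summarize what the note says happens for a given offset."""
    if will_be_adjusted(offset_seconds):
        return "time is adjusted gradually"
    return "time is not adjusted; a message is logged"


if __name__ == "__main__":
    for offset in (30, -600, 900):
        print(offset, "->", describe(offset))
```

A node that drifts past the limit stays out of sync until someone intervenes, so the logged message in the Windows Event Viewer is worth monitoring.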

Network Considerations

  • The Public network MUST be listed first in the network interface binding order.  To make this change perform the following:
    1. Click Start, click Run, type 'ncpa.cpl', and then click OK.
    2. In the menu bar on the top of the window click 'Advanced' and choose 'Advanced Settings' (For Windows 2008, if the "Advanced" is not showing, click 'Alt' to enable that menu item).
    3. Under the Adapters and Bindings tab use the up arrow to move the Public interface to the top of the Connections list.
    4. Under the binding order for the connection, increase the priority of IPv4 over IPv6
    5. Click OK to save the changes
  • DHCP Media Sense MUST be disabled. This change must be manually implemented for Windows 2000 but is disabled by default in 2003. Additional information (including instructions for disabling) for Windows 2000 and Windows 2003 can be found in MS Knowledge Base Article KB239924. For Windows 2008, this feature is once again enabled. To disable DHCP Media Sense on 2008, execute the following from a command window as the Administrator user:
    C:\Users\Administrator> netsh interface ipv4 set global dhcpmediasense=disabled
    C:\Users\Administrator> netsh interface ipv6 set global dhcpmediasense=disabled
    Validate the change with:
    C:\Users\Administrator> netsh interface ipv4 show global
    C:\Users\Administrator> netsh interface ipv6 show global
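
The validation step above can also be automated by scanning the command output for the relevant line. The Python sketch below assumes that `netsh interface ipv4 show global` prints one `name : value` pair per line and labels the setting `DHCP Media Sense`. The sample text and the function names are illustrative assumptions, so compare them with the real output on your system.

```python
# Sketch: decide from captured netsh output whether DHCP Media Sense is disabled.
# Assumption: the output contains "name : value" lines, one per setting.


def parse_global_settings(text):
    """Return a dict of lower-cased setting names to lower-cased values."""
    settings = {}
    for line in text.splitlines():
        name, separator, value = line.partition(":")
        if separator and name.strip() and value.strip():
            settings[name.strip().lower()] = value.strip().lower()
    return settings


def media_sense_disabled(text):
    """True only if a 'DHCP Media Sense' line is present and says disabled."""
    return parse_global_settings(text).get("dhcp media sense") == "disabled"


# Illustrative sample, not captured from a real system.
SAMPLE_OUTPUT = """\
General Global Parameters
---------------------------------------------
Default Hop Limit                   : 128 hops
DHCP Media Sense                    : disabled
"""

if __name__ == "__main__":
    print(media_sense_disabled(SAMPLE_OUTPUT))
```

On a cluster node the text would come from running the two `show global` commands, once for ipv4 and once for ipv6, and the check should pass for both.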

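    // I have not yet run into trouble caused by the SNP features, but I vaguely suspect that many of the intermittent, hard-to-reproduce problems at customer sites are related to this setting.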

  • After installing Windows Server 2003 Service Pack 2 (SP2) or Windows Server 2003 Scalable Networking Pack (SNP), turn off default SNP features. On a computer that has a TCP/IP Offload-enabled network adapter, you may experience many network-related problems like network adapters consuming lots of nonpaged pool memory or adapters requesting large blocks of contiguous memory causing the computer to stop responding when it tries to free the memory. This problem also affects Windows 2008 operating systems.  See My Oracle Support Document 988008.1 and Microsoft Knowledge Base Articles KB948496 and KB951037 for details around this issue and instructions on how to take corrective action.  Essentially, you can apply this recommendation by issuing the following commands:
    C:\Users\Administrator> netsh int tcp set global chimney=disabled
    C:\Users\Administrator> netsh int tcp set global rss=disabled
    Validate these changes with:
    C:\Users\Administrator> netsh interface ipv4 show global
    C:\Users\Administrator> netsh interface ipv6 show global

  • Do not use the names:  PUBLIC and PRIVATE (all caps) for your public and interconnect networks (NICs) due to unpublished Bug 6844099.  The words public and private themselves may be used, for example:  Public and Private are acceptable.
  • Network interface names in ‘Network Connections’ (under Control Panel\All Control Panel Items\Network Connections) must match ‘name’ as indicated by ‘Ethernet adapter <name>’ in the ipconfig /all output.
  • Note that there is currently no plan to enable the (11.2) Redundant Interconnect Usage feature and HAIPs on Windows.
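
The two naming rules above (no all-caps PUBLIC or PRIVATE, and names that match the ‘Ethernet adapter <name>’ headers) lend themselves to a quick scripted check. The Python sketch below is one possible way to do it. The function names, the sample ipconfig text, and the case-sensitive comparison are all assumptions made for illustration.

```python
# Sketch: check interface names against the two naming rules in this section.
# RESERVED_NAMES and all function names are illustrative assumptions.

RESERVED_NAMES = {"PUBLIC", "PRIVATE"}  # all-caps names affected by Bug 6844099

ADAPTER_PREFIX = "Ethernet adapter "


def adapter_names(ipconfig_text):
    """Extract <name> from 'Ethernet adapter <name>:' header lines."""
    names = []
    for line in ipconfig_text.splitlines():
        line = line.strip()
        if line.startswith(ADAPTER_PREFIX) and line.endswith(":"):
            names.append(line[len(ADAPTER_PREFIX):-1])
    return names


def naming_problems(connection_names, ipconfig_text):
    """Return a list of readable problems; an empty list means both rules hold."""
    known = set(adapter_names(ipconfig_text))
    problems = []
    for name in connection_names:
        if name in RESERVED_NAMES:
            problems.append(name + ": all-caps name is affected by Bug 6844099")
        if name not in known:  # case-sensitive comparison (an assumption)
            problems.append(name + ": no matching 'Ethernet adapter' entry")
    return problems


# Illustrative sample, not captured from a real system.
SAMPLE_IPCONFIG = """\
Ethernet adapter Public:

   IPv4 Address. . . . . . . . . . . : 192.0.2.10

Ethernet adapter Private:

   IPv4 Address. . . . . . . . . . . : 10.0.0.10
"""

if __name__ == "__main__":
    print(naming_problems(["Public", "Private"], SAMPLE_IPCONFIG))
```

Here `connection_names` stands for the names shown in ‘Network Connections’, and `ipconfig_text` for the captured output of ipconfig /all on the same node.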

Storage Considerations

// Anti-virus software really is dangerous; see the part highlighted in red below.

  • It is strongly advised to bring the entire Oracle software stack down in order to complete all disk virus scans for conventional FAT16/FAT32/NTFS as well as OCFS file systems. This is because Oracle and the virus scan software use different types of locking which are not compatible.  Hence a shared disk for the database configured with OCFS could have 2 nodes virus scanning at the same time and could potentially cause the cluster to crash. We strongly suggest that you virus scan only from one node and only during maintenance windows.  OCFS disks that only contain Oracle database datafiles do not need to be virus scanned.  OCFS disks that contain any non-database datafiles or database configuration files should be scanned periodically (with the entire Oracle stack down).
  • Desupport of the Oracle Cluster File System (OCFS) on Windows is final with Oracle Database 12.  Customers currently using OCFS on Windows to host either the Oracle cluster files (Oracle Cluster Registry – OCR – and Voting Files) or database files or both will need to migrate these files off OCFS prior to upgrading to Oracle Database 12.  See My Oracle Support Document 1392280.1 for more details.

Hardware/Vendor Specific Considerations

  • Ensure minimum BIOS version 2.35.3.3 is used for SUN V40Z DUAL CORE machines, for ECC memory checking.
  • Ensure that SUN V40Z 2.6V memory management voltage regulator issues are addressed. A SUN CE can identify whether the voltage regulator is beginning to fail. The new VRM (Voltage Regulator Module) board revision goes from rev 1.0 to rev 2.0.

Oracle Software Considerations

The Software Considerations in this section are specific to the Windows Platform. That said, it is highly recommended that the Platform Independent Best Practices found in Document 810394.1 be reviewed in addition to the content below.

Installation

  • Prevent installation failures by disabling the Windows Firewall prior to installation of Oracle; this applies to all Oracle versions.  See the OS Configuration Considerations section within this note for details.
  • Prevent installation failures by disabling Anti-Virus software prior to installation of Oracle; this applies to all Oracle versions. See the OS Configuration Considerations section within this note for details.
  • Prevent installation failures by stopping ‘Distributed Transaction Coordinator’ and ‘Windows Management Instrumentation’ (WMI) services (on each node) prior to installation or patching.  Note that in some cases it may be required to actually disable WMI to allow patching.
  • Prepare for resolution of potential locked file issues by downloading ‘Process Explorer’ from Microsoft’s sysinternals website prior to installation or patching.
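
The installation bullets above amount to a short per-node checklist, which can be sketched as a small function. The Python example below is only an illustration: the caller supplies the current state, nothing here queries Windows, and the function name and the wording of the messages are hypothetical.

```python
# Sketch: turn the installation bullets above into a pre-installation checklist.
# The inputs are supplied by the caller; nothing here queries Windows itself.

SERVICES_TO_STOP = (
    "Distributed Transaction Coordinator",
    "Windows Management Instrumentation",
)


def install_blockers(firewall_enabled, antivirus_enabled, running_services):
    """Return the items to fix on one node before installing or patching."""
    blockers = []
    if firewall_enabled:
        blockers.append("disable the Windows Firewall")
    if antivirus_enabled:
        blockers.append("disable Anti-Virus software")
    for service in SERVICES_TO_STOP:
        if service in running_services:
            blockers.append("stop the '" + service + "' service")
    return blockers


if __name__ == "__main__":
    # Example state for one node: firewall still on, WMI still running.
    for item in install_blockers(True, False, {"Windows Management Instrumentation"}):
        print(item)
```

Because the note asks for the services to be stopped on each node, the check would be repeated once per cluster node.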

General

  • Be sure that the latest Windows bundle patch has been applied to ensure optimal performance and stability of the system.  This applies to ALL Oracle releases.  The latest available patch bundles can be found in Document 161549.1.
  • Note that as a normal function of Oracle Clusterware / Grid Infrastructure, OraFenceService is designed to fence (I/O) and reboot a node if it perceives that the node is ‘hung’ once its configured timeout has been reached. The default timeout for the OraFence driver is a (very low) 5 seconds. This means that if the OraFence driver detects what it perceives to be a hang at the operating system level and that hang persists beyond 5 seconds, the OraFence driver may, of its own accord, fence and evict the node.  In some cases it is advisable to increase the OraFence timeout value to as high as 10 seconds.  The OraFence timeout is controlled by the following Windows registry key:  HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\OraFenceService\Timeout

References

NOTE:1392280.1 – Desupport of Oracle Cluster File System (OCFS) on Windows with Oracle Database 12
NOTE:437101.1 – EXCESSIVE LMS AND LMD TRACE FILE SIZES GENERATED ON WINDOWS RAC
NOTE:744125.1 – Connections Fail with ORA-12640 or ORA-21561
NOTE:810394.1 – RAC and Oracle Clusterware Best Practices and Starter Kit (Platform Independent)
NOTE:988008.1 – RAC on Windows: Recurring Node Evictions May Be Caused by Default SNP Features Available for Windows Server 2003 SP2 and 2008
NOTE:811280.1 – RAC and Oracle Clusterware Best Practices and Starter Kit (Solaris)
NOTE:811306.1 – RAC and Oracle Clusterware Best Practices and Starter Kit (Linux)
NOTE:969581.1 – How to Set or Switch Oracle Homes on Windows
NOTE:811271.1 – RAC and Oracle Clusterware Best Practices and Starter Kit (Windows)
NOTE:811293.1 – RAC and Oracle Clusterware Best Practices and Starter Kit (AIX)
NOTE:811303.1 – RAC and Oracle Clusterware Best Practices and Starter Kit (HP-UX)
BUG:6844099 – OUI RETURNS WRONG VALUE WHEN NETWORKINTERFACE IS CALLED PUBLIC, PRIVATE, UNKNOWN