Friday, February 8, 2019

RAC Basic FAQs


Useful Commands:

crsctl enable has –> Enable Automatic start of Oracle High Availability services after reboot
crsctl disable has –> Disable Automatic start of Oracle High Availability services after reboot
What is OHASD?
OHASD stands for Oracle High Availability Services Daemon. OHASD spawns three types of services at cluster level.

Level 1: Cssd Agent
Level 2: Oraroot Agent (respawns cssd, crsd, ctssd, diskmon, acfs)
Level 3: OraAgent (respawns mdnsd, gipcd, gpnpd, evmd, asm), CssdMonitor

Useful Commands:
1. crsctl enable has –> Start HAS services automatically after reboot.
2. crsctl disable has –> Do not start HAS services after reboot.
3. crsctl config has –> Check whether autostart is enabled.
4. cat /etc/oracle/scls_scr/<Node_name>/root/ohasdstr –> Check whether autostart is enabled.
5. cat /etc/oracle/scls_scr/<Node_name>/root/ohasdrun –> Check whether restart is enabled if the node fails.
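
The check in item 4 can be scripted. This is a minimal sketch, assuming the ohasdstr file holds a single word, either enable or disable (worth confirming on your release). The function name has_autostart is hypothetical and not part of any Oracle tool.

```shell
# Hypothetical helper: report the HAS autostart setting stored in an ohasdstr file.
# Assumption: the file holds a single word, "enable" or "disable".
has_autostart() {
    file="$1"
    if [ ! -r "$file" ]; then
        echo "unknown"
        return 1
    fi
    # Strip whitespace so a trailing newline does not break the comparison.
    value=$(tr -d '[:space:]' < "$file")
    case "$value" in
        enable)  echo "enabled" ;;
        disable) echo "disabled" ;;
        *)       echo "unknown" ;;
    esac
}
```

On a real node you would call it as has_autostart /etc/oracle/scls_scr/<Node_name>/root/ohasdstr, and compare the result with the output of crsctl config has.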

What is OCR? How and why OLR is used? Where is the location of OCR & OLR?
OCR stands for Oracle Cluster Registry. It holds information such as node membership (which nodes are part of the cluster), software version, location of the voting disk, and the status of RAC databases, listeners, instances and services. OCR can be placed in ASM or on OCFS.

ASM can be brought up only if we have access to OCR, but OCR is accessible only after ASM is up. In this case, how will the CRS services come up?

This is what the OLR (Oracle Local Registry) is for. It is a node-local counterpart of the OCR file, placed on the local file system, so it can be read before ASM is up.

OLR holds information such as CRS_HOME, GPnP details, active version, localhost version, latest OCR backup (with time and location), and node name.
Location Of OCR & OLR:

# cat /etc/oracle/ocr.loc –> OCR file Details.
ocrconfig_loc=<+ASM_Location>
local_only=FALSE

# cat /etc/oracle/olr.loc –> OLR file Details.
olrconfig_loc=<file_name_with_location.olr>
crs_home=<CRS_HOME_Location>
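
Both files use a simple key=value layout, so a single value can be pulled out with standard tools. The sketch below is only an illustration: loc_value is a hypothetical helper name, and +DATA is a sample value, not something taken from a real cluster.

```shell
# Hypothetical helper: print the value of one key from a key=value file
# such as /etc/oracle/ocr.loc or /etc/oracle/olr.loc.
loc_value() {
    key="$1"
    file="$2"
    # Match lines that start with "key=" and print everything after it.
    sed -n "s/^${key}=//p" "$file" | head -n 1
}

# Example with a throwaway file that mimics the ocr.loc layout (sample values only).
demo=$(mktemp)
printf 'ocrconfig_loc=+DATA\nlocal_only=FALSE\n' > "$demo"
loc_value ocrconfig_loc "$demo"   # prints +DATA
loc_value local_only "$demo"      # prints FALSE
rm -f "$demo"
```

On a cluster node, loc_value olrconfig_loc /etc/oracle/olr.loc would print the OLR file location.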

Useful Commands:

NOTE: Some commands like restore need bounce of services. Please verify before taking any action.

ocrconfig -showbackup –> OCR file backup location
ocrconfig -export <File_Name_with_Full_Location.ocr> –> OCR Backup
ocrconfig -restore <File_Name_with_Full_Location.ocr> –> Restore OCR
ocrconfig -import <File_Name_With_Full_Location.dmp> –> Import metadata specifically for OCR.
ocrcheck -details –> Gives the OCR info in detail
ocrcheck -local –> Gives the OLR info in detail
ocrdump -local <File_Name_with_Full_Location.olr> –> Take the dump of OLR.
ocrdump <File_Name_with_Full_Location.ocr> –> Take the dump of OCR.

What is the Voting Disk and how is this Used?
The voting disk comes into the picture whenever a node joins the cluster, a node fails (and may be evicted), or a VIP needs to be assigned when GNS is configured. The voting disk records which nodes are part of the cluster. While starting the CRS services, each node, with the help of OCR, votes in the voting disk (in other words, it marks its attendance in the cluster).

We need not back up the voting disk periodically, the way we schedule cron jobs. A backup is needed only in some cases.

There are two different jobs done by voting disk.

Dynamic – Heart beat information
Static – Node information in the cluster
Useful Commands:

dd if=Name_Of_Voting_Disk of=Name_Of_Voting_Disk_Backup –> Taking backup of voting disk
crsctl query css votedisk –> Check voting disk details.
crsctl add css votedisk path_to_voting_disk –> To add voting disk
crsctl add css votedisk -force –> If the Cluster is down
crsctl delete css votedisk <File_Name_With_Password_With_file_name> –> Delete Voting disk
crsctl delete css votedisk -force –> If the cluster is down
crsctl replace votedisk <+ASM_Disk_Group> –> Replace the voting disk.

What is CRS?
CRSD stands for Cluster Ready Services Daemon. It is the process responsible for monitoring, stopping, starting and failing over resources. This process maintains the OCR and is responsible for restarting a resource when a failover takes place.

Useful Commands:

crs_stat -t -v –> Check crs resources
crsctl stat res -t –> Check resources in a more detailed view. BEST ONE.
crsctl enable crs –> Enable Automatic start of Services after reboot
crsctl check crs –> Check crs Services.
crsctl disable crs –> Disable Automatic start of CRS services after reboot
crsctl stop crs –> Stop the crs services on the node where the command is executed
crsctl stop crs -f –> Stop the crs services forcefully
crsctl start crs –> To start the crs services on respective node
crsctl start crs -excl –> To start the crs services in exclusive mode when you have lost the voting disk.
You need to replace the voting disk after you start the css.
crsctl stop cluster -all –> Stop the crs services on all the cluster nodes
crsctl start cluster -all –> Start the crs services on all the cluster nodes.
olsnodes –> List all the nodes in the cluster
oclumon manage -get master –> With this you will get master node information
cat $CRS_HOME/crs/init/<node_name>.pid –> Find PID from which crs is running.

What is CSSD?
CSSD stands for Cluster Synchronization Services Daemon. It is responsible for communication between the nodes and monitors the heartbeat messages from all the nodes.

Example:

We have a 2-node RAC cluster. Until an hour ago, CSSD was monitoring both nodes and they were able to communicate with each other. Now, if one of the nodes goes down, CRS should know about it. This information is provided by the CSSD process.

Simple Scenario:

Both nodes are up and running, but because of a problem on one of the communication channels, the CSSD process gets the information that the other node is down. In this case, new transactions cannot be assigned to that node, and the node is evicted. The node that is still running takes ownership as the master node.

This sample scenario was taken for a better understanding ONLY.

Useful Commands:
crsctl stop css –> For stopping the css
crsctl disable css –> Disabling automatic startup after reboot.

What is CTSSD?
CTSSD stands for Cluster Time Synchronization Services Daemon. By default this service runs in observer mode: if there is a time difference, it does not take any action. To run this service in active mode, we need to disable all other time synchronization services such as NTP (Network Time Protocol). However, as per my knowledge it is recommended to keep this service in observer mode, because if the service is in active mode and the time difference is huge, the ctssd process may terminate. Sometimes crsd also fails to start up due to the time difference.

Useful Commands:
cluvfy comp clocksync -n all -verbose –> To check the clock synchronization across all the nodes
crsctl check ctss –> Check the service status & timeoffset in msecs.

What is VIP?
VIP stands for Virtual IP Address. Oracle uses the VIP for database-level access. When a connection comes from the application end, it connects using this IP address. Suppose the IP of one of the nodes is down. Because of the protocol timeout, the client would need to wait 90 seconds before getting a session. This is where the VIP comes into the picture: if one of the VIPs is down, connections are routed only to the active node. The VIP must be on the same subnet as the public IP address. The VIP is used for RAC failover and RAC management.

Useful Commands:
srvctl start vip -n <node_name> -i <VIP_Name> –> To start VIP
srvctl stop vip -n <node_name> -i <VIP_Name> –> To stop VIP
srvctl enable vip -i vip_name –> Enable the VIP.
srvctl disable vip -i vip_name –> Disable the VIP.
srvctl status nodeapps -n <node_name> –> status of nodeapps
srvctl status vip -n <node_name> –> status of vip on a node

What is SCAN IP & Listener?
SCAN stands for Single Client Access Name. SCAN IPs must be on the same subnet. Three SCAN IPs is the recommended number; they redirect user sessions to the SCAN listeners. Load balancing on the SCAN listeners is done by the least_recently_loaded algorithm.

SCAN Listener: when a connection is initiated from the application end, the SCAN listener checks the load balancing information. Once it has that information, it hands the connection to a node listener, and the user can run transactions.

The main benefit is that we need not change the connect string on the application servers when the cluster changes, for example when a node is added or deleted, or for other modifications based on requirements.
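
To see why the connect string stays stable, note that clients point at the SCAN name and never at an individual node. A minimal sketch, assuming the Easy Connect form //host:port/service_name; the helper name, SCAN name, port and service name below are all hypothetical placeholders.

```shell
# Hypothetical helper: build an Easy Connect string that targets the SCAN name.
scan_connect_string() {
    scan_name="$1"
    port="${2:-1521}"       # fall back to 1521, the usual default listener port
    service="$3"
    printf '//%s:%s/%s\n' "$scan_name" "$port" "$service"
}

scan_connect_string rac-scan.example.com 1521 orclsvc
# prints //rac-scan.example.com:1521/orclsvc
```

Adding or removing a node changes nothing in this string, because only the SCAN name appears in it.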

Useful Commands:
srvctl config scan –> Retrieves the SCAN VIP configuration
srvctl config scan_listener –> List of scan listeners with Port number
srvctl add scan -n <scan_name> –> Add SCAN VIPs for the given SCAN name
srvctl add scan_listener -p <Desired_port_number> –> To add scan listener on specific port
SQL> SHOW PARAMETER REMOTE_LISTENER; –> Find the list of scan listeners
srvctl stop scan –> Stops all SCAN VIPs when used without the -i option
srvctl stop scan_listener –> Stops all scan listeners when used without the -i option
srvctl start scan –> To start the scan VIP
srvctl start scan_listener –> Start the scan listener.
srvctl status scan –> Verify scan VIP status
srvctl status scan_listener –> Verify scan listener status.
srvctl modify scan_listener –> Modify the scan listener
srvctl relocate scan_listener -i <Ordinal_Number> -n <node_name> –> Relocate the scan listener to another node.

What is ologgerd?
Ologgerd stands for cluster logger service daemon, also called the cluster logger service. The logger service writes its data on the master node and chooses another node as standby. If a network issue occurs between the nodes and the standby is unable to contact the master, the other node takes ownership and chooses a node as the new standby. The master manages the operating system metrics database in the CHM repository.

Useful Commands:
oclumon manage -get master –> Find which is the master node
oclumon manage -get reppath –> Will get the path of the repository logs
oclumon manage -get repsize –> This will give you the limitations on repository size
oclumon showobjects –> Find which nodes are connected to loggerd
oclumon dumpnodeview –> This will give a detailed view including system, topconsumers, processes, devices, nics, filesystems status, protocol errors.
oclumon dumpnodeview -n <node_1 node_2 node_3> -last "HH:MM:SS" –> View the details for the listed nodes over the last period of time you specify.
oclumon dumpnodeview -allnodes -last "HH:MM:SS" –> If we need info from all the nodes.

What is sysmon?
This process is responsible for collecting information on the local node. It runs on every node, and the collected data is sent to the master loggerd. It sends information such as CPU, memory usage, OS-level info, disk info, processes and file system info.

What is evmd?
Evmd stands for Event Manager Daemon. It handles event messaging for the processes. It sends and receives actions regarding resource state changes to and from all other nodes in the cluster, with the help of ONS (Oracle Notification Service).

Useful Commands:

evmwatch -A -t "@timestamp @@" –> Get events generated in evmd.
evmpost -u "<Message here>" -h <node_name> –> This will post a message in the evmd log on the mentioned node.

What is mdnsd?
Mdnsd stands for Multicast Domain Name Service daemon. This process is used by gpnpd to locate profiles in the cluster, as well as by GNS to perform name resolution. Mdnsd updates the pid file in the init directory.

What is ONS?
ONS stands for Oracle Notification Service. ONS will allow users to send SMS, emails, voice messages and fax messages in an easy way. ONS sends the state of the database and instances. This state information is used for load balancing. ONS also communicates with daemons on other nodes to inform them of the state of the database.

ONS is started by CRS as part of nodeapps. It runs as a node application, and every node has its own ONS configured.

Useful Commands:

srvctl status nodeapps –> Status of nodeapps
cat $ORACLE_HOME/opmn/conf/ons.config –> Check ons configuration.
$ORACLE_HOME/opmn/logs –> ONS logs will be in this location.

What is OPROCD?
OPROCD stands for Oracle Process Monitor Daemon. Oprocd monitors the system state of the cluster nodes. It performs STONITH, which is nothing but power cycling the node: powering the server off and on again using the reboot command. The main change from 11gR2 is that OPROCD is replaced by the cssd agent.

Useful Commands:

CRS_HOME/oprocd stop –> To stop the process on a single node.

What is FAN?
FAN stands for Fast Application Notification. If any state change occurs in the cluster, an instance or a node, an event is triggered by the event manager and propagated by ONS. This event is known as a FAN event. The feature was introduced in Oracle 10g for immediate notification. FAN uses ONS for notifying.

Useful Commands:

onsctl ping –> To check whether ons is running or not.
onsctl debug –> Will get detail view of ons.
onsctl start –> Start the daemon.
onsctl stop –> Stop the daemon.

What is TAF?
TAF stands for Transparent Application Failover. When a RAC node goes down, select statements fail over to an active node. Insert, delete, update and alter session statements are not supported by TAF. Temporary objects and PL/SQL package states are lost during the failover.

There are two types of failover methods used in TAF.

Basic failover: The session connects to a single node, so there is no overhead. The end user experiences a delay in completing the transaction.
Preconnect failover: The session connects to the primary and backup nodes at the same time. This offers faster failover, at the cost of some overhead, because the backup connection has to be kept ready to complete the transaction with minimal delay.
Useful Commands:

1. Add a service:
srvctl add service -d <database_name> -s <Name_for_service> -r <instance_names> -P <Policy_specification>
Policy specification: none, basic, preconnect

2. Check TAF status:
SELECT machine, failover_type, failover_method, failed_over, COUNT(*) FROM gv$session GROUP BY machine, failover_type, failover_method, failed_over;
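
TAF can also be requested on the client side through a FAILOVER_MODE clause in tnsnames.ora. The sketch below only prints such an entry so it can be reviewed before use. The helper name, alias, host, service name, port 1521 and the RETRIES and DELAY values are all illustrative assumptions, not settings taken from this post.

```shell
# Hypothetical helper: print a tnsnames.ora entry with client-side TAF settings.
# Arguments: alias, host (for example the SCAN name), service name, optional method.
taf_tns_entry() {
    alias_name="$1"
    scan_name="$2"
    service="$3"
    method="${4:-BASIC}"    # BASIC unless another method is passed in
    cat <<EOF
${alias_name} =
  (DESCRIPTION =
    (ADDRESS = (PROTOCOL = TCP)(HOST = ${scan_name})(PORT = 1521))
    (CONNECT_DATA =
      (SERVICE_NAME = ${service})
      (FAILOVER_MODE = (TYPE = SELECT)(METHOD = ${method})(RETRIES = 20)(DELAY = 5))
    )
  )
EOF
}

# Sample call with placeholder names.
taf_tns_entry ORCL_TAF rac-scan.example.com orclsvc
```

After connecting through such an alias, the gv$session query above should show the failover type and method for the session.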

What is FCF?

FCF stands for Fast Connection Failover. It is an application-level failover process. It automatically subscribes to FAN events, which lets the application react immediately to up and down events from the database cluster. All the failed connections are cleaned up immediately, so that the application receives a failure message. After the cleanup, any new connection is load balanced to an active node. As this is an application-level process, I am not discussing it in detail.

What is GCS(LMSn)?

GCS stands for Global Cache Service. GCS keeps track of information about data blocks and the access privileges of the various instances. Integrity is maintained by managing access globally. It is responsible for transferring blocks from one instance to another when needed.

Clear Understanding: Blocks of table "A" were retrieved through a connection to the second node. Now, if the first node requests blocks from this table, it need not read the data from the datafiles. The blocks can be retrieved from the other instance. This is the main use of GCS.

What is GES(LMD)?

GES stands for Global Enqueue Service. GES controls library and dictionary caches on all the nodes. GES manages transaction locks, table locks, library cache locks, dictionary cache locks, database mount lock.

What is GRD?

GRD stands for Global Resource Directory. It records information about resources and enqueues. As the name suggests, it is a directory of resource information, such as data block identifiers, data block mode (shared, exclusive, null) and which buffer caches have access.

What is GPNPD?

GPNPD stands for Grid Plug aNd Play Daemon. The file CRS_HOME/gpnp/<node_name>/profile/peer/profile.xml is known as the GPNP profile. This profile consists of the cluster name, hostname, network profiles with IP addresses, and OCR. If we make any modifications to the voting disk, the profile is updated.

Useful Commands:

gpnptool ver –> Check the version of tool.
gpnptool lfind –> Get local gpnpd server.
gpnptool get –> Read the profile
gpnptool lfind –> Check daemon is running on local node.
gpnptool check -p=CRS_HOME/gpnp/<node_name>/profile/peer/profile.xml –> Check whether configuration is valid.

What is Diskmon?

The disk monitor daemon runs continuously once ocssd starts. It monitors and performs I/O fencing for Exadata storage servers (called cells in Exadata terminology). The process runs from the time ocssd starts because an Exadata cell can be added to any cluster at any time.

Useful Commands:

./crsctl stat res ora.diskmon –> To check diskmon status.
