Updating the OS in Oracle RAC Running on Azure Using FlashGrid Software
In this blog post we will walk through how to update the OS on nodes running Oracle RAC Database on a FlashGrid Cluster.
Note: Simultaneously updating the OS and applying Grid Infrastructure patches in rolling fashion is not recommended. Nodes should not be rebooted while the GI cluster is in rolling patch mode (a quick check is shown after these notes).
Note: Running yum update without first stopping Oracle and FlashGrid services may result in the services restarting non-gracefully during the update.
Note: In-place upgrades from RH7 to RH8 or from OL7 to OL8 are not supported. To move between major release versions, a new cluster must be deployed and the data migrated.
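To confirm that the GI cluster is not in rolling patch mode before rebooting any node, one way (a general Grid Infrastructure check, run from the GI home as the grid or root user) is to query the cluster upgrade state:
# crsctl query crs activeversion -f
The output should report the cluster upgrade state as [NORMAL]; if it reports a rolling patch or rolling upgrade state, complete that patching before updating the OS.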
To update the OS on a running cluster, repeat the following steps on each node, one node at a time:
1. Create a backup snapshot of the OS disk:
a. Flush OS buffers:
# sync
b. Create snapshot of the OS disk using the cloud console or CLI.
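As an illustration, since the cluster in this post runs on Azure, the snapshot can be taken with the Azure CLI along these lines (the resource group, snapshot, and disk names below are placeholders and will differ in your environment):
# az vm show --resource-group <resource-group> --name dbtest02 --query "storageProfile.osDisk.name" -o tsv
# az snapshot create --resource-group <resource-group> --name <snapshot-name> --source <os-disk-name>
The first command looks up the name of the node's OS disk; the second creates a snapshot of that disk.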
2. Make sure that no other nodes are offline or re-syncing. All disk groups must have zero offline disks and Resync = No:
# flashgrid-cluster
FlashGrid Cluster version 23.6.47……..
Storage Fabric version 23.6…
License: Active, Marketplace
Licensee: abc
Support plan: 24×7
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
FlashGrid running: OK
Clocks check: OK
Configuration check: OK
Network check: OK
Querying nodes: dbtest01, dbtest02, dbtestqu …
Cluster Name: TEST
Cluster status: Good
-----------------------------------------------------------------
Node      Status  ASM_Node  Storage_Node  Quorum_Node  Failgroup
-----------------------------------------------------------------
dbtest01  Good    Yes       Yes           No           DBTEST01
dbtest02  Good    Yes       Yes           No           DBTEST02
dbtestqu  Good    No        No            Yes          DBTESTQU
-----------------------------------------------------------------
---------------------------------------------------------------------------------------------------------
GroupName  Status  Mounted   Type    TotalMiB  FreeMiB  OfflineDisks  LostDisks  Resync  ReadLocal  Vote
---------------------------------------------------------------------------------------------------------
DATA       Good    AllNodes  NORMAL  4000      3198     0             0          No      Enabled    None
GRID       Good    AllNodes  NORMAL  1024      732      0             0          No      Enabled    3/3
---------------------------------------------------------------------------------------------------------
3. If the node is a database node:
a. Stop all local database instances running on the node.
a1. Check the database instances running on the node:
[oracle@dbtest02 ~]$ ps -ef|grep pmon
oracle 4373 4171 0 17:34 pts/0 00:00:00 grep --color=auto pmon
oracle 21006 1 0 Feb11 ? 00:01:34 ora_pmon_dbtest2
oracle 22554 1 0 Feb11 ? 00:01:20 ora_pmon_dbtestARC2
grid 32507 1 0 Jan05 ? 00:05:03 asm_pmon_+ASM2
a2. Check the status of the databases using SRVCTL:
[oracle@dbtest02 ~]$ srvctl status database -db dbtest
Instance dbtest1 is running on node dbtest01
Instance dbtest2 is running on node dbtest02
[oracle@dbtest02 ~]$ srvctl status database -db dbtestarc
Instance dbtestARC1 is running on node dbtest01
Instance dbtestARC2 is running on node dbtest02
a3. Stop the local database instances on the node:
[oracle@dbtest02 ~]$ srvctl stop instance -i dbtestARC2 -d dbtestarc
[oracle@dbtest02 ~]$ srvctl stop instance -i dbtest2 -d dbtest
a4. Re-verify that the database instances are stopped and no database processes are running:
[oracle@dbtest02 ~]$ ps -ef|grep pmon
oracle 7027 4171 0 17:37 pts/0 00:00:00 grep --color=auto pmon
grid 32507 1 0 Jan05 ? 00:05:03 asm_pmon_+ASM2
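Optionally, the same can be cross-checked with SRVCTL, using the same database names as in step a2; both commands should now report the local instances on this node as not running:
[oracle@dbtest02 ~]$ srvctl status database -db dbtest
[oracle@dbtest02 ~]$ srvctl status database -db dbtestarc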
b. Stop Oracle CRS on the node as the root user:
# crsctl stop crs
[root@dbtest02 ~]# cd /u01/app/19.3.0/grid/bin/
[root@dbtest02 bin]# ./crsctl check cluster -all
**************************************************************
dbtest01:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
dbtest02:
CRS-4537: Cluster Ready Services is online
CRS-4529: Cluster Synchronization Services is online
CRS-4533: Event Manager is online
**************************************************************
[root@dbtest02 bin]# crsctl stop crs
CRS-2791: Starting shutdown of Oracle High Availability Services-managed resources on 'dbtest02'
CRS-2673: Attempting to stop 'ora.crsd' on 'dbtest02'
CRS-2790: Starting shutdown of Cluster Ready Services-managed resources on server 'dbtest02'
CRS-2673: Attempting to stop 'ora.LISTENER.lsnr' on 'dbtest02'
………………….
………………
CRS-2677: Stop of 'ora.gpnpd' on 'dbtest02' succeeded
CRS-2677: Stop of 'ora.gipcd' on 'dbtest02' succeeded
CRS-2793: Shutdown of Oracle High Availability Services-managed resources on 'dbtest02' has completed
CRS-4133: Oracle High Availability Services has been stopped.
b1. Check that no database or Grid Infrastructure processes are running; only FlashGrid processes should remain:
[root@dbtest02 bin]# ps -ef|grep d.bin
root 2231 1 0 2023 ? 02:34:48 /opt/flashgrid/bin/flashgrid_aio_srv
root 2236 1 0 2023 ? 09:19:06 /opt/flashgrid/bin/flashgrid_target_srv
root 2247 1 1 2023 ? 1-07:28:38 /opt/flashgrid/bin/flashgrid_initiator_srv
grid 2282 1 0 2023 ? 04:30:11 /opt/flashgrid/bin/flashgrid_asm_srv
root 2290 1 0 2023 ? 09:00:16 /opt/flashgrid/bin/flashgrid_cluster_srv
root 2295 1 0 2023 ? 00:38:01 /opt/flashgrid/bin/flashgrid_iamback
root 2296 1 0 2023 ? 00:38:30 /opt/flashgrid/bin/flashgrid_reconstruct
root 2305 1 0 2023 ? 00:46:29 /opt/flashgrid/bin/flashgrid_diskwatch
root 8495 4121 0 17:38 pts/0 00:00:00 grep --color=auto d.bin
b2. Stop FlashGrid Storage Fabric services on the node:
[root@dbtest02 bin]# flashgrid-node stop
FlashGrid Cluster version 23.6…..
Storage Fabric version 23.6…..
License: Active, Marketplace
Licensee: abc
Support plan: 24×7
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Checking for local disks in mounted disk groups…
Completing this operation may take long time. Please wait…
Taking local disks offline:
DATA -> success
GRID -> success
systemctl stop flashgrid … OK
b3. Re-verify that all FlashGrid processes are stopped:
[root@dbtest02 bin]# ps -ef|grep d.bin
root 9216 4121 0 17:39 pts/0 00:00:00 grep --color=auto d.bin
b4. Install OS updates:
# yum update
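If you want to see what will be installed, the list of pending updates can be reviewed before running the update; this is only a convenience check, not a required step:
# yum check-update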
b5. Reboot the node:
# sync; sync
# flashgrid-node reboot
b6. Wait until the node boots up, all disks are back online, and resyncing operations are complete on all disk groups. All disk groups must have zero offline disks and Resync = No before it is safe to update the next node.
# flashgrid-cluster
If the node is a database node, start all previously stopped local database instances.
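For the example databases used earlier in this post, that would look like the following (instance and database names will differ in your environment):
[oracle@dbtest02 ~]$ srvctl start instance -i dbtest2 -d dbtest
[oracle@dbtest02 ~]$ srvctl start instance -i dbtestARC2 -d dbtestarc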
Proceed with the next node.