Hi,
In this blog post, we discuss fencing, also known as stonith (shoot the other node in the head), which is used to protect data in the event that nodes become unresponsive. If a node fails to respond, it may still be accessing data. To ensure that your data is safe, you can use fencing to prevent a live node from accessing data until the original node is truly offline. To accomplish this, you must configure a device that can ensure a node is taken offline. A number of fencing agents are available for this purpose. In general, stonith relies on particular hardware and service protocols that can forcibly reboot or shut down nodes to protect the cluster.
The following are different configurations that use some of the available fencing agents. Note that these examples make certain assumptions about the hardware and that you already know how to set up, configure, and use the affected hardware.
1- Ensure that stonith is enabled for your cluster configuration:
sudo pcs property set stonith-enabled=true
2- After configuring stonith, run the following commands to check your configuration and ensure that it is set up correctly:
sudo pcs stonith config
sudo pcs cluster verify --full
3- To check the status of your stonith configuration, run the following command:
sudo pcs stonith
4- To view the status of your cluster, run the following command:
sudo pcs status
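As an optional sanity check after step 1, you can also query the cluster property directly to confirm that it is set. This is a minimal example and assumes a pcs release that accepts a property name with the show subcommand (newer releases use pcs property config instead):
sudo pcs property show stonith-enabled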
The following examples describe the various types of fencing configurations that you can implement.
1) IPMI LAN Fencing
Intelligent Platform Management Interface (IPMI) is an interface to a subsystem that provides management features of the host system’s hardware and firmware and includes facilities to power cycle a system over a dedicated network without any requirement to access the system’s operating system. You can configure the fence_ipmilan fencing agent for the cluster so that stonith can be achieved across the IPMI LAN.
If your systems are configured for IPMI, you can run the following commands on one of the nodes in the cluster to enable the ipmilan fencing agent and configure stonith for both nodes, for example:
sudo pcs stonith create ipmilan_n1_fencing fence_ipmilan pcmk_host_list=node1 delay=5 ipaddr=192.168.5.2 login=root passwd=password lanplus=1 op monitor interval=60s
sudo pcs stonith create ipmilan_n2_fencing fence_ipmilan pcmk_host_list=node2 ipaddr=192.168.5.3 login=root passwd=password lanplus=1 op monitor interval=60s
In the example, node1 is a host that has an IPMI LAN interface configured on the IP address 192.168.5.2. The host named node2 has an IPMI LAN interface that is configured on the IP address 192.168.5.3. The root user password for the IPMI login on both systems is specified in this example as password. In each instance, you should replace these configuration variables with the appropriate values for your particular environment.
Note that the delay option should only be set on one of the nodes. This setting ensures that, in the rare case of a fence race condition, only one node is killed and the other continues to run. Without this option set, it is possible that both nodes assume they are the only surviving node and then simultaneously reset each other.
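Before relying on the fence_ipmilan agents, it can be worth confirming that the IPMI LAN interfaces are reachable with the configured credentials. The following is a minimal sketch that assumes the ipmitool utility is installed and reuses the example address and credentials shown above:
sudo ipmitool -I lanplus -H 192.168.5.2 -U root -P password chassis power status
If the command returns the chassis power state, the credentials and network path are good. You can also trigger a real fencing action with sudo pcs stonith fence node2 to test the agent end to end, but keep in mind that this forcibly power cycles the target node.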
2) SCSI Fencing
The SCSI Fencing agent is used to provide storage-level fencing. This configuration protects storage resources from being written to by two nodes simultaneously by using SCSI-3 PR (Persistent Reservation). Used in conjunction with a watchdog service, a node can be reset automatically by using stonith when it attempts to access the SCSI resource without a reservation.
– Install the watchdog service on both nodes and then copy the provided fence_scsi_check script to the watchdog configuration before enabling the service, as shown in the following example:
sudo dnf install watchdog
sudo cp /usr/share/cluster/fence_scsi_check /etc/watchdog.d/
sudo systemctl enable --now watchdog
– Enable the iscsid service that is provided in the iscsi-initiator-utils package on both nodes:
sudo dnf install -y iscsi-initiator-utils
sudo systemctl enable --now iscsid
– After both nodes are configured with the watchdog service and the iscsid service, you can configure the fence_scsi fencing agent on one of the cluster nodes to monitor a shared storage device, such as an iSCSI target, for example:
sudo pcs stonith create scsi_fencing fence_scsi pcmk_host_list="node1 node2" devices="/dev/mapper/mpathc1" meta provides="unfencing"
In the example, node1 and node2 represent the hostnames of the nodes in the cluster and /dev/mapper/mpathc1 is the shared storage device. Replace these variables with the appropriate values for your particular environment.
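If you want to confirm that the SCSI-3 persistent reservations are actually in place once the cluster is running, you can inspect the shared device directly. This is a sketch that assumes the sg3_utils package is installed and uses the example device from above:
sudo sg_persist --in --read-keys --device=/dev/mapper/mpathc1
sudo sg_persist --in --read-reservation --device=/dev/mapper/mpathc1
With fence_scsi active, each cluster node should have a registration key listed, and the reservation type is typically Write Exclusive, registrants only.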
3) SBD Fencing
The Storage Based Death (SBD) daemon can run on a system and monitor shared storage. The SBD daemon can use a messaging system to track cluster health. SBD can also trigger a reset if the appropriate fencing agent determines that stonith should be implemented. To set up and configure SBD fencing:
– Stop the cluster by running the following command on one of the nodes:
sudo pcs cluster stop --all
– On each node, install and configure the SBD daemon:
sudo dnf install sbd
– Enable the sbd systemd service:
sudo systemctl enable sbd
Note that the sbd systemd service is automatically started and stopped as a dependency of the pacemaker service, so you do not need to run this service independently. Attempting to start or stop the sbd systemd service manually fails and returns an error indicating that it is controlled as a dependency service.
– Edit the /etc/sysconfig/sbd file and set the SBD_DEVICE parameter to identify the shared storage device. For example, if your shared storage device is available on /dev/mapper/mpathc1, make sure the file contains the following line:
SBD_DEVICE="/dev/mapper/mpathc1"
– On one of the nodes, create the SBD messaging layout on the shared storage device and confirm that it is in place. For example, to set up and verify messaging on the shared storage device at /dev/mapper/mpathc1, run the following commands:
sudo sbd -d /dev/mapper/mpathc1 create
sudo sbd -d /dev/mapper/mpathc1 list
– Lastly, start the cluster and configure the fence_sbd fencing agent for the shared storage device. For example, to configure the shared storage device, /dev/mapper/mpathc1, run the following commands on one of the nodes:
sudo pcs cluster start --all
sudo pcs stonith create sbd_fencing fence_sbd devices=/dev/mapper/mpathc1
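To confirm that SBD is operational, you can dump the on-disk header and send a harmless test message to a node; both should complete without errors. This is a sketch that reuses the example device and node names from above:
sudo sbd -d /dev/mapper/mpathc1 dump
sudo sbd -d /dev/mapper/mpathc1 message node2 test
The dump command prints the timeouts that were written when the device was created, and the test message should appear in the system log on node2 without triggering a reset.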
4) IF-MIB Fencing
IF-MIB fencing takes advantage of SNMP to access the IF-MIB on an Ethernet network switch and to shut down the port on the switch, which effectively takes a host offline. This configuration leaves the host running while disconnecting it from the network. It is worth bearing in mind that any Fibre Channel or InfiniBand connections could remain intact, even after the Ethernet connection has been terminated, which means that any data made available on these connections could still be at risk. As a result, it is best to configure this fencing method as a fallback fencing mechanism. To configure IF-MIB fencing:
– Configure your switch for SNMP v2c, at minimum, and make sure that SNMP SET messages are enabled. For example, on an Oracle Switch, by using the ILOM CLI, you could run the following commands:
set /SP/services/snmp/ sets=enabled
set /SP/services/snmp/ v2c=enabled
– On one of the nodes in your cluster, configure the fence_ifmib fencing agent for each node in your environment, for example:
sudo pcs stonith create ifmib_n1_fencing fence_ifmib pcmk_host_list=node1 ipaddr=192.168.5.2 community=private port=1 delay=5 op monitor interval=60s
sudo pcs stonith create ifmib_n2_fencing fence_ifmib pcmk_host_list=node2 ipaddr=192.168.5.3 community=private port=2 op monitor interval=60s
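Before depending on fence_ifmib, it can be worth verifying that the switch answers SNMP requests for the configured community and interface. The following is a sketch that assumes the net-snmp-utils package is installed and reuses the address, community, and port index from the first example above; 1.3.6.1.2.1.2.2.1.8.1 is the numeric OID for IF-MIB::ifOperStatus on interface index 1:
snmpget -v2c -c private 192.168.5.2 1.3.6.1.2.1.2.2.1.8.1
A returned value of 1 means the port is up. The fencing agent takes a node offline by setting the corresponding ifAdminStatus object over SNMP, which is why SNMP SET messages must be enabled on the switch.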
If you have configured multiple fencing agents, you may want to set different fencing levels. Fencing levels enable you to prioritize different approaches to fencing and can provide a valuable mechanism for fallback options should your default fencing mechanism fail.
Each fencing level is attempted in ascending order, starting from level 1. If the fencing agent that is configured for a particular level fails, the fencing agent from the next level is then attempted, and so on. For example, to use the IPMI LAN agents as the primary fencing mechanism and the IF-MIB agents as the fallback, run the following commands on one of the nodes:
sudo pcs stonith level add 1 node1 ipmilan_n1_fencing
sudo pcs stonith level add 1 node2 ipmilan_n2_fencing
sudo pcs stonith level add 2 node1 ifmib_n1_fencing
sudo pcs stonith level add 2 node2 ifmib_n2_fencing
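After adding the levels, you can list them to confirm the ordering. This is a quick check and assumes a pcs release where the level subcommand lists all configured levels when called without further arguments (newer releases also accept pcs stonith level config):
sudo pcs stonith level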
In the next blog post, we will discuss quorum in Oracle Linux Clustering.