The information in the following sections can help you set up a cluster hardware configuration. In some cases, the information is vendor specific.
If an RPS-10 Series power switch is used as a part of a cluster, be sure of the following:
Set the rotary address on both power switches to 0. Be sure that the switch is positioned correctly and is not between settings.
Toggle the four Setup switches on both power switches, as follows:
Ensure that the serial port device special file (for example, /dev/ttyS1) that is specified in the /etc/cluster.conf file corresponds to the serial port to which the power switch's serial cable is connected.
Connect the power cable for each cluster system to its own power switch.
Use null modem cables to connect each cluster system to the serial port on the power switch that provides power to the other cluster system.
Figure A-1 shows an example of an RPS-10 Series power switch configuration.
See the RPS-10 documentation supplied by the vendor for additional installation information. Note that the information provided in this document supersedes the vendor information.
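As a quick sanity check of the serial cabling (assuming, for illustration, that /dev/ttyS1 is the configured port), verify that the device special file exists and that it matches the port named in /etc/cluster.conf:

# ls -l /dev/ttyS1
# grep ttyS /etc/cluster.conf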
The WTI NPS-115 and NPS-230 power switches are network-attached devices. Essentially, each is a power strip with network connectivity that enables power cycling of individual outlets. Only one NPS switch is needed within the cluster (unlike the RPS-10 model, which requires a separate switch per cluster member).
Because the cluster software has no independent means of verifying that each cluster member is plugged into the appropriate outlet on the back of the NPS power switch, take care to ensure a correct setup. An incorrect setup can cause the cluster software to incorrectly conclude that a successful power cycle has occurred.
When setting up the NPS switch, follow these configuration guidelines.
When configuring the power switch itself:
Assign a System Password (under the General Parameters menu). Note: this password is stored in clear text in the cluster configuration file, so choose a password that differs from the system's password (even though the file permissions on /etc/cluster.conf allow it to be read only by root).
Do not assign a password under the Plug Parameters.
Assign system names to the plugs under the Plug Parameters menu (for example, clu1 to plug 1 and clu2 to plug 2, assuming these are the cluster member names).
When running cluconfig to specify power switch parameters:
Specify a switch type of WTI_NPS.
Specify the password you assigned to the NPS switch (refer to Step 1 in the prior section).
When prompted for the plug/port number, specify the same name assigned in Step 3 of the prior section.
The NPS power switch has been observed to become unresponsive when placed on networks with a high volume of broadcast or multicast packets. In such cases, it may be necessary to isolate the power switch on a private subnet.
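Before running cluconfig, it can also be helpful to confirm that each cluster member can reach the NPS switch over the network. In the following sketch, 10.0.0.50 is a hypothetical address assigned to the switch:

# ping -c 3 10.0.0.50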
The NPS-115 power switch has a very useful feature that can accommodate power cycling of cluster members with dual power supplies. The NPS-115 consists of two banks of power outlets, each of which is independently powered and has four plugs. Each of the NPS-115's power cords (one per bank) is plugged into a separate power source (presumably a separate UPS). For cluster members with dual power supplies, plug one of their power cords into an outlet in each bank. Then, when configuring the NPS-115 and assigning ports, simply assign the same name to the outlet in each bank into which the corresponding cluster member is plugged. For example, suppose the cluster members are clu3 and clu4, where clu3 is plugged into outlets 1 and 5, and clu4 is plugged into outlets 2 and 6:
Plug | Name           | Status  | Boot Delay | Password         | Default |
-----+----------------+---------+------------+------------------+---------+
 1   | clu3           | ON      | 5 sec      | (undefined)      | ON      |
 2   | clu4           | ON      | 5 sec      | (undefined)      | ON      |
 3   | (undefined)    | ON      | 5 sec      | (undefined)      | ON      |
 4   | (undefined)    | ON      | 5 sec      | (undefined)      | ON      |
 5   | clu3           | ON      | 5 sec      | (undefined)      | ON      |
 6   | clu4           | ON      | 5 sec      | (undefined)      | ON      |
 7   | (undefined)    | ON      | 5 sec      | (undefined)      | ON      |
 8   | (undefined)    | ON      | 5 sec      | (undefined)      | ON      |
-----+----------------+---------+------------+------------------+---------+
Because the same name is assigned to multiple outlets, a power cycle command cycles all outlets bearing that name. In this manner, a cluster member with dual power supplies can be successfully power cycled. Under this dual configuration, the parameters specified to cluconfig are the same as for the single configuration described above.
The following information pertains to the RPC-3 and RPC-5 power switches.
The Baytech power switch is a network-attached device. Essentially, it is a power strip with network connectivity that enables power cycling of individual outlets. Only one Baytech switch is needed within the cluster (unlike the RPS-10 model, which requires a separate switch per cluster member).
Because the cluster software has no independent means of verifying that each cluster member is plugged into the appropriate outlet on the back of the Baytech power switch, take care to ensure a correct setup. An incorrect setup can cause the cluster software to incorrectly conclude that a successful power cycle has occurred.
As shipped from the manufacturer, all of the outlets of a Baytech switch are set to off. To power on the outlets into which the cluster members are plugged, use the Baytech's configuration menus: start from the main menu, then select Outlet Control. From there, individual outlets can be turned on, for example on 1, on 2, and so on.
When setting up the Baytech switch, follow these configuration guidelines.
When configuring the Baytech power switch itself:
Using a serial connection, assign the IP address related parameters.
Under the Access => Network access menu, ensure that both Prompt for user name and Prompt for password are enabled.
Assign a user name and password under the Manage Users menu, or use the default "admin" account with an assigned password. Note: this password is stored in clear text in the cluster configuration file, so choose a password that differs from the system's password (even though the file permissions on /etc/cluster.conf allow it to be read only by root).
To assign the system names to the corresponding outlets, go to the Configuration menu, followed by the Outlets menu, and finally Name Outlets (for example, clu1 to outlet 1, clu2 to outlet 2 — assuming these are the cluster member names).
When running cluconfig to specify power switch parameters:
Specify a switch type of BAYTECH.
Specify the user name and password assigned to the Baytech switch (refer to Step 3 in the prior section).
When prompted for the plug/port number, specify the same name assigned in Step 4 of the prior section.
The following example screen output from configuring the Baytech switch shows that the outlets have been named according to the example cluster names clu1 and clu2.
Outlet Operation Configuration Menu
Enter request, CR to exit.

1)...Outlet Status Display: enabled
2)...Command Confirmation : enabled
3)...Current Alarm Level (amps): 4.1
4)...Name Outlets
5)...Outlet Power-up Delay
6)...Display Outlet Users

Enter request>4

Enter number of outlet to name, CR to exit.
1)...clu1
2)...clu2
3)...Outlet 3
4)...Outlet 4
5)...Outlet 5
6)...Outlet 6
7)...Outlet 7
8)...Outlet 8
A description of the usage model for watchdog timers as a cluster data integrity provision appears in the Section called Choosing the Type of Power Controller in Chapter 2. As described in that section, there are two variants of watchdog timers: hardware-based and software-based.
The following details the configuration tasks required to set up watchdog timer usage in a cluster hardware configuration.
Regardless of which type of watchdog timer is employed, it is necessary to create the device special file appropriate for the watchdog timer. This can be accomplished as follows:
# cd /dev
# ./MAKEDEV watchdog
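After running MAKEDEV, the device special file should exist as a character device with major number 10 and minor number 130; a quick check:

# ls -l /dev/watchdog      # should show a character device, major 10, minor 130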
When running the cluconfig utility, where it prompts for the power switch type, specify a type of SW_WATCHDOG, regardless of the specific type of watchdog timer in use.
Any cluster system can utilize the software watchdog timer as a data integrity provision, as no dedicated hardware components are required. If you have specified a power switch type of SW_WATCHDOG while using the cluconfig utility, the cluster software will automatically load the corresponding loadable kernel module called softdog.
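To confirm that the module has been loaded once the cluster software is running, check the kernel's module list:

# lsmod | grep softdog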
If the cluster is configured to utilize the software watchdog timer, the cluster quorum daemon (cluquorumd) will periodically reset the timer. Should cluquorumd fail to reset the timer, the failed cluster member will reboot itself.
When using the software watchdog timer, there is a small risk that the system will hang in such a way that the software watchdog thread is never executed. In this unlikely scenario, the other cluster member may take over the services of the apparently hung cluster member. Generally, this is a safe operation; but in the unlikely event that the hung cluster member resumes, data corruption could occur. To further lessen the chance of this vulnerability when using the software watchdog timer, administrators should also configure the NMI watchdog timer.
If you are using the software watchdog timer as a data integrity provision, it is also recommended to enable the Non-Maskable Interrupt (NMI) watchdog timer to enhance the data integrity guarantees. The NMI watchdog timer is a different mechanism for causing the system to reboot in the event of a hang scenario where interrupts are blocked. This NMI watchdog can be used in conjunction with the software watchdog timer.
Unlike the software watchdog timer which is reset by the cluster quorum daemon (cluquorumd), the NMI watchdog timer counts system interrupts. Normally, a healthy system will receive hundreds of device and timer interrupts per second. If there are no interrupts in a 5 second interval, a system hang has occurred and the NMI watchdog timer will expire, initiating a system reboot.
A robust data integrity solution can be implemented by combining the health monitoring of the cluster quorum daemon and the software watchdog timer with the low-level system status checks of the NMI watchdog.
Correct operation of the NMI watchdog timer mechanism requires that the cluster members contain an APIC chip on the main system board. The majority of contemporary systems do include the APIC component. Generally, Intel-based SMP systems and Intel-based uniprocessor systems with SMP system boards (2+ CPU slots/sockets, but only one CPU) are known to support the NMI watchdog.
There may be other server types that support NMI watchdog timers aside from ones with Intel-based SMP system boards. Unfortunately, there is no simple way to test for this functionality other than simple trial and error.
The NMI watchdog is enabled on supported systems by adding nmi_watchdog=1 to the kernel's command line. Here is an example /etc/grub.conf:
#
# grub.conf
#
default=0
timeout=10
splashimage=(hd0,0)/grub/splash.xpm.gz
title HA Test Kernel (2.4.9-10smp)
        root (hd0,0)
        # This is the kernel's command line.
        kernel /vmlinuz-2.4.9-10smp ro root=/dev/hda2 nmi_watchdog=1
# end of grub.conf
On systems using lilo instead of grub, add nmi_watchdog=1 to the "append" section in /etc/lilo.conf. For example:
#
# lilo.conf
#
prompt
timeout=50
default=linux
boot=/dev/hda
map=/boot/map
install=/boot/boot.b
lba32
image=/boot/vmlinuz-2.4.9-10smp
        label=linux
        read-only
        root=/dev/hda2
        append="nmi_watchdog=1"
# end of lilo.conf
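Note that, unlike GRUB, LILO reads its configuration only when the map installer is run. After editing /etc/lilo.conf, run lilo and then reboot for the new kernel command line to take effect:

# /sbin/lilo
# reboot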
In order to determine whether the server supports the NMI watchdog timer, first try adding "nmi_watchdog=1" to the kernel command line as described above. After the system has booted, log in as root and type:

cat /proc/interrupts
The output should appear similar to the following:
           CPU0
  0:    5623100          XT-PIC  timer
  1:         13          XT-PIC  keyboard
  2:          0          XT-PIC  cascade
  7:          0          XT-PIC  usb-ohci
  8:          1          XT-PIC  rtc
  9:     794332          XT-PIC  aic7xxx, aic7xxx
 10:     569498          XT-PIC  eth0
 12:         24          XT-PIC  PS/2 Mouse
 14:          0          XT-PIC  ide0
NMI:    5620998
LOC:    5623358
ERR:          0
MIS:          0
The relevant portion of the above output is the NMI line. If the NMI count (the value next to the NMI label) is non-zero, the server supports the NMI watchdog.
If this approach fails (that is, if NMI is zero), try passing nmi_watchdog=2 to the kernel instead of nmi_watchdog=1, in the manner described previously. Again, check /proc/interrupts after the system boots. If NMI is non-zero, the NMI watchdog has been configured properly. If NMI is zero, your system does not support the NMI watchdog timer.
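As an additional check that the NMI watchdog is actually ticking after boot, sample the NMI count twice a few seconds apart; on a system where the NMI watchdog is working, the count should increase between samples:

# grep NMI /proc/interrupts ; sleep 10 ; grep NMI /proc/interrupts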
The kernel provides driver support for various types of hardware watchdog timers. Some of these timers are implemented directly on the system board, whereas others are separate hardware components such as PCI cards. Hardware-based watchdog timers provide excellent data integrity provisions in the cluster because they operate independently of the system processor and are therefore able to reboot a system even in the event of a system hang.
Due to a lack of uniformity among low-level hardware watchdog components, it is difficult to make generalizations describing how to know if a particular system contains such components. Many low-level hardware watchdog components are not self-identifying.
The kernel provides support for the hardware watchdog variants listed in Table A-2:
Table A-2. Hardware Watchdog Timers
Hardware Watchdog Timer        | Kernel Module
-------------------------------+---------------
Intel-810 based TCO WDT        | i810-tco
Eurotech CPU-1220/1410 WDT     | eurotech
60xx SBC WDT                   | sbc60xxwdt
Industrial Computer WDT500     | wdt
Industrial Computer WDT501     | wdt
Industrial Computer WDT500PCI  | wdt_pci
Industrial Computer WDT501PCI  | wdt_pci
In order to configure any of the above watchdog timers into the kernel, it is necessary to place a corresponding entry into the /etc/modules.conf file. For example, if an Intel-810 based TCO WDT is to be used, the following line should be added to /etc/modules.conf:
alias watchdog i810-tco
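With the alias in place, the driver named by the alias can be loaded manually to confirm that it initializes (here i810-tco, per the example above):

# modprobe watchdog        # loads the module named by the watchdog alias
# lsmod                    # the watchdog driver should appear in the module list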
The cluster software includes support for a range of power switch types. This power switch module support originated with developers at Mission Critical Linux, Inc. and with the open source Linux-HA project. Time and resource constraints did not allow for comprehensive testing of the complete range of switch types. As such, the associated power switch STONITH modules are considered latent features. Examples of these other power switch modules include:
APC Master Switch: http://www.apc.com
It has been observed that the Master Switch may become unresponsive when placed on networks which have high occurrences of broadcast or multi-cast packets. In these cases, isolate the power switch to a private subnet.
APC Serial On/Off Switch (part AP9211): http://www.apc.com
This switch type does not provide a means for the cluster to query its status. Therefore the cluster always assumes it is connected and operational.
It is possible to configure a cluster that does not include any power switch functionality. As described in the Section called Choosing the Type of Power Controller in Chapter 2, configuring a cluster without any power switch provisions is not recommended due to data corruption vulnerabilities under certain failover scenarios.
In order to set up a cluster that does not include any power switch provisions, simply select the type NONE when prompted for the power switch type by the cluconfig utility.
Usage of power switch type NONE is discouraged because it does not protect data integrity in the event of system hang. If your cluster configuration does not include hardware power switches, then the software watchdog type is recommended.