Chapter 2. Hardware Installation and Operating System Configuration

To set up the hardware configuration and install the Linux distribution, follow these steps:

After setting up the hardware configuration and installing the Linux distribution, you can install the cluster software.

Choosing a Hardware Configuration

The Red Hat Cluster Manager allows administrators to use commodity hardware to set up a cluster configuration that will meet the performance, availability, and data integrity needs of applications and users. Cluster hardware ranges from low-cost minimum configurations that include only the components required for cluster operation, to high-end configurations that include redundant heartbeat channels, hardware RAID, and power switches.

Regardless of configuration, the use of high-quality hardware in a cluster is recommended, as hardware malfunction is the primary cause of system downtime.

Although all cluster configurations provide availability, only some configurations protect against every single point of failure. Similarly, all cluster configurations provide data integrity, but only some configurations protect data under every failure condition. Therefore, administrators must fully understand the needs of their computing environment, as well as the availability and data integrity features of different hardware configurations, in order to choose the cluster hardware that will meet their requirements.

When choosing a cluster hardware configuration, consider the following:

Performance requirements of applications and users

Choose a hardware configuration that will provide adequate memory, CPU, and I/O resources. Be sure that the configuration chosen can also handle any future increases in workload.

Cost restrictions

The hardware configuration chosen must meet budget requirements. For example, systems with multiple I/O ports usually cost more than low-end systems with fewer expansion capabilities.

Availability requirements

If a computing environment requires the highest degree of availability, such as a production environment, then a cluster hardware configuration that protects against all single points of failure (including disk, storage interconnect, heartbeat channel, and power failures) is recommended. Environments that can tolerate an interruption in availability, such as development environments, may not require as much protection. See the Section called Configuring Heartbeat Channels, the Section called Configuring UPS Systems, and the Section called Configuring Shared Disk Storage for more information about using redundant hardware for high availability.

Data integrity under all failure conditions requirement

Using power switches in a cluster configuration guarantees that service data is protected under every failure condition. These devices enable a cluster system to power cycle the other cluster system before restarting its services during failover. Power switches protect against data corruption if an unresponsive (or hanging) system becomes responsive after its services have failed over, and then issues I/O to a disk that is also receiving I/O from the other cluster system.

In addition, if a quorum daemon fails on a cluster system, the system is no longer able to monitor the quorum partitions. If you are not using power switches in the cluster, this error condition may result in services being run on more than one cluster system, which can cause data corruption. See the Section called Configuring Power Switches for more information about the benefits of using power switches in a cluster. It is recommended that production environments use power switches or watchdog timers in the cluster configuration.

Shared Storage Requirements

The operation of the cluster depends on reliable, coordinated access to shared storage. In the event of hardware failure, it is desirable to be able to disconnect one member from the shared storage for repair without disrupting the other member. Shared storage is truly vital to the cluster configuration.

Testing has shown that it is difficult, if not impossible, to configure reliable multi-initiator parallel SCSI configurations at data rates above 80 MBytes/sec using standard SCSI adapters. Further tests have shown that these configurations cannot support online repair because the bus does not work reliably when the HBA terminators are disabled and external terminators are used. For these reasons, multi-initiator SCSI configurations using standard adapters are not supported. Either single-initiator parallel SCSI buses connected to multi-ported storage devices, or Fibre Channel, are required.

The Red Hat Cluster Manager requires that both cluster members have simultaneous access to the shared storage. Certain host RAID adapters are capable of providing this type of access to shared RAID units. These products require extensive testing to ensure reliable operation, especially if the shared RAID units are based on parallel SCSI buses. These products typically do not allow for online repair of a failed system. No host RAID adapters are currently certified with Red Hat Cluster Manager. Refer to the Red Hat web site at http://www.redhat.com for the most up-to-date supported hardware matrix.

The use of software RAID, or software Logical Volume Management (LVM), is not supported on shared storage. This is because these products do not coordinate access from multiple hosts to shared storage. Software RAID or LVM may be used on non-shared storage on cluster members (for example, boot and system partitions and other file systems which are not associated with any cluster services).

Minimum Hardware Requirements

A minimum hardware configuration includes only the hardware components that are required for cluster operation, as follows:

  • Two servers to run cluster services

  • Ethernet connection for a heartbeat channel and client network access

  • Shared disk storage for the cluster quorum partitions and service data

See the Section called Example of a Minimum Cluster Configuration for an example of this type of hardware configuration.

The minimum hardware configuration is the most cost-effective cluster configuration; however, it includes multiple points of failure. For example, if the RAID controller fails, then all cluster services will be unavailable. When deploying the minimal hardware configuration, software watchdog timers should be configured as a data integrity provision.

To improve availability, protect against component failure, and guarantee data integrity under all failure conditions, the minimum configuration can be expanded. Table 2-1 shows how to improve availability and guarantee data integrity:

Table 2-1. Improving Availability and Guaranteeing Data Integrity

Problem | Solution
Disk failure | Hardware RAID to replicate data across multiple disks
RAID controller failure | Dual RAID controllers to provide redundant access to disk data
Heartbeat channel failure | Point-to-point Ethernet or serial connection between the cluster systems
Power source failure | Redundant uninterruptible power supply (UPS) systems
Data corruption under all failure conditions | Power switches or hardware-based watchdog timers
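
As a rough illustration of how Table 2-1 can be applied during planning, the following Python sketch lists the problems that a planned configuration leaves unaddressed. The component labels (such as hardware_raid and redundant_ups) and the function name are hypothetical, chosen only for this example; they are not Red Hat Cluster Manager settings.

```python
# Mitigations from Table 2-1, keyed by the problem they address.
# All identifiers are illustrative labels, not Cluster Manager settings.
MITIGATIONS = {
    "disk failure": {"hardware_raid"},
    "RAID controller failure": {"dual_raid_controllers"},
    "heartbeat channel failure": {"ptp_ethernet_heartbeat", "serial_heartbeat"},
    "power source failure": {"redundant_ups"},
    "data corruption under all failure conditions": {
        "power_switches",
        "hardware_watchdog",
    },
}


def unprotected_problems(components):
    """Return the Table 2-1 problems that none of the given components address."""
    installed = set(components)
    return [
        problem
        for problem, fixes in MITIGATIONS.items()
        if not (fixes & installed)
    ]


# A configuration with hardware RAID only still leaves four problems open.
print(unprotected_problems({"hardware_raid"}))
```

Any one of the listed solutions is treated as sufficient for its problem, which mirrors the "or" wording in the table.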

A no-single-point-of-failure hardware configuration that guarantees data integrity under all failure conditions can include the following components:

  • Two servers to run cluster services

  • Ethernet connection between each system for a heartbeat channel and client network access

  • Dual-controller RAID array to replicate quorum partitions and service data

  • Two power switches to enable each cluster system to power-cycle the other system during the failover process

  • Point-to-point Ethernet connection between the cluster systems for a redundant Ethernet heartbeat channel

  • Point-to-point serial connection between the cluster systems for a serial heartbeat channel

  • Two UPS systems for a highly-available source of power

See the Section called Example of a No-Single-Point-Of-Failure Configuration for an example of this type of hardware configuration.

Cluster hardware configurations can also include other optional hardware components that are common in a computing environment. For example, a cluster can include a network switch or network hub, which enables the connection of the cluster systems to a network. A cluster may also include a console switch, which facilitates the management of multiple systems and eliminates the need for separate monitors, mice, and keyboards for each cluster system.

One type of console switch is a terminal server, which enables connection to serial consoles and management of many systems from one remote location. As a low-cost alternative, you can use a KVM (keyboard, video, and mouse) switch, which enables multiple systems to share one keyboard, monitor, and mouse. A KVM is suitable for configurations in which access to a graphical user interface (GUI) to perform system management tasks is preferred.

When choosing a cluster system, be sure that it provides the PCI slots, network slots, and serial ports that the hardware configuration requires. For example, a no-single-point-of-failure configuration requires multiple serial and Ethernet ports. Ideally, choose cluster systems that have at least two serial ports. See the Section called Installing the Basic System Hardware for more information.
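
The port count for a planned configuration can be estimated before purchasing systems. The following Python sketch is one way to do that arithmetic, based on the per-connection requirements listed in the hardware tables later in this chapter. The class and field names are hypothetical and exist only for this example.

```python
from dataclasses import dataclass


@dataclass
class PortPlan:
    """Per-system options; field names are hypothetical, chosen for this sketch."""
    ptp_ethernet_heartbeats: int = 0   # point-to-point Ethernet heartbeat channels
    serial_heartbeats: int = 0         # point-to-point serial heartbeat channels
    serial_power_switch: bool = False  # serial-attached power switch in use
    terminal_server: bool = False      # serial console connected to a terminal server


def ports_per_system(plan):
    """Return (network_interfaces, serial_ports) needed on each cluster system."""
    # One interface for client access (which also carries a heartbeat),
    # plus one per point-to-point Ethernet heartbeat channel.
    network = 1 + plan.ptp_ethernet_heartbeats
    # One serial port per serial heartbeat channel, plus one each for a
    # serial power switch and a terminal server connection, if used.
    serial = plan.serial_heartbeats
    serial += 1 if plan.serial_power_switch else 0
    serial += 1 if plan.terminal_server else 0
    return network, serial


# The sample no-single-point-of-failure configuration described in this chapter.
print(ports_per_system(PortPlan(1, 1, True, True)))
```

For the sample no-single-point-of-failure configuration, this yields two network interfaces and three serial ports per system, which matches the component list for that example.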

Choosing the Type of Power Controller

The Red Hat Cluster Manager implementation consists of a generic power management layer and a set of device-specific modules which accommodate a range of power management types. When selecting the appropriate type of power controller to deploy in the cluster, it is important to recognize the implications of specific device types. The following paragraphs describe the types of supported power switches and are followed by a summary table. For a more detailed description of the role a power switch plays in ensuring data integrity, refer to the Section called Configuring Power Switches.

Serial-attached and network-attached power switches are separate devices which enable one cluster member to power cycle another member. They resemble a power plug strip on which individual outlets can be turned on and off under software control through either a serial or network cable.

Watchdog timers provide a means for a failed system to remove itself from the cluster before another system takes over its services, rather than allowing one cluster member to power cycle another. In normal operation, the cluster software must periodically reset the timer before it expires. If the cluster software fails to reset the timer, the watchdog triggers under the assumption that the system may have hung or otherwise failed. The healthy cluster member allows a window of time to pass before concluding that another cluster member has failed (by default, this window is 12 seconds). The watchdog timer interval must be less than the time it takes one cluster member to conclude that another has failed. In this manner, a healthy system can assume that, before it takes over services for a failed cluster member, the failed member has safely removed itself from the cluster (by rebooting) and therefore poses no risk to data integrity. The underlying watchdog support is included in the core Linux kernel. Red Hat Cluster Manager utilizes these watchdog features via its standard APIs and configuration mechanism.
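
The timing relationship described above can be sketched in a few lines of Python. This is a toy model only: the function and class names are hypothetical, time is passed in explicitly as seconds, and nothing here reflects the actual kernel watchdog interface or the Cluster Manager configuration. The only value taken from the text is the default 12-second failure detection window.

```python
FAILOVER_WINDOW_S = 12.0  # default window before a member is declared failed


def watchdog_interval_is_safe(interval_s, failover_window_s=FAILOVER_WINDOW_S):
    """True if a hung member would reboot itself before its peer takes over."""
    return 0 < interval_s < failover_window_s


class SimulatedWatchdog:
    """Toy model of a watchdog timer driven by explicit timestamps (seconds)."""

    def __init__(self, interval_s, now=0.0):
        self.interval_s = interval_s
        self.deadline = now + interval_s

    def reset(self, now):
        """Called periodically by healthy cluster software to postpone the reboot."""
        self.deadline = now + self.interval_s

    def would_reboot(self, now):
        """True once the timer has expired without being reset."""
        return now >= self.deadline


wd = SimulatedWatchdog(interval_s=10.0)
wd.reset(now=5.0)                  # healthy software resets the timer at t=5
print(wd.would_reboot(now=14.0))   # still inside the interval
print(wd.would_reboot(now=15.0))   # no reset since t=5, so the timer has expired
```

With a 10-second interval, a member that stops resetting its timer reboots within 10 seconds, inside the 12-second window that the healthy member waits before taking over services.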

There are two types of watchdog timers: hardware-based and software-based. Hardware-based watchdog timers typically consist of system board components such as the Intel® i810 TCO chipset. This circuitry has a high degree of independence from the main system CPU. This independence is beneficial in the case of a true system hang, because the circuitry will pull down the system's reset lead, resulting in a system reboot. Some PCI expansion cards also provide watchdog features.

The second type of watchdog timer is software-based. This category of watchdog does not have any dedicated hardware. The implementation is a kernel thread that runs periodically and initiates a system reboot if the timer duration has expired. The vulnerability of the software watchdog timer is that under certain failure scenarios, such as system hangs while interrupts are blocked, the kernel thread will not be called. As a result, in such conditions it cannot be definitively depended on for data integrity. The healthy cluster member may then take over services for a hung node, which could cause data corruption under certain scenarios.

Finally, administrators can choose not to employ a power controller at all. If choosing the "None" type, note that there are no provisions for a cluster member to power cycle a failed member. Similarly, the failed member cannot be guaranteed to reboot itself under all failure conditions. Deploying clusters with a power controller type of "None" is useful for simple evaluation purposes, but because it affords the weakest data integrity provisions, it is not recommended for use in a production environment.

Ultimately, the right type of power controller deployed in a cluster environment depends on the data integrity requirements weighed against the cost and availability of external power switches.

Table 2-2 summarizes the types of supported power management modules and discusses their advantages and disadvantages individually.

Table 2-2. Power Switches

Type: Serial-attached power switches
Notes: Two serial-attached power controllers are used in a cluster (one per member system).
Pros: Affords strong data integrity guarantees. The power controller itself is not a single point of failure, as there are two in a cluster.
Cons: Requires purchase of power controller hardware and cables; consumes serial ports.

Type: Network-attached power switches
Notes: A single network-attached power controller is required per cluster.
Pros: Affords strong data integrity guarantees.
Cons: Requires purchase of power controller hardware. The power controller itself can become a single point of failure (although these are typically very reliable devices).

Type: Hardware watchdog timer
Notes: Affords strong data integrity guarantees.
Pros: Obviates the need to purchase external power controller hardware.
Cons: Not all systems include supported watchdog hardware.

Type: Software watchdog timer
Notes: Offers acceptable data integrity provisions.
Pros: Obviates the need to purchase external power controller hardware; works on any system.
Cons: Under some failure scenarios, the software watchdog will not be operational, opening a small vulnerability window.

Type: No power controller
Notes: No power controller function is in use.
Pros: Obviates the need to purchase external power controller hardware; works on any system.
Cons: Vulnerable to data corruption under certain failure scenarios.
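
The trade-offs in Table 2-2 can also be expressed as a small lookup, which may help when comparing options against site constraints. The following Python sketch is illustrative only: the dictionary keys, field names, and function are hypothetical, and the "strong", "acceptable", and "weakest" labels simply paraphrase the table and the preceding text.

```python
# Summary of Table 2-2. Keys and field names are illustrative labels only.
POWER_CONTROLLERS = {
    "serial power switch": {"integrity": "strong", "external_hardware": True},
    "network power switch": {"integrity": "strong", "external_hardware": True},
    "hardware watchdog": {"integrity": "strong", "external_hardware": False},
    "software watchdog": {"integrity": "acceptable", "external_hardware": False},
    "none": {"integrity": "weakest", "external_hardware": False},
}


def candidate_controllers(require_strong_integrity, can_buy_hardware,
                          has_watchdog_hardware):
    """Return the controller types from Table 2-2 that satisfy the constraints."""
    result = []
    for name, traits in POWER_CONTROLLERS.items():
        if require_strong_integrity and traits["integrity"] != "strong":
            continue
        if traits["external_hardware"] and not can_buy_hardware:
            continue
        # A hardware watchdog is only an option on systems that include one.
        if name == "hardware watchdog" and not has_watchdog_hardware:
            continue
        result.append(name)
    return result


# Strong integrity required, no budget for switches, watchdog chipset present.
print(candidate_controllers(True, False, True))
```

An empty result indicates that the stated constraints cannot all be met, for example when strong data integrity is required but neither external switches nor watchdog hardware are available.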

Cluster Hardware Tables

Use the following tables to identify the hardware components required for your cluster configuration. In some cases, the tables list specific products that have been tested in a cluster, although a cluster is expected to work with other products.

The complete set of qualified cluster hardware components changes over time. Consequently, the tables below may be incomplete. For the most up-to-date itemization of supported hardware components, refer to the Red Hat documentation website at http://www.redhat.com/docs.

Table 2-3. Cluster System Hardware Table

Hardware: Cluster system
Quantity: Two
Description: Red Hat Cluster Manager supports IA-32 hardware platforms. Each cluster system must provide enough PCI slots, network slots, and serial ports for the cluster hardware configuration. Because disk devices must have the same name on each cluster system, it is recommended that the systems have symmetric I/O subsystems. In addition, it is recommended that each system have a minimum of 450 MHz CPU speed and 256 MB of memory. See the Section called Installing the Basic System Hardware for more information.
Required: Yes
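
The recommended minimums in Table 2-3 are simple to check for a candidate system. The following Python sketch shows one possible check; the function name and the way the values are supplied are assumptions made for this example, and only the 450 MHz and 256 MB figures come from the table.

```python
MIN_CPU_MHZ = 450     # recommended minimum CPU speed from Table 2-3
MIN_MEMORY_MB = 256   # recommended minimum memory from Table 2-3


def shortfalls(cpu_mhz, memory_mb):
    """Return a list describing where a system falls below the recommendations."""
    problems = []
    if cpu_mhz < MIN_CPU_MHZ:
        problems.append(f"CPU speed {cpu_mhz} MHz is below {MIN_CPU_MHZ} MHz")
    if memory_mb < MIN_MEMORY_MB:
        problems.append(f"memory {memory_mb} MB is below {MIN_MEMORY_MB} MB")
    return problems


# Example: a 400 MHz system with 512 MB of memory.
print(shortfalls(400, 512))
```

An empty list means the system meets both recommendations. The check says nothing about PCI slots, ports, or I/O symmetry, which still need to be reviewed separately.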

Table 2-4 includes several different types of power switches. A single cluster requires only one of the types of power switch shown below.

Table 2-4. Power Switch Hardware Table

Hardware: Serial power switches
Quantity: Two
Description: Power switches enable each cluster system to power-cycle the other cluster system. See the Section called Configuring Power Switches for information about using power switches in a cluster. Note that clusters are configured with either serial-attached or network-attached power switches, not both.
  The following serial-attached power switch has been fully tested:
  · RPS-10 (model M/HD in the US, and model M/EC in Europe), which is available from http://www.wti.com/rps-10.htm. Refer to the Section called Setting up RPS-10 Power Switches in Appendix A.
  Latent support is provided for the following serial-attached power switch. This switch has not yet been fully tested:
  · APC Serial On/Off Switch (part AP9211), http://www.apc.com
Required: Strongly recommended for data integrity under all failure conditions

Hardware: Null modem cable
Quantity: Two
Description: Null modem cables connect a serial port on a cluster system to a serial power switch. This serial connection enables each cluster system to power-cycle the other system. Some power switches may require different cables.
Required: Only if using serial power switches

Hardware: Mounting bracket
Quantity: One
Description: Some power switches support rack mount configurations and require a separate mounting bracket (e.g. RPS-10).
Required: Only for rack mounting power switches

Hardware: Network power switch
Quantity: One
Description: Network-attached power switches enable each cluster member to power cycle all others. Refer to the Section called Configuring Power Switches for information about using network-attached power switches, as well as caveats associated with each.
  The following network-attached power switches have been fully tested:
  · WTI NPS-115, or NPS-230, available from http://www.wti.com. Note that the NPS power switch can properly accommodate systems with dual redundant power supplies. Refer to the Section called Setting up WTI NPS Power Switches in Appendix A.
  · Baytech RPC-3 and RPC-5, http://www.baytech.net
  Latent support is provided for the APC Master Switch (AP9211, or AP9212), www.apc.com
Required: Strongly recommended for data integrity under all failure conditions

Hardware: Watchdog timer
Quantity: Two
Description: Watchdog timers cause a failed cluster member to remove itself from a cluster prior to a healthy member taking over its services. Refer to the Section called Configuring Power Switches for more information.
Required: Recommended for data integrity on systems which provide integrated watchdog hardware

The following table shows a variety of storage devices for an administrator to choose from. An individual cluster does not require all of the components listed below.

Table 2-5. Shared Disk Storage Hardware Table

Hardware: External disk storage enclosure
Quantity: One
Description: Use Fibre Channel or single-initiator parallel SCSI to connect the cluster systems to a single or dual-controller RAID array. To use single-initiator buses, a RAID controller must have multiple host ports and provide simultaneous access to all the logical units on the host ports. To use a dual-controller RAID array, a logical unit must fail over from one controller to the other in a way that is transparent to the operating system.
  The following are recommended SCSI RAID arrays that provide simultaneous access to all the logical units on the host ports (this is not a comprehensive list; rather, it is limited to those RAID boxes which have been tested):
  · Winchester Systems FlashDisk RAID Disk Array, which is available from http://www.winsys.com.
  · Dot Hill's SANnet Storage Systems, which are available from http://www.dothill.com
  · Silicon Image CRD-7040 & CRA-7040, CRD-7220, CRD-7240 & CRA-7240, CRD-7400 & CRA-7400 controller based RAID arrays. Available from http://www.synetexinc.com
  In order to ensure symmetry of device IDs and LUNs, many RAID arrays with dual redundant controllers are required to be configured in an active/passive mode.
  See the Section called Configuring Shared Disk Storage for more information.
Required: Yes

Hardware: Host bus adapter
Quantity: Two
Description: To connect to shared disk storage, you must install either a parallel SCSI or a Fibre Channel host bus adapter in a PCI slot in each cluster system.
  For parallel SCSI, use a low voltage differential (LVD) host bus adapter. Adapters have either HD68 or VHDCI connectors. Recommended parallel SCSI host bus adapters include the following:
  · Adaptec 2940U2W, 29160, 29160LP, 39160, and 3950U2
  · Adaptec AIC-7896 on the Intel L440GX+ motherboard
  · Qlogic QLA1080 and QLA12160
  · Tekram Ultra2 DC-390U2W
  · LSI Logic SYM22915
  A recommended Fibre Channel host bus adapter is the Qlogic QLA2200.
  See the Section called Host Bus Adapter Features and Configuration Requirements in Appendix A for device features and configuration information.
  Host-bus adapter based RAID cards are only supported if they correctly support multi-host operation. At the time of publication, there were no fully tested host-bus adapter based RAID cards. Refer to http://www.redhat.com for the latest hardware information.
Required: Yes

Hardware: SCSI cable
Quantity: Two
Description: SCSI cables with 68 pins connect each host bus adapter to a storage enclosure port. Cables have either HD68 or VHDCI connectors. Cables vary based on adapter type.
Required: Only for parallel SCSI configurations

Hardware: SCSI terminator
Quantity: Two
Description: For a RAID storage enclosure that uses "out" ports (such as FlashDisk RAID Disk Array) and is connected to single-initiator SCSI buses, connect terminators to the "out" ports in order to terminate the buses.
Required: Only for parallel SCSI configurations and only if necessary for termination

Hardware: Fibre Channel hub or switch
Quantity: One or two
Description: A Fibre Channel hub or switch is required.
Required: Only for some Fibre Channel configurations

Hardware: Fibre Channel cable
Quantity: Two to six
Description: A Fibre Channel cable connects a host bus adapter to a storage enclosure port, a Fibre Channel hub, or a Fibre Channel switch. If a hub or switch is used, additional cables are needed to connect the hub or switch to the storage adapter ports.
Required: Only for Fibre Channel configurations

Table 2-6. Network Hardware Table

Hardware: Network interface
Quantity: One for each network connection
Description: Each network connection requires a network interface installed in a cluster system.
Required: Yes

Hardware: Network switch or hub
Quantity: One
Description: A network switch or hub allows connection of multiple systems to a network.
Required: No

Hardware: Network cable
Quantity: One for each network interface
Description: A conventional network cable, such as a cable with an RJ45 connector, connects each network interface to a network switch or a network hub.
Required: Yes

Table 2-7. Point-To-Point Ethernet Heartbeat Channel Hardware Table

Hardware: Network interface
Quantity: Two for each channel
Description: Each Ethernet heartbeat channel requires a network interface installed in both cluster systems.
Required: No

Hardware: Network crossover cable
Quantity: One for each channel
Description: A network crossover cable connects a network interface on one cluster system to a network interface on the other cluster system, creating an Ethernet heartbeat channel.
Required: Only for a redundant Ethernet heartbeat channel

Table 2-8. Point-To-Point Serial Heartbeat Channel Hardware Table

Hardware: Serial card
Quantity: Two for each serial channel
Description: Each serial heartbeat channel requires a serial port on both cluster systems. To expand your serial port capacity, you can use multi-port serial PCI cards. Recommended multi-port cards include the following:
  · Vision Systems VScom 200H PCI card, which provides two serial ports, is available from http://www.vscom.de
  · Cyclades-4YoPCI+ card, which provides four serial ports, is available from http://www.cyclades.com.
  Note that since configuration of serial heartbeat channels is optional, it is not required to invest in additional hardware specifically for this purpose. Should future support be provided for more than 2 cluster members, serial heartbeat channel support may be deprecated.
Required: No

Hardware: Null modem cable
Quantity: One for each channel
Description: A null modem cable connects a serial port on one cluster system to a corresponding serial port on the other cluster system, creating a serial heartbeat channel.
Required: Only for serial heartbeat channel

Table 2-9. Console Switch Hardware Table

Hardware: Terminal server
Quantity: One
Description: A terminal server enables you to manage many systems from one remote location.
Required: No

Hardware: KVM
Quantity: One
Description: A KVM enables multiple systems to share one keyboard, monitor, and mouse. Cables for connecting systems to the switch depend on the type of KVM.
Required: No

Table 2-10. UPS System Hardware Table

Hardware: UPS system
Quantity: One or two
Description: Uninterruptible power supply (UPS) systems protect against downtime if a power outage occurs. UPS systems are highly recommended for cluster operation. Ideally, connect the power cables for the shared storage enclosure and both power switches to redundant UPS systems. In addition, a UPS system must be able to provide voltage for an adequate period of time, and should be connected to its own power circuit.
  A recommended UPS system is the APC Smart-UPS 1400 Rackmount available from http://www.apc.com.
Required: Strongly recommended for availability

Example of a Minimum Cluster Configuration

The hardware components described in Table 2-11 can be used to set up a minimum cluster configuration. This configuration does not guarantee data integrity under all failure conditions, because it does not include power switches. Note that this is a sample configuration; it is possible to set up a minimum configuration using other hardware.

Table 2-11. Minimum Cluster Hardware Configuration Components

Hardware: Two servers
Description: Each cluster system includes the following hardware:
  · Network interface for client access and an Ethernet heartbeat channel
  · One Adaptec 29160 SCSI adapter (termination disabled) for the shared storage connection

Hardware: Two network cables with RJ45 connectors
Description: Network cables connect a network interface on each cluster system to the network for client access and Ethernet heartbeats.

Hardware: RAID storage enclosure
Description: The RAID storage enclosure contains one controller with at least two host ports.

Hardware: Two HD68 SCSI cables
Description: Each cable connects one HBA to one port on the RAID controller, creating two single-initiator SCSI buses.

Example of a No-Single-Point-Of-Failure Configuration

The components described in Table 2-12 can be used to set up a no-single-point-of-failure cluster configuration that includes two single-initiator SCSI buses and power switches to guarantee data integrity under all failure conditions. Note that this is a sample configuration; it is possible to set up a no-single-point-of-failure configuration using other hardware.

Table 2-12. No-Single-Point-Of-Failure Configuration Components

Hardware: Two servers
Description: Each cluster system includes the following hardware:
  · Two network interfaces for:
      · Point-to-point Ethernet heartbeat channel
      · Client network access and Ethernet heartbeat connection
  · Three serial ports for:
      · Point-to-point serial heartbeat channel
      · Remote power switch connection
      · Connection to the terminal server
  · One Tekram Ultra2 DC-390U2W adapter (termination enabled) for the shared disk storage connection

Hardware: One network switch
Description: A network switch enables the connection of multiple systems to a network.

Hardware: One Cyclades terminal server
Description: A terminal server allows for management of remote systems from a central location. (A terminal server is not required for cluster operation.)

Hardware: Three network cables
Description: Network cables connect the terminal server and a network interface on each cluster system to the network switch.

Hardware: Two RJ45 to DB9 crossover cables
Description: RJ45 to DB9 crossover cables connect a serial port on each cluster system to the Cyclades terminal server.

Hardware: One network crossover cable
Description: A network crossover cable connects a network interface on one cluster system to a network interface on the other system, creating a point-to-point Ethernet heartbeat channel.

Hardware: Two RPS-10 power switches
Description: Power switches enable each cluster system to power-cycle the other system before restarting its services. The power cable for each cluster system is connected to its own power switch.

Hardware: Three null modem cables
Description: Null modem cables connect a serial port on each cluster system to the power switch that provides power to the other cluster system. This connection enables each cluster system to power-cycle the other system.
  A null modem cable connects a serial port on one cluster system to a corresponding serial port on the other system, creating a point-to-point serial heartbeat channel.

Hardware: FlashDisk RAID Disk Array with dual controllers
Description: Dual RAID controllers protect against disk and controller failure. The RAID controllers provide simultaneous access to all the logical units on the host ports.

Hardware: Two HD68 SCSI cables
Description: HD68 cables connect each host bus adapter to a RAID enclosure "in" port, creating two single-initiator SCSI buses.

Hardware: Two terminators
Description: Terminators connected to each "out" port on the RAID enclosure terminate both single-initiator SCSI buses.

Hardware: Redundant UPS systems
Description: UPS systems provide a highly-available source of power. The power cables for the power switches and the RAID enclosure are connected to two UPS systems.

Figure 2-1 shows an example of a no-single-point-of-failure hardware configuration that includes the hardware described in the previous table, two single-initiator SCSI buses, and power switches to guarantee data integrity under all error conditions. A "T" enclosed in a circle represents a SCSI terminator.

Figure 2-1. No-Single-Point-Of-Failure Configuration Example