Steps for Installing and Configuring the Red Hat Linux Distribution

After setting up the basic system hardware, install Red Hat Linux on both cluster systems and ensure that they recognize the connected devices. Follow these steps:

  1. Install the Red Hat Linux distribution on both cluster systems. If customizing the kernel, be sure to follow the kernel requirements and guidelines described in the Section called Kernel Requirements.

  2. Reboot the cluster systems.

  3. When using a terminal server, configure Linux to send console messages to the console port.

  4. Edit the /etc/hosts file on each cluster system and include the IP addresses used in the cluster. See the Section called Editing the /etc/hosts File for more information about performing this task.

  5. Decrease the alternate kernel boot timeout limit to reduce cluster system boot time. See the Section called Decreasing the Kernel Boot Timeout Limit for more information about performing this task.

  6. Ensure that no login (or getty) programs are associated with the serial ports that are being used for the serial heartbeat channel or the remote power switch connection (if applicable). To perform this task, edit the /etc/inittab file and use a pound symbol (#) to comment out the entries that correspond to the serial ports used for the serial channel and the remote power switch. Then, invoke the init q command.

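  7. Verify that both systems detect all the installed hardware. Use the dmesg command to display the console startup messages (see the Section called Displaying Console Startup Messages) and the cat /proc/devices command to display the devices configured in the kernel (see the Section called Displaying Devices Configured in the Kernel).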

  8. Verify that the cluster systems can communicate over all the network interfaces by using the ping command to send test packets from one system to the other.

  9. If intending to configure Samba services, verify that the Samba-related RPM packages are installed on the system.
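Step 6 can be sketched as a small shell function. This is only a minimal sketch: the function name comment_out_tty is hypothetical, and it assumes that each getty entry names the serial device as a separate word on the line (for example ttyS1, not /dev/ttyS1). It edits the file given as an argument, so it can be tried on a copy of /etc/inittab first.

```shell
#!/bin/sh
# Hypothetical helper (not part of the distribution): comment out every
# active entry in an inittab-style file that names the given serial device.
# Usage: comment_out_tty FILE DEVICE
comment_out_tty() {
    file=$1
    dev=$2
    awk -v dev="$dev" '
        # Leave lines that are already comments untouched.
        /^#/ { print; next }
        {
            # Look for the device name as a whitespace-separated word.
            n = split($0, words, /[ \t]+/)
            hit = 0
            for (i = 1; i <= n; i++)
                if (words[i] == dev) hit = 1
            print (hit ? "#" $0 : $0)
        }
    ' "$file" > "$file.tmp" && mv "$file.tmp" "$file"
}
```

For example, after copying /etc/inittab to a scratch file, comment_out_tty scratch-file ttyS1 prefixes the matching entries with a pound symbol. Once the real file has been edited, invoke the init q command as described in step 6.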

Kernel Requirements

When manually configuring the kernel, adhere to the following kernel requirements:

In addition, when installing the Linux distribution, it is strongly recommended to do the following:

Editing the /etc/hosts File

The /etc/hosts file contains the IP address-to-hostname translation table. The /etc/hosts file on each cluster system must contain entries for the following:

As an alternative to the /etc/hosts file, naming services such as DNS or NIS can be used to define the host names used by a cluster. However, to limit the number of dependencies and optimize availability, it is strongly recommended to use the /etc/hosts file to define IP addresses for cluster network interfaces.

The following is an example of an /etc/hosts file on a cluster system:

127.0.0.1         localhost.localdomain   localhost
193.186.1.81      cluster2.yourdomain.com      cluster2
10.0.0.1          ecluster2.yourdomain.com     ecluster2
193.186.1.82      cluster3.yourdomain.com      cluster3
10.0.0.2          ecluster3.yourdomain.com     ecluster3
193.186.1.83      clusteralias.yourdomain.com  clusteralias

The previous example shows the IP addresses and host names for two cluster systems (cluster2 and cluster3), and the private IP addresses and host names for the Ethernet interface used for the point-to-point heartbeat connection on each cluster system (ecluster2 and ecluster3). It also shows the IP alias clusteralias, which is used for remote cluster monitoring.

Verify correct formatting of the local host entry in the /etc/hosts file to ensure that it does not include non-local systems in the entry for the local host. An example of an incorrect local host entry that includes a non-local system (server1) is shown next:

127.0.0.1     localhost.localdomain     localhost server1

A heartbeat channel may not operate properly if the format is not correct. For example, the channel will erroneously appear to be offline. Check the /etc/hosts file and correct the file format by removing non-local systems from the local host entry, if necessary.
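The local host check described above can be sketched as a shell function. This is a minimal sketch, not a supplied tool: the function name nonlocal_names_on_loopback is hypothetical, and it assumes that localhost and localhost.localdomain are the only names that belong on the 127.0.0.1 line, as in the example file shown earlier.

```shell
#!/bin/sh
# Hypothetical helper: print every name on the 127.0.0.1 line of a hosts
# file other than localhost and localhost.localdomain, one per line.
# Prints nothing when the local host entry is clean.
# Usage: nonlocal_names_on_loopback FILE
nonlocal_names_on_loopback() {
    awk '
        $1 == "127.0.0.1" {
            for (i = 2; i <= NF; i++) {
                if ($i ~ /^#/) break    # the rest of the line is a comment
                if ($i != "localhost" && $i != "localhost.localdomain")
                    print $i
            }
        }
    ' "$1"
}
```

Run against a file containing the incorrect entry shown above, the function prints server1. An empty result suggests that the local host entry contains no non-local systems.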

Note that each network adapter must be configured with the appropriate IP address and netmask.

The following is an example of a portion of the output from the /sbin/ifconfig command on a cluster system:

# ifconfig

eth0      Link encap:Ethernet  HWaddr 00:00:BC:11:76:93  
          inet addr:192.186.1.81  Bcast:192.186.1.245  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:65508254 errors:225 dropped:0 overruns:2 frame:0
          TX packets:40364135 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:19 Base address:0xfce0

eth1      Link encap:Ethernet  HWaddr 00:00:BC:11:76:92  
          inet addr:10.0.0.1  Bcast:10.0.0.245  Mask:255.255.255.0
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          Interrupt:18 Base address:0xfcc0

The previous example shows two network interfaces on a cluster system: eth0 (the network interface for the cluster system) and eth1 (the network interface for the point-to-point heartbeat connection).
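To review the address and netmask of every interface at a glance, the ifconfig output can be condensed to one line per interface. The following is a sketch only: the function name iface_addresses is hypothetical, and it assumes the output format shown in the example above (inet addr: and Mask: fields), which differs from the format printed by newer versions of ifconfig.

```shell
#!/bin/sh
# Hypothetical helper: read ifconfig output in the format shown above on
# standard input and print "INTERFACE ADDRESS NETMASK" for each interface.
iface_addresses() {
    awk '
        # A line that starts in column one begins a new interface block.
        /^[^ \t]/ { iface = $1 }
        /inet addr:/ {
            addr = ""
            mask = ""
            for (i = 1; i <= NF; i++) {
                if ($i ~ /^addr:/) addr = substr($i, 6)
                if ($i ~ /^Mask:/) mask = substr($i, 6)
            }
            print iface, addr, mask
        }
    '
}
```

A typical invocation would be /sbin/ifconfig | iface_addresses. For the example output above, this prints one line for eth0 and one for eth1, each followed by its address and netmask.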

Decreasing the Kernel Boot Timeout Limit

It is possible to reduce the boot time for a cluster system by decreasing the kernel boot timeout limit. During the Linux boot sequence, the bootloader allows for specifying an alternate kernel to boot. The default timeout limit for specifying a kernel is ten seconds.

To modify the kernel boot timeout limit for a cluster system, edit the /etc/lilo.conf file and specify the desired value (in tenths of a second) for the timeout parameter. The following example sets the timeout limit to three seconds:

timeout = 30

To apply any changes made to the /etc/lilo.conf file, invoke the /sbin/lilo command.

Similarly, when using the GRUB boot loader, modify the timeout parameter in /boot/grub/grub.conf to specify the number of seconds before timing out. To set this interval to 3 seconds, set the parameter as follows:

timeout = 3
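Because the two boot loaders use different units (tenths of a second in /etc/lilo.conf, whole seconds in /boot/grub/grub.conf), the conversion is easy to get wrong. The following sketch makes the difference explicit. The function name set_boot_timeout is hypothetical, and the sketch assumes that the file already contains a timeout line written with an equals sign; it does not add one if it is missing.

```shell
#!/bin/sh
# Hypothetical helper: rewrite the existing "timeout = ..." line in a boot
# loader configuration file for a delay given in seconds.
# Usage: set_boot_timeout FILE SECONDS lilo|grub
set_boot_timeout() {
    file=$1
    seconds=$2
    loader=$3
    case $loader in
        lilo) value=$((seconds * 10)) ;;   # lilo.conf counts tenths of a second
        grub) value=$seconds ;;            # grub.conf counts whole seconds
        *) echo "unknown boot loader: $loader" >&2; return 1 ;;
    esac
    # Replace the timeout line; all other lines are copied unchanged.
    sed "s/^[[:space:]]*timeout[[:space:]]*=.*/timeout = $value/" "$file" > "$file.tmp" &&
        mv "$file.tmp" "$file"
}
```

For a three-second delay, the lilo case writes timeout = 30 and the grub case writes timeout = 3, matching the two examples above. After changing /etc/lilo.conf, the /sbin/lilo command must still be invoked to apply the change.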

Displaying Console Startup Messages

Use the dmesg command to display the console startup messages. See the dmesg(8) manual page for more information.

The following example of the dmesg command output shows that a serial expansion card was recognized during startup:

May 22 14:02:10 storage3 kernel: Cyclades driver 2.3.2.5 2000/01/19 14:35:33
May 22 14:02:10 storage3 kernel: built May 8 2000 12:40:12
May 22 14:02:10 storage3 kernel: Cyclom-Y/PCI #1: 0xd0002000-0xd0005fff, IRQ9, 
   4 channels starting from port 0.

The following example of the dmesg command output shows that two external SCSI buses and nine disks were detected on the system (note that lines ending with a backslash are continued on the following line here, but will be printed as one line on most screens):

May 22 14:02:10 storage3 kernel: scsi0 : Adaptec AHA274x/284x/294x \
	      (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4 
May 22 14:02:10 storage3 kernel:         
May 22 14:02:10 storage3 kernel: scsi1 : Adaptec AHA274x/284x/294x \
              (EISA/VLB/PCI-Fast SCSI) 5.1.28/3.2.4 
May 22 14:02:10 storage3 kernel:         
May 22 14:02:10 storage3 kernel: scsi : 2 hosts. 
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST39236LW         Rev: 0004 
May 22 14:02:11 storage3 kernel: Detected scsi disk sda at scsi0, channel 0, id 0, lun 0 
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC        Rev: 0001 
May 22 14:02:11 storage3 kernel: Detected scsi disk sdb at scsi1, channel 0, id 0, lun 0 
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC        Rev: 0001 
May 22 14:02:11 storage3 kernel: Detected scsi disk sdc at scsi1, channel 0, id 1, lun 0 
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC        Rev: 0001 
May 22 14:02:11 storage3 kernel: Detected scsi disk sdd at scsi1, channel 0, id 2, lun 0 
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC        Rev: 0001 
May 22 14:02:11 storage3 kernel: Detected scsi disk sde at scsi1, channel 0, id 3, lun 0 
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC        Rev: 0001 
May 22 14:02:11 storage3 kernel: Detected scsi disk sdf at scsi1, channel 0, id 8, lun 0 
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC        Rev: 0001 
May 22 14:02:11 storage3 kernel: Detected scsi disk sdg at scsi1, channel 0, id 9, lun 0 
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC        Rev: 0001 
May 22 14:02:11 storage3 kernel: Detected scsi disk sdh at scsi1, channel 0, id 10, lun 0 
May 22 14:02:11 storage3 kernel:   Vendor: SEAGATE   Model: ST318203LC        Rev: 0001 
May 22 14:02:11 storage3 kernel: Detected scsi disk sdi at scsi1, channel 0, id 11, lun 0 
May 22 14:02:11 storage3 kernel:   Vendor: Dell      Model: 8 BAY U2W CU      Rev: 0205 
May 22 14:02:11 storage3 kernel:   Type:   Processor \
                          ANSI SCSI revision: 03 
May 22 14:02:11 storage3 kernel: scsi1 : channel 0 target 15 lun 1 request sense \
	      failed, performing reset. 
May 22 14:02:11 storage3 kernel: SCSI bus is being reset for host 1 channel 0. 
May 22 14:02:11 storage3 kernel: scsi : detected 9 SCSI disks total.
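When many disks are attached, counting the detection lines by eye is error-prone. The following sketch compares the number of "Detected scsi disk" lines with the total reported by the kernel. The function name scsi_disk_summary is hypothetical, and the sketch assumes the message wording shown in the example above, which may differ between kernel versions.

```shell
#!/bin/sh
# Hypothetical helper: read startup messages on standard input and print
# the number of "Detected scsi disk" lines next to the total that the
# kernel reports in its "detected N SCSI disks total" line.
scsi_disk_summary() {
    awk '
        /Detected scsi disk/ { listed++ }
        /scsi : detected [0-9]+ SCSI disks total/ {
            # The count is the word that follows "detected".
            for (i = 1; i <= NF; i++)
                if ($i == "detected") reported = $(i + 1)
        }
        END { printf "listed=%d reported=%d\n", listed, reported }
    '
}
```

A typical invocation would be dmesg | scsi_disk_summary. If the two numbers differ, or if either is lower than the number of disks physically installed, recheck the cabling and the SCSI identification numbers.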

The following example of the dmesg command output shows that a quad Ethernet card was detected on the system:

May 22 14:02:11 storage3 kernel: 3c59x.c:v0.99H 11/17/98 Donald Becker
May 22 14:02:11 storage3 kernel: tulip.c:v0.91g-ppc 7/16/99 becker@cesdis.gsfc.nasa.gov 
May 22 14:02:11 storage3 kernel: eth0: Digital DS21140 Tulip rev 34 at 0x9800, \
	      00:00:BC:11:76:93, IRQ 5. 
May 22 14:02:12 storage3 kernel: eth1: Digital DS21140 Tulip rev 34 at 0x9400, \
	      00:00:BC:11:76:92, IRQ 9. 
May 22 14:02:12 storage3 kernel: eth2: Digital DS21140 Tulip rev 34 at 0x9000, \
	      00:00:BC:11:76:91, IRQ 11. 
May 22 14:02:12 storage3 kernel: eth3: Digital DS21140 Tulip rev 34 at 0x8800, \
	      00:00:BC:11:76:90, IRQ 10.

Displaying Devices Configured in the Kernel

To verify that the installed devices, including serial and network interfaces, are configured in the kernel, use the cat /proc/devices command on each cluster system. This command also shows whether raw device support is installed on the system. For example:

# cat /proc/devices
Character devices:
  1 mem
  2 pty
  3 ttyp
  4 ttyS
  5 cua
  7 vcs
 10 misc
 19 ttyC
 20 cub
128 ptm
136 pts
162 raw

Block devices:
  2 fd
  3 ide0
  8 sd
 65 sd
#

The previous example shows: