Boot failure for fresh install on HP DL380 G5 with P400 Controller

jlscott
Posts: 4
Joined: 2012/01/16 22:30:32
Location: Wellington, New Zealand


Post by jlscott » 2012/01/16 23:40:05

Hi,

I have two second-hand HP ProLiant DL380 G5 servers equipped with Smart Array P400 RAID controllers. Each is configured with two 146GB SAS disks in a RAID 0+1 array.

I can successfully install CentOS 6 (latest release) on each server using a netinstall started from the ISO image downloaded from a local mirror. I have been selecting a "Basic Server" type of install with the "customise later" option. The disk partitioning layout is custom, with a 1000MB boot partition and the rest of the space allocated to a logical volume group, split into logical volumes for swap, root, /home, and /var.

The install completes successfully, but when the server reboots into the new OS, GRUB performs the first stage of the boot (displaying the message "will boot kernel xxx in x seconds"). However, on completion of the countdown and after a brief pause, an error message flashes onto the screen for a fraction of a second (far too quick to read) and then the system goes dead. Only a manual power cycle will get it back. I think the message flashed onto the screen is something like: "[boot error] firmware bug ...".

Both P400 controllers have been updated to version 2.08 as per the HP advisory [url=http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&objectID=c00864832]http://h20000.www2.hp.com/bizsupport/TechSupport/Document.jsp?lang=en&objectID=c00864832[/url] which is the latest version I can find.

Both servers have had their BIOS flashed to the latest version released by HP for these machines.

If I reboot from the CD and select the "recover" option, I can then browse and update the CentOS 6 file system mounted on "/mnt/sysimage/" in the recovery shell.

Google searches have failed to throw much light on the problem.

Both of these servers will successfully install and then boot into CentOS 5.7 without problems. I have previously installed CentOS 6 on another server, a DL380 G4, without any issues. That model has an integrated Smart Array 6i controller.

I would appreciate any advice that will enable me to solve this problem. Perhaps there are custom drivers I should use when installing CentOS 6 on the DL380 G5 servers, or firmware updates that I should apply.

I should mention that I am using the x86_64 version of CentOS in every case. Information about one of the servers (running CentOS 5.7) is available from
[url=http://pastebin.centos.org/38270]http://pastebin.centos.org/38270[/url]

Thanks, James

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Boot failure for fresh install on HP DL380 G5 with P400 Cont

Post by pschaff » 2012/01/17 00:02:02

That controller is supported by the distro cciss driver. There are a lot of upstream bugs [url=https://bugzilla.redhat.com/buglist.cgi?query_format=advanced&bug_status=NEW&bug_status=ASSIGNED&bug_status=MODIFIED&bug_status=ON_DEV&bug_status=ON_QA&bug_status=VERIFIED&bug_status=RELEASE_PENDING&bug_status=POST&bug_status=CLOSED&longdesc=cciss&longdesc_type=allwordssubstr&product=Red%20Hat%20Enterprise%20Linux%206&classification=Red%20Hat]mentioning cciss[/url]. I'd suggest looking at those. I'd also try an install with all the defaults - no custom partitioning - just to eliminate that as an issue. 100MB is pretty small for /boot by today's standards. Might also want to wipe the partition tables with [b]dd[/b] at the start to be sure no signatures of earlier installs remain.
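As a rough sketch of the dd suggestion above (not a tested procedure for this hardware): the helper below zeroes the first sectors of a target, which is where the partition table and most boot signatures live. The function name `wipe_head` and the default sector count are assumptions made for illustration. This destroys data, so try it on a scratch image file first.

```shell
# Sketch only: overwrite the first N 512-byte sectors of a target with zeros.
# wipe_head is a hypothetical helper name, not part of any CentOS tool.
wipe_head() {
    target=$1
    sectors=${2:-1}   # default: just the first sector (MBR / partition table)
    # conv=notrunc keeps a regular file at its original size;
    # it makes no difference when the target is a block device.
    dd if=/dev/zero of="$target" bs=512 count="$sectors" conv=notrunc 2>/dev/null
}
```

With the cciss driver the first logical drive normally appears as /dev/cciss/c0d0, so a call from the rescue shell might look like `wipe_head /dev/cciss/c0d0 2048`; check the device name on your own system before running anything like this.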

TrevorH
Site Admin
Posts: 33216
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: Boot failure for fresh install on HP DL380 G5 with P400 Controller

Post by TrevorH » 2012/01/17 00:38:49

Firmware 2.08 is ancient. You can find version 7.22 [url=http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareDescription.jsp?lang=en&cc=us&prodTypeId=329290&prodSeriesId=1157687&prodNameId=1157689&swEnvOID=4004&swLang=8&mode=2&taskId=135&swItem=MTX-9a4b961a955841c6b974fcc0e6]here[/url] if my link works!

In addition you probably want to check that the DL380G5 BIOS is up to date as well (SP53328.exe). May also be worth removing the splashimage= and hiddenmenu lines from /boot/grub/grub.conf
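For anyone wanting to script the grub.conf change mentioned above, here is a minimal sketch. It assumes a GRUB legacy config in which `splashimage=` and `hiddenmenu` each sit on their own line; the function name `strip_grub_splash` is made up for this example. It keeps a `.bak` copy next to the file.

```shell
# Sketch only: remove the splashimage= and hiddenmenu lines from a
# GRUB legacy config, keeping a backup alongside it.
# strip_grub_splash is a hypothetical helper name.
strip_grub_splash() {
    conf=$1
    # Keep the original so the change can be reverted.
    cp -p "$conf" "$conf.bak" || return 1
    # Rewrite the file from the backup, dropping the two directives.
    sed -e '/^[[:space:]]*splashimage=/d' \
        -e '/^[[:space:]]*hiddenmenu[[:space:]]*$/d' \
        "$conf.bak" > "$conf"
}
```

From the rescue environment the file would be under the mounted image, for example `strip_grub_splash /mnt/sysimage/boot/grub/grub.conf`.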

jlscott
Posts: 4
Joined: 2012/01/16 22:30:32
Location: Wellington, New Zealand

Re: Boot failure for fresh install on HP DL380 G5 with P400 Controller

Post by jlscott » 2012/01/17 04:03:45

TrevorH: Thanks for the pointer. I had seen that firmware notice, but as the version number was so different from the ones in use, I was wary about assuming that it was for the same controller. I will try it along with your other suggestions. The BIOS is already up to date (I had applied SP53328.exe).

pschaff: I will try the suggestions you have made if the firmware upgrade does not solve the issue. None of the reported cciss bugs you refer to seems to suggest a course of action that might shed some light on this problem.

The cciss driver in my CentOS 5.7 system that is working perfectly on the DL380 G5 is 3.6.28-RH2, and the version on my CentOS 6.0 system on a DL380 G4 is 3.6.26 (though there are some outstanding updates, including a new kernel, still to be applied).

jlscott
Posts: 4
Joined: 2012/01/16 22:30:32
Location: Wellington, New Zealand

Re: Boot failure for fresh install on HP DL380 G5 with P400 Controller

Post by jlscott » 2012/01/20 02:50:02

I have now updated the firmware on the P400 controller to v7.22 as recommended (and also updated the firmware on the SAS drives to the latest release, HPDF) using a copy of CentOS 5.7 installed on the server. I re-installed CentOS 6 but still get the same problem.

I found that by editing the boot arguments before boot to remove the line specifying the initrd (which thereby caused a kernel panic and lockup), I was able to get the error message left on the screen. The error message is "[Firmware Bug]: the BIOS has corrupted hw-PMU resources (MSR 186 is 43003c)".

Google searches on this error message suggest that it may be associated with the change of driver from cciss to hpsa. Can anyone confirm whether the latest CentOS 6 kernel, 2.6.32-220, uses the cciss or the hpsa driver for the P400 RAID controller?

The full set of kernel boot arguments are: ro root=/dev/mapper/vg_a-lv_root rd_NO_LUKS LANG=en-US.UTF-8 rd_LVM_LV=vg_a/lv_swap rd_NO_MD rd_LVM_LV=lv_a/lv_root SYSFONT=latarcyrheb-sun16 rhgb crashkernel=auto quiet rd_LVM_LV=VolGroup00/LogVol01 KEYBOARDTYPE=pc KEYTABLE=us rd_NO_DM

I deleted the "quiet" argument, and the initrd line to bring the boot to a halt with messages still on the screen. The following final messages seem to be relevant to me:

VFS: Cannot open root device "mapper/vg_a-lv_root" or unknown-block(0,0)
Please append a correct "root=" boot option; here are the available partitions
Kernel Panic - not syncing: VFS unable to mount root fs on unknown-block(0,0)
PID: 1, comm: swapper Not tainted 2.6.32-220.el6.x86_64 #1
Call Trace:

This suggests that the driver that gives the "mapper" device is failing to load during the boot process. Any ideas how to fix this?

Thanks

(Note that the original error message did not seem to appear in this last boot, so it may not be the real issue, though it may simply have scrolled off the top of the screen.)
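Dropping "quiet" from the kernel arguments, as described in the post above, can also be done without touching the initrd line, which leaves the boot messages visible while still loading the drivers. The sketch below is only an illustration: `drop_args` is a hypothetical helper, and also removing `rhgb` (the graphical boot screen) is an extra assumption that goes beyond what the post did.

```shell
# Sketch only: print a kernel argument string with the named words removed.
# drop_args is a hypothetical helper; it matches whole words only.
drop_args() (
    set -f            # no filename expansion while splitting the line
    line=$1
    shift
    out=""
    for word in $line; do
        keep=1
        for drop in "$@"; do
            if [ "$word" = "$drop" ]; then
                keep=0
            fi
        done
        if [ "$keep" -eq 1 ]; then
            out="${out:+$out }$word"
        fi
    done
    printf '%s\n' "$out"
)
```

For example, `drop_args "ro root=/dev/mapper/vg_a-lv_root rhgb crashkernel=auto quiet" quiet rhgb` prints the same line without the two words; the result can then be typed in at the GRUB edit prompt.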

r_hartman
Posts: 711
Joined: 2009/03/23 15:08:11
Location: Netherlands

Re: Boot failure for fresh install on HP DL380 G5 with P400 Controller

Post by r_hartman » 2012/01/20 09:02:43

[url=http://h20000.www2.hp.com/bizsupport/TechSupport/SoftwareIndex.jsp?lang=en&cc=us&prodNameId=1157689&prodTypeId=329290&prodSeriesId=1157687&swLang=8&taskId=135&swEnvOID=4103#113214]This page[/url] suggests the P400 still uses the cciss driver.

I'd expect your errors now stem from removing the initrd line, as that will normally hold the drivers needed for booting the box.

It would be interesting to find what driver is used during installation, or what driver the LiveDVD would use, provided that does see the drives.

TrevorH
Site Admin
Posts: 33216
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: Boot failure for fresh install on HP DL380 G5 with P400 Controller

Post by TrevorH » 2012/01/20 09:32:55

I checked one of my systems with a P400 and its PCI ID is 103c:3230, which is _not_ listed as supported by hpsa in kernel 2.6.32-220.2.1.el6.x86_64. It is listed by cciss.

I suspect this needs to be reported upstream as I found discussions on the lkml where they changed the code to die if this occurred and subsequently changed it back so that it attempted to continue in the event of this error. Without downloading the kernel source to check, I suspect that TUV have picked up the first change but not the second (yet).
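The PCI ID check described above can be repeated on any system by looking at the `alias:` lines that `modinfo` prints for each driver. A minimal sketch follows; the function name `claims_pci_id` is made up, and it matches on vendor and device ID only, ignoring the subsystem IDs that the driver tables also carry.

```shell
# Sketch only: read `modinfo <module>` output on stdin and succeed if any
# alias line covers the given PCI vendor:device pair.
# claims_pci_id is a hypothetical helper name.
claims_pci_id() {
    # Module aliases use 8-digit upper-case hex, e.g. v0000103Cd00003230.
    vendor=$(printf '%s' "$1" | tr 'a-f' 'A-F')
    device=$(printf '%s' "$2" | tr 'a-f' 'A-F')
    grep -q "^alias:[[:space:]]*pci:v0000${vendor}d0000${device}sv"
}
```

Typical use would be `lspci -nn` to find the controller's ID, then `modinfo cciss | claims_pci_id 103c 3230 && echo "cciss lists it"`, and the same again with `modinfo hpsa`.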

jlscott
Posts: 4
Joined: 2012/01/16 22:30:32
Location: Wellington, New Zealand

Re: Boot failure for fresh install on HP DL380 G5 with P400 Controller

Post by jlscott » 2012/01/24 22:35:47

Thanks for all the information and suggestions.

I tried a CentOS 6.2 Live CD on the server, but this would not boot either. It gets as far as the "ISOLINUX" line and then goes dead.

I booted the install CD and selected system recovery mode. After the file system had been mounted on /mnt/sysimage, a quick look at the system showed that this environment was using the cciss driver, though both the hpsa and cciss modules were loaded.

For now, I think I will stick with CentOS 5 for these servers, as it looks like it will take a while for the problem to be resolved.

Thanks again for your help and advice.

James :-)
