software RAID-1 data recovery

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Post by markmh » 2012/02/06 21:30:03

Sorry. The full ./getinfo output is:



Information for general problems.
[code]
== BEGIN uname -rmi ==
2.6.9-78.ELsmp x86_64 x86_64
== END uname -rmi ==

== BEGIN rpm -qa \*-release\* ==
centos-release-4-4.2
rpmforge-release-0.3.6-1.el4.rf
== END rpm -qa \*-release\* ==

== BEGIN cat /etc/redhat-release ==
CentOS release 4.4 (Final)
== END cat /etc/redhat-release ==

== BEGIN getenforce ==
Disabled
== END getenforce ==

== BEGIN free -m ==
total used free shared buffers cached
Mem: 4903 3093 1809 0 114 1864
-/+ buffers/cache: 1114 3788
Swap: 996 521 474
== END free -m ==

== BEGIN rpm -qa yum\* rpm-\* python | sort ==
python-2.3.4-14.2
rpm-build-4.3.3-18_nonptl
rpm-libs-4.3.3-18_nonptl
rpm-libs-4.3.3-18_nonptl
rpm-python-4.3.3-18_nonptl
yum-2.0.7-1
== END rpm -qa yum\* rpm-\* python | sort ==

== BEGIN ls /etc/yum.repos.d ==
CentOS-Base.repo
CentOS-Media.repo
mirrors-rpmforge
rpmforge.repo
== END ls /etc/yum.repos.d ==

== BEGIN cat /etc/yum.conf ==
[main]
cachedir=/var/cache/yum
debuglevel=2
logfile=/var/log/yum.log
pkgpolicy=newest
distroverpkg=redhat-release
tolerant=1
exactarch=1

[base]
name=Red Hat Linux $releasever - $basearch - Base
baseurl=http://mirror.dulug.duke.edu/pub/yum-repository/redhat/$releasever/$basearch/


[updates]
name=Red Hat Linux $releasever - Updates
baseurl=http://mirror.dulug.duke.edu/pub/yum-repository/redhat/updates/$releasever/


== END cat /etc/yum.conf ==

== BEGIN yum repolist all ==

Usage: yum [options] <update | upgrade | install | info | remove | list |
clean | provides | search | check-update | groupinstall | groupupdate |
grouplist >

Options:
-c [config file] - specify the config file to use
-e [error level] - set the error logging level
-d [debug level] - set the debugging level
-y answer yes to all questions
-t be tolerant about errors in package commands
-R [time in minutes] - set the max amount of time to randomly run in.
-C run from cache only - do not update the cache
--installroot=[path] - set the install root (default '/')
--version - output the version of yum
--exclude=some_pkg_name - packagename to exclude - you can use
this more than once
--download-only - only download packages - do not run the transaction
-h, --help this screen

== END yum repolist all ==

== BEGIN egrep 'include|exclude' /etc/yum.repos.d/*.repo ==
== END egrep 'include|exclude' /etc/yum.repos.d/*.repo ==

== BEGIN sed -n -e "/^\[/h; /priority *=/{ G; s/\n/ /; s/ity=/ity = /; p }" /etc/yum.repos.d/*.repo | sort -k3n ==
== END sed -n -e "/^\[/h; /priority *=/{ G; s/\n/ /; s/ity=/ity = /; p }" /etc/yum.repos.d/*.repo | sort -k3n ==

== BEGIN cat /etc/fstab ==
# This file is edited by fstab-sync - see 'man fstab-sync' for details
LABEL=/ / ext3 defaults 1 1
none /dev/pts devpts gid=5,mode=620 0 0
none /dev/shm tmpfs noexec,nosuid 0 0
none /proc proc defaults 0 0
LABEL=/state/partition /state/partition1 ext3 nosuid,defaults 1 2
none /sys sysfs defaults 0 0
LABEL=/var /var ext3 defaults 1 2
/dev/sda3 swap swap defaults 0 0
192.***.*.250:/export/home /home nfs nosuid,intr,rsize=32768,wsize=32768,noatime 0 0
192.***.*.250:/compute /compute nfs rsize=32768,wsize=32768,noatime,nosuid,nodev,intr 0 0
/dev/hdc /media/cdrom auto pamconsole,exec,noauto,managed 0 0
/dev/fd0 /media/floppy auto pamconsole,exec,noauto,managed 0 0
== END cat /etc/fstab ==

== BEGIN df -h ==
Filesystem Size Used Avail Use% Mounted on
/dev/sda1 7.7G 3.4G 4.0G 46% /
none 2.4G 0 2.4G 0% /dev/shm
/dev/sda5 55G 84M 53G 1% /state/partition1
/dev/sda2 3.9G 2.7G 1012M 73% /var
192.***.*.250:/export/home
258G 242G 2.8G 99% /home
192.***.*.250:/compute
1.4T 1.1T 281G 79% /compute
== END df -h ==

== BEGIN blkid ==
/dev/sda1: LABEL="/" UUID="ff0a5c95-1ba8-4b17-9821-0c09fc97f43e" SEC_TYPE="ext3" TYPE="ext2"
/dev/sda2: LABEL="/var" UUID="2a2c2be9-0a4e-41cc-9f04-1fd72a935b93" SEC_TYPE="ext3" TYPE="ext2"
/dev/sda3: TYPE="swap"
/dev/sda5: LABEL="/state/partition" UUID="2864d45c-622b-4eae-8396-ce36d0c7c57e" SEC_TYPE="ext3" TYPE="ext2"
== END blkid ==

== BEGIN cat /proc/mdstat ==
Personalities :
unused devices: <none>
== END cat /proc/mdstat ==

== BEGIN rpm -qa kernel\* | sort ==
kernel-2.6.9-42.0.2.EL
kernel-devel-2.6.9-42.0.2.EL
kernel-doc-2.6.9-42.0.2.EL
kernel-ib-1.0-1
kernel-smp-2.6.9-42.0.2.EL
kernel-smp-2.6.9-78.EL
kernel-smp-devel-2.6.9-42.0.2.EL
kernel-smp-devel-2.6.9-78.EL
kernel-utils-2.4-14.1.117
== END rpm -qa kernel\* | sort ==

== BEGIN lspci -nn ==
00:06.0 Class 0604: 1022:7460 (rev 07)
00:07.0 Class 0601: 1022:7468 (rev 05)
00:07.1 Class 0101: 1022:7469 (rev 03)
00:07.3 Class 0680: 1022:746b (rev 05)
00:0a.0 Class 0604: 1022:7450 (rev 12)
00:0a.1 Class 0800: 1022:7451 (rev 01)
00:0b.0 Class 0604: 1022:7450 (rev 12)
00:0b.1 Class 0800: 1022:7451 (rev 01)
00:18.0 Class 0600: 1022:1100
00:18.1 Class 0600: 1022:1101
00:18.2 Class 0600: 1022:1102
00:18.3 Class 0600: 1022:1103
00:19.0 Class 0600: 1022:1100
00:19.1 Class 0600: 1022:1101
00:19.2 Class 0600: 1022:1102
00:19.3 Class 0600: 1022:1103
01:00.0 Class 0c03: 1022:7464 (rev 0b)
01:00.1 Class 0c03: 1022:7464 (rev 0b)
01:05.0 Class 0300: 1023:9880 (rev 3a)
02:02.0 Class 0200: 14e4:1648 (rev 03)
02:02.1 Class 0200: 14e4:1648 (rev 03)
02:04.0 Class 0100: 1000:0030 (rev 08)
== END lspci -nn ==

== BEGIN lsusb ==
Bus 002 Device 001: ID 0000:0000
Bus 001 Device 001: ID 0000:0000
== END lsusb ==

== BEGIN rpm -qa kmod\* kmdl\* ==
== END rpm -qa kmod\* kmdl\* ==

== BEGIN ifconfig -a ==
eth0 Link encap:Ethernet HWaddr 00:09:3D:12:1F:A3
inet addr:192.***.*.25 Bcast:192.***.*.255 Mask:255.255.255.0
inet6 addr: fe***********2:1fa3/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:1021314388 errors:0 dropped:0 overruns:0 frame:0
TX packets:386552887 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:653012841839 (608.1 GiB) TX bytes:242005503401 (225.3 GiB)
Interrupt:185

eth1 Link encap:Ethernet HWaddr 00:09:3D:12:1F:A4
inet addr:10******3 Bcast: *******55 Mask:255.255.255.0
inet6 addr: fe****************4 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:125196523 errors:0 dropped:0 overruns:0 frame:0
TX packets:158909688 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:37651640392 (35.0 GiB) TX bytes:166544974196 (155.1 GiB)
Interrupt:193

lo Link encap:Local Loopback
inet addr:1******.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:2834894 errors:0 dropped:0 overruns:0 frame:0
TX packets:2834894 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:579591118 (552.7 MiB) TX bytes:579591118 (552.7 MiB)

sit0 Link encap:IPv6-in-IPv4
NOARP MTU:1480 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

tun0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:1******1 P-t-P:1*******2 Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:83128653 errors:0 dropped:0 overruns:0 frame:0
TX packets:132347077 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:26625080333 (24.7 GiB) TX bytes:152690283974 (142.2 GiB)

== END ifconfig -a ==



== BEGIN brctl show ==
./getinfo.sh: line 87: brctl: command not found
== END brctl show ==

== BEGIN route -n ==
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
255.255.255.255 0.0.0.0 255.255.255.255 UH 0 0 0 eth0
10.*.*.* 0.0.0.0 255.255.255.255 UH 0 0 0 tun0
10.**.*.200 192.168.1.250 255.255.255.255 UGH 0 0 0 eth0
10.****.0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
10****.0 10.8.0.2 255.255.255.0 UG 0 0 0 tun0
192******.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
16******0 0.0.0.0 255.255.0.0 U 0 0 0 eth1
2*****.0 0.0.0.0 240.0.0.0 U 0 0 0 eth0
0.0.0.0 10.10.3.254 0.0.0.0 UG 0 0 0 eth1
== END route -n ==

== BEGIN cat /etc/resolv.conf ==
#
# Do NOT Edit (generated by dbreport)
#
# Private side resolver configuration file
#
nameserver 19*******250
#search local bgsu.edu

== END cat /etc/resolv.conf ==

== BEGIN grep net /etc/nsswitch.conf ==
#networks: ldap [NOTFOUND=return] files
netmasks: files
networks: files
netgroup: files
== END grep net /etc/nsswitch.conf ==

== BEGIN chkconfig --list | grep -Ei 'network|wpa' ==
network 0:off 1:off 2:on 3:on 4:on 5:on 6:off
== END chkconfig --list | grep -Ei 'network|wpa' ==


[/code]




showmount -a gives:




All mount points on lcpp-cluster.bgsu.edu:
192.168.1.0/255.255.255.0:/compute
192.168.1.0/255.255.255.0:/diskless/x64/RHEL4-AS/root
192.168.1.0/255.255.255.0:/export
192.168.1.10:192.168.1.0/255.255.255.0
192.168.1.11:192.168.1.0/255.255.255.0
192.168.1.12:192.168.1.0/255.255.255.0
192.168.1.13:192.168.1.0/255.255.255.0
192.168.1.14:192.168.1.0/255.255.255.0
192.168.1.15:192.168.1.0/255.255.255.0
192.168.1.16:192.168.1.0/255.255.255.0
192.168.1.17:192.168.1.0/255.255.255.0
192.168.1.18:192.168.1.0/255.255.255.0
192.168.1.19:192.168.1.0/255.255.255.0
192.168.1.1:192.168.1.0/255.255.255.0
192.168.1.20:192.168.1.0/255.255.255.0
192.168.1.21:192.168.1.0/255.255.255.0
192.168.1.22:192.168.1.0/255.255.255.0
192.168.1.23:192.168.1.0/255.255.255.0
192.168.1.24:192.168.1.0/255.255.255.0
192.168.1.251:192.168.1.0/255.255.255.0
192.168.1.25:192.168.1.0/255.255.255.0
192.168.1.2:192.168.1.0/255.255.255.0
192.168.1.3:192.168.1.0/255.255.255.0
192.168.1.4:192.168.1.0/255.255.255.0
192.168.1.5:192.168.1.0/255.255.255.0
192.168.1.6:192.168.1.0/255.255.255.0
192.168.1.7:192.168.1.0/255.255.255.0
192.168.1.8:192.168.1.0/255.255.255.0
192.168.1.9:192.168.1.0/255.255.255.0

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Re: software RAID-1 data recovery

Post by pschaff » 2012/02/06 23:11:59

You [b]still[/b] need to update.

The script output shows that the filesystem in question is mounted via NFS, which in retrospect I/we should have seen earlier. You need to do the RAID recovery on the system to which the disks are physically connected, not on the NFS client. Remove the filesystem[s] on the RAID from /etc/exports on the [b]server[/b] (assuming it is also CentOS), make sure all clients unmount them, and run "exportfs -r". The requested script information should have been provided for the server and not the client. You also need to run the script as [url=http://wiki.centos.org/TipsAndTricks/BecomingRoot]root[/url] to get fdisk results.
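To confirm which side of the NFS mount a given machine is on, something like the following sketch may help. It reads a mounts table in /proc/mounts format and prints the filesystem type recorded for a mount point. The helper name fs_type_of is made up for illustration, and /compute is simply the mount point seen in the df output above.

```shell
# Print the filesystem type of a mount point, reading a mounts table
# in /proc/mounts format: device mountpoint fstype options dump pass.
# fs_type_of is a hypothetical helper name, not a standard tool.
fs_type_of() {
    mount_point=$1
    table=${2:-/proc/mounts}
    # Keep the last match, since later entries shadow earlier ones.
    awk -v mp="$mount_point" \
        '$2 == mp { t = $3 } END { if (t != "") print t }' "$table"
}

# Example: "nfs" would indicate a client, a local type such as "ext3"
# would indicate the machine that actually holds the disks.
if [ -r /proc/mounts ]; then
    fs_type_of /compute
fi

# On the server, as root, after removing the entry from /etc/exports
# and unmounting on all clients (not run here):
#   exportfs -r
```

If the function prints nothing, the mount point is not in the table at all on that machine.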

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Post by markmh » 2012/02/10 01:56:44

Sorry. Our cluster consists of 25 nodes, and if we update, we should update all for consistency, so it is probably not feasible at this time.

About the ./getinfo script: the last time I sent output, I was logged in remotely, so the data must have been from the login node. Almost all the earlier data was obtained when I was physically plugged into what I believe is the head node. So I am a little confused about whether none of the info I have given was from the server, or whether only the last post was wrong.
I realized this possibility after I visited the computer today. Here is what it gave me today, when I was physically logged into what I believe is the head node. (I am sorry if this data still isn't from the right node; if it isn't, please let me know and I will be sure to log into the right node next time.)

./getinfo:

Information for general problems.
[code]
== BEGIN uname -rmi ==
2.6.9-42.0.2.ELsmp x86_64 x86_64
== END uname -rmi ==

== BEGIN rpm -qa \*-release\* ==
centos-release-4-4.2
rpmforge-release-0.3.6-1.el4.rf
== END rpm -qa \*-release\* ==

== BEGIN cat /etc/redhat-release ==
CentOS release 4.4 (Final)
== END cat /etc/redhat-release ==

== BEGIN getenforce ==
Disabled
== END getenforce ==

== BEGIN free -m ==
total used free shared buffers cached
Mem: 5966 5939 27 0 108 2992
-/+ buffers/cache: 2838 3127
Swap: 4094 0 4094
== END free -m ==

== BEGIN rpm -qa yum\* rpm-\* python | sort ==
python-2.3.4-14.2
rpm-build-4.3.3-18_nonptl
rpm-devel-4.3.3-18_nonptl
rpm-libs-4.3.3-18_nonptl
rpm-libs-4.3.3-18_nonptl
rpm-python-4.3.3-18_nonptl
yum-2.0.7-1
== END rpm -qa yum\* rpm-\* python | sort ==

== BEGIN ls /etc/yum.repos.d ==
CentOS-Base.repo
CentOS-Media.repo
mirrors-rpmforge
rpmforge.repo
== END ls /etc/yum.repos.d ==

== BEGIN cat /etc/yum.conf ==
[main]
cachedir=/var/cache/yum
debuglevel=2
logfile=/var/log/yum.log
pkgpolicy=newest
distroverpkg=redhat-release
tolerant=1
exactarch=1

[base]
name=Red Hat Linux $releasever - $basearch - Base
baseurl=http://mirror.dulug.duke.edu/pub/yum-repository/redhat/$releasever/$basearch/


[updates]
name=Red Hat Linux $releasever - Updates
baseurl=http://mirror.dulug.duke.edu/pub/yum-repository/redhat/updates/$releasever/


== END cat /etc/yum.conf ==

== BEGIN yum repolist all ==

Usage: yum [options] <update | upgrade | install | info | remove | list |
clean | provides | search | check-update | groupinstall | groupupdate |
grouplist >

Options:
-c [config file] - specify the config file to use
-e [error level] - set the error logging level
-d [debug level] - set the debugging level
-y answer yes to all questions
-t be tolerant about errors in package commands
-R [time in minutes] - set the max amount of time to randomly run in.
-C run from cache only - do not update the cache
--installroot=[path] - set the install root (default '/')
--version - output the version of yum
--exclude=some_pkg_name - packagename to exclude - you can use
this more than once
--download-only - only download packages - do not run the transaction
-h, --help this screen

== END yum repolist all ==

== BEGIN egrep 'include|exclude' /etc/yum.repos.d/*.repo ==
== END egrep 'include|exclude' /etc/yum.repos.d/*.repo ==

== BEGIN sed -n -e "/^\[/h; /priority *=/{ G; s/\n/ /; s/ity=/ity = /; p }" /etc/yum.repos.d/*.repo | sort -k3n ==
== END sed -n -e "/^\[/h; /priority *=/{ G; s/\n/ /; s/ity=/ity = /; p }" /etc/yum.repos.d/*.repo | sort -k3n ==

== BEGIN cat /etc/fstab ==
# This file is edited by fstab-sync - see 'man fstab-sync' for details
LABEL=/1 / ext3 defaults 1 1
LABEL=/boot1 /boot ext3 defaults 1 2
none /dev/pts devpts gid=5,mode=620 0 0
none /dev/shm tmpfs defaults 0 0
LABEL=/export /export ext3 defaults,nosuid,nodev 1 2
none /proc proc defaults 0 0
none /sys sysfs defaults 0 0
LABEL=SWAP-sda3 swap swap pri=0 0 0

# The ram-backed filesystem for ganglia RRD graph databases.
tmpfs /var/lib/ganglia/rrds tmpfs size=1523785000,gid=nobody,uid=nobody,defaults 1 0
/dev/hda /media/cdrecorder auto pamconsole,exec,noauto,managed 0 0
== END cat /etc/fstab ==

== BEGIN df -h ==
Filesystem Size Used Avail Use% Mounted on
/dev/sdc2 32G 11G 20G 36% /
/dev/sdc1 251M 51M 187M 22% /boot
none 3.0G 0 3.0G 0% /dev/shm
/dev/sdc5 258G 242G 2.9G 99% /export
tmpfs 1.5G 9.2M 1.5G 1% /var/lib/ganglia/rrds
/dev/md0 1.4T 1.1T 281G 79% /compute
/export/home/opt 258G 242G 2.9G 99% /home/opt
== END df -h ==

== BEGIN fdisk -l ==
Disk /dev/sda doesn't contain a valid partition table
Disk /dev/sdb doesn't contain a valid partition table
Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes


Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes


Disk /dev/sdc: 319.9 GB, 319977160704 bytes
255 heads, 63 sectors/track, 38901 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

Device Boot Start End Blocks Id System
/dev/sdc1 * 1 33 265041 83 Linux
/dev/sdc2 34 4210 33551752+ 83 Linux
/dev/sdc3 4211 4732 4192965 82 Linux swap
/dev/sdc4 4733 38901 274462492+ 5 Extended
/dev/sdc5 4733 38901 274462461 83 Linux

Disk /dev/md0: 1500.3 GB, 1500301819904 bytes
2 heads, 4 sectors/track, 366284624 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

== END fdisk -l ==

== BEGIN blkid ==
/dev/sda3: TYPE="swap"
/dev/sdc1: LABEL="/boot1" UUID="5cf75fc9-a55f-4edd-a47b-1798435f5440" SEC_TYPE="ext3" TYPE="ext2"
/dev/sdc2: LABEL="/1" UUID="d540b42f-57a4-4319-bfcf-62c08545be02" SEC_TYPE="ext3" TYPE="ext2"
/dev/sdc3: TYPE="swap"
/dev/sdc5: LABEL="/export" UUID="ac829e90-8da2-48c1-ac1e-b6be5947970f" SEC_TYPE="ext3" TYPE="ext2"
/dev/md0: UUID="79ad8637-4254-4adc-a877-8fef3662ac0f" SEC_TYPE="ext3" TYPE="ext2"
/dev/sda: UUID="79ad8637-4254-4adc-a877-8fef3662ac0f" SEC_TYPE="ext3" TYPE="ext2"
== END blkid ==

== BEGIN cat /proc/mdstat ==
Personalities : [raid1]
md0 : active raid1 sda[0] sdb[1]
1465138496 blocks [2/2] [UU]

unused devices: <none>
== END cat /proc/mdstat ==

== BEGIN pvs ==
== END pvs ==

== BEGIN vgs ==
No volume groups found
== END vgs ==

== BEGIN lvs ==
No volume groups found
== END lvs ==

== BEGIN rpm -qa kernel\* | sort ==
kernel-2.6.9-42.0.2.EL
kernel-devel-2.6.9-42.0.2.EL
kernel-doc-2.6.9-42.0.2.EL
kernel-ib-1.0-1
kernel-smp-2.6.9-42.0.2.EL
kernel-smp-devel-2.6.9-42.0.2.EL
kernel-utils-2.4-13.1.83
== END rpm -qa kernel\* | sort ==

== BEGIN lspci -nn ==
00:00.0 Class 0600: 8086:25d8 (rev b1)
00:02.0 Class 0604: 8086:25f7 (rev b1)
00:04.0 Class 0604: 8086:25f8 (rev b1)
00:06.0 Class 0604: 8086:25f9 (rev b1)
00:08.0 Class 0880: 8086:1a38 (rev b1)
00:10.0 Class 0600: 8086:25f0 (rev b1)
00:10.1 Class 0600: 8086:25f0 (rev b1)
00:10.2 Class 0600: 8086:25f0 (rev b1)
00:11.0 Class 0600: 8086:25f1 (rev b1)
00:13.0 Class 0600: 8086:25f3 (rev b1)
00:15.0 Class 0600: 8086:25f5 (rev b1)
00:16.0 Class 0600: 8086:25f6 (rev b1)
00:1d.0 Class 0c03: 8086:2688 (rev 09)
00:1d.1 Class 0c03: 8086:2689 (rev 09)
00:1d.2 Class 0c03: 8086:268a (rev 09)
00:1d.7 Class 0c03: 8086:268c (rev 09)
00:1e.0 Class 0604: 8086:244e (rev d9)
00:1f.0 Class 0601: 8086:2670 (rev 09)
00:1f.1 Class 0101: 8086:269e (rev 09)
00:1f.2 Class 0106: 8086:2681 (rev 09)
00:1f.3 Class 0c05: 8086:269b (rev 09)
01:00.0 Class 0604: 8086:3500 (rev 01)
01:00.3 Class 0604: 8086:350c (rev 01)
02:00.0 Class 0604: 8086:3510 (rev 01)
02:02.0 Class 0604: 8086:3518 (rev 01)
04:00.0 Class 0200: 8086:1096 (rev 01)
04:00.1 Class 0200: 8086:1096 (rev 01)
05:01.0 Class 0104: 13c1:1003
06:00.0 Class 0604: 8086:0329 (rev 09)
06:00.2 Class 0604: 8086:032a (rev 09)
09:00.0 Class 0604: 8086:0329 (rev 09)
09:00.2 Class 0604: 8086:032a (rev 09)
0c:01.0 Class 0300: 1002:515e (rev 02)
== END lspci -nn ==

== BEGIN lsusb ==
Bus 004 Device 001: ID 0000:0000
Bus 003 Device 001: ID 0000:0000
Bus 002 Device 033: ID 04f2:0112 Chicony Electronics Co., Ltd KU-8933 Keyboard with PS/2 Mouse port
Bus 002 Device 001: ID 0000:0000
Bus 001 Device 001: ID 0000:0000
== END lsusb ==

== BEGIN rpm -qa kmod\* kmdl\* ==
== END rpm -qa kmod\* kmdl\* ==

== BEGIN ifconfig -a ==
eth0 Link encap:Ethernet HWaddr 00:30:48:79:5D:FC
inet addr:19********50 Bcast:19*********55 Mask:255.255.255.0
inet6 addr: fe80::230:48ff:fe79:5dfc/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:7050401479 errors:0 dropped:0 overruns:0 frame:0
TX packets:3854296210 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:7954246396358 (7.2 TiB) TX bytes:2527274825651 (2.2 TiB)
Base address:0x2000 Memory:da000000-da020000

eth1 Link encap:Ethernet HWaddr 00:3********5D:FD
inet addr:1********00 Bcast:1********55 Mask:255.255.255.0
inet6 addr: fe*****************5dfd/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:21054902 errors:0 dropped:0 overruns:0 frame:0
TX packets:1391077 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:1487260918 (1.3 GiB) TX bytes:124717546 (118.9 MiB)
Base address:0x2020 Memory:da020000-da040000

lo Link encap:Local Loopback
inet addr:********.1 Mask:255.0.0.0
inet6 addr: ::1/128 Scope:Host
UP LOOPBACK RUNNING MTU:16436 Metric:1
RX packets:6796368343 errors:0 dropped:0 overruns:0 frame:0
TX packets:6796368343 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:1079369857796 (1005.2 GiB) TX bytes:1079369857796 (1005.2 GiB)

sit0 Link encap:IPv6-in-IPv4
NOARP MTU:1480 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:0
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

tun0 Link encap:UNSPEC HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
inet addr:10.8.0.1 P-t-P:10.8.0.2 Mask:255.255.255.255
UP POINTOPOINT RUNNING NOARP MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:100
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

vmnet8 Link encap:Ethernet HWaddr 00:50:56:C0:00:08
inet addr:1***********.1 Bcast:1**********55 Mask:255.255.255.0
inet6 addr: fe80::250:56ff:fec0:8/64 Scope:Link
UP BROADCAST RUNNING MULTICAST MTU:1500 Metric:1
RX packets:0 errors:0 dropped:0 overruns:0 frame:0
TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
collisions:0 txqueuelen:1000
RX bytes:0 (0.0 b) TX bytes:0 (0.0 b)

== END ifconfig -a ==

== BEGIN brctl show ==
./getinfo.sh: line 87: brctl: command not found
== END brctl show ==

== BEGIN route -n ==
Kernel IP routing table
Destination Gateway Genmask Flags Metric Ref Use Iface
255.2********.255 0.0.0.0 255.255.255.255 UH 0 0 0 eth0
1******2 0.0.0.0 255.255.255.255 UH 0 0 0 tun0
1********00 192.168.1.250 255.255.255.255 UGH 0 0 0 eth0
1********0 0.0.0.0 255.255.255.0 U 0 0 0 eth1
1*******0 10.8.0.2 255.255.255.0 UG 0 0 0 tun0
19********.0 0.0.0.0 255.255.255.0 U 0 0 0 eth0
19********0 0.0.0.0 255.255.255.0 U 0 0 0 vmnet8
1*********0 0.0.0.0 255.255.0.0 U 0 0 0 eth1
224.0.0.0 0.0.0.0 240.0.0.0 U 0 0 0 eth0
0.0.0.0 10.10.3.254 0.0.0.0 UG 0 0 0 eth1
== END route -n ==

== BEGIN cat /etc/resolv.conf ==
#
# Do NOT Edit (generated by dbreport)
#
# Public side resolver configuration file
#
nameserver 127.0.0.1
== END cat /etc/resolv.conf ==

== BEGIN grep net /etc/nsswitch.conf ==
#networks: ldap [NOTFOUND=return] files
netmasks: files
networks: files
netgroup: files nis
== END grep net /etc/nsswitch.conf ==

== BEGIN chkconfig --list | grep -Ei 'network|wpa' ==
NetworkManager 0:off 1:off 2:off 3:off 4:off 5:off 6:off
network 0:off 1:off 2:on 3:on 4:on 5:on 6:off
== END chkconfig --list | grep -Ei 'network|wpa' ==

[/code]

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Re: software RAID-1 data recovery

Post by pschaff » 2012/02/10 16:00:01

This system does seem to have the disks connected. Your RAID is apparently using the raw devices /dev/sda and /dev/sdb rather than partitions, but then the blkid result [b]/dev/sda3: TYPE="swap" [/b] is quite odd. What does "cat /proc/swaps" show? (If that works for EL4 - can't remember and don't have it running.)

At any rate, to work on data recovery the devices still need to be unmounted, but as /proc/mdstat shows the RAID as active with both devices present, they are probably in sync and the chances of recovering data are slim.
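For reading /proc/mdstat at a glance, here is a minimal sketch that boils each array down to one line. It assumes the two-line layout shown in the output above (an "mdN : state level members" line followed by a "blocks" line ending in the member counts); md_summary is a hypothetical helper name.

```shell
# Summarize arrays from an mdstat-format file (default: /proc/mdstat).
# Prints: name state level [configured/active] [per-member flags]
# md_summary is a hypothetical helper name, not part of mdadm.
md_summary() {
    awk '
        /^md[0-9]+ :/ { name = $1; state = $3; level = $4; next }
        name != "" && /blocks/ {
            # The last two fields are e.g. "[2/2]" and "[UU]".
            print name, state, level, $(NF-1), $NF
            name = ""
        }
    ' "${1:-/proc/mdstat}"
}

if [ -r /proc/mdstat ]; then
    md_summary
fi
```

A "U" stands for a member the kernel considers up and an underscore for a missing one, so "[2/2] [UU]" matches the "both devices present" reading above.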

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Post by markmh » 2012/02/10 22:00:38

cat /proc/swaps gives (I did this from the log in node, because I am pretty sure it doesn't matter which node I am on to look at this file):

[code]
Filename Type Size Used Priority
/dev/sda3 partition 1020116 534264 -1
[/code]

TrevorH
Forum Moderator
Posts: 22798
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: software RAID-1 data recovery

Post by TrevorH » 2012/02/10 22:48:15

[quote]
I did this from the log in node, because I am pretty sure it doesn't matter which node I am on to look at this file
[/quote]

You seem quite confused about the relationship that all these machines have to each other. It does matter which one you are on. It matters a lot. Your head node is the machine with the problem with its RAID array, and it is that machine that you need to deal with and that one only. The others matter only because they are NFS clients of the head node: they are using parts of the RAID array over the network, and this will stop you from being able to manipulate the underlying devices. It is very possible that in order to fix this, you may need to shut down all the other machines while you deal with the real problem, but you shouldn't need to do that until you have a clear grasp of what the problem is and how you are going to fix it.

And, BTW, files in /proc are specific to the machine in question. Each machine running Linux has a /proc filesystem of its own, and it reflects the state of that Linux system in particular.
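One way to avoid mixing up nodes when posting output is to label every capture with the name of the machine it came from. This is only a sketch; run_labelled is a made-up helper name, and it relies on nothing more than uname -n.

```shell
# Run a command and prefix its output with the node name and the
# command line, so pasted output shows which machine produced it.
# run_labelled is a hypothetical helper name.
run_labelled() {
    printf '== %s: %s ==\n' "$(uname -n)" "$*"
    "$@"
}

# Example: the swap areas of the machine this is run on.
if [ -r /proc/swaps ]; then
    run_labelled cat /proc/swaps
fi
```

Run on the login node and on the head node, the header line makes the difference between the two /proc/swaps results obvious.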

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Post by markmh » 2012/02/11 20:20:35

Sorry, I didn't realize (but maybe should have) that the files would be different. I was hoping they wouldn't be, because I again have to wait until Monday to get access to the machine. Thanks for your patience.

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Post by markmh » 2012/02/13 21:54:49

OK, here is /proc/swaps from the head node:

[code]
Filename Type Size Used Priority
/dev/sdc3 partition 4192956 136 0
[/code]

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Re: software RAID-1 data recovery

Post by pschaff » 2012/02/14 19:11:02

At least that is apparently not one of the RAID disks, but with all the confusion about what node is what, I really don't know where things stand with this thread. How about taking a step back, briefly summarizing what you now understand, and succinctly posing any remaining open questions?

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Post by markmh » 2012/02/27 18:34:18

Thanks for your help so far, Phil and Trevor.

In summary, my problem is the following: I want to try to recover the data from a software RAID on a cluster running CentOS 4.4. So far I know that the cluster has 25 nodes. One of them (with the address 192.168.1.250) has the two disks mounted at /compute. The other nodes access it through NFS.

Unfortunately, I do not have enough experience to do it alone. The administrator of the cluster is no longer available. The data that we are missing is my colleagues' work and mine. Here is an update.

We unmounted /compute by using umount -l.

1) The analysis of the two disks composing the RAID1 shows one disk is broken.

Command: /sbin/mdadm --examine /dev/sda

[code]
/dev/sda:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 89cd271b:f66ca0cc:0d4ef2da:9877d7ef
  Creation Time : Tue Jan 13 12:03:05 2009
     Raid Level : raid1
    Device Size : 1465138496 (1397.26 GiB 1500.30 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Sun Jan 15 16:49:07 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : c247fd2d - correct
         Events : 0.27518273


      Number   Major   Minor   RaidDevice State
this     0       8        0        0      active sync   /dev/sda
   0     0       8        0        0      active sync   /dev/sda
   1     1       8       16        1      active sync   /dev/sdb
[/code]


Command: /sbin/mdadm --examine /dev/sdb

[code]
mdadm: No super block found on /dev/sdb (Expected magic a92b4efc, got 20203934)
[/code]

So it looks like the sdb disk is broken.

2) We have tried to mount the sda disk by itself, but again there was a message saying that it is busy, and it did not work. It seems that this is because the RAID is still running.

So what we are trying to figure out now is a way to access the data on the sda disk. Is there a way to recover the data with only one disk? Is there a way to mount it by itself?
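To compare the two --examine results without reading them by eye, a small filter like the following might do. It is a sketch that only looks for the strings visible in the output above; examine_verdict is a hypothetical helper name.

```shell
# Classify the text produced by "mdadm --examine <device>", read on stdin.
# Prints "superblock state=<state>", "no-superblock" or "unknown".
# examine_verdict is a hypothetical helper name, not part of mdadm.
examine_verdict() {
    awk '
        /No super block found/  { verdict = "no-superblock" }
        /^[ \t]*Magic[ \t]*:/   { verdict = "superblock" }
        /^[ \t]*State[ \t]*:/   { state = $3 }
        END {
            if (verdict == "") verdict = "unknown"
            if (verdict == "superblock" && state != "")
                verdict = verdict " state=" state
            print verdict
        }
    '
}

# Usage on the head node, as root (mdadm reports the error on stderr,
# hence the redirect); not run here:
#   /sbin/mdadm --examine /dev/sda 2>&1 | examine_verdict
#   /sbin/mdadm --examine /dev/sdb 2>&1 | examine_verdict
```

If sda really is the only member with a valid superblock, the usual route (an assumption about this setup, not something tested here) is to stop the running array with mdadm --stop /dev/md0, assemble it degraded from the one good member with mdadm --assemble --run /dev/md0 /dev/sda, and mount the result read-only with mount -o ro, preferably after taking an image of the disk.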

If you want info about the system, just ask; I have prepared a list of all the info you have asked me to collect from the head node.
Thanks.
