software RAID-1 data recovery

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/02/06 21:30:03

Sorry. The full ./getinfo output is:



Information for general problems.


== BEGIN uname -rmi ==
2.6.9-78.ELsmp x86_64 x86_64
== END   uname -rmi ==

== BEGIN rpm -qa \*-release\* ==
centos-release-4-4.2
rpmforge-release-0.3.6-1.el4.rf
== END   rpm -qa \*-release\* ==

== BEGIN cat /etc/redhat-release ==
CentOS release 4.4 (Final)
== END   cat /etc/redhat-release ==

== BEGIN getenforce ==
Disabled
== END   getenforce ==

== BEGIN free -m ==
             total       used       free     shared    buffers     cached
Mem:          4903       3093       1809          0        114       1864
-/+ buffers/cache:       1114       3788
Swap:          996        521        474
== END   free -m ==

== BEGIN rpm -qa yum\* rpm-\* python | sort ==
python-2.3.4-14.2
rpm-build-4.3.3-18_nonptl
rpm-libs-4.3.3-18_nonptl
rpm-libs-4.3.3-18_nonptl
rpm-python-4.3.3-18_nonptl
yum-2.0.7-1
== END   rpm -qa yum\* rpm-\* python | sort ==

== BEGIN ls /etc/yum.repos.d ==
CentOS-Base.repo
CentOS-Media.repo
mirrors-rpmforge
rpmforge.repo
== END   ls /etc/yum.repos.d ==

== BEGIN cat /etc/yum.conf ==
[main]
cachedir=/var/cache/yum
debuglevel=2
logfile=/var/log/yum.log
pkgpolicy=newest
distroverpkg=redhat-release
tolerant=1
exactarch=1

[base]
name=Red Hat Linux $releasever - $basearch - Base
baseurl=http://mirror.dulug.duke.edu/pub/yum-repository/redhat/$releasever/$basearch/


[updates]
name=Red Hat Linux $releasever - Updates
baseurl=http://mirror.dulug.duke.edu/pub/yum-repository/redhat/updates/$releasever/


== END   cat /etc/yum.conf ==

== BEGIN yum repolist all ==

    Usage:  yum [options] <update | upgrade | install | info | remove | list |
            clean | provides | search | check-update | groupinstall | groupupdate |
            grouplist >

         Options:
          -c [config file] - specify the config file to use
          -e [error level] - set the error logging level
          -d [debug level] - set the debugging level
          -y answer yes to all questions
          -t be tolerant about errors in package commands
          -R [time in minutes] - set the max amount of time to randomly run in.
          -C run from cache only - do not update the cache
          --installroot=[path] - set the install root (default '/')
          --version - output the version of yum
          --exclude=some_pkg_name - packagename to exclude - you can use
            this more than once
          --download-only - only download packages - do not run the transaction
          -h, --help this screen

== END   yum repolist all ==

== BEGIN egrep 'include|exclude' /etc/yum.repos.d/*.repo ==
== END   egrep 'include|exclude' /etc/yum.repos.d/*.repo ==

== BEGIN sed -n -e "/^\[/h; /priority *=/{ G; s/\n/ /; s/ity=/ity = /; p }" /etc/yum.repos.d/*.repo | sort -k3n ==
== END   sed -n -e "/^\[/h; /priority *=/{ G; s/\n/ /; s/ity=/ity = /; p }" /etc/yum.repos.d/*.repo | sort -k3n ==

== BEGIN cat /etc/fstab ==
# This file is edited by fstab-sync - see 'man fstab-sync' for details
LABEL=/                 /                       ext3    defaults        1 1
none                    /dev/pts                devpts  gid=5,mode=620  0 0
none                    /dev/shm                tmpfs   noexec,nosuid        0 0
none                    /proc                   proc    defaults        0 0
LABEL=/state/partition  /state/partition1       ext3    nosuid,defaults        1 2
none                    /sys                    sysfs   defaults        0 0
LABEL=/var              /var                    ext3    defaults        1 2
/dev/sda3               swap                    swap    defaults        0 0
192.***.*.250:/export/home      /home           nfs     nosuid,intr,rsize=32768,wsize=32768,noatime 0 0
192.***.*.250:/compute /compute nfs rsize=32768,wsize=32768,noatime,nosuid,nodev,intr 0 0
/dev/hdc                /media/cdrom            auto    pamconsole,exec,noauto,managed 0 0
/dev/fd0                /media/floppy           auto    pamconsole,exec,noauto,managed 0 0
== END   cat /etc/fstab ==

== BEGIN df -h ==
Filesystem            Size  Used Avail Use% Mounted on
/dev/sda1             7.7G  3.4G  4.0G  46% /
none                  2.4G     0  2.4G   0% /dev/shm
/dev/sda5              55G   84M   53G   1% /state/partition1
/dev/sda2             3.9G  2.7G 1012M  73% /var
192.***.*.250:/export/home
                      258G  242G  2.8G  99% /home
192.***.*.250:/compute
                      1.4T  1.1T  281G  79% /compute
== END   df -h ==

== BEGIN blkid ==
/dev/sda1: LABEL="/" UUID="ff0a5c95-1ba8-4b17-9821-0c09fc97f43e" SEC_TYPE="ext3" TYPE="ext2"
/dev/sda2: LABEL="/var" UUID="2a2c2be9-0a4e-41cc-9f04-1fd72a935b93" SEC_TYPE="ext3" TYPE="ext2"
/dev/sda3: TYPE="swap"
/dev/sda5: LABEL="/state/partition" UUID="2864d45c-622b-4eae-8396-ce36d0c7c57e" SEC_TYPE="ext3" TYPE="ext2"
== END   blkid ==

== BEGIN cat /proc/mdstat ==
Personalities :
unused devices: <none>
== END   cat /proc/mdstat ==

== BEGIN rpm -qa kernel\* | sort ==
kernel-2.6.9-42.0.2.EL
kernel-devel-2.6.9-42.0.2.EL
kernel-doc-2.6.9-42.0.2.EL
kernel-ib-1.0-1
kernel-smp-2.6.9-42.0.2.EL
kernel-smp-2.6.9-78.EL
kernel-smp-devel-2.6.9-42.0.2.EL
kernel-smp-devel-2.6.9-78.EL
kernel-utils-2.4-14.1.117
== END   rpm -qa kernel\* | sort ==

== BEGIN lspci -nn ==
00:06.0 Class 0604: 1022:7460 (rev 07)
00:07.0 Class 0601: 1022:7468 (rev 05)
00:07.1 Class 0101: 1022:7469 (rev 03)
00:07.3 Class 0680: 1022:746b (rev 05)
00:0a.0 Class 0604: 1022:7450 (rev 12)
00:0a.1 Class 0800: 1022:7451 (rev 01)
00:0b.0 Class 0604: 1022:7450 (rev 12)
00:0b.1 Class 0800: 1022:7451 (rev 01)
00:18.0 Class 0600: 1022:1100
00:18.1 Class 0600: 1022:1101
00:18.2 Class 0600: 1022:1102
00:18.3 Class 0600: 1022:1103
00:19.0 Class 0600: 1022:1100
00:19.1 Class 0600: 1022:1101
00:19.2 Class 0600: 1022:1102
00:19.3 Class 0600: 1022:1103
01:00.0 Class 0c03: 1022:7464 (rev 0b)
01:00.1 Class 0c03: 1022:7464 (rev 0b)
01:05.0 Class 0300: 1023:9880 (rev 3a)
02:02.0 Class 0200: 14e4:1648 (rev 03)
02:02.1 Class 0200: 14e4:1648 (rev 03)
02:04.0 Class 0100: 1000:0030 (rev 08)
== END   lspci -nn ==

== BEGIN lsusb ==
Bus 002 Device 001: ID 0000:0000
Bus 001 Device 001: ID 0000:0000
== END   lsusb ==

== BEGIN rpm -qa kmod\* kmdl\* ==
== END   rpm -qa kmod\* kmdl\* ==

== BEGIN ifconfig -a ==
eth0      Link encap:Ethernet  HWaddr 00:09:3D:12:1F:A3
          inet addr:192.***.*.25  Bcast:192.***.*.255  Mask:255.255.255.0
          inet6 addr: fe***********2:1fa3/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:1021314388 errors:0 dropped:0 overruns:0 frame:0
          TX packets:386552887 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:653012841839 (608.1 GiB)  TX bytes:242005503401 (225.3 GiB)
          Interrupt:185

eth1      Link encap:Ethernet  HWaddr 00:09:3D:12:1F:A4
          inet addr:10******3  Bcast: *******55  Mask:255.255.255.0
          inet6 addr: fe****************4 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:125196523 errors:0 dropped:0 overruns:0 frame:0
          TX packets:158909688 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:37651640392 (35.0 GiB)  TX bytes:166544974196 (155.1 GiB)
          Interrupt:193

lo        Link encap:Local Loopback
          inet addr:1******.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:2834894 errors:0 dropped:0 overruns:0 frame:0
          TX packets:2834894 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:579591118 (552.7 MiB)  TX bytes:579591118 (552.7 MiB)

sit0      Link encap:IPv6-in-IPv4
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

tun0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00
          inet addr:1******1  P-t-P:1*******2  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:83128653 errors:0 dropped:0 overruns:0 frame:0
          TX packets:132347077 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:26625080333 (24.7 GiB)  TX bytes:152690283974 (142.2 GiB)

== END   ifconfig -a ==



== BEGIN brctl show ==
./getinfo.sh: line 87: brctl: command not found
== END   brctl show ==

== BEGIN route -n ==
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
255.255.255.255 0.0.0.0         255.255.255.255 UH    0      0        0 eth0
10.*.*.*        0.0.0.0         255.255.255.255 UH    0      0        0 tun0
10.**.*.200     192.168.1.250   255.255.255.255 UGH   0      0        0 eth0
10.****.0       0.0.0.0         255.255.255.0   U     0      0        0 eth1
10****.0        10.8.0.2        255.255.255.0   UG    0      0        0 tun0
192******.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
16******0     0.0.0.0         255.255.0.0     U     0      0        0 eth1
2*****.0       0.0.0.0         240.0.0.0       U     0      0        0 eth0
0.0.0.0         10.10.3.254     0.0.0.0         UG    0      0        0 eth1
== END   route -n ==

== BEGIN cat /etc/resolv.conf ==
#
# Do NOT Edit (generated by dbreport)
#
# Private side resolver configuration file
#
nameserver 19*******250
#search local bgsu.edu

== END   cat /etc/resolv.conf ==

== BEGIN grep net /etc/nsswitch.conf ==
#networks:  ldap [NOTFOUND=return] files
netmasks:   files
networks:   files
netgroup:   files
== END   grep net /etc/nsswitch.conf ==

== BEGIN chkconfig --list | grep -Ei 'network|wpa' ==
network         0:off   1:off   2:on    3:on    4:on    5:on    6:off
== END   chkconfig --list | grep -Ei 'network|wpa' ==







showmount -a gives:

All mount points on lcpp-cluster.bgsu.edu:
192.168.1.0/255.255.255.0:/compute
192.168.1.0/255.255.255.0:/diskless/x64/RHEL4-AS/root
192.168.1.0/255.255.255.0:/export
192.168.1.10:192.168.1.0/255.255.255.0
192.168.1.11:192.168.1.0/255.255.255.0
192.168.1.12:192.168.1.0/255.255.255.0
192.168.1.13:192.168.1.0/255.255.255.0
192.168.1.14:192.168.1.0/255.255.255.0
192.168.1.15:192.168.1.0/255.255.255.0
192.168.1.16:192.168.1.0/255.255.255.0
192.168.1.17:192.168.1.0/255.255.255.0
192.168.1.18:192.168.1.0/255.255.255.0
192.168.1.19:192.168.1.0/255.255.255.0
192.168.1.1:192.168.1.0/255.255.255.0
192.168.1.20:192.168.1.0/255.255.255.0
192.168.1.21:192.168.1.0/255.255.255.0
192.168.1.22:192.168.1.0/255.255.255.0
192.168.1.23:192.168.1.0/255.255.255.0
192.168.1.24:192.168.1.0/255.255.255.0
192.168.1.251:192.168.1.0/255.255.255.0
192.168.1.25:192.168.1.0/255.255.255.0
192.168.1.2:192.168.1.0/255.255.255.0
192.168.1.3:192.168.1.0/255.255.255.0
192.168.1.4:192.168.1.0/255.255.255.0
192.168.1.5:192.168.1.0/255.255.255.0
192.168.1.6:192.168.1.0/255.255.255.0
192.168.1.7:192.168.1.0/255.255.255.0
192.168.1.8:192.168.1.0/255.255.255.0
192.168.1.9:192.168.1.0/255.255.255.0

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Re: software RAID-1 data recovery

Postby pschaff » 2012/02/06 23:11:59

You still need to update.

The script output shows that the filesystem in question is mounted via NFS, which in retrospect I/we should have seen earlier. You need to work with the RAID recovery on the system to which it is physically connected, not on the NFS client. Remove the filesystem[s] on the RAID from /etc/exports on the server (assuming it is also CentOS), make sure all clients unmount it, and do "exportfs -r". The requested script information should have been provided for the server and not the client. You also need to run as root to get fdisk results.
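
The client/server split described above can be checked and acted on roughly as follows. This is a minimal sketch, not a tested procedure for this cluster: nfs_mounts and disable_export are hypothetical helper names, and the sketch assumes GNU sed (for -i) and that each export line starts with the exported path.

```shell
# List NFS mounts declared in an fstab-style file (run on any node).
# Prints "source mountpoint" for every entry whose type is nfs.
nfs_mounts() {
    awk '$1 !~ /^#/ && $3 == "nfs" { print $1, $2 }' "${1:-/etc/fstab}"
}

# Comment out the export line for one path in an exports-style file
# (run on the NFS server, as root). A backup is kept as FILE.bak.
disable_export() {
    local path="$1" file="${2:-/etc/exports}"
    cp -p "$file" "$file.bak" &&
        sed -i "s|^${path}[[:space:]]|#&|" "$file"
}

# Typical sequence on the server, once every client has unmounted:
#   disable_export /compute
#   exportfs -r      # re-read /etc/exports
#   showmount -a     # see which clients the server still lists
```

If nfs_mounts prints the filesystem in question, the machine is only a client and the RAID work belongs on the server named in the first column.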

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/02/10 01:56:44

Sorry. Our cluster consists of 25 nodes, and if we update, we should update all for consistency, so it is probably not feasible at this time.

About the ./getinfo script: the last time I sent output, I was logged in remotely, so the data must have been from the login node. Almost all the previous data was obtained when I was physically plugged in to what I believe is the head node. So I am a little confused about whether none of the info I have given was from the server, or whether only the last post was affected.
I realized this possibility after I visited the computer today. Here is what it gave me today, when I was physically logged in to what I believe is the head node. (I am sorry if this data still isn't from the right node; if it isn't, please let me know and I will be sure to log in to the right node next time.)

./getinfo:

Information for general problems.


== BEGIN uname -rmi ==
2.6.9-42.0.2.ELsmp x86_64 x86_64
== END   uname -rmi ==

== BEGIN rpm -qa \*-release\* ==
centos-release-4-4.2
rpmforge-release-0.3.6-1.el4.rf
== END   rpm -qa \*-release\* ==

== BEGIN cat /etc/redhat-release ==
CentOS release 4.4 (Final)
== END   cat /etc/redhat-release ==

== BEGIN getenforce ==
Disabled
== END   getenforce ==

== BEGIN free -m ==
             total       used       free     shared    buffers     cached
Mem:          5966       5939         27          0        108       2992
-/+ buffers/cache:       2838       3127
Swap:         4094          0       4094
== END   free -m ==

== BEGIN rpm -qa yum\* rpm-\* python | sort ==
python-2.3.4-14.2
rpm-build-4.3.3-18_nonptl
rpm-devel-4.3.3-18_nonptl
rpm-libs-4.3.3-18_nonptl
rpm-libs-4.3.3-18_nonptl
rpm-python-4.3.3-18_nonptl
yum-2.0.7-1
== END   rpm -qa yum\* rpm-\* python | sort ==

== BEGIN ls /etc/yum.repos.d ==
CentOS-Base.repo
CentOS-Media.repo
mirrors-rpmforge
rpmforge.repo
== END   ls /etc/yum.repos.d ==

== BEGIN cat /etc/yum.conf ==
[main]
cachedir=/var/cache/yum
debuglevel=2
logfile=/var/log/yum.log
pkgpolicy=newest
distroverpkg=redhat-release
tolerant=1
exactarch=1

[base]
name=Red Hat Linux $releasever - $basearch - Base
baseurl=http://mirror.dulug.duke.edu/pub/yum-repository/redhat/$releasever/$basearch/


[updates]
name=Red Hat Linux $releasever - Updates
baseurl=http://mirror.dulug.duke.edu/pub/yum-repository/redhat/updates/$releasever/


== END   cat /etc/yum.conf ==

== BEGIN yum repolist all ==

    Usage:  yum [options] <update | upgrade | install | info | remove | list |
            clean | provides | search | check-update | groupinstall | groupupdate |
            grouplist >
               
         Options:
          -c [config file] - specify the config file to use
          -e [error level] - set the error logging level
          -d [debug level] - set the debugging level
          -y answer yes to all questions
          -t be tolerant about errors in package commands
          -R [time in minutes] - set the max amount of time to randomly run in.
          -C run from cache only - do not update the cache
          --installroot=[path] - set the install root (default '/')
          --version - output the version of yum
          --exclude=some_pkg_name - packagename to exclude - you can use
            this more than once
          --download-only - only download packages - do not run the transaction
          -h, --help this screen
   
== END   yum repolist all ==

== BEGIN egrep 'include|exclude' /etc/yum.repos.d/*.repo ==
== END   egrep 'include|exclude' /etc/yum.repos.d/*.repo ==

== BEGIN sed -n -e "/^\[/h; /priority *=/{ G; s/\n/ /; s/ity=/ity = /; p }" /etc/yum.repos.d/*.repo | sort -k3n ==
== END   sed -n -e "/^\[/h; /priority *=/{ G; s/\n/ /; s/ity=/ity = /; p }" /etc/yum.repos.d/*.repo | sort -k3n ==

== BEGIN cat /etc/fstab ==
# This file is edited by fstab-sync - see 'man fstab-sync' for details
LABEL=/1                /                       ext3    defaults        1 1
LABEL=/boot1            /boot                   ext3    defaults        1 2
none                    /dev/pts                devpts  gid=5,mode=620  0 0
none                    /dev/shm                tmpfs   defaults        0 0
LABEL=/export           /export                 ext3    defaults,nosuid,nodev        1 2
none                    /proc                   proc    defaults        0 0
none                    /sys                    sysfs   defaults        0 0
LABEL=SWAP-sda3         swap                    swap    pri=0        0 0

# The ram-backed filesystem for ganglia RRD graph databases.
tmpfs /var/lib/ganglia/rrds tmpfs size=1523785000,gid=nobody,uid=nobody,defaults 1 0   
/dev/hda                /media/cdrecorder       auto    pamconsole,exec,noauto,managed 0 0
== END   cat /etc/fstab ==

== BEGIN df -h ==
Filesystem            Size  Used Avail Use% Mounted on
/dev/sdc2              32G   11G   20G  36% /
/dev/sdc1             251M   51M  187M  22% /boot
none                  3.0G     0  3.0G   0% /dev/shm
/dev/sdc5             258G  242G  2.9G  99% /export
tmpfs                 1.5G  9.2M  1.5G   1% /var/lib/ganglia/rrds
/dev/md0              1.4T  1.1T  281G  79% /compute
/export/home/opt      258G  242G  2.9G  99% /home/opt
== END   df -h ==

== BEGIN fdisk -l ==
Disk /dev/sda doesn't contain a valid partition table
Disk /dev/sdb doesn't contain a valid partition table
Disk /dev/md0 doesn't contain a valid partition table

Disk /dev/sda: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes


Disk /dev/sdb: 1500.3 GB, 1500301910016 bytes
255 heads, 63 sectors/track, 182401 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes


Disk /dev/sdc: 319.9 GB, 319977160704 bytes
255 heads, 63 sectors/track, 38901 cylinders
Units = cylinders of 16065 * 512 = 8225280 bytes

   Device Boot      Start         End      Blocks   Id  System
/dev/sdc1   *           1          33      265041   83  Linux
/dev/sdc2              34        4210    33551752+  83  Linux
/dev/sdc3            4211        4732     4192965   82  Linux swap
/dev/sdc4            4733       38901   274462492+   5  Extended
/dev/sdc5            4733       38901   274462461   83  Linux

Disk /dev/md0: 1500.3 GB, 1500301819904 bytes
2 heads, 4 sectors/track, 366284624 cylinders
Units = cylinders of 8 * 512 = 4096 bytes

== END   fdisk -l ==

== BEGIN blkid ==
/dev/sda3: TYPE="swap"
/dev/sdc1: LABEL="/boot1" UUID="5cf75fc9-a55f-4edd-a47b-1798435f5440" SEC_TYPE="ext3" TYPE="ext2"
/dev/sdc2: LABEL="/1" UUID="d540b42f-57a4-4319-bfcf-62c08545be02" SEC_TYPE="ext3" TYPE="ext2"
/dev/sdc3: TYPE="swap"
/dev/sdc5: LABEL="/export" UUID="ac829e90-8da2-48c1-ac1e-b6be5947970f" SEC_TYPE="ext3" TYPE="ext2"
/dev/md0: UUID="79ad8637-4254-4adc-a877-8fef3662ac0f" SEC_TYPE="ext3" TYPE="ext2"
/dev/sda: UUID="79ad8637-4254-4adc-a877-8fef3662ac0f" SEC_TYPE="ext3" TYPE="ext2"
== END   blkid ==

== BEGIN cat /proc/mdstat ==
Personalities : [raid1]
md0 : active raid1 sda[0] sdb[1]
      1465138496 blocks [2/2] [UU]
     
unused devices: <none>
== END   cat /proc/mdstat ==

== BEGIN pvs ==
== END   pvs ==

== BEGIN vgs ==
  No volume groups found
== END   vgs ==

== BEGIN lvs ==
  No volume groups found
== END   lvs ==

== BEGIN rpm -qa kernel\* | sort ==
kernel-2.6.9-42.0.2.EL
kernel-devel-2.6.9-42.0.2.EL
kernel-doc-2.6.9-42.0.2.EL
kernel-ib-1.0-1
kernel-smp-2.6.9-42.0.2.EL
kernel-smp-devel-2.6.9-42.0.2.EL
kernel-utils-2.4-13.1.83
== END   rpm -qa kernel\* | sort ==

== BEGIN lspci -nn ==
00:00.0 Class 0600: 8086:25d8 (rev b1)
00:02.0 Class 0604: 8086:25f7 (rev b1)
00:04.0 Class 0604: 8086:25f8 (rev b1)
00:06.0 Class 0604: 8086:25f9 (rev b1)
00:08.0 Class 0880: 8086:1a38 (rev b1)
00:10.0 Class 0600: 8086:25f0 (rev b1)
00:10.1 Class 0600: 8086:25f0 (rev b1)
00:10.2 Class 0600: 8086:25f0 (rev b1)
00:11.0 Class 0600: 8086:25f1 (rev b1)
00:13.0 Class 0600: 8086:25f3 (rev b1)
00:15.0 Class 0600: 8086:25f5 (rev b1)
00:16.0 Class 0600: 8086:25f6 (rev b1)
00:1d.0 Class 0c03: 8086:2688 (rev 09)
00:1d.1 Class 0c03: 8086:2689 (rev 09)
00:1d.2 Class 0c03: 8086:268a (rev 09)
00:1d.7 Class 0c03: 8086:268c (rev 09)
00:1e.0 Class 0604: 8086:244e (rev d9)
00:1f.0 Class 0601: 8086:2670 (rev 09)
00:1f.1 Class 0101: 8086:269e (rev 09)
00:1f.2 Class 0106: 8086:2681 (rev 09)
00:1f.3 Class 0c05: 8086:269b (rev 09)
01:00.0 Class 0604: 8086:3500 (rev 01)
01:00.3 Class 0604: 8086:350c (rev 01)
02:00.0 Class 0604: 8086:3510 (rev 01)
02:02.0 Class 0604: 8086:3518 (rev 01)
04:00.0 Class 0200: 8086:1096 (rev 01)
04:00.1 Class 0200: 8086:1096 (rev 01)
05:01.0 Class 0104: 13c1:1003
06:00.0 Class 0604: 8086:0329 (rev 09)
06:00.2 Class 0604: 8086:032a (rev 09)
09:00.0 Class 0604: 8086:0329 (rev 09)
09:00.2 Class 0604: 8086:032a (rev 09)
0c:01.0 Class 0300: 1002:515e (rev 02)
== END   lspci -nn ==

== BEGIN lsusb ==
Bus 004 Device 001: ID 0000:0000 
Bus 003 Device 001: ID 0000:0000 
Bus 002 Device 033: ID 04f2:0112 Chicony Electronics Co., Ltd KU-8933 Keyboard with PS/2 Mouse port
Bus 002 Device 001: ID 0000:0000 
Bus 001 Device 001: ID 0000:0000 
== END   lsusb ==

== BEGIN rpm -qa kmod\* kmdl\* ==
== END   rpm -qa kmod\* kmdl\* ==

== BEGIN ifconfig -a ==
eth0      Link encap:Ethernet  HWaddr 00:30:48:79:5D:FC 
          inet addr:19********50  Bcast:19*********55  Mask:255.255.255.0
          inet6 addr: fe80::230:48ff:fe79:5dfc/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:7050401479 errors:0 dropped:0 overruns:0 frame:0
          TX packets:3854296210 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:7954246396358 (7.2 TiB)  TX bytes:2527274825651 (2.2 TiB)
          Base address:0x2000 Memory:da000000-da020000

eth1      Link encap:Ethernet  HWaddr 00:3********5D:FD 
          inet addr:1********00  Bcast:1********55  Mask:255.255.255.0
          inet6 addr: fe*****************5dfd/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:21054902 errors:0 dropped:0 overruns:0 frame:0
          TX packets:1391077 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:1487260918 (1.3 GiB)  TX bytes:124717546 (118.9 MiB)
          Base address:0x2020 Memory:da020000-da040000

lo        Link encap:Local Loopback 
          inet addr:********.1  Mask:255.0.0.0
          inet6 addr: ::1/128 Scope:Host
          UP LOOPBACK RUNNING  MTU:16436  Metric:1
          RX packets:6796368343 errors:0 dropped:0 overruns:0 frame:0
          TX packets:6796368343 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:1079369857796 (1005.2 GiB)  TX bytes:1079369857796 (1005.2 GiB)

sit0      Link encap:IPv6-in-IPv4 
          NOARP  MTU:1480  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:0
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

tun0      Link encap:UNSPEC  HWaddr 00-00-00-00-00-00-00-00-00-00-00-00-00-00-00-00 
          inet addr:10.8.0.1  P-t-P:10.8.0.2  Mask:255.255.255.255
          UP POINTOPOINT RUNNING NOARP MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:0 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:100
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

vmnet8    Link encap:Ethernet  HWaddr 00:50:56:C0:00:08 
          inet addr:1***********.1  Bcast:1**********55  Mask:255.255.255.0
          inet6 addr: fe80::250:56ff:fec0:8/64 Scope:Link
          UP BROADCAST RUNNING MULTICAST  MTU:1500  Metric:1
          RX packets:0 errors:0 dropped:0 overruns:0 frame:0
          TX packets:5 errors:0 dropped:0 overruns:0 carrier:0
          collisions:0 txqueuelen:1000
          RX bytes:0 (0.0 b)  TX bytes:0 (0.0 b)

== END   ifconfig -a ==

== BEGIN brctl show ==
./getinfo.sh: line 87: brctl: command not found
== END   brctl show ==

== BEGIN route -n ==
Kernel IP routing table
Destination     Gateway         Genmask         Flags Metric Ref    Use Iface
255.2********.255 0.0.0.0         255.255.255.255 UH    0      0        0 eth0
1******2        0.0.0.0         255.255.255.255 UH    0      0        0 tun0
1********00     192.168.1.250   255.255.255.255 UGH   0      0        0 eth0
1********0       0.0.0.0         255.255.255.0   U     0      0        0 eth1
1*******0        10.8.0.2        255.255.255.0   UG    0      0        0 tun0
19********.0     0.0.0.0         255.255.255.0   U     0      0        0 eth0
19********0    0.0.0.0         255.255.255.0   U     0      0        0 vmnet8
1*********0     0.0.0.0         255.255.0.0     U     0      0        0 eth1
224.0.0.0       0.0.0.0         240.0.0.0       U     0      0        0 eth0
0.0.0.0         10.10.3.254     0.0.0.0         UG    0      0        0 eth1
== END   route -n ==

== BEGIN cat /etc/resolv.conf ==
#
# Do NOT Edit (generated by dbreport)
#
# Public side resolver configuration file
#
nameserver 127.0.0.1
== END   cat /etc/resolv.conf ==

== BEGIN grep net /etc/nsswitch.conf ==
#networks:  ldap [NOTFOUND=return] files
netmasks:   files
networks:   files
netgroup:   files nis
== END   grep net /etc/nsswitch.conf ==

== BEGIN chkconfig --list | grep -Ei 'network|wpa' ==
NetworkManager    0:off   1:off   2:off   3:off   4:off   5:off   6:off
network           0:off   1:off   2:on   3:on   4:on   5:on   6:off
== END   chkconfig --list | grep -Ei 'network|wpa' ==


pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Re: software RAID-1 data recovery

Postby pschaff » 2012/02/10 16:00:01

This system does seem to have the disks connected. Your RAID is apparently using the raw devices /dev/sda and /dev/sdb rather than partitions, but then the blkid result /dev/sda3: TYPE="swap" is quite odd. What does "cat /proc/swaps" show? (If that works for EL4 - can't remember and don't have it running.)

At any rate, to work on data recovery the devices still need to be unmounted, but as /proc/mdstat shows the RAID as active with both devices present, they are probably in sync and the chances of recovering data are slim.
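
The two facts read off /proc/mdstat here (array active, both members present) can be extracted mechanically. A minimal sketch, assuming the mdstat layout shown in the head node output above; md_status is a hypothetical helper name.

```shell
# Print each md array with its member-status string, e.g. "md0 [UU]".
# Reads an mdstat-format file (default /proc/mdstat).
md_status() {
    awk '/^md[0-9]+ :/          { name = $1 }
         /blocks/ && name != "" { print name, $NF; name = "" }' "${1:-/proc/mdstat}"
}

# Example: md_status
```

For the head node output above this prints "md0 [UU]"; a "_" in place of a "U" would mark a missing or failed member.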

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/02/10 22:00:38

cat /proc/swaps gives (I did this from the login node, because I am pretty sure it doesn't matter which node I am on to look at this file):


Filename                                Type            Size    Used    Priority
/dev/sda3                               partition       1020116 534264  -1

TrevorH
Forum Moderator
Posts: 21161
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: software RAID-1 data recovery

Postby TrevorH » 2012/02/10 22:48:15

"I did this from the login node, because I am pretty sure it doesn't matter which node I am on to look at this file"


You seem quite confused about the relationship that all these machines have to each other. It does matter which one you are on. It matters a lot. Your head node is the machine with the problem with its RAID array, and it is that machine, and only that machine, that you need to deal with. The others matter only because they are NFS clients of the head node: they are using parts of the RAID array over the network, and this will stop you from being able to manipulate the underlying devices. It is very possible that in order to fix this you will need to shut down all the other machines while you deal with the real problem, but you shouldn't need to do that until you have a clear grasp of what the problem is and how you are going to fix it.

And, BTW, files in /proc are specific to the machine in question. Each machine running Linux has a /proc filesystem of its own, which reflects the state of that particular system.
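
One way to avoid this kind of mix-up is to label every pasted file with the name of the machine it was read on. A minimal sketch; show_with_host is a hypothetical helper name, and uname -n is used only because it prints the node name on any Linux system.

```shell
# Print a file preceded by a header naming the machine it was read on,
# so pasted output cannot be attributed to the wrong node.
show_with_host() {
    echo "== $(uname -n): $1 =="
    cat "$1"
}

# Example: show_with_host /proc/swaps
```

Run on the login node and on the head node, the same command gives two differently labelled results, which makes the per-machine nature of /proc visible.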

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/02/11 20:20:35

Sorry, I didn't realize (but maybe should have) that the files would be different. I was hoping they wouldn't be, because I again have to wait until Monday to get access to the machine. Thanks for your patience.

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/02/13 21:54:49

OK, here is /proc/swaps from the head node:


Filename                                Type            Size    Used    Priority
/dev/sdc3                               partition       4192956 136     0

pschaff
Retired Moderator
Posts: 18276
Joined: 2006/12/13 20:15:34
Location: Tidewater, Virginia, North America

Re: software RAID-1 data recovery

Postby pschaff » 2012/02/14 19:11:02

At least that is apparently not one of the RAID disks, but with all the confusion about which node is which, I really don't know where things stand with this thread. How about taking a step back, briefly summarizing what you now understand, and succinctly posing any remaining open questions?
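
The check behind that first observation (is any active swap device on one of the RAID disks?) can be made explicit. A minimal sketch, assuming the /proc/swaps layout shown above; swap_devices and swap_on_disk are hypothetical helper names.

```shell
# Print the device column of a swaps-format file (default /proc/swaps).
swap_devices() {
    awk 'NR > 1 { print $1 }' "${1:-/proc/swaps}"
}

# Succeed if any active swap device lives on the given disk, e.g. "sda".
swap_on_disk() {
    local disk="$1" file="${2:-/proc/swaps}"
    swap_devices "$file" | grep -q "^/dev/${disk}[0-9]*\$"
}

# Example, on the head node:
#   swap_on_disk sda && echo "swap is active on a RAID member"
```

With the head node's /proc/swaps as input, swap_on_disk succeeds for sdc and fails for sda and sdb.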

markmh
Posts: 24
Joined: 2012/01/29 21:57:39

Re: software RAID-1 data recovery

Postby markmh » 2012/02/27 18:34:18

Thanks for your help so far, Phil and Trevor.

In summary, my problem is the following: I want to try to recover the data from a software RAID on a cluster running CentOS 4.4. So far I know that the cluster has 25 nodes. One of them (with the address 192.168.1.250) has the two RAID disks, mounted at /compute. The other nodes access it through NFS.

Unfortunately, I do not have enough experience to do this alone, and the administrator of the cluster is no longer available. The missing data is my colleagues' work and my own. Here is an update.

We unmounted /compute by using umount -l.

1) Examining the two disks that make up the RAID-1 suggests that one disk is broken.

Command: /sbin/mdadm --examine /dev/sda


/dev/sda:
          Magic : a92b4efc
        Version : 00.90.00
           UUID : 89cd271b:f66ca0cc:0d4ef2da:9877d7ef
  Creation Time : Tue Jan 13 12:03:05 2009
     Raid Level : raid1
    Device Size : 1465138496 (1397.26 GiB 1500.30 GB)
   Raid Devices : 2
  Total Devices : 2
Preferred Minor : 0

    Update Time : Sun Jan 15 16:49:07 2012
          State : clean
 Active Devices : 2
Working Devices : 2
 Failed Devices : 0
  Spare Devices : 0
       Checksum : c247fd2d - correct
         Events : 0.27518273


      Number   Major   Minor   RaidDevice State
this     0       8        0        0      active sync   /dev/sda
   0     0       8        0        0      active sync   /dev/sda
   1     1       8       16        1      active sync   /dev/sdb



Command: /sbin/mdadm --examine /dev/sdb


mdadm: No super block found on /dev/sdb (Expected magic a92b4efc, got 20203934)


So it looks like the sdb disk is broken.

2) We tried to mount the sda disk by itself, but the mount failed, again with a message saying that the device is busy. This seems to be because the RAID is still running.
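
Both observations in this list can be checked mechanically: whether a member carries an md superblock, and which array is keeping a device busy. The following is a minimal sketch, assuming the mdadm --examine and /proc/mdstat layouts shown in this thread; examine_summary and md_holder are hypothetical helper names. Note that a missing superblock only means mdadm found no RAID metadata where it looked; it does not by itself prove a hardware fault.

```shell
# Summarise `mdadm --examine` output read from stdin.
# Prints "state=<State> events=<Events>", or "no-superblock".
examine_summary() {
    awk -F' : ' '
        /^ *State :/            { state = $2 }
        /^ *Events :/           { events = $2 }
        /No super block found/  { missing = 1 }
        END {
            if (missing || events == "") print "no-superblock"
            else print "state=" state " events=" events
        }'
}

# Print the md array, if any, that lists DEV (e.g. "sda") as a member.
md_holder() {
    local dev="$1" file="${2:-/proc/mdstat}"
    awk -v dev="$dev" '
        /^md[0-9]+ :/ {
            for (i = 3; i <= NF; i++) {
                member = $i
                sub(/\[.*/, "", member)   # "sda[0]" -> "sda"
                if (member == dev) print $1
            }
        }' "$file"
}

# Usage on the head node, as root:
#   /sbin/mdadm --examine /dev/sda 2>&1 | examine_summary
#   md_holder sda
```

If md_holder sda prints an array name, the device is held by that array, which is consistent with the busy message; the array would have to be stopped (for example with mdadm --stop) before the member itself is free.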

So what we are trying to figure out now is a way to access the data on the sda disk. Is there a way to recover the data with only one disk? Is there a way to mount it by itself?

If you want info about the system, just ask; I have prepared a list of all the info you have asked me to collect from the head node.
Thanks.