ext4 file system inconsistency

ohw0571
Posts: 127
Joined: 2008/10/05 12:24:17

Post by ohw0571 » 2013/08/02 15:05:26

Hello,

During the past three weeks, I have encountered an alarming issue on two of my EL6 machines (a laptop running RHEL 6.4 and a workstation with CentOS 6.4).
A routine fs check (the one carried out every six months) brought up an "unexpected inconsistency" message for the partition holding my home directory; manual fsck'ing revealed multiple issues to be fixed. Initially, I suspected a problem with the hard drive (although there were no messages in the syslog indicating communication errors), so it was replaced by a brand new one. But today (about three weeks later), when I forced a file system check during a reboot, the new partition was again reported to be inconsistent. For all I know, there have not been any power outages during those last weeks. I should also mention that, while this machine was re-installed with CentOS 6 half a year ago, it has been rock-solid with CentOS 5 for years.
I found similar errors on my laptop (during a forced fsck, prompted by the problem on the other machine), so these also must have occurred over the last six months.

The errors are of this kind:
"end of extent exceeds allowed value"
"i_blocks is ..., should be ..."
"block bitmap differences"
"free blocks count wrong"

From these observations, hardware failures do not seem very likely to me. In all three instances of the problem, the home partition was affected while all others were fine - this is probably significant. Is there any indication that ext4 on EL 6.x might be somewhat prone to corruption, at least under certain circumstances? I remember there was a bug some time ago, but this should have been fixed...
Normal applications should not have direct access to the hard drive, so they should - in principle - never corrupt a file system. What the two affected machines have in common is that they are running vmware player with the home directory as a shared folder.

Any ideas what might be causing this problem?

Thanks in advance
Oliver

ohw0571
Posts: 127
Joined: 2008/10/05 12:24:17

Re: ext4 file system inconsistency

Post by ohw0571 » 2013/09/07 10:18:26

There is some rather worrying news on this subject.

I can now confirm that this ext4 file system corruption on EL 6.4 occurs when a vmware virtual machine (windows guest) has been launched. I have observed the problem on all three of my machines running vmware, which have quite similar configurations:

- CentOS 6.4 or RHEL 6.4
- latest vmware versions (5.0.2 and 6.0.0)
- WinXP or Win7 guest OS

:idea: It is important to note that corruption of the host file system occurs silently :idea:
I.e., you will not notice any problem when working with either the guest or the host OS, and the file system is reported as "clean" during boot!
The errors become apparent only when you initiate an explicit fsck.

From what I have seen, there are strong indications of a really severe problem, which makes recent vmware versions *unusable* under EL 6.4. :-o The problem seems to have appeared during the first half of 2013, and thus may be correlated with the upgrade to EL 6.4, or with the recent vmware versions.
If I were to formulate a hypothesis, I'd suspect that one of the vmware kernel modules interferes with the ext4 driver. Interestingly, if you look into the compatibility lists on the vmware pages, EL6 as a host OS is listed up to 6.3, but 6.4 is missing! Not sure whether this is meaningful, or whether they just forgot to update the page...

Anyway, in order to track this down, it would be really important to know if anybody else is seeing the same problem. I therefore would like to ask people running vmware under EL 6.4 to check their ext4 partitions (e.g. "touch /forcefsck"; reboot) and report their results (positive or negative).
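For anyone who would rather not reboot, a forced read-only check on an unmounted partition can be sketched roughly as below. This is only an illustration: /dev/sdXN is a placeholder, the helper names are made up for this example, and the status table follows the exit codes listed in the e2fsck man page. `e2fsck -f -n` forces a check even if the file system is marked clean and answers "no" to every repair prompt, so nothing is modified.

```python
import subprocess

# Exit status bits as documented in the e2fsck(8) man page.
FSCK_STATUS_BITS = {
    1: "file system errors corrected",
    2: "system should be rebooted",
    4: "file system errors left uncorrected",
    8: "operational error",
    16: "usage or syntax error",
    32: "check canceled by user request",
    128: "shared library error",
}


def build_check_command(device):
    """Return a forced, read-only e2fsck invocation for `device`."""
    # -f: check even if the file system is marked clean
    # -n: open read-only and answer "no" to every question
    return ["e2fsck", "-f", "-n", device]


def describe_exit_status(status):
    """Decode an e2fsck exit status bitmask into readable messages."""
    if status == 0:
        return ["no errors"]
    return [text for bit, text in FSCK_STATUS_BITS.items() if status & bit]


def check_device(device):
    """Run the check (needs root and an unmounted device) and decode the result."""
    result = subprocess.run(build_check_command(device))
    return describe_exit_status(result.returncode)
```

With `-n`, a corrupted partition should come back with the "errors left uncorrected" bit set, which makes positive and negative reports easy to compare.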

Thanks in advance
Oliver

AlanBartlett
Forum Moderator
Posts: 9345
Joined: 2007/10/22 11:30:09
Location: ~/Earth/UK/England/Suffolk

ext4 file system inconsistency

Post by AlanBartlett » 2013/09/07 17:19:12

I have now made this thread '[i]sticky[/i]' so that it receives the maximum exposure.

YBellefeuille
Posts: 319
Joined: 2012/03/06 22:30:17
Location: Ottawa

Re: ext4 file system inconsistency

Post by YBellefeuille » 2013/09/08 02:59:05

VirtualBox also gives a similar warning:

[quote]
The host I/O cache for at least one controller is disabled and the medium [...] for this VM is located on an ext4 partition. There is a known Linux kernel bug which can lead to the corruption of the virtual disk image under these conditions.

Either enable the host I/O cache permanently in the VM settings or put the disk image and the snapshot folder onto a different file medium.

The host I/O cache will now be enabled for this medium.[/quote]

jlehtone
Posts: 4531
Joined: 2007/12/11 08:17:33
Location: Finland

Re: ext4 file system inconsistency

Post by jlehtone » 2013/09/11 19:20:57

Is this the essence of the issue:

[i]When application X has file Y open from ext4 filesystem, the filesystem can become corrupted.[/i]

VMware and VirtualBox are examples of X, because they use a special way to keep a file open.

Is Y an image that holds the VM's filesystem? (What is a "vmware shared folder"?)

Put another way, would this show up at all if the VM were on a dedicated "raw" LVM/SAN volume?

ohw0571
Posts: 127
Joined: 2008/10/05 12:24:17

Re: ext4 file system inconsistency

Post by ohw0571 » 2013/09/15 14:34:11

@ jlehtone
Not sure whether you are referring to a specific property of ext4 but yes, I'm working with hard disk files (this is the default way - and thus should be the most thoroughly tested I guess).
The "shared folder" mechanism is vmware's way to give the VM access to files on the host (not sure about the technical details, but it's conceptually similar to a samba server running on the host). Originally, I suspected that to be a problem, but in the meantime I found that corruption also occurs with shared folders disabled.

A workaround seems to be to use a different file system:
After re-creating the fs as ext3 on one of the affected machines, I did not observe any corruption!

This seems to support the assumption that it is either an ext4 bug or an incompatibility of vmware with ext4, at least the version shipped with EL 6.4.

gameavatarnew
Posts: 1
Joined: 2013/09/28 02:46:08

Re: ext4 file system inconsistency

Post by gameavatarnew » 2013/09/28 02:55:56

I can now confirm that this ext4 file system corruption on EL 6.4 occurs when a vmware virtual machine (windows guest) has been launched. I have observed the problem on all three of my machines running vmware, which have quite similar configurations:

- CentOS 6.4 or RHEL 6.4
- Win8
Thanks.

jetserver
Posts: 1
Joined: 2014/06/10 11:35:59

Re: ext4 file system inconsistency

Post by jetserver » 2014/06/10 11:49:27

Hi,

Bumping this issue in case someone has more information about it.

We are running a clustered vmware environment (4.1 / 5.5) with 100+ virtual servers, most of them CentOS 6.x on journaled ext4 file systems.
Most of them are up to date with latest release & kernel.

Here is the scenario -

1. Server is hard booted (press the "reset" button)
2. Server is copied (turned off -> copy machine -> turned on).

90% of the time, when the server comes back from either of these two actions, we get a "file system consistency" error and are forced to run fsck.

Now, if we don't run fsck but instead turn off the server -> migrate it to another host on the cluster -> turn it on, the system throws an error about open files or something like that, auto-fixes it and continues booting as normal.

The worst case is when a physical host goes down and another host in the cluster takes control of the orphaned machines and turns them on - they will not boot because they all get stuck on the "file system consistency" error.

After running into this thread (and noticing additional references to this on Google), I started thinking there is a real issue and that it's not just a local problem.
It's also hard to "google" about it, I don't know exactly what to look for :)

At this point - any ideas will be welcomed..

Kind regards,
Eli.

TrevorH
Site Admin
Posts: 33216
Joined: 2009/09/24 10:40:56
Location: Brighton, UK

Re: ext4 file system inconsistency

Post by TrevorH » 2014/06/10 14:58:49

If you press the (virtual equivalent of the) reset button, your system goes away without marking the filesystem clean, so it will run fsck on the next boot and fix up the files that were open when you hit reset.
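To see what the superblock says before the next boot, the relevant fields can be pulled out of `tune2fs -l` output. A rough sketch, assuming the usual "Filesystem state:" and "Filesystem features:" lines; the function name and the sample text are invented for illustration. On a journaled file system, `needs_recovery` in the feature list means the journal has not been replayed yet, which is what you would expect after a reset (it is also set while the file system is mounted).

```python
def summarize_superblock(tune2fs_output):
    """Extract the state and journal-recovery flag from `tune2fs -l` text."""
    state = None
    features = []
    for line in tune2fs_output.splitlines():
        # Split on the first colon only; values may contain colons (timestamps).
        key, sep, value = line.partition(":")
        if not sep:
            continue
        key = key.strip()
        if key == "Filesystem state":
            state = value.strip()
        elif key == "Filesystem features":
            features = value.split()
    return {
        "state": state,
        "needs_recovery": "needs_recovery" in features,
    }


# Invented sample resembling `tune2fs -l /dev/sdXN` output after a hard reset.
SAMPLE = """\
Filesystem volume name:   <none>
Filesystem state:         clean
Filesystem features:      has_journal ext_attr dir_index filetype needs_recovery extent
"""

summary = summarize_superblock(SAMPLE)
```

Here `summary` records both the state string and whether a journal replay is still pending, so a VM that was reset can be told apart from one that was shut down cleanly.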

drk
Posts: 405
Joined: 2014/01/30 20:38:28

Re: ext4 file system inconsistency

Post by drk » 2014/06/10 16:01:46

[quote]At this point - any ideas will be welcomed..[/quote]
Yes, quit resetting the running VMs. Shut them down first.
