Bug#445148: fsck during system boot fails with separate /boot partition

Discussion:

Frans Pop

2007-10-08 10:50:07 UTC

Hi,

We've had an installation report [1] for Debian Etch where a user set up a
separate /boot partition. The first boot into the new system failed because
the fsck on the /boot partition failed:
Checking file systems...fsck 1.40-WIP (14-Nov-2006)
/dev/dasda1 is mounted. e2fsck: Cannot continue, aborting.

However, neither 'mount' nor /proc/mounts shows the /dev/dasda1 partition as
mounted.

I've reproduced the issue and my suspicion is that either the boot loader or
the kernel is failing to release /dev/dasda1 after the kernel and/or initrd
have been loaded which leads to fsck concluding that it is mounted.

Cheers,
Frans Pop

Peter 1 Oberparleiter

2007-10-08 12:00:10 UTC

Post by Frans Pop
I've reproduced the issue and my suspicion is that either the boot loader or
the kernel is failing to release /dev/dasda1 after the kernel and/or initrd
have been loaded which leads to fsck concluding that it is mounted.

Judging from the data found in the bugtracker, this sounds more like a

Post by Frans Pop
Configuration is a 125 cylinder /dev/dasda1, which is to be mounted
as /boot, and a large /dev/dasdb1 which is to be mounted as /.
df: Filesystem 1k-blocks Used Available Use% Mounted on
df: tmpfs 257928 80 257848 0% /dev
df: tmpfs 257928 80 257848 0% /dev
df: tmpfs 257928 80 257848 0% /.dev
df: /dev/dasda1 5781776 557576 4930500 10% /target

df shows that /dev/dasda1 is >5G, while a DASD with 125 cylinders should be
around 80-90MB.
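
As a rough cross-check of the 80-90MB figure, the capacity can be estimated from the cylinder count. This is only a sketch: it assumes 3390-type geometry (15 tracks per cylinder) formatted with 4096-byte blocks (12 blocks per track), neither of which is stated in the installation report, and the function name is made up for illustration.

```python
# Assumptions (not taken from the report): 3390-type DASD, 4096-byte blocks.
TRACKS_PER_CYLINDER = 15
BLOCKS_PER_TRACK = 12
BLOCK_SIZE = 4096


def dasd_capacity_bytes(cylinders):
    """Estimate the formatted capacity of a DASD with the given cylinder count."""
    return cylinders * TRACKS_PER_CYLINDER * BLOCKS_PER_TRACK * BLOCK_SIZE


if __name__ == "__main__":
    size = dasd_capacity_bytes(125)
    print(f"{size} bytes, about {size / 2**20:.1f} MiB")
```

Under these assumptions a 125-cylinder DASD comes out just under 90MB, nowhere near the >5G that df reports.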

Example for a correct setup:

/dev/dasda1 mount as /boot device number 1000
/dev/dasdb1 mount as / device number 1001

->

Kernel command line: dasd=1000-1001 root=/dev/dasdb1

/etc/fstab contents:

/dev/dasda1 /boot
/dev/dasdb1 /

Regards,
Peter
--
Peter Oberparleiter
Linux on System z Development
IBM Deutschland Entwicklung GmbH

--
To UNSUBSCRIBE, email to debian-bugs-dist-***@lists.debian.org
with a subject of "unsubscribe". Trouble? Contact ***@lists.debian.org

Frans Pop

2007-10-08 12:50:10 UTC

Post by Peter 1 Oberparleiter
Judging from the data found in the bugtracker, this sounds more like a

That was my initial thought as well. However, I could reproduce the issue
without any problems during a new installation in Hercules.

I got exactly the same fsck error during boot and I'm fairly sure there are
no configuration errors as I checked several times both during and after
the installation.
- /etc/fstab is correct
- except for the fsck failure, the system boots correctly
- if <pass> in /etc/fstab is set to 0 for /boot, the system boots without
any problems
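
To illustrate why setting <pass> to 0 hides the failure, here is a small sketch that lists which fstab entries would be checked at boot. The sample entries are hypothetical (the filesystem type, options and pass values are assumptions, not copied from the test system); the only rule relied on is that a pass value of 0, or a missing pass field, means "do not fsck".

```python
def fsck_candidates(fstab_text):
    """Return (device, mount point, pass) for entries with a non-zero pass field."""
    result = []
    for line in fstab_text.splitlines():
        line = line.strip()
        if not line or line.startswith("#"):
            continue
        fields = line.split()
        # Field order: device, mount point, type, options, dump, pass.
        passno = int(fields[5]) if len(fields) > 5 else 0
        if passno != 0:
            result.append((fields[0], fields[1], passno))
    # Lower pass numbers are checked first.
    return sorted(result, key=lambda entry: entry[2])


# Hypothetical fstab contents, for illustration only.
SAMPLE_FSTAB = """\
/dev/dasdb1  /      ext3  defaults  0  1
/dev/dasda1  /boot  ext3  defaults  0  0
"""

if __name__ == "__main__":
    for device, mount_point, passno in fsck_candidates(SAMPLE_FSTAB):
        print(device, mount_point, passno)
```

With <pass> set to 0 for /boot, only the root entry is left in the list, so fsck never touches /dev/dasda1 and the error does not appear.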

I have also verified that, while in maintenance mode after the fsck failure,
both mount and /proc/mounts really do not show /dev/dasda1 mounted. Only /
(/dev/dasdb1) is shown as mounted at that point, as you'd expect.
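
The check described above amounts to looking for the device name in the first column of /proc/mounts. A minimal sketch of that lookup follows; the sample text is made up to resemble the situation and is not actual output from the test system.

```python
def is_listed_as_mounted(device, mounts_text):
    """Return True if `device` appears as the source of any /proc/mounts entry."""
    for line in mounts_text.splitlines():
        fields = line.split()
        if fields and fields[0] == device:
            return True
    return False


# Made-up /proc/mounts contents, for illustration only.
SAMPLE_MOUNTS = """\
rootfs / rootfs rw 0 0
/dev/dasdb1 / ext3 rw 0 0
proc /proc proc rw 0 0
"""

if __name__ == "__main__":
    for dev in ("/dev/dasda1", "/dev/dasdb1"):
        print(dev, is_listed_as_mounted(dev, SAMPLE_MOUNTS))
```

Note that this compares device names only; it says nothing about which physical disk is behind a given name.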

Post by Peter 1 Oberparleiter
df shows that /dev/dasda1 is >5G, while a DASD with 125 cylinders should be
around 80-90MB.

That df output (and the whole hardware summary) is almost certainly from his
second, successful install where he did *not* use a separate /boot
partition. Otherwise /boot would also have been listed there (as it is for
my own installation). Confusing, but not relevant.

I'm convinced there's a real bug here.

Cheers,
Frans Pop

Peter 1 Oberparleiter

2007-10-08 13:00:14 UTC

Post by Frans Pop
I got exactly the same fsck error during boot and I'm fairly sure there are
no configuration errors as I checked several times both during and after
the installation.
- /etc/fstab is correct
- except for the fsck failure, the system boots correctly
- if <pass> in /etc/fstab is set to 0 for /boot, the system boots without
any problems

This might indicate a timing problem - the init script tries to mount a
DASD that has not yet been fully initialized by the kernel (the return
code -EBUSY that triggers the "already mounted" message may indicate
different problems). To check whether this is the case, try adding a
'sleep 5' to the init script just before the mount command that fails.
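
As a variation on the fixed 'sleep 5', a test could instead poll until the device node shows up. The sketch below is not the suggested shell change but a swapped-in alternative for illustration; the function name and defaults are hypothetical, and the existence of a device node does not by itself prove that the kernel has finished initializing the DASD.

```python
import os
import time


def wait_for_path(path, timeout=5.0, interval=0.1):
    """Poll until `path` exists or `timeout` seconds pass; return whether it exists."""
    deadline = time.monotonic() + timeout
    while True:
        if os.path.exists(path):
            return True
        if time.monotonic() >= deadline:
            return False
        time.sleep(interval)


if __name__ == "__main__":
    # /dev/dasda1 is the device from the report; on other machines this
    # simply prints False after the timeout.
    print(wait_for_path("/dev/dasda1", timeout=5.0))
```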

Regards,
Peter
--
Peter Oberparleiter
Linux on System z Development
IBM Deutschland Entwicklung GmbH

Frans Pop

2007-10-08 15:20:19 UTC

Post by Peter 1 Oberparleiter
This might indicate a timing problem - the init script tries to mount a
DASD that has not yet been fully initialized by the kernel (the return
code -EBUSY that triggers the "already mounted" message may indicate
different problems). Try to add a sleep 5 into the init script just
before the mount command that fails to check if this is the case.

I've found the root of the problem. It's a configuration problem after all.

We currently do not add the 'dasd=' parameter to the kernel boot arguments.
Instead, we let udev assign device names.

Because of the 'root=' parameter (in which we use a by-path device name),
the first dasd that is detected in my test is 0123, which ends up as dasda
and thus 0122 ends up as dasdb, effectively swapping the two dasds. And as
we still use the classic device names in /etc/fstab, the result is chaos.
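
The swap can be illustrated with a toy model in which names are handed out purely in detection order. This is a simplification for illustration, not how the kernel or udev is actually implemented. The boot-time order (0123 first) is the one described above; the installer-time order (0122 first) is an assumption inferred from the two DASDs being described as swapped.

```python
import string


def assign_dasd_names(detection_order):
    """Map device numbers to dasdX names in the order the devices are detected."""
    return {
        devno: "dasd" + string.ascii_lowercase[index]
        for index, devno in enumerate(detection_order)
    }


if __name__ == "__main__":
    at_install = assign_dasd_names(["0122", "0123"])  # assumed installer order
    at_boot = assign_dasd_names(["0123", "0122"])     # order described above
    for devno in ("0122", "0123"):
        print(devno, "install:", at_install[devno], "boot:", at_boot[devno])
```

An /etc/fstab written with the install-time names therefore points /boot at whatever disk happens to be called dasda at boot, which in this case is the root disk.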

The confusion came from the fact that mount still lists / mounted as
/dev/dasd_b_1, even if it is actually mounted as /dev/dasd_a_1.
So, fsck is completely correct in reporting /dev/dasda1 as already mounted.
It also means that when /boot is mounted later, it's not actually the boot
partition that is mounted, but it is the root partition that is mounted
(for a second time) on /boot.

Adding the dasd= boot parameter did not help - it seems to be ignored in our
initrds; changing /etc/fstab to use /dev/disk/by-path devices consistently
does result in a correct boot.
I'll consult with other Debian people about how we want to resolve this.
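
For illustration, converting classic device names in /etc/fstab to by-path names could look like the sketch below. The 'ccw-<bus id>-part<n>' naming pattern, the bus IDs, and the sample fstab fields are assumptions for this example; the real names should be taken from /dev/disk/by-path on the system in question.

```python
import re


def to_by_path(fstab_text, bus_ids):
    """Rewrite /dev/dasdXN entries using a mapping such as {"dasda": "0.0.0122"}."""
    out = []
    for line in fstab_text.splitlines():
        fields = line.split()
        match = re.fullmatch(r"/dev/(dasd[a-z]+)(\d+)", fields[0]) if fields else None
        if match and match.group(1) in bus_ids:
            bus_id = bus_ids[match.group(1)]
            # Assumed by-path naming pattern for DASD partitions.
            fields[0] = f"/dev/disk/by-path/ccw-{bus_id}-part{match.group(2)}"
            out.append(" ".join(fields))
        else:
            out.append(line)  # leave comments and other devices untouched
    return "\n".join(out)


if __name__ == "__main__":
    # Hypothetical fstab and bus IDs, for illustration only.
    sample = "/dev/dasda1 /boot ext3 defaults 0 2\n/dev/dasdb1 / ext3 defaults 0 1"
    print(to_by_path(sample, {"dasda": "0.0.0122", "dasdb": "0.0.0123"}))
```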

Thanks very much for your replies, which did help in straightening this out
for me.

Cheers,
Frans Pop

Adam Thornton

2007-10-08 15:50:13 UTC

Post by Frans Pop
Adding the dasd= boot parameter did not help - it seems to be
ignored in our
initrds; changing /etc/fstab to use /dev/disk/by-path devices
consistently
does result in a correct boot.
I'll consult with other Debian people how we want to resolve this.

I think /dev/disk/by-path is probably the best choice for an s390
system. Here's why:

Most s390/zSeries users run either under z/VM or Hercules; in either
case, they have a virtualization environment layer of some kind
available to them. Because of this, it's really easy to clone disks
via a simple copy (either at the track level in z/VM with DDR or with
cp in Hercules) and very common to make rescue systems by simply
attaching another guest's disks to a still-good guest.

For these reasons, relying on device detection order is not a good idea, and
disk UUID or label are also poor choices since cloned disk devices
are so common. On the other hand, most organizations have at least
an informal standard for how to map disks. For example, we do the
following:
150 is the IPL device (and sometimes all of /); 151 is usually swap;
152-15F are system disks, 160-16F are disks for users or
applications. At least in the environments in which I work and
consult, device addressing is a more reliable guide to what disk
should be mounted where than any of the other options.
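
As a sketch of how such a site convention turns a device address into a role, the mapping above could be written as follows. The function name and the "unassigned" fallback are made up for this example, and the ranges are just the ones from our own convention, not any kind of standard.

```python
def disk_role(address):
    """Classify a hexadecimal device address according to the convention above."""
    number = int(address, 16)
    if number == 0x150:
        return "IPL device"
    if number == 0x151:
        return "swap"
    if 0x152 <= number <= 0x15F:
        return "system disk"
    if 0x160 <= number <= 0x16F:
        return "user or application disk"
    return "unassigned"  # outside the ranges named in the convention


if __name__ == "__main__":
    for addr in ("150", "151", "15A", "162", "200"):
        print(addr, disk_role(addr))
```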

Adam
