The Debian installer could be
a lot quicker. When we install more than 2000 packages in
Skolelinux / Debian Edu using
tasksel in the installer, unpacking the binary packages takes forever.
A part of the slow I/O issue was discussed in
bug #613428 about too
much file system sync-ing done by dpkg, which is the package
responsible for unpacking the binary packages. Other parts (like code
executed by postinst scripts) might also sync to disk during
installation. All this sync-ing to disk does not really make sense to
me. If the machine crashes half-way through, I start over; I do not try
to salvage the half-installed system. So the failure sync-ing is
supposed to protect against, a hardware or system crash, is not really
relevant while the installer is running.
A few days ago, I thought of a way to get rid of all the file
system sync()-ing in a fairly non-intrusive way, without the need to
change the code in several packages. The idea is not new, but I have
not heard anyone propose the approach using dpkg-divert before. It
depends on the small and clever package
eatmydata, which
uses LD_PRELOAD to replace the system functions for syncing data to
disk with functions doing nothing, thus allowing programs to live
dangerously while speeding up disk I/O significantly. Instead of
modifying the implementation of dpkg, apt and tasksel (which are the
packages responsible for selecting, fetching and installing packages),
it occurred to me that we could just divert the programs away and replace
them with a simple shell wrapper calling
eatmydata $program "$@", to get the same effect.
Two days ago I decided to test the idea, and wrapped up a simple
implementation for the Debian Edu udeb.
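The core of the idea is a wrapper that simply re-runs the real program under eatmydata. The sketch below prints such a wrapper for a given program name. It assumes the real binary has been diverted to /usr/bin/$program.distrib, which is the name dpkg-divert uses with --rename when no --divert target is given; the helper name make_wrapper is hypothetical, and the exec is a choice made here, not taken from the original scripts.

```shell
#!/bin/sh
# make_wrapper (hypothetical helper): print a wrapper script that runs the
# diverted binary /usr/bin/<name>.distrib under eatmydata, passing all
# arguments through unchanged. Assumes the default ".distrib" divert name.
make_wrapper() {
    name=$1
    # Single quotes keep "$@" literal so it ends up in the wrapper itself.
    printf '#!/bin/sh\nexec eatmydata /usr/bin/%s.distrib "$@"\n' "$name"
}

make_wrapper dpkg
```

Writing this output to /usr/bin/dpkg (after the diversion) and making it executable gives every caller of dpkg the no-sync behaviour without touching dpkg itself.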
The effect was stunning. In my first test it reduced the running
time of the pkgsel step (installing tasks) from 64 to less than 44
minutes (20 minutes shaved off the installation) on an old Dell
Latitude D505 machine. I am not quite sure what the optimised time
would have been, as I messed up the testing a bit, causing the debconf
priority to get low enough for two questions to pop up during
installation. As soon as I saw the questions I moved the installation
along, but I do not know how long the questions were holding up the
installation. I did some more measurements using Debian Edu Jessie,
and got these results. The time measured is the time between the time
stamps in /var/log/syslog of the "pkgsel: starting tasksel" and the
"pkgsel: finishing up" lines, if you want to do the same measurement
yourself. In Debian Edu, the tasksel dialog does not show up, and the
timing thus does not depend on how quickly the user handles the tasksel
dialog.
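If you want to automate the measurement, a small awk script can pull the two time stamps out of the log. This is a sketch assuming each syslog line carries an HH:MM:SS time stamp as a separate field (as in "Sep 10 07:46:12 pkgsel: starting tasksel") and that both lines fall on the same day; the function name pkgsel_minutes is hypothetical.

```shell
#!/bin/sh
# pkgsel_minutes (hypothetical helper): print the whole minutes elapsed
# between the "pkgsel: starting tasksel" and "pkgsel: finishing up" lines
# of a syslog file. Assumes HH:MM:SS time stamps and no midnight crossing.
pkgsel_minutes() {
    awk '
        # Convert the first HH:MM:SS field on the line to seconds since midnight.
        function secs(   i, t) {
            for (i = 1; i <= NF; i++)
                if ($i ~ /^[0-9][0-9]:[0-9][0-9]:[0-9][0-9]$/) {
                    split($i, t, ":")
                    return t[1] * 3600 + t[2] * 60 + t[3]
                }
            return -1
        }
        /pkgsel: starting tasksel/ && start == "" { start = secs() }
        /pkgsel: finishing up/                    { stop = secs() }
        END {
            # Fail if either marker line is missing.
            if (start == "" || stop == "") exit 1
            printf "%d\n", (stop - start) / 60
        }
    ' "$1"
}
```

Run as pkgsel_minutes /var/log/syslog; the result is rounded down to whole minutes.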
Machine/setup                | Original tasksel     | Optimised tasksel     | Reduction
-----------------------------+----------------------+-----------------------+---------------
Latitude D505 Main+LTSP LXDE | 64 min (07:46-08:50) | <44 min (11:27-12:11) | >20 min (>31%)
Latitude D505 Roaming LXDE   | 57 min (08:48-09:45) | 34 min (07:43-08:17)  | 23 min (40%)
Latitude D505 Minimal        | 22 min (10:37-10:59) | 11 min (11:16-11:27)  | 11 min (50%)
Thinkpad X200 Minimal        | 6 min (08:19-08:25)  | 4 min (08:04-08:08)   | 2 min (33%)
Thinkpad X200 Roaming KDE    | 19 min (09:21-09:40) | 15 min (10:25-10:40)  | 4 min (21%)
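The Reduction column is the time saved divided by the original time. A minimal sketch of that arithmetic, using shell integer arithmetic (so percentages are rounded down); the helper name reduction is hypothetical. For the first row the optimised time is an upper bound, so the computed saving is a lower bound.

```shell
#!/bin/sh
# reduction (hypothetical helper): given the original and optimised run
# times in whole minutes, print the minutes saved and the saving as a
# percentage of the original time, rounded down.
reduction() {
    orig=$1
    opt=$2
    saved=$((orig - opt))
    echo "$saved min $((saved * 100 / orig))%"
}

reduction 57 34   # Latitude D505 Roaming LXDE
reduction 64 44   # Latitude D505 Main+LTSP LXDE (44 is an upper bound)
```

This prints "23 min 40%" and "20 min 31%".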
The tests were done using a netinst ISO on a USB stick, so some of the
time is spent downloading packages. The connection to the Internet
was 100Mbit/s during testing, so downloading should not be a
significant factor in the measurement. Download typically took a few
seconds to a few minutes, depending on the number of packages being
installed.
The speedup is implemented by using two hooks in
Debian
Installer: the pre-pkgsel.d hook to set up the diverts, and the
finish-install.d hook to remove the diverts at the end of the
installation. I picked the pre-pkgsel.d hook instead of the
post-base-installer.d hook because I test using an ISO without the
eatmydata package included, and the post-base-installer.d hook in
Debian Edu can only operate on packages included in the ISO. The
negative effect of this is that I am unable to activate this
optimization for the kernel installation step in d-i. If the code were
moved to the post-base-installer.d hook, the speedup for the entire
installation would be larger.
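Dropping a script into a hook directory is enough because the installer runs whatever executables it finds there at that step. The sketch below imitates that mechanism with a plain loop: every executable file in the directory is run in lexical (glob) order. The helper name run_hooks, the exact ordering and the stop-at-first-failure behaviour are assumptions for illustration, not the installer's own code.

```shell
#!/bin/sh
# run_hooks (hypothetical helper): run every executable regular file in a
# hook directory, in the lexical order produced by the shell glob, and
# stop at the first hook that fails. Imitates, but is not, d-i's code.
run_hooks() {
    dir=$1
    for hook in "$dir"/*; do
        # Skip non-files, non-executables and the literal glob if empty.
        [ -f "$hook" ] && [ -x "$hook" ] || continue
        "$hook" || return 1
    done
}
```

With this model in mind, a script named so that it sorts late in /usr/lib/pre-pkgsel.d/ runs just before tasksel starts.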
I've implemented this in the
debian-edu-install
git repository, and plan to provide the optimization as part of the
Debian Edu installation. If you want to test this yourself, you can
create two files in the installer (or in a udeb). One shell script
needs to go into /usr/lib/pre-pkgsel.d/, with content like this:
#!/bin/sh
set -e
. /usr/share/debconf/confmodule

info() {
    logger -t my-pkgsel "info: $*"
}
error() {
    logger -t my-pkgsel "error: $*"
}

# Install eatmydata in /target and replace dpkg, apt-get, aptitude and
# tasksel with wrappers running the real (diverted) binaries under it.
override_install() {
    apt-install eatmydata || true
    if [ -x /target/usr/bin/eatmydata ] ; then
        for bin in dpkg apt-get aptitude tasksel ; do
            file=/usr/bin/$bin
            # Test that the file exists and has not been diverted already.
            if [ -f /target$file ] && [ ! -e /target$file.distrib ] ; then
                info "diverting $file using eatmydata"
                # Wrapper calling the diverted binary with all arguments.
                printf '#!/bin/sh\neatmydata %s.distrib "$@"\n' "$bin" \
                    > /target$file.edu
                chmod 755 /target$file.edu
                # Move the real binary aside to $file.distrib ...
                in-target dpkg-divert --package debian-edu-config \
                    --rename --quiet --add $file
                # ... and point $file at the wrapper.
                ln -sf ./$bin.edu /target$file
            else
                error "unable to divert $file, as it is missing or already diverted."
            fi
        done
    else
        error "unable to find /usr/bin/eatmydata after installing the eatmydata package"
    fi
}

override_install
To clean up, another shell script should go into
/usr/lib/finish-install.d/ with code like this:
#! /bin/sh -e
. /usr/share/debconf/confmodule

error() {
    logger -t my-finish-install "error: $*"
}

# Undo the diversions set up in pre-pkgsel.d, restoring the real binaries.
remove_install_override() {
    for bin in dpkg apt-get aptitude tasksel ; do
        file=/usr/bin/$bin
        if [ -x /target$file.edu ] ; then
            rm /target$file    # remove the symlink to the wrapper
            in-target dpkg-divert --package debian-edu-config \
                --rename --quiet --remove $file
            rm /target$file.edu
        else
            error "Missing divert for $file."
        fi
    done
    sync # Flush file buffers before continuing
}

remove_install_override
In Debian Edu, I placed both code fragments in a separate script
edu-eatmydata-install and call it from the pre-pkgsel.d and
finish-install.d scripts.
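Since a forgotten wrapper would leave the installed system running dpkg without any syncing, it may be worth checking that the cleanup really happened. A minimal sketch, assuming the file names used by the scripts above (the .edu wrapper and the .distrib diverted original under /usr/bin); the helper name check_wrappers is hypothetical.

```shell
#!/bin/sh
# check_wrappers (hypothetical helper): report any eatmydata wrapper (.edu)
# or diverted original (.distrib) left under <target>/usr/bin for the
# wrapped programs. Prints nothing and returns 0 when everything is clean.
check_wrappers() {
    target=$1
    status=0
    for bin in dpkg apt-get aptitude tasksel; do
        for f in "$target/usr/bin/$bin.edu" "$target/usr/bin/$bin.distrib"; do
            # -L also catches dangling symlinks, which -e misses.
            if [ -e "$f" ] || [ -L "$f" ]; then
                echo "leftover: $f"
                status=1
            fi
        done
    done
    return $status
}
```

Calling check_wrappers /target at the end of the finish-install.d script would log any file the cleanup missed.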
By now you might ask whether this change should go into the normal
Debian installer too. I suspect it should, but am not sure the
current debian-installer coordinators find it useful enough. It also
depends on the side effects of the change. I'm not aware of any, but I
guess we will see if the change is safe after some more testing.
Perhaps there is some package in Debian depending on sync() and
fsync() having an effect? Perhaps it should go into its own udeb, to
allow those of us wanting to enable it to do so without affecting
everyone.
Update 2014-09-24: Since a few days ago, enabling this optimization
breaks the installation of all programs using gnutls because of
bug #702711. An updated
eatmydata package in Debian will solve it.
Update 2014-10-17: The bug mentioned above is fixed in testing and
the optimization works again. And I have discovered that the
dpkg-divert trick is not really needed and implemented a slightly
simpler approach as part of the debian-edu-install package. See
tools/edu-eatmydata-install in the source package.
Update 2014-11-11: Unfortunately, a new
bug #765738 in eatmydata only
triggering on i386 made it into testing, and broke this installation
optimization again. If unblock
request 768893 is accepted, it should be working again.