Node imaging at Liverpool
d.love@liverpool.ac.uk
2009-03-20
We were never supplied with a working imaging setup, but the SC method is deficient anyway. For a maintainable system we need things they say can’t be done: basically, tailoring a single image to multiple clients with hardware differences, e.g. disk size and type across a mixture of Sun x2200s, x4100s, and Supermicros. Here’s how I’m doing it on the main two heterogeneous clusters.
I’m using SystemImager, which I originally encountered as part of OSCAR (see also a tutorial somewhere on the DisCo site). I’m using the unstable release (4.1.6), but I don’t remember whether there was anything much wrong with the stable one. This is currently done off the lv1 head node for targets on both lv1 and lv3. There are currently separate lv1 and lv3 base images as I haven’t checked how feasible it is to put the Infinipath and SCore components into overrides, should SCore survive.
You first want to clean up a ‘golden’ compute node as much as is sensible; we had loads of junk packages from the SC install. Then basically follow the HOWTO.[1] If you’re going to image all the nodes at once, use the bittorrent server.
# chkconfig | grep system
systemimager-server-bittorrent     off
systemimager-server-flamethrowerd  off
systemimager-server-monitord       on
systemimager-server-netbootmond    on
systemimager-server-rsyncd         on
After setting up the services on the head, install the relevant packages on
the golden client and use si_prepareclient there to generate the image. You
want the UYOK option with a complete set of drivers in the initrd (which is
the default), since the ‘standard’ kernel/initrd you get otherwise doesn’t
have the right disk drivers for the x4100s (or x2200s?).[2]
si_prepareclient --server lv1
[Some things I fell foul of at that stage on the golden client, thanks to SC: make sure /etc/fstab lists / first; make sure /etc/sysconfig/bootloader is set up properly for grub, and that grub is installed; make sure /boot/grub/menu.lst specifies a device name for booting, not a filesystem UUID, which will be wrong.]
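Two of the golden-client checks above could be scripted roughly as follows. This is only a sketch: the function names are hypothetical, and it assumes a UUID reference in menu.lst would appear as root=UUID=... or root=/dev/disk/by-uuid/... on a kernel line. It does not look at /etc/sysconfig/bootloader or at whether grub is installed.

```shell
#!/bin/sh
# Sketch of pre-flight checks for a golden client (hypothetical helper names).

# Succeed if the first real entry in the given fstab mounts "/".
fstab_root_first() {
    awk '
        /^[[:space:]]*#/ || NF == 0 { next }   # skip comments and blank lines
        { ok = ($2 == "/"); exit }             # only the first entry matters
        END { exit (ok ? 0 : 1) }
    ' "$1"
}

# Succeed if no kernel line in the given menu.lst names the root by UUID.
# Assumption: a UUID would show up as root=UUID=... or root=/dev/disk/by-uuid/...
menu_lst_avoids_uuid() {
    [ -r "$1" ] || return 1
    ! grep -Eq '^[[:space:]]*kernel[[:space:]].*root=(UUID=|/dev/disk/by-uuid/)' "$1"
}

# Run both checks (defaulting to the live files) and report any failure.
check_golden_client() {
    status=0
    fstab_root_first "${1:-/etc/fstab}" ||
        { echo "fstab: / is not listed first" >&2; status=1; }
    menu_lst_avoids_uuid "${2:-/boot/grub/menu.lst}" ||
        { echo "menu.lst: root given by UUID, or file unreadable" >&2; status=1; }
    return $status
}
```

Running check_golden_client with no arguments on the golden client, before si_prepareclient, tests the live /etc/fstab and /boot/grub/menu.lst.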
Then on the head (or other image server):
si_getimage --golden-client lvinfi081 --image lvinfi --post-install reboot
Now prepare the target client for a PXE boot install. The SystemImager tool for this is si_mkclientnetboot, but pxeconfig seems better:
pxeconfig lvinfi104 --filename default.install
That will make the appropriate PXE file in /tftpboot/pxelinux.cfg. Make sure it uses the kernel and initrd from the golden client, not the ‘standard’ ones, and that they are somewhere the chrooted tftp server can see, e.g. with a default.install like:[3]
DISPLAY message.txt
PROMPT 1
TIMEOUT 50
console=ttyS1,19200
DEFAULT systemimager
LABEL systemimager
KERNEL /x86_64/lvinfi/kernel
APPEND initrd=/x86_64/lvinfi/initrd.img root=/dev/ram ramdisk_blocksize=1024 ramdisk_size=80000 MONITOR_SERVER=192.168.2.25
LABEL localhost
LOCALBOOT 0
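One way to confirm which file a node will actually load: pxelinux looks in pxelinux.cfg for a file named after the client's IPv4 address in uppercase hexadecimal before falling back to default. The sketch below computes that name and prints the KERNEL and APPEND lines of the matching file. The function names are hypothetical, and it assumes the per-node entry is named (or symlinked) by hex IP address; check what pxeconfig actually created on your server.

```shell
#!/bin/sh
# Sketch (hypothetical helper names): find the pxelinux config a node will use.

# Print the pxelinux.cfg file name for a dotted-quad IPv4 address,
# i.e. the four octets as two uppercase hex digits each.
ip_to_pxe_name() (
    set -f          # no globbing while splitting the address
    IFS=.
    set -- $1
    [ $# -eq 4 ] || exit 1
    printf '%02X%02X%02X%02X\n' "$1" "$2" "$3" "$4"
)

# Print the KERNEL and APPEND lines of the config for the given address.
# The optional second argument is the pxelinux.cfg directory.
show_pxe_boot_lines() {
    dir=${2:-/tftpboot/pxelinux.cfg}
    name=$(ip_to_pxe_name "$1") || return 1
    grep -Ei '^[[:space:]]*(KERNEL|APPEND)[[:space:]]' "$dir/$name"
}
```

Calling show_pxe_boot_lines with the target node's address should show paths under the golden client's directory (here /x86_64/lvinfi/), not the ‘standard’ kernel and initrd.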
You might want to make sure you have a proper console view via [IE]LOM. Then reset the client and it should image itself. If you’re doing many nodes, you might want the monitor server to watch rough progress.
It took ∼9 minutes when I last timed it properly, but I realized it was then installing ∼250 MB of junk from spool, and a lot of the time is taken by BIOS-level stuff during the two boots involved.[4] The images I have without that junk are ∼1 GB, which is probably too big, but it includes Infiniband stuff in that case, and I haven’t tried to build one from scratch recently. It shouldn’t take much longer for many nodes if you use bittorrent.
If you have heterogeneous nodes, like us, you may want to supply overrides so that they can all be imaged from the same image on the server, which is what SC say you can’t do. For example, this override fixes our Supermicros:
# cat /var/lib/systemimager/overrides/supermicro/etc/modprobe.d/ipmi
# The ipmi startup fails without this, and then ipmitool doesn't work.
# module parameters are from
# ftp://ftp.supermicro.com/utility/Supero_Doctor_II/Linux/README-IPMI.htm
options ipmi_si type=kcs ports=0xca8 regspacings=4
Here’s the guts of what I used as the result of si_clusterconfig -e,
i.e. /etc/systemimager/cluster.xml—yuk:
<?xml version='1.0' standalone='yes'?>
<xml>
  <!-- The image server hostname. -->
  <master>lv1</master>
  <name>all</name>
  <override>all</override>
  <group>
    <name>lvinfi</name>
    <priority>20</priority>
    <image>lvinfi</image>
    <override>lvinfi</override>
    <node>lvinfi000-lvinfi107</node>
  </group>
  <group>
    <name>supermicro</name>
    <priority>40</priority>
    <image>lvinfi</image>
    <override>supermicro</override>
    <node>lvinfi050-lvinfi107</node>
  </group>
  <group>
    <name>sun</name>
    <priority>40</priority>
    <image>lvinfi</image>
    <override>sun</override>
    <node>lvinfi000-lvinfi049</node>
  </group>
</xml>
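To see how the group ranges in that file map a node to overrides, here is a minimal sketch. The helper names are hypothetical and the three ranges are simply copied from the cluster.xml above rather than parsed from it. It only reports which groups contain a node; it says nothing about how SystemImager orders them by priority, and it leaves out the top-level "all" override.

```shell
#!/bin/sh
# Sketch (hypothetical helper names): which group overrides cover a node,
# using the group ranges copied from the cluster.xml above.

# Print the numeric suffix of a node name without leading zeros (lvinfi081 -> 81).
node_number() {
    printf '%s\n' "$1" | sed -E 's/^.*[^0-9]([0-9]+)$/\1/; s/^0+([0-9])/\1/'
}

# Print the node name with its numeric suffix removed (lvinfi081 -> lvinfi).
node_prefix() {
    printf '%s\n' "$1" | sed -E 's/[0-9]+$//'
}

# Succeed if NODE ($1) lies within RANGE ($2), e.g. lvinfi000-lvinfi107.
node_in_range() {
    first=${2%%-*}
    last=${2##*-}
    [ "$(node_prefix "$1")" = "$(node_prefix "$first")" ] || return 1
    n=$(node_number "$1")
    [ "$n" -ge "$(node_number "$first")" ] && [ "$n" -le "$(node_number "$last")" ]
}

# Print the override name of every group whose range contains the node.
overrides_for() {
    for spec in lvinfi:lvinfi000-lvinfi107 \
                supermicro:lvinfi050-lvinfi107 \
                sun:lvinfi000-lvinfi049; do
        if node_in_range "$1" "${spec#*:}"; then
            echo "${spec%%:*}"
        fi
    done
}
```

For example, overrides_for lvinfi081 lists lvinfi and supermicro, so that node gets the common image plus the Supermicro-specific files.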
You can use this method to make a backup image of the image server to itself, which presumably only makes sense if the image is stored on a remote filesystem.
[1] But substitute x86_64 for i386 throughout it (s/i386/x86_64/), and you may not want to use si_mkbootserver; just check that the image server, tftp, and DHCP server are set up properly.
[2] If you’re imaging to heterogeneous nodes, you must build an initrd for normal booting with all the necessary disk drivers. E.g. for our mix of x2200, x4100, and Supermicro, set INITRD_MODULES="pata_amd mptsas pata_serverworks sata_nv sata_svw processor thermal fan jbd ext3 edd" in /etc/sysconfig/kernel, and run mkinitrd before si_prepareclient.
Also, if you have different disk sizes, you can avoid either wasting space on the biggest disks or having multiple install scripts (the files in /var/lib/systemimager/scripts) by editing /etc/systemimager/autoinstallscript.conf after running si_prepareclient, replacing the sizes of both the last partition (which should be for /tmp) and the extended one with *. The OSCAR setup does this more cleanly, but requires its database setup. The edit above only has to be done once if you avoid overwriting the script at the si_getimage stage in future.
[3] The flavour of kernel/initrd isn’t important as long as it has the appropriate network and disk drivers to work on the nodes to be imaged, e.g. this one from the Infiniband golden node is used for the GigE nodes too.
[4] You could modify the install script to do incremental rsyncs to update an existing installation, like Streamline apparently do, but that would probably stop you using bittorrent. Otherwise you can use si_pushupdate or si_pushoverrides to distribute updates.