Node imaging at Liverpool

Dave Love
d.love@liverpool.ac.uk

2009-03-20

We were never supplied with a working imaging setup, but the SC method is deficient anyway. For a maintainable system we need things they say can’t be done—basically tailoring a single image to multiple clients with hardware differences, e.g. disk size and type over a mixture of Sun x2200, x4100, and Supermicros. Here’s how I’m doing it on the main two heterogeneous clusters.

I’m using SystemImager, which I originally encountered as part of OSCAR (see also a tutorial somewhere on the DisCo site). I’m using the unstable release (4.1.6), but I don’t remember whether there was anything much wrong with the stable one. This is currently done off the lv1 head node for targets on both lv1 and lv3. There are currently separate lv1 and lv3 base images as I haven’t checked how feasible it is to put the Infinipath and SCore components into overrides, should SCore survive.

You first want to clean up a ‘golden’ compute node as much as sensible; we had loads of junk packages from the SC install. Then basically follow the HOWTO.1 If you’re going to image all the nodes at once, use the bittorrent server.

  # chkconfig | grep system
  systemimager-server-bittorrent     off
  systemimager-server-flamethrowerd  off
  systemimager-server-monitord       on
  systemimager-server-netbootmond    on
  systemimager-server-rsyncd         on

After setting up the services on the head, install the relevant packages on the golden client and use si_prepareclient there to generate the image. You want the UYOK option with a complete set of drivers in the initrd (which is the default), since the ‘standard’ kernel/initrd you get otherwise doesn’t have the right disk drivers for the x4100s (or x2200s?).2

  si_prepareclient --server lv1

[Some things I fell foul of at that stage on the golden client, thanks to SC: make sure /etc/fstab lists / first; make sure /etc/sysconfig/bootloader is set up properly for grub, and that grub is installed; make sure /boot/grub/menu.lst specifies a device name for booting, not a filesystem UUID, which will be wrong.]
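The fstab and menu.lst pitfalls above are easy to check mechanically before running si_prepareclient. The following is only a sketch: the function names are invented for this example, and the UUID test assumes the kernel line uses root=UUID=... or a /dev/disk/by-uuid path.

```shell
#!/bin/sh
# Sketch of pre-flight checks for a golden client.
# All function names here are hypothetical, not part of SystemImager.

# Succeed only if the first real (non-comment, non-blank) fstab entry mounts /.
fstab_root_first() {
    awk '/^[ \t]*#/ || NF == 0 { next }
         { found = 1; ok = ($2 == "/"); exit }
         END { exit !(found && ok) }' "$1"
}

# Succeed (i.e. report a problem) if an uncommented line boots by UUID.
menu_lst_uses_uuid() {
    grep -v '^[[:space:]]*#' "$1" | grep -Eq 'root=(UUID=|/dev/disk/by-uuid/)'
}

# Run both checks against the files named in the text (or against test copies).
check_golden_client() {
    fstab=${1:-/etc/fstab} menu=${2:-/boot/grub/menu.lst} rc=0
    fstab_root_first "$fstab" || { echo "WARNING: / is not first in $fstab"; rc=1; }
    if menu_lst_uses_uuid "$menu"; then
        echo "WARNING: $menu boots by UUID, not by device name"; rc=1
    fi
    return $rc
}
```

Running check_golden_client with no arguments on the golden client inspects the real files; it only reads them and prints warnings.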

Then on the head (or other image server):

  si_getimage --golden-client lvinfi081 --image lvinfi --post-install reboot

Now prepare the target client for PXE boot install. The SystemImager tool is si_mkclientnetboot, but pxeconfig seems better:

  pxeconfig lvinfi104 --filename default.install
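For more than a handful of nodes, the same pxeconfig call can be wrapped in a loop. This is a minimal sketch: node_names and pxe_install_range are hypothetical helpers, the three-digit zero padding simply matches the node names used here, and DRY_RUN is an invented switch that prints the commands instead of running them.

```shell
#!/bin/sh
# Print zero-padded node names, one per line.
# Pass the numbers without leading zeros (e.g. 50, not 050), since a
# leading zero would be read as octal by the shell arithmetic.
node_names() {
    prefix=$1 i=$2 last=$3
    while [ "$i" -le "$last" ]; do
        printf '%s%03d\n' "$prefix" "$i"
        i=$((i + 1))
    done
}

# Run pxeconfig, exactly as invoked above, for every node in the range.
# With DRY_RUN set to a non-empty value the commands are only echoed.
pxe_install_range() {
    for node in $(node_names "$@"); do
        ${DRY_RUN:+echo} pxeconfig "$node" --filename default.install
    done
}
```

For example, DRY_RUN=1 pxe_install_range lvinfi 50 107 lists the commands for the second half of the cluster without touching /tftpboot.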

That will make the appropriate PXE file in /tftpboot/pxelinux.cfg. Make sure it’s using the kernel and initrd from the golden client rather than the ‘standard’ ones, and that they are somewhere the chrooted tftp server can see, e.g. a default.install like:3

  DISPLAY message.txt
  PROMPT 1
  TIMEOUT 50
  # For a serial console, add console=ttyS1,19200 to the APPEND line.
  
  DEFAULT systemimager
  LABEL systemimager
  KERNEL /x86_64/lvinfi/kernel
  APPEND initrd=/x86_64/lvinfi/initrd.img root=/dev/ram ramdisk_blocksize=1024 ramdisk_size=80000 MONITOR_SERVER=192.168.2.25

  LABEL localhost
  LOCALBOOT 0
  

You might want to make sure you have a proper console view via [IE]LOM. Then reset the client and it should image itself. If you’re doing many nodes, you might want the monitor server to watch rough progress.

It took ∼9 minutes when I last timed it properly, but I realized it was then installing ∼250 MB of junk from spool, and a lot of the time is taken by BIOS-level stuff during the two boots involved.4 The images I have without that junk are ∼1 GB, which is probably too big, but it includes Infiniband stuff in that case, and I haven’t tried to build one from scratch recently. It shouldn’t take much longer for many nodes if you use bittorrent.

If you have heterogeneous nodes, like us, you may want to supply overrides so that they can all be installed from the same image on the server, which is what SC say you can’t do. E.g. this fixes our Supermicros:

  # cat /var/lib/systemimager/overrides/supermicro/etc/modprobe.d/ipmi 
  # The ipmi startup fails without this, and then ipmitool doesn't work.
  # module parameters are from
  # ftp://ftp.supermicro.com/utility/Supero_Doctor_II/Linux/README-IPMI.htm
  options ipmi_si type=kcs ports=0xca8 regspacings=4
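An override is just a directory tree under /var/lib/systemimager/overrides/ that mirrors the client's filesystem, as the path in the listing above shows. A small helper for dropping a file into place might look like the sketch below; add_override is a hypothetical name, and OVERRIDES is a variable introduced here only so the helper can be tried against a scratch directory.

```shell
#!/bin/sh
# Root of the override trees; defaults to the location used in the text.
OVERRIDES=${OVERRIDES:-/var/lib/systemimager/overrides}

# add_override NAME SRC DEST
# Copy file SRC into override NAME at absolute client path DEST,
# creating the intermediate directories and preserving mode and times.
add_override() {
    target="$OVERRIDES/$1$3"
    mkdir -p "$(dirname "$target")" && cp -p "$2" "$target"
}
```

For example, add_override supermicro ./ipmi /etc/modprobe.d/ipmi would recreate the file shown above from a local copy called ipmi.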

Here’s the guts of what I used as the result of si_clusterconfig -e, i.e. /etc/systemimager/cluster.xml—yuk:

  <?xml version='1.0' standalone='yes'?>
  <xml>
  	<!-- The image server hostname. -->
  	<master>lv1</master>
  	<name>all</name>
  	<override>all</override>
  	<group>
  		<name>lvinfi</name>
  		<priority>20</priority>
  		<image>lvinfi</image>
  		<override>lvinfi</override>
  		<node>lvinfi000-lvinfi107</node>
  	</group>
  	<group>
  		<name>supermicro</name>
  		<priority>40</priority>
  		<image>lvinfi</image>
  		<override>supermicro</override>
  		<node>lvinfi050-lvinfi107</node>
  	</group>
  	<group>
  		<name>sun</name>
  		<priority>40</priority>
  		<image>lvinfi</image>
  		<override>sun</override>
  		<node>lvinfi000-lvinfi049</node>
  	</group>
  </xml>
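Since the groups overlap (every node is in lvinfi and also in one of the hardware groups), it can help to check by hand which node ranges a given host falls into. The helper below is only an illustration of the range notation used in the file; in_range is a hypothetical function, not a SystemImager tool, and it assumes names of the form prefix plus digits with no hyphen in the prefix.

```shell
#!/bin/sh
# in_range NODE FIRST-LAST
# Succeed if NODE has the same prefix as the range ends and its trailing
# number lies between them (inclusive).
in_range() {
    node=$1 lo=${2%-*} hi=${2#*-}
    # Strip everything up to the last non-digit to get the trailing number.
    n=${node##*[!0-9]} l=${lo##*[!0-9]} h=${hi##*[!0-9]}
    [ "${node%"$n"}" = "${lo%"$l"}" ] && [ "$n" -ge "$l" ] && [ "$n" -le "$h" ]
}
```

With the ranges above, in_range lvinfi081 lvinfi050-lvinfi107 succeeds and in_range lvinfi081 lvinfi000-lvinfi049 fails, so lvinfi081 would pick up the lvinfi and supermicro overrides.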

You can use this method to make a backup image of the image server to itself, which presumably only makes sense if the image is stored on a remote filesystem.


1 But substitute x86_64 for i386 throughout, and you may not want to use si_mkbootserver; just check that the image server, tftp, and DHCP server are set up properly.

2 If you’re imaging to heterogeneous nodes, you must build an initrd for normal booting with all the necessary disk drivers. E.g. for our mix of x2200, x4100, and Supermicro, in /etc/sysconfig/kernel, use INITRD_MODULES="pata_amd mptsas pata_serverworks sata_nv sata_svw processor thermal fan jbd ext3 edd", and run mkinitrd before si_prepareclient.
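When the INITRD_MODULES list is assembled from the settings on several kinds of node, the same module tends to turn up more than once. A small sketch for merging lists while keeping the first occurrence of each name follows; merge_modules is a hypothetical helper, and the per-machine lists in the usage note are placeholders, not a statement of which hardware needs which driver.

```shell
#!/bin/sh
# merge_modules LIST...
# Join any number of space-separated module lists into one, dropping
# repeats and keeping the order in which modules first appear.
merge_modules() {
    merged=
    for m in $*; do
        case " $merged " in
            *" $m "*) ;;                          # already listed
            *) merged="${merged:+$merged }$m" ;;  # first time seen
        esac
    done
    printf '%s\n' "$merged"
}
```

For example, merge_modules "$list_from_node_a" "$list_from_node_b" prints one line suitable for pasting into INITRD_MODULES.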

Also, if you have different disk sizes, you can avoid either wasting space on the biggest or having multiple install scripts (what’s in /var/lib/systemimager/scripts). After doing si_prepareclient, edit /etc/systemimager/autoinstallscript.conf to replace the size of both the last partition (which should be for /tmp) and the extended one with *. The OSCAR setup does this more cleanly, but requires its database setup. The edit above only has to be done once if you avoid over-writing the script at the si_getimage stage in future.
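The size edit can be scripted with sed. The sketch below assumes the partition entries look like <part num="N" size="..." ... /> with num immediately followed by size; check that against your own file before trusting it. star_sizes is a hypothetical helper, and it writes the result to standard output rather than editing in place.

```shell
#!/bin/sh
# star_sizes FILE NUM...
# Print FILE with size="*" substituted on the <part> lines whose num
# attribute is one of NUM. Assumes num="..." is directly followed by
# size="..." on the same line, which is an assumption about the layout.
star_sizes() {
    file=$1; shift
    script=
    for n in "$@"; do
        script="$script s/(<part[[:space:]]+num=\"$n\"[[:space:]]+size=\")[^\"]*/\\1*/;"
    done
    sed -E "$script" "$file"
}
```

For example, if the extended partition is number 2 and the last logical one is number 6, star_sizes autoinstallscript.conf 2 6 > autoinstallscript.conf.new produces a copy to compare with the original before replacing it.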

3 The flavour of kernel/initrd isn’t important as long as it has the appropriate network and disk drivers to work on the nodes to be imaged, e.g. this one from the Infiniband golden node is used for the GigE nodes too.

4 You could modify the install script to do incremental rsyncs to update an existing installation, as Streamline apparently do, but that probably stops you using bittorrent. Otherwise you can use si_pushupdate or si_pushoverrides to distribute updates.


LivImaging (last edited 2009-03-20 15:13:11 by DaveLove)
