I mentioned earlier that we netboot (PXE) our cluster. Before NFS-root takes over, a few things have to happen: the node needs an IP address, the kernel has to be served, and DNS lookups have to be made to figure out where the servers are. Three protocols are in the mix at this stage: DHCP, TFTP, and DNS. We used to run three separate applications to handle them: atftpd, BIND9, and DHCP (from ISC). Each is a fine application in its own right, but together they become too much to look after: there is a config file for each daemon, plus databases with node information. Our configuration used MySQL and PHP to generate all the databases for these daemons, so that only one central configuration had to be maintained. That, in turn, meant looking after yet another daemon to make it all work. Add all of this together and it becomes one major headache.
Several months ago I installed OpenWrt on a router at home. While configuring it I came across something called dnsmasq. By default, on OpenWrt, dnsmasq handles DNS and DHCP. I thought it was spiffy to merge the two services; after all, they are so often run together (on internal networks). The name stuck in my head as something to pay a bit more attention to. Somewhere along the line I got some more experience with dnsmasq and discovered that it also had TFTP support. Could it be that what we used four daemons for could be accomplished with just one?
So when the opportunity arose I dumped all the node address information out of the MySQL database into a simple awk-parsable flat file. I wrote a short parsing script that takes this central database and spits out two files: dnsmasq.hosts (with name/IP pairs) and dnsmasq.nodes (with MAC-address/name pairs). Finally I configured the master (static) dnsmasq.conf file to start all the services I needed (DNS, DHCP, TFTP) and to include the dnsmasq.hosts and dnsmasq.nodes files. Since dnsmasq.nodes includes a category flag, it is trivial to tell which group of nodes should use which TFTP image and what kind of DHCP lease they should be served.
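To make the transformation concrete, here is a minimal sketch that pushes two made-up records through the same kind of awk rules the full script (further down) uses for the node2ghz group. The helper names gen_hosts and gen_nodes and the MAC addresses are placeholders of mine, not part of the real setup.

```shell
#!/bin/bash
# Sketch only: gen_hosts and gen_nodes are hypothetical helper names, and the
# MAC addresses below are made-up placeholders.

# name/IP pairs, one per line, the layout read via addn-hosts
gen_hosts() {
    awk '/^node2ghz/ {printf("10.0.100.%d %s%02d\n", $2, $1, $2)}'
}

# MAC,net:<category>,name,name lines, the layout read via dhcp-hostsfile
gen_nodes() {
    awk '/^node2ghz/ {printf("%s,net:%s,%s%02d,%s%02d\n", $4, $1, $1, $2, $1, $2)}'
}

# Columns: nodeType nodeIndex nic# MACAddr
sample='#dual 2.4ghz supermicro nodes
node2ghz 01 1 00:00:00:00:00:01
node2ghz 02 1 00:00:00:00:00:02'

printf '%s\n' "$sample" | gen_hosts
printf '%s\n' "$sample" | gen_nodes
```

Run as is, this prints the two hosts lines (10.0.100.1 node2ghz01 and 10.0.100.2 node2ghz02) followed by the two nodes lines (for example 00:00:00:00:00:01,net:node2ghz,node2ghz01,node2ghz01). The net:node2ghz part is the category flag that the dhcp-boot and dhcp-range lines in dnsmasq.conf match on.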
Dnsmasq couldn't offer a simpler or more intuitive configuration. With 1/2 days of work I was able to greatly improve on our old system and make it a lot more manageable. There is only one gripe I have with dnsmasq: I wish it were possible to have just one configuration line per node, that is, to have the name, IP, and MAC address all on one line. If that were the case I wouldn't even need an awk script to make the config file (although it turned out to be handy, because I also use the same file to generate a nodes list for Torque). But it's understandable, since there are instances where you only want to run a DHCP server or just a DNS server, and so having DHCP and DNS information on one line wouldn't make much sense.
Scalability is something to consider with dnsmasq. Its website claims that it has been tested with installations of up to 1000 nodes, which might or might not be a problem depending on what type of configuration you're building. I kind of wonder what happens at the level of thousands of machines: how does its performance degrade, and how does that compare to, say, the other TFTP/DHCP servers? (BIND9 is known to work quite well with a lot of data.)
Here are some configuration examples:
Master Flat file node database
#NODES file it needs to be processed by nodesFileGen
#nodeType nodeIndex nic# MACAddr
nfsServer 01 1
nfsServer 02 1
headNode 00 1 00:00:00:00:00:00
#Servers based on the supermicro p2400 hardware (white 1u supermicro box)
server_sm2400 miscServ 1 00:00:00:00:00:00
server_sm2400 miscServ 2 00:00:00:00:00:00
#dual 2.4ghz supermicro nodes
node2ghz 01 1 00:00:00:00:00:00
node2ghz 02 1 00:00:00:00:00:00
node2ghz 03 1 00:00:00:00:00:00
...[snip]...
#dual 3.4ghz dell nodes
node3ghz 01 1 00:00:00:00:00:00
node3ghz 02 1 00:00:00:00:00:00
node3ghz 03 1 00:00:00:00:00:00
...[snip]...
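Since everything downstream is generated from this one file, a typo in it ends up in DNS, DHCP, and the Torque node list at once. The following is a rough sketch of a sanity check that could run before the generator. check_nodes is a hypothetical helper name, and the rules it enforces (3 or 4 fields, numeric nic#, six two-digit hex groups in the MAC) are my reading of the sample above rather than a documented format.

```shell
#!/bin/bash
# Sketch: sanity-check a nodes.db-style file read from stdin.
# check_nodes is a hypothetical helper; it prints one message per problem
# and returns non-zero if any problem was found.
check_nodes() {
    awk '
    /^[ \t]*#/ || /^[ \t]*$/ { next }     # skip comments and blank lines

    NF < 3 || NF > 4 {
        printf("line %d: expected 3 or 4 fields, got %d\n", NR, NF)
        bad = 1
        next
    }

    $3 !~ /^[0-9]+$/ {
        printf("line %d: nic# \"%s\" is not a number\n", NR, $3)
        bad = 1
    }

    NF == 4 {
        # a MAC is taken to be six two-digit hex groups separated by colons
        n = split($4, octet, ":")
        ok = (n == 6)
        for (i = 1; i <= n; i++)
            if (octet[i] !~ /^[0-9A-Fa-f][0-9A-Fa-f]$/) ok = 0
        if (!ok) {
            printf("line %d: malformed MAC \"%s\"\n", NR, $4)
            bad = 1
        }
    }

    END { exit bad + 0 }
    '
}

# Example run on a small inline sample; the last record is deliberately broken.
check_nodes <<'EOF' || echo "nodes file has problems"
#nodeType nodeIndex nic# MACAddr
nfsServer 01 1
node2ghz 01 1 00:00:00:00:00:00
node2ghz 02 1 00:00:00:00:0
EOF
```

Against the real file this would be called as check_nodes < ~/data/nodes.db. The nodeIndex column is left unchecked on purpose, because the server_sm2400 lines put a name (miscServ) there instead of a number.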
Flat File DB Parser script
#!/bin/bash
#input sample
#type number nic# mac addr
#nodeName 07 1 00:00:00:00:00:00

#Generate name/IP pairs for addn-hosts
#output sample
#ip hostname
#10.0.103.10 nodeName10
awk '
/^headNode.*/ {printf("10.0.0.3 %s\n", $1)}
/^server_sm2400.*/ {printf("10.0.3.%d %s\n", $3, $2)}
/^nfsServer.*/ {printf("10.0.1.%d %s%02d\n", $2, $1, $2)}
/^node2ghz.*/ {printf("10.0.100.%d %s%02d\n", $2, $1, $2)}
/^node3ghz.*/ {printf("10.0.101.%d %s%02d\n", $2, $1, $2)}
' ~/data/nodes.db > /etc/dnsmasq.hosts

#Generate MAC/category/name entries for dhcp-hostsfile
#output sample
#mac,netType,hostname,hostname
#00:00:00:00:00:00,net:nodeName,nodeName10,nodeName10
awk '
/^headNode.*/ {printf("%s,net:%s,%s,%s\n", $4, $1, $1, $1)}
/^server_sm2400.*/ {printf("%s,net:%s,%s,%s\n", $4, $1, $2, $2)}
/^node2ghz.*/ {printf("%s,net:%s,%s%02d,%s%02d\n", $4, $1, $1, $2, $1, $2)}
/^node3ghz.*/ {printf("%s,net:%s,%s%02d,%s%02d\n", $4, $1, $1, $2, $1, $2)}
' ~/data/nodes.db > /etc/dnsmasq.nodes

#Generate the Torque nodes list
#output sample
#hostname np=$CPUS type
#nodeName10 np=8 nodeName
awk '
/^node2ghz.*/ {printf("%s%02d np=2 node2ghz\n", $1, $2)}
/^node3ghz.*/ {printf("%s%02d np=2 node3ghz\n", $1, $2)}
' ~/data/nodes.db > /var/spool/torque/server_priv/nodes

#Let's reload dnsmasq now
killall -HUP dnsmasq
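A note on that last step: on SIGHUP dnsmasq clears its cache and re-reads the files given by addn-hosts and dhcp-hostsfile, but it does not re-read dnsmasq.conf itself, which is why the static master config plus regenerated side files works well here. If dnsmasq.conf has been edited, a restart is needed, and it may be worth syntax-checking it first. Below is a small sketch of a guarded reload. reload_dnsmasq is a hypothetical helper name and the default config path is an assumption. Note that --test only vouches for the configuration syntax, and I am assuming it does not validate the contents of the generated files.

```shell
#!/bin/bash
# Sketch: syntax-check the main config, then signal the running daemon.
# reload_dnsmasq is a hypothetical helper; the default path is an assumption.
reload_dnsmasq() {
    local conf=${1:-/etc/dnsmasq.conf}

    if ! command -v dnsmasq >/dev/null 2>&1; then
        echo "dnsmasq not found in PATH" >&2
        return 127
    fi

    # --test parses the configuration and exits without starting a daemon
    if ! dnsmasq --test --conf-file="$conf"; then
        echo "refusing to reload: $conf failed the syntax check" >&2
        return 1
    fi

    # SIGHUP: clear the cache, re-read the addn-hosts and dhcp-hostsfile files
    killall -HUP dnsmasq
}
```

In the generator script, the final killall line could then be replaced by a call such as reload_dnsmasq /etc/dnsmasq.conf.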
dnsmasq.conf
interface=eth0
dhcp-lease-max=500
domain=myCluster
enable-tftp
tftp-root=/srv/tftp
dhcp-option=3,10.0.0.1
addn-hosts=/etc/dnsmasq.hosts
dhcp-hostsfile=/etc/dnsmasq.nodes
dhcp-boot=net:misc,misc/pxelinux.0,nodeServer,10.0.0.2
dhcp-range=net:misc,10.0.200.0,10.0.200.255,12h
dhcp-boot=net:headNode,headNode/pxelinux.0,nodeServer,10.0.0.2
dhcp-range=net:headNode,10.0.0.3,10.0.0.3,12h
dhcp-boot=net:server_sm2400,server_sm2400/pxelinux.0,nodeServer,10.0.0.2
dhcp-range=net:server_sm2400,10.0.0.3,10.0.0.3,12h
dhcp-boot=net:node2ghz,node2ghz.cfg,nodeServer,10.0.0.2
dhcp-range=net:node2ghz,10.0.100.0,10.0.100.255,12h
dhcp-boot=net:node3ghz,node3ghz.cfg,nodeServer,10.0.0.2
dhcp-range=net:node3ghz,10.0.101.0,10.0.101.255,12h

Debian LILUG News Software Super Computers 2008-03-13T00:30:40-04:00