dnsmasq -- buy 1 get 2 free!

I mentioned earlier that we netboot (PXE) our cluster. Before NFS-root begins, some things have to take place. Namely, the kernel needs to be served, IP assigned, DNS look-ups need to be made to figure out where servers are and so on. Primarily 3 protocols are in the mix at this time, TFTP, DHCP, DNS. We used to run 3 individual applications to handle all of this, they're all in their own right quite fine applications atftpd, Bind9, DHCP (from ISC). But it just becomes too much to look after, you have a config file for each of the daemons as well as databases with node information. Our configuration used MySQL and PHP to generate all the databases for these daemons. This way you would only have to maintain one central configuration. Which means you need to look after yet another daemon to make it all work. You add all of this together and it becomes one major headache.

Several months ago I had installed openWRT onto a router at home. While configuring openWRT I came across something called dnsmasq. By default, on openWRT, dnsmasq handles DNS and DHCP. I thought it was spiffy to merge the 2 services .. after all they are so often run together (on internal networks). The name stuck in my head as something to pay bit more attention to. Somewhere along the line I got some more experience with dnsmasq, and had discovered it also had TFTP support. Could it be possible what we use 4 daemons could be accomplished with just one?

So when the opportunity arose I dumped all node address information out of the MySQL database into a simple awk-parsable flat file. I wrote a short parsing script which took the central database and spit out a file dnsmasq.hosts (with name/IP pairs) and another dnsmasq.nodes (with MAC-address/name pairs). Finally I configured the master (static) dnsmasq.conf file to start all the services I needed (DNS, DHCP, TFTP), include the dnsmasq.hosts and dnsmasq.nodes files. Since the dnsmasq.nodes includes a category flag it is trivial to tell which group of nodes should use what TFTP images and what kind of DHCP leases they should be served.

Dnsmasq couldn't offer a more simple and intuitive configuration with 1/2 days work I was able to greatly improve upon on old system and make a lot more manageable. There is only one gripe I have with dnsmasq, I wish it would be possible to just have one configuration line per node that is have the name, IP, and mac address all in one line. If this was the case I wouldn't even need an awk script to make the config file (although it turned out to be handy because I also use the same file to generate a nodes list for torque). But its understandable since there are instances where you only want to run a DHCP server or just DNS server and so having DHCP and DNS information on one line wouldn't make much sense.

Scalability for dnsmasq is something to consider. Their website claims that it has been tested with installation of up to 1000 nodes, which might or might not be a problem. Depending on what type of configuration your building. I kind of wonder what happens at the 1000s of machines level. How will its performance degrade, and how does that compare to say the other TFTP/DHCP servers (BIND9 is know to work quite well with a lot of data).

Here are some configuration examples:

Master Flat file node database

#NODES file it needs to be processed by nodesFileGen
#nodeType nodeIndex nic# MACAddr

nfsServer 01 1
nfsServer 02 1

headNode 00 1 00:00:00:00:00:00

#Servers based on the supermicro p2400 hardware (white 1u supermicro

box) server_sm2400 miscServ 1 00:00:00:00:00:00 server_sm2400 miscServ 2 00:00:00:00:00:00 #dual 2.4ghz supermicro nodes node2ghz 01 1 00:00:00:00:00:00 node2ghz 02 1 00:00:00:00:00:00 node2ghz 03 1 00:00:00:00:00:00 ...[snip]...

#dual 3.4ghz dell nodes
node3ghz 01 1 00:00:00:00:00:00
node3ghz 02 1 00:00:00:00:00:00
node3ghz 03 1 00:00:00:00:00:00
...[snip]...

Flat File DB Parser script

 1
 2
 3
 4
 5
 6
 7
 8
 9
10
11
12
13
14
15
16
17
18
19
20
21
#!/bin/bash

#intput sample
#type number nic# mac addr
#nodeName 07 1 00:00:00:00:00:00

#output sample
#ip hostname
#10.0.103.10 nodeName10
awk '
  /^headNode.*/ {printf("10.0.0.3 %s\
", $1)};                          \
  /^server_sm2400.*/ {printf("10.0.3.%d %s\
", $3, $2)};                      \
  /^nfsServer.*/ {printf("10.0.1.%d %s%02d\
", $2, $1, $2)};          \
  /^node2ghz.*/ {printf("10.0.100.%d %s%02d\
", $2, $1, $2)};          \
  /^node3ghz.*/ {printf("10.0.101.%d %s%02d\
", $2, $1, $2)};          \
  '

\ ~/data/nodes.db > /etc/dnsmasq.hosts

#output sample
#mac,netType,hostname,hostname
#00:00:00:00:00:00,net:nodeName,nodeName10,nodeName10
awk '
  /^headNode.*/ {printf("%s,net:%s,%s,%s\
", $4, $1, $1, $1)};                      \
  /^server_sm2400.*/ {printf("%s,net:%s,%s,%s\
", $4, $1, $2, $2)};              \
  /^node2ghz.*/ {printf("%s,net:%s,%s%02d,%s%02d\
", $4, $1, $1, $2, $1, $2)};      \
  /^node3ghz.*/ {printf("%s,net:%s,%s%02d,%s%02d\
", $4, $1, $1, $2, $1, $2)};      \
  '

\ ~/data/nodes.db > /etc/dnsmasq.nodes

#output sample
#hostname np=$CPUS type
#nodeName10 np=8 nodeName
awk '
  /^node2ghz.*/ {printf("%s%02d np=2 node2ghz\
", $1, $2)};              \
  /^node3ghz.*/ {printf("%s%02d np=2 node3ghz\
", $1, $2)};              \
  '

\ ~/data/nodes.db > /var/spool/torque/server_priv/nodes

#Lets reload dnsmasq now
killall -HUP dnsmasq

dnsmasq.conf

interface=eth0
dhcp-lease-max=500
domain=myCluster
enable-tftp
tftp-root=/srv/tftp
dhcp-option=3,10.0.0.1
addn-hosts=/etc/dnsmasq.hosts
dhcp-hostsfile=/etc/dnsmasq.nodes

dhcp-boot=net:misc,misc/pxelinux.0,nodeServer,10.0.0.2
dhcp-range=net:misc,10.0.200.0,10.0.200.255,12h

dhcp-boot=net:headNode,headNode/pxelinux.0,nodeServer,10.0.0.2
dhcp-range=net:headNode,10.0.0.3,10.0.0.3,12h

dhcp-boot=net:server_sm2400,server_sm2400/pxelinux.0,nodeServer,10.0.0.2
dhcp-range=net:server_sm2400,10.0.0.3,10.0.0.3,12h

dhcp-boot=net:node2ghz,node2ghz.cfg,nodeServer,10.0.0.2
dhcp-range=net:node2ghz,10.0.100.0,10.0.100.255,12h

dhcp-boot=net:node3ghz,node3ghz.cfg,nodeServer,10.0.0.2
dhcp-range=net:node3ghz,10.0.101.0,10.0.101.255,12h
Debian  LILUG  News  Software  Super Computers  2008-03-13T00:30:40