LILUG talks posted
I've finally caught up on rendering all the LILUG presentations. A total of 5
were posted within the last 24 hours, among them the much-requested Eric S.
Raymond Q/A session. If your clock reads around 1236983978 UET, some of
the talks have not been derived yet. That means you can't stream them or watch
the thumbnails, but you can download them in the glorious Ogg Theora format.
Enough talking, find the links below.
Enjoy
LILUG
News
2009-03-13T18:48:11-04:00
Mplayer: Subtitles & black bars
If you have ever watched a wide-screen foreign film with subtitles, you might
have noticed that the subtitles are usually placed inside the picture. I find
this extremely annoying, as it makes the subtitles harder to read. It doesn't
make much sense: if you already have black bars from the aspect ratio
adjustment, why not use them for subtitles? Fortunately, if you use mplayer you
can. Just add the following to your personal mplayer config file
~/.mplayer/config or the global /etc/mplayer/mplayer.conf:
ass=1
ass-font-scale=1.5
ass-use-margins=1
vf=expand=:::::8/5
You need to adjust the last line to match the aspect ratio of your screen. As a
side effect, all videos (even in windowed mode) will have black bars added to
pad them out to that aspect ratio; it's a small price to pay.
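For instance, assuming a 16:9 screen (my example, not from the original post),
the last line would become:
vf=expand=:::::16/9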
LILUG
Software
2009-03-13T00:36:17-04:00
Tricky Tricky
while(c1){
    switch(c2){
    case 1:
        aOk();
        continue;   // continues the while loop, not the switch
    case 2:
        liveToSeeAnotherDay();
        continue;   // same here -- back to the top of the loop
    case 3:
        oopsyDaisy();
        break;      // only leaves the switch...
    }
    break;          // ...so case 3 falls out here and ends the loop
}
Code
Lilug
Software
2009-02-03T12:41:47-05:00
Photo on Linux
I've gotten into a photo mood lately: shooting, editing, and organizing. Along
the way I've discovered some new useful tools and gotten to know the ones I've
used before a little better.
The first and foremost is digiKam, a photo manager. Its primary job is to
maintain a database of all your photos. Photo managers are not something most
people have used, so they might take some getting used to. The interface for
digiKam is quite intuitive and easy to pick up, and for the average photo
junkie it will have everything they need. But it certainly lacks some features
which I think a self-respecting photo manager must have. Here are some things
I wish it had:
- Way to flag a photo as a different version of another. (They should share all meta-data such as tags and description)
- Split into a backend/frontend for centralized photo management. (KDE4 version supports multiple database roots so this can be used as a workaround)
- Multi-user support. If your whole family goes on a trip, being able to collaborate on an album is essential
- Export/import album with all meta-data (so one can share a whole album with someone else)
- Save export options of raw images along with raw image.
- HTML album generator needs to include meta-data (description, tags etc..)
- Better gallery2 integration
- Better support for raw images (it does not scale raw files on upload).
- Automatically fill out gallery title and description using local info.
- Ability to preview pictures on select.
- Better error messages.
Most of these issues are not major, especially since some of them will be
solved by the multi-root support in the KDE4 release. I started with the
negatives, but digiKam has a lot of cool features too. One of my favorites is
the calendar view: regardless of how your galleries are organized, it uses the
EXIF date tag to arrange all your photos by date, which really helps when
organizing photos. Tagging is also very useful; you can tag any photo and then
view all photos with a particular tag, which makes it easy to organize your
data. DigiKam also has a slew of semi-functional export features such as
gallery2, flickr, and picasa. These are provided through the kipi framework;
they are nice, but most require some more work to become completely feature-
full and user-friendly.
Almost forgot: digiKam is also an excellent tool for downloading photos from
cameras. Most cameras are not plain UMS devices, so they need special software
to fetch the pictures out of them. If you are on Windows you can usually use
the manufacturer's software to do this, but on Linux this is a tad complicated.
Unless, of course, you use digiKam -- which turns the process into a magic
"detect [the camera type] and download" single-click operation.
To share my photos with the world I use a web-based photo manager as a front-
end to my local database. It's called gallery. I
have tried this tool in the past and it was just too cumbersome to use (I
ended up writing my own PHP gallery system). But with the kipi export plug-in
for digiKam and the remote plug-in for gallery, life has become easy.
The last few tools are only important for someone who is seriously into
photography. The first is a gimp plug-in called
ufraw, which is basically a frontend to
dcraw. It allows you to perform
advanced raw editing before you import your photo into gimp -- you can adjust
almost any aspect of your raw file conversion
(lightness, white balance, hue, saturation...). UFRaw is a bit daunting, but you
don't always have to use all the features it provides; lightness is probably
the only one you'll have to adjust on a regular basis. Another tool is called
exiftool, used to read and
manipulate EXIF information in pictures. There are times when you can lose
the EXIF data while editing a photo (e.g. when saving to PNG in gimp), and with
this tool you can quickly clone the EXIF info of one file onto another using
the -TagsFromFile option. It even supports batch mode; for example "exiftool
-TagsFromFile IMG_%4.4f.CR2 *.png" will copy the EXIF information to every PNG
from its parent raw file, using the file name as the mapping (sample file names:
IMG_2565.png IMG_2573_1.png IMG_2565.CR2 IMG_2573.CR2)
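For a single pair of files the same trick (my own example, not from the post)
boils down to:
exiftool -TagsFromFile IMG_2565.CR2 IMG_2565.png
which copies the raw file's metadata onto the PNG.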
So that's it for now, shoot away. And if you like, you can check out my
public gallery.
LILUG
News
Software
2008-06-17T23:19:55-04:00
CenterIM: History format
My instant-message client of choice is centerim (a
fork of centericq). It does everything I
need: send and receive messages in a very simple interface. Now this might
sound like any ordinary client out there, but it's special in that it runs
completely in the terminal (ncurses based) -- and it's good at it. I've tried
some other terminal-based clients and they all feel very cumbersome.
One major inconvenience with ncurses applications is the lack of clearly
defined text areas, so copying text out is not trivial; in fact it's nearly
impossible. So usually, if I need to get text out of the application, I just
look in its log files. Unfortunately centerim has a not-so-convenient history
log format. It looks something like this:
IN
MSG
1212455295
1212455295
pong
OUT
MSG
1212455668
1212455668
pong
(each message entry is separated by "\f\n", not just "\n")
So using a little awk magic I wrote a simple converter which parses the history
file into something more readable, something you can paste as a quote.
gawk -vto=anon -vfrom=me 'BEGIN {FS="\n";RS="\f\n";}{if (match($1,"IN"))
a=to; else a=from; printf("%s %s:\t %s\n", strftime("%H:%M:%S", $4), a,
$5);for (i=6; i<=NF;i++) printf("\t\t%s\n", $i);}' /PATH/TO/HISTORY/FILE
You need to modify the -vto and -vfrom values to correspond to your name and
the name of the person you're talking to. You obviously also need to specify
the path to the history file. If you don't like the timestamp format you can
alter the string passed to strftime (man 3 strftime for the format options).
Sample output for the above history snippet looks like this:
21:08:15 anon: ping
21:14:28 me: pong
LILUG
News
Software
2008-06-04T14:25:12-04:00
Little Color in Your Bash Prompt
I have accounts on many computer systems (around 10) which together add up to
several hundred machines, and I often find myself with ssh sessions open to
multiple machines, doing different things simultaneously. More than once I have
executed something on the wrong machine. Most of the time it's not a problem, but
every now and then I'll manage to reboot the wrong machine or do something else
equally bad. It's really an easy mistake to make, especially when you have half
a dozen shell tabs open and screen running in many of the tabs.
I had spent some time pondering a good solution to this problem. I
already had bash configured to show the machine name as part of the prompt
(e.g. dotCOMmie@laptop:~$) but it was not enough; it's easy to overlook the name
or even the path. So one great day I got the idea to color my prompt
differently on each of my machines using ANSI color escape
codes. This worked quite well: with a
single glance at the prompt I had an intuitive feel for which machine I was
typing on, even without paying attention to the hostname in the prompt. But
this solution was not perfect, as I would have to manually pick a new color for
each machine.
For the next iteration of the colored prompt I decided to write a simple
program which takes a string (the hostname) as an argument, hashes it down into
a small number, and maps it to a color. I called this little app t2cc (text to
color code); you can download t2cc from
the project page. The source doesn't need any external libraries, so you can
just compile it with gcc or use my pre-compiled 32-bit and 64-bit binaries.
Consider the code public domain.
To use t2cc just drop it into ~/.bash
and edit your ~/.bashrc to set the prompt as follows:
PS1="\[\e[`~/.bash/t2cc $HOSTNAME`m\]\u@\h\[\e[0m\]:\[\e[`~/.bash/t2cc $HOSTNAME -2`m\]\w\[\e[0m\]\$ "
And if you use the same .bashrc on both 32- and 64-bit architectures, you can
download t2cc_32 and
t2cc_64 into your ~/.bash and add the
following to your ~/.bashrc:
if [ `uname -m` = "x86_64" ]; then
t2cc=~/.bash/t2cc_64
else
t2cc=~/.bash/t2cc_32
fi
PS1="\[\e[`$t2cc $HOSTNAME`m\]\u@\h\[\e[0m\]:\[\e[`$t2cc $HOSTNAME-2`m\]\w\[\e[0m\]\$ "
As you can see from the examples above, I actually use two hashes of the
hostname: a forward hash for the hostname and a backward hash for the path (the
-2 flag). This allows for more possible color combinations. T2cc is designed to
skip colors which don't show up well on dark backgrounds (or, with -b, on bright
backgrounds); this ensures that the prompt is always readable.
Initially I wanted to write this all in bash, but I couldn't quite figure out
how to convert an ASCII character to a number. If you know how to do this in
pure bash please let me know.
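For what it's worth, here is one way to do the character-to-number conversion
in pure bash (my own sketch, not something t2cc does):
# printf treats a leading quote as "give me the character's numeric value"
printf '%d\n' "'A"      # prints 65
# so a crude pure-bash hostname-to-color hash could look like this:
sum=0
for ((i = 0; i < ${#HOSTNAME}; i++)); do
    printf -v code '%d' "'${HOSTNAME:i:1}"
    sum=$((sum + code))
done
echo $((31 + sum % 7))  # 31-37 are the standard ANSI foreground colors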
So you might be wondering what does all of this look like?
dotCOMmie@laptop:~/.bash$
LILUG
News
Software
2008-04-16T23:06:36-04:00
SMTP -- Time to chuck it.
E-mail, in particular SMTP (Simple Mail Transfer Protocol), has become an
integral part of our lives; people routinely rely on it to send files and
messages. At the inception of SMTP the Internet was only accessible to a
relatively small, close-knit community, and as a result the architects of SMTP
did not envision problems such as SPAM and sender-spoofing. Today, as the
Internet has become more accessible, unscrupulous people are making use of flaws
in SMTP for their own profit at the expense of the average Internet user.
There have been several attempts to bring this ancient protocol in line with
current society, but the problem of spam keeps creeping in. At first people
implemented simple filters to get rid of SPAM, but as the sheer volume of
SPAM increased, mere filtering became impractical, and so we saw the advent of
adaptive SPAM filters which automatically learned to identify and
differentiate legitimate email from SPAM. Soon enough the spammers caught on
and started embedding their ads into images where they could not be easily
parsed by spam filters. AOL (America On Line) flirted with other ideas to
control spam, such as imposing an email tax on all email delivered to its
users. It seems like such a system might work, but it stands in the way of the
open principles which have been so important to the flourishing of the
internet.
There are two apparent problems at the root of the SMTP protocol which allow
for easy manipulation: the lack of authentication and sender validation, and
the lack of user interaction. It would not be difficult to design a more
flexible protocol which would allow us to enjoy the functionality we are
familiar with while addressing some, if not all, of the problems within
SMTP.
To allow for greater flexibility, the protocol would first be broken
from a server-server model into a client-server model. Traditionally,
when one sends mail, it is sent to a local SMTP server which
then relays the message on to the next server until the email reaches its
destination. This approach allowed for email caching and delayed sending (when
a receiving mail server was off-line for hours or even days on end, messages
could still trickle through as the sending server periodically tried to
resend them). Today's mail servers have very high uptimes and many are
redundant, so caching email for delayed delivery is not very important.
Instead, having direct communication between the sender's client and the
receiver's server has many advantages: it opens up the possibility of CAPTCHA
systems, makes the send portion of the protocol easier to upgrade, and allows
for new functionality in the protocol.
Spam is driven by profit; spammers make use of the fact that it is cheap
to send email, so even the smallest returns on spam amount to good money. By
making it more expensive to send spam, it would be phased out as the returns
become negative. Charging money, as AOL tried, would work, but it is not a
good approach: not only does it not allow for sender anonymity, but it also
rewards mail administrators for doing a bad job (the more spam we deliver, the
more money we make). Another approach is to make the sender interact with the
recipient's mail server via some kind of challenge which is hard
for a machine to compute but easy for a human, a Turing test. For example, the
recipient can ask the sender's client to verify what is written on an
obfuscated image (CAPTCHA) or what is being said in an audio clip, or both, so
as to minimize the effect on people with handicaps. It would be essential to
also white-list senders so that they do not have to perform a user-interactive
challenge to send email, so that mail from legitimate automated mass
senders would get through (and for that, the current implementation of sieve
scripts could be used).
In this system, if users were to make wide use of such white lists, we would
soon see a problem. If nearly everyone has a white-list entry for Bank of
America, what is to prevent a spammer from trying to impersonate that bank? And
so this brings us to the next point, authentication: how do you know that the
email actually did originate from the sender? This is one of the largest
problems with SMTP, as it is so easy to fake one's outgoing email address. The
white list has to rely on a verifiable and consistent flag in the email. A
sample implementation of such a control could work similarly to a current hack
on the email system, SPF, in which a special DNS
entry says where mail for a domain can originate from. While this approach is
quite effective in a server-server architecture, it would not work in a client-
server architecture. Part of the protocol could require the sending client to
send a cryptographic hash of the email to its own receiving mail server, so
that the receiving party's mail server could verify the authenticity of the
source of the email. In essence this creates a 3-way handshake between the
sender's client, the sender's (receiving) mail server, and the receiver's mail
server. At first it might seem that this process uses more bandwidth and
increases the delay of sending mail, but one has to remember that in the usual
configuration of sending email, with IMAP or POP for mail storage, one
undergoes a similar process: first the email is sent for storage (over IMAP or
POP) to the sender's mail server, and then it is sent over SMTP to the sender's
mail server for redirection to the receiver's mail server. It is even feasible
to implement hooks in the IMAP and POP stacks to talk to the mail-sending
daemon directly, eliminating an additional socket connection by the client.
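Just to make the idea concrete, here is a rough sketch of the verification step
in shell terms (the file names and the hash-lookup mechanism are my own
stand-ins, not anything specified in this post):
# Sender side: compute a digest of the outgoing message and lodge it with the
# sender's own (receiving) mail server -- a plain file stands in for that store.
sha256sum message.eml | cut -d' ' -f1 > /var/spool/sent-hashes/msg-id
# Receiver side: recompute the digest of what actually arrived and compare it
# against the value the claimed sender's server hands back.
theirs=$(cat /var/spool/sent-hashes/msg-id)   # in reality: a query to the sender's server
ours=$(sha256sum received.eml | cut -d' ' -f1)
if [ "$ours" = "$theirs" ]; then
    echo "origin verified -- accept"
else
    echo "hash mismatch -- likely spoofed, reject"
fi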
For legitimate mass mail this process would not encumber the sending procedure,
since in that case the sending server would be located on the same machine as
the sender's receiving mail server (which would store the hash for
authentication), and the two could even be streamlined into one monolithic
process.
Some might argue that phasing out SMTP is an extremely radical idea; it has
been an essential part of the internet for 25 years. But then, when is the
right time to phase out this archaic and obsolete protocol, or do we commit to
using it for the foreseeable future? The longer we wait, the longer it will
take to adopt something new. This protocol should be designed with a way to
coexist with SMTP to get over the adoption curve, id est, make it possible for
a client to check for the recipient's functionality: if it can accept email via
the new protocol, then send it that way rather than over SMTP.
The implementation of such a protocol would take very little time; the biggest
problem would be adoption. The best approach to this problem is to
entice several large mail providers (such as Gmail or Yahoo) to switch over.
Since these providers handle a large fraction of all mail, the smaller guys
(like myself) would have to follow suit. There is even an incentive for mail
providers to re-implement the mail protocol: it would save them many CPU
cycles, since Bayesian spam filters would no longer be as important.
By creating this new protocol we would dramatically improve the end user's
experience online, as there would be fewer annoyances to deal with. Hopefully
the alleviation of these annoyances would bring faster adoption of the protocol.
LILUG
News
WWTS
2008-03-16T22:12:08-04:00
dnsmasq -- buy 1 get 2 free!
I mentioned earlier that we netboot (PXE) our cluster. Before NFS-root begins,
some things have to take place: the kernel needs to be served, IPs assigned,
and DNS look-ups made to figure out where the servers are, and so on.
Primarily 3 protocols are in the mix at this point: TFTP, DHCP, and DNS. We
used to run 3 individual applications to handle all of this; they're all quite
fine applications in their own right: atftpd,
Bind9,
DHCP (from
ISC). But it just becomes too much to look after: you
have a config file for each of the daemons, as well as databases with node
information. Our configuration used MySQL and
PHP to generate all the databases for these daemons, so
that you would only have to maintain one central configuration. Which means you
need to look after yet another daemon to make it all work. Add all of this
together and it becomes one major headache.
Several months ago I installed openWRT onto a
router at home. While configuring openWRT I came across
something called dnsmasq. By
default, on openWRT, dnsmasq handles DNS and DHCP. I thought it was spiffy to
merge the 2 services... after all, they are so often run together (on internal
networks). The name stuck in my head as something to pay a bit more attention
to. Somewhere along the line I got some more experience with dnsmasq and
discovered it also had TFTP support. Could it be that what we use 4
daemons for could be accomplished with just one?
So when the opportunity arose, I dumped all the node address information out of
the MySQL database into a simple awk-parsable flat file. I wrote a short parsing
script which takes the central database and spits out a file dnsmasq.hosts (with
name/IP pairs) and another, dnsmasq.nodes (with MAC-address/name pairs).
Finally I configured the master (static) dnsmasq.conf file to start all the
services I needed (DNS, DHCP, TFTP) and include the dnsmasq.hosts and
dnsmasq.nodes files. Since dnsmasq.nodes includes a category flag, it is
trivial to tell which group of nodes should use which TFTP images and what kind
of DHCP leases they should be served.
Dnsmasq couldn't offer a simpler and more intuitive configuration; with half a
day's work I was able to greatly improve upon the old system and make it a lot
more manageable. There is only one gripe I have with dnsmasq: I wish it were
possible to have just one configuration line per node, that is, the name,
IP, and MAC address all on one line. If that were the case I wouldn't even need
an awk script to make the config files (although the script turned out to be
handy, because I also use the same file to generate a node list for torque).
But it's understandable, since there are instances where you only want to run a
DHCP server or just a DNS server, and so having the DHCP and DNS information on
one line wouldn't make much sense.
Scalability for dnsmasq is something to consider. Their website claims that it
has been tested with installations of up to 1000 nodes, which might or might
not be a problem depending on what type of configuration you're building. I
kind of wonder what happens at the thousands-of-machines level: how its
performance degrades, and how that compares to, say, the other TFTP/DHCP/DNS
servers (BIND9 is known to work quite well with a lot of data).
Here are some configuration examples:
Master Flat file node database
#NODES file it needs to be processed by nodesFileGen
#nodeType nodeIndex nic# MACAddr
nfsServer 01 1
nfsServer 02 1
headNode 00 1 00:00:00:00:00:00
#Servers based on the supermicro p2400 hardware (white 1u supermicro box)
server_sm2400 miscServ 1 00:00:00:00:00:00
server_sm2400 miscServ 2 00:00:00:00:00:00
#dual 2.4ghz supermicro nodes
node2ghz 01 1 00:00:00:00:00:00
node2ghz 02 1 00:00:00:00:00:00
node2ghz 03 1 00:00:00:00:00:00
...[snip]...
#dual 3.4ghz dell nodes
node3ghz 01 1 00:00:00:00:00:00
node3ghz 02 1 00:00:00:00:00:00
node3ghz 03 1 00:00:00:00:00:00
...[snip]...
Flat File DB Parser script
#!/bin/bash
#input sample
#type number nic# mac addr
#nodeName 07 1 00:00:00:00:00:00

#output sample
#ip hostname
#10.0.103.10 nodeName10
awk '
/^headNode.*/ {printf("10.0.0.3 %s\n", $1)}; \
/^server_sm2400.*/ {printf("10.0.3.%d %s\n", $3, $2)}; \
/^nfsServer.*/ {printf("10.0.1.%d %s%02d\n", $2, $1, $2)}; \
/^node2ghz.*/ {printf("10.0.100.%d %s%02d\n", $2, $1, $2)}; \
/^node3ghz.*/ {printf("10.0.101.%d %s%02d\n", $2, $1, $2)}; \
' \
~/data/nodes.db > /etc/dnsmasq.hosts

#output sample
#mac,netType,hostname,hostname
#00:00:00:00:00:00,net:nodeName,nodeName10,nodeName10
awk '
/^headNode.*/ {printf("%s,net:%s,%s,%s\n", $4, $1, $1, $1)}; \
/^server_sm2400.*/ {printf("%s,net:%s,%s,%s\n", $4, $1, $2, $2)}; \
/^node2ghz.*/ {printf("%s,net:%s,%s%02d,%s%02d\n", $4, $1, $1, $2, $1, $2)}; \
/^node3ghz.*/ {printf("%s,net:%s,%s%02d,%s%02d\n", $4, $1, $1, $2, $1, $2)}; \
' \
~/data/nodes.db > /etc/dnsmasq.nodes

#output sample
#hostname np=$CPUS type
#nodeName10 np=8 nodeName
awk '
/^node2ghz.*/ {printf("%s%02d np=2 node2ghz\n", $1, $2)}; \
/^node3ghz.*/ {printf("%s%02d np=2 node3ghz\n", $1, $2)}; \
' \
~/data/nodes.db > /var/spool/torque/server_priv/nodes

#Lets reload dnsmasq now
killall -HUP dnsmasq
dnsmasq.conf
interface=eth0
dhcp-lease-max=500
domain=myCluster
enable-tftp
tftp-root=/srv/tftp
dhcp-option=3,10.0.0.1
addn-hosts=/etc/dnsmasq.hosts
dhcp-hostsfile=/etc/dnsmasq.nodes
dhcp-boot=net:misc,misc/pxelinux.0,nodeServer,10.0.0.2
dhcp-range=net:misc,10.0.200.0,10.0.200.255,12h
dhcp-boot=net:headNode,headNode/pxelinux.0,nodeServer,10.0.0.2
dhcp-range=net:headNode,10.0.0.3,10.0.0.3,12h
dhcp-boot=net:server_sm2400,server_sm2400/pxelinux.0,nodeServer,10.0.0.2
dhcp-range=net:server_sm2400,10.0.0.3,10.0.0.3,12h
dhcp-boot=net:node2ghz,node2ghz.cfg,nodeServer,10.0.0.2
dhcp-range=net:node2ghz,10.0.100.0,10.0.100.255,12h
dhcp-boot=net:node3ghz,node3ghz.cfg,nodeServer,10.0.0.2
dhcp-range=net:node3ghz,10.0.101.0,10.0.101.255,12h
Debian
LILUG
News
Software
Super Computers
2008-03-13T00:30:40-04:00
MOTD
You all probably know that the most important thing on any multi-user system
is a pretty MOTD. Among other things in the past couple of weeks, I
decided to refresh the MOTDs for the Galaxy and Seawulf clusters. I discovered
2 awesome applications while composing the MOTDs.
The first is jp2a; it takes a JPG and converts it to ASCII,
and it even supports color. I used this to render the Milky Way as part of the
Galaxy MOTD. While this tool is handy it needs some assistance: you should
clean up and simplify the JPGs before you try to convert them.
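If memory serves, jp2a's --width and --colors flags are all you need for an
80-column colored rendering (the file names here are made up):
jp2a --colors --width=80 milkyway.jpg > galaxy_motd.txt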
The second tool is a must for any form of ASCII-art editing. It's called
aewan (ace editor without a name). It makes
editing a lot easier: it supports coloring, multiple layers, cut/paste/move,
and more. Unfortunately it uses a weird format and does not have an import
feature, so it's a PITA to import an already existing ASCII snippet -- cut and
paste does work, but it loses some information, like color.
Aewan comes with a sister tool called aecat which 'cats' the native aewan
format into either text (ANSI ASCII) or HTML. Below is some of my handiwork.
Because getting browsers to render text properly is a PITA, I decided to post
the artwork as images.
Galaxy MOTD:
Seawulf MOTD:
I also wrote a short cronjob which changes the MOTD every 5 minutes to reflect
how many nodes are queued/free/down.
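That script wasn't posted; a minimal sketch of that sort of cron job, assuming
torque's pbsnodes and qstat are on the path and the static art lives in
/etc/motd.base (both assumptions of mine), might look like:
#!/bin/sh
# update-motd (hypothetical): append node/job counts to the static MOTD art
free=$(pbsnodes -a | grep -c 'state = free')
down=$(pbsnodes -l | wc -l)                  # pbsnodes -l lists down/offline nodes
queued=$(qstat 2>/dev/null | grep -c ' Q ')  # rough count of queued jobs
{
    cat /etc/motd.base
    echo "nodes free: $free   down: $down   jobs queued: $queued"
} > /etc/motd
Run it from cron with something like */5 * * * * root /usr/local/sbin/update-motd
in /etc/crontab.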
One more resource I forgot to mention is the ascii generator: you give it a
text string and it returns a fancy-looking logo.
Finally, when making any MOTDs, try to stick to a maximum width of 80 and
height of 24. This way your artwork won't be chopped even on ridiculously small
terminals.
Debian
LILUG
News
Software
2008-03-02T23:41:22-05:00
NFS-root
I haven't posted many clustering articles here, but I've been doing a lot of
work on clusters recently: building a cluster for the SC07 Cluster Challenge as
well as rebuilding 2 clusters (Seawulf & Galaxy) from the ground up at Stony
Brook University. I'll try to post some more info about
this experience as time goes on.
We have about 235 nodes in Seawulf
and 150 in Galaxy. To boot all the nodes we use
PXE (netboot);
this allows for great flexibility and ease of administration -- really it's the
only sane way to bootstrap a cluster. Our bootstrapping system used to have a
configuration where the machine would do a plain PXE boot and then, using a
linuxrc script, the kernel would download a compressed system image over TFTP,
decompress it to a ram-disk, and do a pivot root. This system works quite well,
but it does have some deficiencies. It relies on many custom scripts to
keep the boot images in working order, and many of these scripts are quite
sloppily written, so if anything doesn't work as expected you have to
spend some time trying to coax it back up. Anything but the most trivial system
upgrade requires a reboot of the whole cluster (which purges the job queue and
annoys users), and on almost every upgrade something would go wrong and I'd
have to spend a long day figuring it out. Finally, with this configuration you
always have to be conscious not to install anything that would bloat the
system image -- after all, it's all kept in ram, and a larger image means more
wasted ram.
During a recent migration from a mixed 32/64-bit cluster to a pure 64-bit
system, I decided to re-architect the whole configuration to use NFS-root
instead of linuxrc/pivot-root. I had experience with this style of
configuration from a machine we built for the SC07 cluster challenge; however,
that was a small cluster (13 nodes, 100 cores), so I was worried whether
NFS-root would be feasible on a cluster 20 times larger. After some pondering
I decided to go for it. I figured that Linux does a good job of caching disk
IO in ram, so any applications used regularly on each node would be
cached on the nodes themselves (and also on the NFS server); furthermore, if the
NFS server got overloaded, other techniques could be applied to reduce the
load (staggered boot, NFS tuning, server distribution, local caching for
network file systems). And so I put together the whole system on a test
cluster and installed the most important software: mpi, PBS (torque+Maui+gold),
and all the bizarre configurations.
Finally, one particularly interesting day, this whole configuration got put to
the test. I installed the server machines, migrated over all my configurations
and scripts, and halted all the nodes. Then I started everything back up, while
monitoring the stress the NFS-root server was enduring as 235 nodes started
asking it for 100s of files each. The NFS-root server behaved quite well: using
only 8 NFS-server threads, the system never went over 75% CPU utilization,
although the cluster took a little longer to boot. I assume that with just 8
NFS threads, most of the time the nodes were just standing in line waiting for
their files to get served. Starting more NFS threads (64-128) should alleviate
this issue, but it might put more stress on the NFS server, and since the same
machine does a lot of other things I'm not sure it's a good idea. Really it's a
non-issue, since the cluster rarely gets rebooted, especially now that most of
the system can be upgraded live without a reboot.
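(For reference -- my note, not part of the original post -- on a Debian-style
NFS server that thread count usually lives in /etc/default/nfs-kernel-server:)
# /etc/default/nfs-kernel-server
RPCNFSDCOUNT=64   # default is 8; raise it if nodes queue up waiting for files
followed by a restart of the nfs-kernel-server service.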
There are a couple of things to consider if you want to NFS-root a whole
cluster. You most likely want to export your NFS share as read-only to all
machines but one; you don't want all the machines hammering each other's files.
This does require some trickery. You have to address the following paths (a
rough sketch of the corresponding boot-time commands follows the list):
- /var
You cannot mount this to a local partition, as most package management systems
will make changes to /var and you'll have to go far out of your way to keep
them in sync. We utilize an init script which copies /varImage to a tmpfs /var
(ram file system) on boot.
- /etc/mtab
This is a pain in the ass; I don't know whose great idea this file was. It
maintains a list of all currently mounted file systems (information not unlike
that of /proc/mounts). In fact the mount man page says that "It is
possible to replace /etc/mtab by a symbolic link to /proc/mounts, and
especially when you have very large numbers of mounts things will be much
faster with that symlink, but some information is lost that way, and in
particular working with the loop device will be less convenient, and using the
'user' option will fail." And that is exactly what we do. NOTE:
autofs does not support the symlink
hack; I have filed a bug in Debian.
- /etc/network/run (this might be a debianism)
We use a tmpfs for this also.
- /tmp
We mount this on a local disk partition.
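Roughly, the boot-time handling described above boils down to something like
this (a sketch; /varImage is their path, the sizes, devices, and exact commands
are my guesses):
#!/bin/sh
# runs early at boot, while the NFS root is still mounted read-only

# /var: ram-backed copy seeded from the read-only /varImage
mount -t tmpfs -o size=256m tmpfs /var
cp -a /varImage/. /var/

# /etc/network/run: another small tmpfs (Debian keeps ifupdown state here)
mount -t tmpfs -o size=1m tmpfs /etc/network/run

# /tmp: a real partition on the node's local disk
mount /dev/sda1 /tmp

# /etc/mtab is handled once in the shared image rather than at boot:
#   ln -sf /proc/mounts /etc/mtab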
All in all the NFS-root system works quite well. I bet that with some tweaking
and a slightly more powerful NFS-root server (we're using a dual-socket 3.4Ghz
Xeon with 2MB cache and 2GB of ram) the NFS-root way of bootstrapping a cluster
can be pushed to serve over 1000 nodes; more than that would probably require
some distribution of the servers. By changing the exports on the NFS server,
any one node can become a read-write node and software can be installed or
upgraded on it like on any regular machine, and the changes will propagate to
all other nodes (minus daemon restarts). Later the node can again be changed to
read-only -- all without a reboot.
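As an illustration of that export switch (the path, network, and node name here
are hypothetical, not their actual config), the /etc/exports line might look
like:
# every node mounts the root image read-only, except the one being upgraded
/srv/nfsroot  10.0.0.0/16(ro,no_root_squash,async) node2ghz01(rw,no_root_squash,async)
with an exportfs -ra on the server to apply the change.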
Debian
LILUG
News
Software
Super Computers
2008-03-02T13:25:11-05:00