"Ramses" Diskless Cluster
With the relatively low cost of high-powered PCs, the idea of interconnecting a series of machines is not a new one. Beowulf clusters and clusters of workstations have been used extensively for many years. However, maintenance of these clusters has always been an issue, especially as the number of nodes increases. In a truly diskless cluster, maintenance is restricted to a single machine, the master node.
The "Ramses" cluster was a truly diskless setup. It was modelled after Arthur Weaver's Sirius Cluster, located at Cornell University in Ithaca, New York, with modifications in both software and hardware.
With the exception of the master node, every machine was simply a box with a motherboard and the minimum hardware needed to boot the BIOS. All data was stored on the master node, so failures of the diskless clients did not compromise data. Indeed, the entire cluster configuration can be restored easily from a single bootable hard disk of the master node that we saved.
Hardware Design
Our cluster consisted of 8 dual-processor machines built around AMD Athlon 1900MP processors. The master node was configured with:
- Dual AMD 1900MP
- Tyan Tiger MPX (S2466) Motherboard w/ onboard 10/100 Mbps Ethernet
- 3.5 GB of PC2100 RAM
- 3Com Vortex 10/100 Mbps Ethernet
- 8 MB PCI Video Card
- Dual 60 GB IDE disks set up as a software RAID (mirrored)
- Antec 840 Case w/ 300W Power Supply
The 7 diskless nodes were all identical and consisted of:
- Dual AMD 1900MP
- Tyan Tiger MPX (S2466) Motherboard w/ onboard 10/100 Mbps Ethernet
- 512 MB of PC2100 RAM
- 8 MB PCI Video Card
- Antec 630 Case w/ 300W Power Supply
All machines were interconnected by:
- 3Com 10/100 Fast Ethernet Switch
- D-Link 8 port KVM switch
- Flat Panel Display
- Logitech Mouse and Keyboard
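The hardware lists above imply the cluster's aggregate capacity. As a quick check of that arithmetic, the following sketch sums the processors and memory across the master node and the 7 diskless nodes. The dictionary layout and the `totals` helper are illustrative only, and 1 GB is assumed to be 1024 MB.

```python
# Node specifications taken from the hardware lists above.
# Assumption: 1 GB = 1024 MB, so the master's 3.5 GB is 3584 MB.
MASTER = {"count": 1, "cpus": 2, "ram_mb": 3.5 * 1024}
DISKLESS = {"count": 7, "cpus": 2, "ram_mb": 512}


def totals(*groups):
    """Return (machines, processors, RAM in GB) summed over node groups."""
    machines = sum(g["count"] for g in groups)
    cpus = sum(g["count"] * g["cpus"] for g in groups)
    ram_gb = sum(g["count"] * g["ram_mb"] for g in groups) / 1024
    return machines, cpus, ram_gb


machines, cpus, ram_gb = totals(MASTER, DISKLESS)
print(f"{machines} machines, {cpus} processors, {ram_gb:g} GB RAM")
```

Under these assumptions the cluster totals 16 processors and 7 GB of RAM, half of which sits in the master node.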
Hardware was purchased from Colfax International. They provided prompt service and we highly recommend them.
Software
All machines were running RedHat Linux 7.2 with the 2.4.18 kernel, which was built and optimized for our setup. We used LAM-MPI for distributed computing and the Ganglia Web monitoring tool to obtain realtime cluster usage statistics.
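LAM-MPI learns which machines belong to a cluster from a boot schema, a plain text file with one host per line and an optional `cpu=N` count. As a rough sketch of what such a file might have looked like for 8 dual-processor machines, the snippet below generates one. The hostnames `master` and `node1` to `node7` are hypothetical; the actual names used on Ramses are not given here.

```python
def lam_boot_schema(hostnames, cpus_per_node=2):
    """Build the text of a LAM boot schema: one 'host cpu=N' line per node."""
    lines = ["# LAM boot schema (hostnames are hypothetical)"]
    lines += [f"{host} cpu={cpus_per_node}" for host in hostnames]
    return "\n".join(lines) + "\n"


# Hypothetical naming: 'master' plus seven diskless nodes node1..node7.
HOSTS = ["master"] + [f"node{i}" for i in range(1, 8)]

print(lam_boot_schema(HOSTS), end="")
```

The generated text would normally be saved to a file and handed to `lamboot` to start the LAM run-time environment on every listed host.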
Client machines booted remotely using the 3Com network boot (already in the network card BIOS) and tftp to obtain a copy of an optimized diskless kernel, and mounted all their file systems from the master node.
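To make the last step more concrete, here is a minimal sketch of how the master node's export table for 7 diskless clients could be generated. It assumes the file systems were served over NFS, which is not stated above, and every path and IP address in it (`/tftpboot`, `/home`, `/usr`, `192.168.1.x`) is a hypothetical placeholder rather than the Ramses configuration.

```python
def exports_lines(root_dir, shared_dirs, node_ips):
    """Return /etc/exports-style lines for a set of diskless clients.

    root_dir    -- directory holding one root file system per client
    shared_dirs -- mapping of shared directory -> NFS export options
    node_ips    -- client IP addresses (hypothetical values below)
    """
    lines = []
    for ip in node_ips:
        # Each client gets its own root tree, exported only to that client.
        lines.append(f"{root_dir}/{ip} {ip}(rw,no_root_squash,sync)")
    for directory, options in shared_dirs.items():
        # Shared trees are exported to every client with the same options.
        clients = " ".join(f"{ip}({options})" for ip in node_ips)
        lines.append(f"{directory} {clients}")
    return lines


# Hypothetical addressing: seven clients at 192.168.1.2 .. 192.168.1.8.
NODE_IPS = [f"192.168.1.{n}" for n in range(2, 9)]
SHARED = {"/home": "rw,sync", "/usr": "ro,sync"}

for line in exports_lines("/tftpboot", SHARED, NODE_IPS):
    print(line)
```

Keeping a separate root tree per client while sharing read-only system directories is one common layout for diskless nodes; the recipe on the setup page remains the authoritative description of what was actually done.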
Should you wish to design your own cluster, we have compiled a step-by-step install recipe, which can be found on our Cluster Setup Page. You are also encouraged to look at the original SIRIUS Setup Page by Arthur Weaver.
There is also the former Job Submission Guide, which refers to software once installed on Ramses. The old configuration and data are still accessible, since we disconnected and saved one of the RAID disks. To boot up the old Ramses master node, simply open the Ramses case and switch the IDE cable and power connector to the second hard disk.