[Beowulf] raid 1 or ldap?

f.costen@cs.man.ac.uk fumie.costen at manchester.ac.uk
Tue Nov 13 01:50:20 PST 2007

Dear All,
I am clustering eight 32-bit machines and five 64-bit machines.
The 64-bit machines are heavily used for MPI jobs,
and the rest of the machines simply serve as terminals for
PhD students.
One of the 64-bit machines holds all the users' home
directories and the software (not the kernel)
for both the 32-bit and the 64-bit machines,
and it exports these directories to the client machines via NFS.
About 10 LDAP users are maintained by the server.
This server has two external hard disks connected by FireWire,
and these two disks store the software and the users' home
directories as a RAID 1 array.
We deliberately did NOT use the internal hard disk
for the home directories, so that the internal hard disk
on each machine (about 500-750 GB each)
can be used for the calculations.
RAID 1 slows down the system a bit, but we did not expect any reduction
in the speed of calculations that use the local hard disk.

When I ran a very small calculation
on my stand-alone laptop,
the execution time was shorter than on my local cluster machine
(a serial job, no parallelisation).

The laptop has a 32-bit CPU with 1 GB of memory:
vendor_id       : GenuineIntel
model name      : Intel(R) Pentium(R) 4 Mobile CPU 2.00GHz
stepping        : 4
cpu MHz         : 1994.395
cache size      : 512 KB
cpuid level     : 2
bogomips        : 3992.82

The test machine in the cluster is dual-core with 4 GB of memory, and
each core reports:
vendor_id       : AuthenticAMD
model name      : AMD Athlon(tm) 64 X2 Dual Core Processor 4200+
stepping        : 1
cpu MHz         : 2200.000
cache size      : 512 KB
cpuid level     : 1
bogomips        : 4404.88
TLB size        : 1024 4K pages
clflush size    : 64
cache_alignment : 64
address sizes   : 40 bits physical, 48 bits virtual

The same programme, the same compiler and the same compiler options
are used on these two machines.

On the laptop:
174.650u 4.752s 3:01.70 98.7%

On the local cluster under /home (i.e., under RAID 1):
209.457u 1.116s 4:07.05 85.2%

On the local cluster under /local:
204.268u 1.696s 3:26.04 99.9%
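
To read the three timings side by side, here is a minimal Python sketch.
It assumes the lines above are in the usual csh/tcsh `time` layout
(user seconds, system seconds, elapsed m:ss, CPU percentage); the names
RUNS, elapsed_seconds and breakdown are only illustrative.

```python
# Timings copied from the three runs above:
# (user seconds, system seconds, elapsed time as "m:ss.ss").
# Assumption: the lines follow the csh/tcsh `time` output layout.
RUNS = {
    "laptop": (174.650, 4.752, "3:01.70"),
    "cluster /home (RAID 1)": (209.457, 1.116, "4:07.05"),
    "cluster /local": (204.268, 1.696, "3:26.04"),
}


def elapsed_seconds(text):
    """Convert an elapsed time such as '3:01.70' into seconds."""
    minutes, seconds = text.split(":")
    return int(minutes) * 60 + float(seconds)


def breakdown(user, system, elapsed_text):
    """Return (cpu_seconds, elapsed_seconds, non_cpu_seconds, cpu_fraction)."""
    elapsed = elapsed_seconds(elapsed_text)
    cpu = user + system
    return cpu, elapsed, elapsed - cpu, cpu / elapsed


for name, run in RUNS.items():
    cpu, elapsed, non_cpu, share = breakdown(*run)
    print(f"{name:24s} cpu={cpu:8.3f}s elapsed={elapsed:7.2f}s "
          f"non-cpu={non_cpu:6.2f}s cpu share={share:6.1%}")
```

Read this way, the /home run spends roughly 36 s of its elapsed time
outside the CPU (waiting on I/O would be one possible explanation),
while the /local run spends almost none. Even so, the CPU time on
/local (about 206 s) is still higher than on the laptop (about 179 s).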

What I cannot understand is that
under /local we should be achieving
better performance than the laptop
(at least that was the case when each machine
was stand-alone and held the home directory of each user).
But this is not the case in reality.

When we have to run
a large job, the difference is not of the order of seconds
but of the order of days.

I am wondering whether anyone on this mailing list
has had a similar experience in the past, and if so,
I would like to know how you solved this type of problem.
At the moment, as a test, I have just transferred the home directories
and software directories onto the internal hard disk (discarding the
RAID 1 system), and when we calculate and write the
results under the home directories or under the spare local area of the
server, the system is again too slow to bear.

Thank you very much
Best wishes, Fumie
