CCL:Origin300, Linux Cluster (fwd)

Thu Jul 4 12:44:22 PDT 2002

On Thu, 4 Jul 2002, Eugen Leitl wrote:

> 
> ---------- Forwarded message ----------
> Date: Thu, 4 Jul 2002 12:04:07 -0400
> From: Jianhui Wu <wujih at BRI.NRC.CA>
> To: chemistry at ccl.net
> Cc: amber at heimdal.compchem.ucsf.edu
> Subject: CCL:Origin300, Linux Cluster
> 
> Dear Colleagues,
> 
> I have a budget around $40k CN to shop for a new computer system, which
> will be used for MD simulation, virtual screening and some bioinformatics
> work. Currently, I am looking at two options: Origin 300 (2 cpu) or PC
> Linux Cluster. I would like to hear your experience with these systems and
> spend the limited budget right.
> 
> (1) An Origin 300 with 2 cpus at 500 MHz costs around $35k. Are you using
> this kind of system? Do you have benchmarks of MD simulation (such as
> Amber) for this system? Do you regret your purchase?

I cannot help you with the Origin side, but unless it delivers a
stupendous number of floats per clock or has some other parameter
(memory size or speed) that enables critical pathways for MD code, this
is a pretty insane option.  See below.

> (2) What is the best configuration for a PC Linux cluster now?  Is your
> cluster stable enough? For example, does it break down only once a month
> instead of once every week or two? How do you handle maintenance? Do you
> keep a spare node to serve as spare parts for other nodes? How much does
> it cost to maintain a PC cluster (service, parts, etc.)?

Here I can help you.  There are two or three generic kinds of PC cluster
that folks are buying right now, depending on budget, memory utilization
pattern, and so forth.  Let me give you a very quick matrix of the
possibilities.  I'm sure that if I make any careless mistakes they will
quickly be corrected by the group:-)

Nodes can be:

  * Single or Dual processor;

  * AMD Athlon or Intel Celeron, P4 (or P4/Xeon) (or Itanium, if you
    like very high end stuff) in clocks that range from ~1GHz on the
    low end to 2.5 GHz or so on the high end;

  * On motherboards with a FSB from 200 to 533 MHz;

  * Memory from 256 MB to perhaps 2 GB, in the form of SDRAM, DDR SDRAM,
    RDRAM;

  * With (switched) 100BT or 1000BT (purely OTS), or Myrinet and several
    other high end networks from vendors (without arguing for or against
    any of them);

  * With (UDMA 66,100,133) IDE or SCSI hard disk(s) or no hard disk at
    all;

  * With "misc" (video card, floppy, cd) or not;

  * In 1U, 2U, 3U cases in racks or in mini-to-full towers on OTC
    heavy duty shelving.

Yes, a lot of choices here.  Let me give you a few "typical"
configurations:

2U rackmount case, dual Athlon 1900+ (1600 MHz), 1 GB PC2100 DDR, 40 GB
HD, Tyan 2466 motherboard (with built-in 3c920 100BT PXE-bootable NIC
and serial console), no video (you will need a single card to set up the
BIOS originally for PXE booting and serial console), no floppy.  Barebones,
pure compute power for a street price between $1600 and $2000 (US) per
node, depending on who builds them and whether they set them up for you
(local PC vendor vs turnkey system vendor).

Budget $2600 or thereabouts for the "front end": a rack and a cheap 100BT
switch, a UP server node (in a 4U case) with a gigabit interface to the
switch and a few IDE disks configured in an md RAID, with CDROM and
floppy, monitor, keyboard, mouse, null modem serial cable, and sundry
cat 5 cables.  With the rest you could afford 14 of these nodes, or 28
processors not counting the front end node.  The 28 processors would
yield 44800 aggregate MHz (compared to 1000 for the Origin), although
your actual floating point performance per processor would of course
depend on things like memory access pattern, the ratio of memory access
time (at all cache levels) to on-CPU computation time, and more.
Parallel performance 

However, I'd be very surprised if the Origin came within a factor of ten
of the cluster in aggregate performance on nearly any task, MD included,
and it might well be slower on a per CPU basis (so that EACH inexpensive
dual Athlon node would outperform the much more expensive Origin).

The same setup in tower cases would be at least $100/node cheaper, so
you could get 15 nodes plus a server/frontend node -- if you shopped
hard (prices are dropping all the time) you might even squeeze 16 nodes
and 32 processors out of your budget.
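
For anyone who wants to redo this arithmetic with their own prices, here
is a minimal Python sketch of the node-count and aggregate-MHz estimate.
The node prices, the $2600 front end and the 1600 MHz clock come from the
figures above; the names (NodeSpec, affordable_nodes, aggregate_mhz) and
the BUDGET_USD value are purely illustrative assumptions, since the usable
budget in US dollars is not stated in this thread.  Aggregate MHz is only
a crude capacity figure, not a benchmark.

```python
from dataclasses import dataclass


@dataclass
class NodeSpec:
    label: str
    price_usd: float   # street price per node
    cpus: int          # processors per node
    clock_mhz: int     # actual clock per processor


def affordable_nodes(budget_usd: float, front_end_usd: float, node: NodeSpec) -> int:
    """Whole compute nodes left after paying for rack, switch and front end."""
    remaining = budget_usd - front_end_usd
    return max(0, int(remaining // node.price_usd))


def aggregate_mhz(n_nodes: int, node: NodeSpec) -> int:
    """Crude capacity figure: nodes x CPUs x clock.  Not a benchmark."""
    return n_nodes * node.cpus * node.clock_mhz


# ASSUMPTION: the usable budget in US dollars is not given in the thread;
# 26000 is a placeholder.  Substitute your own converted figure.
BUDGET_USD = 26000
FRONT_END_USD = 2600  # rack, switch, server node, cables (from the post)

options = [
    NodeSpec("dual Athlon 1900+, 2U rack, low price", 1600, 2, 1600),
    NodeSpec("dual Athlon 1900+, 2U rack, high price", 2000, 2, 1600),
    NodeSpec("dual Athlon 1900+, tower ($100 cheaper)", 1500, 2, 1600),
]

for node in options:
    n = affordable_nodes(BUDGET_USD, FRONT_END_USD, node)
    print(f"{node.label}: {n} nodes, {n * node.cpus} CPUs, "
          f"{aggregate_mhz(n, node)} aggregate MHz")
```

Changing BUDGET_USD or the per-node price shows quickly how sensitive the
node count is to shopping around, which is the point of the tower-case
comparison.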

The dual CPU configuration will probably not be optimal if your job is
memory bound.  Also, if your job can exploit SSE2 instructions and has
certain memory access patterns, it will run faster on Intel hardware.
Finally, Intel hardware runs at a much higher clock these days.  An
example of a single processor Intel node might be:

  Intel 1.8 GHz processor, MSI 845 GMax motherboard (533 MHz FSB,
  onboard eepro100 and video), floppy (optional but cheap), 40 GB HD,
  512 MB DDR, tower case for $600-$700; add $160 to put it in a
  rackmount case, add $260 or so to go to a 2.4 GHz processor.

In a tower configuration, you could afford an easy 32 processors this
way, with no memory bus contention.  A stunning 57600 aggregate MHz for
your money.  I THINK that this node would also fit in a 1U rack case,
which (at the extra cost per case) would give you fewer nodes but a much
smaller physical footprint.
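
The 1.8 GHz vs 2.4 GHz choice can be checked the same way.  This is a
rough sketch only: the $700 base price and the $260 upgrade are the
figures quoted above, while NODE_BUDGET_USD and the function name
nodes_and_mhz are assumptions made up for illustration.  It compares raw
clock per dollar and says nothing about how a real MD code would scale.

```python
def nodes_and_mhz(node_budget_usd, price_usd, clock_mhz):
    """Return (node count, aggregate MHz) for single-CPU nodes."""
    n = int(node_budget_usd // price_usd)
    return n, n * clock_mhz


# ASSUMPTION: money left for compute nodes after the front end.  This is a
# placeholder, not a figure from the thread.
NODE_BUDGET_USD = 23400
BASE_PRICE_USD = 700   # upper end of the quoted $600-$700 tower price
UPGRADE_USD = 260      # quoted extra cost of the 2.4 GHz processor

choices = {
    "1.8 GHz": (BASE_PRICE_USD, 1800),
    "2.4 GHz": (BASE_PRICE_USD + UPGRADE_USD, 2400),
}

for label, (price, clock) in choices.items():
    n, mhz = nodes_and_mhz(NODE_BUDGET_USD, price, clock)
    print(f"{label}: {n} nodes, {mhz} aggregate MHz, "
          f"{clock / price:.2f} MHz per dollar")
```

On raw clock per dollar the two come out close; which one is actually
better depends on how well the job parallelizes across more, slower nodes.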

  Hope this helps.

      rgb

> 
> If you were making the choice, what kind of system would you go for? Any
> suggestions will be greatly appreciated. 
> 
> Best wishes,
> 
> Jian Hui Wu
> 
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu