[Beowulf] Lifespan of a cluster
Andrew M.A. Cater
amacater at galactic.demon.co.uk
Sun Apr 27 04:04:03 PDT 2014
On Sun, Apr 27, 2014 at 09:45:41AM +0100, Jörg Saßmannshausen wrote:
> Dear all,
> in some of the discussions here I came across the 'lifespan of a cluster'
> argument. What I was wondering is: how long is that in HPC for number
> Is it 3 years (end of warranty), 5 years (making good use of hardware) or
It depends: it depends very much on what you're crunching, for how many people,
at what demand profile, how much data you've got ... a whole host of variables.
It also depends on which manufacturer makes your hardware. With an IBM server and IBM
disk shelves you get 3 - 5 years of full hardware support for the server, likewise for
the shelf. Individual disks may be available for ten years - but you may end up getting
factory-refurbished ones for the last few years. That comes at the high cost of a service
contract, but you do get peace of mind - and salesmen pestering you after year 3 to make
that upgrade ...
If you're running a university's set of clusters, then you may find that the
department head who can shout loudest / has the most money gets this year's sexy
hardware and everyone else shuffles down to the next set of hardware, cast off
by the department above.
Power / cooling / rack shape and density also change: if you're still running a
working cluster or three that are 10 years old, your power efficiency is probably
not great and it's costing you more to run the cluster than the compute cycles are
worth - but the refit costs of the data centre start to add up too.
There is a good argument for replacement on ecology / power and cooling costs alone.
I have lived in a data centre where it was worth it to retrofit screening and
move some racks to create hot/cold aisle separation to decrease overall cooling
costs - but that decision, in itself, cost $$$.
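To put rough numbers on the power argument above, here is a minimal sketch in Python. It is only an illustration: the function names, the use of a PUE (power usage effectiveness) multiplier for cooling overhead, and every figure in the usage example are hypothetical assumptions, not measurements from any real cluster or data centre.

```python
HOURS_PER_YEAR = 8760  # 24 * 365, ignoring leap years


def annual_power_cost(it_load_kw, pue, price_per_kwh):
    """Yearly electricity cost of a cluster.

    it_load_kw    -- average electrical draw of the kit itself, in kW
    pue           -- assumed data-centre overhead multiplier (cooling etc.)
    price_per_kwh -- electricity price per kWh, in whatever currency you use
    """
    return it_load_kw * pue * price_per_kwh * HOURS_PER_YEAR


def payback_years(old_kw, new_kw, pue, price_per_kwh, replacement_cost):
    """Years of power savings needed to pay for the replacement hardware.

    Assumes the new kit delivers the same throughput as the old kit at
    new_kw. Returns None if the new kit does not save any power.
    """
    saving = (annual_power_cost(old_kw, pue, price_per_kwh)
              - annual_power_cost(new_kw, pue, price_per_kwh))
    if saving <= 0:
        return None
    return replacement_cost / saving


if __name__ == "__main__":
    # Hypothetical figures: 40 kW of old nodes replaced by 10 kW of new
    # ones, PUE of 1.8, 0.12 per kWh, 150000 for the new hardware.
    years = payback_years(40, 10, 1.8, 0.12, 150000)
    print(f"Payback on power alone: {years:.1f} years")
```

This deliberately leaves out the refit cost of the room, support contracts and staff time, all of which can swing the answer either way.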
Likewise HPC interconnects and networking are "better" (for some values of
better) on newer technologies.
Have you asked this question of your friendly rivals at Imperial and Queen Mary? :)
> The reason behind that asking is: I got clusters here which are 10 years old,
> and quite a number of them, and I would like to get a scheme implemented to
> get the hardware replaced every X years with X being the 'lifespan of a
> cluster'. One of the various options which are currently thrown around is to
> move from my local data-centre (3 rooms, one is purely for the backup/file
> storage and the other two for HPC) into the College shared data centre (single
> room). IF we are doing that, I am a bit worried that I get told in 5 years
> time (for the sake of that argument): your clusters are end of lifetime, you
> have to get rid of them as we need space / they are consuming too much energy.
> Thus, I am looking to get some answers for: how long are clusters run
> typically and how is that done in other shared data centres?
> The current funding situation here means it is difficult, if not impossible, to
> get HPC hardware from funding agencies. Even if you get a bit of money, it is
> just enough to get a new node. So most clusters are a bit organically grown
> which makes administration difficult if you want to get really the best out of
> what you paid for. In an ideal world, I would like to have that replaced every
> 5 years: old kit out, new kit in. In the real world, I got to run the kit
> until it falls apart and hope that the Principal Investigator, i.e. the owner
> of the cluster, got some money to replace the old/broken nodes. Hence the
> questions so I can build up a good case to change there.
Central funding from something like the old SERC [Science and Engineering
Research Council] / joint university projects?
> I hope that makes sense to you.
> All the best from an overcast London!
> Dr. Jörg Saßmannshausen, MRSC
> University College London
> Department of Chemistry
> Gordon Street
> WC1H 0AJ
> email: j.sassmannshausen at ucl.ac.uk
> web: http://sassy.formativ.net
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html