[Beowulf] how cluster's storage can be flexible/expandable?

Fri Nov 9 04:26:41 PST 2012

On Fri, Nov 9, 2012 at 7:19 AM, Christopher Samuel
<samuel at unimelb.edu.au> wrote:
> So JBODs with LVM on top and XFS on top of that could be resized on
> the fly.  You can do the same with ext[34] as well (from memory).

It also works with external hardware RAID systems; I did it about 5
years ago. The key is firmware support in the RAID system. I swapped
the disks one by one, letting each be fully rebuilt before changing
the next; it helps if the firmware can make one disk a perfect copy
of another, otherwise you just treat the old disk as failed and the
new one has to be reconstructed. Once all the larger disks are in, the
volume is enlarged on the fly; the firmware will do a rebuild using
(hopefully :)) only the disk areas which were not previously used. So
far everything is done on the RAID system and the host computer
doesn't know anything about it. Afterwards, the kernel needs to be
informed that the volume has grown; IIRC this required a rescan of
that particular SCSI target. Finally, the FS (I used ext3 at the time)
needs to be enlarged (using resize2fs). All without unmounting; users
noticed only that the FS suddenly became larger :)
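
A rough sketch of the host-side part (kernel rescan, then resize2fs),
written as a dry run that only builds the commands and executes nothing.
It assumes a Linux host where the RAID volume shows up as a whole SCSI
disk with ext3 directly on it (no partition table, no LVM); the device
name "sdb" and the helper names are made up for illustration, and the
exact rescan procedure may differ depending on the controller and
driver.

```python
from pathlib import PurePosixPath
from typing import List


def rescan_path(block_dev: str) -> PurePosixPath:
    """sysfs file that asks the kernel to re-read one SCSI disk's size."""
    return PurePosixPath("/sys/block") / block_dev / "device" / "rescan"


def grow_plan(block_dev: str) -> List[str]:
    """Return the host-side steps as shell-style strings (dry run only)."""
    dev = f"/dev/{block_dev}"  # assumption: FS sits on the whole device
    return [
        # 1. tell the kernel that the volume has grown
        f"echo 1 > {rescan_path(block_dev)}",
        # 2. grow the filesystem to fill the device; no size argument
        #    means "use all available space"
        f"resize2fs {dev}",
    ]


if __name__ == "__main__":
    for step in grow_plan("sdb"):  # "sdb" is a hypothetical device name
        print(step)
```

Both steps need root when actually run, and it is worth checking the
new size reported in /proc/partitions between step 1 and step 2.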

Access will be slower throughout the whole process, as data needs to
be copied between disks (during the disk swapping phase), the RAID
volume reconstructed (during the volume expansion phase) and the FS
enlarged (which for ext3 means creating extra inodes, etc.; for a FS
without a fixed number of inodes this phase will probably be very
short). The process also takes a long time: IIRC after each disk was
inserted it took about 10h for it to be fully integrated, so I was
able to exchange 2 disks/day. This may be different nowadays due to
different disk sizes, disk read-write speeds and controller speeds. Up
to you to decide whether it makes sense to do it this way or whether
it is easier to declare a downtime :)
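
To put numbers on that trade-off, here is a back-of-the-envelope
estimate of the disk swapping phase alone, using the figures above
(about 10h per disk, 2 disks/day). The 8-disk array is a made-up
example, and the volume expansion and FS resize phases are not
included.

```python
import math

HOURS_PER_DISK = 10  # approximate rebuild time per disk, as recalled above
DISKS_PER_DAY = 2    # swap rate achieved with that rebuild time


def swap_phase_days(n_disks: int, disks_per_day: int = DISKS_PER_DAY) -> int:
    """Calendar days needed to swap all disks, one at a time."""
    return math.ceil(n_disks / disks_per_day)


def degraded_hours(n_disks: int, hours_per_disk: int = HOURS_PER_DISK) -> int:
    """Total hours the array spends rebuilding (i.e. with slower access)."""
    return n_disks * hours_per_disk


if __name__ == "__main__":
    n = 8  # hypothetical array size
    print(swap_phase_days(n), "days,", degraded_hours(n), "hours rebuilding")
```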

> Then there are things like Panasas where you can buy more shelves and
> add them to the bladeset and expand that way.

... but an expansion of volumes that requires rebalancing the object
distribution will also slow down access. There's no magic bullet :)

Cheers,
Bogdan