[Beowulf] how cluster's storage can be flexible/expandable?

Jonathan Barber jonathan.barber at gmail.com
Tue Nov 13 02:04:33 PST 2012

On 12 November 2012 11:26, Duke Nguyen <duke.lists at gmx.com> wrote:
> On 11/12/12 4:42 PM, Tim Cutts wrote:
>> On 12 Nov 2012, at 03:50, Duke Nguyen <duke.lists at gmx.com> wrote:
>>> On 11/9/12 7:26 PM, Bogdan Costescu wrote:
>>>> On Fri, Nov 9, 2012 at 7:19 AM, Christopher Samuel
>>>> <samuel at unimelb.edu.au> wrote:
>>>>> So JBODs with LVM on top and XFS on top of that could be resized on
>>>>> the fly.  You can do the same with ext[34] as well (from memory).
>>> We also thought of using LVM on top of RAID disks, but never thought of
>>> XFS. Why do we need XFS, and how does it compare with GPFS?
>> XFS is not a parallel filesystem like GPFS, but a lot of us use it because it's so easy to grow the filesystem.  I frequently do this with virtual machines to expand their storage.
>> 1)  Add new virtual disk
>> 2)  Scan SCSI buses in Linux
>> 3)  pvcreate the new device
>> 4)  vgextend
>> 5)  lvextend
>> 6)  xfs_growfs
>> Job done.
>> On systems in our data centre, we tend to use XFS for most local filesystems, unless we have a good reason to use something else.
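
As an aside, the steps Tim lists could be sketched roughly as below. This
is only a dry-run illustration in Python: it builds the command lines
for steps 2-6 and prints them rather than running anything. The device,
volume group, logical volume and mount point names (/dev/sdb, vg_data,
lv_data, /data) and the SCSI host (host0) are placeholders I have made up,
and "-l +100%FREE" is just one way of telling lvextend to use all of the
new space.

```python
def grow_xfs_commands(new_disk, vg, lv, mountpoint, scsi_host="host0"):
    """Return the shell commands for steps 2-6 as strings (nothing is executed)."""
    return [
        # 2) ask the kernel to rescan one SCSI host so the new disk shows up
        f"echo '- - -' > /sys/class/scsi_host/{scsi_host}/scan",
        # 3) initialise the new disk as an LVM physical volume
        f"pvcreate {new_disk}",
        # 4) add the physical volume to the existing volume group
        f"vgextend {vg} {new_disk}",
        # 5) grow the logical volume into all of the newly free extents
        f"lvextend -l +100%FREE /dev/{vg}/{lv}",
        # 6) grow the mounted XFS filesystem to fill the logical volume
        f"xfs_growfs {mountpoint}",
    ]


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    for cmd in grow_xfs_commands("/dev/sdb", "vg_data", "lv_data", "/data"):
        print(cmd)
```

Note that xfs_growfs operates on a mounted filesystem, which is why there
is no unmount step anywhere in the sequence.
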
> Is there any reason you do not use GlusterFS on top of XFS? Can I do
> that? With that, can I have a parallel system similar to GPFS?

Not quite. GPFS (and Red Hat's GFS) mediate parallel access to shared
block devices, so more than one host can talk to the underlying
storage system directly. This requires either that the block devices
are network accessible via a protocol such as Fibre Channel or iSCSI,
or that multiple hosts are connected to the targets through the same
SAS / SCSI bus. (GPFS also allows nodes without attached storage to
reach it through the storage-attached nodes.)

With GlusterFS / Lustre (not sure about Ceph) the data is stored on a
host file system (such as XFS) and the GlusterFS / Lustre daemons
manage and mediate access to it - but the data is only accessible
through these daemons.
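
To make the "GlusterFS on top of XFS" layering concrete, here is a rough
dry-run sketch in Python that only builds the command lines involved; it
does not run them. The server names, block device, mount directory and
volume name are all hypothetical, the volume is a plain distributed one
(no replication), and the "-i size=512" inode size is, as far as I know,
a commonly recommended setting for bricks rather than a requirement.

```python
def gluster_on_xfs_commands(servers, block_dev, mount_dir, volume):
    """Return (per_server, once, client) command lists; nothing is executed."""
    brick = f"{mount_dir}/brick"
    # Run on every storage server: make an XFS file system and a brick directory.
    per_server = [
        f"mkfs.xfs -i size=512 {block_dev}",
        f"mkdir -p {mount_dir}",
        f"mount -t xfs {block_dev} {mount_dir}",
        f"mkdir -p {brick}",
    ]
    # Run once, from the first server: join the peers, then create and start the volume.
    once = [f"gluster peer probe {s}" for s in servers[1:]]
    bricks = " ".join(f"{s}:{brick}" for s in servers)
    once.append(f"gluster volume create {volume} {bricks}")
    once.append(f"gluster volume start {volume}")
    # Run on a client: the data is reached through the Gluster daemons, not the bricks.
    client = [f"mount -t glusterfs {servers[0]}:/{volume} /mnt/{volume}"]
    return per_server, once, client


if __name__ == "__main__":
    # All names below are hypothetical placeholders.
    groups = gluster_on_xfs_commands(
        ["server1", "server2"], "/dev/vg_data/lv_brick", "/export/brick1", "gv0"
    )
    for label, cmds in zip(("each server", "once", "client"), groups):
        for cmd in cmds:
            print(f"[{label}] {cmd}")
```
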

The difference comes down to your access patterns and how you scale
the systems. With GPFS you basically add more disks and storage
systems (and network) to scale the IO; with GlusterFS / Lustre you add
more servers (each with more disks).

Usually, because servers with disks in them are cheaper than storage
systems, the GlusterFS / Lustre style systems have lower capital
costs. I suspect the running costs are higher due to inefficiencies in
power/cooling for the servers, but I don't know this for sure.

For Lustre, you need a minimum of two nodes: one for the objects (i.e.
the data) and one for the metadata. I don't know whether it's possible
to co-locate these functions on the same OS. If I remember correctly,
you can run GlusterFS and GPFS with just one node.

Finally, for GlusterFS + XFS, there is a FAQ:
XFS is one of three supported filesystems for Gluster bricks; ext3 and
ext4 are also supported.

Using XFS does have size and performance implications that need to be
considered.

Gluster makes extensive use of extended attributes, and XFS extended
attribute performance in kernel versions < 2.6.39 is very poor. [1]
This makes XFS a poor choice in environments where files are smaller
than ~500MB.
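
If you want a quick way to see which side of that 2.6.39 threshold a
node is on, something like the following Python sketch would do. The
function names are my own, and it only compares version numbers:
distribution kernels often carry backported fixes, so treat the result
as a rough hint rather than a verdict.

```python
import platform
import re


def parse_kernel_version(release):
    """Extract (major, minor, patch) from a string such as '2.6.32-279.el6.x86_64'."""
    match = re.match(r"(\d+)\.(\d+)(?:\.(\d+))?", release)
    if match is None:
        raise ValueError(f"unrecognised kernel release: {release!r}")
    major, minor, patch = match.groups()
    # A missing patch level (e.g. '3.10') is treated as 0.
    return (int(major), int(minor), int(patch or 0))


def xfs_xattr_slow(release):
    """True if the kernel predates 2.6.39, the threshold quoted in the FAQ."""
    return parse_kernel_version(release) < (2, 6, 39)


if __name__ == "__main__":
    release = platform.release()
    print(release, "-> slow XFS xattrs expected:", xfs_xattr_slow(release))
```
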

Have fun!

> Thanks,
> D.

Jonathan Barber <jonathan.barber at gmail.com>