[Beowulf] Question on high performance, low cost Fileserver

Michael Will mwill at penguincomputing.com
Mon Nov 21 17:12:23 PST 2005


I have not completely thought this out yet, but what about something like
this:

Take 1U or 2U servers with internal drives split into two software
RAID volumes.
Connect each server with one Ethernet cable to a switch fabric, over
which it serves its half of the storage.

Connect the servers in pairs with the second gigabit Ethernet port
directly (no crossover cable needed, since gigabit NICs auto-negotiate
the crossover; most mainboards have two onboard NICs nowadays), and use
the network block device (NBD) to mirror one of the two volumes to the
partner server, as sketched below.
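
A minimal sketch of the mirroring step (addresses, ports, and device
names are made up, and the old "port + file" nbd-server syntax is
assumed; check your nbd version):

   # On server B (the mirror target): export the partition backing the
   # second volume over the direct gigabit link.
   nbd-server 2000 /dev/sdb1

   # On server A: attach B's exported partition as a local block device,
   nbd-client 192.168.10.2 2000 /dev/nbd0

   # then build a software RAID1 across the local disk and the NBD
   # device, so every write to /dev/md1 is mirrored onto the partner.
   mdadm --create /dev/md1 --level=1 --raid-devices=2 /dev/sdb1 /dev/nbd0
   mkfs.ext3 /dev/md1
   mount /dev/md1 /export/half1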

If a node goes down, or a drive in a node fails, you can take that node
offline without taking the data offline, since it is mirrored onto the
second system. This means you can save the money for a hardware RAID
controller and do software RAID, but you will spend more on extra
drives because of the RAID1 across two machines.

If you use heartbeat and service failover, you might even be able to
automate the service takeover between the two machines.
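
A rough sketch with heartbeat v1 (the node names, floating service IP,
and paths are invented; nfs-kernel-server is the Debian name for the
NFS init script, so substitute your distribution's equivalent):

   # /etc/ha.d/ha.cf on both nodes: exchange heartbeats over the
   # crossover link, declare the peer dead after 30 seconds.
   cat > /etc/ha.d/ha.cf <<'EOF'
   keepalive 2
   deadtime 30
   bcast eth1
   node nodea nodeb
   auto_failback on
   EOF

   # /etc/ha.d/haresources: nodea normally owns the floating IP, the
   # mirrored md device, and the NFS service; heartbeat moves all three
   # to nodeb if nodea dies.
   cat > /etc/ha.d/haresources <<'EOF'
   nodea 192.168.1.100 Filesystem::/dev/md1::/export/half1::ext3 nfs-kernel-server
   EOF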

On top of that you can now run PVFS to aggregate the distributed
storage into a single image without losing your data if a node goes
down for good.
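
A hedged sketch of the PVFS2 side (the exact invocations differ between
PVFS versions, and the hostname, port, and paths here are made up):

   # Generate the file system config, answering the prompts with the
   # list of metadata and I/O servers (our storage pairs).
   pvfs2-genconfig /etc/pvfs2-fs.conf

   # On each storage node: create the storage space once, then start
   # the server daemon.
   pvfs2-server /etc/pvfs2-fs.conf -f
   pvfs2-server /etc/pvfs2-fs.conf

   # On a compute node, with the pvfs2 kernel module loaded:
   mount -t pvfs2 tcp://node01:3334/pvfs2-fs /mnt/pvfs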

Michael Will

Paulo Afonso Lopes wrote:
> GFS and GPFS are SAN-based. I do not have any experience with Lustre, but
> it seems (at least in a vendor-supported configuration) to be based on a
> "back-end" SAN.
>
> What you have to deal with, given currently available solutions, are
> decisions of this kind:
> - Do you rate availability/fault tolerance as important?
>   If you do (why else would you say PVFS is not for home dirs?), you must
> use a disk-array-based solution, either with an FC SAN, NAS, or iSCSI.
>
> Then you must choose your "file system" (not for the NAS option). You'll
> have to decide whether:
> - You need "POSIX locking": if you do, you can't use PVFS
> - You want to support applications that require both high I/O bandwidth
> and heavy file sharing (reading and writing the same file): if you do,
> you must exclude GFS, use GPFS in "data shipping mode", and modify your
> applications
>
> (Note: you can have a resilient PVFS configuration if you use a SAN with
> disk arrays instead of "internal" disks, and add some HA software - of
> course, you can "transfer" disks manually, via command scripts, if you do
> not want to use HA software)
>
> You also need to put the "high cost" of a SAN into context: if you want to
> move data at high speed on a COTS (Gigabit Ethernet) LAN, you will consume
> a large share of the available CPU (e.g. around 40% of a 2.6GHz Xeon to
> reach around 80MB/s sustained on one node). If you go for "fancy"
> interconnects (InfiniBand, Myrinet, ...) you are in the same "cost
> territory" as FC SANs.
>
> By NOT using "asymmetrical" file systems (such as PVFS) and using "cluster
> file systems" such as GFS or GPFS, you may (depending on your requirements)
> dispense with I/O nodes altogether (client nodes on a SAN can access data
> directly)...
>
> I have never been involved in a configuration as large as the one you're
> planning to build, but I honestly think that you should go for a mix of an
> HA file system (e.g. GFS) for home directories, etc. (mostly unshared file
> access) and PVFS for the directories where the files for HPC applications
> live. I don't think there is a single currently available file system that
> can do both things well.
>
> HTH
>
> paulo
>
>
>   
>> We are looking into designing a low cost, high performance storage system.
>> Requirements as below:
>>
>> - Starts at 3TB, should scale up by adding more servers to say 10-12TB
>> - Use commodity technologies (x86_64, IB, GE, Linux), preferably all OSS
>> components
>> - Provide high I/O which scales with addition of storage nodes.
>> - To be used for hosting user home dirs so reliability is important
>> - The HPC cluster starts with 6 AMD64 nodes and is expected to scale to
>> 1000+nodes in a year.
>> - Preferably without FC/SAN
>>
>> We do have experience with IBM GPFS, PVFS (1, 2), NetApp, and PolyServe,
>> but not with GFS or Lustre.
>>
>> PVFS is not reliable enough for home dirs (OK for scratch); GPFS cannot
>> do RAID5-like striping across nodes and needs a SAN for RAID1-like
>> mirroring (cost $$$); PolyServe is too expensive (per-CPU pricing)
>>
>> Is GFS or Lustre suitable for the above needs? Any other commercial
>> solution?
>>
> --
> Paulo Afonso Lopes                        | Tel: +351- 21 294 8536
> Departamento de Informática               | 294 8300 ext.10763
> Faculdade de Ciências e Tecnologia        | Fax: +351- 21 294 8541
> Universidade Nova de Lisboa               | e-mail: pal at di.fct.unl.pt
> 2829-516 Caparica, PORTUGAL
>


-- 
Michael Will
Penguin Computing Corp.
Sales Engineer
415-954-2822
415-954-2899 fx
mwill at penguincomputing.com 




