[Beowulf] Big storage

Bruce Allen ballen at gravity.phys.uwm.edu
Wed Sep 5 06:59:04 PDT 2007


Loic,

Thanks.  In what I wrote I was using 1 TB = 10^12 bytes and 20 TB = 2 x 
10^13 bytes.

When you say that the max capacity with two system disks is 19.6 TB, do
you mean 19 600 000 000 000 bytes or do you mean 19.6 x 1024^4 =
21 550 427 904 410 bytes?
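For illustration only (not part of the original thread), here is a short Python sketch of the two readings of "19.6 TB". It assumes nothing beyond the definitions above: 1 TB = 10^12 bytes and 1 TiB = 1024^4 = 2^40 bytes. Exact fractions are used so that floating-point rounding cannot blur the result. The helper name `to_bytes` is hypothetical.

```python
from fractions import Fraction

TB = 10**12   # decimal terabyte, as used by disk vendors
TiB = 2**40   # binary "terabyte" (1024^4 bytes)

def to_bytes(amount: str, unit: int) -> int:
    """Convert a decimal string such as "19.6" times a unit into whole bytes."""
    # Fraction("19.6") is exactly 98/5, so no binary floating-point error creeps in.
    return round(Fraction(amount) * unit)

decimal_reading = to_bytes("19.6", TB)
binary_reading = to_bytes("19.6", TiB)

print(f"{decimal_reading:,}")  # 19,600,000,000,000
print(f"{binary_reading:,}")   # 21,550,427,904,410
```

The binary reading is about 10% larger (2^40 / 10^12 is roughly 1.0995), which is exactly the ambiguity being asked about.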

Once again, I need to study what you wrote...

Cheers,
 	Bruce


On Wed, 5 Sep 2007, Loic Tortay wrote:

> According to Bruce Allen:
> [...]
>>>>
>>> This is (in my opinion) probably the only real issue with the X4500.
>>> The system disk(s) must be with the data disks (since there are "only"
>>> 48 disk slots) and the two bootable disks are on the same controller,
>>> which effectively makes this controller a single point of failure (there
>>> are easy ways to move the second system disk to another controller, but
>>> you still need a working "first" controller to boot).
>>
>> Can you boot from a USB device?  You can have an inexpensive RAID-1 USB
>> device for the root and OS.
>>
> You can boot from a USB device; there are 4 ports available (2 on the
> front side, 2 on the back).
>
> We booted a machine from an external DVD drive (there are also virtual
> floppy and DVD drives available through the service processor).
>
>>> Although in our experience, controller failures are rare on the X4500
>>> (one failure in over a year with a few tens of X4500).
>>
>> Did you lose data with a controller failure?  I assume you can just move
>> the 48 disks to another box.
>>
> We did not lose any data due to the controller failure.
>
> The problem occurred a few days before a scheduled downtime; the
> mainboard was replaced during the downtime and the machine rebooted
> just fine.
>
> Even if we hadn't been close to a scheduled downtime, the applications
> running on most of our X4500 are fault tolerant enough that we can
> offline a machine for some time without a significant impact.
>
>>
>> It will take me some time to digest your other comments.  But I made a
>> mistake in what I wrote.  I want to have a 48-disk box with 500 GB disks.
>> From this (raw) 24 TB of storage I want to get 20 TB usable (i.e., lose no
>> more than 8 of the 48 disks to redundancy and the OS).  I mistakenly
>> wrote 20/24 disks and 10 TB in my email.  How would you revise your
>> recommendations for 20TB of usable storage?
>>
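The corrected budget can be checked with a little arithmetic. This Python sketch (not from the thread) assumes vendor-style decimal units (500 GB = 500 x 10^9 bytes, 20 TB = 2 x 10^13 bytes) and ignores filesystem overhead; the variable names are illustrative.

```python
TB = 10**12
GB = 10**9

slots = 48
disk_bytes = 500 * GB
usable_target = 20 * TB   # decimal definition: 2 x 10^13 bytes

raw = slots * disk_bytes
data_disks_needed = -(-usable_target // disk_bytes)  # ceiling division
spare_slots = slots - data_disks_needed              # left for parity and system disks

print(raw // TB, data_disks_needed, spare_slots)  # 24 40 8
```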
> With 48 disks, there are also many different possible configurations.
>
> The default one (which you only keep if you use the bundled Solaris
> installation) is quite good and gives good results overall.
> It's obvious that Sun has given a lot of thought to this.
>
> If you really want 20 TB (10*2^41 bytes) of usable space, then you
> either need to:
> . wait until Sun provides 750 GB or 1 TB disks (750 GB should be
>   available soon if I'm not mistaken);
> . use a less redundant configuration that will not make the machine
>   resistant to controller failures and probably less resistant to
>   disk failures.
>
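Here "20 TB" is the binary figure 10 x 2^41 = 20 x 2^40 bytes, roughly 10% more than 2 x 10^13. A Python sketch (not from the thread) of what that means in 500 GB disks, assuming 500 x 10^9 bytes per disk and ignoring ZFS metadata overhead:

```python
TiB = 2**40
disk_bytes = 500 * 10**9   # assumed vendor (decimal) capacity per disk

target = 10 * 2**41        # the "20 TB" above, i.e. 20 TiB
assert target == 20 * TiB

data_disks = -(-target // disk_bytes)   # ceiling division
left = 48 - data_disks                  # slots left for parity and system disks
print(f"{target:,} bytes -> {data_disks} data disks, {left} slots left")
# 21,990,232,555,520 bytes -> 44 data disks, 4 slots left
```

Only 4 of the 48 slots remain; with two of them taken by system disks, just two disks are left for parity, which matches the 2 x 22+P layout in the diagram.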
> We have an actual usable space of 16.9 TB on our machines (we mostly
> use a minor variation of the Sun layout).
>
>
> The largest possible usable space you can get from an X4500 with 48 x 500
> GB disks, two system disks and "some" redundancy is 19.6 TB.  But this
> is certainly NOT a configuration you want to use:
>                +-----------------------------------------------+
>                |                Controllers                    |
>                +-----------------------------------------------+
>                |   c5     c4      c7      c6      c1      c0   |
>  +-------------+-----------------------------------------------+
>    ^       7   |  v1   |  v1   |  v1   |  v1   |  v1   |  v1   |
>    |    -------+-----------------------------------------------+
>    |       6   |  v1   |  v1   |  v1   |  v1   |  v1   |  v1   |
>    |    -------+-----------------------------------------------+
>    |       5   |  v1   |  v1   |  v1   |  v1   |  v1   |  v1   |
>    |    -------+-----------------------------------------------+
>    D       4   |  Sys2 |  v1   |  v1   |  v1   |  v1   |  v1   |
>    i    -------+-----------------------------------------------+
>    s       3   |  v2   |  v2   |  v2   |  v2   |  v2   |  v2   |
>    k    -------+-----------------------------------------------+
>    s       2   |  v2   |  v2   |  v2   |  v2   |  v2   |  v2   |
>    |    -------+-----------------------------------------------+
>    |       1   |  v2   |  v2   |  v2   |  v2   |  v2   |  v2   |
>    |    -------+-----------------------------------------------+
>    |       0   |  Sys1 |  v2   |  v2   |  v2   |  v2   |  v2   |
>  +-------------------------------------------------------------+
>
> That's two "raidz1" (single parity) vdevs of 23 disks (2 x 22+P).
>
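The diagram can be turned into a small model to see both the capacity and the controller exposure. This Python sketch is hypothetical and not from the thread: the slot-to-vdev map is transcribed from the figure, disks are assumed to be 500 x 10^9 bytes, and ZFS metadata and reserved space are not modeled, which is presumably why it lands slightly above the 19.6 TB quoted.

```python
from collections import Counter

CONTROLLERS = ["c5", "c4", "c7", "c6", "c1", "c0"]
DISK_BYTES = 500 * 10**9   # assumed decimal disk size
TiB = 2**40
PARITY = 1                 # raidz1: each vdev survives the loss of one disk

# Rebuild the slot map from the diagram: rows 4-7 belong to vdev v1,
# rows 0-3 to v2, except the two system disks on controller c5.
layout = {}
for row in range(8):
    for ctrl in CONTROLLERS:
        layout[(ctrl, row)] = "v1" if row >= 4 else "v2"
layout[("c5", 4)] = "Sys2"
layout[("c5", 0)] = "Sys1"

vdevs = Counter(v for v in layout.values() if v.startswith("v"))
data_disks = sum(n - PARITY for n in vdevs.values())
print(dict(sorted(vdevs.items())), data_disks, f"{data_disks * DISK_BYTES / TiB:.2f} TiB")
# {'v1': 23, 'v2': 23} 44 20.01 TiB

# For each controller, list the vdevs that would lose more disks than
# their parity can cover if that controller failed.
fatal_by_ctrl = {}
for ctrl in CONTROLLERS:
    lost = Counter(v for (c, _), v in layout.items() if c == ctrl and v in vdevs)
    fatal_by_ctrl[ctrl] = sorted(v for v, n in lost.items() if n > PARITY)
print(fatal_by_ctrl)  # every controller maps to ['v1', 'v2']
```

Every controller holds at least three disks of each single-parity vdev, so losing any one controller takes both vdevs offline, which is one concrete reason this layout is not resistant to controller failures.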
> This is a very bad idea if you consider basic best practices, Sun
> engineers' recommendations and, of course, current hardware reliability
> as outlined in the (previously mentioned) article by Bianca Schroeder
> and Garth Gibson.
>
>
> If you want roughly 20 TB but can cope with less, then I suggest you
> use the Sun configuration or one of its minor variations: moving the
> second system disk and/or having 7 identically sized vdevs instead of 6
> (7 x 5+P + 1 x 3+P instead of 6 x 5+P + 2 x 4+P).
>
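To compare the two layouts mentioned above, here is a Python sketch (not from the thread) that counts slots and data disks for each. It assumes 500 x 10^9-byte disks and two system disks; the `summarize` helper is hypothetical, and ZFS overhead is not modeled.

```python
TiB = 2**40
DISK_BYTES = 500 * 10**9   # assumed decimal disk size

def summarize(vdevs, system_disks=2):
    """vdevs: list of (count, data_disks_per_vdev, parity_per_vdev) tuples."""
    slots = sum(n * (d + p) for n, d, p in vdevs) + system_disks
    data = sum(n * d for n, d, _ in vdevs)
    return slots, data, data * DISK_BYTES / TiB

sun_default = [(6, 5, 1), (2, 4, 1)]   # 6 x 5+P + 2 x 4+P
variant = [(7, 5, 1), (1, 3, 1)]       # 7 x 5+P + 1 x 3+P

for name, layout in [("Sun default", sun_default), ("variant", variant)]:
    slots, data, tib = summarize(layout)
    print(f"{name}: {slots} slots, {data} data disks, {tib:.2f} TiB before overhead")
# Sun default: 48 slots, 38 data disks, 17.28 TiB before overhead
# variant: 48 slots, 38 data disks, 17.28 TiB before overhead
```

Both layouts fill the 48 slots and keep 38 data disks, so the variant changes vdev uniformity rather than capacity. The 16.9 TB of actual usable space reported earlier is a little lower than this estimate, presumably because of ZFS overhead that the sketch ignores.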
> We have tested about 25 different ZFS configurations with various I/O
> workloads and unless you're willing to sacrifice available space or
> data security, the Sun layout is the best balanced and also gives
> good or acceptable performance for most workloads.
>
>
> Loïc.
>

