[Beowulf] non-stop computing

Prentice Bisbal pbisbal at pppl.gov
Wed Oct 26 07:20:34 PDT 2016


How so? By only having a single seat or node-locked license?

Prentice Bisbal
Lead Software Engineer
Princeton Plasma Physics Laboratory
http://www.pppl.gov

On 10/26/2016 09:52 AM, Joe Landman wrote:
> Licensing might impede this ...  Usually does.
>
>
> On 10/26/2016 09:50 AM, Prentice Bisbal wrote:
>> There is an amazing beauty in this simplicity.
>>
>> Prentice
>>
>> On 10/25/2016 02:46 PM, Gavin W. Burris wrote:
>>> Hi, Michael.
>>>
>>> What if the same job ran on two separate nodes, with IO to local 
>>> scratch?  What are the odds both nodes would fail in that three-week 
>>> period?  No special hardware / software required.  Simple. Done.
>>>
>>> Cheers.
>>>
>>> On Tue 10/25/16 02:24PM EDT, Michael Di Domenico wrote:
>>>> here's an interesting thought exercise and a real problem i have to 
>>>> tackle.
>>>>
>>>> i have researchers who want to run magma codes for three weeks or
>>>> so at a time.  the process is unfortunately sequential in nature and
>>>> magma doesn't support checkpointing (as far as i know, and i don't
>>>> know much about magma)
>>>>
>>>> So the question is:
>>>>
>>>> what kind of a system could one design/buy using any combination of
>>>> hardware/software that would guarantee that this program would run for
>>>> 3 wks or so and not fail
>>>>
>>>> and by "fail" i mean from some system type error, i.e. memory faulted,
>>>> cpu faulted, network io slipped (nfs timeout) as opposed to "there's a
>>>> bug in magma" which already bit us a few times
>>>>
>>>> there's probably some commercial or "unreleased" commercial product on
>>>> the market that might fill this need, but i'm also looking for
>>>> something "creative" as well
>>>>
>>>> three weeks isn't a big stretch compared to some of the other codes
>>>> i've heard around the DOE that run for months, but it's still pretty
>>>> painful to have a run go for three weeks and then fail 2.5 weeks in
>>>> and have to restart.  most modern day hardware would probably support
>>>> this without issue, but i'm looking for more of a guarantee than a
>>>> prayer
>>>>
>>>> double bonus points for anything that runs at high clock speeds >3GHz
>>>>
>>>> any thoughts?
>>>> _______________________________________________
>>>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin 
>>>> Computing
>>>> To change your subscription (digest mode or unsubscribe) visit 
>>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>
>
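
Gavin's question quoted above ("what are the odds both nodes would fail") can be put in rough numbers. This is only a sketch: it assumes node failures are independent and follow an exponential model, and the 5-year per-node MTBF is a made-up figure for illustration, not a measurement from anyone in this thread. Anything the two nodes share (power, cooling, a common NFS mount) breaks the independence assumption, which is why local scratch matters in the suggestion.

```python
import math

def p_fail(hours, mtbf_hours):
    """Probability of at least one failure within `hours` (exponential model)."""
    return 1.0 - math.exp(-hours / mtbf_hours)

RUN_HOURS = 21 * 24          # the three-week run from the original question
MTBF_HOURS = 5 * 365 * 24    # hypothetical: one failure per node every 5 years

p_one = p_fail(RUN_HOURS, MTBF_HOURS)   # a single node dies during the run
p_both = p_one ** 2                     # both die, assuming independent failures

print(f"one node fails:  {p_one:.4%}")
print(f"both nodes fail: {p_both:.6%}")
```

Plugging in an MTBF taken from your own hardware history gives a more honest answer. Either way it stays a probability rather than the guarantee Michael asked for.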


