[Beowulf] Re: energy costs and poor grad students

Wed Jul 2 07:22:37 PDT 2008

Does your university have public computer labs?  Do the computers run some
variant of Unix?

At UMN, where I did my grad work in physics, there were a number of
semi-public "Scientific Visualization" or "Large Data Analysis" labs that
were hosted in the local supercomputer center.  The center has a
number of large machines that you had to apply for, with a really good
rationale, before you could use them, but the smaller development labs (with
2-way to 10-way Sun Fires, similar-sized SGIs, Linux machines, etc.)
basically sat vacant 5-6 days per week.

Some of the labs had a PBS queue, some had a Condor queue, and some just
required that background jobs be "nice +19 ./a.out".  My graduate work
required several large parametric studies which computationally looked like
lots of Monte-Carlo-ish runs which could be done in parallel.  The beauty of
this was that no message passing was required, so, if there were 23 cores
open one evening at 6pm, and assuming no one would be doing work overnight
(for the next 14 hours), I could start 23 14-hour jobs at 6pm and have a
little less than 2 weeks of CPU work (23 x 14 = 322 CPU-hours, about 13.4
days) done by 8am the next morning.  I used
(and mentioned) the technique in the paper,
http://www.pnas.org/cgi/content/full/101/37/13431 (search for "computational
impotence").
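
For anyone who wants to try the same trick, here is a minimal Python sketch
of the overnight launch described above.  It is only an illustration, not the
script used for the paper: the function names, the log file names, and the
convention of passing a job index as the sole argument to ./a.out are all
assumptions, and it uses the POSIX "nice -n 19" spelling rather than the
csh-style "nice +19".

```python
import subprocess


def cpu_days(n_jobs, hours_per_job):
    """Total CPU time, in days, from n_jobs independent jobs of equal length."""
    return n_jobs * hours_per_job / 24.0


def niced_commands(executable, n_jobs, niceness=19):
    """Build one lowest-priority command line per free core.

    Passing the job index as the only argument is a hypothetical convention
    (e.g. a random seed or parameter-set index); adapt it to the real program.
    """
    return [["nice", "-n", str(niceness), executable, str(i)]
            for i in range(n_jobs)]


def launch(commands):
    """Start every command in the background, writing one log file per job."""
    procs = []
    for i, cmd in enumerate(commands):
        with open(f"run_{i:03d}.log", "w") as log:
            procs.append(
                subprocess.Popen(cmd, stdout=log, stderr=subprocess.STDOUT)
            )
    return procs


if __name__ == "__main__":
    jobs = niced_commands("./a.out", n_jobs=23)
    print(f"{len(jobs)} jobs x 14 h = {cpu_days(23, 14):.1f} CPU-days")
    # launch(jobs)  # uncomment on a machine where ./a.out actually exists
```

Because the runs never talk to each other, nothing fancier than independent
background processes is needed; where a lab has a PBS or Condor queue, the
same list of commands would be submitted through the queue instead.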

This only works, though, if your university's computer labs run a Unix-ish OS
and the sysadmins are progressive.  At the school where I presently teach,
similar endeavors have been much harder to start up.

Nathan Moore

On Wed, Jul 2, 2008 at 8:44 AM, Joe Landman <landman at scalableinformatics.com>
wrote:

> Hi Mark
>
> Mark Kosmowski wrote:
>
>> I'm in the US.  I'm almost, but not quite ready for production runs -
>> still learning the software / computational theory.  I'm the first
>> person in the research group (physical chemistry) to try to learn
>> plane wave methods of solid state calculation as opposed to isolated
>> atom-centered approximations and periodic atom centered calculations.
>>
>
> Heh... my research group in grad school went through that transition in the
> mid 90s.  Went from an LCAO-type simulation to CP-like methods.  We needed a
> T3E to run those (then).
>
> Love to compare notes and see which code you are using someday.
> On-list/off-list is fine.
>
>> It is turning out that the package I have spent the most time learning
>> is perhaps not the best one for what we are doing.  For a variety of
>> reasons, many of which are more off-topic than tac nukes and energy
>> efficient washing machines ;) , I'm doing my studies part-time while
>> working full-time in industry.
>>
>
> More power to ya!  I did mine that way too ... the writing was the hardest
> part.  Just don't lose focus, or stop believing you can do it. When the
> light starts getting visible at the end of the process, it is quite
> satisfying.
>
> I have other words to describe this, but they require a beer lever to get
> them out of me ...
>
>> I think I have come to a compromise that can keep me in business.
>> Until I have a better understanding of the software and am ready for
>> production runs, I'll stick to a small system that can be run on one
>> node and leave the other two powered down.  I've also applied for an
>> adjunct instructor position at a local college for some extra cash and
>> good experience.  When I'm ready for production runs I can either just
>> bite the bullet and pay the electricity bill or seek computer time
>> elsewhere.
>>
>
> Give us a shout when you want to try the time on a shared resource. Some
> folks here may be able to make good suggestions.  RGB is a physics guy at
> Duke, doing lots of simulations, and might know of resources. Others here
> might as well.
>
> Joe
>
>
>
>> Thanks for the encouragement,
>>
>> Mark E. Kosmowski
>>
>> On 7/1/08, ariel sabiguero yawelak <asabigue at fing.edu.uy> wrote:
>>
>>> Well Mark, don't give up!
>>> I am not sure what your application domain is, but if you require 24x7
>>> computation, then you should not be hosting that at home.
>>> On the other hand, if you are not doing real computation and you just
>>> have a testbed at home, maybe for debugging your parallel applications
>>> or something similar, you might be interested in a virtualized solution.
>>> Several years ago, I used to "debug" some neural networks at home, but
>>> training sessions (up to two weeks of training) happened at the
>>> university. I would suggest doing something like that.
>>> You can always scale down your problem in several phases and save the
>>> complete data set / problem for THE RUN.
>>>
>>> You are not being a heretic there, but suffering energy costs ;-)
>>> In more places than you may believe, useful computing nodes are being
>>> replaced just because of energy costs. In some application domains you
>>> can even lose computational power if you move from 4 nodes to a single
>>> quad-core (e.g. memory bandwidth problems). I know it is very nice to be
>>> able to do everything at home, but maybe before dropping your studies or
>>> working overtime to pay the electricity bill, you might want to
>>> consider collapsing your physical deployment into a single virtualized
>>> cluster (or just dispatching several threads/processes on a single
>>> system). If you collapse into a single system you have only one
>>> mainboard, one HDD, one power source, one processor (physically
>>> speaking), ... and you can achieve almost the performance of 4 systems
>>> in one, consuming the power of... well, maybe even less than a single
>>> one. I don't want to go into discussions about performance gain/loss
>>> due to the variation of the hardware architecture.
>>> Invest some bucks (if you haven't done that yet) in a good power source.
>>> The efficiency of unbranded OEM power sources is really pathetic: maybe
>>> 45-50%, while a good power source might be 75-80% efficient. Use the
>>> energy for computing, not for heating your house.
>>> What I mean is that you could consider just collapsing a complete "small"
>>> cluster into a single system. If your application is CPU-bound and not
>>> I/O-bound, VMware Server could be an option, as it is free of charge
>>> (unfortunately not open source, even though some patches can be made to
>>> the drivers). I think it is not possible to publish benchmarking data
>>> about VMware, but I can tell you that over long timescales, the
>>> performance you get in the host OS is similar to that of the guest OS.
>>> There are a lot of problems related to jitter, from crazy clocks to
>>> delays, but if your application is not sensitive to that, then you are
>>> OK.
>>> Maybe this is not a solution, but you can provide more information
>>> regarding your problem before quitting...
>>>
>>> my 2 cents....
>>>
>>> ariel
>>>
>>> Mark Kosmowski wrote:
>>>
>>>> At some point a cost-benefit analysis needs to be performed.  If
>>>> my cluster at peak usage only uses 4 GB RAM per CPU (I live in
>>>> single-core land still and do not yet differentiate between CPU and
>>>> core) and my nodes all have 16 GB per CPU then I am wasting RAM
>>>> resources and would be better off buying new machines and physically
>>>> transferring the RAM to and from them or running more jobs each
>>>> distributed across fewer CPUs.  Or saving on my electricity bill and
>>>> powering down some nodes.
>>>>
>>>> As heretical as this last sounds, I'm tempted to throw in the towel on
>>>> my PhD studies because I can no longer afford the power to run my
>>>> three node cluster at home.  Energy costs may end up being the straw
>>>> that breaks this camel's back.
>>>>
>>>> Mark E. Kosmowski
>>>>
>>>