[Beowulf] [OT] HPC and University IT - forum/mailing list?

Gerry Creager N5JXS gerry.creager at tamu.edu
Thu Aug 17 06:28:37 PDT 2006


Michael Huntingdon wrote:
> At 04:13 PM 8/15/2006, Mike Davis wrote:
> 
>> I'm not 100% sure about that, Mark. I care about big-A 
>> Administration. I care about showing departments what resources are 
>> actually available. I care about what is the most efficient use of 
>> limited University resources. When I meet with researchers, they 
>> often say that they had no idea there were 500+ processors dedicated 
>> to research here.
>>
>> I know that other people have the same issues. Another is the funding 
>> model issue: which is best, overhead, direct, or central budget? Or 
>> how about knowing what resources we each provide our users? Does a 
>> given organization focus on hardware support, software support, or 
>> both?
>>
>> Those are some of the Big-A issues. Here is one that is both Big-A and 
>> small-A.
>>
>> Running one of the new Sun x4100s with both dual-core processors at 
>> 100% uses <270 watts (as determined by a Kill A Watt meter). That is 
>> Big-A because it means that we can be more efficient in our use of AC 
>> and power. It is small-a for the same reasons. For example, spinning 
>> up a v20 uses 250 watts with both processors at full power. I can't 
>> discuss some of my application-specific performance due to license 
>> constraints, but I can say that I like the x4100 in general for 
>> Computational Physics and Chemistry.
> 
> 
> I often scratch my head wondering how certain decisions are made at 
> the "central IT" level, so a perspective from the campus that involves 
> both performance and uptime (plug-in-the-wall) costs is refreshing. We 
> far too often see a complete disconnect between the two, which very 
> often means that none of the invested parties (at the NSF, NIH, state, 
> or federal level) ever really enjoys the value of each dollar they 
> invest.

My HPC efforts now involve both cases of 'A', as we're trying to change 
the HPC paradigm on campus from almost solely SMP to a combination of 
memory paradigms to better serve the research community.  And the 'we' 
is not directly in the Administration chain but is a couple of 
on-campus players who are frustrated with the status quo.  I look at 
infrastructure costs as well as per-node costs, maintenance, and 
potential downtime versus spares.  Also, I now have to consider 
networking costs, both on campus and in our commodity, Internet2, and 
NLR dealings.  Bandwidth is no longer free, although a lot of 
researchers still think it is.
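
To make the per-node piece concrete, here is a back-of-envelope sketch 
of the power arithmetic, using the ~270 watt Kill A Watt reading quoted 
above; the cooling overhead and electricity rate below are placeholder 
assumptions for illustration, not measured values.

# rough per-node energy cost; only the 270 W figure comes from the
# measurement quoted above, the other numbers are assumed placeholders
WATTS_AT_LOAD = 270        # x4100, both dual-cores at 100% (Kill A Watt)
COOLING_OVERHEAD = 0.5     # assume ~0.5 W of A/C per W of compute
RATE_PER_KWH = 0.08        # assumed electricity rate, USD per kWh
HOURS_PER_YEAR = 24 * 365

total_watts = WATTS_AT_LOAD * (1 + COOLING_OVERHEAD)
kwh_per_year = total_watts * HOURS_PER_YEAR / 1000.0
dollars_per_year = kwh_per_year * RATE_PER_KWH
print("%.0f kWh/yr -> about $%.0f/yr per node" %
      (kwh_per_year, dollars_per_year))

Multiply that by a few hundred nodes and the difference between a 270 W 
box and a hungrier one stops being a small-a detail.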

> I appreciate that Sun may be suggesting (these days) that their 
> systems are more environmentally friendly; however, given the 
> price/performance/environmental/support picture...and the really crazy 
> extended downtime associated with engineering issues, logic, at least 
> for some, makes distancing significant IT investment from Sun a 
> decision that follows very few conversations.

Interesting: I've started buying a lot more Sun.  Their costs are 
comparable to the blade cluster prices I've gotten from other vendors, 
we have seen as good or better uptimes with v2100 through v4200 series 
hardware, and time-to-repair with the on-site Sun support has been 
stellar.  When we had problems flashing a BIOS upgrade and couldn't 
recover it ourselves (our last real Sun downtime), they had a new 
motherboard in within 6 hours and the system back up an hour later.  I 
haven't suffered from the engineering issues you seem to have 
encountered.

> My point comes honestly from your comments, which we hold dear...the 
> growing number of research systems/CPUs on campus affects each and 
> every one of us on a daily basis. Having spent this week at the LSS 
> event at Stanford, I am ever more convinced of how diverse the needs 
> are...and how many possible solutions there are. So that must be a 
> big-A approach with a huge tilt in a not-so-big-A direction.
> 
> 
>> Another that is both: what submission systems are we using, and why?
>>
>> Same questions that affect both administration and Administration.
>>
>>
>> Mike Davis
>>
>>
>>
>>
>>
>>
>> Mark Hahn wrote:
>>
>>>> beowulf traffic itself is "noise"?  If you are thinking of a "list for
>>>> university deans" or members of research support offices or 
>>>> departmental
>>>
>>> ...
>>>
>>>> administerable and accountable should they get audited) -- then yeah, I
>>>> think a new list or other venue would be very useful.
>>>
>>>
>>> yes.  the overlap is minimal, I believe - I'd say the two approaches 
>>> are even inimical.  someone who is primarily interested in big-A 
>>> Administration will have values opposed to mine as a technologist.
>>> as a random pot-shot, big-a people tend to have great faith in 
>>> negotiating special purchasing relationships with a vendor, or 
>>> believe that integration
>>> is the high-road to success (or an end in itself).  I know, OTOH,
>>> that a vendor who makes a good desktop may make the world's worst compute
>>> nodes, and that, for instance, the service requirements are nearly 
>>> opposite.
>>> here's my general conclusion about central-IT efforts: if the idea 
>>> (centralized storage, whatever) is so good,
>>> people will beg to use it.  if you have to force people to use it,
>>> you are simply wrong in some way (perhaps subtly).

Just to touch on this, I'm in general agreement, although our big-A 
folks have just negotiated a big-iron acquisition for a decent price.  
It'll go into the central IT core for shared computing resources... 
I'll see how well it works out.  The central IT HPC folks were the 
proximate cause of me building my first cluster...

gerry
-- 
Gerry Creager -- gerry.creager at tamu.edu
Texas Mesonet -- AATLT, Texas A&M University	
Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983
Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843


