[Beowulf] Computation on the head node


Jeffrey B. Layton laytonjb at charter.net
Mon May 19 04:40:47 PDT 2008


Joe Landman wrote:
> Perry E. Metzger wrote:
>> checking. A very fast RAID array may be in order -- or it may be
>> completely unnecessary. One can't know without understanding one's
>> application intimately, and that requires testing.
>
> Of course.  But there are quite a few people/groups on this list with 
> decades of HPC experience that might have an inkling if a USB or 
> similar connected drive "is a good idea" for an app, even prior to 
> running it.  Benchmarking is important, but it is important that the 
> benchmark represent real runs.  Experience can provide a rough guide 
> in the case of no benchmark data availability.  With clusters, you run 
> into the very real problem of IO resource contention, quite quickly.  
> Putting lower end IO devices in there rarely makes sense.  Sure, you 
> can benchmark it, and you should if possible.  But it is also not a 
> bad idea to listen to people who have been working on this stuff for 
> a while; they might have a clue about these things.

Here comes the $64 question - how do you benchmark the IO portion of your
code so you can understand whether you need a parallel file system, what
kind of connection you need from a client to the storage, etc.? This is a
difficult problem and one in which I have an interest.

The best way I've found is to look at the IO pattern of your code(s), and
the best way I've found to do that is to run strace against the code. I've
written an strace analyzer that gives you a higher-level view of what's
going on with the IO. I'm also working on a tool that can take the strace
output and create a "simulator" that runs in a similar manner to the
original code and performs the same IO, but using dummy data. This allows
you to "give away" a simple dummy code to various HPC storage vendors and
test your application. This code is taking a little longer than I'd hoped
to develop :(
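
As a rough illustration of the kind of higher-level view an strace analysis
can give (this is not the analyzer mentioned above, just a minimal sketch),
the Python below summarizes read/write calls from a trace captured with
something like "strace -T -e trace=read,write,pread64,pwrite64 -o trace.txt
./app". The names IO_CALL, summarize, and report are made up for this
example, and it assumes the usual one-call-per-line strace output; lines
split by "<unfinished ...>" under -f are simply ignored.

```python
import re
import sys
from collections import defaultdict

# Matches lines such as:
#   write(3, "abc"..., 4096) = 4096 <0.000012>
# An optional leading PID (strace -f) and timestamp (strace -t/-tt) are skipped.
IO_CALL = re.compile(
    r"^(?:\[pid\s+\d+\]\s+|\d+\s+)?"          # optional PID prefix
    r"(?:\d{2}:\d{2}:\d{2}(?:\.\d+)?\s+)?"    # optional wall-clock timestamp
    r"(?P<name>read|write|pread64|pwrite64)"
    r"\((?P<fd>\d+),.*\)\s+=\s+(?P<ret>-?\d+)"
    r"(?:.*<(?P<secs>\d+\.\d+)>)?"            # optional -T elapsed time
)


def summarize(lines):
    """Return {syscall: stats} for the successful read/write calls in a trace."""
    stats = defaultdict(
        lambda: {"calls": 0, "bytes": 0, "seconds": 0.0, "sizes": defaultdict(int)}
    )
    for line in lines:
        m = IO_CALL.match(line)
        if not m:
            continue              # not an IO call we track
        ret = int(m.group("ret"))
        if ret < 0:
            continue              # failed call, no data moved
        s = stats[m.group("name")]
        s["calls"] += 1
        s["bytes"] += ret         # return value is the bytes actually transferred
        s["sizes"][ret] += 1      # histogram of transfer sizes
        if m.group("secs"):
            s["seconds"] += float(m.group("secs"))
    return dict(stats)


def report(stats):
    """Print one summary line per syscall, plus the most common transfer size."""
    for name, s in sorted(stats.items()):
        avg = s["bytes"] / s["calls"]
        common = max(s["sizes"], key=s["sizes"].get)
        print(f"{name}: {s['calls']} calls, {s['bytes']} bytes, "
              f"avg {avg:.0f} B, most common size {common} B, "
              f"{s['seconds']:.6f} s in syscalls")


if __name__ == "__main__" and len(sys.argv) > 1:
    with open(sys.argv[1]) as trace:
        report(summarize(trace))
```

Even a summary this crude shows whether a code does many small transfers or
a few large ones, which is a first hint about what kind of storage it needs.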

Jeff
