[Beowulf] SGI to offer Windows on clusters

Ryan Waite ryanw at windows.microsoft.com
Sun Jan 21 17:57:20 PST 2007

-----Original Message-----
From: Mark Hahn [mailto:hahn at MCMASTER.CA] 
Sent: Friday, January 19, 2007 12:23 PM
To: Ryan Waite
Cc: Beowulf at beowulf.org
Subject: RE: [Beowulf] SGI to offer Windows on clusters
>> I know some of you aren't, um, tolerant of Microsoft for various
> "tolerant" is the wrong word.  beowulf is, definitionally,
> anyone using *nix has obviously made a decision (aesthetic, commercial, etc)
> to avoid the dominant OS; this is rejection rather than ill
>> and we've designed for those customers. So, users get an OS, job
>> scheduler, management package, and MPI stack for < $500.
> compared to $0.
> in that light, the question is- does the academic cost include some 
> form of support?

The academic version is provided through the MSDN Academic Alliance. Support
depends on the type of agreement. Our dev team also monitors
http://windowshpc.net and provides support there.

>> Our MPI stack is based on MPICH2 but we've made performance and security
>> enhancements. The folks at ANL are very talented UNIX developers but
> I cynically guess the "security" enhancements are not really
> but rather simply making it work in the MSFT universe (domain
> etc).  is that correct?

We did some other security work as well. For example, the MPICH2 code
stores password information unencrypted in the registry, which needed to
be changed. Our security review uncovered a couple of similar issues.

When used with our job scheduler, we integrate with Active Directory.
When a job executes on a compute node, we first create a Windows Job
Object and then log on using the submitting user's credentials, so the
Job Object runs with that user's credentials. If the user keeps their
data on a secured server, they should be able to access that data
directly. Also, any processes spawned by the process running in the Job
Object are contained in the same Job Object, so if the user's job is
cancelled we can clean up all of its child processes.
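For readers more at home on Linux nodes, the containment-and-cleanup behavior described above resembles a POSIX process group. The sketch below is not Microsoft's code and does not use the Windows Job Object API; it is a minimal analog, assuming a Unix compute node and Python 3, in which the job is started as the leader of a new session, its children inherit that process group, and cancelling the job signals the whole group at once. The names `launch_job` and `cancel_job` are hypothetical. One difference worth noting: a child can leave a POSIX process group by calling `setsid()` itself, whereas a Job Object only lets children break away when the job is configured to permit it.

```python
import os
import signal
import subprocess
from contextlib import suppress


def launch_job(cmd):
    """Start cmd in a new session so it leads a fresh process group.

    Hypothetical helper (POSIX analog, not the Windows Job Object API).
    Children it forks stay in this group unless they move themselves out,
    e.g. by calling setsid(); that is one way this analog is weaker than
    a Job Object.
    """
    return subprocess.Popen(cmd, start_new_session=True)


def cancel_job(proc, grace=5.0):
    """Signal every process in the job's group, then reap the leader.

    Sends SIGTERM first; if the leader has not exited within `grace`
    seconds, escalates to SIGKILL. Returns the leader's return code.
    """
    pgid = proc.pid  # after setsid(), the leader's pid is also the pgid
    with suppress(ProcessLookupError):  # group may already be gone
        os.killpg(pgid, signal.SIGTERM)
    try:
        proc.wait(timeout=grace)
    except subprocess.TimeoutExpired:
        with suppress(ProcessLookupError):
            os.killpg(pgid, signal.SIGKILL)
        proc.wait()
    return proc.returncode
```

For example, `job = launch_job(["sh", "-c", "sleep 60 & wait"])` followed by `cancel_job(job)` terminates both the shell and the `sleep` it put in the background, because both belong to the same process group.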

>> Windows is more efficient using async overlapped I/O. We've made
> what _would_ be interesting is to hear about the technical aspects.
> that is: how much difference does this make?  is the impetus to use 
> async mainly to batch completion events (sort of like mpi_waitall)?
> does it affect latency?  how does it compare to normal mpich on the
> hardware under linux?

There were a number of changes. If you think it would be interesting, I
could arrange a conference call or an online presentation where we walk
through our changes and the rationale behind them.

> also, is there any documentation on the job scheduler?

Yep. Here's a paper with an overview of the scheduler:

Here's an article on linking with the CCP Job Scheduling API. Partners
such as MathWorks, Ansys, and Schlumberger use these APIs to integrate
their apps directly with the cluster job scheduler. The result is that
users don't have to learn job control languages; they just press the
compute button from inside Fluent, etc.

> these kind of specifics would, truly, be most gratefully "tolerated"
>> ANL for incorporation in future MPICH stacks. We're also the first
>> at Microsoft making these kinds of sizable contributions back to the
>> open source community.
> are the contributions entirely specific to the windows API?
> (I would not call it a contribution if you've simply ported to the 
> architecture you own...)
>> think we're going to help the community bring HPC into mainstream
>> computing.
> I'd be curious to understand what that means.  HPC as implemented by 
> beowulves seems entirely mainstream to me.  (lots of places already
> substituted a GUI button for a queue-submit command...)

Yeah, I think HPC is much more mainstream than before. I'd like to get
to a place where an HPC cluster is just another shared network resource.
Setting up an HPC cluster and sharing it should be as simple as plugging
a printer into the network and deciding to share it with your
colleagues.

Ideally you could buy a 16- or 32-node cluster for your workgroup and
plug it in; it images itself (or all the nodes come pre-installed) and
integrates with your network security (Active Directory, Kerberos, NIS,
etc.); you decide who you want to share it with, and it's ready to run
jobs. With applications like MATLAB's DCT and Grid Mathematica
integrated directly with the job scheduler, it's easy to submit jobs,
and the results are returned directly to the application.

So mainstream, for us, isn't about building bigger and bigger clusters;
it's about building clusters that are easy to manage, integrate directly
with your apps, and work well with the rest of the software
infrastructure you have in place.

Whoa, I'm starting to sound like a marketing person.

> regards, mark hahn.
