[Beowulf] General cluster management tools - Re: Southampton engineers a Raspberry Pi Supercomputer

Lux, Jim (337C) james.p.lux at jpl.nasa.gov
Thu Sep 13 14:35:44 PDT 2012

This reminds me of another point I should have brought up in my previous post about people and politics: the myth that commercial software has better support. It seems managers always want to buy commercial software because it's supposed to have better support.

In my experience, that is a complete myth. When I was using SGE and Open MPI for my previous cluster, I could send an e-mail to the mailing lists for those packages and usually get a resolution in a couple of days at most. Often, I'd get a useful response within hours, sometimes minutes. And even more impressive, the resolution would normally come from one of the developers who's "volunteering" his time. Sometimes, my requests/bug reports would lead to updated code I could download and test within a couple of days.

>>>> This may be true.. however, do you have metrics in a nice 1 page summary to prove it?  The commercial vendor sure does.   You put up your "time from problem reported to answer received" histogram and ask the vendor to *promise* (with money attached) to beat it.

I could mention many other open-source packages I had similar experiences with, but they're not cluster related.

With commercial vendors, I've found resolutions typically take weeks or months. One case took 18 months. I have one issue open with a vendor now that's about 6 weeks old, with no root-cause resolution in sight, just a bunch of workarounds.

Just for the record, I'm NOT an open-source zealot. I just like to get my job done and go home at a reasonable hour.

>>> Metrics are your friend.. Sure, either side can argue that "our problem was harder, so it took longer to solve", but first impressions count.  Your plot with the big hump at 6 hours and no tail past 48 hours trumps the hump at 12 hours with a tail to 168 or 8000 hours..

95% closure within X hours vs 20% closure within X hours is a very compelling argument, before you get into the nitty-gritty details about the issues being closed.
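
For what it's worth, here is a rough sketch of how one might produce those two numbers (the "closure within X hours" figure and the histogram) from a list of ticket resolution times. Everything in it is an assumption for illustration: the function names, the bin edges, and especially the sample data, which is made up and is not anybody's real support record. You'd substitute times pulled from your own mail archive or ticket system.

```python
from bisect import bisect_left


def closure_fraction(hours_to_resolve, within_hours):
    """Fraction of issues resolved in `within_hours` hours or less."""
    if not hours_to_resolve:
        raise ValueError("no issues to summarize")
    closed = sum(1 for h in hours_to_resolve if h <= within_hours)
    return closed / len(hours_to_resolve)


def histogram_counts(hours_to_resolve, edges):
    """Count issues per bin.

    `edges` must be sorted ascending. Bin i holds times up to and
    including edges[i] (and above edges[i-1]); one extra final bin
    holds everything longer than the last edge (the "tail").
    """
    counts = [0] * (len(edges) + 1)
    for h in hours_to_resolve:
        counts[bisect_left(edges, h)] += 1
    return counts


def print_histogram(label, hours_to_resolve, edges):
    """Plain-text histogram, one row per bin."""
    print(label)
    names = [f"<= {e:>4} h" for e in edges] + [f" > {edges[-1]:>4} h"]
    for name, n in zip(names, histogram_counts(hours_to_resolve, edges)):
        print(f"  {name} | {'#' * n}")


# HYPOTHETICAL sample data (hours from report to resolution), invented
# purely to show the shape of the comparison.
mailing_list_hours = [0.5, 2, 3, 5, 6, 8, 20, 30, 40, 47]
vendor_hours = [10, 12, 14, 100, 168, 400, 1000, 2000, 4000, 8000]

# Assumed bin edges: 6 h, 12 h, 1 day, 2 days, 1 week.
EDGES = [6, 12, 24, 48, 168]

if __name__ == "__main__":
    for label, data in (("mailing list", mailing_list_hours),
                        ("vendor", vendor_hours)):
        print_histogram(label, data, EDGES)
        pct = 100 * closure_fraction(data, 48)
        print(f"  closed within 48 h: {pct:.0f}%\n")
```

The single percentage goes on the one-page summary; the histogram is what shows whether there's a long tail hiding behind it.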

