[Beowulf] Glenn Lockwood's Thoughts on the NSF Future Directions Interim Report
Prentice Bisbal
prentice.bisbal at rutgers.edu
Mon Feb 2 08:37:13 PST 2015
On 02/02/2015 08:38 AM, Michael Di Domenico wrote:
> Glenn's article is good and hits on many topics correctly (from what
> I've seen, having sat on the vendor side of NSF proposals in a former
> life). However, I'm a little concerned by what I perceive as his
> attitude towards stripping funding from centers that don't have the
> technical prowess to run an HPC resource.
>
> NSF's goal is to further science. Stripping funding, I don't believe,
> is the correct solution. If a center isn't keeping up or doesn't have
> the skills from the start, a mentor should be put in place from one of
> the other, bigger centers. Stripping funding is only going to shrink
> the pool of knowledge to a few key installations around the US, which
> probably isn't the best way to spread knowledge. But I do concur there
> is a point where the NSF would probably spread itself too thin, or
> already has.
>
> Seems to me the NSF needs to get back to building the HPC community of
> PEOPLE rather than building hero machines at six or seven
> installations across the US.
>
I interpreted it differently. I think he was saying that the NSF funding
for HPC should be concentrated in fewer sites, similar to what the DOE
has done with its leadership computing facilities (LCFs): the Argonne
Leadership Computing Facility (ALCF) and the Oak Ridge Leadership
Computing Facility (OLCF). By concentrating its resources in fewer
locations, the NSF could take advantage of economies of scale:
1. Pay for two large data centers instead of 5 or 10.
2. Hire a somewhat larger, but much more talented, staff whose skills
can be spread out over several clusters and storage systems, rather
than maintaining many smaller support staffs with (most likely) fewer
capabilities at each site.
And so on.
By committing heavily to fewer sites, it's easier for the NSF to focus
on providing a stable financial footing, rather than constantly
spreading the money around many different sites like they're
broadcast-seeding a lawn.
TL;DR: Put all your eggs into 2-3 baskets, and keep a really good eye on
those baskets.
Regarding your comment about 'hero' systems: I read a paper a couple of
years ago showing that the large majority of computational scientists
don't need these massive exascale systems - most only need a
'department'-sized cluster with ~1024 cores. I believe SDSC did their
own study with XSEDE data and came to the same conclusion. (Glenn
actually told me this; I'm not sure if it's published anywhere.)
This reminds me of 'The Long Tail'
(http://en.wikipedia.org/wiki/Long_tail): the hero systems cater to the
small percentage of extremely talented computational scientists at the
top of their fields, while the long tail - your 'average' computational
science PI or grad student at universities around the world - still has
to rely on an antiquated small departmental cluster, because the NSF
focuses on the hero users to the detriment of the long tail, which
actually represents the bulk of its funded scientists.
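As a back-of-the-envelope illustration of that long-tail claim (a toy
simulation with invented Pareto parameters, not the SDSC/XSEDE data), a
few lines of Python show how a heavy-tailed distribution of per-project
core requirements puts the vast majority of projects well under the
~1024-core mark:

    import random

    random.seed(42)
    ALPHA = 1.2   # assumed tail index (invented); heavier tail as ALPHA -> 1
    X_MIN = 16    # assumed minimum job size in cores (invented)

    # random.paretovariate(ALPHA) draws samples >= 1; scale by X_MIN.
    cores = [X_MIN * random.paretovariate(ALPHA) for _ in range(100000)]

    small = sum(c <= 1024 for c in cores)
    print("projects needing <= 1024 cores: %.1f%%" % (100.0 * small / len(cores)))
    print("largest single requirement: %.0f cores" % max(cores))

With those arbitrary parameters, roughly 99% of the simulated projects
fit under 1024 cores, while a handful of outliers in the tail would
still want a hero machine.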