[Beowulf] commercial clusters
Mark Hahn hahn at physics.mcmaster.ca
Fri Sep 29 10:49:48 PDT 2006
> very well taken. There are an enormous number of people who could use "big
> computation" if it were "easy to use" and "cheap enough". $10K is a maybe.

to me, if dell started selling $10k windows-cluster-in-a-box that was really at the windows-drooler level, it would be a huge shame. vast amounts of truly crappy jobs would be run, and vast amounts of cycles would be burned on the screen-saver.

I'm not arguing against either the dumbing down into compute-appliances (where appropriate), or against people spending their money as they see fit. I just think there is massive value in centralized compute facilities because users can get programming help, professionally managed hardware, cheaper cycles and efficient interleaving of multiple users' bursts of demand.

admittedly, that is precisely the model I've spent my last 5 years on, but I think it still makes sense. it's not the only model, but it has some real advantages over others such as the one mentioned above, or the grid fantasy (fungible computation too cheap to meter).

but one of the other points in this thread was the current fad of multi-cores. let's face it - it's a fad, which doesn't imply that there's no substance driving it, or that it will vanish without a trace. CPU designers are facing an embarrassment of transistors at 65 or 45nm, and a relatively no-thought way to use 'em up is just to replicate the same old design.

I _do_ mean to imply that this is a cop-out, and I really do believe that once we get to 4-core chips, someone will embarrass the other chip vendors by implementing a genuinely thoughtful microarchitectural response to the transistor surplus.

caches are great, but scaling them to just use up the chip area is not smart. relying on that approach is saying: I bet no one else in the industry is smart enough to think of something better. come on! I'm not even a chip designer and I can think of lots of smarter things to do:
create a "load-history cache" which, like a branch-history cache, tries to figure out whether there's a predictable stream of loads coming from one instruction.

provide an instruction which lets the programmer/compiler hint how many times to speculatively unroll a loop (literature says there is plenty of useful speculation, and that the trick is to avoid drowning in it).

figure out an on-chip fabric that lets you have lots of independent register files without dumbly partitioning the chip into cores, since static partitioning always leads to fragmentation and poor utilization.

have a smarter cacheline that will notice that it only gets re-used an average of 3 times in the 57 clocks following installation, and so shifts itself into L3 proactively.

or notice that some code sequences are relatively urgent (dependent) and others are pretty slack (speculatively unrolled iterations of a loop, perhaps), so schedule them smarter.

how about a miss-history buffer that notices when you write a value to a non-owned line that later gets moved to other cores and becomes shared, so preemptively updates them.

most of these ideas are crazy in one way or another, but they're a lot more interesting than more cookie-cutter chips... and fundamentally, Amdahl's law argues against too-rabid multi-coring.
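
to put rough numbers on the Amdahl's law point, here is a minimal Python sketch. it uses the standard form of the law, speedup = 1 / ((1 - p) + p / n), where p is the fraction of the work that parallelizes and n is the number of cores. the function name amdahl_speedup and the 90% parallel fraction are assumptions picked for illustration, not figures from this thread.

```python
def amdahl_speedup(parallel_fraction: float, cores: int) -> float:
    """Speedup predicted by Amdahl's law: 1 / ((1 - p) + p / n)."""
    if not 0.0 <= parallel_fraction <= 1.0:
        raise ValueError("parallel_fraction must be in [0, 1]")
    if cores < 1:
        raise ValueError("cores must be >= 1")
    serial_fraction = 1.0 - parallel_fraction
    return 1.0 / (serial_fraction + parallel_fraction / cores)


if __name__ == "__main__":
    # 0.90 is an illustrative parallel fraction (an assumption, not measured data).
    p = 0.90
    for cores in (1, 2, 4, 8, 16, 64):
        speedup = amdahl_speedup(p, cores)
        # Efficiency is the speedup divided by the number of cores used.
        print(f"{cores:3d} cores: speedup {speedup:5.2f}, "
              f"efficiency {speedup / cores:4.0%}")
```

under that assumption the speedup can never reach 1 / (1 - p) = 10 no matter how many cores are added, and per-core efficiency falls steadily as the core count grows, which is the sense in which the law argues against simply replicating cores.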
