[Beowulf] Register article on Opteron - disagree
Robert G. Brown rgb at phy.duke.edu
Mon Nov 22 09:59:03 PST 2004
On Mon, 22 Nov 2004, Patrick Geoffray wrote:

Warning and Disclaimer: The following is a Rant, a genuine rgb Rant, and hence represents a significant investment of time just to read (imagine what it took me to write:-). The faint of heart should quit now (and get some work done today...;-)

<rant>

> Hi Robert,
>
> No, it's a statistical study. The Top500 list is relevant for what it
> was designed for: tracking the evolution of the HPC market, ie where
> people put their money.
>
> Don't look at the top 10, don't look at the order, look at the 500
> entries as a snapshot of the known 500 biggest machines. In this
> context, you are looking for the trends: vector vs scalar, SMP versus
> MPP, industry vs academia, etc.
>
> Sure the vendors pay attention to it, and it's certainly a marketing
> fight for the top10, but they cannot really influence the whole list, IMHO.

Ah, but for whom is this snapshot useful? Who is, as you put it, looking?

To me not at all -- my clusters have never intentionally resembled or been influenced in any way by anything on the top 500 list. To most of the people on this list, it is of very little use, again, even including those seeking to build a fairly advanced and powerful cluster or those who have built one that is represented there. It includes too little detail to support cluster design decisions or even to properly represent the considerable work that went into e.g. ASCI Red.

It isn't just the top 10 that has to be ignored to get down to where the clusters are relevant to the needs or means of, say, a University based research group, or department, or even centralized computer cluster resources for an entire University -- it is almost the entire list, or rather, even where the clusters MIGHT be relevant to a given design goal, you cannot tell from anything on the list.
The list is used, I suspect, primarily by groups like:

  Groups, especially corporate groups, seeking to purchase a turnkey supercomputer cluster (following one of a few more or less accepted designs) just as once they would have bought a turnkey supercomputer.

  Vendors of hardware promoted by the list, to see what the competition is selling (and by "vendors" I mean stockholders, the board, sales and management, and design and engineering, each for their own purposes).

  Grant seekers, granting agencies, and grant reviewers (often the more ignorant of the above) concerned with the >>prestige<< associated with membership in the top500.

  People in general whose mouths might actually utter "OK, so I'm about to spend all this money on a cluster -- just where does that put me in the top 500?" (Something I've actually heard more than once, usually without a word about what the cluster is actually designed to >>do<<. Even from people that I >>know<< know better.)

Of these groups, the ones whose needs are best served by the list are the vendors, by far, as it is the perfect marketing tool for high margin sales backed by remarkably little design effort. The fact that there ARE so many duplicate clusters is clear evidence that it has become an ideal vehicle for selling a class of boilerplate, turnkey cluster especially in the middle to lower end of the range.

"Buy our "over the counter" cluster (tuned, of course, to your budget range and needs within reason) and we'll guarantee a top 500 listing, which can't hurt with the funding agency, with the board, with the many people who control the spending of the money and who seek public exposure and prestige for marketing reasons of their own but who DON'T know enough to actually judge competing designs on intelligent grounds, eh?"

> > a) It lists identical hardware configurations as many times as they
> > are submitted. Thus "Geotrace" lists positions 109 through 114 with
> > identical hardware. If one ran uniq on the list, it would probably end
> > up being the top 200. At most. Arguably 100, since some clusters
> > differ at most by the number of processors. What's the point, if the
> > site's purpose is to encourage and display alternative engineering?
> > None at all.
>
> When you take a snapshot, you take everybody on the picture, even the
> N-tuples. If "Geotrace" has indeed 15 identical clusters (it is quite
> frequent in industry to have several identical clusters BTW), they
> should be on the list, otherwise your snapshot of the market is wrong.
>
> Why would you expect *all* of the entries to be different ?

Who cares about a "snapshot of the market" except marketers or economists? Is this "computer science" and I missed it?

As I noted in the last installment of the rant and up above, the people who most care all have something to sell, and to sell to a relatively ignorant marketplace, with basically no exception. Vendors to a class of client whose decision-making process is influenced by the simple and predictable metric of top 500 position, vendor management groups seeking to sell their strategy to boards whose approval is influenced by the number of top 500 clusters sold (and ditto vendor sales departments seeking to sell their sales strategy and performance to middle management and ditto vendor boards seeking to sell their strategy and performance to shareholders). Clients seeking to sell the clusters they want to buy to funding entities that are capable of understanding how many "megaflops" (or gigaflops, or teraflops) they are buying and the prestige associated with being on the "in" list where they are utterly incapable of actually judging a cluster's design, suitability for any given purpose, or comparative cost-benefit.

Even the "bragging rights" are a form of self-sales, self-deception, to the extent that they distract one from a proper design process. It's politics, not technology, driven.
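The "run uniq on the list" idea quoted above is easy to sketch. The following is only an illustration: the rows, field layout, and the `collapse_designs` helper are all hypothetical, and none of the data comes from the actual top 500 list. It groups entries by hardware design and keeps a count, so identical submissions collapse to one line.

```python
from collections import Counter

# Hypothetical entries: (site, vendor, processor, processor_count, interconnect).
# These rows are made up for illustration; they are NOT real top 500 data.
entries = [
    ("Site A", "Vendor X", "CPU-1", 1024, "GigE"),
    ("Site B", "Vendor X", "CPU-1", 1024, "GigE"),
    ("Site C", "Vendor X", "CPU-1", 1024, "GigE"),
    ("Site D", "Vendor Y", "CPU-2", 512, "Myrinet"),
]

def collapse_designs(rows):
    """Group rows by hardware design (every field except the site name)
    and return (design, count) pairs, most frequent design first."""
    counts = Counter(row[1:] for row in rows)
    return counts.most_common()

designs = collapse_designs(entries)
for design, count in designs:
    print(count, design)
```

Whether two entries count as "the same design" depends entirely on which fields go into the key; dropping the processor count from the key, for example, would collapse the list further.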
If the list were actually going to be >>useful<< to thoughtful engineers, would-be cluster purchasers or even would-be sellers of clusters OUTSIDE of marketing and the marketplace, if it were to be TECHNICALLY useful, then obviously compressing the identical designs down to a single entry and presuming that all clusters built according to that design will have identical performance according to this metric (perhaps along with a count) makes tremendous sense. Information theory. It permits useful TECHNICAL information to be communicated far more efficiently, at the expense of sales demographics information, which is primarily useful to sales persons and weak-minded clients.

> > b) It focusses on a single benchmark useful only to a (small) fraction
> > of all HPC applications. I mean, c'mon. Linpack? Have we learned
> > >>nothing<< from the last twenty years of developing benchmarks? Don't
>
> Haven't you learn the rule #1 of benchmarking ?
> Rule #1: There is no set of benchmarks that is representative enough.

I write benchmarking tools. Perhaps badly (as Greg has pointed out, it is Not Easy to write a really good benchmark:-) but well enough to have learned all sorts of rules about benchmarking and to write about those rules frequently on this list. What you are calling rule #1 is more commonly phrased the other way around:

  The only truly reliable benchmark is your own application.

This presupposes that your purpose is to use benchmarks as some sort of universal metric from which one can PREDICT performance on your application. The truth is, one often CAN use benchmarks for this purpose but it is an expert-friendly process and not one that can -- usually or generally -- be reduced to a single number like "number of Linpack FLOPS". It presupposes a fairly deep knowledge of both the application itself and where and how it spends time and what its bottlenecks are, and the tools used to provide direct MICRObenchmark peeks right at the relevant performance dimensions.
Have we not just spent a week and a half or thereabouts discussing networking hardware that provides bleeding edge latency, distinct topology-related performance scaling patterns, at costs that can actually be compared (to a degree of approximation)? >>This<< is what is useful (to some, but not to me in particular at the moment; it is but one of many non-universal but very useful performance metrics), and is one of many reasons that this mailing list is a critical resource to real cluster designers and users where the top 500 club -- I mean "list" -- is not.

Aside from a brief moment where vendors participating in the discussion got carried away with dueling Shameless Plugs (which was, by the way, annoying and clearly inappropriate for the list which is NOT to be used as a marketing vehicle except indirectly, where you can let your microbenchmark numbers and over the counter costs speak for themselves, or at least be spoken for by actual users) this was a really, really lovely discussion of the highest end networking technologies and bound to be of real use to lots of people seeking to optimize cost-benefit for actual cluster designs with known communication needs. It was very informative for others who DON'T have those needs -- it is very interesting to hear topology discussed on the list once again, as there was a period where it was only rarely considered as a design choice.

And don't get me wrong -- vendors, especially including yourself, participated >>well<< in this discussion for the most part -- vendors are welcome on this list BECAUSE who could know their hardware (or supporting software) better? None of us, however, want the list to deteriorate to a de facto spamfest where marketing noise starts to outweigh signal and even the list host Scyld exercises extreme discretion in using the list to plug product without open solicitation to do so.

> Linpack is a yard stick, that measures 3 things:
> 1) the system compute mostly right.
> 2) the system stays up long enough to be useful.
> 3) the system is big enough to be in the snapshot.
>
> For these requirements, Linpack is just fine.

What do any of these mean?

"Mostly right"? You mean linpack can get wrong answers on a system that isn't openly incompetent in its basic engineering or that linpack validates that the cluster engineers in the top 500 aren't openly incompetent?

How does keeping a brand new cluster up long enough for the seller (not even the client, but often the seller!) to complete a single benchmark demonstrate that it will still be running at a productive level in six weeks let alone six months? "Look, the cluster we sold you actually survived its vendor burn-in and completed linpack without crashing" is, I suppose, useful information (considering the alternative:-), but the real hardware burn-in is the next four to six weeks AFTER delivery, and the real test of the cluster's long term reliability is whether the seller goes bankrupt fulfilling the service contract or whether problems cost so much of the user's productivity that it is severely compromised. Which requires looking at the cluster 1, 6, 12, 24 months down time's stream...which WOULD be a really useful thing to publish, wouldn't it? But it isn't on the top500 list -- if half of the clusters there were broken nasty time sinks that had their administrators cursing and losing hair we'd never know it -- from the list data.

The final measurement is obviously of no TECHNICAL use at all, however useful it is to marketers and however much it is the point of the whole list. Pissing contest is just about right.

> I will tell you why the vendors would hate this: it takes a lot of time,
> and they are not paid for this. Tuning Linpack on large machine is
> already very time consuming. You want 10 more benchmarks to run ? Sure,
> who will run them ?
>
> BTW, how do you cook up a cluster to run Linpack well ?

I hope that you see the contradiction inherent in these two paragraphs.
So on the one hand, vendors would hate having to provide real benchmarks because:

  a) they would have to run them (for the most incapable or ignorant of their clients) in order to be able to provide them;

  b) they would have to make sure that their performance on the benchmarks is good, or they'll have to actually adjust prices to reflect their performance measurements;

  c) tuning a cluster design to optimize performance on any single benchmark or metric IS possible and in fact is SOP, but it is also expert friendly, time consuming and unavoidable, since running a simple test untuned might give Bad Numbers (see b).

Most vendors I've talked to, BTW, are actually pretty happy to provide you with benchmark results, where they can, and provide access to systems where you can create your own where they cannot. It gives them an opportunity to convince you that they are worth the money, and most vendors are proud of their products and really feel that they are a good value and worth selecting on the basis of price/performance.

And finally, benchmarks ARE paid for (as is everything else) -- ultimately by the consumer. Which one would I rather have a company provide me -- a glossy magazine advertisement showing racks of neat boxes and smiling geekoid administrators with their feet up on a desk, or hard benchmarks and other data on the actual PERFORMANCE and RELIABILITY of that same cluster? Hmmm....

Then you ask me how to cook up a cluster to run Linpack well? The same way one cooks up a cluster to run ANYTHING well. By analyzing the application, determining the bottlenecks, and investing the money spent on the cluster (including the de facto tuning process, which vendors SELL their clients, after all) in a balanced design.
The problem is that a cluster design tuned to run Linpack well may be horribly balanced for running a distributed Monte Carlo application well, or a genomics application well, or a large scale cosmology simulation well, and so top 500 ranking is misleading even where no tuning occurs (and I'll guess that tuning always occurs, at least a bit).

If one is running an application for which 100 BT communications is more than adequate, spending money to equip every node with Myrinet (not to pick on any single high-end network, you understand;-) is silly, I'm sure you'll admit. However, it is quite possible that a really big cluster built on 100 BT with an impressive aggregate compute capacity by many measures might lose out to a much smaller cluster with a more linpack-friendly balance between CPU and IPC. The fact that it costs 1/3 as much for the aggregate CPU it provides to ITS primary application is also somehow lost.

Similarly, MANY clusters that are absolutely perfect for their particular application that DO use very high end communications but are truly IPC bound at (say) 64 nodes may be literally the fastest clusters in the world at their particular class of application -- top >>1<<, not just top 500 -- but still fail completely to make the top 500 at all. How can these clusters even be defended? "Oh, so you built the world's fastest cluster, did you -- how come it isn't on the top 500 list, then..."

It's especially amusing that you'd try to defend Linpack in particular, as it has a long and checkered history of vendor tuning and intervention. Yes, it is rehabilitated and probably quite reliable at this point, but history has already shown that vendors WILL bend designs and skew results to make sales based on single metrics.
lmbench, on the other hand, is very easy to build on a system, very easy to run (well, not openly DIFFICULT to build and run;-) and produces a standardized report that can be directly compared across systems to obtain meaningful information about how fast those systems perform a wide range of relevant micro-scale operations. stream is similarly directly useful for a wide class of programs. So are netpipe and netperf, the former in the direct context of selected IPC mechanisms.

I suppose one >>can<< tune for parts of these benchmarks, but generally only at the hardware design level or driver design level, and the design improvements are (at that level) likely to actually DELIVER their benefits to all folks for whom the microbenchmark is relevant (those whose code is significantly bottlenecked by the measure and who use the improved hardware or drivers).

That's WHY we can have discussions about network latency on this list. There are tools to measure it, and we can agree to use the same tool and we can agree that it provides a FAIR measure of performance that is RELEVANT to certain classes of application and affects overall performance in predictable, if complex, ways. When was the last time somebody had a serious discussion about linpack performance on list? It happens periodically, but isn't terribly interesting when it does as there are generally better measures for much of what it collectively tests.

>
> > c) It totally neglects the cost of the clusters. If you have to ask,
>
> What is the price of a machine ? List price ? Including software ?
> machine room ? service ? support ? how long the support ?
> You would need to define a benchmark to get comparable prices, and trust
> the vendors to comply with it.
Actually, I think that it would be remarkably easy to assemble a top 500-like list that would be actually useful and include a suite of benchmark results for a variety of relevant cluster metrics, or to morph the existing top 500 list into a form that would be useful to somebody besides marketers and their all-too-willing prey (ooo, did I say that;-). Yes, most of the changes would be anathema to marketers and certain vendors, who rely on the NON-flatness of the playing field to harvest margin.

Interestingly, I really think that vendors WOULD benefit from this as much as anybody, just as my own experience is that most vendors are happy to provide real benchmark results anyway (or the opportunity to roll your own). Real business performance is linked to clear, everybody-agrees value delivered for the money spent. You want your customers to be happy a year, two years, four years down the road, which won't be the case if they come to realize that the design you sold them was a total rip-off. One can sell a price-performance loser for a while on the basis of reputation or the ignorance of one's clientele, but in the long run this is likely to be a losing strategy.

At worst having a balanced-performance multi-metric top-500 (with different categories of membership and search tools to flatten across categories) would level the field and encourage the marketplace itself to determine fair prices between the various hardware alternatives. Competition is good, as long as the "fitness" used in the marketing genetic algorithm isn't a heavily skewed function relative to real fitness.

> > best the cost after the vendor discounted the system heavily for the
> > advertising benefit of getting a system into the top whatever.
>
> You mean I cannot really buy the VT cluster for $5M dollars ? What a
> pity :-)

I was thinking of this very cluster while writing much of these rants, of course.
If ever there was a cluster built for the wrong reasons, in the wrong way, this has to be right at the top of the "Kids, Don't Try This at Home" cluster list. Build it once (for a fortune) and it sucks. Rebuild it (for a second fortune) and it doesn't suck, but nobody has a particular use for it or funding stream for it and it requires its own small power plant just to keep it running. Maybe things are better by this point, but that's the impression I've gotten of the project, at any rate.

However, there is little doubt (in MY mind) that the mindset that produced this cluster just to piss a bit further than the next one and attract media attention and to establish a certain platform as a viable cluster base for purely marketing purposes is rife within the top 500, and only thinly disguised at that. And it works. Look at the number of Power-based systems appearing in the Top 500. Vendor clout matters, and membership in "the club" is important and useful to those that can afford it.

> > I could go on. I mean, look at the banner ads on the site. Vendors
> > love this site. If it didn't exist, they'd go and invent it.
>
> Who is paying for the website, the bandwidth, the booth at SC, the
> operational expenses ? The banner ads...

Who is, and who should be, are very useful questions.

Who is is likely a mix of the vendors and the granting agencies that support the sponsoring institutions and those institutions themselves (out of opportunity cost time if nothing else in the latter case). I wouldn't be surprised if there was grant support openly behind the project; there almost certainly is grant support behind the groups/individuals that do the actual work of running it even if that support is nominally for doing something else (common enough in academia).

Who should be is indeed the major funding agencies, either singly or together. Let's face it -- the website and the bandwidth are likely nearly irrelevantly small expenses.
A database (a small and non-complex one at that). A straightforward lookup tool and associated links. I've built sites like this myself in my copious free time, and that may well be how it is run this very day -- a good design and it runs itself, mostly. A cheap server on a network with decent backbone bandwidth and spare capacity (most universities and government labs would qualify). Hiring a fraction of a full-time person to run the website and spending (say) $10K/year on hardware might cost $50K or even $100K, including overhead.

To provide a >>useful resource<< to support all the many, many would-be cluster users seeking NSF funding to BUY those clusters would be money well-spent and would result in saving 10 to 50 times the investment (at a guess) within the NSF alone. IF the information provided by the resource was actually (technically) useful!

> There are no such things as "turnkey and mass produced" clusters, except
> when a customer buy several instances of the same configuration. Most
> configs are different.

There is "different" and "significantly different", where I don't view systems as being "different" unless the difference is in one of the primary design metrics actually exercised by the test being used. However, so fine, I agree. Everything is different, even if one is getting a Western Scientific Special (just one big enough to get on the top 500 list).

Then let's CELEBRATE the difference, let's RECORD the difference. Instead of a "top 500" site that celebrates a single metric and a system description that is little more than a vendor name and a number of processors (giving a bit of a lie to this assertion, by the way) let's shoot for a "Cluster Registry" site where folks can "register" their clusters in some detail.
Include a lot more detail -- CPU, clock, motherboard, memory, network(s), disk(s), network topology and yes, actual price paid for the cluster itself, component wise or in aggregate, exclusive of any software beyond the operating system (where relevant) and general purpose programming tools and libraries, e.g. the compiler (listed as separate entities if possible).

Then run a standardized suite of benchmark tools on it to get an ARRAY of performance metrics, and price metrics, many of them microbenchmark results (and hence difficult to "tune" as they measure properties of the actual hardware architecture, operating system, and compiler chosen, along with critical libraries along e.g. network communications paths where relevant). Store all of these in a nice, extensible database. Insert a nice, powerful search engine that can look up and sort results according to nearly any specified range of design and expenditure metrics, and yes, you'd have a site that would be very, very useful.

ESPECIALLY if you included two Very Important Fields: Owner comments (where they have the opportunity to bash the vendor if the pile of PC's they are sold turns out to be a pile of rubbish six months later) and owner contact information. If I >>did<< see a cluster in such a registry that had good price/performance in the spectrum of metrics that is relevant to my application(s), I'd very definitely love to have an email address so I could ask the owner if they are still satisfied and if there are any gotchas I should engineer out if possible.

We Could Do This. Or at least, it could be done. I once attempted to do it for Duke (and ended up with a php/mysql based site that "worked" so I >>know<< it can be done) but it does require a bit more than the time a single, heavily overworked physicist who is learning php and mysql WHILE writing the site can contribute without actually getting paid for it...
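To make the registry idea above a little more concrete, here is a minimal sketch using Python's built-in sqlite3 module. It is not the php/mysql site mentioned above; the table names, column names, the `search` helper, and every row of data are hypothetical, chosen only to show hardware details, price, owner fields, and an array of benchmark results living side by side and being searched by metric and budget.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# Hypothetical schema: one row per cluster, many benchmark results per cluster.
conn.executescript("""
CREATE TABLE cluster (
    id INTEGER PRIMARY KEY,
    cpu TEXT, nodes INTEGER, network TEXT,
    price_usd REAL,
    owner_contact TEXT, owner_comments TEXT
);
CREATE TABLE result (
    cluster_id INTEGER REFERENCES cluster(id),
    benchmark TEXT, metric TEXT, value REAL
);
""")

# Made-up example data, NOT measurements of any real system.
conn.executemany(
    "INSERT INTO cluster VALUES (?, ?, ?, ?, ?, ?, ?)",
    [
        (1, "CPU-1", 32, "100BT", 50000.0, "owner1@example.org", "no complaints"),
        (2, "CPU-2", 32, "Myrinet", 150000.0, "owner2@example.org", "two dead nodes"),
    ],
)
conn.executemany(
    "INSERT INTO result VALUES (?, ?, ?, ?)",
    [
        (1, "stream", "triad_MBps", 2000.0),
        (2, "stream", "triad_MBps", 2500.0),
        (1, "netpipe", "latency_usec", 60.0),
        (2, "netpipe", "latency_usec", 8.0),
    ],
)

def search(conn, benchmark, metric, max_price_usd):
    """Return (id, cpu, network, price, value, contact) for clusters within
    budget, highest metric value first."""
    return conn.execute(
        """SELECT c.id, c.cpu, c.network, c.price_usd, r.value, c.owner_contact
           FROM cluster c JOIN result r ON r.cluster_id = c.id
           WHERE r.benchmark = ? AND r.metric = ? AND c.price_usd <= ?
           ORDER BY r.value DESC""",
        (benchmark, metric, max_price_usd),
    ).fetchall()

affordable = search(conn, "stream", "triad_MBps", 100000.0)
everything = search(conn, "stream", "triad_MBps", 1e9)
```

Sorting "highest first" is only right for metrics where bigger is better; a latency search would need the opposite order, which is one of the details a real registry would have to record per metric.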
There was a time back at the very beginning when the beowulf website provided a service a bit like this, but yes, it is difficult to get folks to register their clusters without some visible benefit. Maybe that's the miracle of the top 500 list -- it is useful enough to vendors that they ARE willing to register the cluster more or less for you just to get that market visibility, and some data on a single metric is better than none at all. Sigh.

> Let me say it again: the Top 500 is not designed to find the best
> machine for your needs ! That the job of a RFP and the associated suite
> of acceptance tests !!!
>
> when you want to spend enough dollars to buy you a 1 TFlops machine, you
> do your homework: you identify the set of benchmarks that you have
> confidence in, you ask vendors to bid, you choose one and you test it
> for acceptance. You don't tape the Top500 list to the wall and throw a
> needle. If you do, I have this nice bridge to sell :-)

While I agree, I think there is a school in Virginia that is in the market for just such a bridge, and that while yes the process you describe is the ideal, the RFPs I've read from actual granting agencies seeking to give away large sums of money are not infrequently seeking linpack MFLOPs or (more recently) Spec* numbers in lieu of actual tuned design metrics.

I'm idealist enough to hope that in the end good engineering does win out, and that the groups that DO identify the actual performance bottlenecks and spend the money optimally are rewarded with the actual grants.
Oops, pardon me, I've got to go polish my rose-colored glasses with Adam Smith's invisible hands -- they fogged up from my tears of idealist joy...:-)

> > useful information about things like latency, bandwidth, interconnect
> > saturation for various communications patterns, speed for a variety of
> > actual applications including several with very different
> > computation/communication patterns (ranging from embarrassingly parallel
> > to fine grained synchronous). Scaling CURVES (rather than a single
> > silly number) would be really useful.
>
> Sure, and you do it all again every 6 months.
>
> Seriously, take something like bandwidth: it is pipelined or not ?
> cold-start or not ? Back-to-back or through a switch ? how many stages
> in the switch ? do you count a megabits as 1000000 bits or 1048576 bits
> ? Let's imagine the mess when we talk about communication patterns...

What mess? That's what we HAVE been talking about on this list. It is what this list DOES talk about. And yes, you do it all again, and again, and again as systems evolve and new technologies emerge and become cost-competitive with the old, which is why Myricom cannot rely on its market kudos of years past to guarantee future success.

To the extent that cluster design is rational and not top 500 marketing hype, precisely the issues you raise are relevant to the would-be cluster owner, and there are real metrics that CAN be derived from accepted tools to provide the answers. Complex yes -- unavoidably so for complex network-parallel applications. However, complexity isn't necessarily a mess, and even a mess can be straightened out and organized according to certain principles.

I'm not looking for the moon, here, BTW. I just think that we could do better than ranking everything according to Linpack (only). In fact, I think it would be difficult, literally, to do any worse! I mean, how about aggregate bogomips? I suppose that would be worse. Maybe.
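The megabit question quoted above is a good example of a "mess" that reduces to bookkeeping once the convention is written down. A small sketch of the arithmetic follows; the constant names, the `to_megabits` helper, and the sample rate are made up for illustration.

```python
DECIMAL_MEGABIT = 1_000_000  # 10**6 bits
BINARY_MEGABIT = 1_048_576   # 2**20 bits

def to_megabits(bits_per_second, megabit=DECIMAL_MEGABIT):
    """Convert a raw bits/second figure using an explicitly chosen convention."""
    return bits_per_second / megabit

# Hypothetical measured rate, in raw bits per second.
rate = 1_000_000_000

decimal_mbps = to_megabits(rate)                 # 1000.0
binary_mbps = to_megabits(rate, BINARY_MEGABIT)  # about 953.67

# Relative size of the two conventions: the binary megabit is ~4.86% larger.
gap_percent = (BINARY_MEGABIT / DECIMAL_MEGABIT - 1) * 100
print(decimal_mbps, round(binary_mbps, 2), round(gap_percent, 4))
```

The same raw measurement reads as 1000 or roughly 954 depending on the convention, so two published numbers are only comparable if each states which one it used.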
> > I mean, this site is "sponsored" by some presumably serious computer
> > science and research groups (although you'd never know it to look at all
>
> The Top500 team is 4 people, with the same guy doing most of the work
> since the list was created. There is no "groups" behind it.

I was referring to the three institutions listed in the upper left hand corner of the site (as toplevel "groups"). Also, everybody who works for a University or government lab typically works for a "group" with some specified funding stream, and Dongarra, at least, is a serious computer scientist whose work I (generally) greatly admire.

Maybe they haven't been able to get e.g. NSF to fund it directly. Maybe they actually make money from it and use running the list to fund other stuff. I don't know, I just find streaming ads annoying. Would we all still use this (the beowulf) list if vendors hammered us with visual ads and sales every time a technical issue was raised as a topic of discussion? I very much doubt it. Yet the top 500 list IS clearly very influential in purchase decisions -- so much so that vendors go to enormous lengths and great expense to ensure that they are represented there if at all possible and to maximize their representation there when it IS possible.

> > sponsoring institutions are listed). If they want to do us a real
> > public service, they could do some actual computer science and see if
> > they couldn't come up with some measure a bit richer than just R_max and
> > R_peak....
>
> BTW, the 4 people aforementioned know quite a bit of computer science...
> The fact is that the Top500 is a statistical study, not a public
> service. You can use it as it is or not, but it does not really matter.

As I said, I'm more than aware that they are competent CS people. I'm also highly aware of statistical studies and the value thereof (statistics is one of my many hobbies:-), and if the top500 is a statistical study, it is in my opinion a very narrow one.
It requires voluntary participation (self-selection) and excludes by design a vast range of high performance computing operations to further narrow its range. These distortions bias the results, although yes the bias is uniformly applied and creates a self-consistent landscape.

You might as well use Mensa members (only) as the basis of meaningful statistical measures of intelligence in the general populace. Try not to think too hard about all the brilliant people who don't belong to Mensa, including all the people who may not even test out that well in I.Q. but somehow manage to win Nobel prizes anyway. Also try not to think too hard about the kind of person that would WANT to belong to Mensa, and why. Not to particularly pick on another "top 500"-like club... I'm illustrating a point in statistical surveys with self-selection as opposed to unbiased sampling.

However, the value of statistics is generally the correlations that it establishes, even though (yes) correlation is not causality and YMMV. Most of what my rant is concerned with is that the correlations that CAN be established from the top 500 list are so shallow as to be (by your own admission) nearly useless to real cluster design or informed purchase decisions, however much they ARE very useful to vendors seeking an oversimplified bazaar to sell their products.

Still, your point is well taken -- list usage is a voluntary matter, and it can be taken for whatever it is worth even if it isn't worth all that much. Even a survey of the coffee-drinking preferences of Mensa members might mean something useful to coffee marketers, even if it was something much narrower than what they might have hoped for.

> > AMD has more or less "owned" the price/performance sweet spot for the
> > last two years. If you have LIMITED money to spend and want to get the
>
> What is the "price/performance sweet spot" ?
> How do you measure it ? How do you know that AMD owned it ?
For most consumers the bottom line is, quite correctly, how fast
reliable hardware runs their application for the least cost. The sweet
spot is thus typically where numerical performance by a variety of
metrics (e.g. sure, linpack, but also SpecFP, SpecInt, and especially
users' own applications) is the greatest for the least money spent, per
node, compared to equivalently equipped hardware from other vendors. And
yes, this is approximate and yes, I can fancy it up as much as you like
but we'd still both know what I mean.

To measure it, one ULTIMATELY runs one's own application. Alternatively,
especially during the cluster design process, one considers a variety of
metrics (many of which are mentioned above) with some known or expected
relevance to one's own application, in a hardware configuration capable
of providing performance on a par across the alternatives being
considered. For CPU/motherboard/memory choices, that would typically be
networking and disk. Numerical performance, in turn, is often influenced
strongly by memory bandwidth for certain applications, by processor
design (e.g. pipelining etc) and compiler and OS support.

So run your application on it if you can, run some benchmarks that tend
to have performance variability that is related to how your
application's performance varies (if you are lucky and know of some),
then compare prices from vendors that provide all of what you need
outside of the primary performance predictor space. I know you know all
of this and work with it (for networking hardware) for a living -- it is
just that yes, there is a sweet spot, both for "each" cluster
purchaser/user, and an aggregate/average sweet spot for all cluster
purchasers/users.

As for how I know it, from my own tests (limited as they may be) and
other purely anecdotal experience, of course.
When thinking of a cluster design for a new cluster, I run lmbench,
cpu_rate, my own application ;-), the applications of many of my good
friends who use clusters for a living, look up spec results and might
even look up linpack results for Opterons or Athlons before them at a
variety of clock and memory configurations. I look at the price of
equivalently equipped and supported CPUs in a given class across the
range of clocks supported by each to the extent that patience permits.

While there are doubtless exceptions (although I have yet to find one)
I've just gone through such a process and the Opterons at this
particular moment in time (In My Opinion and as of Some Weeks Ago,
things change rapidly) deliver the most general purpose "numerical"
performance per dollar spent, in reliable and consistent hardware
configurations from a variety of vendors, end of story. The Athlons and
Durons were actually narrower winners in their day, but in the case of
the Opteron it really has been little to no contest, EVEN with the 2.4
kernels wasting a goodly chunk of the potential of the hardware. With
2.6 kernels they simply fly.

Not terribly surprising, actually. The technical and performance
advantage of the 64 bits is at least partly offset by Intel's commanding
market advantage, causing a relative price inversion so that AMD's best
is priced better than Intel's best. Number 2 has to try harder.

Now, if you object (perfectly reasonably) that THIS isn't an objective,
reproducible, statistically supported conclusion but instead is
anecdotal bullshit that might or might not be true, you'd be perfectly
correct (not that it would change my opinion:-). I'd counter that this
is precisely why having a MEANINGFUL statistical sampling of MANY
performance measures on real production clusters at many scales and
purposes would be so very useful, and why the top 500 list is so --
disappointing -- in contrast.
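
The "performance per dollar" comparison described above can be sketched
in a few lines of Python. This is only an illustration under stated
assumptions, not the procedure actually used in the post: the
NodeConfig type, the benchmark names, the node names, and every number
below are hypothetical placeholders, and combining scores with a
geometric mean of ratios to a baseline node is one common convention
rather than anything the post prescribes.

```python
from dataclasses import dataclass
from statistics import geometric_mean


@dataclass
class NodeConfig:
    """One candidate node: a price and a set of benchmark scores."""
    name: str
    price: float               # cost per node, in dollars
    scores: dict[str, float]   # benchmark name -> score, higher is better


def perf_per_dollar(node: NodeConfig, baseline: NodeConfig) -> float:
    """Geometric mean of score ratios to the baseline, divided by price."""
    # Ratios to a baseline let benchmarks with unlike units be combined.
    ratios = [node.scores[b] / baseline.scores[b] for b in baseline.scores]
    return geometric_mean(ratios) / node.price


def rank(nodes: list[NodeConfig], baseline: NodeConfig) -> list[NodeConfig]:
    """Sort candidates from best to worst performance per dollar."""
    return sorted(nodes, key=lambda n: perf_per_dollar(n, baseline),
                  reverse=True)


# Placeholder data only: these are NOT measurements of any real hardware.
node_a = NodeConfig("node_a", 2000.0, {"membw": 100.0, "my_app": 50.0})
node_b = NodeConfig("node_b", 3000.0, {"membw": 120.0, "my_app": 60.0})

for n in rank([node_a, node_b], baseline=node_a):
    print(n.name, perf_per_dollar(n, node_a))
```

In practice the scores that matter most are the ones from your own
application, so a weighted mean that favors those entries may be a
better fit than the unweighted one used here.
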

> The Top500 list tells you that people put their money on Intel more than
> on AMD. If you assume that people do they homework when they buy such
> machines, you may deduct that Intel was more attractive in the bidding
> process.

There are so many flaws with this observation that I won't even bother
addressing them. Next you'll be arguing that Windows is a technically
superior operating system because Look! Countless Surveys Prove that
more systems are sold running Windows than all of the alternatives put
together!

Not to mention the self-fulfilling prophecy aspect of the top 500
feeding upon itself. Perhaps the reason there are so many Intel systems
represented is because there have been so many Intel systems represented
in the past, and because people really ARE strongly influenced in their
purchase decisions by the top 500 list itself. Perhaps it is because
Intel twists the arms of many of the potential vendors of systems at
that scale to use only their processor in everything that they sell or
pay more for everything (shrinking their margins) just like Microsoft
does with Windows. Perhaps it is because they charge such a high margin
compared to the actual performance that they can afford to discount
cluster components heavily to specific "big" purchasers and still make
money while giving the consumer the illusion that they are "getting a
bargain". Perhaps it is the eternal "TCO" chant of marketers everywhere
seeking to justify their high-margin sales (sometimes with reason, mind
you -- I'm just observing that this particular mantra plays to the
advantage of the top player in any given market).

I mean, c'mon. Even the best of humans have a bit of sheep within them.

> I can understand how the Top500 is not matching your expectations, but
> the truth is that it was designed to match them. What you are looking
> for just does not exist.
> It's not utopia, you could really imagine a
> company following the SPEC model: vendors pay for submitting (a lot of)
> results and results are reviewed seriously. All you need is to convince
> vendors to spend a lot of money on it. Is the HPC market big enough for
> that ? I doubt it.

I agree that the top 500 model is not matching my expectations, and I
think that (deep down) we agree as to the groups for whom it IS matching
expectations.

However, we aren't limited to just the Spec model. We could use a
different model altogether, one that we create and that need not cost a
lot of money on anybody's part to play. I have great hopes for Doug's
efforts to build a suite of cluster benchmarking and testing tools.

Another group that CAN play the role of watchdog in this sort of thing
is the news media -- a magazine, for example. Hmmm, the magazine I write
for, for example. Even with magazines one has to be a bit careful --
they rely on vendors for advertising dollars -- but generally in my own
experience they manage to present a fair and balanced view of
performance analyses and tests. Magazines have most definitely developed
their own suites of testing programs many times in the past and
published regular comparative reviews of the available hardware on that
basis. And of course you can trust ME implicitly...;-)

So two (of many) possible funding/support models are for CMW (for
example) or an NSF-supported site (for another example) to set up a
"rich" database like the one I describe above, and to provide a
prepackaged, ready to build and run suite of open source benchmarks to
support a unified and consistent measure of performance. The cluster
community appears to be big enough to support its own magazine and its
own targeted advertising -- maybe it has grown to where it CAN support a
real database of real performance metrics across a broad base of cluster
owners.

Note that I say just "build and run" the benchmark suite, not "tune", at
least at the software level.
Ideally, this would be a matter of e.g. rpmbuild or ./configure; make
(possibly after editing a single toplevel configuration file to select
e.g. compiler and network and disk resources to be tested and to specify
the test cluster). The tool itself could even be made autoreporting --
run it and it refreshes your password- or cookie-protected entry in the
master table.

So I think it is doable, and doable for very little more effort than
that being spent already on the top 500 list. I also think that it is a
WORTHWHILE, DESIRABLE, GOOD thing to do. Perhaps it will happen.

Now, if I could only repackage the above and turn it into a CMW column,
Life would be Good. Sigh. Heck, maybe I will. Then I'd get paid for
writing it...;-)

</rant>

   rgb

Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu
