[Beowulf] 512 nodes Myrinet cluster Challenges
Clements, Brent M (SAIC) brent.clements at bp.com
Wed May 3 13:35:17 PDT 2006
- Previous message: [Beowulf] running out of rsh ports
- Next message: [Beowulf] NFSv3 client hangs - tcp v/s udp.
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
I find that most large "supercomputers" are still nothing more than compute farms that have an execution daemon and policy monitor to manage the compute farm.

Brent Clements

________________________________
From: beowulf-bounces at beowulf.org on behalf of Robert G. Brown
Sent: Wed 5/3/2006 3:24 PM
To: Vincent Diepeveen
Cc: beowulf at beowulf.org; Patrick Geoffray
Subject: Re: [Beowulf] 512 nodes Myrinet cluster Challenges

On Wed, 3 May 2006, Vincent Diepeveen wrote:

> Let me react onto the next small quote:
>
>> You are somehow convinced that institutions buying clusters are brain
>> dead and always get ripped off. Some are, but most are not. You don't
>> have all of the information used in their decision process, so you draw
>> invalid conclusions.
>
> Ok, a few points:
>
> First, a simplistic fact:
> 0) the biggest and vast majority of supercomputers get bought by
> (semi-)government organisations

What does this mean? Are universities semi-government organizations? How about IBM, Cray, Hitachi? How about oil companies?

And just what is "a supercomputer"? In the old days it meant a big-iron system; now it more often means an HPC cluster of some sort or another that is "a computer" in the beowulfish sense more by virtue of naming than of function per se, as its capabilities may or may not be brought to bear on a single problem for any significant fraction of its duty cycle.
I ordinarily have little use for the top500 site, but eyeballing it one might possibly conclude that the biggest supercomputers do indeed get bought by the government (as the BIGGEST ones cost so much that nobody else can afford them). So the top 100 is dominated by government, by Universities (purchased, doubtless, with government money if that makes them "semi-government") and big computer companies.

From there on down your statement is just plain wrong. Corporations appear to be in the slight majority in at least the second 100 and beyond, followed by Universities and with the government third, although this is based on eyeball counts because I'm too lazy to do better. Somewhere on the top500 site or a derived site there is probably a pie chart with all of this information predigested, though, for googlers out there. Lots of oil companies, banks, semiconductor companies, telecommunications companies, engineering firms, car companies, aerospace companies. Pretty much who you'd expect to be interested in large-scale cluster computers on a commercial basis, actually -- and who can afford them.

As always, the top500 presents a very skewed view of the supercomputing world, as I >>personally<< consider a 32-node beowulf cluster running a single very fine-grained synchronous task that peaks in its parallel scaling speedup for the IPC channel used at 32 nodes to be "a supercomputer". A well-designed, cost-benefit optimal one at that. A 32-node generic grid-style cluster is arguably "a supercomputer" in that it significantly improves on doing von Neumann-style serial computing on a single processor.

Who >>knows<< what the distribution of ownership is of 16-, 32-, 128-node clusters (things below the top500 radar) or corporate computers that they didn't bother to register on the top500 site (perhaps because running their benchmark is, really, a waste of corporate resources and time unless the vendor does it free, which is what likely DID happen for many of the entries).
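[Editorial aside: the "peaks in its parallel scaling speedup at 32 nodes" claim is easy to illustrate with a toy strong-scaling model. All the constants below are invented for illustration, not measurements: per-step time is the fixed work divided by the node count plus a per-node synchronization overhead set by the IPC channel, so the speedup curve rises, peaks at sqrt(W/c) nodes, and then falls.]

```python
# Toy strong-scaling model: t(n) = W/n + c*n, where W is the fixed compute
# work per step and c is the per-node synchronization overhead imposed by
# the IPC channel. Both constants are hypothetical. The speedup
#   S(n) = t(1) / t(n)
# peaks at n = sqrt(W/c), here deliberately tuned to land at 32 nodes.

W = 1.024    # seconds of compute per step on one node (made up)
c = 0.001    # seconds of overhead added per node, "cheap network" (made up)

def step_time(n, work=W, ovh=c):
    return work / n + ovh * n

def speedup(n, work=W, ovh=c):
    return step_time(1, work, ovh) / step_time(n, work, ovh)

best = max(range(1, 257), key=speedup)
print(best)            # -> 32

# A lower-latency interconnect (smaller c) moves the peak out to more nodes:
best_fast = max(range(1, 257), key=lambda n: speedup(n, W, 0.0001))
print(best_fast)       # -> 101
```

The point is only qualitative: the peak position, and hence the cost-benefit-optimal cluster size for a given code, is set by the ratio of compute to communication cost, which is why the same task can justify a cheap network at small scale and an expensive one at large scale.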
> That in itself should already raise question marks on our heads. Even
> though some companies need massive power, they somehow seem to be smarter
> and manage to do the job in a different way.

Do >>what<< job in a different way? WHAT different way? If you want to do large-scale HPC-generic computations, there aren't really a hell of a lot of options. It is "do them on a cluster", or perhaps "do them on a cluster". And then there are those that just "do them on a cluster".

A VERY FEW of the clusters are big-iron-y clusters with custom IPC channels. The "biggest and vast majority" of them are just plain old beowulf-style or grid-style HPC compute clusters made up of a large pile of "independent" motherboards, memory, and processors interconnected by a bus-linked network. The network is minimally 100BT (or more reasonably by this point 1000BT) but goes up to and includes Myrinet, Infiniband, Quadrics, etc. at the high end, where high end means:

a) Highest bandwidth, and/or
b) Lowest latency.

The high-end interconnects are expensive. The low-end ones are cheap. At the highest end, things get all techno-detailed, and price/performance depends on a lot of things including (in a very specific way) the TASK MIX and SIZE OF THE CLUSTER and DESIGN OF THE MOTHERBOARD (especially the cpu, bus and memory subsystems). YMMV, Caveat Engineer, it's Your Money so Spend It Wisely.

So what exactly did you mean by "a different way"? Something different from piling up CPUs and memory and interconnecting them with an OTC network? Does that belong on this list?

> However i can now only talk and speak about how government works, as all
> those machines belong to semi-government.
>
> There is now suddenly a great distinction between university level and
> government level. SOME universities do pretty well actually and the points
> underneath do not apply to them.
>
> a) Let me give a simple example.
> In the supercomputer europe 2003 report they still seem to not know that
> opteron processors support SSE/SSE2, whereas, i'm guessing, this report
> was used to order for 2004/2005 machines.
>
> As a result of that the intel processor had an advantage over amd, whereas
> that AMD processor in SSE2 already was performing in tests for me in DFT
> faster than any intel processor. Even though i reported this to the
> persons in question and the organisation, i never got feedback.
>
> When i read that report 2003, which was available in start 2004, i already
> knew what kind of machine they were gonna order for 2005.

From which we can conclude that some people are stupid, and others are greedy? That's no surprise, and what's your point (beyond that)?

For the most part, even government machines (at least in the US) are designed one at a time for a specific task, are grant-funded for performing that task, are purchased and assembled and perform that task and any others that might come up in the venue where they operate. There is no such thing as a "supercomputer report" that recommends Intel vs AMD, and plenty of knowledgeable people on this list and elsewhere have selected Opterons for years now.

The biggest problem with Opterons three years ago was that operating system/distro and compiler support sucked (in Linux, at least), not the chips. But as of FC2 and beyond (in linux distro terms) they most definitely did not suck, and nobody not stupid or greedy (and getting some sort of a goose from Intel, or suspicious of any non-Intel solutions for stupid reasons) would have written off Opterons from that point on.

> b) hardware moves fast. What is fast this year is slow next year. A simple
> indexation in the *previous* year of what was available that year, used
> only to order a machine for the *next* year, is slow moving. Companies in
> general are faster there.

Vincent, companies are NEVER faster in the cluster world.
They cannot afford to be, literally, for one thing (with the possible exception of e.g. IBM etc. that MAKE the computers as well as build the clusters).

Beowulf-style clusters were "invented", if that is the right term for a distributed collaborative event that spanned dozens of venues, between roughly 1989 and 1996, by a mix of University- and government-supported workers in the fields of computer science and physics (mostly). I personally credit Dongarra's PVM team with the invention, since before PVM commodity clusters were pretty much rsh-based workstation compute farms (nothing to sneeze at, BTW, and I was right there using them in precisely that way).

From PVM on, clusters have ALWAYS been done fastest and best by governments and Universities or computer companies drawing on the brains and soft technologies developed at government and University centers. In the process the successful design (clusters of independent COTSish systems interconnected with COTS IPC channels, i.e. networks) has absolutely wiped out the "corporate" best effort of the day, which was almost invariably a big-iron supercomputer design (even if that design was a cluster in disguise). Today only a handful of special-purpose vector-type machines still survive AFAIK, and a whole lot of list time has been devoted to making clusters out of vector processors (e.g. DSPs) of one sort or another.

Companies only copied the successful models developed on this list (among other places, but a WHOLE LOT was done by participants on this list) when it was "safe" to do so, when the engineering was all cut and dried (and there were enough people trained to do it), and when in particular the cost/benefit was so painfully obvious that they had no choice.

> c) if you learn what price gets paid for the hardware effectively, you
> will fall off your chair. The so-called public costs that governments
> sometimes tell publicly simply aren't what they *effectively-all-in* pay.
>
> A few good guys excepted, but in general it's simply not their money, so
> they don't care. Sometimes there are good reasons for this. So it's not
> like the persons having the above actions are bad guys. It's easier to get
> 20 million euro from government sometimes than to get 1 million.

Sigh. There is probably some truth to this, unfortunately, because of the nonlinear memetic-evolutionary advantage accrued to "status" and "visibility" in our collective/democratic world. We don't have any all-wise and all-knowing masters telling "us" the best things to fund, the best prices to pay, so the decisions get made by people on the basis of what they know and a collective decisioning process. So what DO they know? What they've heard of, what has made an impact, a splash. It is all about marketing.

A secondary point is that the government often has an ulterior motive in funding big, apparently pointless computers for grand-challenge type projects. It supports the cash flow and R&D of the companies that bring us each new generation of miracles. Some of the largesse spills over into much more humble programs. The large-scale purchases push manufacturing margins out to where prices collectively lower. It keeps the countries to which those companies belong competitive on the global market. Some of this is decided fairly deliberately in HPC, just as it is in e.g. aerospace industries and defense industries -- "strategic resource" companies get big contracts to ensure that there continue to be strategic resource companies that can FULFILL big contracts, and that our country's companies can build bigger, faster things than your country's companies. This is all geopolitick stuff.

Both of these things of course appear in many other venues, not just computing.
Government-funded research projects, high-energy physics, and so on -- this is known as "politics", and it makes the world go round and is a GOOD thing, as I for one would really, really not want the world to be run by masters, not EVEN if they really were all-wise and all-knowing.

> d) It is definitely the case that i do not see 'bad' persons on those
> spots, it's just that the average government official knows nothing about
> contracts. They have doctor and professor titles, because one day they
> were good somewhere in some domain, and perhaps even are today; they
> aren't on that spot because they are good at contracts. Government jobs
> pay based upon the number of titles you've got; they don't pay based upon
> how good you are at closing deals and closing good contracts. They sit
> there because they are good at working with commissions which are good for
> your health; such commissions usually have nice meetings, unlike
> companies, where meetings are usually way harder as tough decisions get
> forced there.
>
> Find me 1 bureaucrat on the entire planet who is good at contracts and
> sits at the right spot to decide what happens.

Oh, you're just being cynical. Which is your god-given right, of course. However, there are good people in government, bad people in government, in-between people... same as anywhere else, really. Politics consists of two people with different taste trying to decide what movie to watch together, for God's sake, and scales up from there. It beats resolving the issue with clubs.

> I saw this in politics even more than in HPC. Those sectors go about a lot
> more money than several tens of millions. High-end supercomputing really
> is about little money if you compare it with other infrastructural
> projects.
> In telecommunication i remember a meeting i had with a manager of a
> telecommunication company who nearly wanted to beat me up when, en
> passant, i remarked to him that such a wireless station of around 60 watt
> i would never want in my garden, as 60 watt straight through my head i do
> not consider a healthy way of life. Even if they would offer me 20k euro a
> year for it, like they do to farmers for example.

Sigh. 60 watts is the power given off by a -- wait for it -- 60-watt light bulb. Your head (brain) consumes roughly 1/3 of the calories/watts your body burns -- call it 30 watts total (out of a fairly typical 100W total output from a human body). A 60-watt light bulb doesn't exactly keep you warm on a cold winter day from a single meter away. Putting your head right on it, sure -- get 20 watts hitting it from a few centimeters away and it will get warm, maybe dangerously warm if you sit in a beam focus and hold still.

This is the shorthand reason that this is silly. If you want the long version, I have to talk about skin depth, frequencies, and so on. The long and short of it is that the power differential associated with that 60W transmitter some meters away inside the tissue of your skin, let alone your brain, is SO LOW compared to what it experiences from the SUN while WORKING in your garden that you might as well decide to avoid all sources of power (including your computers) for the rest of your life, as being warm might damage your health.

Now Mr. Sun is, as a matter of fact, dangerous as all hell. Expose your skin to HIM and you'll get radiation poisoning in short order. That's because he gives off dangerous amounts of UV, not microwaves, and UV has a short enough wavelength that it can affect things in a quantum way instead of by gross absorption in the form of heat. Directional microwaves are not without their dangers as well. 60W can cook a hot dog if it has noplace else to go until it is absorbed.
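[Editorial aside: the back-of-the-envelope version of that argument is just the inverse-square law. A quick numerical sketch -- treating the antenna as an isotropic radiator is a simplifying assumption (real base-station antennas aim at the horizon, not the garden below), and the 1000 W/m^2 solar figure is a typical clear-day midday ballpark:]

```python
import math

# Power flux (W/m^2) of an isotropic 60 W source at distance r, compared
# with rough midday sunlight at ground level. Illustrative numbers only.

P_TX = 60.0     # transmitter power, watts
SUN = 1000.0    # typical clear-sky midday solar irradiance, W/m^2

def flux(p_watts, r_meters):
    # power spread uniformly over a sphere of radius r
    return p_watts / (4.0 * math.pi * r_meters ** 2)

for r in (1.0, 10.0, 30.0):
    f = flux(P_TX, r)
    print(f"{r:4.0f} m: {f:9.5f} W/m^2, about {SUN / f:,.0f}x weaker than sunlight")
```

At 10 m the flux is under 0.05 W/m^2, more than four orders of magnitude below sunlight, and the isotropic assumption already overestimates the exposure underneath the mast.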
But omnidirectional or dipole-pattern 60W from tens of meters away just isn't an issue. (Which reminds me of the equally silly claim that living near a HV power line caused cancer, in spite of computations that showed that at MOST milliwatts of very low frequency radiation were being emitted. Maybe from ozone or something, but damn sure not from EM radiation.)

> Compared to that, HPC is near holy. It is not bad for health. It's good
> for health in fact, as it avoids for example atomic test blasts, and at a
> very cheap price too.

Oh, you mean that it enables every small country in the world to design implosion bombs without any need to build 25,000 pits and blow them up one at a time to develop an effective implosion design (as was done by the US in pre-HPC WWII)? Or design thermonuclear devices, enhanced-radiation devices, and the like without first having to build an actual implosion trigger? So that if/when they DO manage to get a reactor going that can cook up some plutonium it is easy to build bombs "without tests"? Yeah, very healthy ;-)

Of course, if you can extract U-235 it doesn't matter, since any idiot can build a bomb out of U-235. Take two subcritical masses, one in each hand, and slam them together as hard as you can. Boom. With a little effort, you can design one that will go off when you aren't actually there.

> So in comparison to telecommunication and high-voltage power and gas
> pipes, where you sometimes see complete idiocy which is near to or equal
> to corruption and mixed interests; i'm 100% sure this isn't the case in
> HPC.
>
> The talk i had with the following director looks worse than his actions
> are in fact. He's a good guy and really did do his best, but the following
> conversation is typical for government.
>
> We were standing at a certain big supercluster. 1 meter away from it.
>
> In the huge sound it produced he said: "We bought this giant machine for a
> lot of money, and i was promised at the time that we could simply upgrade
> it a few years later by putting in a new processor. Now we already have to
> take it out of production by 2006, with by then completely outdated
> processors. Just upgrading those cpu's would still make it a powerful
> machine. New dual core 700Mhz processors are expected to arrive by 2005 or
> even 2006 and will not be socket compatible. So we got tricked and in fact
> lost a lot of money by buying this machine, as it can't get upgraded."
>
> Me: "But you signed a contract which takes care you can sue them now?"

This is a common enough mistake, actually, but it isn't the mistake that you think it is. The real mistake is in thinking that you can EVER economically upgrade a computer system at the processor level. The answer is ALMOST always no, as little as one year later and beyond.

I made it myself around 1990 (purchasing a 2-processor SGI 220S thinking that we could upgrade it to more processors and memory), at their "bargain" price of some $20,000 per processor pair. When I learned, three years later, that my $5000 desktop workstation was faster than both processors put together (that had cost around $60,000 or thereabouts in a refrigerator-sized unit only three years before), it was a crash course in Moore's Law and its implications in purchasing and designing supercomputers -- 1993 was also when I started using PVM, through no real coincidence (I'd already gotten computations up to twenty-odd processors the rsh-hard way over 10B2 ethernet). We sold this machine when the annual software maintenance on its operating system cost more than a workstation that was twice as fast -- for $3000 less shipping, to somebody who needed it because they ran software that used its graphics engine.

There are lots of ways to AVOID making this mistake, of course.
Dealing with an honest vendor is one, as an honest vendor would never assert that ANY COTS system could be economically upgraded at the processor level two to three years later. Hiring a consultant who isn't a complete idiot is another (there are so MANY people on this list who are competent in this regard). Hell, just asking the list and getting a free-as-in-you'd-owe-me-a-beer consult from me or any of a few dozen others on the list would have done it.

Lacking somebody with this sort of experience or knowledge helping out, though, and using a somewhat sly vendor, yeah, well, you pays your dues one way or another to learn. But what, again, does this have to do with government vs corporate vs University? If it is a sin, well hey, I'm a sinner too -- they sound so >>convincing<< when they tell you it is upgradable, after all. And sometimes it isn't even a "lie" -- it is just a mistake of the vendor or their sales rep, who may have THOUGHT it would be upgradable, been TOLD it would be upgradable. Things change; people get them wrong sometimes. Even experience isn't a totally safe guide -- perhaps there ARE systems out there that somebody has managed to upgrade at the processor level without gritting their teeth and ultimately wasting a lot of money.

   rgb

> [ear-deafening silence]
>
> Best regards,
> Vincent
>
> Director DiepSoft
> www.diep3d.com
>
> p.s. certain people in HPC only seem to react when they feel attacked or
> insulted
>
>>> They either give the job to a friend of theirs (because of some weird
>>> demand that just 1 manufacturer can provide), or they have an open bid,
>>> and if you can bid with a network that's $800 a port, then that bid is
>>> gonna get taken over a bid that's $1500 a port.
>>
>> The key is to set the right requirements in your RFP. Naive RFPs would
>> use broken benchmarks like HPCC. Smart RFPs would require benchmarking
>> real application cores under reliability and performance constraints.
>>
>> It's not that "you get what you pay for", it's "you get what you ask for
>> at the best price".
>>
>>> This is where the network is one of the important choices to make for a
>>> supercomputer. I'd argue nowadays, because the cpu's get so fast
>>> compared to latencies over networks, it's THE most important choice.
>>
>> In the vast majority of applications in production today, I would argue
>> that it's not. Why? Because only a subset of codes have enough
>> communications to justify a 10x increase in network cost compared to
>> basic Gigabit Ethernet. Your application is very fine-grain, because it
>> does not compute much, but chess is not representative of HPC
>> workloads...
>>
>>> My government bought a 600+ node network with infiniband and and and....
>>> dual P4 Xeons. Incredible.
>>
>> Again, you don't know the whole story: you don't know the deal they got
>> on the chips, you don't know if their applications run fast enough on
>> Xeons, you don't know if they could not have the same support service on
>> Opteron (Dell does not sell AMD, for example).
>>
>> By the way, your government is also buying Myrinet ;-)
>> http://www.hpcwire.com/hpc/644562.html
>>
>>> I believe personally in measuring at full system load.
>>
>> Ok, you want to buy a 1024-node cluster. How do you measure at full
>> system load? You ask to benchmark another 1024-node cluster? You can't;
>> no vendor has such a cluster ready for evaluation. Even if they had one,
>> things change so quickly in HPC, it would be obsolete very quickly from a
>> sales point of view.
>>
>> The only way is to benchmark something smaller (256 nodes) and define
>> performance requirements at 1024 nodes. If the winning bid does not
>> match the acceptance criteria, you refuse the machine or you negotiate a
>> "punitive package".
>>
>>> The Myri networks i ran on were not so good. When i asked the same big
>>> blue guy, the answer was: "yes on paper it is good nah?
>>> However, that's without the overhead that you practically have from the
>>> network and other users."
>>
>> Which machine, which NICs, which software? We have 4 generations of
>> products with 2 different software interfaces, and it's all called
>> "Myrinet".
>>
>> On *all* switched networks, there is a time when you share links with
>> other communications, unless you are on the same crossbar. Some sites do
>> care about process mapping (maximize jobs on the same crossbar or same
>> switch), some don't. From the IBM guy's comment, I guess he doesn't know
>> better.
>>
>>> A network is just as good as its weakest link. With many users there is
>>> always a user that hits that weak link.
>>
>> There is no "weak" link in modern network fabrics, but there is
>> contention. Contention is hard to manage, but there is no real way around
>> it except having a full crossbar like the Earth Simulator. Clos
>> topologies (Myrinet, IB) have contention, torus topologies (Red Storm,
>> Blue Gene) have contention; that's life. If you don't understand it, you
>> will say the network is no good.
>>
>>> That said, i'm sure some high-end Myri component will work fine too.
>>>
>>> This is the problem with *several* manufacturers basically. They usually
>>> have 1 superior switch that's nearly unaffordable, or just used for
>>> testers, and in reality they deliver a different switch/router which
>>> sucks ass, to say it politely. This said without accusing any
>>> manufacturer of it.
>>>
>>> But they do it all.
>>
>> Not often in HPC. The HPC market is so small and so low-volume, you
>> cannot take the risk of alienating customers like that; they won't come
>> back. If they don't come back, you run out of business.
>>
>> Furthermore, the customer accepts delivery of a machine on-site, testing
>> the real thing. If it does not match the RFP requirements, they can
>> refuse it and someone will lose a lot of money. It has happened many
>> times.
>> It's not like the IT business, where you buy something based on
>> third-party reviews and/or on spec sheets. Some do that in HPC, and they
>> get what they deserve, but believe me, most don't.
>>
>> Patrick

> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

-- 
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525  email: rgb at phy.duke.edu

_______________________________________________
Beowulf mailing list, Beowulf at beowulf.org
To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf