From hahn at mcmaster.ca Sun Apr 1 11:22:33 2007
From: hahn at mcmaster.ca (Mark Hahn)
Date: Tue Oct 7 01:13:44 2008
Subject: [Beowulf] OT? GPU accelerators for finite difference time domain
In-Reply-To: <1175147950.5777.13.camel@ceiriog.eclipse.co.uk>
References: <1175147950.5777.13.camel@ceiriog.eclipse.co.uk>
Message-ID:

> Electromagnetics Research Symposium" in Verona. There appears
> to be a considerable buzz now around FDTD calculations on GPUs.

the very latest gen GPUs (G80 and as yet unreleased R600) make very
interesting coprocessors for vector-ish calculations which can be
expressed using integer or single-precision operations.

> Has anyone any experience of this? How do these products stack
> up against the traditional Beowulf solution?

they _are_ in the spirit of Beowulf, which is all about hacking
commodity hardware to suit HPC purposes.

> We are planning to buy a new Beowulf in the next few months. I'm
> wondering whether I should set aside some funds for GPU instead
> of CPU...

as with any purchase, you need to figure out what your workload needs,
and how you can feed it. GPGPU requires substantial custom programming
effort - there is no standardized interface (like MPI) to do it.

GPGPU makes a lot of sense where you have a research project which:
- has some large amount of high-level programming resources
  (say, a top grad student for at least 6 months).
- is going to be seriously limited on normal hardware
  (ie, runs will take 2 years each).
- has some promise of running well on GPU hardware
  (very SIMD, needs to fit into limited memories, integer or 32b float, etc)

the speedup from a GPU is around an order of magnitude (big hand wave
here). the main drawback is that effort is probably not portable to
other configs, probably not to the future, and is probably in conflict
with development of portable/scalable approaches (say f90/MPI).
really, this issue is quite similar to the tradeoff in pursuing FPGA
acceleration.

in short, I think the opportunity for GPU is great if you have a
pressing need which cannot be practically satisfied using the
conventional approach, and you're able to dedicate an intense burst of
effort at porting to the GPU.

as far as I know, there are not any well-developed libraries which simply
harness whatever GPU you provide, but don't require your whole program to
be GPU-ized. the cost of sharing data with a GPU is significant, but
blas-3 might have a high enough work-to-size ratio to make it feasible.
3d fft's might also be expressible in GPU-friendly terms (the trick would
be to utilize not fight the GPU's inherent memory-access preferences.)
perhaps some MCMC stuff might be SIMD-able? I doubt that sequence analysis
would make much sense, since GPUs are not well-tuned to access host memory,
and sequence programs are not actually that compute-intensive. I'd guess
that anything involving sparse matrices would be difficult to do on a GPU.

my organization will probably build a GPU-oriented cluster soon;
I'm pushing for it, but I'm fearful that we might not have users
who are prepared to invest the intense effort necessary to take
advantage of it.

I have some suspicion also that when Intel and AMD talk about
greater integration between CPU and GPU, they're headed in the
direction of majorly extended SSE, rather than something which
still has parts called shader, vertex or texture.

From hahn at mcmaster.ca Sun Apr 1 11:25:19 2007
From: hahn at mcmaster.ca (Mark Hahn)
Date: Tue Oct 7 01:13:44 2008
Subject: [Beowulf] OT? GPU accelerators for finite difference time domain
In-Reply-To: <8e6393ac0703302312v1840c22fw60efa957e0365a40@mail.gmail.com>
References: <1175147950.5777.13.camel@ceiriog.eclipse.co.uk> <8e6393ac0703302312v1840c22fw60efa957e0365a40@mail.gmail.com>
Message-ID:

> If you want to use GPUs for computations, I suggest that you take a look at
> CUDA
> (http://www.nvidia.com/cuda).
> The SDK is available for free and it is
> using a C like syntax (so you don't need to write shader and be
> familiar with OpenGL or DX9 ).

there's ATI/AMD's CTM effort as well, as well as several independent ones.
www.gpgpu.org is a great resource to start with.

From hahn at mcmaster.ca Sun Apr 1 13:07:07 2007
From: hahn at mcmaster.ca (Mark Hahn)
Date: Tue Oct 7 01:13:44 2008
Subject: [Beowulf] OT? GPU accelerators for finite difference time domain
In-Reply-To: <8e6393ac0704011230o50e4071enffe5986e85e497bd@mail.gmail.com>
References: <1175147950.5777.13.camel@ceiriog.eclipse.co.uk> <8e6393ac0704011230o50e4071enffe5986e85e497bd@mail.gmail.com>
Message-ID:

> CUDA comes with a full BLAS and FFT library (for 1D,2D and 3D transforms).

I read the CUDA doc, but I guess I was focusing on the language itself.

> You can have relevant speed up even for 2D transforms or for a batch of 1Ds.

I assume this is only single-precision, and I would guess that for
numerical stability, you must be limited to fairly short fft's.
what kind of peak flops do you see? what's the overhead of shoving
data onto the GPU, and getting it back? (or am I wrong that the GPU
cannot do an FFT in main (host) memory?)

> You can offload only compute intensive parts of your code to the GPU
> from C and C++ ( writing a wrapper from Fortran should be trivial).

sure, but what's the cost (in time and CPU overhead) to moving data
around like this?

> The current generation of the hardware supports only single precision,
> but there will be a double precision version towards the end of the
> year.

do you mean synthetic doubles? I'm guessing that the hardware isn't
going to gain the much wider multipliers necessary to support doubles
at the same latency as singles...

> PS: I work on CUDA at Nvidia, so I may be a little biased...

I did guess from the nvidia-limited nature of your reply,
but thanks for confirming it.

>> as far as I know, there are not any well-developed libraries which simply

by "well-developed", I did also mean "runs on any GPU, or at least not
just a single vendor's"...

From laytonjb at charter.net Sun Apr 1 13:15:29 2007
From: laytonjb at charter.net (Jeffrey B. Layton)
Date: Tue Oct 7 01:13:44 2008
Subject: [Beowulf] OT? GPU accelerators for finite difference time domain
In-Reply-To:
References: <1175147950.5777.13.camel@ceiriog.eclipse.co.uk> <8e6393ac0704011230o50e4071enffe5986e85e497bd@mail.gmail.com>
Message-ID: <461012E1.8020006@charter.net>

Mark Hahn wrote:
>> The current generation of the hardware supports only single precision,
>> but there will be a double precision version towards the end of the
>> year.
>
> do you mean synthetic doubles? I'm guessing that the hardware isn't
> going to gain the much wider multipliers necessary to support doubles
> at the same latency as singles...

The next gen of hardware will support native double precision (AFAIK).
I've heard it should be out this year, but I'm sure Mass can't comment
on it.

Jeff

From hahn at mcmaster.ca Sun Apr 1 14:17:20 2007
From: hahn at mcmaster.ca (Mark Hahn)
Date: Tue Oct 7 01:13:44 2008
Subject: [Beowulf] OT? GPU accelerators for finite difference time domain
In-Reply-To: <461012E1.8020006@charter.net>
References: <1175147950.5777.13.camel@ceiriog.eclipse.co.uk> <8e6393ac0704011230o50e4071enffe5986e85e497bd@mail.gmail.com> <461012E1.8020006@charter.net>
Message-ID:

> The next gen of hardware will support native double precision (AFAIK).

my point is that there's native and there's native. if the HW supports
doubles, but they take 8x as long, then there's still a huge reason to
make sure the program uses only low-precision. and 8x (WAG, of course)
may actually be enough so that a 4-core, full-rate SSE CPU beats it...
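
A rough sketch of the break-even arithmetic behind the "8x" remark above.
Every number here is an illustrative assumption, not a measured or vendor
figure: the GPU single-precision peak, the CPU clock, and the 4
double-precision flops per cycle per core are all hypothetical inputs
chosen only to show how the comparison works.

```python
def effective_dp_gflops(sp_peak_gflops: float, dp_slowdown: float) -> float:
    """Double-precision rate if doubles run dp_slowdown times slower than singles."""
    return sp_peak_gflops / dp_slowdown


def cpu_dp_gflops(cores: int, ghz: float, dp_flops_per_cycle: float) -> float:
    """Peak double-precision rate of a multi-core CPU running SSE at full rate."""
    return cores * ghz * dp_flops_per_cycle


# Hypothetical inputs (assumptions, not data from this thread).
gpu_sp_peak = 300.0   # assumed GPU single-precision peak, in Gflops
dp_slowdown = 8.0     # the "8x (WAG)" penalty for doubles

gpu_dp = effective_dp_gflops(gpu_sp_peak, dp_slowdown)
cpu_dp = cpu_dp_gflops(cores=4, ghz=3.0, dp_flops_per_cycle=4.0)

print(f"GPU doubles: {gpu_dp:.1f} Gflops, CPU doubles: {cpu_dp:.1f} Gflops")
print("CPU wins" if cpu_dp > gpu_dp else "GPU wins")
```

With these made-up inputs the 4-core CPU comes out ahead on doubles; a
smaller slowdown factor or a higher GPU peak flips the result, which is
why the actual double-precision rate matters more than whether it is
called "native".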

From deadline at clustermonkey.net Sun Apr 1 14:39:48 2007
From: deadline at clustermonkey.net (Douglas Eadline)
Date: Tue Oct 7 01:13:44 2008
Subject: [Beowulf] Ethernet break through?
In-Reply-To: <8CC53E49-E29F-4C45-B7A9-C82E46A663D2@myri.com>
References: <9FA59C95FFCBB34EA5E42C1A8573784F76C0A3@mtiexch01.mti.com> <8CC53E49-E29F-4C45-B7A9-C82E46A663D2@myri.com>
Message-ID: <51788.192.168.1.1.1175463588.squirrel@mail.eadline.org>

I just posted some interesting news on Cluster Monkey.

http://www.clustermonkey.net//content/view/192/1/

--
Doug

From landman at scalableinformatics.com Sun Apr 1 18:50:57 2007
From: landman at scalableinformatics.com (Joe Landman)
Date: Tue Oct 7 01:13:44 2008
Subject: [Beowulf] OT? GPU accelerators for finite difference time domain
In-Reply-To:
References: <1175147950.5777.13.camel@ceiriog.eclipse.co.uk>
Message-ID: <46106181.40401@scalableinformatics.com>

Mark Hahn wrote:
>> Has anyone any experience of this? How do these products stack
>> up against the traditional Beowulf solution?
>
> they _are_ in the spirit of Beowulf, which is all about hacking
> commodity hardware to suit HPC purposes.

... but it would be hard to fit into a 1U case, and the 200+ watt power
requirements could be daunting to smaller supplies.

Not that I am against GPUs as accelerators, on the contrary. Just be
aware that GPUs with significant calculation capability also will
require a rather significant power supply and cooling airflow.

Right now, accelerated computing is in its infancy. You have host based
(SSE*), and host attached GPUs, FPGAs, APUs (Accelerator Processor
Units) in general such as ClearSpeed et al. You can think of clusters
as accelerators in the sense that they provide a larger number of cycles
per unit time to your application. There is no single API to bind them
all though.

A number of the APU people are realizing that there is significant
benefit to providing acceleration behind existing popular interfaces, as
it lowers the barrier to adoption and usage. If your code is designed
with FFTW in mind and you have to re-organize your arrays to suit
another FFT implementation, this can be both annoying for the
programmer, and inefficient.

Regardless of the accelerator you choose, expect *some* rewriting of
code at minimum.

Current GPUs are focused upon singles and ints. As Jeff noted, doubles
should be coming. As Mark noted, slow doubles aren't useful.

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615

From rgb at phy.duke.edu Sun Apr 1 21:43:13 2007
From: rgb at phy.duke.edu (Robert G. Brown)
Date: Tue Oct 7 01:13:44 2008
Subject: [Beowulf] Which is better; 4 Mountains or 5 Hills
In-Reply-To: <428810f20703310104p42d8fc81nfd77b2a1de0601@mail.gmail.com>
References: <428810f20703310104p42d8fc81nfd77b2a1de0601@mail.gmail.com>
Message-ID:

On Sat, 31 Mar 2007, amjad ali wrote:

> Hi All,
>
> Which one of the following two choices are better (assuming both
> clusters have nearly same cost):
>
> 1) A 4-node cluster having 8 AMD Opteron of 2.6GHz each and 2GB RAM/node
>
> 2) A 5-node cluster having 10 AMD Opteron of 2.2GHz each and 2GB RAM/node

What is the meaning of this word "better"? Better for what?

A) 4x2.6 = 10.4 aggregate GHz (whatever that means)
   Can do a distributed calculation that fits into 8 GB RAM
   Fewer faster nodes means less Amdahlian penalty for parallel code.
   Probably uses less power, has slightly lower aggregate maintenance costs

B) 5x2.2 = 11 aggregate GHz
   Can do a distributed calculation that fits into 10 GB RAM
   More slower nodes means a bit more flexibility and less penalty if a
   node is down
   Probably uses more power, has higher costs.

Which is "better"? B) has more aggregate GHz (and presumably FLOPS by at
least some measures) so it is better. It can also hold a larger
partitioned problem, which for some people makes it MUCH better, and for
others just wastes money. A) might well run faster for some kinds of
parallel code, and it probably costs less in the long run, once
(expensive) human time and infrastructure costs are accounted for.

> Consider GiGE (max 1000mbps) as the interconnect.
> Is there any significant performance difference between the two?

Sure, maybe, for some code. For example, try running a 9 GB partitioned
computation on A and oooo, slow (as it swaps to run at all). Try
running a computation with lots of little messages: there are just
plain fewer of them between four nodes than between five, and A might
shine.

> Does the comparison depend on the MPI communication among nodes, for
> a given application?

It depends on EVERYTHING. The task mix, how the task(s) are executed,
whether, how, and how much they intercommunicate during execution, how
big they are (or might be) and whether four boxes plugged into your only
available circuit doesn't blow the breaker where five does.

   rgb

>
> Thanking all in anticipation.
> Regards.
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu

From rgb at phy.duke.edu Sun Apr 1 23:24:22 2007
From: rgb at phy.duke.edu (Robert G.
Brown)
Date: Tue Oct 7 01:13:44 2008
Subject: [Beowulf] Server room design consulting
In-Reply-To: <1175104216.28106.236.camel@rahl.acomp.usf.edu>
References: <1175104216.28106.236.camel@rahl.acomp.usf.edu>
Message-ID:

On Wed, 28 Mar 2007, Daniel Majchrzak wrote:

> We have a dedicated cluster room, email server room, and networking room
> that have slowly evolved over the years. Due to budget constraints in
> the past no one has ever done an analysis of our electricity and AC.
> (We've had the facilities people in, but their analysis wasn't any
> better than our own guesstimates. ) When increases in either power or
> cooling were necessary it was either done piecemeal or not done at all.
> (We have had some orders to come up with some "zero-dollar" solutions).
> Now all three rooms are about to go through some equipment expansions.
> While we can make some rough estimates, and we could go to the
> university's facilities people, we thought we would try to get some
> funds together to hire some professional services so that it gets done
> right. Has any one on the list had any experiences they would like to
> share on hiring these kind of consultants? Referrals? Can anyone give
> a (very) rough estimate of what we should expect to pay?

My own experience with power and AC people from normal contracting firms
is that they are immensely clueless about computer server room
infrastructure. I haven't ever hired a consultant, but I do consulting
on this from time to time. I'd guess that you'll pay perhaps $150 to
$200/hour for consulting, but it might well depend on what kind of
consultant you got and where you got them. Again, based on my
experience with area computer management contracting houses, they are
not really competent with infrastructure issues. A really good
electrical contractor MIGHT have somebody. Liebert or APC might be able
to refer you. An architect with experience in server room design might
be your best bet.

   rgb

>
> Thanks,
>
> Dan
>

--
Robert G. Brown                        http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb@phy.duke.edu

From ballen at gravity.phys.uwm.edu Mon Apr 2 01:47:25 2007
From: ballen at gravity.phys.uwm.edu (Bruce Allen)
Date: Tue Oct 7 01:13:44 2008
Subject: [Beowulf] Ethernet break through?
In-Reply-To: <51788.192.168.1.1.1175463588.squirrel@mail.eadline.org>
References: <9FA59C95FFCBB34EA5E42C1A8573784F76C0A3@mtiexch01.mti.com> <8CC53E49-E29F-4C45-B7A9-C82E46A663D2@myri.com> <51788.192.168.1.1.1175463588.squirrel@mail.eadline.org>
Message-ID:

Hmmm is the April 1 date purely a coincidence??

On Sun, 1 Apr 2007, Douglas Eadline wrote:

>
> I just posted some interesting news on Cluster Monkey.
>
> http://www.clustermonkey.net//content/view/192/1/
>
>
> --
> Doug
>

From rbw at ahpcrc.org Mon Apr 2 06:35:14 2007
From: rbw at ahpcrc.org (Richard Walsh)
Date: Tue Oct 7 01:13:44 2008
Subject: [Beowulf] OT? GPU accelerators for finite difference time domain
In-Reply-To:
References: <1175147950.5777.13.camel@ceiriog.eclipse.co.uk> <8e6393ac0703302312v1840c22fw60efa957e0365a40@mail.gmail.com>
Message-ID: <46110692.2000007@ahpcrc.org>

Mark Hahn wrote:
>> If you want to use GPUs for computations, I suggest that you take a
>> look at CUDA
>> (http://www.nvidia.com/cuda). The SDK is available for free and it is
>> using a C like syntax (so you don't need to write shader and be
>> familiar with OpenGL or DX9 ).
> there's ATI/AMD's CTM effort as well, as well as several independent
> ones.
> www.gpgpu.org is a great resource to start with.

This connects back to an earlier posting of mine which drew a "dead cat
bounce" for a response ... ;-) ... , but you should also definitely look
at both the offerings of PeakStream and RapidMind. PeakStream (like
CUDA) provides libraries and a development environment (their current
focus is GPUs), but abstracts the idea of co-processing one step further
to a virtual machine (Mitrion-C does the same for FPGAs) connected to a
master serial processor. Any additional co-processing resource (any of
several flavors of GPUs, a CELL SPE, or non-rank zero, homogeneous
multi-cores, etc.) can provide the horse power for the data parallel
accelerations. They claim that once a particular backend for the VM is
available you will be able to run your code without a recompile on it or
any other supported backend. I think the RapidMind product is similar.

PeakStream has a nice white paper on their web site and the main
RapidMind paper is:

   Data-parallel Programming on the Cell BE and the GPU using
   RapidMind Development Platform

by Mike McCool.

The gist of my earlier posting was "Does such a data-parallel VM
abstraction have a future in an HPC world of heterogeneous on and
off-chip co-processors?" Its presence as part of the Mitrion-C and
PeakStream programming models suggests someone with money believes as
much.

rbw

--

Richard B. Walsh

Project Manager
Network Computing Services, Inc.
Army High Performance Computing Research Center (AHPCRC)
rbw@ahpcrc.org | 612.337.3467

> > "Making predictions is hard, especially about the future."
> > Niels Bohr

-----------------------------------------------------------------------
This message (including any attachments) may contain proprietary or
privileged information, the use and disclosure of which is legally
restricted. If you have received this message in error please notify
the sender by reply message, do not otherwise distribute it, and delete
this message, with all of its contents, from your files.
-----------------------------------------------------------------------

From rbw at ahpcrc.org Mon Apr 2 06:45:55 2007
From: rbw at ahpcrc.org (Richard Walsh)
Date: Tue Oct 7 01:13:44 2008
Subject: [Beowulf] OT? GPU accelerators for finite difference time domain
In-Reply-To:
References: <1175147950.5777.13.camel@ceiriog.eclipse.co.uk> <8e6393ac0704011230o50e4071enffe5986e85e497bd@mail.gmail.com> <461012E1.8020006@charter.net>
Message-ID: <46110913.2040007@ahpcrc.org>

Mark Hahn wrote:
>> The next gen of hardware will support native double precision (AFAIK).
>
> my point is that there's native and there's native. if the HW supports
> doubles, but they take 8x as long, then there's still a huge reason to
> make sure the program uses only low-precision. and 8x (WAG, of course)
> may actually be enough so that a 4-core, full-rate SSE CPU beats it

I would be surprised if they "faked" double precision in this way. GPUs
are the widest thing you can get in a processor. My WAG is that they
will provide true/fast 64-bit (minus the same IEEE 754 twiddles) by
coalescing 32-bit ... reducing the floating point width of a given core
by half, but still delivering lots of FLOPs. Especially with the G80,
it makes sense to think of these GPUs as multi-core SIMD processors.

rbw

--

Richard B. Walsh

Project Manager
Network Computing Services, Inc.
Army High Performance Computing Research Center (AHPCRC)
rbw@ahpcrc.org | 612.337.3467

> > "Making predictions is hard, especially about the future."
> > Niels Bohr

From amjad11 at gmail.com Sun Apr 1 02:02:53 2007
From: amjad11 at gmail.com (amjad ali)
Date: Tue Oct 7 01:13:44 2008
Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!!
Message-ID: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com>

Hi All,

Would any of you please like to share usage-experience/views/comments
about Windows Compute Cluster Server 2003 based Beowulf Clusters?

What in your opinion is the future of such clusters?
How do you compare these with the LINUX CLUSTERS?

regards.

From jlforrest at berkeley.edu Sun Apr 1 09:58:49 2007
From: jlforrest at berkeley.edu (Jon Forrest)
Date: Tue Oct 7 01:13:44 2008
Subject: [Beowulf] How to Diagnose Cause of Cluster Ethernet Errors?
In-Reply-To: <40218.192.168.1.1.1175362730.squirrel@mail.eadline.org>
References: <460AFFA1.6070103@berkeley.edu> <40218.192.168.1.1.1175362730.squirrel@mail.eadline.org>
Message-ID: <460FE4C9.6040204@berkeley.edu>

Douglas Eadline wrote:
>
> I am constantly amazed at how many people buy the
> latest and greatest node hardware and then connect
> them with a sub-optimal switch (or cheap cables), thus reducing
> the effective performance of the nodes (for parallel
> applications). Kind of "penny wise and pound foolish" as they say.
>

I sincerely appreciate all the comments about my problem.
I will reply to them in due time.

However, I'd like to comment on this, which admittedly
is off-topic from my original posting.

I don't disagree with what you're saying. The problem
is how to recognize "sub-optimal" equipment. For example,
I see three tiers in ethernet switching hardware:

1) The low-end, e.g. Netgear, Linksys, D-link, ...

2) The mid-end, e.g. HP Procurve, Dell, SMC, ...

3) The high-end, e.g. Cisco, Foundry, ...

What I, as a system manager, not as an Electrical Engineer,
have trouble understanding, is what the true differences
are between these levels, and, at one level, between
the various vendors.

These days I suspect that many of the vendors are using
ASICs made by other chip companies, and that many vendors
use the same ASICs. Assuming that's true, where's the
added value that justifies the cost differences?

Sometimes the value is in the "management" abilities
of a device. I don't deny this can be a major
selling point in a large enterprise environment,
but in a 30-node cluster, or a small LAN, it's
hard to justify paying for this.

In terms of ethernet performance, once a device
can handle wirespeed communication on all ports,
where's the added value that justifies the added
cost?

I'm looking for empirical answers, which aren't
always easy to find, and sometimes not easy to understand.

In the case of my cluster, it was configured and
purchased before I got here, so I had nothing
to do with choosing its components but I have to
admit that I'm not sure what I would have done
differently.

Cordially,

Jon Forrest
Unix Computing Support
College of Chemistry
173 Tan Hall
University of California Berkeley
Berkeley, CA
94720-1460
510-643-1032
jlforrest@berkeley.edu

From mfatica at gmail.com Sun Apr 1 12:30:32 2007
From: mfatica at gmail.com (Massimiliano Fatica)
Date: Tue Oct 7 01:13:44 2008
Subject: [Beowulf] OT?
GPU accelerators for finite difference time domain
In-Reply-To:
References: <1175147950.5777.13.camel@ceiriog.eclipse.co.uk>
Message-ID: <8e6393ac0704011230o50e4071enffe5986e85e497bd@mail.gmail.com>

Mark,

CUDA comes with a full BLAS and FFT library (for 1D,2D and 3D transforms).
You can have relevant speed up even for 2D transforms or for a batch of 1Ds.

You can offload only compute intensive parts of your code to the GPU
from C and C++ ( writing a wrapper from Fortran should be trivial).

The current generation of the hardware supports only single precision,
but there will be a double precision version towards the end of the
year.

Massimiliano
PS: I work on CUDA at Nvidia, so I may be a little biased...

On 4/1/07, Mark Hahn wrote:
> as far as I know, there are not any well-developed libraries which simply
> harness whatever GPU you provide, but don't require your whole program to
> be GPU-ized. the cost of sharing data with a GPU is significant, but
> blas-3 might have a high enough work-to-size ratio to make it feasible.
> 3d fft's might also be expressible in GPU-friendly terms (the trick would
> be to utilize not fight the GPU's inherent memory-access preferences.)
> perhaps some MCMC stuff might be SIMD-able? I doubt that sequence analysis
> would make much sense, since GPUs are not well-tuned to access host memory,
> and sequence programs are not actually that compute-intensive. I'd guess
> that anything involving sparse matrices would be difficult to do on a GPU.

From mfatica at gmail.com Sun Apr 1 17:53:52 2007
From: mfatica at gmail.com (Massimiliano Fatica)
Date: Tue Oct 7 01:13:45 2008
Subject: [Beowulf] OT?
GPU accelerators for finite difference time domain
In-Reply-To:
References: <1175147950.5777.13.camel@ceiriog.eclipse.co.uk> <8e6393ac0704011230o50e4071enffe5986e85e497bd@mail.gmail.com>
Message-ID: <8e6393ac0704011753l28e8727drb944bc1734cb7dc2@mail.gmail.com>

On 4/1/07, Mark Hahn wrote:
>
> I assume this is only single-precision, and I would guess that for
> numerical stability, you must be limited to fairly short fft's.
> what kind of peak flops do you see? what's the overhead of shoving
> data onto the GPU, and getting it back? (or am I wrong that the GPU
> cannot do an FFT in main (host) memory?)

I will run some benchmarks in the next few days ( I usually do more
than just an FFT).
I remember some numbers for SGEMM (real SGEMM C=alpha*A*B+beta*C),
120 Gflops on board, 80 Gflops measured from the host (with all the I/O
overhead), for N=2048.

>
> > You can offload only compute intensive parts of your code to the GPU
> > from C and C++ ( writing a wrapper from Fortran should be trivial).
>
> sure, but what's the cost (in time and CPU overhead) to moving data
> around like this?

It depends on your chipset and on other details ( cold access, data in
cache, pinned memory): it goes from around 1GB/s to 3GB/s.

>
> > The current generation of the hardware supports only single precision,
> > but there will be a double precision version towards the end of the
> > year.
>
> do you mean synthetic doubles? I'm guessing that the hardware isn't
> going to gain the much wider multipliers necessary to support doubles
> at the same latency as singles...
>

Can't comment on this one..... :-)

Massimiliano

From weikuan.yu at gmail.com Mon Apr 2 06:44:38 2007
From: weikuan.yu at gmail.com (Weikuan Yu)
Date: Tue Oct 7 01:13:45 2008
Subject: [Beowulf] HotI 2007 Call for Papers -- Deadline (April 9) is approaching
Message-ID: <461108C6.5050701@gmail.com>

--------------------------------------------------------------------
Apologies if you received multiple copies of this posting.
Please feel free to distribute it to those who might be interested.
--------------------------------------------------------------------

                      Hot Interconnects 15
         IEEE Symposium on High-Performance Interconnects
                       August 22-24, 2007
                       Stanford University
                   Palo Alto, California, USA

Hot Interconnects is the premier international forum for researchers
and developers of state-of-the-art hardware and software architectures
and implementations for interconnection networks of all scales, ranging
from on-chip processor-memory interconnects to wide-area networks. This
yearly conference is very well attended by leaders in industry and
academia. The atmosphere provides for a wealth of opportunities to
interact with individuals at the forefront of this field.

Themes include cross-cutting issues spanning computer systems,
networking technologies, and communication protocols. This conference
is directed particularly at new and exciting technology and product
innovations in these areas. Contributions should focus on real
experimental systems, prototypes, or leading-edge products and their
performance evaluation. In addition to those subscribing to the main
theme of the conference, contributions are also solicited in the topics
listed below.

  * Novel and innovative interconnect architectures
  * Multi-core processor interconnects
  * System-on-Chip Interconnects
  * Advanced chip-to-chip communication technologies
  * Optical interconnects
  * Protocol and interfaces for interprocessor communication
  * Survivability and fault-tolerance of interconnects
  * High-speed packet processing engines and network processors
  * System and storage area network architectures and protocols
  * High-performance host-network interface architectures
  * High-bandwidth and low-latency I/O
  * Tb/s switching and routing technologies
  * Innovative architectures for supporting collective communication
  * Novel communication architectures to support grid computing

Submission Guideline
  o Extended deadline: April 9th, 2007
  o Notification of acceptance: May 15, 2007
  o Papers need sufficient technical detail to judge quality and
    suitability for presentation.
  o Submit title, author, abstract, and full paper (six pages,
    double-column, IEEE format).
  o Papers should be submitted electronically at the specified link
    location found on http://www.hoti.org
  o For further information please see
    http://www.hoti.org/hoti15/cfp.html

About the Conference
  - Conference held at the William Hewlett Teaching Center at Stanford
    University.
  - Papers selected will be published in proceedings by the IEEE
    Computer Society.
  - Presentations are 30-minute talks in a single-track format.
  - Online information at http://www.hoti.org

GENERAL CO-CHAIRS
  * John W. Lockwood, Washington University in St. Louis
  * Fabrizio Petrini, Pacific Northwest National Laboratory

TECHNICAL CO-CHAIRS
  * Ron Brightwell, Sandia National Laboratories
  * Dhabaleswar (DK) Panda, The Ohio State University

LOCAL ARRANGEMENTS CHAIR
  * Songkrant Muneenaem, Washington University in St.
Louis

PANEL CHAIR
  * Daniel Pitt, Santa Clara University

PUBLICITY CO-CHAIRS
  * Weikuan Yu, Oak Ridge National Laboratory

PUBLICATION CHAIR
  * Luca Valcarenghi, Scuola Superiore Sant'Anna

FINANCE CHAIR
  * Herzel Ashkenazi, Xilinx

TUTORIAL CO-CHAIRS
  - TBA

REGISTRATION CHAIR
  * Songkrant Muneenaem, Washington University in St. Louis

Webmaster
  * Liz Rogers, LRD Group

Steering Committee
  o Allen Baum, Intel
  o Lily Jow, Hewlett Packard
  o Mark Laubach, Broadband Physics
  o John Lockwood, Stanford University
  o Daniel Pitt, Santa Clara University

Technical Program Committee
  * Dennis Abts, Cray, Inc.
  * Adnan Aziz, University of Texas, Austin
  * Alan Benner, IBM
  * Keren Bergman, Columbia University
  * Andrea Bianco, Politecnico di Torino
  * Piero Castoldi, Scuola Superiore Sant'Anna
  * Sarang Dharmapurikar, Nuova Systems
  * Hans Eberle, Sun Microsystems Laboratories
  * Wu-chun Feng, Virginia Tech
  * Juan Fernandez, University of Murcia
  * Ada Gavrilovska, Georgia Institute of Technology
  * Paolo Giaccone, Politecnico di Torino
  * Mitchell Gusat, IBM Zurich Research Laboratory
  * Ron Ho, Sun Microsystems Laboratories
  * Doan Hoang, University of Technology, Sydney
  * D. N. (Jay) Jayasimha, Intel
  * Isaac Keslassy, Technion
  * Venkata Krishnan, Dolphin Interconnect Solutions
  * Tal Lavian, Nortel Networks Labs, UC Berkeley
  * Bill Lin, University of California, San Diego
  * Olav Lysne, Simula Research Laboratory
  * Pankaj Mehra, HP Labs
  * Rami Melhem, University of Pittsburgh
  * Cyriel Minkenberg, IBM Zurich Research Laboratory
  * Gregory Pfister, IBM
  * Craig Stunkel, IBM T.J. Watson Research Center
  * Anujan Varma, University of California at Santa Cruz
  * Zuoguo (Joe) Wu, Intel

From gerry.creager at tamu.edu Mon Apr 2 07:04:42 2007
From: gerry.creager at tamu.edu (Gerry Creager)
Date: Tue Oct 7 01:13:45 2008
Subject: [Beowulf] OT?
GPU accelerators for finite difference time domain In-Reply-To: <46110913.2040007@ahpcrc.org> References: <1175147950.5777.13.camel@ceiriog.eclipse.co.uk> <8e6393ac0704011230o50e4071enffe5986e85e497bd@mail.gmail.com> <461012E1.8020006@charter.net> <46110913.2040007@ahpcrc.org> Message-ID: <46110D7A.9090506@tamu.edu> Richard Walsh wrote: > Mark Hahn wrote: >>> The next gen of hardware will support native double precision (AFAIK). >> my point is that there's native and there's native. if the HW supports >> doubles, but they take 8x as long, then there's still a huge reason to >> make sure the program uses only low-precision. and 8x (WAG, of course) >> may actually be enough so that a 4-core, full-rate SSE CPU beats it > I would be surprised if they "faked" double precision in this way. GPUs > are the widest thing > you can get in a processor. My WAG is that they will provide true/fast > 64-bit (minus the same > IEEE 754 twiddles) by coalescing 32-bit ... reducing the floating point > width of a given > core by half, but still delivering lots of FLOPs. Especially with the > G80, it makes sense to think of these > GPUs as multi-core SIMD processors. Based on discussions w/ Mike McCool of PeakStream at SC06, I think Mark is correct. At this time, I believe they're still faking DP. Look for hardware enhancements 3-4Q this calendar year. Gerry -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From maraz at ics.forth.gr Mon Apr 2 07:04:25 2007 From: maraz at ics.forth.gr (Manolis Marazakis) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Ethernet break through? Message-ID: <46110D69.608@ics.forth.gr> Hmmm ... Perhaps Geyser? technology would tie nicely with the recent discovery of CPU-clock relativistic effects ? http://www.reghardware.co.uk/2007/04/01/cpu_time_dilation/ /* This link appeared in slashdot ...
*/ Best regards, Manolis Marazakis. -- Computer Architecture and VLSI Laboratory, Institute of Computer Science, Foundation for Research and Technology - Hellas (FORTH), Vassilika Vouton, Heraklion, Greece GR-71110. Tel: +30.2810391438, +30.2810391699 Fax: +30.2810391601 E-mail: maraz@ics.forth.gr From rbw at ahpcrc.org Mon Apr 2 07:10:15 2007 From: rbw at ahpcrc.org (Richard Walsh) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] OT? GPU accelerators for finite difference time domain In-Reply-To: <46110D7A.9090506@tamu.edu> References: <1175147950.5777.13.camel@ceiriog.eclipse.co.uk> <8e6393ac0704011230o50e4071enffe5986e85e497bd@mail.gmail.com> <461012E1.8020006@charter.net> <46110913.2040007@ahpcrc.org> <46110D7A.9090506@tamu.edu> Message-ID: <46110EC7.9060708@ahpcrc.org> Gerry Creager wrote: > > Richard Walsh wrote: >> Mark Hahn wrote: >>>> The next gen of hardware will support native double precision (AFAIK). >>> my point is that there's native and there's native. if the HW supports >>> doubles, but they take 8x as long, then there's still a huge reason to >>> make sure the program uses only low-precision. and 8x (WAG, of course) >>> may actually be enough so that a 4-core, full-rate SSE CPU beats it >> I would be surprised if they "faked" double precision in this way. GPUs >> are the widest thing >> you can get in a processor. My WAG is that they will provide true/fast >> 64-bit (minus the same >> IEEE 754 twiddles) by coalescing 32-bit ... reducing the floating point >> width of a given >> core by half, but still delivering lots of FLOPs. Especially with the >> G80, it makes sense to think of these >> GPUs as multi-core SIMD processors. > > In discussions w/ Mike McCool of PeakStream at SC06, I think Mark is > correct. At this time, I believe they're still faking DP. Look for > hardware enhancements 3-4Q this calendar year. Sorry ...
Did not mean to suggest that true 64-bit was available in hardware from anyone >>today<<, only that when it comes (later this year) it will not be "faked". The graphics folks need it for better dynamic range in contrast and brightness I think. rbw -- -- Richard B. Walsh Project Manager Network Computing Services, Inc. Army High Performance Computing Research Center (AHPCRC) rbw@ahpcrc.org | 612.337.3467 > > "Making predictions is hard, especially about the future." > > Niels Bohr ----------------------------------------------------------------------- This message (including any attachments) may contain proprietary or privileged information, the use and disclosure of which is legally restricted. If you have received this message in error please notify the sender by reply message, do not otherwise distribute it, and delete this message, with all of its contents, from your files. ----------------------------------------------------------------------- From deadline at clustermonkey.net Mon Apr 2 07:34:44 2007 From: deadline at clustermonkey.net (Douglas Eadline) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] How to Diagnose Cause of Cluster Ethernet Errors? In-Reply-To: <460FE4C9.6040204@berkeley.edu> References: <460AFFA1.6070103@berkeley.edu> <40218.192.168.1.1.1175362730.squirrel@mail.eadline.org> <460FE4C9.6040204@berkeley.edu> Message-ID: <40810.192.168.1.1.1175524484.squirrel@mail.eadline.org> Jon, I hear your frustration. You are quite right that many of the ASICs are the same. Implementation is important. In terms of clusters, there are no hard and fast rules for switches, i.e. I have found some cheap GigE switches (like the SMC 8508T) to be real performers (8 ports/Jumbo Frames for under $100). I just got an SMC CGS16 to use in my test rack. So I am a little partial to SMC at this point. However, I have not tested the CGS16 fully so it may not live up to my expectations.
In the past I have found Foundry and Extreme to work quite well, and given the price you pay they should. I think the trick is to find the bargains that still perform well. As I see it, there are three ways to buy a good switch: 1. Hire a consultant to help with the cluster (their experience can save money and headaches on other issues as well) 2. Use Google and this list to see what you can find about a particular switch, but be warned most people do not push switches the way HPC users do, so what is good for the back office may not be good for the cluster (and pretty much ignore vendor data sheets). 3. Get some evaluation switches (or at least test them within the 30-day return period) for specific applications you plan to run. This is probably the best way to proceed. Unfortunately there does not seem to be an easy way to really test a switch. The easiest thing to do is to run netpipe on two ports to establish a baseline. Choose the switches that provide the best netpipe results. Then run netpipe on several pairs of ports at the same time and see if there is degradation. This however is not the whole story: performance may depend on port choice (i.e. ports may span multiple ASICs) and so may vary. Also, to fully test a switch I would assume that you would want to test every port combination while the other ports were at some constant network load. So you can probably see why it is hard to test switches. In any case, these threads on the list should help as well (quite informative): http://www.beowulf.org/archive/2006-March/015282.html http://www.beowulf.org/archive/2006-April/015295.html http://www.beowulf.org/archive/2006-April/015340.html Finally, I am open to anyone who can come up with a reasonably good switch test, maybe a combination of applications and synthetic tests, so that we can at least eliminate the poor performers. I would like to post this kind of data on ClusterMonkey.
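To put a number on the "every port combination" point above, here is a minimal Python sketch. It is only an illustration: the function names and the five-minutes-per-pair figure are assumptions made up for this example, not part of netpipe or of any vendor tool.

```python
from itertools import combinations


def port_pairs(n_ports):
    """Return every unordered pair of switch ports, numbered from 1."""
    return list(combinations(range(1, n_ports + 1), 2))


def total_test_minutes(n_ports, minutes_per_pair):
    """Rough wall-clock estimate if the pairs are tested one after another.

    minutes_per_pair is a hypothetical figure; a real netpipe run may take
    more or less time depending on the message sizes swept.
    """
    return len(port_pairs(n_ports)) * minutes_per_pair


if __name__ == "__main__":
    for n in (8, 16, 48):
        pairs = port_pairs(n)
        print(n, "ports:", len(pairs), "pairs,",
              total_test_minutes(n, 5), "minutes at 5 min/pair")
```

The pair count grows as n(n-1)/2, so an 8-port switch has 28 pairs but a 48-port switch has 1128, which is one reason exhaustive testing gets impractical quickly.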
-- Doug > Douglas Eadline wrote: > >> >> I am constantly amazed at how many people buy the >> latest and greatest node hardware and then connect >> them with a sub-optimal switch (or cheap cables), thus reducing >> the effective performance of the nodes (for parallel >> applications). Kind of "penny wise and pound foolish" as they say. >> > > I sincerely appreciate all the comments about my problem. I will reply > to them in due time. However, I'd like to comment on this, which > admittedly is off-topic from my original posting. > > I don't disagree with what you're saying. The problem is how > to recognize "sub-optimal" equipment. For example, I see > three tiers in ethernet switching hardware: > > 1) The low-end, e.g. Netgear, Linksys, D-link, ... > > 2) The mid-end, e.g. HP Procurve, Dell, SMC, ... > > 3) The high-end, e.g. Cisco, Foundry, ... > > What I, as a system manager, not as an Electrical Engineer, > have trouble understanding, is what the true differences > are between these levels, and, at one level, between > the various vendors. > > These days I suspect that many of the vendors are using > ASICs made by other chip companies, and that many vendors > use the same ASICs. Assuming that's true, where's the > added value that justifies the cost differences? Sometimes > the value is in the "management" abilities of a device. > I don't deny this can be a major selling point in a > large enterprise environment, but in a 30-node cluster, > or a small LAN, it's hard to justify paying for this. > > In terms of ethernet performance, once a device > can handle wirespeed communication on all ports, > where's the added value that justifies the added > cost? I'm looking for empirical answers, which > aren't always easy to find, and sometimes to understand. > > In the case of my cluster, it was configured and purchased > before I got here, so I had nothing to do with choosing > its components but I have to admit that I'm not > sure what I would have done differently.
> > Cordially, > > Jon Forrest > Unix Computing Support > College of Chemistry > 173 Tan Hall > University of California Berkeley > Berkeley, CA > 94720-1460 > 510-643-1032 > jlforrest@berkeley.edu > -- Doug From gerry.creager at tamu.edu Mon Apr 2 08:15:31 2007 From: gerry.creager at tamu.edu (Gerry Creager) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Server room design consulting In-Reply-To: References: <1175104216.28106.236.camel@rahl.acomp.usf.edu> Message-ID: <46111E13.1020207@tamu.edu> At least in my area, Liebert does have some competent folks who could help. Tell their salesman to get one of them involved, and in general, don't take the Liebert salesman's word that he's necessarily competent. I've run across exceptions, but the last time we got burned, it was pretty bad (and, no, I wasn't involved in trusting the salesman, but that's a whole 'nother story). gerry Robert G. Brown wrote: > On Wed, 28 Mar 2007, Daniel Majchrzak wrote: > >> We have a dedicated cluster room, email server room, and networking room >> that have slowly evolved over the years. Due to budget constraints in >> the past no one has ever done an analysis of our electricity and AC. >> (We've had the facilities people in, but their analysis wasn't any >> better than our own guesstimates.) When increases in either power or >> cooling were necessary it was either done piecemeal or not done at all. >> (We have had some orders to come up with some "zero-dollar" solutions). >> Now all three rooms are about to go through some equipment expansions. >> While we can make some rough estimates, and we could go to the >> university's facilities people, we thought we would try to get some >> funds together to hire some professional services so that it gets done >> right. Has anyone on the list had any experiences they would like to >> share on hiring these kinds of consultants? Referrals?
Can anyone give >> a (very) rough estimate of what we should expect to pay? > > My own experience with power and AC people from normal contracting firms > is that they are immensely clueless about computer server room > infrastructure. I haven't ever hired a consultant, but I do consulting > on this from time to time. I'd guess that you'll pay perhaps $150 to > $200/hour for consulting, but it might well depend on what kind of > consultant you got and where you got them. Again, based on my > experience with area computer management contracting houses, they are > not really competent with infrastructure issues. A really good > electrical contractor MIGHT have somebody. Liebert or APC might be able > to refer you. An architect with experience in server room design might > be your best bet. > > rgb > >> >> Thanks, >> >> Dan >> >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.458.4020 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From rgb at phy.duke.edu Mon Apr 2 08:39:36 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] How to Diagnose Cause of Cluster Ethernet Errors? In-Reply-To: <40810.192.168.1.1.1175524484.squirrel@mail.eadline.org> References: <460AFFA1.6070103@berkeley.edu> <40218.192.168.1.1.1175362730.squirrel@mail.eadline.org> <460FE4C9.6040204@berkeley.edu> <40810.192.168.1.1.1175524484.squirrel@mail.eadline.org> Message-ID: To jump in late, let me reiterate that before jumping to any conclusions concerning proximate cause of your networking difficulty, it is best to systematically figure out what it might be. a) List points of possible failure. 
They are in very general terms: ethernet switch(es), cables, NICs, kernel/drivers, software. b) On the switches one can have two or three general classes of failure. The simplest and most pernicious is probably "bad ports". A port can be "bad" because its internal electronics is screwed, maybe by spot heating (resulting in intermittent problems or port death) or by the little bitty wires in the RJ45 socket getting bent or deforming over time so that solid contact is no longer made with an inserted cable. This happens -- especially if the wires aren't properly supported going into the port socket so that they exert a torque that pressures the contact wires in a warm system over a long time. Dust or corrosion can also contribute to spotty electrical connectivity inside the port connection itself. "Bad ports" can usually be identified by the fact that any system plugged into the port has a high chance of having problems where that same system, plugged into a different port on a different switch, works flawlessly. The solution to bad ports is throw out-of-service-contract switches away, buy new ones with a four year contract on them (tier 1 or tier 2 if possible) and move on with life. Switches are cheap -- human time is expensive. c) cables can always be bad, especially if they are dangling, homemade, or have been moved around a lot. Again, the little contact pins deform under certain heat/pressure circumstances. Also the PLASTIC THEY ARE MADE OF can deform under warm pressure so that they no longer seat properly in a port socket, or the little "snap" on the back wears down so that they can wiggle to where connectivity is mediocre. Note that in wiring situations with patch panels, bad cable/port connections extend recursively between system and port. "Bad cables" can be detected one of several ways. The best way is to invest in a halfway decent cable tester.
The cheap one I own can be snapped into any socket with a short and RELIABLE patch cable or accept any cable being tested. It then transmits voltage on each wire pair one at a time that is reflected at the far end and lights up little LEDs that tell you if any pairs are faulting. This is good for cable breaks and bad contacts, but can "pass" marginal cables that are making contact but arcing a bit. To do better, you have to use a higher quality tester that actually puts a data signal on the line, or use e.g. a laptop and a secondary reliable system to test the secondary cabling route compared to a "known good" point-to-point (direct) cable hookup. This approach can with care help detect bad ports on switches as well, although it can be difficult to resolve problems with switches from problems with ports. d) Your problem is very unlikely to be bad NICs, but if it is it will show up when you use a known-good crossover cable to directly connect your laptop to the suspect NIC and observe significant problems with connectivity (bad ping rates, especially on ping floods). Ping is actually a fairly powerful tool, as is traceroute. netpipe is pretty good for testing interfaces as well. e) Integrated kernel/driver/card problems are far from unknown, especially for certain cards. At one point in time, for example, RTL 8139 cards were cheap and nearly ubiquitous -- and sucked unimaginably. They effectively didn't buffer incoming traffic, and one could override the kernel/driver's ability to process asynchronous arriving packets with ease. Consequently they'd "work" until one tried to send a high speed stream of small packets to one, at which point they'd basically fail to receive 9 out of 10 of them. I actually managed to get a netperf out of a 100 Mbps card of something like 1 Mbps -- the other 99% of the packets were basically dropped and either lost (UDP) or had to be retransmitted (TCP). This is a very difficult problem to diagnose. 
General symptoms that should make you suspect this is a problem include -- having debugged the switch and physical connection and found that it works fine for "known good" NICs on both ends. Getting decent connectivity for certain "low stress" applications or low loads, but losing more and more packets as you increase the network load on the NIC. Obvious kernel-based network error messages in /var/log/messages. And the kicker -- buy a known-good ethernet card, one you are certain works perfectly in the kernel (the listvolken can probably give you half a dozen recommendations if you don't already favor 3coms or intels or the like). Swap it into a system that is having problems and leaving everything else the same (wire, port, etc) repeat the test. If the problem goes away, it's a bad sign for the NIC. Solution is probably to just put known good NICs in the systems and stop using the (usually onboard and sucky) NIC. NICs are cheap, time is dear. Note well that you've already had excellent advice concerning autonegotiation of duplex and speed. This SHOULD NOT be a problem with pretty much any modern (unmanaged) switch and card, but old-timers remember well that it once was and might be again. Or if the switch is a managed switch, then god only knows what state it is in and debugging this becomes a major PITA. My own solution is still to dump the switch and get a new one. You can get really lovely 48 port dual power supply gigabit switches for a kilobuck or so from Dell, or you can spend a lot less and get perfectly reasonable unmanaged switches from any of a half-dozen vendors. One good way to debug things is just to get a new switch and see if it solves all your problems -- how many hours do you have to waste before a few hundred dollars worth of new hardware is cheap? f) Software (e.g. MPI flavor, PVM, userspace socket code). 
This too is a bitch to debug as there are a near infinity of ways to write buggy code, sometimes buggy code where the failure only occurs under certain rarely-accessed modes of operation of the software involved. The diagnosis here is one of exclusion. If ping, traceroute, netperf or netpipe, hardware network testers, card swapping, and switch swapping all yield no problems but the "problem" persists for your application throughout, you should suspect that your application is somehow buggy. This is by no means impossible, depending on just what is being done with the sockets. If you conclude that it is likely to BE your application, you have only two or three possible routes. * fix it yourself * get somebody to fix it for you * throw up your hands in disgust and use a different tool to accomplish the same task. Which one works depends on whether the tool is open source or commercial, your coding skills, availability of service or support forums, contact with the developers, etc. The idea of all of the above is to set up a diagnosis tree and run down it >>systematically<< until you figure out what is wrong. This method has never failed me through numerous diagnoses of every possible mode of failure over twenty years. I've seen NICs with intermittent heat-sensitive malfunctions (and figured it out), NICs with visibly toasted components (afio), bad cables and ports galore (afio), bad switches (afio), bad drivers (afio), and bad software (afio, without saying that I was always able to fix the latter). If you proceed very systematically you can eventually end up where one answer, however unlikely it might appear, is the truth. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C.
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From smulcahy at aplpi.com Mon Apr 2 07:28:37 2007 From: smulcahy at aplpi.com (stephen mulcahy) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] How to Diagnose Cause of Cluster Ethernet Errors? In-Reply-To: <460FE4C9.6040204@berkeley.edu> References: <460AFFA1.6070103@berkeley.edu> <40218.192.168.1.1.1175362730.squirrel@mail.eadline.org> <460FE4C9.6040204@berkeley.edu> Message-ID: <46111315.5030509@aplpi.com> Hi Jon, Things I look out for in switches in general are reliability and build quality. I've had some cheaper switches which worked but got worryingly warm to the touch. The 3com switches we use in our office in general tend to be solid and don't seem to heat up as much as some of the SMCs. Having said that, I've heard good things (here mostly) about some specific SMC switches. I generally don't pay for managed switches unless I have clear needs to work with my traffic at that level. I don't have those needs for a small office or department environment. For a cluster, given the budgets typically involved, I'm inclined to err on the side of a switch with a good reputation and a more extensive feature-set than I actually need since it is such a critical piece of the picture. For clusters, the overall bandwidth of the switch is also a huge issue. It's still not clear to me how reliable manufacturers' figures for switch bandwidth are though. The procurve we have in our cluster seems to be performing well, and as I said, I've heard good things about some of the SMCs (tigers?) but short of going with what others are using successfully I'm not sure. Has anyone tested a dozen switches in a lab for backplane bandwidth? I'm sure the more experienced members will have more concrete pointers but maybe my comments give you a starting point - it's an interesting, and very relevant, question.
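One way to sanity-check a quoted switch bandwidth figure is to compare it with what a non-blocking switch would need on paper. A minimal Python sketch follows, assuming full-duplex ports and assuming the vendor figure counts both directions; both are assumptions, as are the function names and the example figures, and none of them come from a real data sheet.

```python
def nonblocking_capacity_gbps(n_ports, port_speed_gbps, full_duplex=True):
    """Aggregate capacity needed for every port to run at line rate at once."""
    directions = 2 if full_duplex else 1
    return n_ports * port_speed_gbps * directions


def oversubscription(n_ports, port_speed_gbps, claimed_capacity_gbps):
    """Ratio of required to claimed capacity.

    A value above 1.0 suggests the quoted fabric cannot carry all ports at
    line rate simultaneously. claimed_capacity_gbps is whatever the data
    sheet states (hypothetical numbers in the example below).
    """
    required = nonblocking_capacity_gbps(n_ports, port_speed_gbps)
    return required / claimed_capacity_gbps


if __name__ == "__main__":
    # Hypothetical 48-port gigabit switch with two made-up fabric figures.
    print(nonblocking_capacity_gbps(48, 1))  # capacity needed, in Gb/s
    print(oversubscription(48, 1, 96))       # fabric matches the requirement
    print(oversubscription(48, 1, 48))       # fabric is half the requirement
```

This only checks the arithmetic of a data sheet; it says nothing about how a switch behaves under real traffic, which still has to be measured.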
-stephen Jon Forrest wrote: -- Stephen Mulcahy, Applepie Solutions Ltd, Innovation in Business Center, GMIT, Dublin Rd, Galway, Ireland. http://www.aplpi.com From deadline at clustermonkey.net Mon Apr 2 09:12:29 2007 From: deadline at clustermonkey.net (Douglas Eadline) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> Message-ID: <39703.192.168.1.1.1175530349.squirrel@mail.eadline.org> > Hi All, > > Would any of you please like to share usage-experience/views/comments > about Windows Compute Cluster Server 2003 based Beowulf Clusters? As a point of clarification, there is no such thing as a "Windows Compute Cluster Server 2003 based Beowulf Clusters" This link may help: http://en.wikipedia.org/wiki/Beowulf_cluster > > What in your opinion is the future of such clusters? > > How you compare these with the LINUX CLUSTERS? You will not find much information on this list as it mainly focuses on Linux Beowulf style clusters. My understanding is MS is working on the "ease of use" and integration issues. You may find more information at http://www.winhpc.org/ -- Doug > > regards. > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Doug From epaulson at cs.wisc.edu Mon Apr 2 09:45:39 2007 From: epaulson at cs.wisc.edu (Erik Paulson) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!!
In-Reply-To: <39703.192.168.1.1.1175530349.squirrel@mail.eadline.org> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <39703.192.168.1.1.1175530349.squirrel@mail.eadline.org> Message-ID: <20070402164539.GA20577@swingline.cs.wisc.edu> On Mon, Apr 02, 2007 at 12:12:29PM -0400, Douglas Eadline wrote: > > Hi All, > > > > Would any of you please like to share usage-experience/views/comments > > about Windows Compute Cluster Server 2003 based Beowulf Clusters? > > As a point of clarification, there is no such thing > as a "Windows Compute Cluster Server 2003 based Beowulf Clusters" > This link may help: > > http://en.wikipedia.org/wiki/Beowulf_cluster > I don't know why everyone is so obsessed with saying "Your Beowulf must run an (F)OSS operating system to be a Beowulf." You can build a Beowulf out of Windows. God only knows why you'd want to, but you can. Just to invent a little bit of evidence, Thomas Sterling edited a book called "Beowulf Cluster Computing with Windows" http://www.amazon.com/Beowulf-Computing-Scientific-Engineering-Computation/dp/0262692759/ref=pd_sim_b_5/002-1371173-4594458?ie=UTF8&qid=1175531228&sr=8-1 It was actually two books - a "Beowulf Cluster Computing with Linux" and a "Beowulf Cluster Computing with Windows". 75% of the text was the same. (We wrote a chapter in it - we used the same chapter, with latex macros \iflinux and \ifwinnt for whichever book was being built) The Linux book way outsold the Windows book, and so there was no second edition of the Windows book. My guess is that most everyone had the good sense to say "Windows as the base OS for my cluster? No thanks" > > > > What in your opinion is the future of such clusters? > > > > How you compare these with the LINUX CLUSTERS? > > You will not find much information on this list as > it mainly focuses on Linux Beowulf style clusters.
> The parallel programming part of this list applies to Windows as much as it applies to Linux (or FreeBSD or Darwin or HURD) -Erik From peter.st.john at gmail.com Mon Apr 2 09:55:55 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> Message-ID: Amjad, MS Win3.11 was a GUI application on top of a small, reliable OS (DOS). Since then, however, MS has integrated the OS with the GUI, which is awkward for developing interprocess communication and booting light kernels. Since XP, there is further integration of the OS with licensing and security material, which can't be stripped off without compromising some big MS concerns. Basically, Unix was designed to be a development environment; VMS was designed for safe and efficient DP; MS was designed for marketing (and arguably, an easier end-user experience). The main reason I know of to use MS for a server is to commoditize MS-certified admins, who are cheaper than RGB is. Even IBM uses a zillion instances of Linux over VMWare on a 390 to make a web farm. I'd be interested in a benchmark comparison of MS cluster vs linux cluster, but I pause at wondering who would bother. Of course I'd merely expect that the OS would be too big to make sense on a compute node; I'm told that Vista has an efficient kernel, but if MS stripped off the layers people would just make 1-node "clusters" to get around the licensing material, which would be counterproductive to MS. But suppose someone wanted to do a test to find out (instead of merely assuming). If the result came out well for MS, I'd want to be paid by MS; they exist for marketing, they are richer than Croesus, it would be foolish to do anything for them for free. But that would impugn objectivity.
And if the result were poor for MS, nobody would care because that's our expectation (pardon me for speaking for others). So I just don't see that anyone at all is motivated to do an objective comparison. But I'd be very interested anyway in an honest attempt, if you hear of one. Salaam, Peter On 4/1/07, amjad ali wrote: > > Hi All, > > Would any of you please like to share usage-experience/views/comments > about Windows Compute Cluster Server 2003 based Beowulf Clusters? > > What in your opinion is the future of such clusters? > > How you compare these with the LINUX CLUSTERS? > > regards. > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From James.P.Lux at jpl.nasa.gov Mon Apr 2 10:17:41 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: <39703.192.168.1.1.1175530349.squirrel@mail.eadline.org> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <39703.192.168.1.1.1175530349.squirrel@mail.eadline.org> Message-ID: <6.2.3.4.2.20070402100940.02fb3088@mail.jpl.nasa.gov> At 09:12 AM 4/2/2007, Douglas Eadline wrote: > > Hi All, > > > > Would any of you please like to share usage-experience/views/comments > > about Windows Compute Cluster Server 2003 based Beowulf Clusters?
> >As a point of clarification, there is no such thing >as a "Windows Compute Cluster Server 2003 based Beowulf Clusters" >This link may help: > >http://en.wikipedia.org/wiki/Beowulf_cluster I know that most people interpret Beowulf to include only those with open source software (hence leaving out things like Windows), but I think that really, the distinguishing feature of a classic Beowulf cluster is that it uses "consumer commodity" sorts of things (e.g. inexpensive PCs intended for the consumer market). I think one could fairly say that Windows falls in the "consumer commodity" category. Now, WinCCS isn't exactly in the commodity area, but, then, neither are some of the high performance interconnects currently used, and that doesn't exclude them from discussion. And, of course, some Beowulf clusters have been built using Macs and Suns, with their respective not-entirely-open operating systems. It IS true that the "mainstream" of Beowulfery is Linux oriented, and, as Doug comments later, that IS the focus of the list. As far as Windows CCS goes, there have been a couple discussions on the list about what CCS is, how it fits in to the clustering world, and how such things might differ from the traditional Linux model. Check the archives. > > > > What in your opinion is the future of such clusters? > > > > How you compare these with the LINUX CLUSTERS? > >You will not find much information on this list as >it mainly focuses on Linux Beowulf style clusters. > >My understanding is MS is working on the "ease of use" >and integration issues. You may find more information >at http://www.winhpc.org/ > > -- > Doug > >-- >Doug James Lux, P.E. 
Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From greg.lindahl at qlogic.com Mon Apr 2 10:18:20 2007 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Ethernet break through? In-Reply-To: <51788.192.168.1.1.1175463588.squirrel@mail.eadline.org> References: <9FA59C95FFCBB34EA5E42C1A8573784F76C0A3@mtiexch01.mti.com> <8CC53E49-E29F-4C45-B7A9-C82E46A663D2@myri.com> <51788.192.168.1.1.1175463588.squirrel@mail.eadline.org> Message-ID: <20070402171820.GA5081@dhcp-2-200.internal.keyresearch.com> On Sun, Apr 01, 2007 at 05:39:48PM -0400, Douglas Eadline wrote: > > I just posted some interesting news on Cluster Monkey. > > http://www.clustermonkey.net//content/view/192/1/ I can easily believe that rgb's documentation has a lot of redundancy in it... what the filter is doing is re-deriving the AI that wrote the documentation in the first place... which is probably much smaller than the document. -- g From greg.lindahl at qlogic.com Mon Apr 2 10:34:25 2007 From: greg.lindahl at qlogic.com (Greg Lindahl) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> Message-ID: <20070402173425.GA5320@dhcp-2-200.internal.keyresearch.com> On Mon, Apr 02, 2007 at 12:55:55PM -0400, Peter St. John wrote: > The main reason I know of to use > MS for a server is to commoditize MS-certified admins, who are cheaper than > RGB is. Not on a per-word basis! rgb is far more productive! -- g From James.P.Lux at jpl.nasa.gov Mon Apr 2 10:36:18 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! 
In-Reply-To: <20070402164539.GA20577@swingline.cs.wisc.edu> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <39703.192.168.1.1.1175530349.squirrel@mail.eadline.org> <20070402164539.GA20577@swingline.cs.wisc.edu> Message-ID: <6.2.3.4.2.20070402103446.02d16ed0@mail.jpl.nasa.gov> At 09:45 AM 4/2/2007, Erik Paulson wrote: >On Mon, Apr 02, 2007 at 12:12:29PM -0400, Douglas Eadline wrote: > > > Hi All, > > > > > > Would any of you please like to share usage-experience/views/comments > > > about Windows Compute Cluster Server 2003 based Beowulf Clusters? > > > > As a point of clarification, there is no such thing > > as a "Windows Compute Cluster Server 2003 based Beowulf Clusters" > > This link may help: > > > > http://en.wikipedia.org/wiki/Beowulf_cluster > > > >I don't know why everyone is so obsessed with saying "Your Beowulf must run >an (F)OSS operating system to be a Beowulf." > >You can build a Beowulf out of Windows. God only knows why you'd want to, but >you can. Perhaps because you have the boxes given to you? (e.g. Cornell) Perhaps because you relish the thrill? Perhaps because you have a huge investment in applications specific software tailored to Windows that you want to use? >Just to invent a little bit of evidence, Thomas Sterling edited a book >called "Beowulf Cluster Computing with Windows" > >http://www.amazon.com/Beowulf-Computing-Scientific-Engineering-Computation/dp/0262692759/ref=pd_sim_b_5/002-1371173-4594458?ie=UTF8&qid=1175531228&sr=8-1 > >It was actually two books - a "Beowulf Cluster Computing with Linux" and a >"Beowulf Cluster Computing with Windows". 75% of the text was the same. (We >wrote a chapter in it - we used the same chapter, with latex macros \iflinux >and \ifwinnt for whichever book was being built) > >The Linux book way outsold the Windows book, and so there was no second >edition of the Windows book. My guess is that most everyone had the good >sense to say "Windows as the base OS for my cluster? 
No thanks" > > > > > > > > What in your opinion is the future of such clusters? > > > > > > How you compare these with the LINUX CLUSTERS? > > > > You will not find much information on this list as > > it mainly focuses on Linux Beowulf style clusters. > > > >The parallel programming part of this list applies to Windows as >much as it applies to Linux (or FreeBSD or Darwin or HURD) CP/M forever >-Erik > >_______________________________________________ >Beowulf mailing list, Beowulf@beowulf.org >To change your subscription (digest mode or unsubscribe) visit >http://www.beowulf.org/mailman/listinfo/beowulf James Lux, P.E. Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From peter.st.john at gmail.com Mon Apr 2 10:45:53 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Ethernet break through? In-Reply-To: <51788.192.168.1.1.1175463588.squirrel@mail.eadline.org> References: <9FA59C95FFCBB34EA5E42C1A8573784F76C0A3@mtiexch01.mti.com> <8CC53E49-E29F-4C45-B7A9-C82E46A663D2@myri.com> <51788.192.168.1.1.1175463588.squirrel@mail.eadline.org> Message-ID: The one node refusing to send the doc, and the other note receiving it anyway, cracked me the *** up! Thanks P.S. the Google April Fool's actually got me thinking (network via plumbing). Water conducts acoustics real well. So a free peer-to-peer network within a city, or some counties, would be easy and require no new infrastructure. I imagine the bandwidth would be weak and certainly there'd be a problem getting between cities, but I don't think the water and sewer utiltiies claim rights to the acoustic bandwidth (unlike hijacking phone lines with stuff that would interfere with existing telephony) so all free. On 4/1/07, Douglas Eadline wrote: > > > I just posted some interesting news on Cluster Monkey. 
> > http://www.clustermonkey.net//content/view/192/1/ > > > -- > Doug > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20070402/6f5b8ee5/attachment.html From peter.st.john at gmail.com Mon Apr 2 10:50:28 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: <20070402173425.GA5320@dhcp-2-200.internal.keyresearch.com> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <20070402173425.GA5320@dhcp-2-200.internal.keyresearch.com> Message-ID: Greg, cheaper per billable hour, I would never never say that a MCxE for any x is more valuable than RGB. RGB writes many words per minute, which an NT admin would spend many minutes per word reading... well except the good ones of course. No allusions to the Golden Mountain intended or implied. Peter On 4/2/07, Greg Lindahl wrote: > > On Mon, Apr 02, 2007 at 12:55:55PM -0400, Peter St. John wrote: > > > The main reason I know of to use > > MS for a server is to commoditize MS-certified admins, who are cheaper > than > > RGB is. > > Not on a per-word basis! rgb is far more productive! > > -- g > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20070402/b8ce1503/attachment.html From deadline at clustermonkey.net Mon Apr 2 11:01:44 2007 From: deadline at clustermonkey.net (Douglas Eadline) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: <20070402164539.GA20577@swingline.cs.wisc.edu> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <39703.192.168.1.1.1175530349.squirrel@mail.eadline.org> <20070402164539.GA20577@swingline.cs.wisc.edu> Message-ID: <38539.192.168.1.1.1175536904.squirrel@mail.eadline.org> > On Mon, Apr 02, 2007 at 12:12:29PM -0400, Douglas Eadline wrote: >> > Hi All, >> > >> > Would any of you please like to share usage-experience/views/comments >> > about Windows Compute Cluster Server 2003 based Beowulf Clusters? >> >> As a point of clarification, there is no such thing >> as a "Windows Compute Cluster Server 2003 based Beowulf Clusters" >> This link may help: >> >> http://en.wikipedia.org/wiki/Beowulf_cluster >> > > I don't know why everyone is so obsessed with saying "Your Beowulf must > run > an (F)OSS operating system to be a Beowulf." Because I believe, "the art of Beowulf" has a rich history of development that is based on the original definition from "How to Build a Beowulf" by Thomas Sterling, John Salmon, Donald J. Becker and Daniel F. Savarese: ".. a collection of personal computers interconnected by widely available networking technology running anyone of several open source Unix-like operating systems. " I would not want to see it usurped by other clustering efforts. I believe that if we do not protect against revisionist history, then all of a sudden WCCS is now "Beowulf" computing. Such things, in my opinion dis-honor all the people (who I respect) that have contributed to this community. To me it is almost akin to removing author credit in open source software. A short aside. 
I overheard a conversation at SC-2000 about the origin of Beowulf from a MS representative "Beowulf was a copy of the Microsoft Wolfpack software. They chose that name so it would seem like Wolfpack some how". Truth is a slippery fish. Certainly Thomas Sterling can rework the definition as he pleases (he co-authored it), and I am not disparaging WCCS or any other clustering method. I just want to keep the credit where credit is due. So I stand as a defender of the faith, as it were. -- Doug > > You can build a Beowulf out of Windows. God only knows why you'd want to, > but > you can. > > Just to invent a little bit of evidence, Thomas Sterling edited a book > called "Beowulf Cluster Computing with Windows" > > http://www.amazon.com/Beowulf-Computing-Scientific-Engineering-Computation/dp/0262692759/ref=pd_sim_b_5/002-1371173-4594458?ie=UTF8&qid=1175531228&sr=8-1 > > It was actually two books - a "Beowulf Cluster Computing with Linux" and a > "Beowulf Cluster Computing with Windows". 75% of the text was the same. > (We > wrote a chapter in it - we used the same chapter, with latex macros > \iflinux > and \ifwinnt for whichever book was being built) > > The Linux book way outsold the Windows book, and so there was no second > edition of the Windows book. My guess is that most everyone had the good > sense to say "Windows as the base OS for my cluster? No thanks" > > >> > >> > What in your opinion is the future of such clusters? >> > >> > How you compare these with the LINUX CLUSTERS? >> >> You will not find much information on this list as >> it mainly focuses on Linux Beowulf style clusters. >> > > The parallel programming part of this list applies to Windows as > much as it applies to Linux (or FreeBSD or Darwin or HURD) > > -Erik > > > -- Doug From rgb at phy.duke.edu Mon Apr 2 11:22:56 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!!
In-Reply-To: <39703.192.168.1.1.1175530349.squirrel@mail.eadline.org> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <39703.192.168.1.1.1175530349.squirrel@mail.eadline.org> Message-ID: On Mon, 2 Apr 2007, Douglas Eadline wrote: >> Hi All, >> >> Would any of you please like to share usage-experience/views/comments >> about Windows Compute Cluster Server 2003 based Beowulf Clusters? > > As a point of clarification, there is no such thing > as a "Windows Compute Cluster Server 2003 based Beowulf Clusters" > This link may help: > > http://en.wikipedia.org/wiki/Beowulf_cluster > >> >> What in your opinion is the future of such clusters? >> >> How you compare these with the LINUX CLUSTERS? Allow me to assume the Lotus position (as best as my creaky old joints permit). A few words of mantra to calm my spirit and purge the Daemons that afflict it whenever "Windows" is mentioned. A swig of beer to help calm them still more:-). There. I'm ready now. They compare with linux clusters by being: * Expensive (per seat) for licenses, if one respects that sort of thing. * Heavily commercial in application orientation. * Closed source -- if and where they don't work, you won't be able to fix them, tune them, alter them in any way. * Did I mention expensive? It is alleged that using it, it will be miraculously simpler to set up and configure to turn a pile of PCs into a cluster and then reliably run at least certain classes of largely commercial cluster applications, and that the ease of management will pay for the expense. It is my belief that this allegation is generally false, although in specific cases (e.g. shops that have no linux experience whatsoever and immensely deep pockets) it may be true when the marginal cost of a linux-skilled human (or training up one of your MCSEs to become linux-skilled) exceeds the marginal cost of a Windows cluster compared to a linux cluster.
Linux clusters these days can be set up and built with a fairly minimal skillset, so the true economic marginal benefit is dubious. As in, I routinely remotely advise students ranging from bright high school students in computer clubs to graduate students in various disciplines (not necessarily computer science) all over the world (for example, I currently have 2-3 groups I am directly advising in India). All of these student groups manage to craft functional linux clusters, often out of old and decrepit parts that (from what I hear) Vista would laugh at hysterically, perhaps, but never run on. If a bright high school student can manage to rig up a working linux cluster, one certainly HOPES that an MCSE can manage it on opportunity cost time in nearly all environments. Furthermore, it is quite trivial to set up linux clusters that boot diskless these days (and hard to beat diskless boot systems for ease of management), and it has finally become fairly easy to set up virtual clusters for embarrassingly parallel programs using e.g. Xen or VMware. One can go to the vmware website and -- for free -- download a windows player and virtual ready to run "instant cluster" environment and run a linux cluster node as a windows subtask. In this way once VMware itself is installed and configured as a good old Windows task there is NO configuration, installation, setup (beyond creating a virtual network so nodes can find one another). One isn't even forced to choose between Win and Lin clusters -- one can do both at the same time and simply decide which one your hardware will be today, with the caveat that node costs scale UP with Win and DOWN with Lin. With all that said, my opinion on the future of Win clusters is that they probably have one. With the advent of Universal System Virtualization (at the CPU hardware level) we are on the threshold of a new era in computing, one where the barrier between operating environments lowers to almost nothing.
Microsoft has little choice but to participate in this or risk being badly hurt by free Linux environments that will run Windows as a virtual task "out of the box". They could find themselves in three years as being known as the most expensive linux desktop environment if they don't at least try to make linux one of Windows' least expensive desktop environments (that comes with order of ten to twenty thousand software packages ready to run). And in server environments the pressure is even greater -- people are eager to shrink their server environment to where hardware utilization is optimized instead of having one or more mostly idle servers per application, as is the RULE in Win server shops. Virtualize or die. So we'll just have to see where the chaos this creates takes us. I predict a wild ride for the next 60 months. Single server boxes with 4-8 cores, lots of memory, and running as many as 16 to 20 distinct server environments on top of 2-3 distinct operating systems are not out of the question. Windows cluster environment has the first 24 months of that to grow without that dominating everything, but from 36 months out my crystal ball gets very, very fuzzy. All I can say is that everybody needs to start learning to do their Xen Meditations...;-) rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Mon Apr 2 11:35:30 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: <20070402173425.GA5320@dhcp-2-200.internal.keyresearch.com> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <20070402173425.GA5320@dhcp-2-200.internal.keyresearch.com> Message-ID: On Mon, 2 Apr 2007, Greg Lindahl wrote: > On Mon, Apr 02, 2007 at 12:55:55PM -0400, Peter St. 
John wrote: > >> The main reason I know of to use >> MS for a server is to commoditize MS-certified admins, who are cheaper than >> RGB is. > > Not on a per-word basis! rgb is far more productive! And on the list, I work for beer and pretzels. Top that, Microsoft....;-) Off the list I do get a bit more expensive. Greg, the GEEZER daemon met my rgbbot and they ran off together. I'm down to wrting my own txet again, two fingers at a time. They did say before they left that they'd already asked AND answered all the questions that would ever be asked on the ilst, so that the list could go ahead and shut down now. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From rgb at phy.duke.edu Mon Apr 2 11:37:28 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Ethernet break through? In-Reply-To: References: <9FA59C95FFCBB34EA5E42C1A8573784F76C0A3@mtiexch01.mti.com> <8CC53E49-E29F-4C45-B7A9-C82E46A663D2@myri.com> <51788.192.168.1.1.1175463588.squirrel@mail.eadline.org> Message-ID: On Mon, 2 Apr 2007, Peter St. John wrote: > The one node refusing to send the doc, and the other note receiving it > anyway, cracked me the *** up! Thanks > > P.S. the Google April Fool's actually got me thinking (network via > plumbing). Water conducts acoustics real well. So a free peer-to-peer > network within a city, or some counties, would be easy and require no new > infrastructure. I imagine the bandwidth would be weak and certainly there'd > be a problem getting between cities, but I don't think the water and sewer > utiltiies claim rights to the acoustic bandwidth (unlike hijacking phone > lines with stuff that would interfere with existing telephony) so all free. Are you seriously suggesting that people should deliberately adopt a shitty network? 
(:-o> rgb > > On 4/1/07, Douglas Eadline wrote: >> >> >> I just posted some interesting news on Cluster Monkey. >> >> http://www.clustermonkey.net//content/view/192/1/ >> >> >> -- >> Doug >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From deadline at clustermonkey.net Mon Apr 2 11:36:39 2007 From: deadline at clustermonkey.net (Douglas Eadline) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Ethernet break through? In-Reply-To: References: <9FA59C95FFCBB34EA5E42C1A8573784F76C0A3@mtiexch01.mti.com> <8CC53E49-E29F-4C45-B7A9-C82E46A663D2@myri.com> <51788.192.168.1.1.1175463588.squirrel@mail.eadline.org> Message-ID: <46041.192.168.1.1.1175538999.squirrel@mail.eadline.org> kind of gives new meaning to the term dropped packets ... > The one node refusing to send the doc, and the other note receiving it > anyway, cracked me the *** up! Thanks > > P.S. the Google April Fool's actually got me thinking (network via > plumbing). Water conducts acoustics real well. So a free peer-to-peer > network within a city, or some counties, would be easy and require no new > infrastructure. I imagine the bandwidth would be weak and certainly > there'd > be a problem getting between cities, but I don't think the water and sewer > utiltiies claim rights to the acoustic bandwidth (unlike hijacking phone > lines with stuff that would interfere with existing telephony) so all > free. > > On 4/1/07, Douglas Eadline wrote: >> >> >> I just posted some interesting news on Cluster Monkey. 
>> >> http://www.clustermonkey.net//content/view/192/1/ >> >> >> -- >> Doug >> _______________________________________________ >> Beowulf mailing list, Beowulf@beowulf.org >> To change your subscription (digest mode or unsubscribe) visit >> http://www.beowulf.org/mailman/listinfo/beowulf >> > > > !DSPAM:46114154123312004121031! > -- Doug From rgb at phy.duke.edu Mon Apr 2 11:58:13 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: <38539.192.168.1.1.1175536904.squirrel@mail.eadline.org> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <39703.192.168.1.1.1175530349.squirrel@mail.eadline.org> <20070402164539.GA20577@swingline.cs.wisc.edu> <38539.192.168.1.1.1175536904.squirrel@mail.eadline.org> Message-ID: On Mon, 2 Apr 2007, Douglas Eadline wrote: > ".. a collection of personal computers interconnected > by widely available networking technology running anyone of several > open source Unix-like operating systems. " > > I would not want to see it usurped by other clustering efforts. > I believe that if we do not protect against revisionist history, then > all of a sudden WCCS is now "Beowulf" computing. Such things, in my > opinion dis-honor all the people (who I respect) that have contributed > to this community. To me it is almost akin to removing author credit > in open source software. I second the motion. I should point out that this link: http://www.beowulf.org/overview/faq.html#1 Which has been there since Kragen Sitaker put it there back in oh, maybe 1997 or so, and he put it there directly from the original beowulf.org project description. There are good reasons for this to be part of the definition, as well. The Windows cluster is very likely not a >>beowulf<< for a variety of reasons, not just one. 
It is a form of cluster in a box, but the cluster it makes up seems very likely to be more suited for gridware than "real parallel code", in particular development code. The latter is where having an open source operating system and programming environment becomes so crucial. I can remember any number of times on this list (doubtless preserved in the archives somewhere) where serious problems with various aspects of the network were revealed and (eventually) repaired BECAUSE everything -- kernel, drivers, and application -- were open source. Try doing that with Microsoft's product. Ha. > A short aside. I overheard a conversation at SC-2000 about > the origin of Beowulf from a MS representative "Beowulf > was a copy of the Microsoft Wolfpack software. They chose > that name so it would seem like Wolfpack some how". > Truth is a slippery fish. In the hands of lying bastards it is. But Hitler fully understood the pervasive nature of the big lie, and MS uses it to perfection in their ongoing campaign of FUD. > Certainly Thomas Sterling can rework the definition as he pleases > (he co-authored it), And I am not disparaging WCCS or > any other clustering method. I just want to keep the credit > where credit is due. > > So I stand as a defender of the faith, as it were. And as a defender of the historical record against revisionism. THAT is worth fighting about. rgb > > -- > Doug > >> >> You can build a Beowulf out of Windows. God only knows why you'd want to, >> but >> you can. >> >> Just to invent a little bit of evidence, Thomas Sterling edited a book >> called "Beowulf Cluster Computing with Windows" >> >> http://www.amazon.com/Beowulf-Computing-Scientific-Engineering-Computation/dp/0262692759/ref=pd_sim_b_5/002-1371173-4594458?ie=UTF8&qid=1175531228&sr=8-1 >> >> It was actually two books - a "Beowulf Cluster Computing with Linux" and a >> "Beowulf Cluster Computing with Windows". 75% of the text was the same.
>> (We >> wrote a chapter in it - we used the same chapter, with latex macros >> \iflinux >> and \ifwinnt for whichever book was being built) >> >> The Linux book way outsold the Windows book, and so there was no second >> edition of the Windows book. My guess is that most everyone had the good >> sense to say "Windows as the base OS for my cluster? No thanks" >> >> >>>> >>>> What in your opinion is the future of such clusters? >>>> >>>> How you compare these with the LINUX CLUSTERS? >>> >>> You will not find much information on this list as >>> it mainly focuses on Linux Beowulf style clusters. >>> >> >> The parallel programming part of this list applies to Windows as >> much as it applies to Linux (or FreeBSD or Darwin or HURD) >> >> -Erik >> >> >> !DSPAM:46113337114068298414181! >> > > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From landman at scalableinformatics.com Mon Apr 2 11:59:39 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <20070402173425.GA5320@dhcp-2-200.internal.keyresearch.com> Message-ID: <4611529B.5080200@scalableinformatics.com> Robert G. Brown wrote: >> Not on a per-word basis! rgb is far more productive! > > And on the list, I work for beer and pretzels. Top that, > Microsoft....;-) You do? (rushes to get an RGB contract in place ....) > > Off the list I do get a bit more expensive. D'oh! Imported beer. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 or +1 866 888 3112 cell : +1 734 612 4615 From peter.st.john at gmail.com Mon Apr 2 11:59:54 2007 From: peter.st.john at gmail.com (Peter St. 
John) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Ethernet break through? In-Reply-To: References: <9FA59C95FFCBB34EA5E42C1A8573784F76C0A3@mtiexch01.mti.com> <8CC53E49-E29F-4C45-B7A9-C82E46A663D2@myri.com> <51788.192.168.1.1.1175463588.squirrel@mail.eadline.org> Message-ID: On 4/2/07, Robert G. Brown wrote: > > On Mon, 2 Apr 2007, Peter St. John wrote: > > > The one node refusing to send the doc, and the other note receiving it > > anyway, cracked me the *** up! Thanks > > > > P.S. the Google April Fool's actually got me thinking (network via > > plumbing). Water conducts acoustics real well. So a free peer-to-peer > > network within a city, or some counties, would be easy and require no > new > > infrastructure. I imagine the bandwidth would be weak and certainly > there'd > > be a problem getting between cities, but I don't think the water and > sewer > > utiltiies claim rights to the acoustic bandwidth (unlike hijacking phone > > lines with stuff that would interfere with existing telephony) so all > free. > > Are you seriously suggesting that people should deliberately adopt a > shitty network? > > (:-o> > > rgb Ah, it was just a pipe-dream. Among beowulf-gurus Robert you remain on the throne! (B-{|} (glasses with goatee) Peter > > > On 4/1/07, Douglas Eadline wrote: > >> > >> > >> I just posted some interesting news on Cluster Monkey. > >> > >> http://www.clustermonkey.net//content/view/192/1/ > >> > >> > >> -- > >> Doug > >> _______________________________________________ > >> Beowulf mailing list, Beowulf@beowulf.org > >> To change your subscription (digest mode or unsubscribe) visit > >> http://www.beowulf.org/mailman/listinfo/beowulf > >> > > > > -- > Robert G. Brown http://www.phy.duke.edu/~rgb/ > Duke University Dept. of Physics, Box 90305 > Durham, N.C. 27708-0305 > Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu > > > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20070402/5df6983e/attachment.html From landman at scalableinformatics.com Mon Apr 2 12:07:08 2007 From: landman at scalableinformatics.com (Joe Landman) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: <38539.192.168.1.1.1175536904.squirrel@mail.eadline.org> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <39703.192.168.1.1.1175530349.squirrel@mail.eadline.org> <20070402164539.GA20577@swingline.cs.wisc.edu> <38539.192.168.1.1.1175536904.squirrel@mail.eadline.org> Message-ID: <4611545C.2050907@scalableinformatics.com> Douglas Eadline wrote: > I believe that if we do not protect against revisionist history, then [...] you mean like how now with WCCS2k+3 clustering and HPC is *now* (suddenly magically spontaneously) "mainstream" ? This is just something I personally take issue with. The entire explosive growth of clustering has driven HPC hard into the mainstream. This happened long before it was a glimmer in their eyes. 6+ years of explosive growth, going from noise in the statistics to dominating the statistics. Then along they came with WCCS2k+3. Their entry is late into the cycle. And if you listen to the comments of the senior execs, it makes one wonder how committed they are to HPC and clusters as compared to how committed they are to battling Linux. This is not to diminish their efforts. WCCS2k+3 is likely reasonably good for some subset of groups. Microsoft has some good people there, and playing with the W2k+3 x64 on our JackRabbit unit was fun. They still need a real POSIX subsystem, and hopefully, someday, they will give in, and get cygwin or mingw to be fully supported/shipping using their compilers/tools. Though I expect to see airborne and stable flight from porcine critters about the same time. Too bad, as that would likely ease adoption/porting issues. Tremendously.
-- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com phone: +1 734 786 8423 fax : +1 734 786 8452 or +1 866 888 3112 cell : +1 734 612 4615 From peter.st.john at gmail.com Mon Apr 2 12:23:26 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: <4611545C.2050907@scalableinformatics.com> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <39703.192.168.1.1.1175530349.squirrel@mail.eadline.org> <20070402164539.GA20577@swingline.cs.wisc.edu> <38539.192.168.1.1.1175536904.squirrel@mail.eadline.org> <4611545C.2050907@scalableinformatics.com> Message-ID: A couple weeks ago a kid (by which I mean, energetic person) who makes his living via MS products (as I often do) said that Vista solved the problem of viruses, so there won't be viruses any more. We discussed it (and he deserved some credit for patience, because my initial reaction was overtly dismissive). It turned out that what he meant was that MS has integrated Virus Scanning with the OS, so instead of downloading a dictionary of up-to-date keywords (like "An**na Kourni**kova", I still fear to type her name) for every byte in your system to be matched against, from a third-party vendor, you will get it automatically when you do your regular, probably automatic, OS update. I remember when (that AK virus) came out, I read the VBS and wrote up a study of it, which I mailed to a NT Admin friend of mine. His corporate firewall bounced it (because it had "that" name in the **subject**, there was NO attachment at all). The Firewall guy replied to me and sent it through to the intended recipient, but I was astonished at what Virus Scanners **really** are. I'm sorry, but I think anyone who would put an OS that integrates all this stuff, onto a compute node in a cluster, is a moron. I'm sorry.
Peter On 4/2/07, Joe Landman wrote: > > Douglas Eadline wrote: > > > I believe that if we do not protect against revisionist history, then > > [...] > > you mean like how now with WCCS2k+3 clustering and HPC is *now* > (suddenly magically spontaneously) "mainstream" ? > > This is just something I personally take issue with. The entire > explosive growth of clustering has driven HPC hard into the mainstream. > This happened long before it was a glimmer in their eyes. 6+ years of > explosive growth, going from noise in the statistics to dominating the > statistics. Then along they came with WCCS2k+3. > > Their entry is late into the cycle. And if you listen to the comments > of the senior execs, it makes one wonder how committed they are to HPC > and clusters as compared to how committed they are to battling Linux. > > This is not to diminish their efforts. WCCS2k+3 is likely reasonably > good for some subset of groups. Microsoft has some good people there, > and playing with the W2k+3 x64 on our JackRabbit unit was fun. They > still need a real POSIX subsystem, and hopefully, someday, they will > give in, and get cygwin or mingw to be fully supported/shipping using > their compilers/tools. > > Though I expect to see airborn and stable flight from porcine critters > about the same time. Too bad, as that would likely ease > adoption/porting issues. Tremendously. > > -- > > Joseph Landman, Ph.D > Founder and CEO > Scalable Informatics LLC, > email: landman@scalableinformatics.com > web : http://www.scalableinformatics.com > phone: +1 734 786 8423 > fax : +1 734 786 8452 or +1 866 888 3112 > cell : +1 734 612 4615 > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... 
URL: http://www.scyld.com/pipermail/beowulf/attachments/20070402/98665d98/attachment.html From peter.st.john at gmail.com Mon Apr 2 13:15:05 2007 From: peter.st.john at gmail.com (Peter St. John) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: <38539.192.168.1.1.1175536904.squirrel@mail.eadline.org> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <39703.192.168.1.1.1175530349.squirrel@mail.eadline.org> <20070402164539.GA20577@swingline.cs.wisc.edu> <38539.192.168.1.1.1175536904.squirrel@mail.eadline.org> Message-ID: If you'd like to keep the history of Beowulfry straight for the coming generation, I urge someone to fix up the reference to "Thomas L Sterling" in the Wikipedia article, http://en.wikipedia.org/wiki/Beowulf_%28computing%29 ; apparently there are two Thomas Sterlings, and the article had had a link to the wrong one, so now there is no link. If you click on the "broken" link (to his name, in the article), it puts you in the editor to create the page (if none exists still, when you do this). Then a minimal article would just be: == Thomas Sterling == Co-author of the seminal work "How to Build a Beowulf" regarding [[Beowulf_%28computing%29| Beowulf]] strategy of cost-effective high-performance computing with commodity hardware and open-source software. That's it. Drop me a line if I can help in any way. I'm sure lots of you could do great justice to this individual; though I spent hours doing the mindlessly-wiki-compliant article for the late mathematician Leonard Carlitz http://en.wikipedia.org/wiki/Leonard_Carlitz . Just publishing a book is plenty "notability" to satisfy the wiki standards; here you have a whole mailing list of rocket scientists to back you up. Peter "Those who do not know history are doomed to repeat it. Those who do not learn meta-history are doomed to have it reinvented on their heads."
-- Euphistopheles On 4/2/07, Douglas Eadline wrote: > > > On Mon, Apr 02, 2007 at 12:12:29PM -0400, Douglas Eadline wrote: > >> > Hi All, > >> > > >> > Would any of you please like to share usage-experience/views/comments > >> > about Windows Compute Cluster Server 2003 based Beowulf Clusters? > >> > >> As a point of clarification, there is no such thing > >> as a "Windows Compute Cluster Server 2003 based Beowulf Clusters" > >> This link may help: > >> > >> http://en.wikipedia.org/wiki/Beowulf_cluster > >> > > > > I don't know why everyone is so obsessed with saying "Your Beowulf must > > run > > an (F)OSS operating system to be a Beowulf." > > Because I believe, "the art of Beowulf" has a rich history of development > that is based on the original definition from "How to Build a Beowulf" by > Thomas Sterling, John Salmon, Donald J. Becker and Daniel F. Savarese: > > ".. a collection of personal computers interconnected > by widely available networking technology running anyone of several > open source Unix-like operating systems. " > > I would not want to see it usurped by other clustering efforts. > I believe that if we do not protect against revisionist history, then > all of a sudden WCCS is now "Beowulf" computing. Such things, in my > opinion dis-honor all the people (who I respect) that have contributed > to this community. To me it is almost akin to removing author credit > in open source software. > > A short aside. I overheard a conversation at SC-2000 about > the origin of Beowulf from a MS representative "Beowulf > was a copy of the Microsoft Wolfpack software. They chose > that name so it would seem like Wolfpack some how". > Truth is a slippery fish. > > Certainly Thomas Sterling can rework the definition as he pleases > (he co-authored it), And I am not disparaging WCCS or > any other clustering method. I just want to keep the credit > where credit is due. > > So I stand as a defender of the faith, as it where. 
> > -- > Doug > > > > > You can build a Beowulf out of Windows. God only knows why you'd want > to, > > but > > you can. > > > > Just to invent a little bit of evidence, Thomas Sterling edited a book > > called "Beowulf Cluster Computing with Windows" > > > > > http://www.amazon.com/Beowulf-Computing-Scientific-Engineering-Computation/dp/0262692759/ref=pd_sim_b_5/002-1371173-4594458?ie=UTF8&qid=1175531228&sr=8-1 > > > > It was actually two books - a "Beowulf Cluster Computing with Linux" and > a > > "Beowulf Cluster Computing with Windows". 75% of the text was the same. > > (We > > wrote a chapter in it - we used the same chapter, with latex macros > > \iflinux > > and \ifwinnt for whichever book was being built) > > > > The Linux book way outsold the Windows book, and so there was no second > > edition of the Windows book. My guess is that most everyone had the good > > sense to say "Windows as the base OS for my cluster? No thanks" > > > > > >> > > >> > What in your opinion is the future of such clusters? > >> > > >> > How you compare these with the LINUX CLUSTERS? > >> > >> You will not find much information on this list as > >> it mainly focuses on Linux Beowulf style clusters. > >> > > > > The parallel programming part of this list applies to Windows as > > much as it applies to Linux (or FreeBSD or Darwin or HURD) > > > > -Erik > > > > > > !DSPAM:46113337114068298414181! > > > > > -- > Doug > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20070402/b9c2597a/attachment.html From rgb at phy.duke.edu Mon Apr 2 13:17:45 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! 
In-Reply-To: <4611529B.5080200@scalableinformatics.com> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <20070402173425.GA5320@dhcp-2-200.internal.keyresearch.com> <4611529B.5080200@scalableinformatics.com> Message-ID: On Mon, 2 Apr 2007, Joe Landman wrote: > Imported beer. ...and those fat, soft pretzels with cheezy-mustard sauce or rolled in asiago parmesan and garlic. rgb -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From James.P.Lux at jpl.nasa.gov Mon Apr 2 13:48:46 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: <38539.192.168.1.1.1175536904.squirrel@mail.eadline.org> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <39703.192.168.1.1.1175530349.squirrel@mail.eadline.org> <20070402164539.GA20577@swingline.cs.wisc.edu> <38539.192.168.1.1.1175536904.squirrel@mail.eadline.org> Message-ID: <6.2.3.4.2.20070402134556.02e9e9e8@mail.jpl.nasa.gov> At 11:01 AM 4/2/2007, Douglas Eadline wrote: > > On Mon, Apr 02, 2007 at 12:12:29PM -0400, Douglas Eadline wrote: > >> > Hi All, > >> > >Because I believe, "the art of Beowulf" has a rich history of development >that is based on the original definition from "How to Build a Beowulf" by >Thomas Sterling, John Salmon, Donald J. Becker and Daniel F. Savarese: > > ".. a collection of personal computers interconnected > by widely available networking technology running anyone of several > open source Unix-like operating systems. " > >I would not want to see it usurped by other clustering efforts. >I believe that if we do not protect against revisionist history, then >all of a sudden WCCS is now "Beowulf" computing. Such things, in my >opinion dis-honor all the people (who I respect) that have contributed >to this community. 
To me it is almost akin to removing author credit >in open source software. I concede that I was incorrect in my contention about Beowulf being just commodity, and not necessarily open source. I beg forgiveness for a moment of misremembering (my copy of Sterling, et al., is at home on my bedside table). Mistakes were made, can't we all just get along, etc. etc. etc. {I do, however, retain the inclination to play devil's advocate on occasion, just because a good discussion helps crystallize the answers...} James Lux, P.E. Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From James.P.Lux at jpl.nasa.gov Mon Apr 2 13:51:35 2007 From: James.P.Lux at jpl.nasa.gov (Jim Lux) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: <4611529B.5080200@scalableinformatics.com> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <20070402173425.GA5320@dhcp-2-200.internal.keyresearch.com> <4611529B.5080200@scalableinformatics.com> Message-ID: <6.2.3.4.2.20070402134937.02fce5e8@mail.jpl.nasa.gov> At 11:59 AM 4/2/2007, Joe Landman wrote: >Robert G. Brown wrote: > >>>Not on a per-word basis! rgb is far more productive! >>And on the list, I work for beer and pretzels. Top that, >>Microsoft....;-) > >You do? (rushes to get an RGB contract in place ....) > >>Off the list I do get a bit more expensive. > >D'oh! > >Imported beer. What's considered imported??? Used to be that Coors was an imported brand in the Carolinas.. James Lux, P.E.
Spacecraft Radio Frequency Subsystems Group Flight Communications Systems Section Jet Propulsion Laboratory, Mail Stop 161-213 4800 Oak Grove Drive Pasadena CA 91109 tel: (818)354-2075 fax: (818)393-6875 From ballen at gravity.phys.uwm.edu Mon Apr 2 13:58:06 2007 From: ballen at gravity.phys.uwm.edu (Bruce Allen) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] How to Diagnose Cause of Cluster Ethernet Errors? In-Reply-To: <460FE4C9.6040204@berkeley.edu> References: <460AFFA1.6070103@berkeley.edu> <40218.192.168.1.1.1175362730.squirrel@mail.eadline.org> <460FE4C9.6040204@berkeley.edu> Message-ID: Hey Jon, Fun to see you here!! I was just looking through some old Goleta pictures last week. Just for kicks have a look at these figures: http://www.lsc-group.phys.uwm.edu/beowulf/nemo/design/SMC_8508T_Performance.html This was part of a study that we did to select edge switches for the NEMO cluster. We were able to find sub-$100 switches that were wire speed up to MTUs of about 6k. There was a big difference between similar looking cheap switches from various companies. And indeed, 'under the hood' they all used integrated chip sets from a handful of chip vendors. Here are some more testing results from different edge switches: http://www.lsc-group.phys.uwm.edu/beowulf/nemo/design/switching.html (Note: our processing is embarassingly parallel, so we are primarily building compute farms. We don't need very high bandwidth very low latency connections, eg infiniband or myrinet performance.) Cheers, Bruce On Sun, 1 Apr 2007, Jon Forrest wrote: > Douglas Eadline wrote: > >> >> I am constantly amazed at how many people buy the >> latest and greatest node hardware and then connect >> them with a sub-optimal switch (or cheap cables), thus reducing >> the effective performance of the nodes (for parallel >> applications). Kind "penny wise and pound foolish" as they say. >> > > I sincerely appreciate all the comments about my problem. I will reply > to them in due time. 
However, I'd like to comment on this, which > admittedly is off-topic from my original posting. > > I don't disagree with what you're saying. The problem is how > to recognize "sub-optimal" equipment. For example, I see > three tiers in ethernet switching hardware: > > 1) The low-end, e.g. Netgear, Linksys, D-link, ... > > 2) The mid-end, e.g. HP Procurve, Dell, SMC, ... > > 3) The high-end, e.g. Cisco, Foundry, ... > > What I, as a system manager, not as an Electrical Engineer, > have trouble understanding, is what the true differences > are between these levels, and, at one level, between > the various vendors. > > These days I suspect that many of the vendors are using > ASICs made by other chip companies, and that many vendors > use the same ASICs. Assuming that's true, where's the > added value that justifies the cost differences? Sometimes > the value is in the "management" abilities of a device. > I don't deny this can be a major selling point in a > large enterprise environment, but in a 30-node cluster, > or a small LAN, it's hard to justify paying for this. > > In terms of ethernet performance, once a device > can handle wirespeed communication on all ports, > where's the added value that justifies the added > cost? I'm looking for empirical answers, which > aren't always easy to find, and sometimes to understand. > > In the case of my cluster, it was configured and purchased > before I got here, so I had nothing to do with choosing > its components but I have to admit that I'm not > sure what I would have done differently.
> > Cordially, > > Jon Forrest > Unix Computing Support > College of Chemistry > 173 Tan Hall > University of California Berkeley > Berkeley, CA > 94720-1460 > 510-643-1032 > jlforrest@berkeley.edu > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > From mitch48 at sbcglobal.net Mon Apr 2 15:37:55 2007 From: mitch48 at sbcglobal.net (Tom Mitchell) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <39703.192.168.1.1.1175530349.squirrel@mail.eadline.org> <20070402164539.GA20577@swingline.cs.wisc.edu> <38539.192.168.1.1.1175536904.squirrel@mail.eadline.org> <4611545C.2050907@scalableinformatics.com> Message-ID: <20070402223755.GA24691@xtl1.xtl.tenegg.com> On Mon, Apr 02, 2007 at 03:23:26PM -0400, Peter St. John wrote: > > A couple weeks ago a kid (by which I mean, energetic person) who makes his > living via MS products (as I often do) said that Vista solved the problem of > viruses, so there won't be viruses any more. We discussed it (and he > deserved some credit for patience, because my intial reaction was overtly > dismissive). Sort of off topic to the title but this implies that Vista is fair game to abuse just like any Windows product. .... Date: Fri, 30 Mar 2007 14:47:54 ======= Technical Cyber Security Alert TA07-089A Microsoft Windows ANI header stack buffer overflow Original release date: March 30, 2007 Last revised: -- Source: US-CERT Systems Affected Microsoft Windows 2000, XP, Server 2003, and Vista are affected. Applications that provide attack vectors include: but this implies that Vista .... 
* Vulnerability Note VU#191609 - * Microsoft Security Advisory (935423) - * Unpatched Drive-By Exploit Found On The Web - -- T o m M i t c h e l l Found me a new place to hang my hat :-) Now it got bought. From mitch48 at sbcglobal.net Mon Apr 2 16:07:21 2007 From: mitch48 at sbcglobal.net (Tom Mitchell) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> Message-ID: <20070402230721.GB24691@xtl1.xtl.tenegg.com> On Sun, Apr 01, 2007 at 02:02:53PM +0500, amjad ali wrote: > Date: Sun, 1 Apr 2007 14:02:53 +0500 > From: "amjad ali" > To: beowulf@beowulf.org > Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! > > Hi All, > > Would any of you please like to share usage-experience/views/comments > about Windows Compute Cluster Server 2003 based Beowulf Clusters? > > What in your opinion is the future of such clusters? > > How you compare these with the LINUX CLUSTERS? With full consideration to "fat, soft pretzels with cheezy-mustard sauce or rolled in asiago parmesan and garlic." MS pulled a version of mpich/mvapich/MPI and ported it to windows. They also developed some library code to gateway some *nix library/system calls to windows. The root sources of MPI are public and not GPL so they can. It might be worth looking at the MS announcement -- but why bother. If you look you might think that common MPI codes would just compile and run... I have no idea; I expect some will, and there begins silly porting for the next... Once a set of boxes are interconnected and you have library support for MPI or another way to share data (PVM... whatever) you are off and running in the clustering world. Sadly MS has a MS specific library that abuses "standard MPI" and could quickly cause source code to surface that runs correctly on an MS cluster but not on another OS based cluster (Linux, Solaris, Irix, AIX).
I see this all the time with java script, and c, c++, and other codes where little 'features' hook you in. Some will be fooled into thinking that this is something to look at or worse something to spend money on. SUMMARY: Since you posted this on 1 Apr 2007 all I can do is giggle and wonder why I replied. Regards, mitch PS: Ask in a year but not on April fools/joke day. -- T o m M i t c h e l l Found me a new place to hang my hat :-) Now it got bought. From rgb at phy.duke.edu Mon Apr 2 17:21:37 2007 From: rgb at phy.duke.edu (Robert G. Brown) Date: Tue Oct 7 01:13:45 2008 Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! In-Reply-To: <20070402230721.GB24691@xtl1.xtl.tenegg.com> References: <428810f20704010202k15df1d28m1a12a7d0c3facb92@mail.gmail.com> <20070402230721.GB24691@xtl1.xtl.tenegg.com> Message-ID: On Mon, 2 Apr 2007, Tom Mitchell wrote: > On Sun, Apr 01, 2007 at 02:02:53PM +0500, amjad ali wrote: >> Date: Sun, 1 Apr 2007 14:02:53 +0500 >> From: "amjad ali" >> To: beowulf@beowulf.org >> Subject: [Beowulf] Win64 Clusters!!!!!!!!!!!! >> >> Hi All, >> >> Would any of you please like to share usage-experience/views/comments >> about Windows Compute Cluster Server 2003 based Beowulf Clusters? >> >> What in your opinion is the future of such clusters? >> >> How you compare these with the LINUX CLUSTERS? > > With full consideration to "fat, soft pretzels with > cheezy-mustard sauce or rolled in asiago parmesan and garlic." > > MS pulled a version of mpich/mvapich/MPI and ported it to windows. > They also developed some library code to gateway some *inx library/system > calls to windows. The root sources of MPI are public and not GPL so they can. > > It might be worth looking at the MS announcement -- but why > bother. If you look you might think that common MPI codes > would just compile and run... I have no idea I expect some will > and there begins silly porting for the next... Sure. MS did this, no doubt. 
And as you note below, no sooner do they get it in when they begin the borgification of MPI, just as they've borgified java, c, c++, and anything in the Universe they can sucker somebody into buying in borgified form. Borgifying MPI is the most humorous thing in the Universe, BTW, given its historical origins -- it was basically a language written (reluctantly!) by supercomputer vendors when the US government got tired of paying for all their important codes to be ported to each new generation of proprietary hardware with its proprietary low level calls. MS is doubtless trying to figure out just how much of that they can undo while building up a big enough market share and enough vendors of closed source applications written with their borgisms that they can... Oh wait. It IS GPL. Do you think that they actually read it? However, I was really referring to the other aspects of program development and performance tuning associated with using a closed source development environment. Resistance is Futile. rgb > > Once a set of boxes are interconnected and you have library > support for MPI or another way to share data (PVM... whatever) > you are off and running in the clustering world. Sadly MS > has a MS specific library that abuses "standard MPI" and could > quickly cause source code to surface that runs correctly or on a > MS cluster but not on another OS based cluster (Linux, Solaris, > Irix, AIX). I see this all the time with java script, and c, > c++, and other codes where little 'features' hook you in. > > Some will be fooled into thinking that this is something to look at > or worse something to spend money on. > > SUMMARY: > Since you posted this on 1 Apr 2007 all I can do is giggle > and wonder why I replied. > > Regards, > mitch > > PS: Ask in a year but not on April fools/joke day. > > -- Robert G. Brown http://www.phy.duke.edu/~rgb/ Duke University Dept. of Physics, Box 90305 Durham, N.C. 
27708-0305 Phone: 1-919-660-2567 Fax: 919-660-2525 email:rgb@phy.duke.edu From gerry.creager at tamu.edu Tue Apr 3 06:07:26 2007 From: gerry.creager at tamu.edu (Gerry Creager) Date: Tue Oct 7 01:13:46 2008 Subject: [Beowulf] Ethernet break through? In-Reply-To: References: <9FA59C95FFCBB34EA5E42C1A8573784F76C0A3@mtiexch01.mti.com> <8CC53E49-E29F-4C45-B7A9-C82E46A663D2@myri.com> <51788.192.168.1.1.1175463588.squirrel@mail.eadline.org> Message-ID: <4612518E.9050003@tamu.edu> Peter St. John wrote: > The one node refusing to send the doc, and the other node receiving it > anyway, cracked me the *** up! Thanks > > P.S. the Google April Fool's actually got me thinking (network via > plumbing). Water conducts acoustics real well. So a free peer-to-peer > network within a city, or some counties, would be easy and require no > new infrastructure. I imagine the bandwidth would be weak and certainly > there'd be a problem getting between cities, but I don't think the water > and sewer utilities claim rights to the acoustic bandwidth (unlike > hijacking phone lines with stuff that would interfere with existing > telephony) so all free. In general, water conduction requires a continuous column of water unless you're willing to overdrive the signal to allow it to modulate an intermediate air column. The presence of "solids" also modifies the index of refraction and can induce standing waves and cause you to flush your signal to noise ratio. Also, the use of lift stations to continue a sanitary sewer system's contents toward their ultimate destination (a water treatment plant [central routing facility?]) would require significant effort in order to synchronize the lift station's mechanical interfaces with the modulated waveforms in the sewer pipes. For these reasons, use of a sanitary sewer system is likely impractical. Of course, one DOES have a continuous column of water in city distribution systems. This opens all sorts of possibilities for other networking benefits.
First, however, let's review some of the drawbacks: 1. Most water distribution pipes are buried at some point in their transit from reservoir to user. This tends to dampen their movement and that of their relatively incompressible content, by decreasing the pipe's ability to expand and contract. Thus, dynamic range is reduced. 2. Except for those about to rupture, copper water pipe tends to be relatively rigid, and its length tends to act as a low-pass filter. Combined with the multitude of joints in 10-foot pipe sections, and ignoring for the moment the potential to use longer copper tubing extruded and packaged as 50 foot rolls, this would represent a high-order filter with a high Q value. Calculating the resonant frequency of such a filter is left to the student. 3. A large number of newer construction commercial and residential structures, as well as those which have undergone plumbing renovation in the last 10 years or so, will have internal piping of CPVC or PEX origin. The mechanical interface to the copper transmission line... er, piping... would require significant impedance matching to diminish an additional standing wave introduction that could dampen audio frequency amplitude below the threshold of detection. 4. Water mains made of virtually any material are effectively r