From hahn at mcmaster.ca Tue Jan 1 12:01:12 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Sat Jul 19 01:06:47 2008
Subject: [Beowulf] Building a 2 node cluster using mpich
In-Reply-To: <2D1ECDD5-85D7-4A02-B49F-3BEE4D9CCB93@staff.uni-marburg.de>
References: <2D1ECDD5-85D7-4A02-B49F-3BEE4D9CCB93@staff.uni-marburg.de>
Message-ID:

>> 4. Finally, create identical user accounts on each node. In our case,
>> we create the user DevArticle on each node in our cluster. You can
>> either create the identical user accounts during installation, or you
>> can use the adduser command as root.
>
> better use NIS (or LDAP). So you only have to define the users once.

for a small cluster, LDAP is overkill (and NIS is, afaik, still insecure).
it's much easier to either have a single, shared NFS root (so
/etc/{passwd,shadow,group} are inherently in sync) or else just periodically
rsync these files from a master node to all others.

From tcarroll at ursinus.edu Thu Jan 3 18:43:29 2008
From: tcarroll at ursinus.edu (Thomas Carroll)
Date: Sat Jul 19 01:06:47 2008
Subject: [Beowulf] Socket AM2 Opterons?? and comments requested...
Message-ID: <1199414609.31014.23.camel@Inverness>

Hi,

I recently posted a potential build for my new cluster nodes and got
some great advice (thanks especially to Bill and Mark). I've done some
additional research and concluded that AMD will likely give the best
performance for my code and also the best bang for the buck. AMD also
has many more appealing mobo options. Here's my current prospective
node:

AMD Athlon 64 X2 6400+ Windsor 3.2GHz Socket AM2 125W Dual-Core
G.SKILL 4GB(2 x 2GB) 240-Pin DDR2 SDRAM DDR2 800 (PC2 6400)
GIGABYTE GA-M61P-S3 AM2 NVIDIA GeForce 6100 ATX AMD Motherboard
Rosewill R804BK Black Steel ATX Mid Tower Computer Case 300W

I've scored some free hard drives and dvd drives so that I can start
diskful and get simulating before I figure out exactly how I want to do
diskless.
I'll also be going with GigE and doing the best I can; it seems
others have had success with ScaLAPACK and GigE for my type of
application. (Hopefully a small grant in the near future will allow an
upgrade of the network - myrinet is beyond my budget right now.)

My main question (besides throwing this configuration out there for
comments) is about the CPU. I noticed on newegg that there are a few
socket AM2 dual core opterons (the Santa Ana). Does anyone have any
experience with these? There are only a few (most of the opterons seem
to be socket F and the socket F mobos are far too expensive). The
opteron seems to be a popular cluster choice - any thoughts on whether
these would be superior to my choice above?

Again, thanks everyone for the help!

-tom

From andrew at moonet.co.uk Fri Jan 4 04:50:37 2008
From: andrew at moonet.co.uk (andrew holway)
Date: Sat Jul 19 01:06:47 2008
Subject: [Beowulf] Virtual resource manager
Message-ID:

Hi,

I'd like to find out if there are any projects out there to develop a
resource manager that can control a virtual cluster. We would like to
explore the idea of using xen to deploy operating systems on nodes,
checkpoint jobs and deploy MS ccs. Primarily interested in open source
initiatives.

If anyone has heard anything like this please let me know.

Cheers

Andy

From ascheinine at tuffmail.us Fri Jan 4 05:35:55 2008
From: ascheinine at tuffmail.us (Alan Louis Scheinine)
Date: Sat Jul 19 01:06:47 2008
Subject: [Beowulf] Virtual resource manager
In-Reply-To:
References:
Message-ID: <477E363B.1020901@tuffmail.us>

Andrew Holway wrote:
> I'd like to find out if there are any projects out there to develop a
> resource manager that can control a virtual cluster. We would like to
> explore the idea of using xen to deploy operating systems on nodes,
> checkpoint jobs and deploy MS ccs. Primarily interested in open source
> initiatives.

A good question. I don't know the answer; nonetheless, I would like to
mention one point of view.
From what I've seen with LSF and SGE, they expect to have a certain
set of computers with specific names. In contrast, with Xen the number
of computers with different names and different addresses is arbitrary.
But on the other hand, if you want to balance the computational load,
you need to know the number of actual processors. I realize that you
asked about a "resource manager", which does not necessarily imply load
balancing.

Nevertheless, to focus on the load balancing aspect, it seems practical
to have a batch system that is based on actual computers, so that the
job manager knows how many real resources have been given. Then, for a
Xen-based job, the (parallel) job that runs on a set of computers starts
as a script that creates the Xen processes and, when finished, returns
the nodes to the job queue pool as non-virtual machines.

I don't know what is available for others to use; we are developing
something in-house. It is not simple because the Xen processes will run
parallel jobs, so they may need a NIS server and DNS server for their
specialized names and users. Moreover, for security in a Grid computing
environment, each collection of Xen processes for a parallel job will
have its own VLAN. So the script that starts the Xen collection also
needs to change the Ethernet switch configuration.

I look forward to reading suggestions from other Beowulf list members.

Best regards,
Alan Scheinine

--

Centro di Ricerca, Sviluppo e Studi Superiori in Sardegna
Center for Advanced Studies, Research, and Development in Sardinia

Postal Address:              | Physical Address for FedEx, UPS, DHL:
---------------              | -------------------------------------
Alan Scheinine               | Alan Scheinine
c/o CRS4                     | c/o CRS4
C.P. n. 25                   | Loc. Pixina Manna Edificio 1
09010 Pula (Cagliari), Italy | 09010 Pula (Cagliari), Italy

Email: scheinin@crs4.it

Phone: 070 9250 238 [+39 070 9250 238]
Fax: 070 9250 216 or 220 [+39 070 9250 216 or +39 070 9250 220]
Operator at reception: 070 9250 1 [+39 070 9250 1]
Mobile phone: 347 7990472 [+39 347 7990472]

From csamuel at vpac.org Fri Jan 4 15:22:53 2008
From: csamuel at vpac.org (Chris Samuel)
Date: Sat Jul 19 01:06:47 2008
Subject: [Beowulf] Virtual resource manager
In-Reply-To: <133368502.3801199488840255.JavaMail.root@zimbra.vpac.org>
Message-ID: <232523152.3821199488973683.JavaMail.root@zimbra.vpac.org>

Hi Andrew,

----- "andrew holway" wrote:

> I'd like to find out if there are any projects out there to develop a
> resource manager that can control a virtual cluster. We would like to
> explore the idea of using xen to deploy operating systems on nodes,
> checkpoint jobs and deploy MS ccs. Primarily interested in open
> source initiatives.

How about Moab? Not open source, but it does seem like it can do
a fair bit of what you want.

http://www.clusterresources.com/products/mwm/docs/5.6resourceprovisioning.shtml

> Enabling provisioning consists of configuring an interface to a
> provisioning manager, specifying which nodes can take advantage
> of this service, and what the estimated cost and duration of
> each change will be. This interface can be used to contact a
> system such as System Imager, XCat, Xen, RedCarpet or NIM or
> to contact a locally developed system via a script or web service.

cheers,
Chris
--
Christopher Samuel - (03) 9925 4751 - Systems Manager
The Victorian Partnership for Advanced Computing
P.O. Box 201, Carlton South, VIC 3053, Australia
VPAC is a not-for-profit Registered Research Agency

From gmpc at sanger.ac.uk Sat Jan 5 06:08:25 2008
From: gmpc at sanger.ac.uk (Guy Coates)
Date: Sat Jul 19 01:06:47 2008
Subject: [Beowulf] Virtual resource manager
In-Reply-To:
References:
Message-ID: <477F8F59.3060709@sanger.ac.uk>

andrew holway wrote:
> Hi,
>
> I'd like to find out if there are any projects out there to develop a
> resource manager that can control a virtual cluster.

You might want to take a look at openqrm as a starting point:

http://www.openqrm.org/

It allows you to dynamically provision virtual or real machine images
onto physical hardware. It will also grow or shrink the pool of virtual
machines in response to changes in "load".

Cheers,

Guy

--
Dr Guy Coates, Informatics System Group
The Wellcome Trust Sanger Institute, Hinxton, Cambridge, CB10 1HH, UK
Tel: +44 (0)1223 834244 ex 6925
Fax: +44 (0)1223 496802

--
The Wellcome Trust Sanger Institute is operated by Genome Research
Limited, a charity registered in England with number 1021457 and a
company registered in England with number 2742969, whose registered
office is 215 Euston Road, London, NW1 2BE.

From forum.san at gmail.com Fri Jan 4 23:59:41 2008
From: forum.san at gmail.com (Sangamesh B)
Date: Sat Jul 19 01:06:47 2008
Subject: [Beowulf] For only NAMD users
Message-ID:

Hi All,

I installed NAMDCharm2.6 on AMD64 dual core dual processor with gcc and
MPICH2.

I don't know the science behind this application.

As an HPC support engineer, I have to test it on our cluster hardware.

If any member of this mailing list has used NAMD, please let me know how
to run it and where I can get the input files.

regards,
Sangamesh
HPC Engineer
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080105/76c26415/attachment.html

From wrankin at ee.duke.edu Sat Jan 5 18:39:03 2008
From: wrankin at ee.duke.edu (Bill Rankin)
Date: Sat Jul 19 01:06:47 2008
Subject: [Beowulf] For only NAMD users
In-Reply-To:
References:
Message-ID: <7DE7AF46-D73F-4BE5-90F2-A13D06661D06@ee.duke.edu>

On the NAMD website:

http://www.ks.uiuc.edu/Research/namd/

If you look through the release notes:

http://www.ks.uiuc.edu/Research/namd/2.6/notes.html

towards the bottom of the document they have a note on running NAMD
with some simple input files.

Hope this helps,

-bill

On Jan 5, 2008, at 2:59 AM, Sangamesh B wrote:

>
> Hi All,
>
>
> I installed NAMDCharm2.6 on AMD64 dual core dual processor with
> gcc and MPICH2.
>
> I don't know the science behind this application.
>
> As an HPC support engineer, I have to test it on our cluster hardware.
>
> If any member of this mailing list has used NAMD, please let me know
> how to run it and where I can get the input files.
>
> regards,
> Sangamesh
> HPC Engineer
> _______________________________________________
> Beowulf mailing list, Beowulf@beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf

From john.leidel at gmail.com Mon Jan 7 08:14:09 2008
From: john.leidel at gmail.com (John Leidel)
Date: Sat Jul 19 01:06:47 2008
Subject: [Beowulf] quad-socket opteron memory performance
Message-ID: <1199722449.13428.40.camel@e521.site>

Does anyone have any recent memory performance numbers [specifically
latency] from the quad-socket Opterons?
--john

From raysonlogin at gmail.com Mon Jan 7 09:41:05 2008
From: raysonlogin at gmail.com (Rayson Ho)
Date: Sat Jul 19 01:06:47 2008
Subject: [Beowulf] Virtual resource manager
In-Reply-To:
References:
Message-ID: <73a01bf20801070941n91ffa5bk72b101b8a7cda65f@mail.gmail.com>

On Jan 4, 2008 7:50 AM, andrew holway wrote:
> I'd like to find out if there are any projects out there to develop a
> resource manager that can control a virtual cluster. We would like to
> explore the idea of using xen to deploy operating systems on nodes,
> checkpoint jobs and deploy MS ccs. Primarily interested in open source
> initiatives.

Take a look at this paper: "Xen and the Art of Cluster Scheduling". It
integrates SGE (Sun Grid Engine) and Xen, and creates XGE (Xen Grid
Engine):

http://ds.informatik.uni-marburg.de/de/publications/pdf/Xen%20and%20the%20Art%20of%20Cluster%20Scheduling.pdf

And Sun has another open source project called Open xVM. xVM Server is
a hypervisor based on Xen and xVM Ops Center allows provisioning of
cluster nodes.

http://openxvm.org/
http://en.wikipedia.org/wiki/Sun_xVM

xVM is used to deploy cluster nodes at the Ranger supercomputer at
TACC, with 3,936 nodes and 16 cores per node:

http://blogs.sun.com/stevewilson/entry/xvm_at_tacc

I believe you would be able to get some feedback from both the SGE and
the OpenxVM projects...

SGE homepage: http://gridengine.sunsource.net/

Rayson

>
> If anyone has heard anything like this please let me know.
>
> Cheers
>
> Andy
>

From ntmoore at gmail.com Mon Jan 7 14:39:11 2008
From: ntmoore at gmail.com (Nathan Moore)
Date: Sat Jul 19 01:06:47 2008
Subject: [Beowulf] Multiple NIC on a node
Message-ID: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com>

Hi,

I've acquired a few cluster nodes that have multiple ethernet jacks. On
the system-config-network applet I see several different adaptors (e.g.
eth0 and eth1). Right now, I've got many more free ports on my switch
than I have nodes in my cluster, so I'm wondering if there's some
performance benefit from hooking up the second NIC.

Do any of you have a tutorial on multiple NICs per compute node that
you'd be willing to share? I'm assigning static IPs with named, and
concurrently maintaining /etc/hosts files on each machine with the full
cluster map. Do I "just" assign a second IP address for the second eth1
jack? Is there more to it?

Nathan Moore

--
- - - - - - - - - - - - - - - - - - - - -
Nathan Moore
Assistant Professor, Physics
Winona State University
AIM: nmoorewsu
- - - - - - - - - - - - - - - - - - - - -
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080107/8c4ffa93/attachment.html

From gdjacobs at gmail.com Mon Jan 7 15:15:47 2008
From: gdjacobs at gmail.com (Geoff Jacobs)
Date: Sat Jul 19 01:06:47 2008
Subject: [Beowulf] Multiple NIC on a node
In-Reply-To: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com>
References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com>
Message-ID: <4782B2A3.1090205@gmail.com>

Nathan Moore wrote:
> Hi,
>
> I've acquired a few cluster nodes that have multiple ethernet jacks. On
> the system-config-network applet I see several different adaptors (e.g.
> eth0 and eth1). Right now, I've got many more free ports on my switch
> than I have nodes in my cluster, so I'm wondering if there's some
> performance benefit from hooking up the second NIC.
>
> Do any of you have a tutorial on multiple NICs per compute node that
> you'd be willing to share? I'm assigning static IPs with named, and
> concurrently maintaining /etc/hosts files on each machine with the full
> cluster map. Do I "just" assign a second IP address for the second eth1
> jack? Is there more to it?

Is your switch capable of trunking, or can it be configured into
multiple VLANs?

--
Geoffrey D. Jacobs

From ntmoore at gmail.com Mon Jan 7 16:38:38 2008
From: ntmoore at gmail.com (Nathan Moore)
Date: Sat Jul 19 01:06:47 2008
Subject: [Beowulf] Multiple NIC on a node
In-Reply-To: <6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com>
References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com>
	<4782B2A3.1090205@gmail.com>
	<6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com>
Message-ID: <6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com>

I don't know. It's a 24-port Cisco that I got from our local network
admin.

Nathan

On Jan 7, 2008 5:15 PM, Geoff Jacobs wrote:

> Nathan Moore wrote:
> > Hi,
> >
> > I've acquired a few cluster nodes that have multiple ethernet jacks. On
> > the system-config-network applet I see several different adaptors (e.g.
> > eth0 and eth1). Right now, I've got many more free ports on my switch
> > than I have nodes in my cluster, so I'm wondering if there's some
> > performance benefit from hooking up the second NIC.
> >
> > Do any of you have a tutorial on multiple NICs per compute node that
> > you'd be willing to share? I'm assigning static IPs with named, and
> > concurrently maintaining /etc/hosts files on each machine with the full
> > cluster map. Do I "just" assign a second IP address for the second eth1
> > jack? Is there more to it?
>
> Is your switch capable of trunking, or can it be configured into
> multiple VLANs?
>
> --
> Geoffrey D. Jacobs
>
>

--
- - - - - - - - - - - - - - - - - - - - -
Nathan Moore
Assistant Professor, Physics
Winona State University
AIM: nmoorewsu
- - - - - - - - - - - - - - - - - - - - -
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080107/1de1f1e4/attachment.html

From gdjacobs at gmail.com Tue Jan 8 05:10:25 2008
From: gdjacobs at gmail.com (Geoff Jacobs)
Date: Sat Jul 19 01:06:47 2008
Subject: [Beowulf] Multiple NIC on a node
In-Reply-To: <6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com>
References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com>
	<4782B2A3.1090205@gmail.com>
	<6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com>
	<6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com>
Message-ID: <47837641.9010707@gmail.com>

Nathan Moore wrote:
> I don't know. It's a 24-port Cisco that I got from our local network
> admin.
>
> Nathan

Timothy Mattox describes the network engineering challenges wrt
switches far better than I could:
http://www.beowulf.org/archive/2001-March/002760.html

You're going to have to make a decision on what strategy to follow, and
part of that decision is going to be informed by the performance
characteristics of your application, as well as the networking hardware
your cluster will be equipped with.

So, if your application does a great deal of file I/O on the nodes, you
might consider implementing a service network through the second network
ports. However, if your application needs more total bandwidth on a
single network, you will want to go with channel bonding.
If the driver(s) for your network ports do not play well with the
channel bonding interface, you will have to go with another option, or
buy some different network cards. Also, depending on the capability of
the switch, you might have to buy a second (dumb) switch to make your
plans work.

--
Geoffrey D. Jacobs

From hahn at mcmaster.ca Tue Jan 8 08:40:04 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Sat Jul 19 01:06:47 2008
Subject: [Beowulf] Multiple NIC on a node
In-Reply-To: <47837641.9010707@gmail.com>
References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com>
	<4782B2A3.1090205@gmail.com>
	<6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com>
	<6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com>
	<47837641.9010707@gmail.com>
Message-ID:

> So, if your application does a great deal of file I/O on the nodes, you
> might consider implementing a service network through the second network
> ports.

this is convenient, since if you use two nics with different subnets,
traffic will be segregated and non-interfering.

> However, if your application needs more total bandwidth on a
> single network, you will want to go with channel bonding. If the

besides being slightly trickier to configure, it also only gives you
higher _aggregate_ bandwidth. any single flow between a pair of IPs
will not be faster than a single link. bonding/teaming/link-aggregation
is mainly useful for inter-switch links and hosts like a fileserver
which are effectively a hotspot and can take advantage of multiple
concurrent flows (again, where each flow is no faster than 1 link.)

in other words, there's no standard "raid0 of nics" ;)

From peter.st.john at gmail.com Tue Jan 8 10:12:33 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] Multiple NIC on a node
In-Reply-To:
References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com>
	<4782B2A3.1090205@gmail.com>
	<6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com>
	<6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com>
	<47837641.9010707@gmail.com>
Message-ID:

Mark,
I don't get it? I would have thought that if a large package were split
between two NICs with two cables, then assuming the buffering and
recombination at each end to be faster than the transmission, then the
transmission would be faster than over a single cable? You don't mean
that the router must be a bottleneck, by giving necessarily only one
pathway to a pair of IPs? Probably I'm missing something about what
would be meant by "(merely) aggregate bandwidth"?
Thanks,
Peter

On Jan 8, 2008 11:40 AM, Mark Hahn wrote:

> > So, if your application does a great deal of file I/O on the nodes, you
> > might consider implementing a service network through the second network
> > ports.
>
> this is convenient, since if you use two nics with different subnets,
> traffic will be segregated and non-interfering.
>
> > However, if your application needs more total bandwidth on a
> > single network, you will want to go with channel bonding. If the
>
> besides being slightly trickier to configure, it also only gives you
> higher _aggregate_ bandwidth. any single flow between a pair of IPs
> will not be faster than a single link. bonding/teaming/link-aggregation
> is mainly useful for inter-switch links and hosts like a fileserver
> which are effectively a hotspot and can take advantage of multiple
> concurrent flows (again, where each flow is no faster than 1 link.)
>
> in other words, there's no standard "raid0 of nics" ;)
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080108/38b127b4/attachment.html

From patrick at myri.com Tue Jan 8 10:29:46 2008
From: patrick at myri.com (Patrick Geoffray)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] Multiple NIC on a node
In-Reply-To:
References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com>
	<4782B2A3.1090205@gmail.com>
	<6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com>
	<6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com>
	<47837641.9010707@gmail.com>
Message-ID: <4783C11A.3050308@myri.com>

Peter St. John wrote:
> I don't get it? I would have thought that if a large package were split
> between two NICs with two cables, then assuming the buffering and
> recombination at each end to be faster than the transmission, then the
> transmission would be faster than over a single cable? You don't mean that

The problem is ordering of packets and TCP. When you send a single TCP
stream over two (or more) paths, then some packets will arrive
out-of-order at the destination. TCP really does not like out-of-order
packets and performance takes a (big) hit.

That's why most channel bonding mechanisms balance multiple streams over
multiple NICs and send each stream on a single NIC. Other protocols than
TCP may not have this problem if they don't require strict ordering for
performance.

Patrick

From peter.st.john at gmail.com Tue Jan 8 10:56:11 2008
From: peter.st.john at gmail.com (Peter St. John)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] Multiple NIC on a node
In-Reply-To: <4783C11A.3050308@myri.com>
References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com>
	<4782B2A3.1090205@gmail.com>
	<6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com>
	<6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com>
	<47837641.9010707@gmail.com>
	<4783C11A.3050308@myri.com>
Message-ID:

One could use the ...I'm thinking of the extra-big-packet size in IP6. But
if you have small numbers of large datasets, you could increase your
perceived bandwidth with two NICs and larger packets, maybe by using some
protocol other than TCP?
Thanks,
Peter

On Jan 8, 2008 1:29 PM, Patrick Geoffray wrote:

> Peter St. John wrote:
> > I don't get it? I would have thought that if a large package were split
> > between two NICs with two cables, then assuming the buffering and
> > recombination at each end to be faster than the transmission, then the
> > transmission would be faster than over a single cable? You don't mean that
>
> The problem is ordering of packets and TCP. When you send a single TCP
> stream over two (or more) paths, then some packets will arrive
> out-of-order at the destination. TCP really does not like out-of-order
> packets and performance takes a (big) hit.
>
> That's why most channel bonding mechanisms balance multiple streams over
> multiple NICs and send each stream on a single NIC. Other protocols than
> TCP may not have this problem if they don't require strict ordering for
> performance.
>
> Patrick
>
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080108/404a9d51/attachment.html

From patrick at myri.com Tue Jan 8 11:20:09 2008
From: patrick at myri.com (Patrick Geoffray)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] Multiple NIC on a node
In-Reply-To:
References: <6009416b0801071439n7fb8d2e3iefc1ebea13fdf68e@mail.gmail.com>
	<4782B2A3.1090205@gmail.com>
	<6009416b0801071637m1083dc33w7f1dbc169e8b3df5@mail.gmail.com>
	<6009416b0801071638s40fa758el1d7aaec319323332@mail.gmail.com>
	<47837641.9010707@gmail.com>
	<4783C11A.3050308@myri.com>
Message-ID: <4783CCE9.1060809@myri.com>

Peter St. John wrote:
> One could use the ...I'm thinking of the extra-big-packet size in IP6. But
> if you have small numbers of large datasets, you could increase your
> perceived bandwidth with two NICs and larger packets, maybe by using some
> protocol other than TCP?

If you don't drop packets, UDP is the simplest solution. However, you
will always lose a packet at some point, so you will need a reliable
protocol. You can build one on top of UDP at the host level, or you can
do your own wire protocol on Ethernet. SCTP has some support for
multipath out of the box last time I looked.

Patrick

From supercomputer at gmail.com Wed Jan 9 08:07:20 2008
From: supercomputer at gmail.com (Chris Vaughan)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] Why need of a scheduler??
In-Reply-To: <428810f20711290514o1912c265q2f7fdd7dd45fa960@mail.gmail.com>
References: <428810f20711290514o1912c265q2f7fdd7dd45fa960@mail.gmail.com>
Message-ID: <216ee070801090807m6d0abb29n66f35ad126e07ff5@mail.gmail.com>

On Nov 29, 2007 1:14 PM, amjad ali wrote:

> Hello all,
>
> I want to develop and run my parallel code (MPI based) on a Beowulf
> cluster. I have no problem as such that many users might log on to the
> cluster simultaneously. Suppose that I am free to use the cluster
> dedicatedly for my single parallel application.
>
> 1) Do I really need a cluster scheduler installed on the cluster? Should I
> use a scheduler?
>

Yes, it makes things easier to control and keep track of.

>
> 2) Is there any effect/benefit on the running of a parallel code with or
> without a cluster job scheduler?
>

It depends on how many jobs/nodes you run. On a 4-node system you would
be fine running something like torque with pbs_sched. If your
requirements become higher I'd recommend Maui, and for those complex
environments with many cores/nodes I'd recommend Moab. The benefit is
ease of use: the more jobs you run, the harder it is to manage those
jobs.

>
> 3) How do you differentiate between a cluster scheduler and a cluster
> resource manager?
>

One schedules; the other gives back information about what resources are
available.

>
> 4) If there is any significant difference between a scheduler and a manager
> then please tell me which of these fall in which category:
>
> OpenPBS, PBS Professional, SGE, Maui, Moab, Torque, Scyld, LSF, SLURM etc.
>

Torque = Resource Manager (RM) w/ basic scheduling
PBS = RM w/ some scheduling functionality
SGE = RM w/ some scheduling functionality
Maui = Scheduler
Moab = Scheduler + more
OpenPBS = Use Torque
LSF = RM w/ some scheduling functionality
SLURM = RM w/ basic scheduling

A resource manager manages resources, whereas a scheduler schedules
these resources. Although something like the torque resource manager
(OpenPBS) has pbs_sched, a FIFO scheduler, it is still inadequate in
most environments and you would need a scheduler such as Maui or Moab
to schedule it.

>
> 5) What is meant by "PBS/SGE/LSF supports integration with the Maui
> scheduler"?
>

You have this mixed up with Moab. Moab can talk to all of these resource
managers and give you a single point of job submission/administration
over all resource managers.

Cluster Resources provide free support to eval Moab, which can be quite
handy:
http://www.clusterresources.com/pages/products/evaluate.php

>
> Precise, easy and brief reply requested. Thanks to all.
>
>

--
------------------------------
Christopher Vaughan
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080109/8892874d/attachment.html

From tom.elken at qlogic.com Fri Jan 11 10:13:58 2008
From: tom.elken at qlogic.com (Tom Elken)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] quad-socket opteron memory performance
In-Reply-To: <1199722449.13428.40.camel@e521.site>
References: <1199722449.13428.40.camel@e521.site>
Message-ID: <6DB5B58A8E5AB846A7B3B3BFF1B4315A019A88D7@AVEXCH1.qlogic.org>

> -----Original Message-----
> From: beowulf-bounces@beowulf.org
> [mailto:beowulf-bounces@beowulf.org] On Behalf Of John Leidel
> Sent: Monday, January 07, 2008 8:14 AM
> To: beowulf@beowulf.org
> Subject: [Beowulf] quad-socket opteron memory performance
>
> Does anyone have any recent memory performance numbers [specifically
> latency] from the quad-socket Opterons?

On a quad-socket system from a major system vendor with 4x Opteron 2218
(Rev. F, dual-core, 2.6 GHz), I measure
90 - 92 nsec for memory latency,
5.5 GB/s for serial STREAM performance, and
17 GB/s for OpenMP STREAM w/ 8 threads, on 4 sockets.

-Tom

>
> --john
>

From orion at cora.nwra.com Fri Jan 11 10:19:41 2008
From: orion at cora.nwra.com (Orion Poplawski)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] SSE4 benefits?
Message-ID: <4787B33D.5050905@cora.nwra.com>

Does anyone have a feel for what the benefits of SSE4 are? What kinds of
codes and compilers take advantage of it?

Thanks!
--
Orion Poplawski
Technical Manager                     303-415-9701 x222
NWRA/CoRA Division                    FAX: 303-415-9702
3380 Mitchell Lane                  orion@cora.nwra.com
Boulder, CO 80301              http://www.cora.nwra.com

From jpilldev at gmail.com Sun Jan 13 17:11:54 2008
From: jpilldev at gmail.com (J Pill)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] Problem with a simple MPI Program
Message-ID:

Hello.

I'm trying to run a simple hello world program:

#include "mpi.h"
#include <stdio.h>

int main (argc, argv)
int argc;
char **argv;
{
    MPI_Init (&argc, &argv);
    printf ("hello word\n");
    MPI_Finalize();
    return 0;
}

I compile with mpicc and there's no problem, but when I try to run it
with mpiexec or mpirun I get the following:

$ mpirun -np 2 hello
problem with execution of hello on DebianJPill: [Errno 2] No such file or directory
problem with execution of hello on DebianJPill: [Errno 2] No such file or directory

But running the generated file directly works without a problem.

$ ./hello

What am I doing wrong?

thanks a lot
-------------- next part --------------
An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080113/5c107457/attachment.html

From mengkuan at sxven.com Sun Jan 13 20:03:10 2008
From: mengkuan at sxven.com (Meng Kuan)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] VMC - Virtual Machine Console
Message-ID:

Greetings,

I would like to announce the availability of VMC (Virtual Machine
Console). VMC is an attempt to provide an opensource, web-based VM
management infrastructure. It uses libvirt as the underlying library
to manage para-virtualized Xen VMs. In time we intend to scale this to
manage VM clusters running HPC applications.

You can find out more on our "Introduction to VMC" page:

http://www.sxven.com/vmc

List of current features and future plans:

http://www.sxven.com/vmc/features

To get started, we have made available a "VMC Install" document:

http://www.sxven.com/vmc/gettingstarted

We invite people to take a look at VMC and tell us what you like and
what you don't like.
If you have any problems, questions or suggestions please feel free to contact us at dev@sxven.com or post them on our forum: http://forum.sxven.com/ Best regards, Meng Kuan From csamuel at vpac.org Sun Jan 13 22:17:07 2008 From: csamuel at vpac.org (Chris Samuel) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] Problem with a simple MPI Program In-Reply-To: Message-ID: <1801047904.42861200291427379.JavaMail.root@zimbra.vpac.org> ----- "J Pill" wrote: > Hello. Hiya, > I compile with mpicc and there's no problem, but when i try to run > with mpiexec or mpirun y have the folliwing: > > $ mpirun -np 2 hello > problem with execution of hello on DebianJPill: [Errno 2] No such file > or directory > problem with execution of hello on DebianJPill: [Errno 2] No such file > or directory You probably just need to do: mpirun -np 2 ./hello (or with the full path) to make the location explicit as it will just be searching your $PATH otherwise (and you don't want . in your $PATH).. cheers, Chris -- Christopher Samuel - (03) 9925 4751 - Systems Manager The Victorian Partnership for Advanced Computing P.O. Box 201, Carlton South, VIC 3053, Australia VPAC is a not-for-profit Registered Research Agency From jac67 at georgetown.edu Mon Jan 14 07:26:08 2008 From: jac67 at georgetown.edu (Jess Cannata) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] Off Topic: HPC Training Message-ID: <478B7F10.3010603@georgetown.edu> We have some upcoming HPC/Beowulf Systems Administration training courses. We will be also be holding an Advanced Sun Grid Engine class in the next couple of months. For more information, see the following link: http://www.gridswatch.com/index.php?option=com_content&task=view&id=25&Itemid=16 -- Jess Cannata Advanced Research Computing Georgetown University 202-687-3661 From mwill at penguincomputing.com Mon Jan 14 13:51:05 2008 From: mwill at penguincomputing.com (Michael Will) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] Why need of a scheduler?? 
In-Reply-To: <216ee070801090807m6d0abb29n66f35ad126e07ff5@mail.gmail.com> Message-ID: <433093DF7AD7444DA65EFAFE3987879C5ABA26@orca.penguincomputing.com>

If you only run your application one at a time interactively, then you don't need to deal with the overhead and complexity of a scheduler. However, if you are planning to batch queue up a few runs or several different applications, then it might be worthwhile to read up on torque/maui/moab and the like.

You mentioned Scyld in your question below, which within the categories you were interested in is basically a resource manager which comes with torque prebundled to allow scheduling and batch queueing. Moab/Taskmaster is then the add-on module to allow more complex scheduling.

Michael

________________________________
From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org] On Behalf Of Chris Vaughan Sent: Wednesday, January 09, 2008 8:07 AM To: amjad ali Cc: beowulf@beowulf.org Subject: Re: [Beowulf] Why need of a scheduler??

On Nov 29, 2007 1:14 PM, amjad ali wrote:

Hello all, I want to develop and run my parallel code (MPI based) on a Beowulf cluster. I have no problem as such that many users might log on to the cluster simultaneously. Suppose that I am free to use the cluster dedicatedly for my single parallel application.

1) Do I really need a cluster scheduler installed on the cluster? Should I use a scheduler?

Yes, it makes things easier to control and keep track of.

2) Is there any effect/benefit on the running of a parallel code with or without a cluster job scheduler?

It depends how many jobs/nodes you run. On a 4 node system you would be fine running something like torque with pbs_sched. If your requirements become higher I'd recommend Maui, and for those complex environments with many cores/nodes I'd recommend Moab. The benefit is ease of use: the more jobs you run, the harder it is to manage those jobs.

3) How do you differentiate between a cluster scheduler and a cluster resource manager?

One schedules, the other gives back information about what resources are available.

4) If there is any significant difference between a scheduler and a manager then please tell me which of these fall into which category: OpenPBS, PBS Professional, SGE, Maui, Moab, Torque, Scyld, LSF, SLURM etc.

Torque = Resource Manager (RM) w/basic scheduling
PBS = (RM) w/some scheduling functionality
SGE = (RM) w/some scheduling functionality
Maui = Scheduler
Moab = Scheduler + More
OpenPBS = Use Torque
LSF = (RM) w/some scheduling functionality
SLURM = (RM) w/basic scheduling

A resource manager manages resources, whereas a scheduler schedules these resources. Although something like the torque resource manager (OpenPBS) has pbs_sched, a FIFO scheduler, it is still inadequate in most environments and you would need a scheduler such as Maui or Moab to schedule it.

5) What is meant by "PBS/SGE/LSF supports integration with the Maui scheduler"?

You have this mixed up with Moab. Moab can talk to all of these resource managers and give you a single point of job submission/administration over all resource managers. Cluster Resources provide free support to eval Moab, which can be quite handy: http://www.clusterresources.com/pages/products/evaluate.php

Precise, easy and brief reply requested. Thanks to all.

-- ------------------------------ Christopher Vaughan

-------------- next part -------------- An HTML attachment was scrubbed...
URL: http://www.scyld.com/pipermail/beowulf/attachments/20080114/3a5a58c6/attachment.html

From deadline at eadline.org Wed Jan 16 05:19:02 2008 From: deadline at eadline.org (Douglas Eadline) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: References: Message-ID: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org>

Your project looks interesting and I like the idea of VMs; however, I have not seen a good answer to the fact that VM = layers and in HPC layers = latency. Any thoughts? Also, is it open source?

-- Doug

From geoff at galitz.org Wed Jan 16 05:39:02 2008 From: geoff at galitz.org (Geoff) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> Message-ID:

I certainly cannot speak for the VMC project, but application migration and fault tolerance (the primary benefits of VMs other than easy access to heterogeneous environments) are always going to result in a performance hit of some kind. You cannot expect to do more things with no overhead. There is great value in introducing HA concepts into an HPC cluster depending on the goals and configuration of the cluster in question (as always).

I cannot count the number of times a long running job (weeks) crashed, bumming me out as a result, even with proper checkpointing routines integrated into the code and/or system.

As a funny aside, I once knew a sysadmin who applied 24 hour timelimits to all queues of all clusters he managed in order to force researchers to think about checkpoints and smart restarts. I couldn't understand why so many folks from his particular unit kept asking me about arrays inside the scheduler submission scripts and nested commands until I found that out. Unfortunately I came to the conclusion that folks in his unit were spending more time writing job submission scripts than code... well... maybe that is an exaggeration.

-geoff

-- ------------------------------- Geoff Galitz, geoff@galitz.org Blankenheim, Deutschland

From deadline at eadline.org Wed Jan 16 06:18:20 2008 From: deadline at eadline.org (Douglas Eadline) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> Message-ID: <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org>

I get the desire for fault tolerance etc. and I like the idea of migration. It is just that many HPC people have spent careers getting applications/middleware as close to the bare metal as possible. The whole VM concept seems orthogonal to this goal. I'm curious how people are approaching this problem.
-- Doug

From mengkuan at sxven.com Wed Jan 16 06:31:10 2008 From: mengkuan at sxven.com (Meng Kuan) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> Message-ID:

On Jan 16, 2008 9:19 PM, Douglas Eadline wrote: > While your project looks interesting and I like the idea of > VMs, however I have not seen a good answer to the fact that VM = layers > and in HPC layers = latency. Any thoughts?
Also, is it open source? We performed some benchmark testing with linpack and bonnie++ on the VM and on the physical host. For para-virtualized VMs, the linpack performance is on par with the physical host. However, for bonnie++ tests, para-virtualized VMs fell way behind physical host's performance. In short, CPU-bound and memory intensive HPC apps should do ok but not IO-intensive apps. More testing and fine-tuning will probably be needed to see how far we can push the VM in terms of IO-intensive operations but we are hoping that in time to come virtualization technologies will be able to narrow that gap. And yes, the VMC application is open source. You can find the download links in the VMC Install document. Regards, Meng Kuan From apittman at concurrent-thinking.com Wed Jan 16 06:35:57 2008 From: apittman at concurrent-thinking.com (Ashley Pittman) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> Message-ID: <1200494157.11752.19.camel@bruce.priv.wark.uk.streamline-computing.com> On Wed, 2008-01-16 at 09:18 -0500, Douglas Eadline wrote: > I get the desire for fault tolerance etc. and I like the idea > of migration. It is just that many HPC people have spent > careers getting applications/middleware as close to the bare > metal as possible. The whole VM concept seems orthogonal to > this goal. I'm curious how people are approaching this > problem. There was a paper on this at SC, I don't know if you caught it... 
http://sc07.supercomputing.org/schedule/event_detail.php?evid=11066 If I was to try and sum it up in one paragraph it would be: "The advantages of virtulisation are obvious but for some reason the HPC community have been slow to reap these benefits, we predict that this is because of a perception that the performance of comms and VM operations suffers when virtulised. This is true however we have demonstrated that with months of work this performance loss could be minimised such that instead of slowing down performance a lot it would only slow down performance a bit." I think progress is being made on the comms front, both in terms of raw numbers (bandwidth/latency) but also in reducing CPU usage but we are still a long way from it being widely used. Ashley, From rgb at phy.duke.edu Wed Jan 16 06:55:28 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> Message-ID: On Wed, 16 Jan 2008, Douglas Eadline wrote: > > I get the desire for fault tolerance etc. and I like the idea > of migration. It is just that many HPC people have spent > careers getting applications/middleware as close to the bare > metal as possible. The whole VM concept seems orthogonal to > this goal. I'm curious how people are approaching this > problem. As previously noted, however, YMMV and one size does not fit all. There are two distinct ways of managing the heterogeneous environments that some cluster applications might require. One is indeed the creation of VMs -- running an extremely thin toplevel operating system that does little else but to run the host VM and respond to provisioning requests, as is the case in many corporate HA environments. 
The other is to create a similar provisioning system that works at the level of e.g. grub and/or PXE to provide the ability to easily boot a node into a unique environment that might last only for the duration of a particular computation. Neither is particularly well supported in current clustering, although projects for both have been around for some time (Duke's Cluster On Demand project and wulfware being examples of one, Xen and various VMs as examples of the other). There are plenty of parallel chores that are tolerant of poor latency -- the whole world of embarrassingly parallel computations plus some extension up into merely coarse grained, not terribly synchronous real parallel computations. Remember, people did parallel computation effectively with 10Base ethernet for many years (more than a decade) before 100Base came along, and cluster nodes would now ROUTINELY be provisioned with at least 1000Base. Even a 1000Base VM is going to have better latency in most cases than a 10Base ever did on the old hardware it ran on, and it might well compete with early 100Base latencies. It isn't exactly like running in a VM is going to cripple all code. VMs can also be wonderful for TEACHING clustering and for managing "political" problems. In many environments there are potential nodes with lots of spare cycles that "have to run Windows" 24x7 and have a Windows console available at the desktop at all times (and thus cannot be dual booted) but which CAN run e.g. VMware and an "instant node" VM under Windows. Having any sort of access to a high-latency Linux VM node running on a Windows box beats the hell out of having no node at all or having to port one's code to work under Windows. 
We can therefore see that there are clearly environments where the bulk of the work being done is latency tolerant and where VMs may well have benefits in administration and security and fault tolerance and local politics that make them a great boon in clustering, just as there are without question computations for which latency is the devil and any suggestion of adding a layer of VM latency on top of what is already inherent to the device and minimal OS will bring out the peasants with pitchforks and torches. Multiboot systems, via grub and local provisioning or PXE and remote (e.g. NFS) provisioning, are also useful but are not always politically possible or easy to set up.

It is my hope that folks working on both sorts of multienvironment provisioning and sysadmin environments work hard and produce spectacular tools. I've done way more work than I care to setting up both of these sorts of things. It is not easy, and requires a lot of expertise. Hiding this detail and expertise from the user would be a wonderful contribution to practical clustering (and of course useful in the HA world as well).

rgb

-- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C.
27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From gerry.creager at tamu.edu Wed Jan 16 07:25:13 2008 From: gerry.creager at tamu.edu (Gerry Creager) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <1200494157.11752.19.camel@bruce.priv.wark.uk.streamline-computing.com> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <1200494157.11752.19.camel@bruce.priv.wark.uk.streamline-computing.com> Message-ID: <478E21D9.60900@tamu.edu> Ashley Pittman wrote: > On Wed, 2008-01-16 at 09:18 -0500, Douglas Eadline wrote: >> I get the desire for fault tolerance etc. and I like the idea >> of migration. It is just that many HPC people have spent >> careers getting applications/middleware as close to the bare >> metal as possible. The whole VM concept seems orthogonal to >> this goal. I'm curious how people are approaching this >> problem. > > There was a paper on this at SC, I don't know if you caught it... > > http://sc07.supercomputing.org/schedule/event_detail.php?evid=11066 > > If I was to try and sum it up in one paragraph it would be: > > "The advantages of virtulisation are obvious but for some reason the HPC > community have been slow to reap these benefits, we predict that this is > because of a perception that the performance of comms and VM operations > suffers when virtulised. This is true however we have demonstrated that > with months of work this performance loss could be minimised such that > instead of slowing down performance a lot it would only slow down > performance a bit." > > I think progress is being made on the comms front, both in terms of raw > numbers (bandwidth/latency) but also in reducing CPU usage but we are > still a long way from it being widely used. 
I'm constantly reminded of a meeting early on in the SCOOP project, which I participate in (http://scoop.sura.org). "We're able to virtualize our model applications using VMware and only see a 13% performance hit". Note that, at this time I was tweaking for ms upgrades in MPI communications.... We need to look at virtualization as a means of mitigating, on a heterogeneous hardware environment, the concept of porting to every different available machine type. In other words, I think that for a grid environment, we might see a lot of benefit for virtualization but for a local, homogeneous, cluster, it's less an issue. By the way: In order to compensate for their "13%" degradation, I had to nearly double the number of virtual nodes over real nodes to get the same performance data. That's "expensive" but very do-able on a grid environment. gerry -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From laytonjb at charter.net Wed Jan 16 07:31:11 2008 From: laytonjb at charter.net (Jeffrey B. Layton) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> Message-ID: <478E233F.9080103@charter.net> Douglas Eadline wrote: > I get the desire for fault tolerance etc. and I like the idea > of migration. It is just that many HPC people have spent > careers getting applications/middleware as close to the bare > metal as possible. The whole VM concept seems orthogonal to > this goal. I'm curious how people are approaching this > problem. > Like many things, the devil is in the details. 
While I don't want to be as prodigious as rgb, I want to mention a few things and ask some questions:

- With multi-core processors, to get the best performance you want to assign a process to a core. But this can cause problems when moving a process or creating a checkpoint. For example, VMware explicitly tells you not to do this. While I can't state their position, in general the idea is that restarting a check-pointed VM may have problems when a process is pinned to a core (even more so if the CPU is different). Also, moving a pinned process to another node may cause problems if the nodes are different in pretty much any way (it may also be affected by what's on the new node).
- As Ashley pointed out, the network aspect is still very problematic. Getting good performance out of a NIC in a VM is not easy and, from what I understand, difficult or impossible to do with multi-core nodes. (I would love to hear if someone has gotten very good performance out of a NIC in a VM when other VMs are also using the same NIC. Please give as many details as possible.)
- As Meng mentioned, IO is still problematic (I think for the same reasons that interconnects are).
- I haven't seen any benchmarks run in VMs using several nodes with an interconnect. Does anyone know of any?
- Has anyone tried moving processes around to different nodes for an MPI job? I'm curious what they found.

I would like to see virtualization take off in HPC, but I have to see a few demos of things working and I need to see reasons why I should adopt it. Right now I don't relish taking my "High" Performance Computing system and turning it into "Kind-of-High" Performance Computing because it would allow non-code specific checkpointing or movement of processes. Losing 10% in performance, for example, in HPC is a big deal, and I haven't yet seen benefits of virtualization that justify giving up the 10% (I'm dying to be shown to be wrong though).
The only aspect of virtualization that could make some sense in HPC is what rgb mentioned - allowing the user to select and OS as part of their job and installing or tearing down the OS as part of the job. I can see this being very useful if the details could be worked out (I know there are people working on it but I haven't seen any large demonstrations of it yet and I would really like to see such a beastie). Anyway, my 2 cents (and probably my last since this topic falls under Landman's Rule: of flammability). Jeff From landman at scalableinformatics.com Wed Jan 16 08:26:14 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E233F.9080103@charter.net> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> Message-ID: <478E3026.2030206@scalableinformatics.com> Jeffrey B. Layton wrote: > Anyway, my 2 cents (and probably my last since this topic falls under > Landman's Rule: of flammability). uh... er ... uh .... huh ? Hey ... the coffee hasn't quite kicked in yet, and we have been pounding out DragonFly code (and it is working ... woo hoo! Jobs submit and all that) ... I saw the VMC bit and decided it wasn't worth spending time talking about it as Doug, Jeff, RGB, and others would pound it into the dirt^H^H^H^H^H^H discuss the salient aspects ... yeah thats the ticket. That and I was busy. -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From Michael.Frese at NumerEx.com Wed Jan 16 08:32:49 2008 From: Michael.Frese at NumerEx.com (Michael H. 
Frese) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E233F.9080103@charter.net> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> Message-ID: <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> At 08:31 AM 1/16/2008, Jeffrey B. Layton wrote: >- With multi-core processors, to get the best performance you want to > assign a process to a core. Excuse my ignorance, please, but can someone tell me how to do that on Linux (2.6 kernels would be fine)? The kernel scheduler -- as opposed to a cluster scheduler -- is a complete black box as far as I know. While I am it, where do I find a minimal list of processes necessary to run a cluster node. I can't see any reason to run the PC Smart Card demon, pcscd, but I don't know what else I can pitch. Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080116/fcf9cd66/attachment.html From Michael.Frese at NumerEx.com Wed Jan 16 08:50:20 2008 From: Michael.Frese at NumerEx.com (Michael H. Frese) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E3415.6040208@charter.net> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> <478E3415.6040208@charter.net> Message-ID: <6.2.5.6.2.20080116095006.04ed6cc8@NumerEx.com> Cool. Thanks. Mike At 09:43 AM 1/16/2008, Shannon V. Davidson wrote: >Michael H. Frese wrote: >>At 08:31 AM 1/16/2008, Jeffrey B. Layton wrote: >>>- With multi-core processors, to get the best performance you want to >>> assign a process to a core. >> >>Excuse my ignorance, please, but can someone tell me how to do that >>on Linux (2.6 kernels would be fine)? 
> >sched_setaffinity(2) >taskset(1) >numactl(1) > >> >>The kernel scheduler -- as opposed to a cluster scheduler -- is a >>complete black box as far as I know. >> >>While I am it, where do I find a minimal list of processes >>necessary to run a cluster node. I can't see any reason to run the >>PC Smart Card demon, pcscd, but I don't know what else I can pitch. >> >> >>Mike >> >> >> >> >> >>_______________________________________________ >>Beowulf mailing list, Beowulf@beowulf.org >>To change your subscription (digest mode or unsubscribe) visit >>http://www.beowulf.org/mailman/listinfo/beowulf >> -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080116/02ffec2a/attachment.html From bill at platform.com Wed Jan 16 08:53:39 2008 From: bill at platform.com (Bill Bryce) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console Message-ID: Try the man pages for the taskset command on Linux 2.6 machine. There are also system calls sched_setaffinity() and sched_getaffinity() Regards, Bill. -----Original Message----- From: beowulf-bounces@beowulf.org [mailto:beowulf-bounces@beowulf.org]On Behalf Of Michael H. Frese Sent: January 16, 2008 11:33 AM To: "Jeffrey B. Layton"laytonjb@charter.net Cc: beowulf@beowulf.org Subject: Re: [Beowulf] VMC - Virtual Machine Console At 08:31 AM 1/16/2008, Jeffrey B. Layton wrote: - With multi-core processors, to get the best performance you want to assign a process to a core. Excuse my ignorance, please, but can someone tell me how to do that on Linux (2.6 kernels would be fine)? The kernel scheduler -- as opposed to a cluster scheduler -- is a complete black box as far as I know. While I am it, where do I find a minimal list of processes necessary to run a cluster node. I can't see any reason to run the PC Smart Card demon, pcscd, but I don't know what else I can pitch. 
Mike -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080116/e795bb9a/attachment.html From anandvaidya.ml at gmail.com Wed Jan 16 07:21:50 2008 From: anandvaidya.ml at gmail.com (Anand Vaidya) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] Question on COAMPS, WRF and NHM Message-ID: <200801162321.50928.anandvaidya.ml@gmail.com> We are in the process of acquiring a new cluster for running weather modelling software viz. NRL COAMPS, WRF and NHM (Japan) We are currently running COAMPS on a Cluster of 50+ GigE and dual socket DC Opterons, NFS, CentOS4, RAM size=1GB/core, the performance seems to be limited by I/O (network I/O primarily). The performance flattens out at about 32CPU. Looking at the budget, current hardware availability, we have narrowed down to dual socket Intel Quad Cores, with 2GB/core and DDR infiniband, and CentOS 5.x, OpenMPI 1.2.x, SGE 6.x (Or maybe we will buy faster D-DC AMDs) We did enquire with the organizations regarding suitability of these, they could only offer limited help (understandably, the orgs may not be running the configs we intend to buy) I do understand that factors such as grid size etc play a role. I am right now looking at gross factors before getting into actual test runs with different configs. I would like to to whether any users of the aforementioned software can help answer the following questions: - Does memory bandwidth (STREAMS?) have a significant impact? (Intel shared bus -vs- AMD's dedicated interconnect), since QCs worsen the shared bus loading - Is infiniband worth it? (NRL seems to think it does enhance performance), however no additional details are available. - Is a parallel filesystem (eg: Lustre, GPFS, GFS) vs NFS Regards Anand From svdavidson at charter.net Wed Jan 16 08:43:01 2008 From: svdavidson at charter.net (Shannon V. 
Davidson) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> Message-ID: <478E3415.6040208@charter.net> Michael H. Frese wrote: > At 08:31 AM 1/16/2008, Jeffrey B. Layton wrote: >> - With multi-core processors, to get the best performance you want to >> assign a process to a core. > > Excuse my ignorance, please, but can someone tell me how to do that on > Linux (2.6 kernels would be fine)? sched_setaffinity(2) taskset(1) numactl(1) > > The kernel scheduler -- as opposed to a cluster scheduler -- is a > complete black box as far as I know. > > While I am it, where do I find a minimal list of processes necessary > to run a cluster node. I can't see any reason to run the PC Smart > Card demon, pcscd, but I don't know what else I can pitch. > > > Mike > > ------------------------------------------------------------------------ > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf > -------------- next part -------------- An HTML attachment was scrubbed... URL: http://www.scyld.com/pipermail/beowulf/attachments/20080116/91a95e1d/attachment.html From landman at scalableinformatics.com Wed Jan 16 09:09:55 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> Message-ID: <478E3A63.1090703@scalableinformatics.com> Meng Kuan wrote: > We performed some benchmark testing with linpack and bonnie++ on the > VM and on the physical host. 
For para-virtualized VMs, the linpack
> performance is on par with the physical host. However, for bonnie++
> tests, para-virtualized VMs fell way behind physical host's
> performance. In short, CPU-bound and memory intensive HPC apps should
> do ok but not IO-intensive apps. More testing and fine-tuning will
> probably be needed to see how far we can push the VM in terms of
> IO-intensive operations but we are hoping that in time to come
> virtualization technologies will be able to narrow that gap.

Hi Meng:

Not to ignite flammable substances here ... but there are a few hallmarks of HPC applications. One of those is "beating the heck out of a specific available resource". Extra layers only add to this.

What I want is thunking-free VMs. It would be really nice to take an 8-core workstation/server, run our base OS on one or two cores, and run other OSes on the other cores. The problem is that this is not easy to do with today's commodity hardware. Moreover, you pay a (sometimes huge) performance penalty for doing this, as you have single points of information flow (SPIF). These SPIFs are anathema to HPC. They are rate limiting. They can increase contention and latency and decrease effective bandwidth.

I like the idea of VMs for services that need HA, and for OSes like Windows that need a safe place to run in.

HPC apps will stress one or the other portion of the machine. They will beat on the memory bandwidth in some cases, which is why, despite AMD Opterons of old (single/dual core) having a disadvantage in computational performance to older Xeons of Woodcrest derivation, they are still faster on specific memory-bound problems and code (that second memory bus is hard to beat).

That said, and the point of this, is that many HPC apps are rapidly becoming IO bound, as they need to move ginormous (meaning really large) amounts of data to and from disk, and MPI codes usually need to move data at the lowest latency possible.
There, VMs that negatively impact IO performance (bandwidth/latency) will be problematic.

What would be interesting is a VM OS bypass for IO: the VM talks directly to hardware. Not sure it is possible though, unless you are using a hypervisor and a thin VM (OpenVZ?).

Just some thoughts, hopefully not all that flammable (Jeff, what is that rule? I am being asked, and I don't have an answer ...)

--
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman@scalableinformatics.com
web  : http://www.scalableinformatics.com
       http://jackrabbit.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 866 888 3112
cell : +1 734 612 4615

From gerry.creager at tamu.edu Wed Jan 16 09:39:24 2008
From: gerry.creager at tamu.edu (Gerry Creager)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] Question on COAMPS, WRF and NHM
In-Reply-To: <200801162321.50928.anandvaidya.ml@gmail.com>
References: <200801162321.50928.anandvaidya.ml@gmail.com>
Message-ID: <478E414C.10700@tamu.edu>

No experience running COAMPS, but for WRF I think your proposed system will work well. Memory bandwidth will play a role in performance, but file IO will also. Infiniband _is_ worth the cost/effort. I'd strongly recommend Lustre/Gluster or GFS over NFS for this.

gerry

Anand Vaidya wrote:
> We are in the process of acquiring a new cluster for running weather modelling
> software viz. NRL COAMPS, WRF and NHM (Japan)
>
> We are currently running COAMPS on a Cluster of 50+ GigE and dual socket DC
> Opterons, NFS, CentOS4, RAM size=1GB/core, the performance seems to be
> limited by I/O (network I/O primarily). The performance flattens out at about
> 32CPU.
> > Looking at the budget, current hardware availability, we have narrowed down to > dual socket Intel Quad Cores, with 2GB/core and DDR infiniband, and CentOS > 5.x, OpenMPI 1.2.x, SGE 6.x (Or maybe we will buy faster D-DC AMDs) > > We did enquire with the organizations regarding suitability of these, they > could only offer limited help (understandably, the orgs may not be running > the configs we intend to buy) > > I do understand that factors such as grid size etc play a role. I am right now > looking at gross factors before getting into actual test runs with different > configs. > > I would like to to whether any users of the aforementioned software can help > answer the following questions: > > - Does memory bandwidth (STREAMS?) have a significant impact? (Intel shared > bus -vs- AMD's dedicated interconnect), since QCs worsen the shared bus > loading > > - Is infiniband worth it? (NRL seems to think it does enhance performance), > however no additional details are available. > > - Is a parallel filesystem (eg: Lustre, GPFS, GFS) vs NFS > > Regards > Anand > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf -- Gerry Creager -- gerry.creager@tamu.edu Texas Mesonet -- AATLT, Texas A&M University Cell: 979.229.5301 Office: 979.862.3982 FAX: 979.862.3983 Office: 1700 Research Parkway Ste 160, TAMU, College Station, TX 77843 From lindahl at pbm.com Wed Jan 16 09:53:46 2008 From: lindahl at pbm.com (Greg Lindahl) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> Message-ID: <20080116175346.GA18703@bx9.net> > >- With 
multi-core processors, to get the best performance you want to > > assign a process to a core. > > Excuse my ignorance, please, but can someone tell me how to do that > on Linux (2.6 kernels would be fine)? Use an MPI which does this for you? Two examples are InfiniPath MPI and OpenMPI. -- greg From laytonjb at charter.net Wed Jan 16 10:08:12 2008 From: laytonjb at charter.net (Jeffrey B. Layton) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E3A63.1090703@scalableinformatics.com> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3A63.1090703@scalableinformatics.com> Message-ID: <478E480C.5020503@charter.net> Joe Landman wrote: > Just some thoughts, hopefully not all that flammable (Jeff, what is > that rule? I am being asked, and I don't have an answer ...) Rule: (Theorem) Anything that appears to be flame-bait, actually is. Corollary: Not matter what you say, no matter how much experience you have, no matter how much evidence you have, someone will always either: (a) violently disagree with you to their death bed, inviting more posts on the subject or any other subject that appears to be flame bait. -or- (b) Misunderstand everything and post something worthless possibly inviting more posts on the subject or any other subject that appears to be flame bait. From landman at scalableinformatics.com Wed Jan 16 10:28:03 2008 From: landman at scalableinformatics.com (Joe Landman) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E480C.5020503@charter.net> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3A63.1090703@scalableinformatics.com> <478E480C.5020503@charter.net> Message-ID: <478E4CB3.5090202@scalableinformatics.com> Jeffrey B. Layton wrote: > Joe Landman wrote: >> Just some thoughts, hopefully not all that flammable (Jeff, what is >> that rule? I am being asked, and I don't have an answer ...) 
> Rule: (Theorem) > Anything that appears to be flame-bait, actually is. Ahhh.... I wonder if we can say "flame-bait is isomorphic to text editor wars, c.f. vi vs emacs". > Corollary: > Not matter what you say, no matter how much experience > you have, no matter how much evidence you have, someone > will always either: > (a) violently disagree with you to their death bed, inviting more > posts on the subject or any other subject that appears to be flame > bait. > -or- > (b) Misunderstand everything and post something worthless > possibly inviting more posts on the subject or any other subject > that appears to be flame bait. Heh... -- Joseph Landman, Ph.D Founder and CEO Scalable Informatics LLC, email: landman@scalableinformatics.com web : http://www.scalableinformatics.com http://jackrabbit.scalableinformatics.com phone: +1 734 786 8423 fax : +1 866 888 3112 cell : +1 734 612 4615 From rgb at phy.duke.edu Wed Jan 16 15:09:47 2008 From: rgb at phy.duke.edu (Robert G. Brown) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E480C.5020503@charter.net> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3A63.1090703@scalableinformatics.com> <478E480C.5020503@charter.net> Message-ID: On Wed, 16 Jan 2008, Jeffrey B. Layton wrote: Dear Jeff: > Joe Landman wrote: >> Just some thoughts, hopefully not all that flammable (Jeff, what is that >> rule? I am being asked, and I don't have an answer ...) > Rule: (Theorem) > Anything that appears to be flame-bait, actually is. > > Corollary: > Not matter what you say, no matter how much experience > you have, no matter how much evidence you have, someone > will always either: > (a) violently disagree with you to their death bed, inviting more > posts on the subject or any other subject that appears to be flame > bait. As I sit here in my comfortable bed experiencing severe chest pain, I have to tell you that you are wrong, wrong, wrong. 
This is not what flame-bait is. I may have to cut you with a knife. > -or- > (b) Misunderstand everything and post something worthless > possibly inviting more posts on the subject or any other subject > that appears to be flame bait. Flame bait (as all proper fishermen know) is what you get when you spill your glass of straight Everclear into the worms and then "accidentally" knock the coal of your cigar in on top as you sway gently from side to side in the boat, waiting for fish to bite. It's a variant of stink bait and cut bait -- fish don't bite on them much either. In fact, fish don't bite much. But they REALLY don't bite on flame bait. Only cluster-fanatics bite on flame bait. Usually after a tall, cool glass of Everclear and a cigar... rgb > > > > _______________________________________________ > Beowulf mailing list, Beowulf@beowulf.org > To change your subscription (digest mode or unsubscribe) visit > http://www.beowulf.org/mailman/listinfo/beowulf > -- Robert G. Brown Phone(cell): 1-919-280-8443 Duke University Physics Dept, Box 90305 Durham, N.C. 27708-0305 Web: http://www.phy.duke.edu/~rgb Book of Lilith Website: http://www.phy.duke.edu/~rgb/Lilith/Lilith.php Lulu Bookstore: http://stores.lulu.com/store.php?fAcctID=877977 From mengkuan at sxven.com Wed Jan 16 18:31:10 2008 From: mengkuan at sxven.com (Meng Kuan) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E3A63.1090703@scalableinformatics.com> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3A63.1090703@scalableinformatics.com> Message-ID: On Jan 17, 2008 1:09 AM, Joe Landman wrote: > That said, and the point of this is that many HPC apps are rapidly > becoming IO bound, as they need to move ginormous (meaning really large) > amounts of data to and from disk, and MPI codes usually need to move > data at the lowest latency possible. > > There VMs which negatively impact IO performance (bandwidth/latency) > will be problematic. 
> > What would be interesting is a VM OS bypass for IO. VM talk directly to > hardware. Not sure it is possible though, unless you are using a > hypervisor, and a thin VM (OpenVZ?). I believe Xen is working towards that. For instance, their latest release (Xen 3.2.0) has: - Preliminary PCI pass-through support (using appropriate Intel or AMD I/O-virtualisation hardware) I have read on the Xen lists that some folks have successfully increased network performance this way. OpenVZ is a possibility and it definitely is "thinner" than Xen in this aspect. This is why we are using the libvirt library which is starting to include support for containers like OpenVZ. > > Just some thoughts, hopefully not all that flammable (Jeff, what is > that rule? I am being asked, and I don't have an answer ...) Not at all. In fact, its great to hear and learn from you guys. Thanks! Regards, Meng Kuan From Craig.Tierney at noaa.gov Wed Jan 16 09:16:18 2008 From: Craig.Tierney at noaa.gov (Craig Tierney) Date: Sat Jul 19 01:06:48 2008 Subject: Time limits in queues (was: Re: [Beowulf] VMC - Virtual Machine Console) In-Reply-To: References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> Message-ID: <478E3BE2.8060301@noaa.gov> Geoff wrote: > ..Interesting discussion deleted.. > > As a funny aside, I once knew a sysadmin who applied 24 hour timelimits > to all queues of all clusters he managed in order to force researchers > to think about checkpoints and smart restarts. I couldn't understand > why so many folks from his particular unit kept asking me about arrays > inside the scheduler submission scripts and nested commends until I > found that out. Unfortunately I came to the conclusion that folks in > his unit were spending more time writing job submission scripts than > code... well... maybe that is an exaggeration. > Our queue limits are 8 hours. They are set this way for two reasons. 
First, we have real time jobs that need to get through the queues and we believe that allowing significantly longer jobs would block those really important jobs. Second, for a multi-user system, it isn't very fair for a user to run multi-day jobs and prevent shorter jobs from getting in. It is about being fair. Use the resource and then get back in line. I know that at other US Government facilities it is common practice to set sub-day queue limits. I recently helped setup one site that had queue limits set at 12 hours. Another large organization near the top of the top 500 list does this as well. This means that codes need check-pointing. Although we are all waiting for the holy grail of system level check-pointing, the odds of that being implemented consistently across architectures AND not have a significant performance hit is unlikely. This means that researchers have to also be software engineers. If they want to get real work done, adding check-pointing is one of the steps. As one operations manager at a major HPC site once said to me 'codes that don't support check-pointing aren't real codes'. Allowing users to run for days or weeks as SOP is begging for failure. Did that sysadmin who set 24 hour time limits ever analyze the amount of lost computational time because of larger time limits? Craig -- Craig Tierney (craig.tierney@noaa.gov) From nixon at nsc.liu.se Thu Jan 17 00:31:42 2008 From: nixon at nsc.liu.se (Leif Nixon) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] Re: Time limits in queues In-Reply-To: <478E3BE2.8060301@noaa.gov> (Craig Tierney's message of "Wed\, 16 Jan 2008 10\:16\:18 -0700") References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3BE2.8060301@noaa.gov> Message-ID: Craig Tierney writes: > Allowing users to run for days or weeks as SOP is begging for failure. Define failure. Our time limit is typically somewhere around 5 or 6 days. 
Many codes don't have checkpointing, and it's often simply not possible to add it because you don't have access to the source code. With backfill scheduling, short and narrow jobs typically don't have to wait *that* long, at least with the job mixture we see. -- Leif Nixon - Systems expert ------------------------------------------------------------ National Supercomputer Centre - Linkoping University ------------------------------------------------------------ From nixon at nsc.liu.se Thu Jan 17 00:52:02 2008 From: nixon at nsc.liu.se (Leif Nixon) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E21D9.60900@tamu.edu> (Gerry Creager's message of "Wed\, 16 Jan 2008 09\:25\:13 -0600") References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <1200494157.11752.19.camel@bruce.priv.wark.uk.streamline-computing.com> <478E21D9.60900@tamu.edu> Message-ID: Gerry Creager writes: > I'm constantly reminded of a meeting early on in the SCOOP project, > which I participate in (http://scoop.sura.org). "We're able to > virtualize our model applications using VMware and only see a 13% > performance hit". Oops. Please note that the VMware license agreement forbids the users to publish benchmark figures, unless the benchmark method has been cleared with VMware beforehand. 
-- Leif Nixon - Systems expert ------------------------------------------------------------ National Supercomputer Centre - Linkoping University ------------------------------------------------------------ From Hakon.Bugge at scali.com Thu Jan 17 02:09:59 2008 From: Hakon.Bugge at scali.com (=?iso-8859-1?Q?H=E5kon?= Bugge) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <200801162000.m0GK08wZ023932@bluewest.scyld.com> References: <200801162000.m0GK08wZ023932@bluewest.scyld.com> Message-ID: <20080117101001.0F05F35AEA2@mail.scali.no> At 21:00 16.01.2008, Greg Lindahl wrote: >Use an MPI which does this for you? > >Two examples are InfiniPath MPI and OpenMPI. .. and another is Scali MPI Connect. We do it in two dimensions; latency or bandwidth policy, that is to use as few or many sockets as possible. Once that is selected, the resolution can be defined as a hyperthread, core (all HTs constituting a core), socket (all cores constituting a socket), or a node (all sockets on a node). The resolution is important for hybrid application; on a dual-socket, quad-core system, you can specify bandwidth policy and socket resolution staring two MPI processes. The first rank will be bound to all the cores on the first socket, the second on all the cores on the other socket. Further, the decision on which cores/sockets to use is determined dynamically, so multiple MPI instances on the same node is supported. Thanks, Hakon From Bogdan.Costescu at iwr.uni-heidelberg.de Thu Jan 17 05:53:36 2008 From: Bogdan.Costescu at iwr.uni-heidelberg.de (Bogdan Costescu) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] Re: Time limits in queues In-Reply-To: <478E3BE2.8060301@noaa.gov> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3BE2.8060301@noaa.gov> Message-ID: On Wed, 16 Jan 2008, Craig Tierney wrote: > Our queue limits are 8 hours. > ... 
> Did that sysadmin who set 24 hour time limits ever analyze the amount > of lost computational time because of larger time limits? While I agree with the idea and reasons of short job runtime limits, I disagree with your formulation. Being many times involved in discussions about what runtime limits should be set, I wouldn't make myself a statement like yours; I would say instead: YMMV. In other words: choose what fits better the job mix that users are actually running. If you have determined that 8h max. runtime is appropriate for _your_ cluster and increasing it to 24h would lead to a waste of computational time due to the reliability of _your_ cluster, then you've done your job well. But saying that everybody should use this limit is wrong. Furthermore, although you mention that system-level checkpointing is associated with a performance hit, you seem to think that user-level checkpointing is a lot lighter, which is most often not the case. Apart from the obvious I/O limitations that could restrict saving & loading of checkpointing data, there are applications for which developers have chosen to not store certain data but recompute it every time it is needed because the effort of saving, storing & loading it is higher than the computational effort of recreating it - but this most likely means that for each restart of the application this data has to be recomputed. And smaller max. runtimes mean more restarts needed to reach the same total runtime... 
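The trade-off between shorter runtime limits and more restarts can be put into rough numbers. The sketch below is only a back-of-the-envelope model: the function names are hypothetical, and it assumes every job segment (including the first) spends the same fixed number of hours reloading or recomputing state before doing useful work.

```python
import math


def segments_needed(total_work_h, limit_h, restart_cost_h):
    """Number of queue submissions needed to finish `total_work_h` hours of
    useful computation when each job may run at most `limit_h` hours and
    loses `restart_cost_h` hours to reloading or recomputing state."""
    useful_per_job = limit_h - restart_cost_h
    if useful_per_job <= 0:
        raise ValueError("restart cost consumes the whole time limit")
    return math.ceil(total_work_h / useful_per_job)


def total_wall_hours(total_work_h, limit_h, restart_cost_h):
    """Wall-clock hours consumed: useful work plus one restart cost per job."""
    n = segments_needed(total_work_h, limit_h, restart_cost_h)
    return total_work_h + n * restart_cost_h
```

With made-up numbers (120 hours of work, half an hour lost per restart), an 8-hour limit needs 16 submissions and 128 wall-clock hours, while a 24-hour limit needs 6 submissions and 123 hours. The gap widens as the per-restart cost grows, and queue wait time between submissions is not modelled at all.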
-- Bogdan Costescu IWR - Interdisziplinaeres Zentrum fuer Wissenschaftliches Rechnen Universitaet Heidelberg, INF 368, D-69120 Heidelberg, GERMANY Telephone: +49 6221 54 8869, Telefax: +49 6221 54 8868 E-mail: Bogdan.Costescu@IWR.Uni-Heidelberg.De From dnlombar at ichips.intel.com Thu Jan 17 08:34:19 2008 From: dnlombar at ichips.intel.com (Lombard, David N) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] Re: Time limits in queues In-Reply-To: References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3BE2.8060301@noaa.gov> Message-ID: <20080117163419.GA27510@nlxdcldnl2.cl.intel.com> On Thu, Jan 17, 2008 at 02:53:36PM +0100, Bogdan Costescu wrote: > On Wed, 16 Jan 2008, Craig Tierney wrote: > > >Our queue limits are 8 hours. > >... > >Did that sysadmin who set 24 hour time limits ever analyze the amount > >of lost computational time because of larger time limits? > > While I agree with the idea and reasons of short job runtime limits, I > disagree with your formulation. Being many times involved in > discussions about what runtime limits should be set, I wouldn't make > myself a statement like yours; I would say instead: YMMV. In other > words: choose what fits better the job mix that users are actually > running. If you have determined that 8h max. runtime is appropriate > for _your_ cluster and increasing it to 24h would lead to a waste of > computational time due to the reliability of _your_ cluster, then > you've done your job well. But saying that everybody should use this > limit is wrong. Completely agree. > Furthermore, although you mention that system-level checkpointing is > associated with a performance hit, you seem to think that user-level > checkpointing is a lot lighter, which is most often not the case. Hmmm. A system level checkpoint must save the complete state of the process to be checkpointed plus all of its siblings/children plus varying amounts of external state; a machine level checkpoint must save complete machine(s) state. 
A user level checkpoint need only save the data that define the current state--that could well be a small set of values. Having written that, it may be *easier* (even cheaper) to expend the resources to save the complete state than to restructure some suitably complex code to expose a restart state. I certainly know an application that fits that model during most of its runtime. But, at the end of the day, that is just trading runtime for design/coding/validation time and the notion's validity depends on which side of the operation you sit. Consider this though, if as an admin you only rely on user- level checkpoint, you *will* end up with an argument from one or more users about the maximum runtime at some point; with a system (or machine) checkpoint, you'll likely avoid a lot of agida[1], especially when unplanned or emergency outages/reprioritzations occur. > Apart from the obvious I/O limitations that could restrict saving & > loading of checkpointing data, there are applications for which > developers have chosen to not store certain data but recompute it > every time it is needed because the effort of saving, storing & > loading it is higher than the computational effort of recreating it - > but this most likely means that for each restart of the application > this data has to be recomputed. And smaller max. runtimes mean more > restarts needed to reach the same total runtime... As you note, only the application can know that it's easier to recompute than save and restore. I suspect many of us can site specific examples where it's easier to recompute; some could probably also cite cases where recomputing is faster too... [1] Hearburn, indigestion, general upset or agitation. -- David N. Lombard, Intel, Irvine, CA I do not speak for Intel Corporation; all comments are strictly my own. 
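As an illustration of the point that a user-level checkpoint "need only save the data that define the current state", here is a minimal sketch of a restartable loop. Everything in it is hypothetical: the toy running-sum "model", the JSON file format, and the function names are assumptions chosen for brevity, not anyone's production scheme.

```python
import json
import os
import tempfile


def save_checkpoint(path, state):
    """Write `state` (a JSON-serialisable dict) to `path` atomically, so a
    job killed mid-write never leaves a truncated checkpoint behind."""
    directory = os.path.dirname(os.path.abspath(path))
    fd, tmp = tempfile.mkstemp(dir=directory, suffix=".tmp")
    with os.fdopen(fd, "w") as f:
        json.dump(state, f)
        f.flush()
        os.fsync(f.fileno())
    os.replace(tmp, path)  # rename over the old checkpoint in one step


def load_checkpoint(path, default):
    """Return the saved state, or `default` if no checkpoint exists yet."""
    try:
        with open(path) as f:
            return json.load(f)
    except FileNotFoundError:
        return default


def run(path, total_steps, max_steps_this_job):
    """Advance a toy model (a running sum of step numbers) by at most
    `max_steps_this_job` steps, resuming from `path` if it exists."""
    state = load_checkpoint(path, {"step": 0, "value": 0})
    stop = min(total_steps, state["step"] + max_steps_this_job)
    while state["step"] < stop:
        state["step"] += 1
        state["value"] += state["step"]
    save_checkpoint(path, state)
    return state
```

Calling run() repeatedly with the same path mimics resubmitting a job under a runtime limit: each call picks up where the previous one stopped. Here the whole state is two integers, which is the best case for user-level checkpointing; the hard part in a real code is deciding what that minimal state is.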
From Craig.Tierney at noaa.gov Thu Jan 17 08:43:19 2008
From: Craig.Tierney at noaa.gov (Craig Tierney)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] Re: Time limits in queues
In-Reply-To:
References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3BE2.8060301@noaa.gov>
Message-ID: <478F85A7.9040600@noaa.gov>

Bogdan Costescu wrote:
> On Wed, 16 Jan 2008, Craig Tierney wrote:
>
>> Our queue limits are 8 hours.
>> ...
>> Did that sysadmin who set 24 hour time limits ever analyze the amount
>> of lost computational time because of larger time limits?
>
> While I agree with the idea and reasons of short job runtime limits, I
> disagree with your formulation. Being many times involved in discussions
> about what runtime limits should be set, I wouldn't make myself a
> statement like yours; I would say instead: YMMV. In other words: choose
> what fits better the job mix that users are actually running. If you
> have determined that 8h max. runtime is appropriate for _your_ cluster
> and increasing it to 24h would lead to a waste of computational time due
> to the reliability of _your_ cluster, then you've done your job well.
> But saying that everybody should use this limit is wrong.

First of all, I agree that it is always a YMMV case. We are good about that here (the list). My point was that in every instance that I have seen, multi-day queue limits are not the norm. Those places do have exceptions for particular codes and particular projects. I know our system would handle 24h queues in terms of reliability, but with the job mix we have, it would cause problems beyond stability (we are currently looking at a new scheduler to solve that problem).

>
> Furthermore, although you mention that system-level checkpointing is
> associated with a performance hit, you seem to think that user-level
> checkpointing is a lot lighter, which is most often not the case.

There was an assumption in my statement that I didn't share with people.
I was thinking about the kind of system-level checkpointing that will probably work for clusters, which will be some sort of VM-based solution. That will have the overhead of the virtual machine as well as of moving the data when the time comes.

> Apart
> from the obvious I/O limitations that could restrict saving & loading of
> checkpointing data, there are applications for which developers have
> chosen to not store certain data but recompute it every time it is
> needed because the effort of saving, storing & loading it is higher than
> the computational effort of recreating it - but this most likely means
> that for each restart of the application this data has to be recomputed.

Yes, but didn't you just say that recomputing that data is faster than the I/O time associated with reading it? A checkpoint isn't model results. A checkpoint is the state of the model at a particular time, so in this case you would save that data. It's already in memory; you just need to write it out with every other bit of relevant information. No extra computations are needed.

> And smaller max. runtimes mean more restarts needed to reach the same
> total runtime...
>

Yes, anytime you are doing something other than the model run (like checkpointing) your run will take longer. This is another one of those "it depends" scenarios. If the runtime takes 1% longer, and it makes the other users happier or lessens the loss due to an eventual crash, is it worth it?

The 1% number is a target I would design for, based on the workload we experience (a multitude of different-sized jobs, not one big job). I would buy a couple of nodes with 3ware cards and run either Lustre or PVFS2 over them as a place to dump the checkpoints. The filesystem would be mostly volatile (so redundancy wouldn't be critical), and would more than meet the reliability requirements of my system (>97%).
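
The 1% target can be turned into a rough checkpoint interval with a little arithmetic. The sketch below is only an illustration: the function names are hypothetical, and the 20 GB checkpoint size and 200 MB/s write bandwidth in the usage note are assumed figures, not measurements from any system in this thread. It treats overhead as write time divided by the compute time between checkpoints.

```python
def checkpoint_interval(checkpoint_bytes, write_bandwidth, overhead_target):
    """Seconds of compute between checkpoints that keeps write time at the
    given fraction of compute time (e.g. 0.01 for the 1% target)."""
    write_seconds = checkpoint_bytes / write_bandwidth
    return write_seconds / overhead_target


def checkpoints_per_job(job_seconds, interval_seconds):
    """Number of whole checkpoint intervals that fit in one queue slot."""
    return int(job_seconds // interval_seconds)
```

With the assumed figures, `checkpoint_interval(20e9, 200e6, 0.01)` comes to roughly 10000 seconds (a 100 second write every 2.8 hours or so), and `checkpoints_per_job(8 * 3600, 10000)` says two such intervals fit inside an 8 hour queue limit. A slower filesystem or a larger state pushes the interval out, which is why the checkpoint store matters.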
Craig -- Craig Tierney (craig.tierney@noaa.gov) From smulcahy at aplpi.com Fri Jan 18 00:57:45 2008 From: smulcahy at aplpi.com (stephen mulcahy) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: <478E3415.6040208@charter.net> References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> <478E3415.6040208@charter.net> Message-ID: <47906A09.2040908@aplpi.com> Shannon V. Davidson wrote: > > Michael H. Frese wrote: >> At 08:31 AM 1/16/2008, Jeffrey B. Layton wrote: >>> - With multi-core processors, to get the best performance you want to >>> assign a process to a core. >> >> Excuse my ignorance, please, but can someone tell me how to do that on >> Linux (2.6 kernels would be fine)? > > sched_setaffinity(2) > taskset(1) > numactl(1) Hi, As an aside to this, do 2.6 kernels make some efforts to keep a process on a specific core anyways recognising the benefits to the cache of doing so (I suspect they do but maybe I just dreamed it up)? As a further aside, some MPI libraries (OpenMPI comes to mind) seem to make some efforts to keep processes on the same cores also (or can be instructed to via a run-time option). I'm wondering how much of a performance benefit there is to using the above-mentioned OS commands to set affinity (versus the trade-off in setting this up). -stephen -- Stephen Mulcahy, Applepie Solutions Ltd., Innovation in Business Center, GMIT, Dublin Rd, Galway, Ireland. +353.91.751262 http://www.aplpi.com Registered in Ireland, no. 
289353 (5 Woodlands Avenue, Renmore, Galway)

From hahn at mcmaster.ca Fri Jan 18 12:15:16 2008
From: hahn at mcmaster.ca (Mark Hahn)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] VMC - Virtual Machine Console
In-Reply-To: <47906A09.2040908@aplpi.com>
References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> <478E3415.6040208@charter.net> <47906A09.2040908@aplpi.com>
Message-ID:

> As an aside to this, do 2.6 kernels make some efforts to keep a process on a
> specific core anyways recognising the benefits to the cache of doing so (I
> suspect they do but maybe I just dreamed it up)?

yes - the code is pretty reasonable, though probably more tuned towards typical desktop/webhost-type applications. there are affinity heuristics for managing which core a proc will be run on, as well as for guiding memory allocation on numa machines. (pretty soon, of course, all multi-socket machines will be numa and need these issues handled...)

From mathog at caltech.edu Fri Jan 18 12:43:45 2008
From: mathog at caltech.edu (David Mathog)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] network question
Message-ID:

The question is: do modern networks bundle multiple TCP "acks" together into a single packet? If so, on linux does ifconfig count all N acks in that single packet as if they were separate packets?

Here's the background: I have been modifying nettee lately and ran across something which was a bit mysterious. The initial observation had TCP_NODELAY set on the data line. This was a mistake, but it was largely being compensated for by a minwrite variable which controlled how big the buffer had to be before it was emptied by a read. Anyway, when running in that mode these 3 tests

  A->B
  B->C
  A->B->C

were performed. (All 3 are on a single 100baset switch.) The first two ran at "full speed" (11.x Mb/sec) and the third much slower.
Which is odd, since B could read and write at "full speed", just not both at the same time. So to work on this issue runs on B were instrumented like this: ifconfig eth0 | grep packet; nettee... ; ifconfig eth0 and the RX/TX counts compared before and after to see how many packets moved in/out on each test. For A->B, on B there were 27857 packets in and 13963 packets out. For A->B->C on B there were 41887 packets in and 42441 packets out. Since the sum of in + out on B for the first test is 41820, which is very close to 41887 for B in the second test, there are definitely a lot of ack packets coming back to B from C, and that seemed likely to be the problem. Only it wasn't, at least according to ifconfig. By varying the minwrite parameter described above the nettee throughput on B (and so the whole 3 member chain) could be adjusted to "full speed". The same speed was obtained by not setting TCP_NODELAY, in which case the minwrite parameter made no difference. Oddly, in these configurations where the program ran fastest the RX/TX counts changed only very slightly and not as I expected. In one typical "optimized" relay B had 41744 in RX and 41822 in TX. These numbers are only very slightly different from the unoptimized example shown above. The one way in which they were remarkable, and this could just be a coincidence, was that for the highest transfer rates the observed RX/TX ratio was closer to 1.0 than for other configurations. So my best guess for explaining how the data rate could increase so much is that there really were fewer packets, and ifconfig has somehow concealed this fact by breaking the multiple acks out and counting them as separate packets. Or is there something else going on? I also discovered that "athcool" really screws up network throughput on these Athlon machines. Dropping from about 11.4 Mb/sec to about 6.7 Mb/sec. 
We run a script that keeps track of CPU usage and shuts athcool off when CPU use peaks, but nettee only reached 20-30 percent of CPU so that never kicked in. I wonder if Athlon64s have the same issue when they are in their power saving mode, but have not run tests yet to find out.

Thanks,

David Mathog
mathog@caltech.edu
Manager, Sequence Analysis Facility, Biology Division, Caltech

From bernard at vanhpc.org Fri Jan 18 14:53:26 2008
From: bernard at vanhpc.org (Bernard Li)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] VMC - Virtual Machine Console
In-Reply-To: <47906A09.2040908@aplpi.com>
References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> <478E3415.6040208@charter.net> <47906A09.2040908@aplpi.com>
Message-ID:

On 1/18/08, stephen mulcahy wrote:
> I'm wondering how much of a performance benefit there is to using the
> above-mentioned OS commands to set affinity (versus the trade-off in
> setting this up).

I guess the answer to the above question is "it depends on your code" -- but I'd also like to hear whether there are any general performance benefits to setting CPU affinity. Do major schedulers support this? Would this help with embarrassingly parallel jobs VS large MPI jobs on manycore machines?

Thanks,

Bernard

From geoff at galitz.org Sun Jan 20 10:40:50 2008
From: geoff at galitz.org (Geoff)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] who is buying those $200 PCs from wal-mart?
In-Reply-To: <6.2.3.4.2.20071115101758.02f266d8@mail.jpl.nasa.gov>
References: <6.2.3.4.2.20071115101758.02f266d8@mail.jpl.nasa.gov>
Message-ID:

On 15.11.2007 at 19:55, Jim Lux wrote:
>
>
> That dissatisfaction is among the small subset of consumers who read
> Slashdot or this list or who write for and read those magazines. For
> them Vista is a pain.
>

Just my $.02 worth... even for me Vista is not a pain.
I use it in my work to run VMware Workstation so I can write and test the tools needed to manage my clusters. I use putty to connect to them, deploy my tools and use them. The $200 PC would surely be underpowered to run something like VMware Workstation, but it does mean that genuine real work can happen on that platform... just not running 3 or 4 VMs at once like I do. Running a single VM for that kind of work would be ok, I think.

At my old lab we had about a dozen boxes whose purpose was to prepare jobs for cluster submission. No compilation was required; it was all defining parameters and visualization. There is definitely room for these lower power machines in the universe. Even if they do run Vista.

-geoff

--
-------------------------------
Geoff Galitz, geoff@galitz.org
Blankenheim, Deutschland

From geoff at galitz.org Sun Jan 20 10:42:03 2008
From: geoff at galitz.org (Geoff)
Date: Sat Jul 19 01:06:48 2008
Subject: Time limits in queues (was: Re: [Beowulf] VMC - Virtual Machine Console)
In-Reply-To: <478E3BE2.8060301@noaa.gov>
References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3BE2.8060301@noaa.gov>
Message-ID:

Interesting. We (and by we, I refer to my time at UC Berkeley College of Chemistry) used to implement multiple queues with various time restrictions to accommodate short, medium, long and extended run jobs. It was an honor system, to be sure, but I spent a great amount of time working with the researchers on an individual level to foster the trust that an honor system needs. There was also a little logic to allow submitted jobs to skew towards one end of the spectrum if the cluster was not fully utilized, and not expected to be so. Working that closely with folks also allowed us to chart cluster usage for about a month (and sometimes much more) so we could tweak cluster policy if appropriate.

It worked out for the most part, but there was the occasional scofflaw.
With the trust relationship I had with the researchers, we could usually nag the scofflaws back into line. Layer 8 issues can certainly lead to trouble, but it can also be used to your advantage! Just a personal observation. I realize this kind of thing would not work everywhere. -geoff > > Our queue limits are 8 hours. They are set this way for two reasons. > First, we have real time jobs that need to get through the queues and > we believe that allowing significantly longer jobs would block those > really important jobs. Second, for a multi-user system, it isn't very > fair for a user to run multi-day jobs and prevent shorter jobs from > getting > in. It is about being fair. Use the resource and then get back in line. > > I know that at other US Government facilities it is common practice to > set sub-day queue limits. I recently helped setup one site that had > queue limits set at 12 hours. Another large organization near the top > of the top 500 list does this as well. > > This means that codes need check-pointing. Although we are all waiting > for the holy grail of system level check-pointing, the odds of that being > implemented consistently across architectures AND not have a significant > performance hit is unlikely. This means that researchers have to also be > software engineers. If they want to get real work done, adding > check-pointing > is one of the steps. As one operations manager at a major HPC site once > said > to me 'codes that don't support check-pointing aren't real codes'. > > Allowing users to run for days or weeks as SOP is begging for failure. > Did that sysadmin who set 24 hour time limits ever analyze the amount > of lost computational time because of larger time limits? 
>
> Craig
>

--
-------------------------------
Geoff Galitz, geoff@galitz.org
Blankenheim, Deutschland

From geoff at galitz.org Sun Jan 20 10:42:05 2008
From: geoff at galitz.org (Geoff)
Date: Sat Jul 19 01:06:48 2008
Subject: Time limits in queues (was: Re: [Beowulf] VMC - Virtual Machine Console)
In-Reply-To: <478E3BE2.8060301@noaa.gov>
References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <478E3BE2.8060301@noaa.gov>
Message-ID:

Interesting. We (and by we, I refer to my time at UC Berkeley College of Chemistry) used to implement multiple queues with various time restrictions to accommodate short, medium, long and extended run jobs. It was an honor system, to be sure, but I spent a great amount of time working with the researchers on an individual level to foster the trust that an honor system needs. There was also a little logic to allow submitted jobs to skew towards one end of the spectrum if the cluster was not fully utilized, and not expected to be so. Working that closely with folks also allowed us to chart cluster usage for about a month (and sometimes much more) so we could tweak cluster policy if appropriate.

It worked out for the most part, but there was the occasional scofflaw. With the trust relationship I had with the researchers, we could usually nag the scofflaws back into line. Layer 8 issues can certainly lead to trouble, but they can also be used to your advantage!

Just a personal observation. I realize this kind of thing would not work everywhere.

-geoff

PS, sorry for any duplicate copies of this email, I am having some ISP issues this week.

On 16.01.2008 at 18:16, Craig Tierney wrote:
> Geoff wrote:
>>
>
> ..Interesting discussion deleted..
>
>> As a funny aside, I once knew a sysadmin who applied 24 hour
>> timelimits to all queues of all clusters he managed in order to force
>> researchers to think about checkpoints and smart restarts.
I couldn't >> understand why so many folks from his particular unit kept asking me >> about arrays inside the scheduler submission scripts and nested >> commends until I found that out. Unfortunately I came to the >> conclusion that folks in his unit were spending more time writing job >> submission scripts than code... well... maybe that is an exaggeration. >> > > Our queue limits are 8 hours. They are set this way for two reasons. > First, we have real time jobs that need to get through the queues and > we believe that allowing significantly longer jobs would block those > really important jobs. Second, for a multi-user system, it isn't very > fair for a user to run multi-day jobs and prevent shorter jobs from > getting > in. It is about being fair. Use the resource and then get back in line. > > I know that at other US Government facilities it is common practice to > set sub-day queue limits. I recently helped setup one site that had > queue limits set at 12 hours. Another large organization near the top > of the top 500 list does this as well. > > This means that codes need check-pointing. Although we are all waiting > for the holy grail of system level check-pointing, the odds of that being > implemented consistently across architectures AND not have a significant > performance hit is unlikely. This means that researchers have to also be > software engineers. If they want to get real work done, adding > check-pointing > is one of the steps. As one operations manager at a major HPC site once > said > to me 'codes that don't support check-pointing aren't real codes'. > > Allowing users to run for days or weeks as SOP is begging for failure. > Did that sysadmin who set 24 hour time limits ever analyze the amount > of lost computational time because of larger time limits? 
> > Craig > -- ------------------------------- Geoff Galitz, geoff@galitz.org Blankenheim, Deutschland From raysonlogin at gmail.com Sun Jan 20 19:20:33 2008 From: raysonlogin at gmail.com (Rayson Ho) Date: Sat Jul 19 01:06:48 2008 Subject: [Beowulf] VMC - Virtual Machine Console In-Reply-To: References: <46697.192.168.1.1.1200489542.squirrel@mail.eadline.org> <42890.192.168.1.1.1200493100.squirrel@mail.eadline.org> <478E233F.9080103@charter.net> <6.2.5.6.2.20080116092550.04ed6140@NumerEx.com> <478E3415.6040208@charter.net> <47906A09.2040908@aplpi.com> Message-ID: <73a01bf20801201920q3ac7b647q335e1cd13bec4cdb@mail.gmail.com> On Jan 18, 2008 5:53 PM, Bernard Li wrote: > -- but I'd also like to hear whether there are any general performance > benefits to setting CPU affinity. Do major schedulers support this? > Would this help with embarrassingly parallel jobs VS large MPI jobs on > manycore machines? I am working on adding processor affinity support for serial and parallel jobs for Grid Engine, and I am working with the OpenMPI developers to define an interface. http://gridengine.sunsource.net/servlets/BrowseList?list=dev&by=thread&from=27044 http://www.open-mpi.org/community/lists/devel/2008/01/2949.php http://www.open-mpi.org/community/lists/devel/2008/01/2964.php BTW, LSF 7.0.2 supports processor affinity for serial jobs. However, supporting processor affinity for serial jobs is only useful when the OS scheduler is dumb... 
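
For readers who have not used the affinity calls mentioned earlier in the thread (sched_setaffinity(2), taskset(1)), here is a minimal Linux-only sketch using Python's standard `os.sched_setaffinity` and `os.sched_getaffinity`. The helper name `pin_to_core` is hypothetical, and nothing here reflects how Grid Engine, LSF, SLURM or Open MPI actually implement affinity.

```python
import os


def pin_to_core(core, pid=0):
    """Restrict process `pid` (0 means the calling process) to one core.

    Returns the resulting affinity set so the caller can confirm it took effect.
    """
    os.sched_setaffinity(pid, {core})
    return os.sched_getaffinity(pid)


if __name__ == "__main__":
    # Only pick from cores this process is already allowed to use.
    allowed = sorted(os.sched_getaffinity(0))
    print("allowed before:", allowed)
    print("allowed after: ", sorted(pin_to_core(allowed[0])))
```

Child processes inherit the mask, so a job wrapper could pin itself and then exec the real program. From the shell, something like `taskset -c 0 ./a.out` should have the same effect for a single process (`./a.out` being a placeholder). Whether pinning helps at all depends on how well the kernel scheduler was already doing, as noted above.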
See also: "Enhancing an Open Source Resource Manager with Multi-Core/Multi-threaded Support" -- this paper talks about the support of processor affinity in SLURM:

http://www.cs.huji.ac.il/~feit/parsched/jsspp07/p2-balle.pdf

Rayson

>
> Thanks,
>
> Bernard
>

From forum.san at gmail.com Sun Jan 20 21:43:36 2008
From: forum.san at gmail.com (Sangamesh B)
Date: Sat Jul 19 01:06:48 2008
Subject: [Beowulf] For GROMACS users
Message-ID:

Hi,

I'm a Linux guy and now I am required to install Gromacs on a Solaris 10, x86 system. I have faced a lot of problems and still have not managed to install it. I think this is because the binaries are not in the path. I don't know where the binaries are located on Solaris.

I downloaded the binutils from Gromacs website, but the installation gave the following error:

ls/windres ] ; then echo $r/./binutils/windres ; else if [ ' i386-pc-solaris2.10' = 'i386-pc-solaris2.10 ' ] ; then echo windres; else echo windres ; fi; fi`" "CONFIG_SHELL=/bin/sh" "MAKEINFO=`if [ -f $r/build-i386-pc-solaris2.10/texinfo/makeinfo/Makefile ] ; then echo $r/build-i386-pc-solaris2.10 /texinfo/makeinfo/makeinfo ; else if (makeinfo --version | egrep 'texinfo[^0-9]*([1-3][0-9]|4\.[2-9]|[5-9])') >/dev/null 2>&1; then echo makeinfo; else echo $s/missing makeinfo; fi; fi` --split-size=5000000" 'AR=ar' 'AS=as' 'CC=gcc' 'CXX=c++' 'DLLTOOL=dlltool' 'LD=/usr/ccs/bin/ld' 'NM=nm' 'RANLIB=ranlib' 'WINDRES=windres' install)
make: Fatal error: Command failed for target `install-binutils'

Can anyone guide me to install gromacs on Solaris?

regards,
Sangamesh
HPC Engineer

From toon.knapen at gmail.com Mon Jan 21 22:49:03 2008
From: toon.knapen at gmail.com (Toon Knapen)
Date: Sat Jul 19 01:06:49 2008
Subject: [Beowulf] how to detect boundedness
Message-ID:

I would like to ask you all what your preferred method is to detect if and how strongly an application is cpu-, memory- or I/O-bound. Do you
1) just run the app. on different machines (with diff. characteristics)
2) use the profiler
3) use hardware monitors such as cache-miss rate, ...
....

Thanks in advance,

toon

From smulcahy at aplpi.com Tue Jan 22 00:38:48 2008
From: smulcahy at aplpi.com (stephen mulcahy)
Date: Sat Jul 19 01:06:49 2008
Subject: [Beowulf] how to detect boundedness
In-Reply-To:
References:
Message-ID: <4795AB98.1050306@aplpi.com>

Toon Knapen wrote:
> I would like to ask you all what your preferred method is to detect if
> and how strongly an application is cpu-, memory- or I/O-bound. Do you
> 1) just run the app. on different machines (with diff. characteristics)
> 2) use the profiler
> 3) use hardware monitors such as cache-miss rate, ...
> ....
>

Hi,

I'm inclined to use a bunch of tools (htop, dstat, vmstat, iostat, free, ganglia) to get a picture of what's happening on the system while my app is running and then start making some inferences from the behaviour I observe.

Having the ability to test your application on hardware with different characteristics, after coming to some tentative conclusions about the characteristics of your application, sounds like a good option but could be pretty time consuming.

-stephen

--
Stephen Mulcahy, Applepie Solutions Ltd., Innovation in Business Center, GMIT, Dublin Rd, Galway, Ireland. +353.91.751262 http://www.aplpi.com
Registered in Ireland, no.
289353 (5 Woodlands Avenue, Renmore, Galway) From carsten.aulbert at aei.mpg.de Wed Jan 23 01:43:42 2008 From: carsten.aulbert at aei.mpg.de (Carsten Aulbert) Date: Sat Jul 19 01:06:49 2008 Subject: [Beowulf] Vendor/Distributor for customized ethernet cables? Message-ID: <47970C4E.7010703@aei.mpg.de> Hi, we need a few thousand cables (mix of Cat5e and/or Cat6) but I have a very hard time finding a distributor which can offer me other lengths than the "standard European" .5, 1.