[Beowulf] Re: cluster profiling (John Hearns) & (Christopher Samuel)

tomislav_maric@gmx.com tomislav.maric at gmx.com
Wed Nov 3 07:06:27 PDT 2010


Thanks a lot for the advice. :) 

I don't want to overcomplicate things. It's just that I really haven't found any literature that explains how to profile a cluster at the top level and decide where the bottlenecks are. The Ganglia monitoring system has a gstat utility and a Python interface that return the monitoring statistics in raw form (not in the HTTP form used by its web interface). With its Python interface I can write the whole profiling application in a single language/environment, and it gives me a simple way to get the info for each node, which is why I chose it. I've read online that Ganglia adds some overhead, but I'm dealing with dedicated multicore nodes and my simulations are really computationally intensive, which will drown out the system-side processes on the nodes. I will definitely look at perf and sysstat in detail before coding anything.
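
Just to illustrate what I mean by using Ganglia's raw output instead of the web front end, here is a minimal Python sketch of the kind of collector I have in mind. It assumes gmond's default XML dump on TCP port 8649; the host name and the metric names are only placeholders.

    # Minimal sketch: grab gmond's raw XML cluster report and turn it into
    # a per-node dict of metric values. Assumes gmond listens on its default
    # TCP port 8649; "headnode" and the metric names are placeholders.
    import socket
    import xml.etree.ElementTree as ET

    def gmond_metrics(host="headnode", port=8649):
        """Return {node_name: {metric_name: value}} from gmond's XML dump."""
        with socket.create_connection((host, port)) as sock:
            chunks = []
            while True:
                data = sock.recv(4096)
                if not data:          # gmond closes the connection when done
                    break
                chunks.append(data)
        root = ET.fromstring(b"".join(chunks))
        return {
            h.get("NAME"): {m.get("NAME"): m.get("VAL") for m in h.iter("METRIC")}
            for h in root.iter("HOST")
        }

    if __name__ == "__main__":
        for node, metrics in gmond_metrics().items():
            print(node, metrics.get("load_one"), metrics.get("bytes_in"))

Polling this every few seconds during a run and averaging afterwards should give me the per-node time series I need.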

Please don't get me wrong: I just want to build a small ~20-node multicore COTS Beowulf for use with OpenFOAM, which would more than suffice for my needs. Along the way I don't want to overcomplicate things... I just want to learn as much as I can and build my machine. This kind of profiling harness should report the major metrics for the nodes and for the frontend.

I have spent a significant amount of time profiling the application (a solver from the OpenFOAM package) on a single system, only to gradually realise that there are dozens of different CFD/CCM solvers in OpenFOAM, involving different numerical schemes (spatial and temporal) and operating on different physical fields. Optimising an ordinary PC (CPU clock rate, amount of RAM, etc.) for OpenFOAM as a whole is an impossible task, because the parameters involved yield a huge number of permutations. The optimisation itself would take too long, and in the end the conclusion would be the same regardless:

I should just buy the previous generation of processors and RAM, and stack up on machines.

Now, I would just like a few pointers in the right direction regarding the macroscopic metrics of the networked cluster: e.g. increase the simulation size while keeping the core count and the node count constant, and find out when the switch drops dead. For a specific set of cases (up to 3 million cells), this will help me evaluate how many nodes I need to buy, whether I need 10 Gig Ethernet, and what kind of speedup the separation of the network into DATA - HSI - FRONTEND brought me. This also reduces the number of parameters to

1 - node number (in this case 1 or 2, but I'm getting more machines soon)
2 - core number (2 - 8, more to come)
3 - mesh density (scripted increase from 200k to 1M cells)

the rest are the global metrics: used memory, CPU %, duration of the simulation (user CPU time), and network traffic (MB/s) on the switch. The set of metrics stays the same and each metric is temporally averaged over the course of the simulation (the case decomposition is static; otherwise no conclusions could be drawn). 
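
To make the sweep over these parameters concrete, here is a rough sketch of the harness I have in mind. It assumes a standard OpenFOAM workflow (decomposePar followed by mpirun -np N <solver> -parallel); the solver name, the case directory and the ./Allmesh helper that rebuilds the mesh at a given density are placeholders, and setting numberOfSubdomains in decomposeParDict for each core count is left out.

    # Rough sweep sketch: one timed solver run per (core count, mesh density)
    # combination, appended to a CSV for plotting. SOLVER, CASE and ./Allmesh
    # are placeholders for my actual case setup.
    import csv
    import subprocess
    import time

    SOLVER = "simpleFoam"          # placeholder solver name
    CASE = "cases/benchmark"       # placeholder case directory
    CORES = [2, 4, 8]
    MESH_LEVELS = [200_000, 500_000, 1_000_000]   # target cell counts

    def run(cmd):
        subprocess.run(cmd, cwd=CASE, check=True,
                       stdout=subprocess.DEVNULL, stderr=subprocess.STDOUT)

    with open("scaling.csv", "w", newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["cells", "cores", "wall_time_s", "speedup"])
        for cells in MESH_LEVELS:
            run(["./Allmesh", str(cells)])   # placeholder: rebuild mesh at this density
            base_time = None
            for n in CORES:
                # numberOfSubdomains in system/decomposeParDict must already be
                # set to n (not shown); the decomposition stays static per run.
                run(["decomposePar", "-force"])
                t0 = time.perf_counter()
                run(["mpirun", "-np", str(n), SOLVER, "-parallel"])
                wall = time.perf_counter() - t0
                base_time = wall if base_time is None else base_time
                writer.writerow([cells, n, f"{wall:.1f}", f"{base_time / wall:.2f}"])

The speedup column is relative to the smallest core count in the sweep; plotting it against the core count for each mesh level should show where it starts bending away from linear, and that is the point to cross-reference with the Ganglia data.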

I have read the books by Robert Lucke and Robert G. Brown, HPC for Dummies, the Beowulf books, and a swarm of articles online... and well... I did my homework... Still, I could really use any advice anyone can spare on the profiling/scaling of such a machine. 

Thanks again, 
Tomislav

> ----- Original Message -----
> From: beowulf-request at beowulf.org
> Sent: 11/03/10 02:22 AM
> To: beowulf at beowulf.org
> Subject: Beowulf Digest, Vol 81, Issue 3
> 
> 
> Today's Topics:
> 
>  1. cluster profiling (tomislav_maric at gmx.com)
>  2. Re: cluster profiling (John Hearns)
>  3. Re: Re: Interesting (Peter St. John)
>  4. Re: cluster profiling (Christopher Samuel)
>  5. RE: Re: Interesting (Lux, Jim (337C))
> 
> 
> ----------------------------------------------------------------------
> 
> Message: 1
> Date: Tue, 02 Nov 2010 22:45:51 +0100
> From: "tomislav_maric at gmx.com" <tomislav.maric at gmx.com>
> Subject: [Beowulf] cluster profiling
> To: beowulf at beowulf.org
> Message-ID: <20101102215301.225020 at gmx.com>
> Content-Type: text/plain; charset="utf-8"
> 
> Hi everyone,
> 
> I'm running a COTS Beowulf cluster and I'm using it for CFD simulations with the OpenFOAM code. I'm currently writing a profiling application (a bunch of scripts) in Python that will use the Ganglia Python interface and try to give me an insight into the way the machine is burdened during runs. What I'm actually trying to do is to profile the parallel runs of the OpenFOAM solvers. 
> 
> The app will increment the mesh density (i.e. refine the mesh) of the simulation, and run the simulations with an increasing number of cores. Right now the machine is minuscule: two nodes with quad cores. The app will store the data (timing of the execution, the number of cores) and I will plot the diagrams to see when the case size and the core count start to drive the speedup away from the "linear" one. 
> 
> Is this a good approach? I know that this will show just tendencies on such an impossibly small number of nodes, but I will expand the machine soon, and then the increased node count should make these tendencies more accurate. When I cross-reference the timing data with the system status data given by Ganglia, I can draw conclusions like "OK, the speedup went down because for the larger cases the decomposition at the maximum core count was more local, so the system bus must have been burdened, provided Ganglia confirms that the network is not being strangled for this case configuration".
> 
> Can anyone here tell me if I am at least stepping in the right direction? :) Please, don't say "it depends". 
> 
> Best regards, 
> Tomislav Maric, (MSc Mechanical Engineering, just to clarify my ignorance regarding HPC)
> 
> 
> 
> ------------------------------
> 
> Message: 2
> Date: Tue, 2 Nov 2010 23:21:13 +0000
> From: John Hearns <hearnsj at googlemail.com>
> Subject: Re: [Beowulf] cluster profiling
> To: Beowulf Mailing List <beowulf at beowulf.org>
> Message-ID:
> <AANLkTikfnGurYPZcTX=ieN7x-uihxGj1anh0c6qeEqCz at mail.gmail.com>
> Content-Type: text/plain; charset=ISO-8859-1
> 
> On 2 November 2010 21:45, tomislav_maric at gmx.com <tomislav.maric at gmx.com> wrote:
> 
> >
> > Can anyone here tell me if I am at least stepping in the right direction? :) Please, don't say "it depends".
> >
> 
> This sounds very cool.
> To be honest, most people use Excel spreadsheets to plot this sort of thing.
> If you can produce an automated framework to do this it would be very
> interesting.
> 
> I have to slightly question your choice of Ganglia - have you thought
> of using sysstat to capture the system's load or memory figures?
> 
> 
> ------------------------------
> 
> Message: 3
> Date: Tue, 2 Nov 2010 20:49:34 -0400
> From: "Peter St. John" <peter.st.john at gmail.com>
> Subject: Re: [Beowulf] Re: Interesting
> To: "Robert G. Brown" <rgb at phy.duke.edu>
> Cc: "beowulf at beowulf.org" <beowulf at beowulf.org>, "Lux, Jim \(337C\)"
> <james.p.lux at jpl.nasa.gov>
> Message-ID:
> <AANLkTinFdxvvxTy_2ER_Mw+T8AcFNhxFjB45d+5o9u+J at mail.gmail.com>
> Content-Type: text/plain; charset="iso-8859-1"
> 
> deBeers laser-engraves serial numbers onto their (natural) diamonds (to
> counter the increasing gem quality of artificial diamonds made by, say,
> chemical vapor deposition). So how about laser engraving data onto cheap
> chemical vapor deposition thin diamond slices? (One of the ideas had been to
> make microelectronics substrates from diamond this way, since diamond
> conducts heat better than silicon).
> Peter
> 
> On Fri, Oct 29, 2010 at 1:36 PM, Robert G. Brown <rgb at phy.duke.edu> wrote:
> 
> > On Fri, 29 Oct 2010, Lux, Jim (337C) wrote:
> >
> > Or, how about something like the UNICON aka "terabit memory" (TBM) from
> >> Illiac IV days. It's a stable polyester base with a thin film of rhodium
> >> that was ablated by a laser making 3 micron holes to write the bits.
> >> $3.5M
> >> to store a terabit in 1975.
> >>
> >
> > Burned RO laser disks should in principle be as stable, if the medium
> > used is thick enough. The problem is that CDs tend to be mass produced
> > with very thin media, cheap plastic, and are even susceptible to
> > corrosion through the plastic over time. If one made a CD with tempered
> > glass and a moderately thick slice of e.g. stainless steel or
> > platinum...
> >
> > But then your problem is the reader. CD readers give way to DVD and are
> > still backwards compatible, sort of. But what about the 2020
> > equivalent? Will there even be one? Nobody will buy actual CDs any
> > more. Nobody will buy movies on DVDs any more (seriously, I doubt that
> > they will). Will there BE a laser drive that is backwards compatible to
> > CD, or will it go the way of reel to reel tapes, 8 track tapes, cassette
> > tapes, QIC tapes, floppy drives of all flavors (including high capacity
> > drives like the ones I have carefully saved at home in case I ever need
> > one), magnetic core memories, large mountable disk packs, exabyte tape
> > drives, DA tapes, and so on? I rather think it will be gone. It isn't
> > even clear if hard disk drives will still be available (not that any
> > computer around would be able to interface with the 5 or 10 MB drives of
> > my youth anyway).
> >
> > This is the problem with electronics. You have to have BOTH long time
> > scale stability AND an interface for the ages. And the latter is highly
> > incompatible with e.g. Moore's Law -- not even the humble serial port
> > has made it through thirty years unscathed. Is the Universal Serial Bus
> > really Universal? I doubt it. And yet, that is likely to be the only
> > interface available AT ALL (except for perhaps some sort of wireless
> > network that isn't even VISIBLE to old peripherals) on the vast bulk of
> > the machines sold in a mere five years.
> >
> > A frightening trend in computing these days is that we may be peaking in
> > the era where one's computer (properly equipped with a sensible
> > operating system) is symmetrically capable of functioning as a client
> > and a server. Desktop computers were clients, servers, or both as one
> > wished, from the days of Sun workstations through to the present, with
> > any sort of Unixoid operating system and adequate resources. From the
> > mid 90's on, with Linux, pure commodity systems were both at the whim of
> > the system owner -- anybody could add more memory, more disks, a backup
> > device, and the same chassis was whatever you needed it to be.
> >
> > Now, however, this general purpose desktop is all but dead, supplanted
> > by laptops that are just as powerful, but that lack the expandability
> > and repurposeability. And laptops are themselves an endangered species
> > all of a sudden -- in five years a "laptop" could very well be a single
> > "pad" (touchscreen) of whatever size with or without an external
> > keyboard, all wireless, smooth as a baby's bottom as far as actual plugs
> > are concerned (or maybe, just maybe, with a single USB charger/data port
> > or a couple of slots for SD-of-the-day or USB peripherals). Actual data
> > storage may well migrate into servers that are completely different
> > beasts, far away, accessible only over a wireless network, and
> > controlled by others.
> >
> > An enormous step backwards, in other words. A risk to our political
> > freedom. And yet so seductive, so economical, so convenient, that we
> > may willingly dance down a primrose path to an information catastrophe
> > that is more or less impossible still with the vast decentralization of
> > stored knowledge.
> >
> > rgb
> >
> >
> > Robert G. Brown http://www.phy.duke.edu/~rgb/
> > Duke University Dept. of Physics, Box 90305
> > Durham, N.C. 27708-0305
> > Phone: 1-919-660-2567 Fax: 919-660-2525 email: rgb at phy.duke.edu
> >
> >
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf
> >
> 
> ------------------------------
> 
> Message: 4
> Date: Wed, 03 Nov 2010 11:49:59 +1100
> From: Christopher Samuel <samuel at unimelb.edu.au>
> Subject: Re: [Beowulf] cluster profiling
> To: beowulf at beowulf.org
> Message-ID: <4CD0B1B7.9020705 at unimelb.edu.au>
> Content-Type: text/plain; charset=UTF-8
> 
> On 03/11/10 08:45, tomislav_maric at gmx.com wrote:
> 
> > that will use the Ganglia-python interface and try to
> > give me an insight into the way machine is burdened
> > during runs
> 
> Depending on how old your kernel is the "perf" utility
> (found in the tools/perf directory in your kernel sources,
> or packaged in Ubuntu as part of the linux-tools package
> or linux-tools-2.6 in Debian Squeeze) may well give you
> some interesting stats.
> 
> As an example, here is an overview of the stats for a "find -ls" over
> the current kernel git tree:
> 
> $ perf stat find . -ls > /dev/null
> 
>  Performance counter stats for 'find . -ls':
> 
>  372.415331 task-clock-msecs # 0.923 CPUs
>  158 context-switches # 0.000 M/sec
>  2 CPU-migrations # 0.000 M/sec
>  395 page-faults # 0.001 M/sec
>  648855865 cycles # 1742.291 M/sec
>  698863597 instructions # 1.077 IPC
>  14321645 cache-references # 38.456 M/sec
>  379109 cache-misses # 1.018 M/sec
> 
>  0.403454703 seconds time elapsed
> 
> 
> You can use the "perf list" command to get a list of all
> the kernel tracepoints you can monitor and then you can
> select them individually with the "stat" command.
> 
> Here is perf monitoring CPU migrations, L1 dcache misses
> and the kernel scheduler stats of that well known HPC
> program "top". ;-)
> 
> perf stat -e migrations -e L1-dcache-load-misses -e sched:* top
> 
> [...]
> 
>  Performance counter stats for 'top':
> 
>  0 CPU-migrations # 0.000 M/sec
>  1038307 L1-dcache-load-misses # 0.000 M/sec
>  0 sched:sched_kthread_stop # 0.000 M/sec
>  0 sched:sched_kthread_stop_ret # 0.000 M/sec
>  0 sched:sched_wait_task # 0.000 M/sec
>  98 sched:sched_wakeup # 0.000 M/sec
>  0 sched:sched_wakeup_new # 0.000 M/sec
>  61 sched:sched_switch # 0.000 M/sec
>  0 sched:sched_migrate_task # 0.000 M/sec
>  0 sched:sched_process_free # 0.000 M/sec
>  1 sched:sched_process_exit # 0.000 M/sec
>  0 sched:sched_process_wait # 0.000 M/sec
>  0 sched:sched_process_fork # 0.000 M/sec
>  15 sched:sched_signal_send # 0.000 M/sec
>  49 sched:sched_stat_wait # 0.000 M/sec
>  174 sched:sched_stat_runtime # 0.000 M/sec
>  67 sched:sched_stat_sleep # 0.000 M/sec
>  0 sched:sched_stat_iowait # 0.000 M/sec
> 
>  29.452075124 seconds time elapsed
> 
> With root access you can even do "perf top" to see what's
> going on under the hood.
> 
> You can also use "perf record -g $COMMAND" to record the profiling
> information for $COMMAND to perf.data along with call graph information
> so you can display a detailed tree view of what was going on via the
> "perf report" command.
> 
> Quite a neat little tool I've got to say!
> 
> cheers,
> Chris
> - -- 
>  Christopher Samuel - Senior Systems Administrator
>  VLSCI - Victorian Life Sciences Computational Initiative
>  Email: samuel at unimelb.edu.au Phone: +61 (0)3 903 55545
>  http://www.vlsci.unimelb.edu.au/
> 
> 
> 
> ------------------------------
> 
> Message: 5
> Date: Tue, 2 Nov 2010 18:19:22 -0700
> From: "Lux, Jim (337C)" <james.p.lux at jpl.nasa.gov>
> Subject: RE: [Beowulf] Re: Interesting
> To: "Peter St. John" <peter.st.john at gmail.com>
> Cc: "beowulf at beowulf.org" <beowulf at beowulf.org>
> Message-ID:
> <ECE7A93BD093E1439C20020FBE87C47FEDD2AEE36B at ALTPHYEMBEVSP20.RES.AD.JPL>
> 
> Content-Type: text/plain; charset="us-ascii"
> 
> Diamonds may be almost forever,
> But I would think that fused silica would work almost as well, and is substantially less expensive.
> If you want something exotic, how about ion implantation of Cr+ or Ti+ ions into alumina
> 
> Jim Lux
> +1(818)354-2075
> From: Peter St. John [mailto:peter.st.john at gmail.com]
> Sent: Tuesday, November 02, 2010 5:50 PM
> To: Robert G. Brown
> Cc: Lux, Jim (337C); beowulf at beowulf.org
> Subject: Re: [Beowulf] Re: Interesting
> 
> deBeers laser-engraves serial numbers onto their (natural) diamonds (to counter the increasing gem quality of artificial diamonds made by, say, chemical vapor deposition). So how about laser engraving data onto cheap chemical vapor deposition thin diamond slices? (One of the ideas had been to make microelectronics substrates from diamond this way, since diamond conducts heat better than silicon).
> Peter
> 
> ------------------------------
> 
> 
> 
> End of Beowulf Digest, Vol 81, Issue 3
> **************************************



