[Beowulf] parallel computation question: please help
Joshua mora acosta
joshua_mora at usa.net
Mon Sep 3 01:31:08 PDT 2012
If a program scales in parallel, you should see a reduction in the elapsed
time (or number of clocks) of the entire code. Look at both the total for
your code and the key functions that should be scaling. By scaling I mean
that you repeat the analysis on 1 core, 2 cores, 4 cores, and so on, then
look at which functions stay constant and which ones take less time/clocks.
Some functions will go up, which is due to overheads associated with the
parallel algorithm (communication and buffers). Pay attention to the unit of
the performance metric your tool reports so that you measure things right.
Train yourself on the tool by using academic cases of parallel algorithms
with simple implementations (e.g. the reduction in a dot product). That way
you focus on learning the tool rather than on the implementation details of
your code. You know what result to expect, so keep looking at the
performance metric until you see what you are expecting.
In terms of IPC (instructions per clock): that isn't going to tell you
anything about parallelism. Why? Because it only tells you how efficiently
the code is running on each core when executing each thread or MPI process.
It does not tell you the parallel efficiency.
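A sketch of why, in notation assumed here purely for illustration (N_i, C_i,
f and T_p are not names any tool reports), and assuming each thread runs on
its own core without being descheduled:

```latex
\mathrm{IPC}_i = \frac{N_i}{C_i},
\qquad
T_p \approx \frac{\max_i C_i}{f}
```

Here N_i is the number of instructions retired by thread i, C_i the clocks it
spent running, f the clock frequency and T_p the elapsed time with p threads.
A thread that spins while waiting on a lock keeps retiring instructions, so
its IPC_i can stay high even though T_p does not shrink.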
With CodeAnalyst, use any set of performance events that simply measures
clocks. Note that a multithreaded run gives you a view of the total elapsed
time/clocks and also a per-thread view. Collect this performance data for
1, 2 and 4 threads and look at both views. You will see the scaling, if
any, as a reduction in the total. The per-thread view mostly helps you see
the balance of work across threads, which is important for spotting
scalability issues.
Another good example is a parallel (OpenMP) DGEMM from the math libraries
(ACML, MKL), where you should see good scalability, so you can focus on the
tool.
Hope it helps.
------ Original Message ------
Received: 06:01 AM CDT, 09/02/2012
From: Vincent Diepeveen <diep at xs4all.nl>
To: Tamnun E Mursalin <temursalin at gmail.com>
Cc: beowulf at beowulf.org
Subject: Re: [Beowulf] parallel computation question: please help
> Yeah well, the classical mistake, if I may say so.
> You're plotting a derived function and then, based upon the derived
> function, saying something about the actual functions.
> Much simpler is doing a profile run with the actual code and then, for
> each function in your code, looking at the direct data rather than this
> graph.
> It's very easy to have the processor execute NOPs, of course, and then
> claim a high IPC based upon that.
> On Sep 1, 2012, at 8:55 PM, Tamnun E Mursalin wrote:
> > Hi,
> > I am trying to plot instructions per cycle versus number of cores as
> > I increase the number of threads. Object 1 represents 1 thread, object
> > 2 represents 2 threads, ...
> > I have used pthreads, C code and CodeAnalyst to perform the simulation
> > of my program.
> > The purpose is to see the level of parallelism in my code, or where I
> > can get the highest level of parallelism. I have been failing to
> > understand the plot for the last two days. Can anyone extract any
> > insight from this plot regarding parallelism?
> > Please help.
> > Best
> > --
> > Best Regards
> > <parallelism.png>
> > _______________________________________________
> > Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin
> > Computing
> > To change your subscription (digest mode or unsubscribe) visit
> > http://www.beowulf.org/mailman/listinfo/beowulf