[Beowulf] Seg Fault with pvm_upkstr() and Linux.

Vincent Diepeveen diep at xs4all.nl
Wed Mar 16 09:37:47 PST 2005


At 12:51 AM 3/15/2005 -0700, Josh Zamor wrote:
>I have just started in on programming using PVM and have run into an 
>odd problem. I have written a C (c99) program that calculates the 
>factorial of a number by calculating parts of the range. I'm using the 
>GMP for dealing with large numbers (I have done this program 
>successfully before using numerous methods including pthreads). The 

Did you configure GMP correctly?

For math with big numbers, GMP by default does not use FFT multiplication
but much slower methods. You might want to reconfigure and recompile it with
FFT enabled (the --enable-fft configure option) if you haven't done so yet.

Please note that GMP is not a fast library: commercial libraries are up to
a factor of 3 faster (a linear, constant-factor gain from a more efficient
implementation, not an exponential one).

That said, I personally use GMP with great pleasure.

>basic way it works for the cluster is that the program starts on a 
>machine, determines the subrange to be calculated for each task and 
>then waits for each process to come back with the answer for its 
>subrange. The main process then finds the total result of the factorial 
>from multiplying the subrange results back together... Pretty 
>standard...
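A minimal sketch of the range-splitting scheme described above, assuming plain 64-bit integers instead of GMP's mpz_t (so it only holds up to 20!) and no PVM at all. The names range_product and split_factorial are hypothetical; the loop in split_factorial stands in for the master collecting one partial product per task.

```c
#include <stdint.h>

/* Product of all integers in [lo, hi]: the work done by one task. */
static uint64_t range_product(uint64_t lo, uint64_t hi)
{
    uint64_t p = 1;
    for (uint64_t k = lo; k <= hi; k++)
        p *= k;
    return p;
}

/* Split [1, n] into `tasks` contiguous subranges and multiply the partial
 * products together, as the master does with the results it receives.
 * Overflows silently for n > 20 (this is why the real program uses GMP). */
static uint64_t split_factorial(uint64_t n, uint64_t tasks)
{
    if (tasks == 0)
        tasks = 1;                              /* treat 0 tasks as 1 */
    uint64_t chunk = (n + tasks - 1) / tasks;   /* ceiling division */
    uint64_t result = 1;
    for (uint64_t t = 0; t < tasks; t++) {
        uint64_t lo = t * chunk + 1;
        if (lo > n)
            break;                              /* more tasks than numbers */
        uint64_t hi = lo + chunk - 1;
        if (hi > n)
            hi = n;                             /* last subrange may be short */
        result *= range_product(lo, hi);
    }
    return result;
}
```

With n = 20 and 4 tasks the subranges are 1-5, 6-10, 11-15 and 16-20, and the product of the four partial results is 20! = 2432902008176640000.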

>The problem is that after a subrange is calculated by a task the result 
>is put into a character array (null terminated, created from GMP's 
>mpz_get_str()), it is packaged using pvm_pkstr() and sent to the parent. 
>The parent can receive this using pvm_recv(), but as soon as it tries 
>to store this into a character array in the program using pvm_upkstr(), 
>or explore it with pvm_bufinfo(), it segfaults.
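The worker-to-parent round trip described above (number to null-terminated string, then back) can be sketched in plain C, with snprintf and strtoull standing in for mpz_get_str() and the PVM pack/unpack calls; encode_partial and decode_partial are hypothetical names. What it illustrates is that the caller owns the character buffer and must size it for the whole string: pvm_upkstr() likewise copies into memory the caller already provides, and for GMP the manual gives mpz_sizeinbase(op, 10) + 2 bytes as enough for mpz_get_str().

```c
#include <stdint.h>
#include <stdio.h>
#include <stdlib.h>

/* Write `value` as a null-terminated decimal string into the caller's buffer.
 * Returns 0 on success, -1 if the buffer is too small to hold it. */
static int encode_partial(uint64_t value, char *buf, size_t bufsize)
{
    int n = snprintf(buf, bufsize, "%llu", (unsigned long long)value);
    return (n >= 0 && (size_t)n < bufsize) ? 0 : -1;
}

/* Parse the decimal string back into a number on the receiving side. */
static uint64_t decode_partial(const char *buf)
{
    return (uint64_t)strtoull(buf, NULL, 10);
}
```

A 4-byte buffer cannot hold "1860480", so encode_partial reports -1 instead of writing past the end.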

In general, in parallel programming you get the worst performance when all
processes must report to one central process.

It is far more efficient when every process is an equal peer and the work
is divided among them.
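One common way to let the processes share the combining work, instead of having one master receive every result, is a binary-tree reduction. The sketch below only simulates it in a single array (vals[i] is process i's partial product; tree_reduce is a hypothetical name, and no PVM calls are made): in each round, process i "receives" from process i + step and multiplies, so the root handles ceil(log2 p) receives instead of p - 1.

```c
#include <stddef.h>
#include <stdint.h>

/* Simulated tree reduction over `nprocs` partial products.
 * In the round with distance `step`, every process whose rank is a multiple
 * of 2*step combines the value of rank + step into its own.
 * The final product ends up in vals[0]; returns the number of rounds,
 * i.e. the length of the longest chain of messages. */
static unsigned tree_reduce(uint64_t *vals, size_t nprocs)
{
    unsigned rounds = 0;
    for (size_t step = 1; step < nprocs; step *= 2) {
        for (size_t i = 0; i + step < nprocs; i += 2 * step)
            vals[i] *= vals[i + step];   /* "receive" from rank i + step */
        rounds++;
    }
    return rounds;
}
```

For 512 processes that is 9 rounds, compared with 511 sequential receives at a single master.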

A simple calculation from a problem I had on a 512-processor SGI: each
'hub' can handle at most 680 MB of data per second, in total, for the 4
processors attached to it.

However, if 499 other processors start reading from and writing to this
'hub', real disasters happen.
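The arithmetic behind this, assuming (as a simplification) that the hub's 680 MB/s is split evenly among the processors using it and ignoring all switch latency; hub_share_mb_per_s is a hypothetical helper:

```c
/* Bandwidth each processor gets when `nprocs` processors share one hub,
 * assuming an even split and no latency overhead at all. */
static double hub_share_mb_per_s(double hub_mb_per_s, unsigned nprocs)
{
    return nprocs ? hub_mb_per_s / nprocs : hub_mb_per_s;
}
```

With only its own 4 processors, each gets 170 MB/s; with about 500 processors on the hub, each gets roughly 1.36 MB/s, a 125-fold drop before any latency effects are counted.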

Things lock up completely, not only because all processors must share that
small bandwidth, but also because of the latency overhead in the routers
and switches.

If a switch must first stream a few bytes from A to B and then suddenly
from C to D, that is far less efficient than when it only has to stream
from A to B.
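A toy cost model of that effect, with every number a made-up assumption rather than a measurement: the transfer time is the raw streaming time plus a fixed penalty each time the switch has to change from one stream to another (transfer_time is a hypothetical helper).

```c
/* Toy model: time to move `total_bytes` at `bytes_per_s`, plus a fixed
 * `seconds_per_change` penalty for each of `stream_changes` switches
 * between streams (A->B, then C->D, ...). All inputs are assumptions. */
static double transfer_time(double total_bytes, unsigned long stream_changes,
                            double bytes_per_s, double seconds_per_change)
{
    return total_bytes / bytes_per_s
         + (double)stream_changes * seconds_per_change;
}
```

For 1 MB at 100 MB/s and a 10 microsecond penalty per change, one uninterrupted stream costs about 0.010 s, while interleaving it in 64-byte pieces (15625 changes) costs about 0.166 s.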

Switches and routers sometimes have their own caches, which are simply
optimized for streaming benchmarks. Switch latency can cause serious
problems when all processors want to use the same communication resources.

The general rule is to keep the routers and switches as idle as possible
and to make your software as embarrassingly parallel as possible.

>I've also tried this in a couple of different ways, passing ints, it 
>works, but passing strings (either large or small) results in a 
>segfault.

>Of my 2 "cluster" learning setup one is running Mac OSX and the other 
>is running Gentoo Linux. It is only the linux box that segfaults, the 
>OSX box finds the answer correctly. The segfault happens whenever the 
>linux box is part of the PVM created cluster, or when PVM is run alone 
>(no other boxes in the cluster) on the linux box.

Which compiler do you use?

I hope only gcc and not Intel C++?

Intel C++ is notorious for relaxing floating-point accuracy in order to be
faster at benchmarks.
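A small illustration of why relaxed floating-point modes can change results: IEEE-754 addition is not associative, so a compiler that is allowed to regroup sums (for example gcc with -ffast-math) may compute a different value than the source asks for. The function names are hypothetical; volatile keeps the compiler from folding the expressions at compile time.

```c
/* Same three numbers, two groupings. Strict IEEE-754 evaluation keeps the
 * grouping written in the source; a reassociating compiler need not. */
static double left_grouped(void)
{
    volatile double big = 1e20, one = 1.0;
    return (big + -big) + one;    /* 0.0 + 1.0 == 1.0 */
}

static double right_grouped(void)
{
    volatile double big = 1e20, one = 1.0;
    return big + (-big + one);    /* 1.0 is absorbed by -1e20, result 0.0 */
}
```

left_grouped() returns 1.0 and right_grouped() returns 0.0, because 1.0 is far below the spacing between representable numbers near 1e20.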

Are you working with floating point or with integers?

>If anyone has any idea why this is happening, or is only happening on 
>the linux box I would be grateful. Thanks, tech details follow:

>Gentoo Linux Box:
>	Proc: AMD AthlonXP 1700+
>	RAM: 512MB.
>	Kernel: 2.4.26-gentoo-r9
>	PVM: 3.4.5
>	GMP: 4.1.4
>	GCC: 3.3.5
>Mac OSX:
>	Proc: G4 1GHz
>	RAM: 768MB
>	Kernel: 10.3.8, Mach 7.8.0
>	PVM: 3.4.5
>	GMP: 4.1.4
>	GCC: 3.3 (20030304)

Are you using PGO with gcc? (PGO = profile-guided optimization)

There are major bugs in PGO even in the latest gcc, 3.4.3.

Those developers are all volunteers, and very cool guys.

They are slow at fixing bugs because they have other jobs too, and I don't
blame them.

Vincent

>Thanks again.
>
>Regards,
>-J Zamor
>jzamor at gmail.com
>
>_______________________________________________
>Beowulf mailing list, Beowulf at beowulf.org
>To change your subscription (digest mode or unsubscribe) visit
>http://www.beowulf.org/mailman/listinfo/beowulf