[Beowulf] New HPCC results, and an MX question

Vincent Diepeveen diep at xs4all.nl
Wed Jul 20 14:14:22 PDT 2005


My chess program Diep is multiprocess, but just about everything
around me is multithreaded.

Multithreading is supported better by Windows, for example, than
multiprocessing is.

If you share a big amount of RAM between processes in Windows XP and store
pointers there, meaning it has to be mapped at the same virtual address in
each process, then Microsoft is up to a factor of 100 slower there than it
should be.

Linux, by default, boots with a kernel limit of 32MB of shared memory.
You have to tell the kernel, as root, to raise that limit before you can
RUN a program which uses more than 32MB of shared memory.
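
To make the limit concrete, here is a minimal sketch of checking it before
a program asks for a large shared segment. It assumes the limit is exposed
in /proc/sys/kernel/shmmax and that root can raise it with sysctl. The
helper names and the 400MB size are made up for illustration.

```python
from pathlib import Path

# Assumption: the System V segment size limit is exposed at this path,
# as on mainstream Linux kernels.
SHMMAX_PATH = Path("/proc/sys/kernel/shmmax")

# The 32MB default mentioned above, in bytes.
DEFAULT_SHMMAX = 32 * 1024 * 1024


def parse_shmmax(text):
    """Parse the contents of the shmmax file into a byte count."""
    return int(text.strip())


def read_shmmax(path=SHMMAX_PATH):
    """Return the current limit in bytes, or None if it cannot be read."""
    try:
        return parse_shmmax(path.read_text())
    except (OSError, ValueError):
        return None


def fits(request_bytes, shmmax_bytes):
    """True if one shared segment of request_bytes is within the limit."""
    return request_bytes <= shmmax_bytes


def sysctl_hint(request_bytes):
    """Command root could run to raise the limit (hypothetical helper)."""
    return "sysctl -w kernel.shmmax=%d" % request_bytes


if __name__ == "__main__":
    want = 400 * 1024 * 1024  # illustrative size, e.g. a big hash table
    limit = read_shmmax()
    if limit is None:
        print("cannot read", SHMMAX_PATH)
    elif fits(want, limit):
        print("ok: %d bytes fits under shmmax=%d" % (want, limit))
    else:
        print("too big; as root run:", sysctl_hint(want))
```

On a machine where the limit has already been raised, the script just
reports that the request fits.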

Arguably, multiprocess works better for freak software, but just about all
teachers and professors have been pushing students for 10 years already
towards multithreading.

Unless you want only a very select group to use your software, you'll have
to take care that multithreaded software also runs fast on clusters.

Now you'll argue that MPI in itself can already start many processes and
that there is no way out.

But let's take most chess programs, for example. They are multithreaded,
so if they run on a cluster, they want one process started on each node,
with a few more threads than there are cores.

This is a logical way to get more speedup out of a cluster.
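
As a rough sketch of that layout: one process per node, each with slightly
more threads than the node has cores. The node names, the core counts and
the choice of exactly one extra thread are assumptions for illustration.
It only computes the plan and does not launch anything.

```python
def threads_per_process(cores, extra=1):
    """Thread count for the single process on a node.

    'extra' models "a few more threads than cores"; the value 1 is an
    assumption, not a tuned number.
    """
    return cores + extra


def launch_plan(nodes, extra=1):
    """Build a (host, processes, threads) entry for every node.

    nodes: dict mapping a hostname to its core count.
    Every node gets exactly one process; parallelism inside the node
    comes from threads sharing that process's address space.
    """
    return [
        (host, 1, threads_per_process(cores, extra))
        for host, cores in sorted(nodes.items())
    ]


if __name__ == "__main__":
    # Hypothetical cluster: two dual-core nodes and one quad-core node.
    cluster = {"node0": 2, "node1": 2, "node2": 4}
    for host, procs, threads in launch_plan(cluster):
        print("%s: %d process, %d threads" % (host, procs, threads))
```

The point is that the number of MPI ranks equals the number of nodes, not
the number of cores, so the existing threaded search stays untouched inside
each rank.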

Additionally, they would be close to insane to first rewrite their SMP
algorithm, which already works well, from multithreading to multiprocess,
and only *then* start on a second-layer MPI parallel search.

If you see that the majority of jobs on supercomputers are jobs of 4-8 cores,
you'll also realize how big multithreading is in the scientific world.

MPI is the big reason why not everyone who needs a lot of CPU power
has a cluster at home. If there were a layer on top of it providing the
same functionality but in a kind of SSI form, there would be more software
running on clusters.

At 10:18 AM 7/20/2005 -0700, Greg Lindahl wrote:
>On Wed, Jul 20, 2005 at 03:06:08PM +0200, Vincent Diepeveen wrote:
>> Additional there will be software layers that have to lock in some way.
>Vincent, nobody builds networks this way, at least nobody building a
>high performance network. What everyone does is give N processes their
>own virtual copy of the chip, generally called a "port". Myrinet
>implements ports in software on their Lanai chip, we do them in
>hardware. In regular InfiniBand, the separate processes get separate
>You are correct that software locking on the host cpu would be
>expensive, and that's why "threaded MPI" is a bad idea.
>-- greg
