[Beowulf] Infiniband Advice which functions to use for what purpose
Vincent Diepeveen
diep at xs4all.nl
Mon Apr 9 17:14:52 PDT 2012
hi,
I'm trying to make a new InfiniBand model for Diep.
I need some advice on which function calls/libraries to use for the
fastest possible communication over InfiniBand (Mellanox QDR) from one
node to another.
There are a lot of possibilities there, but which one communicates
fastest?
I need 2 different types of communication, possibly 3 or more.
I can still set up the model for how to communicate, so let's test the
water:
a) Each node has a 1.5 GB cache, so that's 1.5 GB * n in total.
   Each core of each node randomly needs 192 bytes. I don't know in
   advance which node holds them, nor where in the gigabytes of cache
   (hashtable) it needs to read.
   Which library and which function call is best to ask for this?
   Realize all 8 cores are busy; if I need to keep 1 core free to
   handle all requests from all other nodes, that slows down each
   machine significantly, as I lose 1 core then.
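For (a), a one-sided RDMA read looks like the natural candidate: in
libibverbs that is a work request with opcode IBV_WR_RDMA_READ posted
through ibv_post_send(), and the remote adapter serves it without a
remote core taking part. The sketch below is not verbs code. It only
shows, under assumptions, how a hash key could be turned into the
(node, offset) pair such a read would target. The names locate() and
remote_slot and the plain modulo placement are hypothetical and are not
taken from Diep.

```c
#include <assert.h>
#include <stdint.h>

#define ENTRY_BYTES      192ull                       /* bytes wanted per probe */
#define CACHE_BYTES      (1536ull * 1024 * 1024)      /* 1.5 GB of cache per node */
#define ENTRIES_PER_NODE (CACHE_BYTES / ENTRY_BYTES)  /* 8388608 entries per node */

/* Hypothetical result type: which node holds the entry and where. */
typedef struct {
    uint32_t node;    /* index of the node owning the entry */
    uint64_t offset;  /* byte offset inside that node's cache region */
} remote_slot;

/* Assumed placement: entries are numbered globally and laid out node by node. */
static remote_slot locate(uint64_t key, uint32_t n_nodes)
{
    assert(n_nodes > 0);
    uint64_t global = key % (ENTRIES_PER_NODE * n_nodes);
    remote_slot s;
    s.node = (uint32_t)(global / ENTRIES_PER_NODE);
    s.offset = (global % ENTRIES_PER_NODE) * ENTRY_BYTES;
    return s;
}
```

With 4 nodes, locate(8388609, 4) gives node 1 and byte offset 192; that
offset would be added to the base address of the cache region node 1
registered with its adapter.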
b) For starting and stopping the different cores (at all nodes) in a
   decentralized manner: some variables are difficult to keep
   decentralized, so you want them broadcast to all nodes somehow,
   updating shared memory at remote nodes in some manner, with the
   Mellanox card writing into the RAM without interrupting the
   probably 8 running cores, nor needing any of them to handle this.
   Is that possible somehow? If so, is it possible to update it with 1
   function call to all n-1 other nodes?
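For (b), an RDMA write (opcode IBV_WR_RDMA_WRITE in libibverbs) places
data in registered memory on the remote node without any remote core
handling it. With reliable-connected queue pairs there is, as far as is
known here, one queue pair per peer, so an update to n-1 nodes would
become n-1 posted writes rather than a single call. The sketch below
only models that pattern inside one process: each node's registered
region is a plain struct, and the names node_region and broadcast_u64()
are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for the memory region each node exposes. */
typedef struct {
    _Atomic uint64_t stop_flag;  /* shared variable every node wants to see */
} node_region;

/* Store `value` into every peer's region except our own.
 * Each store stands in for one RDMA write to one peer.
 * Returns the number of writes issued (n_nodes - 1). */
static size_t broadcast_u64(node_region *regions, size_t n_nodes,
                            size_t self, uint64_t value)
{
    assert(self < n_nodes);
    size_t writes = 0;
    for (size_t peer = 0; peer < n_nodes; peer++) {
        if (peer == self)
            continue;  /* the sender's own copy is not touched in this sketch */
        atomic_store_explicit(&regions[peer].stop_flag, value,
                              memory_order_release);
        writes++;
    }
    return writes;
}
```

The running cores on each peer would simply read their local stop_flag
whenever convenient; nothing interrupts them when the value changes.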
c) Memory migration: which possibilities are there to do this? I
   probably need to build a manual memory migration for when a
   specific job gets taken over from one node by another.
   Which function calls would you advise using there? Is there
   documentation on how to implement memory migration efficiently?
   I need to migrate roughly 2 kilobytes at a time. This doesn't
   happen too often, obviously, yet the algorithms are so complex that
   I can't avoid doing this if I want the utmost performance, or so I
   figured out on paper.
   And yes, I do know there is some stuff that already has this built
   in, but that's possibly too slow for what I need.
d) Atomic reads/writes/spinlocks over InfiniBand. There probably is a
   function to set a lock at a remote memory address; which one is it?
   Is there also a function call that sets a lock and, when the lock
   is successful, directly returns a bunch of bytes from a specific
   address (near the lock)? That would save me from first setting a
   lock, then sitting like a duck waiting until the lock is set, and
   only then issuing the read. That procedure means we ship something
   from node A to B; then, when the lock is set at B, the reply goes
   back to A; then A can finally read its bytes at B, as it has the
   lock set. Is there a combined function that is faster than this and
   returns those bytes to A directly after it gets the lock at B?
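For (d), libibverbs offers remote atomics through the opcodes
IBV_WR_ATOMIC_CMP_AND_SWP and IBV_WR_ATOMIC_FETCH_AND_ADD; both act on
one 8-byte word and hand the previous value back to the caller. There
does not seem to be a single standard verb that takes a lock and also
returns the neighbouring bytes, so the sketch below only illustrates
the semantics being asked for, modelled in one process with C11
atomics. The names locked_entry, try_lock_and_read() and release_lock()
are hypothetical.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stdint.h>
#include <string.h>

#define PAYLOAD_BYTES 192

/* Hypothetical layout: an 8-byte lock word followed by the data it guards. */
typedef struct {
    _Atomic uint64_t lock;                 /* 0 = free, otherwise owner id */
    unsigned char payload[PAYLOAD_BYTES];
} locked_entry;

/* One compare-and-swap attempt (0 -> owner). On success the payload is
 * copied out while the lock is held; on failure nothing is copied. */
static bool try_lock_and_read(locked_entry *e, uint64_t owner,
                              unsigned char *out)
{
    assert(owner != 0);  /* 0 is reserved for "unlocked" */
    uint64_t expected = 0;
    if (!atomic_compare_exchange_strong(&e->lock, &expected, owner))
        return false;    /* somebody else holds the lock */
    memcpy(out, e->payload, PAYLOAD_BYTES);
    return true;
}

/* Release the lock by writing 0 back into the lock word. */
static void release_lock(locked_entry *e)
{
    atomic_store(&e->lock, 0);
}
```

Over the wire the same idea would cost one round trip for the
compare-and-swap plus one for the read, which is exactly the double
trip the question hopes to avoid.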
e) When doing the spinlock from A, is the core A.c that tries to set
   the lock at node B actually spinning?
   My previous experience is that some implementations, now and/or in
   the past, instead of having your core spin for a bunch of
   microseconds, put your core to idle. To put it simply, the core
   then has to get fired up by the runqueue once again, which means a
   10-30 millisecond delay until it has received that data.
   Do cores get put in prison for up to 30 years when trying to set a
   lock with the function call in d), do I have both options, or am I
   so lucky?
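On (e), with libibverbs the caller chooses how to wait for a
completion: calling ibv_poll_cq() in a loop keeps the core spinning,
while ibv_req_notify_cq() followed by ibv_get_cq_event() blocks the
thread until the adapter raises an event. So both options should be
available. The sketch below models only the spinning side, in one
process, with a bound on the number of polls; spin_until_set() and its
parameters are hypothetical names.

```c
#include <assert.h>
#include <stdatomic.h>
#include <stdbool.h>
#include <stddef.h>

/* Poll `done` up to max_spins times without ever sleeping.
 * Returns true as soon as the flag is seen nonzero, false if the
 * budget runs out; the caller can then decide to block instead. */
static bool spin_until_set(atomic_int *done, unsigned long max_spins)
{
    assert(done != NULL);
    for (unsigned long i = 0; i < max_spins; i++) {
        if (atomic_load_explicit(done, memory_order_acquire) != 0)
            return true;
    }
    return false;
}
```

A spin budget worth a few microseconds followed by a blocking wait is
one way to avoid the scheduler delay in the common case while not
burning a core forever in the rare one.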
Many thanks for taking a look at my questions and even more to those
responding!
Kind Regards,
Vincent