[Beowulf] single machine with 500 GB of RAM

Jörg Saßmannshausen j.sassmannshausen at ucl.ac.uk
Wed Jan 9 08:29:58 PST 2013

Dear all,

Many thanks for the quick replies and all the suggestions.

The code we want to use is this one:


Feel free to download and dig into the code. I am no expert in Fortran, so I 
won't be able to help you much if you have specific questions about the code :-(
However, my understanding is that it only runs on a single core/thread. 

As for the budget: that is where it gets a bit tricky. The ceiling is 
10k GBP. I know that machines with less memory, say 256 GB, are cheaper, so 
one option would be to get two of these beasts so we can run two calculations 
at the same time. If there are enough free slots, we could upgrade to 500 GB 
once we get another pot of money. 

I guess I would go for DDR3, simply because it is faster. Waiting two weeks for 
a calculation is no fun, so even a modest speed-up from faster RAM saves us 
quite a bit of time overall. 

To be honest, I am not convinced by AMD Bulldozer. From what I understand, 
Sandy Bridge has faster memory access (higher bandwidth). Is that correct, or 
am I missing something here?
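One way to settle the bandwidth question is to measure it on the candidate machines. The standard tool for that is the STREAM benchmark; what follows is only a much cruder stand-in, a minimal sketch that times a bulk copy of a large buffer. The function name `copy_bandwidth_gbs` and the 256 MB default buffer size are my own assumptions. Since it runs on one thread, it reflects roughly what a single-threaded code like ours can get from one core, which can be well below the figure quoted for a whole socket. Treat the number as indicative only.

```python
import time

def copy_bandwidth_gbs(size_mb=256, repeats=5):
    """Rough single-threaded copy bandwidth in GB/s (reads + writes counted).

    Hypothetical helper, not STREAM: it times a bulk copy between two
    bytearrays and keeps the fastest of `repeats` runs.
    """
    n = size_mb * 1024 * 1024
    src = bytearray(n)  # zero-filled, so its pages are actually touched
    dst = bytearray(n)
    best = float("inf")
    for _ in range(repeats):
        t0 = time.perf_counter()
        dst[:] = src  # equal-length slice assignment: one bulk memory copy
        best = min(best, time.perf_counter() - t0)
    # A copy reads n bytes and writes n bytes.
    return 2 * n / best / 1e9
```

Calling `print(f"{copy_bandwidth_gbs():.1f} GB/s")` on each box gives a like-for-like comparison. The buffer should be much larger than the last-level cache, otherwise the result measures cache rather than RAM.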

I gather that using just one CPU is not a good idea, so we need a dual-CPU 
machine, which is fine with me. 

I am curious about Joe's vSMP / ScaleMP suggestion. If I use an InfiniBand 
network here, would I be able to spread the 'bottlenecks' a bit better? What I 
mean is this: when I tested the InfiniBand on our new cluster, I noticed that a 
job running in parallel across nodes was marginally faster than the same number 
of cores on a single node. At the time I put that down to slightly faster 
memory access, as there was no bottleneck to the RAM. 
I am not familiar with vSMP (I have never used it), but is it possible to 
aggregate the RAM of a number of nodes (say 40) and use it as one large virtual 
SMP machine? One node would then be slaving away at the calculation while the 
other nodes only handle memory I/O. Is that possible with vSMP?
Along the same lines, what about NUMAScale?

The idea of aggregated SSDs is nice as well. I know some storage vendors use a 
mixture of RAM and SSD for their metadata (fast access), and that seems to work 
quite well. Would that simply be a large swap file/partition, or is there 
another way to use disc space as RAM? I suppose I need to read the NVMalloc 
paper. Is NVMalloc actually used in practice, or is it just a good idea with a 
working proof of concept?
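As for "another way" besides swap: one option is to memory-map a large file on the SSD and let the operating system page it in and out on demand. This is plain `mmap`, not NVMalloc's interface, which I have not checked. The sketch below is a minimal illustration under stated assumptions: the helper names `open_file_backed_array`, `put` and `get` are hypothetical, and the array is assumed to hold 8-byte doubles.

```python
import mmap
import os
import struct

DOUBLE = struct.Struct("<d")  # one little-endian 8-byte float

def open_file_backed_array(path, n_doubles):
    """Hypothetical helper: map a file holding n_doubles doubles read/write.

    The OS pages data between RAM and the file on demand, so the array can
    exceed physical memory, running at SSD/disc speed once it spills.
    """
    size = n_doubles * DOUBLE.size
    fd = os.open(path, os.O_RDWR | os.O_CREAT, 0o600)
    try:
        os.ftruncate(fd, size)      # size the file; new bytes read as zero
        return mmap.mmap(fd, size)  # read/write, shared mapping by default
    finally:
        os.close(fd)                # the mapping stays valid after close

def put(mm, i, value):
    """Store a double at index i of the mapped array."""
    DOUBLE.pack_into(mm, i * DOUBLE.size, value)

def get(mm, i):
    """Read the double at index i of the mapped array."""
    return DOUBLE.unpack_from(mm, i * DOUBLE.size)[0]
```

Usage would be along the lines of `mm = open_file_backed_array("/ssd/scratch/big.bin", 10**9)` (an 8 GB file; the path is made up), then `put`/`get`, and finally `mm.flush()` and `mm.close()`. The trade-off against swap: swap is transparent to the Fortran code, whereas a mapping like this would require changing how the code allocates its large arrays. In return, it lets you choose exactly which data lives on which device.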

I don't think there is much disc I/O here, and there is certainly no network 
traffic, as it is a single thread. A fast CPU would be an advantage as well; 
however, I have a gut feeling that memory access speed matters more.

I have tried to answer the questions raised. Let me know if anything is still 
unclear. 

Thanks for all your help and suggestions so far. I will need to digest them.

All the best from sunny London


Jörg Saßmannshausen
University College London
Department of Chemistry
Gordon Street

email: j.sassmannshausen at ucl.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html
