> This reminds me of a similar issue I had.  What approaches do you 
> take for large dense matrix multiplication in MPI, when the matrices
> are too large to fit into cluster memory?  If I hack up something to
> cache intermediate results to disk, the IO seems to drag everything 
> to a halt and I'm looking for a better solution.  I'd like to use 
> some libraries like PETSc, but how would you work around memory 
> limitations like this (short of building a bigger cluster)? 

Dear Peter, 

There are many algorithms for matrix operations, and which one fits 
depends on the properties of the matrix and on the operation.
For methods that read and write temporary files, you can easily speed 
things up by putting those files on a tmpfs (RAM disk) filesystem. 

So what I do now is take those old Fortran codes that read and write files 
and keep their intermediate result files on a RAM disk. 
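
As a rough illustration of the idea (not the Fortran codes themselves), here
is a minimal Python sketch that puts an intermediate block file on a RAM-backed
directory. It assumes a Linux system where /dev/shm is a tmpfs mount, and it
falls back to the normal temp directory otherwise. The names pick_scratch_dir,
write_block, read_block and the file name block_0_0.bin are made up for the
example.

```python
import os
import tempfile
from array import array

# Assumption: on most Linux systems /dev/shm is a tmpfs mount backed by RAM.
RAMDISK_CANDIDATE = "/dev/shm"


def pick_scratch_dir(candidate=RAMDISK_CANDIDATE):
    """Return the RAM-backed directory if usable, else the default temp dir."""
    if os.path.isdir(candidate) and os.access(candidate, os.W_OK):
        return candidate
    return tempfile.gettempdir()


def write_block(path, values):
    """Write a block of doubles to `path` as raw binary."""
    with open(path, "wb") as f:
        array("d", values).tofile(f)


def read_block(path, count):
    """Read `count` doubles back from `path`."""
    block = array("d")
    with open(path, "rb") as f:
        block.fromfile(f, count)
    return list(block)


def roundtrip_demo():
    """Write one intermediate block to the scratch dir and read it back."""
    workdir = tempfile.mkdtemp(prefix="matmul_", dir=pick_scratch_dir())
    path = os.path.join(workdir, "block_0_0.bin")  # hypothetical file name
    try:
        data = [1.0, 2.5, -3.0, 4.25]
        write_block(path, data)
        return read_block(path, len(data))
    finally:
        # Clean up so the RAM used by the file is released.
        if os.path.exists(path):
            os.remove(path)
        os.rmdir(workdir)
```

Keep in mind that files on tmpfs occupy RAM (or swap), so this helps with the
I/O cost of temporary files but does not add memory to the cluster.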


