Hi Gerry,<br><br> I'm by no means an expert on WRF, so take the following with a grain of salt, but I'm inclined to think that WRF wouldn't run very well on a cluster of PS3s. The problem is that with < 200 MB of memory in total, giving roughly 25 MB per SPU, you're limited to a pretty small number of grid points per SPU - which means the SPUs will fly through the computations on those very few grid points... and then very, very slowly communicate over the gigabit network. Even if you could get 2 GB into each PS3, that's still only 256 MB per SPU, right?<br>
<br> Again, my WRF experience is admittedly quite limited, but a recent 3D run I did with a 300x300x200 domain required a little over 12 GB of RAM, I believe. The code had a few custom modifications, though I doubt they changed the run-time characteristics drastically, and the resulting run took something like 12.8 seconds on 8 processors... and 11.8 seconds on 16 processors (two nodes and four nodes, respectively). In other words, doubling the processor count bought less than a 10% speedup - the communication was already dominating. Speeding up the computation with very fast SPUs working on even smaller per-processor grids just means the communication takes up a relatively larger share of the total time.<br>
<br> Since we do have some people who need to run some pretty large WRF models, I'd be happy if this <i>did</i> work, but if you're interested in novel architectures for WRF, I would think a GPU (or an FPGA with many floating-point units) on a PCI-Express bus, with InfiniBand links between nodes, would be a better fit - the IB would hopefully keep the communication in balance with the extremely fast computation. Once the double-precision GPUs are out, I'll try to pick one up for experimentation, though mostly for home-grown codes - WRF may take a bit more effort. The folks at NCAR do seem to have done some work in this area, running one of WRF's physics modules on an NVIDIA 8800 card - you can read about it here: <a href="http://www.mmm.ucar.edu/wrf/WG2/michalakes_lspp.pdf">http://www.mmm.ucar.edu/wrf/WG2/michalakes_lspp.pdf</a><br>
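<br> (For what it's worth, I believe the reason the physics routines are attractive GPU targets is that they're largely column-independent: each vertical (i,j) column can be computed without talking to its neighbors, so a GPU can assign one thread per column. A minimal sketch of that pattern - the kernel name, arrays, and layout below are my own toy example, not the actual NCAR code:<br>

/* Hypothetical column-wise physics kernel: one thread per (i,j) column. */
__global__ void physics_columns(const float *t_in, float *t_out,
                                int nx, int ny, int nz)
{
    int i = blockIdx.x * blockDim.x + threadIdx.x;   /* x index of the column */
    int j = blockIdx.y * blockDim.y + threadIdx.y;   /* y index of the column */
    if (i >= nx || j >= ny) return;

    for (int k = 0; k < nz; k++) {                   /* march up the column */
        int idx = (k * ny + j) * nx + i;
        t_out[idx] = t_in[idx] + 0.1f;               /* stand-in for real physics */
    }
}

No halo exchange inside the kernel, which is exactly why these routines port so cleanly.)<br>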
<br> My two cents. :-)<br><br>
(PS. Ooh, now, if one could have a 'host system' with a large amount
of RAM to pipe data into the GPU, running very large models, I could see
that potentially working well as an <i>accelerator</i>. Say, 32-64GB of host RAM,
handled as 2 x 128MB 'tiles' at a time - one tile being staged in and its
results written back while the GPU computes on the other - and once all
the accelerated work is done, use the host to quickly synchronize via IB
with the other large nodes. But that's probably a fair amount of work!)<br>
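<br> (To make that tile idea a bit more concrete, here's a rough sketch of the kind of double-buffered streaming I have in mind, using two CUDA streams so the copies for one tile overlap the compute on the other. Everything here - the tile size, tile count, and the process_tile kernel - is made up purely for illustration:<br>

#include <cuda_runtime.h>

#define TILE_BYTES (128u * 1024 * 1024)    /* the 128MB 'tiles' from above */

/* Stand-in kernel; a grid-stride loop so any launch size covers the tile. */
__global__ void process_tile(float *tile, size_t n)
{
    for (size_t i = blockIdx.x * (size_t)blockDim.x + threadIdx.x;
         i < n; i += (size_t)gridDim.x * blockDim.x)
        tile[i] *= 2.0f;                   /* stand-in for the real work */
}

int main(void)
{
    const size_t n = TILE_BYTES / sizeof(float);
    const int ntiles = 4;                  /* pretend working set: 512MB */
    float *host, *dev[2];
    cudaStream_t stream[2];

    /* Pinned host memory, so the async copies can actually overlap. */
    cudaMallocHost((void **)&host, (size_t)ntiles * TILE_BYTES);
    for (int b = 0; b < 2; b++) {
        cudaMalloc((void **)&dev[b], TILE_BYTES);
        cudaStreamCreate(&stream[b]);
    }

    /* Alternate tiles between the two streams: while tile t computes in
     * one stream, tile t+1 is already copying in on the other. */
    for (int t = 0; t < ntiles; t++) {
        int b = t % 2;
        float *h = host + (size_t)t * n;
        cudaMemcpyAsync(dev[b], h, TILE_BYTES, cudaMemcpyHostToDevice, stream[b]);
        process_tile<<<1024, 256, 0, stream[b]>>>(dev[b], n);
        cudaMemcpyAsync(h, dev[b], TILE_BYTES, cudaMemcpyDeviceToHost, stream[b]);
    }
    cudaDeviceSynchronize();  /* ...then the host syncs with other nodes over IB */

    for (int b = 0; b < 2; b++) {
        cudaFree(dev[b]);
        cudaStreamDestroy(stream[b]);
    }
    cudaFreeHost(host);
    return 0;
}

The key detail is the pinned buffer from cudaMallocHost - with ordinary pageable memory the async copies fall back to synchronous behavior and you lose the overlap.)<br>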
<br><br> - Brian<br><br>Brian Dobbins<br>Yale Engineering HPC<br><br>