Each node of my Beowulf cluster has two CPUs that share the node's memory. How does MPI handle memory in this situation, and what is the most efficient way to program for this setup? Thanks.