Alpha Beowulf: Tru64 or Linux?
Brian Pomerantz
bapper at piratehaven.org
Thu Feb 1 14:58:05 PST 2001
On Thu, Feb 01, 2001 at 04:42:33PM -0600, Frank Muldoon wrote:
> I have tested my CFD code using DEC's Fortran 90/95 compiler on 2
> identical Alpha 21264s @ 500 MHz. The ratio of time to finish for
> Tru64/Linux was 0.85. This is right in line with what DEC was saying
> the performance penalty for using Linux on their machines was. Does
> anyone know why this is? I heard something about Linux not having
> page coloring, which I am not familiar with.
>
Page coloring has to do with how pages of physical memory map onto the processor cache.
Here is a brief blurb on page coloring from the BSD people:
    We'll end with the page coloring optimizations. Page coloring
    is a performance optimization designed to ensure that accesses
    to contiguous pages in virtual memory make the best use of the
    processor cache. In ancient times (i.e. 10+ years ago)
    processor caches tended to map virtual memory rather than
    physical memory. This led to a huge number of problems
    including having to clear the cache on every context switch in
    some cases, and problems with data aliasing in the cache.
    Modern processor caches map physical memory precisely to solve
    those problems. This means that two side-by-side pages in a
    process's address space may not correspond to two side-by-side
    pages in the cache. In fact, if you aren't careful,
    side-by-side pages in virtual memory could wind up using the
    same page in the processor cache, leading to cacheable data
    being thrown away prematurely and reducing CPU performance.
    This is true even with multi-way set-associative caches
    (though the effect is mitigated somewhat).

    FreeBSD's memory allocation code implements page coloring
    optimizations, which means that the memory allocation code
    will attempt to locate free pages that are contiguous from the
    point of view of the cache. For example, if page 16 of
    physical memory is assigned to page 0 of a process's virtual
    memory and the cache can hold 4 pages, the page coloring code
    will not assign page 20 of physical memory to page 1 of a
    process's virtual memory. It would, instead, assign page 21 of
    physical memory. The page coloring code attempts to avoid
    assigning page 20 because this maps over the same cache memory
    as page 16 and would result in non-optimal caching. This code
    adds a significant amount of complexity to the VM memory
    allocation subsystem as you can well imagine, but the result
    is well worth the effort. Page coloring makes VM memory as
    deterministic as physical memory with regard to cache
    performance.
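The allocation rule in the FreeBSD excerpt can be sketched in a few lines of Python, assuming a physically indexed cache. This is only an illustration of the idea, not FreeBSD's actual allocator: the names `num_colors`, `page_color`, and `pick_page` are hypothetical, and `CACHE_PAGES = 4` is taken from the excerpt's example. A page's "color" is its physical page number modulo the number of page-sized slices in one cache way, and the allocator tries to hand consecutive virtual pages consecutive colors.

```python
# Hypothetical sketch of page-color-aware allocation (not FreeBSD code).
# Assumes a physically indexed cache; CACHE_PAGES = 4 mirrors the example
# in the quoted FreeBSD text ("the cache can hold 4 pages").
CACHE_PAGES = 4


def num_colors(cache_bytes: int, page_bytes: int, ways: int = 1) -> int:
    """Number of page colors: page-sized slices in one way of the cache."""
    return cache_bytes // (page_bytes * ways)


def page_color(phys_page: int, ncolors: int = CACHE_PAGES) -> int:
    """Which page-sized slice of the cache a physical page maps onto."""
    return phys_page % ncolors


def pick_page(free_pages, vpage, base_color, ncolors=CACHE_PAGES):
    """Choose a free physical page for virtual page `vpage`.

    Prefers the color that continues the run started by virtual page 0
    (`base_color`), so neighbouring virtual pages land in different cache
    slices. Falls back to the lowest free page if no page of the wanted
    color is free, and returns None if nothing is free.
    """
    want = (base_color + vpage) % ncolors
    for p in sorted(free_pages):
        if page_color(p, ncolors) == want:
            return p
    return min(free_pages) if free_pages else None


# The excerpt's example: virtual page 0 already sits on physical page 16.
base = page_color(16)                  # color 0
print(pick_page({20, 21}, 1, base))    # 21 (page 20 is color 0, like page 16)

# An illustrative 4 MB direct-mapped cache with 8 KB pages:
print(num_colors(4 * 1024 * 1024, 8192))  # 512
```

With the excerpt's numbers the sketch skips physical page 20, which has the same color as page 16, and returns page 21. For a 2-way cache of the same size, `num_colors(..., ways=2)` gives 256 colors, which is the sense in which associativity mitigates, but does not remove, the conflict.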
There has been a lot of arguing back and forth about whether there is
any benefit to page coloring, given that it is very time-consuming
and difficult to set up and get right. The thing that I hear REALLY
increases performance on many scientific apps is the use of
superpages (large pages, so that each TLB entry maps much more
memory).
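Superpages pay off mainly through TLB reach: each TLB entry maps one page, so larger pages let the same number of entries cover far more memory. The Python arithmetic below is only an illustration; the 64-entry TLB is a hypothetical figure, and the 8 KB and 4 MB sizes stand in for an Alpha base page and a large mapping rather than measured 21264 parameters. The helper names `tlb_reach` and `pages_touched` are made up for this example.

```python
# Illustrative TLB-reach arithmetic. The 64-entry TLB and the page sizes
# are assumptions chosen for the example, not measured Alpha parameters.
KB = 1024
MB = 1024 * KB

TLB_ENTRIES = 64  # hypothetical


def tlb_reach(entries: int, page_bytes: int) -> int:
    """Bytes of memory a full TLB can map before it must evict an entry."""
    return entries * page_bytes


def pages_touched(array_bytes: int, page_bytes: int) -> int:
    """Distinct pages in one sequential sweep (at most one TLB miss each)."""
    return -(-array_bytes // page_bytes)  # ceiling division


base_reach = tlb_reach(TLB_ENTRIES, 8 * KB)   # 524288 bytes = 512 KB
super_reach = tlb_reach(TLB_ENTRIES, 4 * MB)  # 268435456 bytes = 256 MB

small = pages_touched(100 * MB, 8 * KB)  # 12800 translations
large = pages_touched(100 * MB, 4 * MB)  # 25 translations
```

In this made-up configuration, sweeping a 100 MB array needs 12800 translations with 8 KB pages but only 25 with 4 MB pages, which is why superpages tend to help codes that stream through large arrays.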
BAPper