Alpha beowulf: Tru64 or Linux?

Thu Feb 1 14:58:05 PST 2001

On Thu, Feb 01, 2001 at 04:42:33PM -0600, Frank Muldoon wrote:
> I have tested my CFD code using Dec's Fortran 90/95 compiler on 2
> identical Alpha 21264's @500Mhz.  The ratio of time to finish for
> Tru64/linux was .85.  This is right in line with what Dec was saying
> the performance penalty for using Linux on their machines was.  Does
> anyone know why this is?  I heard something about Linux not having
> page coloring, which I am not familiar with.
> 

Page coloring has to do with how pages of physical memory map onto the
lines of the processor cache.
Here is a brief blurb on page coloring from the BSD people:

	We'll end with the page coloring optimizations. Page coloring
	is a performance optimization designed to ensure that accesses
	to contiguous pages in virtual memory make the best use of the
	processor cache. In ancient times (i.e. 10+ years ago)
	processor caches tended to map virtual memory rather than
	physical memory. This led to a huge number of problems
	including having to clear the cache on every context switch in
	some cases, and problems with data aliasing in the cache.
	Modern processor caches map physical memory precisely to solve
	those problems. This means that two side-by-side pages in a
	process's address space may not correspond to two side-by-side
	pages in the cache. In fact, if you aren't careful
	side-by-side pages in virtual memory could wind up using the
	same page in the processor cache -- leading to cacheable data
	being thrown away prematurely and reducing CPU performance.
	This is true even with multi-way set-associative caches
	(though the effect is mitigated somewhat).

	FreeBSD's memory allocation code implements page coloring
	optimizations, which means that the memory allocation code
	will attempt to locate free pages that are contiguous from the
	point of view of the cache. For example, if page 16 of
	physical memory is assigned to page 0 of a process's virtual
	memory and the cache can hold 4 pages, the page coloring code
	will not assign page 20 of physical memory to page 1 of a
	process's virtual memory. It would, instead, assign page 21 of
	physical memory. The page coloring code attempts to avoid
	assigning page 20 because this maps over the same cache memory
	as page 16 and would result in non-optimal caching. This code
	adds a significant amount of complexity to the VM memory
	allocation subsystem as you can well imagine, but the result
	is well worth the effort. Page Coloring makes VM memory as
	deterministic as physical memory in regards to cache
	performance.
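
To make the arithmetic in that example concrete, here is a minimal
sketch in Python.  It assumes a physically indexed cache that holds
exactly 4 pages, as in the blurb, and it assumes the allocator simply
tries to match the "color" of the physical page to the color of the
virtual page.  The names color() and pick_physical_page() are made up
for illustration; this is not FreeBSD's actual allocator code.

```python
CACHE_PAGES = 4  # assumption: the cache holds 4 pages, as in the blurb


def color(page_number, cache_pages=CACHE_PAGES):
    """Return the cache color of a page: the page-sized slot of the
    cache that the page maps onto."""
    return page_number % cache_pages


def pick_physical_page(virtual_page, free_pages, cache_pages=CACHE_PAGES):
    """Pick a free physical page for virtual_page (hypothetical helper).

    Prefers a free page whose color matches the virtual page's color,
    so that contiguous virtual pages land in different cache slots.
    Falls back to the first free page if no color match exists, and
    returns None if there are no free pages at all.
    """
    wanted = color(virtual_page, cache_pages)
    for page in free_pages:
        if color(page, cache_pages) == wanted:
            return page
    return free_pages[0] if free_pages else None


if __name__ == "__main__":
    # Physical pages 16 and 20 share a color, so they would collide.
    print(color(16), color(20), color(21))
    # Virtual page 0 already got physical page 16; choose one for page 1.
    print(pick_physical_page(1, [20, 21, 22]))
```

With physical page 16 already backing virtual page 0, pages 16 and 20
both have color 0, so the sketch skips 20 and hands out 21 for virtual
page 1, which is the choice the blurb describes.  The fallback to any
free page is a simplification of my own.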

There has been a lot of arguing back and forth about whether page
coloring is worth it, given that it is very time consuming and
difficult to set up and get right.  The thing that I hear REALLY
increases performance on many scientific apps is the use of super
pages.

BAPper