[Beowulf] GPFS on Linux (x86)

Mark Hahn hahn at physics.mcmaster.ca
Thu Sep 14 17:30:32 PDT 2006


>> did you mention the kind of compute/client load you've got?
>
> During periods of high load, when they are IO waiting, the web servers
> can reach load averages of 25 - 35.  The file servers will reach a
> load average of 15 - 16.

I've seen this caused by bad storage controllers (yes, Dell PERCs 
in particular).  if you have an opportunity, you should definitely
test the hardware (under an easy, streaming workload like just 
"dd if=/dev/zero of=testfile bs=64k").  I'm guessing you are cursed
with a "raid" card which delivers 10-15 MB/s regardless of how many
50 MB/s disks you have plugged into it...

>> uh, that sounds fine - web traffic tends to be quite read-cache
>> friendly, which NFS does very nicely.
>
> From what I read, NFS v3 actually has issues with this.  This is why
> David Howells (at RedHat) is actively writing and adapting "fscache"
> to support NFS.  Additionally, I believe that many people switch to
> AFS for precisely this reason.  Am I mistaken?

I'm aware of fscache, but I have the impression it's mainly aimed 
at write-heavy loads.  though it would certainly work well for 
reads, if you have a hefty local disk system (say, a raid0 at least
4 disks wide with a tuned stripe size...)

I did a trivial experiment: ran "vmstat 1" on a basic NFS server,
and generated a read load from another machine.  the test file was 
only 1GB, and depending on competition from other users on the client,
it _could_ stay in client-side cache for at least the 10 minutes 
I waited.  2G on the client machine, gigabit between, nothing special
kernel/settings-wise.
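
the recipe, roughly (hostnames/paths made up):

   # on the NFS server: watch the bi/bo (blocks in/out) columns;
   # once the client has the file cached, the server should go quiet
   vmstat 1

   # on the client: re-read a 1GB file from the mount in a loop;
   # after the first pass it's served from the local page cache
   while true; do dd if=/mnt/nfs/testfile of=/dev/null bs=64k; done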

> scalable).  We do the following on our NFS mounts which has helped
> considerably:
>
> tcp,nfsvers=3,intr,rsize=16384,wsize=16384,noatime 0 0

how about noatime on the underlying FS on the NFS server?
also, tweaking up the attribute-cache (ac*) mount parameters will
probably help.
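
for instance (device/paths are made up; actimeo is shorthand for
setting all four ac* timeouts, whose defaults are only 3-60 seconds):

   # server's /etc/fstab: noatime on the exported filesystem itself,
   # so pure reads don't generate inode writes
   /dev/sda1   /export/www   ext3   defaults,noatime   0 0

   # client mount options: stretch the attribute cache for
   # mostly-static content
   tcp,nfsvers=3,intr,rsize=16384,wsize=16384,noatime,actimeo=300 0 0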

> This may be the case, we are running Dell's stock PERC/4i cards.

I'm very sorry ;)

> Provided NFS's algorithm is actually well implemented, and caching is
> constantly being swapped out given the sheer number of files that we
> regularly read  (i.e. they get swapped out because the amount on disk
> (460GB) is much larger than the 2GB RAM on the front-end webs).

hmm.  the total content size is 460G, but what's the working set size?
does a typical workload really wander through all 460G of php scripts!?!
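
a cheap way to check: pull the distinct paths out of a day's access
log and add up their sizes (log location/format are guesses; $7 is
the URL in common log format):

   # distinct files requested in one day - compare against RAM size
   awk '{print $7}' /var/log/httpd/access_log | sort -u | wc -l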

>> how much memory do the web servers have?  if the bottleneck IO really
>> is mostly-read pages, then local dram will help a lot.
>
> 2GB Ram

well, considering that memory is cheap and doesn't require a lot of 
elbow grease or thought, you should probably try the experiment of scraping
together a big-memory webserver and seeing whether it runs a lot faster. 
I bet you a beer that it'll take <=8GB to achieve a good cache hit-rate ;)
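
and it's easy to see whether the cache is winning once the box is
warm (illustrative, nothing exotic):

   # if "cached" in free grows to cover the hot files and vmstat's
   # bi (blocks-in) column sits near zero under load, you're hitting
   free -m
   vmstat 5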

regards, mark.


