[Beowulf] GPFS on Linux (x86)
brentasterisk at gmail.com
Thu Sep 14 13:57:59 PDT 2006
On 9/14/06, Mark Hahn <hahn at physics.mcmaster.ca> wrote:
> did you mention the kind of compute/client load you've got?
During periods of high load, when they are IO waiting, the web servers
can reach load averages of 25 - 35. The file servers will reach a
load average of 15 - 16.
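For reference, load average alone does not distinguish CPU-bound work from processes stuck in IO wait. A minimal way we sanity-check this on Linux (standard /proc paths, nothing site-specific):

```shell
# Print the 1-, 5-, and 15-minute load averages from the kernel.
cut -d' ' -f1-3 /proc/loadavg

# Cumulative iowait jiffies are the 6th value on the "cpu" line of
# /proc/stat; sampling this twice a few seconds apart shows whether
# iowait is actually growing during the load spikes.
awk '/^cpu /{print "iowait jiffies:", $6}' /proc/stat
```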
> uh, that sounds fine - web traffic tends to be quite read-cache
> friendly, which NFS does very nicely.
From what I read, NFS v3 actually has issues with this. This is why
David Howells (at Red Hat) is actively writing "fscache" to add
persistent local caching support to NFS. Additionally, I believe that
many people switch to AFS for precisely this reason. Am I mistaken?
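For what it's worth, the FS-Cache work is meant to be opt-in per mount on the NFS side. A sketch of the expected usage, assuming a kernel built with FS-Cache support and the cachefilesd daemon running (the server and path are placeholders):

```shell
# Hypothetical mount using the "fsc" option, which asks the NFS client
# to back its page cache with a persistent on-disk cache via FS-Cache.
mount -t nfs -o fsc fileserver01:/export/web /var/www
```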
> have you measured the nature of your NFS and SQL loads?
Our SQL loads are fine. The Database backend performs very well. The
DB Servers do not interact in any way with NFS.
> > As stated, our FS infrastructure leaves much to be desired. The
> > current setup involving NFS servers (Dell PE 2850 with 1TB of local
> > storage on 10K SCSI disks) has not performed well. We are constantly
> > IO waiting.
> but _why_? heavy write load without async NFS (and writeback at the
> block level)? with multiple local 10K scsi disks, you really shouldn't
> be seek limited, especially if requests are coming over just gigabit.
We went to great lengths to optimize where we could on the NFS side.
I don't think the issue is that NFS isn't up to the task, because we
certainly could (and have) created multiple NFS servers and then
started symlinking our PHP files out into different directories on
different NFS servers. As you know, this is a management nightmare
(and it also doesn't scale), which is what leads us to the desired
global namespace of a parallel / cluster FS. We use the following
options on our NFS mounts, which has helped:
tcp,nfsvers=3,intr,rsize=16384,wsize=16384,noatime 0 0
Jumbo frames and link aggregation at the network level as well.
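Spelled out as a complete /etc/fstab entry, that set of options looks like the following (the server export and mount point are placeholder names, not our real paths):

```shell
# Hypothetical fstab line; "fileserver01:/export/web" and /var/www
# stand in for the real export and mount point.
fileserver01:/export/web  /var/www  nfs  tcp,nfsvers=3,intr,rsize=16384,wsize=16384,noatime  0 0
```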
> to me that indicates your disk-local servers are misconfigured.
> (which reminds me - dell has shipped some _astoundingly_ bad raid systems
> marketed as high-end...)
This may be the case; we are running Dell's stock PERC 4/i cards.
> but web pages will normally be nicely read-cached on the web frontends...
Provided NFS's caching is actually well implemented. In our case the
cache is constantly being evicted given the sheer number of files that
we regularly read: the working set on disk (460GB) is much larger than
the 2GB of RAM on the front-end web servers.
> how much memory do the web servers have? if the bottleneck IO really
> is mostly-read pages, then local dram will help a lot.
> not to insult, but I find that the main problem is not understanding
> the workload sufficiently, not lapses in proactivity...
Not taken as an insult; honestly, we are open to anything that allows
us to scale our infrastructure upwards.