[Beowulf] NFS & Scaling issues
Amrik Singh asingh at ideeinc.com
Mon Apr 9 14:13:29 PDT 2007
- Previous message: [Beowulf] NFS & Scaling issues
- Next message: [Beowulf] NFS & Scaling issues
- Messages sorted by: [ date ] [ thread ] [ subject ] [ author ]
Thanks for the reply, Joe.

atop seems to be a really cool tool that will be very helpful once I get a chance to patch the kernel on the file servers (for process-level disk usage and Ethernet usage information). I am setting up a test cluster to reproduce the problem. I will post updates as I find more info.

Amrik

Joe Landman wrote:
> Hi Amrik:
>
> Amrik Singh wrote:
>> Hi,
>>
>> We are running a cluster of 180 diskless compute nodes. 60 of them
>> have 32 bit AMD Sempron processors and the rest are dual core AMD
>> Athlon 64 bit processors. The 32 bit machines have 10/100 Mbps and
>> the rest have gigabit Ethernet cards. We have four file servers, each
>> hosting around 3.5TB on SATA drives connected to 3Ware RAID controller
>> cards configured as a RAID 10 array. These file servers are exporting
>> the drives through NFS. Each file server is running 265 daemons for
>> nfsd.
>>
>> The file servers are mainly hosting a large number of small files
>> ranging from 256KB to 2 MB. The compute nodes are primarily doing a
>> search through these files, so there is lots of reading and some
>> writing to the file servers.
>>
>> Recently we started noticing very high (70-90%) wait states on the
>> file servers when compute nodes. We have tried to optimize the NFS
>> through increasing the number of daemons and the rsize and wsize but
>> to no avail.
>>
>> Can someone point us in the right direction as to how we should be
>> trying to troubleshoot this problem.
>
> You might want to look at the read patterns.
>
>>
>> PS: All the nodes are running SuSE 10.0 and servers are running
>> SuSE 10.0 and 10.1 and all the drives are formatted with reiserfs.
>
> Hmmm... I remember Reiser has had a problem in the past when file
> systems get full or nearly so. There are file tail optimizations you
> might want to turn off, as well as use noatime for mounts. I might
> suggest turning to a better file system for your servers (if possible,
> it might not be a trivial undertaking), but even then that might not
> be responsible.
>
> Grab a copy of atop (google for it) and run it on your file server. See
> if it is the file system that is problematic (disk devices running
> near 80% or higher capacity for reads/writes all the time).
>
> Other possibilities are your file access patterns, what the file
> server is doing itself, whether or not your networks are being flooded
> with small packets (see if your csw is very high, or the number of
> interrupts or packets are very high).
>
> Joe
>
>>
>>
>> thanks
>>
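For readers who cannot run atop yet, the checks Joe describes (I/O wait, context switches, and interrupt rate on the file server) can be roughly approximated from two samples of /proc/stat. This is only a minimal sketch, assuming the standard Linux /proc/stat layout (aggregate `cpu` line with iowait as the fifth counter, plus `ctxt` and `intr` lines). The function names `parse_proc_stat`, `summarize`, and `sample_proc_stat` are hypothetical and are not part of atop or any tool mentioned in this thread.

```python
import time


def parse_proc_stat(text):
    """Pull the aggregate CPU counters, context switches and interrupts
    out of the text of /proc/stat."""
    sample = {}
    for line in text.splitlines():
        fields = line.split()
        if not fields:
            continue
        if fields[0] == "cpu":
            # user nice system idle iowait irq softirq steal [guest ...]
            sample["cpu"] = [int(v) for v in fields[1:]]
        elif fields[0] == "ctxt":
            sample["ctxt"] = int(fields[1])
        elif fields[0] == "intr":
            # The first number on the intr line is the total count.
            sample["intr"] = int(fields[1])
    return sample


def summarize(before, after, seconds):
    """Compare two parsed samples taken `seconds` apart."""
    deltas = [b - a for a, b in zip(before["cpu"], after["cpu"])]
    # Only the first eight counters are summed; guest time, where the
    # kernel reports it, is already included in user time.
    total = sum(deltas[:8])
    iowait = deltas[4] if len(deltas) > 4 else 0
    return {
        "iowait_pct": 100.0 * iowait / total if total else 0.0,
        "ctxt_per_sec": (after["ctxt"] - before["ctxt"]) / seconds,
        "intr_per_sec": (after["intr"] - before["intr"]) / seconds,
    }


def sample_proc_stat(seconds=5.0, path="/proc/stat"):
    """Read /proc/stat twice, `seconds` apart, and summarize the change."""
    with open(path) as f:
        before = parse_proc_stat(f.read())
    time.sleep(seconds)
    with open(path) as f:
        after = parse_proc_stat(f.read())
    return summarize(before, after, seconds)
```

Calling `sample_proc_stat()` on a file server while the compute nodes are busy returns the three figures as a dictionary. A sustained high `iowait_pct` points toward the disks or file system, while a very high `ctxt_per_sec` or `intr_per_sec` with modest disk load would be more consistent with the small-packet flooding Joe mentions. It does not give the per-process or per-device breakdown that atop provides.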
