[Beowulf] RHEL5 network throughput/scalability

Walid walid.shaari at gmail.com
Thu Jun 12 06:32:38 PDT 2008

Hi All,

I have an issue with a new cluster where the nodes run RHEL 5.1 (with the
latest 5.2 kernel). When I write data over NFS, the nodes scale linearly
until they reach the 10th node: the throughput seen at the NFS server, on
the other side of the nodes, increases linearly from around 100+ MByte/sec
up to 1 GByte/sec. However, when we add one more node, the throughput
becomes erratic and drops to around 500-700 MByte/sec. If I try the same
setup with RHEL4U6 I do not get this behaviour; it sustains the bandwidth
at 1 GByte/sec.

The setup is as follows: 48 nodes share a 48-port access switch, which is
uplinked over a 10G link to a Cisco 6509 switch. The 6509 is linked to a
clustered NFS file system consisting of eight heads, each connected to the
6509 by its own 10G link.

The above was a write test, so I thought that maybe TCP congestion control
had kicked in, or that there was a sliding-window problem. However, when I
do a read test it gets worse: scalability is now reduced to 5 nodes. One
node is able to read around 100 MByte/sec, two will read double that, and
so on until you add the fifth node, where the bandwidth drops from around
500+ MByte/sec to around 300. Again, with RHEL4 the behaviour is different.
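
As a rough sanity check on the figures above, here is a minimal Python
sketch of the ideal case. The 100 MByte/sec per-node rate comes from the
measurements above. Treating the 10G uplink as the only shared limit,
ignoring protocol overhead, and the function name itself are assumptions
made purely for illustration.

```python
# Idealized model only: it assumes each node contributes a fixed rate and
# that the single 10G uplink from the access switch is the only shared limit.

def expected_aggregate_mbytes(nodes, per_node_mbytes=100.0, uplink_gbits=10.0):
    """Aggregate throughput if nodes scale linearly until the uplink is full."""
    # 10 Gbit/s is 1250 MByte/s raw, before any protocol overhead (assumed).
    uplink_mbytes = uplink_gbits * 1000.0 / 8.0
    return min(nodes * per_node_mbytes, uplink_mbytes)


if __name__ == "__main__":
    for n in (1, 5, 10, 11, 13, 48):
        rate = expected_aggregate_mbytes(n)
        print(f"{n:2d} nodes -> {rate:7.1f} MByte/sec expected")
```

Under these assumptions the aggregate should level off at the uplink rate
as nodes are added, not fall back to 500-700 MByte/sec at the 11th node or
to around 300 on reads, which is why the drop looks like more than simple
link saturation.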

Any pointers?


