[Beowulf] NFS over RDMA performance confusion
Ellis H. Wilson III
ellis at cse.psu.edu
Thu Sep 13 08:38:11 PDT 2012
On 09/13/2012 11:21 AM, holway at th.physik.uni-frankfurt.de wrote:
>> I assume so, but just to be clear you witnessed this behavior even with
>> the -I (directio) parameter?
>
> Yes.
Sorry for the confusion; this question was aimed at Joe, not you, Andrew.
I was wondering if Joe had seen caching effects even when using IOzone
with the -I parameter.
>> Can you rerun those tests with, 16 and 32 procs? I've run into some
>
> 1 proc Children see throughput for 1 random writers = 46036.32 KB/sec
> 2 proc Children see throughput for 2 random writers = 82828.13 KB/sec
> 4 proc Children see throughput for 4 random writers = 126709.65 KB/sec
> 8 proc Children see throughput for 8 random writers = 190070.96 KB/sec
> 16 proc Children see throughput for 16 random writers = 273970.94 KB/sec
>
> 1 proc Children see throughput for 1 random readers = 109169.52 KB/sec
> 2 proc Children see throughput for 2 random readers = 202556.82 KB/sec
> 4 proc Children see throughput for 4 random readers = 381504.25 KB/sec
> 8 proc Children see throughput for 8 random readers = 719108.27 KB/sec
> 16 proc Children see throughput for 16 random readers = 1152648.13 KB/sec
Ah, this looks great! You gained roughly 50% or more throughput with
each doubling of procs, and I would bet you could squeeze a little more
out by going to 24 or 32 procs.
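For anyone who wants to check the scaling themselves, here is a small Python sketch that computes the gain per doubling from the numbers quoted above. The figures are copied from the IOzone output; the function name `doubling_gains` is just an illustrative choice.

```python
# Throughput figures (KB/sec) copied from the quoted IOzone output.
writers = {1: 46036.32, 2: 82828.13, 4: 126709.65, 8: 190070.96, 16: 273970.94}
readers = {1: 109169.52, 2: 202556.82, 4: 381504.25, 8: 719108.27, 16: 1152648.13}

def doubling_gains(results):
    """Return {procs: percent gain over the run with half as many procs}."""
    procs = sorted(results)
    return {
        n: 100.0 * (results[n] / results[prev] - 1.0)
        for prev, n in zip(procs, procs[1:])
    }

for name, results in (("random writers", writers), ("random readers", readers)):
    for n, gain in doubling_gains(results).items():
        print(f"{name}: {n // 2} -> {n} procs: +{gain:.0f}%")
```

The writers settle at around 50% per doubling, while the readers hold a higher gain until the last step, which is why a run at 24 or 32 procs would be informative.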
> I am quite sure I am not doing any local caching.
Ok, great! But remote caching likely is still happening unless you blow
away those files in between runs, so make sure you're doing that.
Obviously this is harder for the reads, but if you have root permissions
to the Nexenta gear, just nuke the kernel buffer cache on that end.
> Why is each process IO limited like that?
Anytime a process is forced to wait or does so voluntarily, you are
going to run into this type of limiting. By increasing the number of
threads or processes you are able to "hide" some of this turn-around gap
because another process that is available to run jumps right in and uses
the bandwidth.
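A toy Python sketch of that effect, under loud assumptions: each "request" is just a sleep standing in for the round-trip wait, and `ROUND_TRIP_S` and `REQUESTS_PER_WORKER` are made-up values, not anything measured on this setup. It only illustrates how overlapping waits raise aggregate throughput; it does not model NFS, RDMA, or IOzone.

```python
import time
from concurrent.futures import ThreadPoolExecutor

ROUND_TRIP_S = 0.01        # hypothetical per-request wait, not a measured value
REQUESTS_PER_WORKER = 20   # hypothetical number of synchronous requests per worker

def worker():
    """Issue requests one at a time, blocking on each like a synchronous I/O call."""
    for _ in range(REQUESTS_PER_WORKER):
        time.sleep(ROUND_TRIP_S)  # stand-in for waiting on the reply
    return REQUESTS_PER_WORKER

def aggregate_rate(num_workers):
    """Return completed requests per second summed across all workers."""
    start = time.perf_counter()
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        futures = [pool.submit(worker) for _ in range(num_workers)]
        done = sum(f.result() for f in futures)
    return done / (time.perf_counter() - start)

if __name__ == "__main__":
    for n in (1, 2, 4, 8):
        print(f"{n} workers: {aggregate_rate(n):.0f} requests/sec")
```

In this idealized model the rate grows almost linearly with the worker count because nothing is ever saturated. On real hardware the curve flattens as the link or the server fills up, which matches the sublinear scaling in the numbers above.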
My dissertation /should/ fix this such that a single process can get
full bandwidth, but that's some 2 years and a bunch of sleepless nights
away, so don't hold your breath ;D
Best,
ellis