[Beowulf] NFS over RDMA performance confusion
Ellis H. Wilson III
ellis at cse.psu.edu
Thu Sep 13 05:56:29 PDT 2012
On 09/13/2012 08:34 AM, Joe Landman wrote:
> On 09/13/2012 07:52 AM, holway at th.physik.uni-frankfurt.de wrote:
>> If I set up a single machine to hammer the fileserver with IOzone I see
>> something like 50,000 IOPS but if all four machines are hammering the
>> filesystem concurrently we got it up to 180,000 IOPS.
> I wouldn't recommend IOzone for this sort of testing. It's not a very
> good load generator, and it has a tendency to report things which are
> not actually seen at the hardware level. I'd noticed this some years
> ago, when running some of our benchmark testing on these units: an
> entire IOzone benchmark completed with very few activity lights going on
> the disks, which suggested that the test was happily running entirely
> in cache.
I assume so, but just to be clear: you saw this behavior even with
the -I (direct I/O) parameter?
>> Can anyone tell me what might be the bottleneck on the single machines?
>> Why can I not get 180,000 IOPS when running on a single machine?
Can you rerun those tests with 16 and 32 procs? I've run into some
pretty wacky relationships between the numbers of cores, procs, and disks
in the subsystem. I assume your machine has 8 cores, and I tend to find
around 2 processes per core to be ideal if the number of disks you're
trying to run against is greater than the number of cores. This is a
really hand-wavy rule of thumb, but it's served me all right in the past
as a first benchmark.
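A minimal sketch of that sweep, assuming the 8-core client guessed above: the Python below only builds the iozone command lines for 8, 16 and 32 processes, so the runs differ in nothing but process count. The 4g per-process file size and 4k record size are placeholder values, not taken from the original run. The flags used are -I (direct I/O), -O (report operations per second), -i 0 and -i 2 (write, then random read/write), -r (record size), -s (file size) and -t (throughput mode).

```python
def iozone_sweep(proc_counts, size="4g", record="4k"):
    """Build one iozone throughput-mode command line per process count.

    size and record are placeholder defaults, not values from the thread.
    """
    cmds = []
    for n in proc_counts:
        cmds.append([
            "iozone",
            "-I",          # use O_DIRECT where supported, bypassing the page cache
            "-O",          # report results in operations per second (IOPS)
            "-i", "0",     # test 0: write/rewrite (also creates the files)
            "-i", "2",     # test 2: random read/write
            "-r", record,  # record (request) size
            "-s", size,    # file size per worker
            "-t", str(n),  # throughput mode with n workers
        ])
    return cmds


cores = 8  # assumption from the post: an 8-core client
cmds = iozone_sweep([cores, 2 * cores, 4 * cores])
for cmd in cmds:
    print(" ".join(cmd))
```

Keeping everything but -t fixed makes it easier to see where per-client IOPS stops scaling.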
Also, are you flushing your caches in between runs? Regardless of what
iozone promises, that's the safest thing to do. If you somehow have
root access to your file servers, I'd do the same on them as well to be
> Some observations ... these don't sound like disk subsystems. A 15k RPM
> drive will give you ~300 IOPs. To get 50k IOPs, you would need 167 disk
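Joe's drive count is just target IOPS divided by per-drive IOPS, rounded up. A quick check, taking his ~300 IOPS per 15k RPM drive as the assumed figure, shows what the single-client and four-client numbers would imply if they really came off spindles:

```python
import math


def drives_needed(target_iops: float, iops_per_drive: float = 300.0) -> int:
    """Spindles required to sustain target_iops, assuming ~300 IOPS per 15k drive."""
    # Round up: a fractional drive still means one more physical disk.
    return math.ceil(target_iops / iops_per_drive)


print(drives_needed(50_000))   # single client
print(drives_needed(180_000))  # all four clients
```

Neither count is plausible behind one file server, which is the point: those numbers are coming from cache.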
Maybe I'm misreading the iozone command line, as I haven't played with
it for 6+ months, but it looks like you're writing eight tiny 10 MB
files. This kind of really tiny benchmark lends a lot of credence to
Joe's fear that this is running entirely out of cache.
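One way to act on the earlier advice about flushing caches between runs, assuming Linux clients and servers (the post doesn't say which OS is in use): sync dirty data, then write 3 to /proc/sys/vm/drop_caches, which frees the page cache plus dentries and inodes. This needs root. The function name and the dry_run switch are hypothetical, added so the sketch can be read or tested without actually touching the kernel.

```python
import os

DROP_CACHES = "/proc/sys/vm/drop_caches"


def drop_page_cache(dry_run: bool = True) -> str:
    """Flush dirty pages, then ask the Linux kernel to drop clean caches.

    With dry_run=True (the default) nothing is changed; the equivalent
    shell command is returned instead. Real use requires root.
    """
    if dry_run:
        return f"sync; echo 3 > {DROP_CACHES}"
    os.sync()  # write back dirty data first; drop_caches only frees clean pages
    with open(DROP_CACHES, "w") as f:
        f.write("3\n")  # 3 = page cache + dentries and inodes
    return "dropped"
```

Run it on every client before each iozone pass, and on the file servers too if you have root there.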
I almost always try to run tests that are twice the size of the combined
memory of all my machines (all clients + all file servers) to get a real
idea of steady-state throughput. But I'm more concerned with absolutely
huge I/O, so perhaps this isn't your use case. If your use case is just
80 Megs, screw it: you don't need a file server, you need to run MySQL
in main-memory-only mode.
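To make the sizing rule concrete: the total data a run touches should be at least twice the combined RAM of every client and file server. The machine count and the 32 GiB per machine below are hypothetical, chosen only for illustration; the eight ~10 MB files are the figure read off the command line above.

```python
GiB = 1024 ** 3
MiB = 1024 ** 2


def min_test_bytes(ram_per_machine):
    """Smallest total data set for a run: twice the combined RAM of all machines."""
    return 2 * sum(ram_per_machine)


# Hypothetical setup: four clients plus one file server, 32 GiB of RAM each.
machines = [32 * GiB] * 5
need = min_test_bytes(machines)

procs = 8                   # one file per iozone process, as in the original run
per_proc = need // procs    # per-process file size (what iozone's -s sets)
current = procs * 10 * MiB  # eight ~10 MB files from the command line

print(f"need {need // GiB} GiB total, {per_proc // GiB} GiB per process")
print(f"current run touches {current // MiB} MiB, {need // current}x too small")
```

Splitting the total across the worker count keeps each file large enough that no single machine can hold the working set in cache.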