nbd/nfs speeds - using buffers instead of network
Velocet
math at velocet.ca
Mon Jun 4 12:41:07 PDT 2001
I've got our prototype cluster (8 nodes, expanding to 40-odd later)
booting via RPL+DHCP/BOOTP on these PcChips M810 boards, running Linux 2.4.
Even with NFSv3 we're only getting about 3Mb/s max transfer rate to
the NFS server. Not sure what the problem is (we're using a RAID, but we're
going to move to a bunch of RAID 0 ATA-66 drives for scratch files; the RAID
maxes out at 6Mb/s for now).
One thing we're going to try is NBD. It lets a client attach a file on the
server (served by a daemon listening on a port, or via inetd) as a block
device, which can then hold a filesystem. There are many interesting
possibilities with this that are quite useful to Beowulf admins (RAID 0 across a network! ;)
However, the NBD device just freezes whenever I write to it; I can't even
make a filesystem on it. I'm using Pavel's original nbd-client and server
stuff, and the only other package I can find that might do this is
Enhanced NBD on Freshmeat, but that code won't compile (I have a 2.4.2
kernel and lots of my header files seem to be totally incompatible with
the code; I'm not enough of a code guru to wade through all the fixes
required).
The nbd-server stuff at SCYLD (ironically enough) doesn't exist on their
FTP server ('file not found'). I've asked the maintainer of the code
for help (no reply yet, but then again I only emailed a few hours ago).
So, a couple of things to ask here:
1) Does anyone have the nbd client and server code? Not Pavel's versions; I
know those don't work for me. Perhaps alternate versions won't work either,
but I don't know where to start to figure this out.
2) I need some other way to keep the scratch files off the network. I have
256MB in my prototype nodes, but we'll be increasing that to 512MB. I
found that under FreeBSD the MFS filesystem (memory fs) works quite well,
because it writes to RAM when it can (albeit via userland, which costs about
1.5% CPU overall for my types of jobs), but when it grows beyond actual
core, it will swap. In my case that would be to the NFS server, which is fine:
only machines that have full MFSs would write at all, and we've calculated
that this should be rare. I have the network set up to handle the average
load (about 0.25 to 0.5 Mbps per node), i.e. the scratch files generally
fit into buffers/local core.
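As a rough check of the average-load figure in item 2, here is a minimal Python sketch that multiplies the per-node average by the node count, for the current 8-node prototype and the planned 40 nodes. It assumes every node generates the same average traffic, which real g98 jobs won't; `aggregate_mbps` is a hypothetical helper, and peaks (e.g. nodes whose in-memory scratch has spilled to the server) are not modelled.

```python
def aggregate_mbps(nodes, per_node_mbps):
    """Average aggregate load on the NFS server in Mbps.

    Simplifying assumption: every node produces the same average traffic.
    """
    return nodes * per_node_mbps


if __name__ == "__main__":
    # 0.25 and 0.5 Mbps are the per-node averages quoted in the post.
    for nodes in (8, 40):
        low = aggregate_mbps(nodes, 0.25)
        high = aggregate_mbps(nodes, 0.5)
        print(f"{nodes:2d} nodes: {low:.1f} to {high:.1f} Mbps average")
```

The interesting number is then how that average compares with what the server's link and disks can actually sustain, which this sketch does not try to guess.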
Aside: it seems to me I should let g98 manage the RAM itself and avoid
writing scratch files at all. But there will always be situations where the
scratch files come out larger than local core, and I'll need a backup
system to handle that, and I need more than 3Mb/s (or the 600K/s that
trafshow shows g98 getting).
There's no MFS in Linux, unfortunately, but I need something like it.
A ramdisk doesn't work because when it's full, it's full, and you get a
catastrophic failure (unless g98 understands alternate scratch areas? Now
that I think of it, I haven't checked).
Either NFS or NBD could work here, though I think NBD has the potential to run
far faster. The key to the whole thing, however, is making sure local
buffers are used for reads and writes, so that reading a recently written file
doesn't go back to the server every time.
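To make the desired caching behaviour concrete, here is a toy LRU write-back block cache in Python. It is a hypothetical model, not how the Linux buffer cache, NFS client or NBD actually work: a dict stands in for the remote server, reads of recently written blocks never touch it, and dirty blocks cross the "network" only when evicted or on an explicit `flush()` (roughly what very long sync/kupdate intervals would approximate). `WriteBackCache` and its counters are illustrative names.

```python
from collections import OrderedDict


class WriteBackCache:
    """Toy LRU write-back block cache (hypothetical, illustration only)."""

    def __init__(self, backend, capacity):
        self.backend = backend        # dict standing in for the remote server
        self.capacity = capacity      # max blocks held locally
        self.blocks = OrderedDict()   # block_no -> (data, dirty), oldest first
        self.remote_reads = 0
        self.remote_writes = 0

    def _push(self, block_no, data):
        self.backend[block_no] = data
        self.remote_writes += 1

    def _evict_if_needed(self):
        while len(self.blocks) > self.capacity:
            block_no, (data, dirty) = self.blocks.popitem(last=False)
            if dirty:                 # only modified blocks cross the network
                self._push(block_no, data)

    def write(self, block_no, data):
        self.blocks[block_no] = (data, True)
        self.blocks.move_to_end(block_no)
        self._evict_if_needed()

    def read(self, block_no):
        if block_no in self.blocks:   # recently used: served locally
            self.blocks.move_to_end(block_no)
            return self.blocks[block_no][0]
        data = self.backend[block_no]
        self.remote_reads += 1
        self.blocks[block_no] = (data, False)
        self._evict_if_needed()
        return data

    def flush(self):
        # Iterate over a snapshot so updating entries is safe.
        for block_no, (data, dirty) in list(self.blocks.items()):
            if dirty:
                self._push(block_no, data)
                self.blocks[block_no] = (data, False)


if __name__ == "__main__":
    server = {}
    cache = WriteBackCache(server, capacity=2)
    cache.write(0, b"scratch")
    cache.read(0)                     # served from the local cache
    print(cache.remote_reads, cache.remote_writes)   # 0 0
```

The design point is the `dirty` flag: clean blocks can simply be dropped on eviction, so only data that was actually modified ever has to be written back.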
To compound things, Linux 2.4 is complaining about locking on the FreeBSD 4.3
NFSv3 RAID server; I found things will only work with -o nolock on my Linux
mounts. I think this may keep me from rereading recently written scratch files
from local buffers, which is how I want to avoid abusing the network.
I guess I can fix this in one of several ways:
- get NFSv3 locking working to allow local buffers for writes/rewrites
- get nbd working with extremely long sync/kupdate times so it almost
never syncs back buffers across the net unless they're full
Anyone got other suggestions?
I'm running some bonnie tests from one of the nodes against the RAID, and I'm
getting respectable speeds: 10MB/s for small files, which is higher than I
get running bonnie locally on the RAID server itself (6MB/s max write speed of
the disks). With larger test volumes (larger than free memory/max possible
buffers) I get slower speeds, which suggests some buffering is going on for
writing/rewriting, which is what I need.
(The stats below are slightly odd, but the RAID may have been in use at the
time; others have access to it. I'm only storing log/source files there;
scratch is soon moving, as I said, to RAID 0 ATA-66 or ATA-100 drives):
rw,v3,rsize=16384,wsize=16384,hard,udp,nolock,addr=raid:
             ---------Sequential Output--------- ---Sequential Input---- --Random---
              -Per Char-  --Block---  -Rewrite--  -Per Char-  --Block---  --Seeks---
Machine   MB  K/sec %CPU  K/sec %CPU  K/sec %CPU  K/sec %CPU  K/sec %CPU   /sec %CPU
n1 1*      5  10406 85.4   8857  3.5    910  0.2  11842 87.9 154421 90.5  623.4  5.5
n1 10*     5   1074  8.5   3527  1.6    837  0.6  11355 86.3 194485 87.4  693.0  5.9
n1 1*    100   3896 30.7   3800  2.2    835  0.6  11441 86.0 191955 84.4  459.5  3.4
And another try (hmm, locking not an issue anymore?):
lock,rw,nfsvers=3,rsize=16384,wsize=16384,udp,soft:
             ---------Sequential Output--------- ---Sequential Input---- --Random---
              -Per Char-  --Block---  -Rewrite--  -Per Char-  --Block---  --Seeks---
Machine   MB  K/sec %CPU  K/sec %CPU  K/sec %CPU  K/sec %CPU  K/sec %CPU   /sec %CPU
n1 1*      5  10199 85.7   8833  6.9    906  0.7  11811 90.0 154012 60.2  587.2  5.3
n1 1*    100   2255 17.7   3447  2.0    834  0.6  11531 86.0 192423 86.4  474.0  3.9
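To put a number on the "larger volumes are slower" observation, here is a small Python sketch that parses the "n1 1*" rows from the two bonnie runs above and compares block-write throughput for the 5 MB and 100 MB tests. The "n1 10*" row is left out because it isn't clear it is comparable. This is a crude two-point comparison: a drop is consistent with buffering helping the small test, but it doesn't prove the cause, especially if the RAID was busy. `parse_row`, `block_write_ratio` and the field names are hypothetical.

```python
# Rows copied from the two bonnie runs above. Each row: machine label
# (two tokens, e.g. "n1 1*"), test size in MB, then twelve numbers in
# bonnie's column order.
FIELDS = ["char_out", "char_out_cpu", "block_out", "block_out_cpu",
          "rewrite", "rewrite_cpu", "char_in", "char_in_cpu",
          "block_in", "block_in_cpu", "seeks", "seeks_cpu"]

NOLOCK_ROWS = [
    "n1 1* 5 10406 85.4 8857 3.5 910 0.2 11842 87.9 154421 90.5 623.4 5.5",
    "n1 1* 100 3896 30.7 3800 2.2 835 0.6 11441 86.0 191955 84.4 459.5 3.4",
]
LOCK_ROWS = [
    "n1 1* 5 10199 85.7 8833 6.9 906 0.7 11811 90.0 154012 60.2 587.2 5.3",
    "n1 1* 100 2255 17.7 3447 2.0 834 0.6 11531 86.0 192423 86.4 474.0 3.9",
]


def parse_row(line):
    """Split one bonnie result row into (label, size_mb, {field: value})."""
    parts = line.split()
    label = " ".join(parts[:2])
    size_mb = int(parts[2])
    values = dict(zip(FIELDS, map(float, parts[3:])))
    return label, size_mb, values


def block_write_ratio(rows):
    """Block-write K/sec of the smallest test divided by that of the largest."""
    parsed = sorted((parse_row(r) for r in rows), key=lambda p: p[1])
    return parsed[0][2]["block_out"] / parsed[-1][2]["block_out"]


if __name__ == "__main__":
    for name, rows in (("nolock", NOLOCK_ROWS), ("lock", LOCK_ROWS)):
        ratio = block_write_ratio(rows)
        print(f"{name}: 5 MB block writes {ratio:.2f}x faster than 100 MB")
    # nolock: 2.33x, lock: 2.56x
```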
/kc
--
Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA