nbd/nfs speeds - using buffers instead of network
Velocet math at velocet.ca
Mon Jun 4 12:41:07 PDT 2001
I've got our prototype cluster (8 nodes, expanding to 40-odd later)
booted via RPL+DHCP/BOOTP on these PCChips M810 boards, running Linux 2.4.
Even with NFSv3 we're only getting about 3MB/s max transfer rate over
NFS. Not sure what the problem is (we're using a RAID, but we're gonna
go to a buncha RAID 0 ATA-66 drives for scratch files - the RAID maxes
out at 6MB/s for now).
One thing we're gonna try is NBD. This lets a client attach a file
on the server (via a server-side daemon listening on a port, or via
inetd) as a block device and put a filesystem on it. There are many
interesting possibilities with this which are quite useful to Beowulf
admins. (raid0 across a network! ;)
However, the NBD device just freezes whenever I write to it... can't even
make a filesystem on it. I'm using Pavel's original nbd-client
and server stuff, and the only other package I can find that might
do this is the Enhanced NBD on Freshmeat, but the code won't compile.
(I have a 2.4.2 kernel and lots of my header files seem to be totally
incompatible with the code. I am not enough of a code guru to wade through
all the fixes required.)
The nbd-server stuff at SCYLD (ironically enough) doesn't exist on their
FTP server ('file not found'). I've asked the maintainer of the code
for help (no reply yet, but then again I only emailed a few hours ago).
So a couple of things to ask here:
1) Has someone got the nbd client and server code? Not Pavel's versions, I
know they don't work for me. Perhaps alternate versions won't work either,
but I don't know where to start to figure this out.
2) I need some other way to keep the scratch files off the network. I have
256MB in my prototype nodes, but we'll be increasing that to 512MB. I
found under FreeBSD the MFS filesystem (memory fs) works quite well
because it writes to RAM when needed (albeit via userland - costs about
1.5% CPU overall for my types of jobs), but when it goes beyond actual
core, it will swap. In my case that would be to the NFS server, which is fine -
only machines that have full MFSes would write at all, and we've calculated
things such that this is rare. I have the network set up to
handle the average load (which is about .25 to .5 Mbps average per node),
i.e. the scratch files generally fit into buffers/local core.
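To put the per-node figure in cluster terms, here is a rough sketch of the aggregate average load for the 8-node prototype and the planned 40 nodes. It assumes "Mbps" means megabits per second and that the load scales linearly with node count; bursts are ignored, and the function name is made up for illustration.

```python
# Per-node average scratch traffic quoted above, read as megabits/s (assumption).
PER_NODE_MBPS = (0.25, 0.5)


def aggregate_mbps(nodes, per_node=PER_NODE_MBPS):
    """Return (low, high) aggregate average load in Mbps for a node count."""
    low, high = per_node
    return nodes * low, nodes * high


if __name__ == "__main__":
    for nodes in (8, 40):  # prototype size and planned size from the post
        low, high = aggregate_mbps(nodes)
        print(f"{nodes} nodes: {low:.0f}-{high:.0f} Mbps average")
```

This is only the average; a node whose scratch no longer fits in core would push far more than this for as long as it spills.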
Aside: seems to me I should let g98 manage the RAM itself and avoid
needing to write scratch files anyway - but there will always be a situation
where the scratch files come out larger than local core and I'll need
a backup system to handle it, and I need more than 3MB/s (or 600K/s, which
is what I see g98 getting in trafshow).
There's no MFS in Linux unfortunately, but I need something like it.
Ramdisk doesn't work because when it's full, it's full and there's catastrophic
failure (unless g98 understands alternate scratch areas? I haven't checked
that, now that I think of it).
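One way to soften the "when it's full, it's full" problem is to choose the scratch area before the job starts, based on free space. The sketch below is only an illustration: the function name and the paths in the usage note are hypothetical, and how the chosen directory gets handed to g98 is left out because that is exactly the unchecked part.

```python
import os
import shutil


def pick_scratch_dir(candidates, needed_bytes):
    """Return the first existing directory with at least needed_bytes free.

    candidates is an ordered list of paths, fastest first (e.g. a ramdisk
    mount followed by a network mount). Raises RuntimeError if none fits.
    """
    for path in candidates:
        if not os.path.isdir(path):
            continue  # skip mounts that are not present on this node
        if shutil.disk_usage(path).free >= needed_bytes:
            return path
    raise RuntimeError("no scratch area has enough free space")
```

Usage would look something like `pick_scratch_dir(["/mnt/ramdisk", "/mnt/nfs/scratch"], 200 * 1024**2)`, with both paths made up. It only helps if the scratch size can be estimated up front; it does nothing for a ramdisk that fills up mid-run.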
NFS or NBD could both work here - except NBD has the potential to run
far faster, I think. The key to the whole thing, however, is ensuring local
buffers are used for r/w instead of syncing back to the server all the time
for a read on a recently written file.
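A quick way to see whether a reread of a just-written file comes out of local buffers is to time a write followed immediately by a read of the same file on the mount in question. This is a minimal sketch, not a replacement for bonnie; the function name and the path in the usage note are hypothetical, and the write timing includes whatever flushing the client does on close.

```python
import os
import time


def write_then_reread(path, size_mb=5, block=64 * 1024):
    """Write size_mb of zeros to path, read it straight back, return rates."""
    data = b"\0" * block
    blocks = size_mb * 1024 * 1024 // block

    start = time.perf_counter()
    with open(path, "wb") as f:
        for _ in range(blocks):
            f.write(data)
    write_s = max(time.perf_counter() - start, 1e-9)  # avoid divide-by-zero

    start = time.perf_counter()
    total = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(block)
            if not chunk:
                break
            total += len(chunk)
    read_s = max(time.perf_counter() - start, 1e-9)

    os.remove(path)  # clean up the test file
    return {
        "bytes": total,
        "write_mb_s": size_mb / write_s,
        "read_mb_s": size_mb / read_s,
    }
```

Run it as, say, `write_then_reread("/mnt/scratch/probe.tmp", size_mb=5)` and again with a size larger than free memory. A reread rate well above what the wire can carry suggests the data came from local cache rather than the server.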
To compound things, Linux 2.4 is bitching about locking on the FreeBSD 4.3
NFSv3 RAID server - I found things will only work with -o nolock on my Linux
mounts. I think this may keep me from rereading recently written scratch
files out of local buffers, which is what I need to avoid abusing the network.
I guess I can fix this in one of several ways:
- get NFSv3 locking working to allow local buffers for writes/rewrites
- get nbd working with extremely long sync/kupdate times so it almost
never syncs back buffers across the net unless they're full
Anyone got other suggestions?
I'm doing some bonnie tests here on the RAID as seen from one of the nodes and I'm
getting respectable speeds - 10MB/s for small files - which is higher than I
get on local bonnies on the RAID server itself (6MB/s max write speed of the
disks). With larger test volumes (larger than free mem/max possible buffers)
I get slower speeds, which suggests some buffering is going on for
writing/rewriting, which is what I need.
(Slightly odd stats below, but... could be the RAID was being used. Others
have access to it; I'm only storing log/source files there, and scratch is soon
moving, as I said, to RAID 0 ATA-66 or -100):
rw,v3,rsize=16384,wsize=16384,hard,udp,nolock,addr=raid:
                  -------Sequential Output--------   ---Sequential Input--  --Random--
                -Per Char-  --Block---  -Rewrite--  -Per Char-  --Block---  --Seeks---
Machine MB      K/sec %CPU  K/sec %CPU  K/sec %CPU  K/sec %CPU  K/sec %CPU   /sec %CPU
n1      1* 5    10406 85.4   8857  3.5    910  0.2  11842 87.9 154421 90.5  623.4  5.5
n1      10* 5    1074  8.5   3527  1.6    837  0.6  11355 86.3 194485 87.4  693.0  5.9
n1      1* 100   3896 30.7   3800  2.2    835  0.6  11441 86.0 191955 84.4  459.5  3.4
and another try (hmm locking not an issue anymore?)
lock,rw,nfsvers=3,rsize=16384,wsize=16384,udp,soft:
                  -------Sequential Output--------   ---Sequential Input--  --Random--
                -Per Char-  --Block---  -Rewrite--  -Per Char-  --Block---  --Seeks---
Machine MB      K/sec %CPU  K/sec %CPU  K/sec %CPU  K/sec %CPU  K/sec %CPU   /sec %CPU
n1      1* 5    10199 85.7   8833  6.9    906  0.7  11811 90.0 154012 60.2  587.2  5.3
n1      1* 100   2255 17.7   3447  2.0    834  0.6  11531 86.0 192423 86.4  474.0  3.9
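To compare runs without eyeballing columns, the result rows can be pulled apart with a few lines of Python. This sketch assumes each row is a machine name, a size label, and then the twelve numbers in the column order shown in the headers; the size label (e.g. "1* 5") is kept as an opaque string since its exact meaning is not spelled out here, and the field names are invented for illustration.

```python
def parse_bonnie_row(line):
    """Split one bonnie result row into machine, size label, and 12 numbers."""
    fields = line.split()
    machine = fields[0]
    size = " ".join(fields[1:-12])  # e.g. "1* 5", kept as an opaque label
    nums = [float(x) for x in fields[-12:]]
    names = ["out_char", "out_block", "rewrite", "in_char", "in_block", "seeks"]
    result = {"machine": machine, "size": size}
    for i, name in enumerate(names):
        result[name] = nums[2 * i]  # K/sec (seeks/sec for the last pair)
        result[name + "_cpu"] = nums[2 * i + 1]  # %CPU
    return result


# Two rows from the nolock run above: small test vs. large test.
small = parse_bonnie_row(
    "n1 1* 5 10406 85.4 8857 3.5 910 0.2 11842 87.9 154421 90.5 623.4 5.5"
)
large = parse_bonnie_row(
    "n1 1* 100 3896 30.7 3800 2.2 835 0.6 11441 86.0 191955 84.4 459.5 3.4"
)
ratio = small["out_block"] / large["out_block"]
print(f"block write: {small['out_block']:.0f} vs {large['out_block']:.0f} K/sec ({ratio:.1f}x)")
```

For those two rows the block write rate drops from 8857 to 3800 K/sec, roughly 2.3x, which is the slowdown on larger test volumes described above.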
/kc
--
Ken Chase, math at velocet.ca * Velocet Communications Inc. * Toronto, CANADA
