[Beowulf] since we are talking about file systems ...

Dan Stromberg strombrg at dcs.nac.uci.edu
Tue Jan 17 14:40:23 PST 2006


On Tue, 2006-01-17 at 12:02 -0500, Joe Landman wrote:
> I created a simple Perl script that creates lots of small files in a 
> pre-existing directory named "dir" below the current directory.  The 
> script runs like this:
> 
> 	dualcore:/local/files # ./files.pl 50000
> 	Creating N=50000 files
> 	Creating N files took 20 wallclock secs ( 0.57 usr + 13.99 sys = 14.56 CPU) seconds
> 
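A minimal sketch of that kind of test, for reference (the actual
files.pl is linked at the bottom of the message and may differ; the
timing line looks like output from Perl's core Benchmark module, and
the filename pattern and payload here are my guesses):

	#!/usr/bin/perl
	# sketch: create N tiny files under ./dir and time the loop
	use strict;
	use warnings;
	use Benchmark;    # exports timediff() and timestr()

	my $n = shift || 50000;
	print "Creating N=$n files\n";

	my $t0 = Benchmark->new;
	for my $i (1 .. $n) {
	    my $name = sprintf("dir/f%d.dat", $i);    # guessed name format
	    open(my $fh, '>', $name) or die "can't create $name: $!";
	    print $fh "a small payload line\n";       # 21 bytes per file
	    close($fh);
	}
	my $t1 = Benchmark->new;
	print "Creating N files took ", timestr(timediff($t1, $t0)), " seconds\n";
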
> then looking at the files
> 
> 	dualcore:/local/files # time ls dir | wc -l
> 	50002
> 	
> 	real    0m0.131s
> 	user    0m0.094s
> 	sys     0m0.040s
> 
> also doesn't take much time.  Then again, this may be due to caching, 
> the md0 RAID device the filesystem is on, or any number of other things.
> 
> What's interesting about this is the amount of wasted space more than 
> anything.
> 
> Each file is on the order of 21 bytes or less.  50000 of them should be 
> about 1 MB.  Right?

Perhaps it's not scaling linearly (or quantized-linearly) because the
filesystem uses a hash or B-tree directory index to keep file lookups
from being O(n) in the number of directory entries?

Pretty sure this kind of behavior, if it is the underlying cause, is
tunable on ext3 via its dir_index (htree) feature.
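
For example (a sketch, assuming the filesystem lives on the md0 device
mentioned above and an e2fsprogs recent enough to know about dir_index):

	# check whether the feature is already on
	tune2fs -l /dev/md0 | grep -i features

	# enable it; existing directories need their indexes built,
	# which e2fsck -D does on an unmounted filesystem
	tune2fs -O dir_index /dev/md0
	e2fsck -fD /dev/md0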


> No.
> 
> 	dualcore:/local/files/dir # ls -alF f10011.dat
> 	-rw-r--r--  1 root root 21 Jan 17 13:10 f10011.dat
> 	dualcore:/local/files/dir # du -h .
> 	198M    .
> 
> ext3 isn't any better, giving about 197M.
> 
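That looks like ordinary block-granularity overhead: assuming the usual
4 KiB filesystem block size, each 21-byte file still consumes a whole
block, so

	50000 files x 4096 bytes/block = 204,800,000 bytes, or about 195 MiB

which lines up with the 197-198M that du reports.
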
> Reiser theoretically takes care of stuff like this (its tail packing 
> stores small files inline), though it has enough other issues that we 
> won't use it (again).
> 
> Note:  for laughs, I ran the same code (a one-character modification to 
> work under Windows) under Windows, using a recent ActiveState port of 
> Perl.  Running on an NTFS file system, fast local disk, 1 GB RAM, 
> Windows XP with the latest patch updates.  Interesting results.
> 
> 	C:\test>perl files.pl 50000
> 	Creating N=50000 files
> 	Creating N files took 187 wallclock secs ( 6.28 usr + 15.84 sys = 22.13 CPU) seconds
> 
> Yes, we had a virus scanner running under Windows (who doesn't?).  OK, 
> turn off the virus scanner (McAfee).  This is a *dangerous* way to run 
> Windows, as all of us know.
> 
> 	C:\test>perl files.pl 50000
> 	Creating N=50000 files
> 	Creating N files took 27 wallclock secs ( 2.98 usr +  7.95 sys = 10.94 CPU) seconds
> 
> OK, that's better, but still not where we need it to be.  More 
> importantly, we had to turn off the only protection we have against 
> viruses and malware on this platform in order to achieve these results.  
> You can be fast or you can be safe on this platform; you get to pick 
> exactly one of the two.
> 
> OK, now try CIFS (running on a *fast* SAMBA server).  Had to add a "\\" 
> to the sprintf to replace the "\" that worked on Windows, which in turn 
> had replaced the "/" that worked on Linux.
> 
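As an aside, the separator juggling can be avoided entirely; a sketch
using Perl's core File::Spec module (portable across Linux and Windows):

	use File::Spec;
	# builds "dir/f1.dat" on Unix and "dir\f1.dat" on Windows
	my $name = File::Spec->catfile('dir', sprintf('f%d.dat', $i));
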
> Ummm... going on 10 minutes now, and it still hasn't returned.  Looks 
> like it is creating 50-80 files per second.  Running a quick ls in that 
> directory from the file server itself shows about 36k files out of 50k.
> 
> So I wanted to see if this was a SAMBA server problem.  Ran
> 
> 	time smbclient -U landman //crunch-r/big
> 
> from another Linux machine.  Logged in, cd'ed to dir, typed ls, and 
> exited.  Even with the interaction time in there (password entry, etc.), 
> this took *only* 10 seconds of wall clock.  It doesn't sound like the 
> SAMBA server is the issue.  The machine (a PC with Windows XP) was not 
> swapping, was not running anything else, and the virus checker was off...
> 
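To take the interactive typing out of a measurement like that, smbclient
can also be driven non-interactively with -c (a sketch; the share and
user are the ones from the message above, and it will still prompt for a
password unless one is supplied):

	time smbclient -U landman //crunch-r/big -c 'cd dir; ls'
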
> 	J:\>perl c:\test\files.pl 50000
> 	Creating N=50000 files
> 	Creating N files took 2297 wallclock secs ( 5.05 usr + 24.41 sys = 29.45 CPU) seconds
> 
> The files.pl code is on my download page 
> http://downloads.scalableinformatics.com



