[Beowulf] since we are talking about file systems ...

Joe Landman landman at scalableinformatics.com
Tue Jan 17 09:02:06 PST 2006


I created a simple Perl script to create lots of small files in a 
pre-existing directory named "dir" below the current directory.  It 
runs like this:

	dualcore:/local/files # ./files.pl 50000
	Creating N=50000 files
	Creating N files took 20 wallclock secs ( 0.57 usr + 13.99 sys = 14.56 CPU) seconds
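
The actual files.pl is linked at the bottom of this note; a minimal 
sketch of the idea, with the file-name pattern and payload assumed 
rather than copied from the real script, looks like:

	#!/usr/bin/perl
	# Sketch of files.pl: create N tiny files in ./dir and time the run.
	# The name pattern and the ~21-byte payload are assumptions, not
	# necessarily what the real script uses.
	use strict;
	use Benchmark;

	my $N = shift || 50000;
	print "Creating N=$N files\n";
	my $t0 = Benchmark->new;
	for my $i (1 .. $N) {
	    open(my $fh, ">", sprintf("dir/f%05d.dat", $i)) or die "open: $!";
	    print $fh "data for file $i\n";   # roughly 21 bytes per file
	    close($fh);
	}
	my $t1 = Benchmark->new;
	print "Creating N files took ", timestr(timediff($t1, $t0)), " seconds\n";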

Then looking at the files

	dualcore:/local/files # time ls dir | wc -l
	50002
	
	real    0m0.131s
	user    0m0.094s
	sys     0m0.040s

also doesn't take much time.  Then again, this may be due to caching, 
the md0 RAID device the filesystem is on, or any number of other things.
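
One way to take the page cache out of the picture, assuming /local is 
its own mount point with an fstab entry (an assumption about this box's 
layout), would be to remount before listing:

	cd / && umount /local && mount /local
	cd /local/files && time ls dir | wc -l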

What's interesting about this, more than anything, is the amount of 
wasted space.

Each file is on the order of 21 bytes or less.  50000 of them should be 
about 1 MB.  Right?

No.

	dualcore:/local/files/dir # ls -alF f10011.dat
	-rw-r--r--  1 root root 21 Jan 17 13:10 f10011.dat
	dualcore:/local/files/dir # du -h .
	198M    .

ext3 isn't any better, giving about 197M.  This is block granularity at 
work: with the usual 4 KiB block size, each file occupies at least one 
full block, so 50,000 files consume at least 50,000 x 4 KiB, or about 
195 MiB, no matter how small their contents are.
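
You can see the per-file allocation directly: Perl's stat returns the 
logical size in field 7 and the number of 512-byte blocks allocated in 
field 12:

	# Compare logical size to the space actually allocated on disk
	# for one of the generated files.
	my @st = stat("f10011.dat") or die "stat: $!";
	printf "size=%d bytes, allocated=%d bytes\n", $st[7], $st[12] * 512;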

Reiser theoretically takes care of stuff like this (it packs small 
files and file tails into its tree rather than giving each its own 
block), though it has enough other issues that we won't use it (again).

Note:  for laughs, I ran the same code (a one-character modification to 
work under Windows) using the late-model ActiveState port of Perl. 
Running on an NTFS file system, fast local disk, 1 GB RAM, Windows XP 
with the latest patch updates.  Interesting results.

	C:\test>perl files.pl 50000
	Creating N=50000 files
	Creating N files took 187 wallclock secs ( 6.28 usr + 15.84 sys = 22.13 CPU) seconds

Yes, we had a virus scanner running under Windows (who doesn't?).  Ok, 
turn off the virus scanner (McAfee).  This is a *dangerous* way to run 
Windows, as we all know.

	C:\test>perl files.pl 50000
	Creating N=50000 files
	Creating N files took 27 wallclock secs ( 2.98 usr +  7.95 sys = 10.94 CPU) seconds

Ok, that's better, but still not where we need it to be.  More 
importantly, we had to turn off the only protection we have against 
viruses and malware on this platform in order to achieve these results. 
You can be fast or you can be safe on this platform.  You get to pick 
exactly one of the two.

Ok, now try CIFS (backed by a *fast* Samba server).  I had to change 
the path separator in the sprintf again: a "\\" to replace the "\" that 
worked under Windows, which had in turn replaced the "/" that worked 
under Linux (see the sketch below).

Ummm... going on 10 minutes now, and it still hasn't returned.  Looks 
like it is creating 50-80 files per second.  Running a quick ls in that 
directory from the file server itself shows about 36k files out of 50k.

So I wanted to see if this was a Samba server problem.  I ran

	time smbclient -U landman //crunch-r/big

from another Linux machine.  Logged in, cd'ed to dir, typed ls, and 
exited.  Even with the interaction time in there (password entry, 
etc.), this took *only* 10 seconds of wall clock time.  It doesn't 
sound like the Samba server is the issue.  And the client (a PC with 
Windows XP) was not swapping, not running anything else, and had the 
virus checker off...
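
The session looked roughly like this (prompts reconstructed from 
smbclient's interface, not a captured transcript):

	$ time smbclient -U landman //crunch-r/big
	Password:
	smb: \> cd dir
	smb: \dir\> ls
	smb: \dir\> exit

And still, the same create run over the mapped drive crawled: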

	J:\>perl c:\test\files.pl 50000
	Creating N=50000 files
	Creating N files took 2297 wallclock secs ( 5.05 usr + 24.41 sys = 29.45 CPU) seconds

The files.pl code is on my download page 
http://downloads.scalableinformatics.com
	
-- 
Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615



