[Beowulf] since we are talking about file systems ...
Joe Landman
landman at scalableinformatics.com
Tue Jan 17 09:02:06 PST 2006
I wrote a simple Perl script to create lots of small files in a
pre-existing directory named "dir" below the current directory.
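A minimal sketch of the idea looks like this (the actual files.pl,
linked at the end, may differ in details like file names and payload):

    #!/usr/bin/perl
    use strict;
    use warnings;
    use Benchmark qw(timediff timestr);

    my $N = shift @ARGV || 50000;
    print "Creating N=$N files\n";

    my $t0 = Benchmark->new;
    for my $i (1 .. $N) {
        # each file gets a short payload, on the order of 21 bytes
        open my $fh, '>', sprintf("dir/f%d.dat", $i)
            or die "cannot create file: $!";
        print $fh "data for file $i\n";
        close $fh;
    }
    my $t1 = Benchmark->new;
    print "Creating N files took ", timestr(timediff($t1, $t0)), " seconds\n";

Running it looks like this: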
dualcore:/local/files # ./files.pl 50000
Creating N=50000 files
Creating N files took 20 wallclock secs ( 0.57 usr + 13.99 sys = 14.56
CPU) seconds
Then, looking at the files:
dualcore:/local/files # time ls dir | wc -l
50002
real 0m0.131s
user 0m0.094s
sys 0m0.040s
That also doesn't take much time. Then again, this may be due to caching,
the md0 RAID device the filesystem sits on, or any number of other things.
What's interesting about this, more than anything, is the amount of
wasted space.
Each file is on the order of 21 bytes or less, so 50000 of them should
total about 1 MB. Right?
No.
dualcore:/local/files/dir # ls -alF f10011.dat
-rw-r--r-- 1 root root 21 Jan 17 13:10 f10011.dat
dualcore:/local/files/dir # du -h .
198M .
ext3 isn't any better, giving about 197M. The culprit is block
granularity: the filesystem allocates at least one full block per file,
no matter how small the file is.
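A quick back-of-the-envelope check (assuming a 4 KB block size, the
common ext3 default; your filesystem's block size may differ):

    perl -e 'my ($n, $size, $blk) = (50000, 21, 4096);
             printf "naive: %.2f MiB  one block per file: %.2f MiB\n",
                    $n*$size/2**20, $n*$blk/2**20;'
    naive: 1.00 MiB  one block per file: 195.31 MiB

That is in the same ballpark as the 198M du reports; the remainder is
presumably directory and other metadata overhead.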
ReiserFS theoretically takes care of stuff like this with tail packing,
though it has enough other issues that we won't use it (again).
Note: for laughs, I ran the same code (a one-character modification to
work under Windows) using a recent ActiveState port of Perl, on an NTFS
file system with a fast local disk, 1 GB of RAM, and Windows XP with the
latest patch updates. Interesting results.
C:\test>perl files.pl 50000
Creating N=50000 files
Creating N files took 187 wallclock secs ( 6.28 usr + 15.84 sys =
22.13 CPU) seconds
Yes, we had a virus scanner running under Windows (who doesn't?). Ok,
turn off the virus scanner (McAfee). This is a *dangerous* way to run
Windows, as we all know.
C:\test>perl files.pl 50000
Creating N=50000 files
Creating N files took 27 wallclock secs ( 2.98 usr + 7.95 sys = 10.94
CPU) seconds
Ok, that's better, but still not where we need it to be. More
importantly, we had to turn off the only protection we have against
viruses and malware on this platform in order to achieve these results.
You can be fast or you can be safe on this platform; you get to pick
exactly one of the two.
Ok, now try CIFS (against a *fast* Samba server). This meant changing
the sprintf once more: the "/" that worked on Linux became "\" under
Windows, and that in turn had to be written as an escaped "\\" for the
CIFS run.
Ummm... going on 10 minutes now, and it still hasn't returned. It looks
like it is creating 50-80 files per second. A quick ls in that directory
on the file server itself shows about 36k files out of 50k.
So I wanted to see whether this was a Samba server problem. I ran
time smbclient -U landman //crunch-r/big
from another Linux machine, logged in, cd'ed to the directory, typed ls,
and exited. Even with the interaction time in there (password entry,
etc.), this took *only* 10 seconds wall clock. The Windows XP client,
for its part, was not swapping, not running anything else, and had its
virus checker off. The Samba server does not sound like the issue.
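(For a more reproducible version of that test: smbclient's -c flag runs
a semicolon-separated command string non-interactively, so only the
password prompt adds interaction time:)

    time smbclient -U landman //crunch-r/big -c 'cd dir; ls'

Meanwhile, the CIFS run on the Windows client eventually finished: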
J:\>perl c:\test\files.pl 50000
Creating N=50000 files
Creating N files took 2297 wallclock secs ( 5.05 usr + 24.41 sys =
29.45 CPU) seconds
The files.pl code is on my download page
http://downloads.scalableinformatics.com
--
Joseph Landman, Ph.D.
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax : +1 734 786 8452
cell : +1 734 612 4615