[Beowulf] Building new cluster - estimate

Joe Landman landman at scalableinformatics.com
Mon Aug 4 15:02:17 PDT 2008

Matt Lawrence wrote:
> On Mon, 4 Aug 2008, Joe Landman wrote:
>> This mirrors our experience, though RHEL stability under intense loads 
>> is questionable IMO (talking about the kernel BTW).  We find that the 
>> missing drivers, the omitted drivers, the backported drivers along 
>> with some odd and often useless "features" (4k stacks anyone?) render 
>> the RHEL default kernels (and by definition the CentOS kernels) less 
>> useful for HPC and storage tasks than what we build.  Our current 
>> standard is a kernel which is rock solid under load.  
>> Working on a 2.6.26 based version now (even though I am on 
>> vacation/holiday, I just updated it to address an observed 
>> crashing issue with the RDMA server)
> Since I plan to continue running CentOS, it sounds like building a much 
> later kernel rpm is the way I want to approach the problem.  Will going 
> to a much later kernel break any of the utilities?  Other problems I can 
> expect to see?

Doesn't break most things.  We usually insert a new RPM and off it goes.

> What do you recommend for the kernel config?
>> Combine this with the small upper limit of ext3 partition sizes, the 
>> file size limits in ext3, the serialization in the journaling code 
>> (ext4 is extents based to help deal with this), ext3 just doesn't make 
>> much sense in a storage/HPC system (apart from possibly boot/root file 
>> system where performance is less critical).  Yeah I have seen studies 
>> from folks who had done 1E6 removes, file creates, and other things 
>> who claim xfs is slower than ext3.  Yeah, those are bad benchmarks in 
>> that they really don't touch on real end user use cases for the most 
>> part (apart from possibly large scale mail servers and other things 
>> like that).
> I have never had any problems with ext3.  I had dinner with a friend who 
> is an expert Linux sysadmin who was warning me to stay away from xfs.  
> He cited lots of fragmentation problems that routinely locked up his 
> systems. I am willing to be convinced otherwise, but he is a very sharp 
> fellow.

I haven't seen or heard anyone claim xfs 'routinely locks up their 
system'.  I won't comment on your friend's "sharpness".  I will point out 
that several very large data stores/large cluster sites use xfs.  By 
definition, no large data store can be built with ext3 (16 TB limit with 
patches, 8 TB in practice), so if your sharp friend is advising you to 
do this ...
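
For what it's worth, the 16 TB and 8 TB figures fall out of simple 
arithmetic.  This is a rough sketch, not a statement about any particular 
kernel: it assumes 4 KiB blocks and 32-bit block numbers, and it assumes 
the 8 TB practical ceiling comes from code paths that treat the block 
number as a signed 32-bit value.  The names below are made up for 
illustration.

```python
# Assumptions (not from the original post): 4 KiB blocks, 32-bit block
# numbers, and a signed-32-bit ceiling for the "in practice" figure.
BLOCK_SIZE = 4096          # bytes per block (assumed)
TIB = 2 ** 40              # bytes per TiB


def max_fs_tib(block_number_bits, block_size=BLOCK_SIZE):
    """Largest addressable file system size, in TiB."""
    return (2 ** block_number_bits) * block_size // TIB


unsigned_limit = max_fs_tib(32)   # all 32 bits usable
signed_limit = max_fs_tib(31)     # one bit lost to the sign

print("unsigned 32-bit block numbers:", unsigned_limit, "TiB")
print("signed 32-bit block numbers:  ", signed_limit, "TiB")
```

A smaller block size shrinks both limits proportionally, which you can 
see by passing a different block_size to the hypothetical max_fs_tib().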

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452
cell : +1 734 612 4615
