[Beowulf] File server dual opteron suggestions?

Joe Landman landman at scalableinformatics.com
Thu Aug 3 22:03:26 PDT 2006


Mark Hahn wrote:

>> I would recommend upping the memory.  Computing or not, large buffer 
>> caches on file servers are with very rare exception, a preferred config.
> 
> unclear.  the FS's memory does act as an excellent cache, but then again,
> the client memory does too.  do you have a pattern of file accesses in which
> the same files are frequently re-read and would fit in memory?  the servers
> I've looked at closely have had mostly write and attribute activity,
> since the client's own cache already has a high hit-rate.  for writes, of
> course, more FS memory is not important unless you have extremely high 

I was actually assuming a read-dominated workload.  Dave does informatics, 
as I remember, and most of the informatics we have dealt with tends to be 
read dominated.  That doesn't mean much without the workload info, 
though.  So I agree with the caution, though I humbly note that a 1 GB 
stick costs about $120 +/- a bit these days.  That is, it is not a large 
price, and the potential impact on performance is much higher than for 
10k RPM drives.

FWIW I have a pair of 10k RPM SATA raptors and I am not all that 
impressed with them.

> bandwidth net and disks.  in fact, I've been using the following
> sysctl.conf entries:
> 
> # delay writing dirty blocks hoping to collect further writes (default 30s)
> vm.dirty_expire_centisecs = 1000
> # try writing back every 1s (default 500=5s)
> vm.dirty_writeback_centisecs = 100
> 
> in short, don't bother working at write caching much.  with a lot of memory,
> an untuned machine will exhibit unpleasant oscillations of delaying writes
> then frantically flushing.

Yup.  I had my dirty around 250 for a long time.  Write caching is 
harder because if you really want to play it safe, you shouldn't cache 
the write ...
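
For anyone checking what the entries quoted above actually mean in 
seconds, here is a minimal Python sketch.  It only parses 
sysctl.conf-style text and converts the centisecond values.  The helper 
names (parse_sysctl, centisecs_to_seconds) and the SAMPLE string are made 
up for illustration, and it changes nothing on the running system.

```python
def parse_sysctl(text):
    """Parse sysctl.conf-style text into a {key: value-string} dict."""
    settings = {}
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and comment lines (starting with '#' or ';').
        if not line or line[0] in "#;":
            continue
        key, sep, value = line.partition("=")
        if sep:
            settings[key.strip()] = value.strip()
    return settings


def centisecs_to_seconds(value):
    """The vm.dirty_*_centisecs settings are in hundredths of a second."""
    return int(value) / 100.0


# The two entries from the quoted message, without the "> " quote prefix.
SAMPLE = """
# delay writing dirty blocks hoping to collect further writes (default 30s)
vm.dirty_expire_centisecs = 1000
# try writing back every 1s (default 500=5s)
vm.dirty_writeback_centisecs = 100
"""

if __name__ == "__main__":
    for key, value in parse_sysctl(SAMPLE).items():
        print(f"{key}: {centisecs_to_seconds(value):g} s")
```

On a Linux box the live values can be read from 
/proc/sys/vm/dirty_expire_centisecs and 
/proc/sys/vm/dirty_writeback_centisecs and passed through the same 
conversion.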

> 
>> 2 GB/socket minimum.  Nothing serves files faster than having them 
>> already sitting in ram.
> 
> true, but is that actually your working set size?  it would be rather 
> embarrassing if 3 of the 4 GB were files read once a month...

Hmmm... again, this is a good workload problem.  If Dave's users are 
going through big "databases" from NCBI, lots of ram is a good thing. 
If it is just a bunch of small files, yeah, it could be overkill.

But if I had to spend extra $$ on ram versus 10kRPM drives, I know where 
I would spend it ...

> 
>>> 4 x 74 GB disks Ultra320 (or make an argument for a particular SATA)
> 
> SATA disks are SATA disks, of course.  dumb controllers are all pretty
> similar as well (cheap, fast, not-cpu-consuming).  if you have your
> heart set on HW raid, at least get a 3ware 9550, which is quite fast.
> (most other HW raid are surprisingly bad.)

The LSI SAS unit is pretty good.  I like the 3ware, the Areca, and a few 
others.  We just created a nice 500+ MB/s "file server" for a large 
customer out of an Areca card, 16 spindles and some tweaking.  I haven't 
seen production performance data for it yet, but our in house testing 
exceeded the 500 MB/s by a little bit.

>>> dual 10/100/1000 ethernet on the mobo
>>
>> Careful on this... we and our customers have been badly bitten by tg3 
>> and broadcom NICs.  If the MB doesn't have Intel NICs, get an Intel 
>> 1000/MT dual gigabit card.  You won't regret that, and it is money 
>> well spent.
> 
> that's odd; I have quite a few of both tg3 and bcm nics, and can't say 
> I've had any complaints.  what are the problems?

Interrupted to death.  The tg3 doesn't seem to have NAPI turned on by 
default in the standard distro kernels.  Haven't tried the FC* with 
this, hopefully it is saner there.  Under heavy load, we see interrupts 
climb past 40k/s, and it context switches like mad.  Seen this from 
early 2.6 through 2.6.13 on SuSE and RHEL.  It makes using AoE (Coraid) 
nearly useless with Broadcom: formatting the unit with ext3 renders the 
server unusable for hours.  Drop a nice Intel unit in there, do the same 
thing, and it works great; the server stays responsive during formatting.  
We see the same issues with file service under heavy load.

Seen this on Tyan, iWill, Arima?, MSI(ibm e32*), and others.
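
If you want to check your own box for this, a rough Python sketch along 
these lines will estimate the interrupt rate for one NIC from two 
snapshots of /proc/interrupts.  The function names and the "eth0" device 
name are assumptions for illustration, and the device match is a plain 
substring test, so treat the result as a ballpark figure rather than a 
proper measurement tool.

```python
import time


def irq_count(text, device):
    """Sum the per-CPU counts on /proc/interrupts lines mentioning device."""
    total = 0
    for line in text.splitlines():
        label, sep, rest = line.partition(":")
        # The CPU header line has no ':' and is skipped here.
        if not sep or device not in rest:
            continue
        for field in rest.split():
            # The per-CPU counters come first; stop at the first non-number.
            if not field.isdigit():
                break
            total += int(field)
    return total


def irq_rate(before, after, device, interval):
    """Interrupts per second for device between two snapshots."""
    return (irq_count(after, device) - irq_count(before, device)) / interval


def measure(device="eth0", interval=1.0, path="/proc/interrupts"):
    """Take two snapshots interval seconds apart and return the rate."""
    with open(path) as f:
        before = f.read()
    time.sleep(interval)
    with open(path) as f:
        after = f.read()
    return irq_rate(before, after, device, interval)
```

Running measure("eth0") while the server is under load and comparing the 
result against the 40k/s figure above gives a quick idea of whether a 
machine is in the same situation.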

> 
>>> case - 2U (big enough for adequate ventilation, right?)
>>
>> Yeah, just make sure you have good airflow.
> 
> 2U still requires a custom PS, doesn't it?  it's kind of nice to be able 
> to put in an ATX-ish PS.  and is 2U tall enough for stock/standard
> heatsink/fans?

Don't know if it is custom.  I like the redundant PS, but the small 
redundant PSes tend not to supply enough current to boot the system. 
Need a 3U case for that.

Best cooling designs I have seen involve baffles, and a pull or 
push-pull config.  We have used some units where under load the 
processors are happily working around 22-28C.  Fans are loud though. 
Case (1U) is very cool to the touch.

For 2U you still need to worry about flow.  I find it hard to believe 
that most people get efficient flow out the back grating on 2U and 
larger without a helper fan of some sort.



-- 

Joseph Landman, Ph.D
Founder and CEO
Scalable Informatics LLC,
email: landman at scalableinformatics.com
web  : http://www.scalableinformatics.com
phone: +1 734 786 8423
fax  : +1 734 786 8452 or +1 866 888 3112
cell : +1 734 612 4615
