[Beowulf] SATA II - PXE+NFS - diskless compute nodes
Greg Kurtzer
gmk at runlevelzero.net
Fri Dec 15 11:49:20 PST 2006
On Dec 14, 2006, at 2:33 PM, Donald Becker wrote:
> On Sat, 9 Dec 2006, Joe Landman wrote:
>> Guy Coates wrote:
>>> At what node count does the nfs-root model start to break down? Does
>>> anyone have any rough numbers with the number of clients you can
>>> support with a generic linux NFS server vs a dedicated NAS filer?
>>
>> If you use warewulf or the new perceus variant, it creates a ram disk
>> which is populated upon boot. That's one of the larger transients.
>> Then you nfs mount applications, and home directories. I haven't
>> looked at Scyld for a while, but I seem to remember them doing
>> something like this.
>
> I forgot to finish my reply to this message earlier this week. Since
> I'm in the writing mood today, I've finished it.
>
>
> Just when we're getting past "diskless" being misinterpreted as
> "NFS root"...
I prefer the term "stateless" to describe the Warewulf and Perceus
provisioning model (stateless installs may have local disks for swap
and scratch/data space).
> RAMdisk Inventory
>
> We actually have five (!) different types of ramdisks over the system
> (see the descriptions below). But it's the opposite of the Warewulf
> approach. Our architecture is a consistent system model, so we
> dynamically build and update the environment on nodes. Warewulf-like
> ramdisk systems only catch part of what we are doing:
The stateless provisioning model has a very different goal than
Scyld's Bproc implementation and thus a comparison is misleading.
>
> The Warewulf approach
> - Uses a manually selected subset distribution on the compute node
>   ramdisk. While still very large, it's never quite complete. No
>   matter how useless you think some utility is, there is probably
>   some application out there that depends on it.
> - The ramdisk image is very large and it has to be completely
>   downloaded at boot time, just when the server is extremely busy.
> - Supplements the ramdisk with NFS, combining the problems of
>   both.(*) The administrator and users have to learn and think about
>   how both fail.
I suppose that under some circumstances these observations may be
applicable, but with that said... I have not heard of any of the
Warewulf or stateless Perceus *users* sharing these opinions.
Regarding the various cluster implementations: there is no
one-size-fits-all solution, and all of the toolkits and implementation
methods have tradeoffs. Rather than point out the problems in the
various cluster solutions, I would just like to reiterate that people
should evaluate what fits their needs best and utilize what works best
for them in their environment.
>
> (*) That said, combining a ramdisk root with NFS is still far more
> scalable and somewhat more robust than using solely NFS. With careful
> administration most of the executables will be on the ramdisk,
> allowing the server to support more nodes and reducing the likelihood
> of failures.
Well said, I agree. There are also some general policies that will
work reasonably well and don't require much system-specific tuning
(if any).
> The phrase "careful administration" should be read as "great for
> demos, and when the system is first configured, but degrades over
> time". The type of people that leap to configure the ramdisk properly
> the first time are generally not the same type that will be there for
> long-term manual tuning.
What an odd way of looking at it.
Great for demos but not for a long-term solution because it degrades
over time???
If you are referring to careless people or admins mucking up the
virtual node file systems, I think the physical muckage would be the
least of the concerns when these people have root. Not to mention that
blaming the cluster toolkit or provisioning model for allowing users
the freedom and flexibility to do what they want misidentifies the
problem.
> Either they figure out why we designed around dynamic,
> consistent caching and re-write, or the system will degrade over time.
Why would a system not built around "dynamic consistent caching and
re-write" degrade over time?
Many thanks!
Greg
--
Greg Kurtzer
gmk at runlevelzero.net