NAS

Donald Becker becker at scyld.com
Tue Jul 2 08:47:01 PDT 2002


On Tue, 2 Jul 2002, Mark Hahn wrote:

> > > E.g., can I NFS mount a 8TB file system under Linux? (kernel 2.4.18).

NFS v2 only supports 32 bit byte offsets in the on-wire protocol.
You need to run NFS v3 to get "64 bit" support, with the limit then
being the kernel.
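
As a rough illustration of the offset-width point (a sketch, not taken
from any NFS implementation; the helper names bits_needed and
fits_in_offset are made up for this example), one can check how many
bits a byte offset needs to reach the end of an 8TB object:

```python
# Sketch: how wide must a byte offset be to address a given size?
# TB is the binary terabyte (2^40 bytes) used throughout this thread.
TB = 2**40

def bits_needed(nbytes):
    """Bits required to represent every byte offset 0 .. nbytes-1."""
    return max(nbytes - 1, 0).bit_length()

def fits_in_offset(nbytes, offset_bits):
    """True if all byte offsets in an nbytes-sized object fit in offset_bits."""
    return bits_needed(nbytes) <= offset_bits

if __name__ == "__main__":
    for bits in (32, 64):  # NFS v2 vs. NFS v3 on-wire offset widths
        print(f"{bits}-bit offsets cover 8 TB: {fits_in_offset(8 * TB, bits)}")
```

A 32 bit offset tops out at 2^32 bytes (4 GB), while 8TB needs 43 bits,
so only the 64 bit case passes.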

> > No.  Even if filesystem can do 64bit support, the underlying block device
> > (and other places, as well) uses 32bit count of 1k blocks.
> 
> I haven't looked myself, but at OLS last week, Andreas Dilger 
> was talking about block/ext2/ext3-type limits, and the number
> I remember is 16 TB.  I presume that's actually 2^(31+9),
> that is, a signed count of sectors.

That's the optimistic limit, with either
   an unsigned block offset and 4KB blocks, 2^(32+12), or
   a signed block offset and 8KB blocks, 2^(31+13)
Both come to 2^44 bytes, or 16 TB.
With more conservative assumptions you get the values
   31+9   Don't trust the signed/unsigned block offset assumption, 512-byte IDE blocks
   31+10  Most filesystems actually use a minimum of 1KB, i.e. IDE dual-blocks.
   31+12  And filesystems typically use 4KB blocks.
   32+12  Block offsets can be treated as unsigned, with testing.
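
The exponents above can be turned into sizes with a few lines of
arithmetic. This is only a sketch of the calculation; limit_bytes and
the CASES table are names invented for the example, and "TB" means 2^40
bytes:

```python
# Sketch: capacity limit = 2^(offset_bits + block_shift) bytes.
TB = 2**40  # binary terabyte, as used in the thread

def limit_bytes(offset_bits, block_shift):
    """Largest addressable size in bytes for the given assumptions."""
    return 1 << (offset_bits + block_shift)

# (usable offset bits, log2 of block size, description)
CASES = [
    (31, 9,  "signed offset, 512-byte sectors"),
    (31, 10, "signed offset, 1KB blocks"),
    (31, 12, "signed offset, 4KB blocks"),
    (32, 12, "unsigned offset, 4KB blocks"),
    (31, 13, "signed offset, 8KB blocks"),
]

for bits, shift, label in CASES:
    size = limit_bytes(bits, shift)
    print(f"{bits}+{shift:<2}  {size // TB:>2} TB  {label}")
```

The rows work out to 1, 2, 8, 16 and 16 TB respectively, so an 8TB
volume already sits at the edge of the signed-offset, 4KB-block case.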

>  I don't recall whether 
> xfs/jfs/reiserfs presented a different limit, or whether 16 TB
> was only for post-2.4 kernels.

Various filesystems present smaller limits.  One argument is that there
is little reason to push the kernel into 64 bit block offsets, when most
filesystems' on-disk formats still keep their block offsets as 32 bit
values.  The XFS, JFS and ReiserFS formats don't have this limit.

It is unexpected that the block offset is an issue.  Many people saw
that byte offsets would be a problem before 32 bit machines went away.
But we expected that 64 bit machines would be used for all but the
lowest-end general-purpose computing by now.

It's not that it's especially difficult to go through the kernel source
and change block offsets to be explicitly 64 bit.  However, the first few
attempts are likely to be broken, and broken in a Very Bad Way.
The change requires explicitly sized operations on non-native types in
performance-critical paths, where atomicity may be implicitly assumed.
I'm guessing that in some places overflow and truncation problems
might not be traceable/reportable to the originating application.
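
To make the truncation risk concrete, here is a small simulation. It is
not kernel code: store_as_u32 and store_as_s32 are hypothetical helpers
that mimic what happens when a 64 bit block number passes through a
32 bit field, assuming the usual two's-complement wraparound:

```python
# Sketch: silent truncation of a 64-bit block number in a 32-bit field.
MASK32 = 0xFFFFFFFF
BLOCK_SIZE = 4096  # assumed 4KB filesystem blocks

def store_as_u32(block):
    """Keep only the low 32 bits, as an unsigned 32-bit field would."""
    return block & MASK32

def store_as_s32(block):
    """Same, but reinterpret the low 32 bits as a signed value."""
    v = block & MASK32
    return v - 2**32 if v >= 2**31 else v

if __name__ == "__main__":
    wanted = 2**32 + 5            # a block just past the 16 TB mark
    got = store_as_u32(wanted)    # high bits are dropped without any error
    print(f"wanted block {wanted}, I/O goes to block {got}")
    print(f"byte address {wanted * BLOCK_SIZE} becomes {got * BLOCK_SIZE}")
    print(f"block {2**31} read back as signed: {store_as_s32(2**31)}")
```

Nothing in this path raises an error: the write simply lands on block 5
instead of the intended block, which is why such bugs may never be
reported back to the application that issued the I/O.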

-- 
Donald Becker				becker at scyld.com
Scyld Computing Corporation		http://www.scyld.com
410 Severn Ave. Suite 210		Second Generation Beowulf Clusters
Annapolis MD 21403			410-990-9993



