[Beowulf] Problem with Single RAID disk larger than 2TB and Linux

Anand Vaidya anandvaidya.ml at gmail.com
Wed Oct 3 21:36:56 PDT 2007


My apologies for top posting. Thanks to all the respondents. I have collated 
your comments and questions into one response; I hope it is more useful this way.

[Joe Landman]: recommended using a newer version of parted

[My Reply]: We are seeing the error as soon as the driver loads, before Linux 
has even fully booted, so I suspect the parted version plays no role here.

[Leif Nixon]: As far as I know, neither the Hitachi AMS series nor the Hitachi 
NSC series support LUNs > 2TB.

[My Reply]: The Hitachi folks here claim that any LUN size (up to the physical 
capacity) is configurable, and that 3TB should definitely be fine on the 
Hitachi storage side, but they cannot comment on the OS / HBA drivers etc.

I did some searching and found a document online stating that 2TB is the 
maximum LUN size for some enterprise storage, but I am still trying to confirm 
this for Hitachi. However, we have seen in the HDS storage config GUI that a 
3TB LUN was created and zoned correctly.

[Guy Coates]: Recommended staying with 2TB LUNs.

[My Reply]: While I agree that staying under 2TB and using LVM to stripe will 
avoid a lot of issues, this sizing is needed by the end customer... but I 
thought x86_64 Linux was 64-bit clean everywhere (at least for SCSI LBAs?)!

Moreover, I can earn karma points if I can find a bug or two in Linux or a 
Linux driver :-) and contribute to the improvement of GNU/Linux. So while there 
is a temporary workaround, I intend to pursue the matter to completion.

[Mark Hahn]: Suggested checking whether the driver supports the 16-byte READ CAPACITY.

[My Reply]: I did check with Emulex. They say the "lpfc" driver does use the 
16-byte READ CAPACITY, and they think the Linux SCSI layer is the underlying 
cause. I am still working with their support staff on gathering detailed info 
via their "System Grab Diag Tool".

They say: "The error would occur immediately after the lpfc driver loads 
because the target is then available (the Fibre Channel controller of the HDS 
storage) and the SCSI mid-layer has queried the LUNs, which then generates the 
error. The lpfc driver does not query the LUNs and so would never request a 
capacity nor any other information. The SCSI mid-layer and higher levels 
would perform this function."
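To illustrate what the mid-layer is doing here (a rough sketch of the 
SPC/SBC-defined parameter data, not the actual kernel code -- the buffers 
below are synthetic): READ CAPACITY(10) returns the last LBA in a 32-bit 
big-endian field, and the all-ones value 0xFFFFFFFF is the standard's signal 
to retry with the 16-byte variant, whose response carries a 64-bit last LBA:

```python
import struct

RC10_OVERFLOW = 0xFFFFFFFF  # sentinel: capacity exceeds 32-bit LBA space

def capacity_from_rc10(buf):
    """Decode an 8-byte READ CAPACITY(10) response into bytes.

    Returns None when the device signals that READ CAPACITY(16) is required.
    """
    last_lba, block_len = struct.unpack(">II", buf[:8])  # big-endian per SCSI
    if last_lba == RC10_OVERFLOW:
        return None  # caller must retry with READ CAPACITY(16)
    return (last_lba + 1) * block_len

def capacity_from_rc16(buf):
    """Decode the first 12 bytes of a READ CAPACITY(16) response into bytes."""
    last_lba, block_len = struct.unpack(">QI", buf[:12])  # 64-bit LBA + 32-bit block size
    return (last_lba + 1) * block_len

# Synthetic example: a 3 TB LUN with 512-byte blocks
blocks = 3 * 10**12 // 512
rc10 = struct.pack(">II", RC10_OVERFLOW, 512)  # what a >2TB LUN must report
rc16 = struct.pack(">QI", blocks - 1, 512)

assert capacity_from_rc10(rc10) is None        # triggers "try to use READ CAPACITY(16)."
print(capacity_from_rc16(rc16))                # 3000000000000
```

So the "very big device. try to use READ CAPACITY(16)." message is the sd 
driver hitting the 0xFFFFFFFF sentinel; the subsequent failure means the 
16-byte command itself is being rejected somewhere between the mid-layer and 
the target.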

One thing I am very happy about is the level of support offered by Emulex, 
even though I did not purchase the product from them directly and cannot even 
furnish an Emulex part number for the HBA. I wish other companies were as 
good at support.


Regards
Anand

On Thursday 04 October 2007 00:46:43 Joe Landman wrote:
> Hi Anand:
>
> Anand Vaidya wrote:
> > Dear Beowulfers,
> >
> > We ran into a problem with large disks which I suspect is fairly common,
> > however the usual solutions are not working.  IBM, RedHat have not been
> > able to provide any useful answers so I am turning to this list for
> > help. (Emulex is still helping, but I am not sure how far they can go
> > without access to the hardware)
> >
> > Details:
> >
> > * Linux Cluster for Weather modelling
> >
> > *  IBM Bladecenter blades and an IBM x3655 Opteron head node FC attached
> > to a Hitachi Tagmastore SAN storage, Emulex LightPulse FC HBA,
> > PCI-Express, Dual port
> >
> > * RHEL 4update5, x86_64 kernel 2.6.9-55 SMP and RHEL provided Emulex
> > driver (lpfc) and lpfcdfc also installed
>
> There is a problem in some parted versions (prior to 1.8.x) where they
> munge the partition table on gpt/large disks.
>
> Apart from suggesting a more modern kernel (2.6.22.6 or so), there may
> be other things you can do.
>
> > * GPT partition created with parted
>
> I presume this is 1.6.19 parted?
>
> > There is one 2TB LUN, works fine.
> >
> > There is a 3TB LUN on the Hitachi SAN which is reported as "only" 2199GB
> > (2.1TB),
>
> Yup.  Sounds like either the parted problem, or a driver issue.
>
> What does
>
> 	parted /dev/3TBlun print
>
> report, where /dev/3TBlun is the device containing the 3TB LUN?
>
> We had seen this behavior with some 1.6.9 and 1.7.x versions of parted.
> The only way to fix it was to upgrade to parted 1.8.x.
>
> > We noticed that, when the emulex driver loads, the following error
> > message is reported:
> >
> >             Emulex LightPulse Fibre Channel SCSI driver 8.0.16.32
> >             Copyright(c) 2003-2007 Emulex.  All rights reserved.
> >             ACPI: PCI Interrupt 0000:2d:00.0[A] -> GSI 18 (level, low)
> > -> IRQ 185
> >             PCI: Setting latency timer of device 0000:2d:00.0 to 64
> >             lpfc 0000:2d:00.0: 0:1305 Link Down Event x2 received Data:
> > x2 x4 x1000
> >             lpfc 0000:2d:00.0: 0:1305 Link Down Event x2 received Data:
> > x2 x4 x1000
> >             lpfc 0000:2d:00.0: 0:1303 Link Up Event x3 received Data: x3
> > x1 x10 x0
> >             scsi5 : IBM 42C2071 4Gb 2-Port PCIe FC HBA for System x on
> > PCI bus 2d device 00 irq 185 port 0
> >             Vendor: HITACHI   Model: OPEN-V*3          Rev: 5007
> >             Type:   Direct-Access                      ANSI SCSI
> > revision: 03
> >             sdb : very big device. try to use READ CAPACITY(16).
>
> This is what our JackRabbit reports ...
>
> >             sdb : READ CAPACITY(16) failed.
>
> This is not what our JackRabbit reports.
>
> [...]
>
> > The problem is the READ CAPACITY(16) failure, but we are unable to
> > find the source of this error.
> >
> > We conducted several experiments without success:
> >
> > - Tried compiling the latest driver from Emulex (8.0.16.32) - same error
> > - Tried Knoppix (2.6.19), Gentoo LiveCD (2.6.19) and CentOS 4.4
> > - same error
>
> Sounds a great deal like parted.
>
> > - Tried to boot Belenix (Solaris 32-bit live), which failed to boot
> > completely (possibly an unrelated issue)
> >
> > We have a temporary workaround in place: we created 3x1TB LUNs and used
> > LVM to create a striped 3TB volume with an ext3 FS. This works fine.
> >
> > RedHat claims ext3 and RHEL4 support disks up to 8TB and 16TB
> > respectively (since RHEL4u2)
>
> ... yeah.
>
> > I would like to know if anyone on the list has any pointers that can
> > help us solve the issue.
>
> Please run the parted command as indicated. Let's see what the partition
> table thinks it is.
>
> Do you have data on that partition?  Can you remake the label on that
> device with a new version of parted?
>
> > Regards
> > Anand Vaidya
