[Beowulf] Problem with Single RAID disk larger than 2TB and Linux

Mark Hahn hahn at mcmaster.ca
Wed Oct 3 08:49:42 PDT 2007


> Is someone using a signed int to represent the 1 KB blocks?
> 2 * 1024 * 1024 * 1024 * 1024 = 2199023255552

yes - you can see from the message that the kernel hits the 32-bit limit of
the 10-byte read-capacity command and tries the 16-byte READ CAPACITY(16)
instead, but that command is failing.
these days, scsi and especially fc and super-especially some 
obscure driver like this are less heavily scrutinized, so this 
is not hugely surprising.  I don't know whether the error status
indicates a flaw in the driver or perhaps in the target (which 
might not support large luns).
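Here is the arithmetic behind the quoted question. READ CAPACITY(10) reports the last LBA in a 32-bit field, so with 512-byte sectors the largest device it can describe is 2^32 x 512 bytes. That is the same 2199023255552 quoted above. It also equals 2^31 blocks of 1 KiB, which is why the signed-int guess fits the number. The minimal Python sketch below is illustrative only: the constant names are hypothetical, and the "3 TB" figure assumes decimal terabytes.

```python
# Sketch: why a 32-bit LBA field caps a LUN at 2 TiB with 512-byte sectors.
SECTOR_BYTES = 512          # assumed logical block size (matches the sd log)
MAX_LBA_10 = 0xFFFFFFFF     # largest "last LBA" READ CAPACITY(10) can report


def capacity_bytes(last_lba: int, block_len: int = SECTOR_BYTES) -> int:
    """READ CAPACITY returns the *last* LBA, so the block count is last_lba + 1."""
    return (last_lba + 1) * block_len


limit = capacity_bytes(MAX_LBA_10)
print(limit)                    # 2199023255552
print(limit == 2 * 1024**4)     # True: the figure in the question
print(limit == 2**31 * 1024)    # True: same value as 2^31 1-KiB blocks
print(limit // 10**6)           # 2199023, the "MB" value printed by sd

# Assumption: "3TB" means 3 * 10**12 bytes. That needs more than 2**32
# sectors, so only the 16-byte READ CAPACITY(16) can describe it.
three_tb_sectors = 3 * 10**12 // SECTOR_BYTES
print(three_tb_sectors > MAX_LBA_10 + 1)  # True
```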

>> We ran into a problem with large disks which I suspect is fairly common,

in my world, large == sata raid, and scsi/fc is a vanishing breed,
which also coincides with the stunted size of scsi/fc disks.
that said, the only scsi my organization has (embedded in HP SFS clusters)
is (annoyingly) chopped up into piddly little 2TB luns.

>> however the usual solutions are not working.  IBM and RedHat have not been
>> able to provide any useful answers, so I am turning to this list for help.
>> (Emulex is still helping, but I am not sure how far they can go without 
>> access to the hardware)

I think you should ask Emulex whether the driver supports the 16-byte
read-capacity command, and ask the target vendor (hitachi) whether they
support >2TB luns and implement 16-byte commands.
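When raising this with the vendors, it may help to be concrete about what "16-byte read-capacity" means on the wire. The Python sketch below only builds and parses byte strings and does not talk to a device; on Linux that would go through the SG_IO ioctl. It follows the SBC layout as I understand it. READ CAPACITY(10) is opcode 0x25 and returns a 32-bit last LBA. READ CAPACITY(16) is service action 0x10 of opcode 0x9E and returns a 64-bit last LBA. The function names are hypothetical, and the 32-byte allocation length is an assumed buffer size. A last LBA of 0xFFFFFFFF from the 10-byte command is the "too big, ask again with RC16" signal, which matches the "use 0xffffffff as device size" fallback in the log.

```python
import struct

# Opcodes per the SCSI Block Commands (SBC) layout.
READ_CAPACITY_10 = 0x25        # 10-byte CDB, 32-bit last-LBA field
SERVICE_ACTION_IN_16 = 0x9E    # READ CAPACITY(16) is a service action of this
SA_READ_CAPACITY_16 = 0x10


def read_capacity10_cdb() -> bytes:
    """10-byte READ CAPACITY(10) CDB: opcode followed by zeroed fields."""
    cdb = bytearray(10)
    cdb[0] = READ_CAPACITY_10
    return bytes(cdb)


def read_capacity16_cdb(alloc_len: int = 32) -> bytes:
    """16-byte READ CAPACITY(16) CDB; alloc_len=32 is an assumed buffer size."""
    cdb = bytearray(16)
    cdb[0] = SERVICE_ACTION_IN_16
    cdb[1] = SA_READ_CAPACITY_16                 # service action, bits 0-4
    struct.pack_into(">I", cdb, 10, alloc_len)   # allocation length, bytes 10-13
    return bytes(cdb)


def parse_read_capacity10(data: bytes) -> tuple[int, int]:
    """(last_lba, block_len); last_lba == 0xFFFFFFFF means 'retry with RC16'."""
    return struct.unpack_from(">II", data, 0)


def parse_read_capacity16(data: bytes) -> tuple[int, int]:
    """(last_lba, block_len) from the 64-bit response (big-endian)."""
    return struct.unpack_from(">QI", data, 0)


# Simulated responses for a 3 TB (decimal) LUN with 512-byte blocks.
last_lba = 3 * 10**12 // 512 - 1
rc10 = struct.pack(">II", 0xFFFFFFFF, 512)             # saturated 32-bit answer
rc16 = struct.pack(">QI", last_lba, 512) + bytes(20)   # 32-byte response
lba10, _ = parse_read_capacity10(rc10)
if lba10 == 0xFFFFFFFF:
    lba16, blk = parse_read_capacity16(rc16)
    print((lba16 + 1) * blk)   # 3000000000000
```

If the HBA driver or the array cannot carry or answer the 16-byte CDB, the initiator is left with the saturated 32-bit answer. That is what the log shows.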

you paid through the nose for FC hardware; you should expect a high level
of service.




>> 
>> Details:
>> 
>> * Linux Cluster for Weather modelling
>> 
>> * IBM BladeCenter blades and an IBM x3655 Opteron head node, FC-attached to
>> a Hitachi TagmaStore SAN via a dual-port PCI-Express Emulex LightPulse
>> FC HBA
>> 
>> * RHEL 4 Update 5, x86_64 kernel 2.6.9-55 SMP, with the RHEL-provided Emulex
>> driver (lpfc); lpfcdfc is also installed
>> 
>> * GPT partition created with parted
>> 
>> There is one 2TB LUN, works fine.
>> 
>> There is a 3TB LUN on the Hitachi SAN which is reported as "only" 2199GB
>> (about 2.2TB).
>> 
>> We noticed that, when the emulex driver loads, the following error message 
>> is reported:
>>
>>            Emulex LightPulse Fibre Channel SCSI driver 8.0.16.32
>>            Copyright(c) 2003-2007 Emulex.  All rights reserved.
>>            ACPI: PCI Interrupt 0000:2d:00.0[A] -> GSI 18 (level, low) -> IRQ 185
>>            PCI: Setting latency timer of device 0000:2d:00.0 to 64
>>            lpfc 0000:2d:00.0: 0:1305 Link Down Event x2 received Data: x2 x4 x1000
>>            lpfc 0000:2d:00.0: 0:1305 Link Down Event x2 received Data: x2 x4 x1000
>>            lpfc 0000:2d:00.0: 0:1303 Link Up Event x3 received Data: x3 x1 x10 x0
>>            scsi5 : IBM 42C2071 4Gb 2-Port PCIe FC HBA for System x on PCI bus 2d device 00 irq 185 port 0
>>            Vendor: HITACHI   Model: OPEN-V*3          Rev: 5007
>>            Type:   Direct-Access                      ANSI SCSI revision: 03
>>            sdb : very big device. try to use READ CAPACITY(16).
>>            sdb : READ CAPACITY(16) failed.
>>            sdb : status=1, message=00, host=0, driver=08
>>            sdb : use 0xffffffff as device size
>>            SCSI device sdb: 4294967296 512-byte hdwr sectors (2199023 MB)
>>            SCSI device sdb: drive cache: write back
>>            sdb : very big device. try to use READ CAPACITY(16).
>>            sdb : READ CAPACITY(16) failed.
>>            sdb : status=1, message=00, host=0, driver=08
>>            sdb : use 0xffffffff as device size
>>            SCSI device sdb: 4294967296 512-byte hdwr sectors (2199023 MB)
>>            SCSI device sdb: drive cache: write back
>> 
>> The problem is that READ CAPACITY(16) fails, but we have been unable to find
>> the source of this error.
>> 
>> We conducted several experiments without success:
>> 
>> - Tried compiling the latest driver from Emulex (8.0.16.32) - same error
>> - Tried Knoppix (2.6.19), Gentoo LiveCD (2.6.19) and CentOS 4.4 - same error
>> - Tried to boot Belenix (32-bit Solaris live CD); it failed to boot completely
>> (possibly an unrelated issue)
>> 
>> We have a temporary workaround in place: we created 3x1TB disks and used
>> LVM to create a striped 3TB volume with an ext3 filesystem. This works fine.
>> 
>> RedHat claims ext3 and RHEL4 support disks up to 8TB and 16TB respectively
>> (since RHEL4u2).
>> 
>> I would like to know if anyone on the list has any pointers that can help 
>> us solve the issue.
>> 
>> Regards
>> Anand Vaidya
>> 
>> 
>> 
>> _______________________________________________
>> Beowulf mailing list, Beowulf at beowulf.org
>> To change your subscription (digest mode or unsubscribe) visit 
>> http://www.beowulf.org/mailman/listinfo/beowulf
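The status line that sd prints for the failed READ CAPACITY(16) (`status=1, message=00, host=0, driver=08`) can be unpacked with the result-word conventions of the 2.6-era Linux SCSI midlayer. In that scheme `status_byte()` is the SCSI status shifted right by one bit, so 1 means CHECK CONDITION, and driver 0x08 is DRIVER_SENSE. The Python sketch below assumes those conventions. It also assumes that `host` is printed in decimal and the other fields in hex, and its lookup tables are deliberately partial. Decoded, the line reads as CHECK CONDITION with sense data and no host/transport error. That hints that something answered the command and rejected it. The sense key is not in the log, though, so this alone does not settle whether the driver or the target is at fault.

```python
import re

# Result-word conventions of the 2.6-era Linux SCSI midlayer (include/scsi/scsi.h).
# Partial tables: only a few common codes are named here.
STATUS = {0x00: "GOOD", 0x01: "CHECK_CONDITION", 0x04: "BUSY",
          0x0C: "RESERVATION_CONFLICT"}            # SAM status >> 1
HOST = {0x00: "DID_OK", 0x01: "DID_NO_CONNECT", 0x03: "DID_TIME_OUT",
        0x07: "DID_ERROR"}
DRIVER = {0x00: "DRIVER_OK", 0x04: "DRIVER_ERROR", 0x06: "DRIVER_TIMEOUT",
          0x08: "DRIVER_SENSE"}

LINE_RE = re.compile(r"status=(\w+), message=(\w+), host=(\w+), driver=(\w+)")


def decode(line: str) -> dict:
    m = LINE_RE.search(line)
    if m is None:
        raise ValueError("not an sd status line: " + line)
    # Assumption: host is printed in decimal, the other fields in hex.
    status, msg, driver = (int(m.group(i), 16) for i in (1, 2, 4))
    host = int(m.group(3), 10)
    # Reassemble the 32-bit result word the fields were extracted from.
    result = (driver << 24) | (host << 16) | (msg << 8) | (status << 1)
    return {
        "result": result,
        "status": STATUS.get(status, hex(status)),
        "host": HOST.get(host, hex(host)),
        # Low nibble is the driver status; the high nibble held SUGGEST_* hints.
        "driver": DRIVER.get(driver & 0x0F, hex(driver)),
    }


info = decode("sdb : status=1, message=00, host=0, driver=08")
print(hex(info["result"]), info["status"], info["host"], info["driver"])
# 0x8000002 CHECK_CONDITION DID_OK DRIVER_SENSE
```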

-- 
operator may differ from spokesperson.	            hahn at mcmaster.ca
