[Beowulf] Nvidia K20 + Supermicro mobo

Adam DeConinck ajdecon at ajdecon.org
Tue Jul 16 10:44:09 PDT 2013


Hi Mikhail,

I've seen similar messages on CentOS when the Nouveau drivers are
loaded and a Tesla K20 is installed. You should make sure that nouveau
is blacklisted so the kernel won't load it.
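
If it helps, here is a rough Python sketch for checking whether nouveau
is currently loaded. It is only an illustration: it assumes a Linux
system exposing /proc/modules, and the function names (loaded_modules,
nouveau_loaded) are made up for this example.

```python
def loaded_modules(proc_modules_text):
    """Return the set of module names found in /proc/modules-style text.

    Each non-empty line starts with the module name, followed by its
    size, reference count, and other fields.
    """
    return {
        line.split()[0]
        for line in proc_modules_text.splitlines()
        if line.strip()
    }


def nouveau_loaded(path="/proc/modules"):
    """True if a module named 'nouveau' is listed in the given file."""
    with open(path) as f:
        return "nouveau" in loaded_modules(f.read())


if __name__ == "__main__":
    try:
        print("nouveau loaded:", nouveau_loaded())
    except OSError as err:
        # /proc/modules is assumed to exist; report if it does not.
        print("could not read /proc/modules:", err)
```

If this reports that nouveau is loaded, the blacklist has not taken
effect yet.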

Note that listing nouveau in /etc/modprobe.d/blacklist hasn't always
been enough for me; sometimes I've also had to put
"rdblacklist=nouveau" on the kernel command line.
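
A minimal sketch of how one might check both places from Python. The
assumptions here are mine: that the blacklist entries live in files
under /etc/modprobe.d/, that the running kernel's command line can be
read from /proc/cmdline, and that the option appears exactly as
"rdblacklist=nouveau". The helper names are hypothetical.

```python
import glob


def is_blacklisted(conf_text, module="nouveau"):
    """True if the text has an uncommented 'blacklist <module>' line."""
    for line in conf_text.splitlines():
        # Drop anything after '#', then split into whitespace fields.
        fields = line.split("#", 1)[0].split()
        if fields[:2] == ["blacklist", module]:
            return True
    return False


def cmdline_blacklists(cmdline, module="nouveau"):
    """True if the kernel command line has rdblacklist=<module>."""
    return ("rdblacklist=" + module) in cmdline.split()


if __name__ == "__main__":
    in_modprobe = False
    # Assumed location of modprobe configuration files.
    for path in glob.glob("/etc/modprobe.d/*"):
        try:
            with open(path, errors="replace") as f:
                in_modprobe = in_modprobe or is_blacklisted(f.read())
        except OSError:
            pass  # skip directories and unreadable files
    print("blacklisted in modprobe.d:", in_modprobe)

    try:
        with open("/proc/cmdline") as f:
            print("rdblacklist on kernel line:",
                  cmdline_blacklists(f.read()))
    except OSError as err:
        print("could not read /proc/cmdline:", err)
```

This only reports what is configured; it does not change anything on
the system.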

Disclaimer: I work at NVIDIA, but I haven't touched OpenSUSE in forever.

Cheers,
Adam

On Tue, Jul 16, 2013 at 10:29 AM, Mikhail Kuzminsky <mikky_m at mail.ru> wrote:
> I want to test an NVIDIA GPU (PNY Tesla K20c) with our own application for future use in our cluster, but I ran into problems installing the NVIDIA driver (v.319.32) on OpenSUSE 12.3, kernel 3.7.10-1.1.
>
> First of all, even before starting the driver installation, I see messages about BAR registers that look strange to me:
> -----------------------from /var/log/messages------
> 2013-07-04T01:43:43.666022+04:00 c6ws4 kernel: [ 0.421559] pci 0000:00:01.0: BAR 15: can't assign mem pref (size 0x18000000)
> 2013-07-04T01:43:43.666024+04:00 c6ws4 kernel: [ 0.421563] pci 0000:00:01.0: BAR 14: assigned [mem 0xe1000000-0xe1ffffff]
> 2013-07-04T01:43:43.666025+04:00 c6ws4 kernel: [ 0.421566] pci 0000:00:16.1: BAR 0: assigned [mem 0xe0001000-0xe000100f 64bit]
> 2013-07-04T01:43:43.666026+04:00 c6ws4 kernel: [ 0.421576] pci 0000:01:00.0: BAR 1: can't assign mem pref (size 0x10000000)
> 2013-07-04T01:43:43.666027+04:00 c6ws4 kernel: [ 0.421579] pci 0000:01:00.0: BAR 3: can't assign mem pref (size 0x2000000)
> 2013-07-04T01:43:43.666027+04:00 c6ws4 kernel: [ 0.421581] pci 0000:01:00.0: BAR 0: assigned [mem 0xe1000000-0xe1ffffff]
> 2013-07-04T01:43:43.666028+04:00 c6ws4 kernel: [ 0.421584] pci 0000:01:00.0: BAR 6: can't assign mem pref (size 0x80000)
> 2013-07-04T01:43:43.666029+04:00 c6ws4 kernel: [ 0.421586] pci 0000:00:01.0: PCI bridge to [bus 01]
> -----------------------------------------------------------------------------------------------
>
> Could these be symptoms of a hardware/BIOS error (Supermicro X9SCA-F, latest BIOS v.2.0b)? I tried both BIOS modes, with "above 4G Decoding" enabled and disabled.
>
> It looks to me as if the NVIDIA driver uses BAR 1 (see below). Although nvidia-installer.log also had some messages that were unclear to me, the installer reports that the kernel interface of nvidia.ko was compiled, but then nvidia-installer.log contains:
>
> --------------------------from nvidia-installer.log ----------------------------------
> -> Kernel module load error: No such device
> -> Kernel messages:
> ...[ 25.286079] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
> [ 1379.760532] nvidia: module license 'NVIDIA' taints kernel.
> [ 1379.760536] Disabling lock debugging due to kernel taint
> [ 1379.765158] nvidia 0000:01:00.0: enabling device (0140 -> 0142)
> [ 1379.765165] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
> [ 1379.765165] NVRM: BAR1 is 0M @ 0x0 (PCI:0000:01:00.0)
> [ 1379.765166] NVRM: The system BIOS may have misconfigured your GPU.
> [ 1379.765169] nvidia: probe of 0000:01:00.0 failed with error -1
> [ 1379.765177] NVRM: The NVIDIA probe routine failed for 1 device(s).
> [ 1379.765178] NVRM: None of the NVIDIA graphics adapters were initialized!
> ---------------------------------------------------------------------------------------------
>
> Here is also an extract from lspci -v:
>
> 01:00.0 3D controller: NVIDIA Corporation GK107 [Tesla K20c] (rev a1)
>         Subsystem: NVIDIA Corporation Device 0982
>         Flags: fast devsel, IRQ 11
>         Memory at e1000000 (32-bit, non-prefetchable) [disabled] [size=16M]
>         Memory at <unassigned> (64-bit, prefetchable) [disabled]
>         Memory at <unassigned> (64-bit, prefetchable) [disabled]
>
> Do the kernel messages above mean that I have hardware/BIOS problems, or could this be an NVIDIA driver problem?
>
> Mikhail Kuzminsky
> Computer Assistance to Chemical Research Center
> Zelinsky Institute of Organic Chemistry
> Moscow


