[Beowulf] Nvidia K20 + Supermicro mobo

Mikhail Kuzminsky mikky_m at mail.ru
Mon Jul 22 08:59:36 PDT 2013


Adding rdblacklist=nouveau to the kernel parameter list doesn't help :-( 
(BTW, it looks strange to me that the nouveau driver is both part of the initrd AND a dynamically loaded nouveau.ko module of the main kernel.)

Mikhail  


Friday, 19 July 2013, 13:40 -04:00, from Prentice Bisbal <prentice.bisbal at rutgers.edu>:
> I'd like to chime in and say that I could not get NVIDIA drivers working 
> with RHEL 6.x until I added rdblacklist=nouveau to my kernel args, too.
> 
> Prentice
> 
> On 07/16/2013 02:43 PM, Alex Chekholko wrote:
> > I see that on our GPU compute nodes, which a colleague configured, we use this
> > kernel line during install:
> >
> > # rocks list bootaction | grep gpu
> > gpuinstall:       vmlinuz-6.0-x86_64    initrd.img-6.0-x86_64 ks
> > ramdisk_size=150000 lang= devfs=nomount pxe kssendmac selinux=0 noipv6
> > ksdevice=bootif xdriver=vesa rdblacklist=nouveau nouveau.modeset=0
> >
> > This is RHEL6 (Rocks 6.0) on HP SL250s hardware.  I think they didn't
> > boot correctly without blacklisting nouveau.
> >
> > Hope that helps.
> >
> > Regards,
> > Alex
> >
> > On Tue, Jul 16, 2013 at 10:44 AM, Adam DeConinck <ajdecon at ajdecon.org> wrote:
> >> Hi Mikhail,
> >>
> >> I've seen similar messages on CentOS when the Nouveau drivers are
> >> loaded and a Tesla K20 is installed. You should make sure that nouveau
> >> is blacklisted so the kernel won't load it.
> >>
> >> Note that it hasn't always been enough for me to have nouveau listed
> >> in /etc/modprobe.d/blacklist; sometimes I've had to actually put
> >> "rdblacklist=nouveau" on the kernel line.
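The thread distinguishes between listing nouveau in /etc/modprobe.d and putting rdblacklist=nouveau on the kernel line (the latter aimed at the initrd, where Mikhail notes nouveau also lives). A minimal Python sketch that reports all three things at once: the kernel command line, the modprobe.d blacklist, and whether nouveau is actually loaded. It assumes the standard Linux /proc/cmdline and /proc/modules files and an /etc/modprobe.d directory; the function names are hypothetical, and treating rdblacklist= as a comma-separated list is an assumption. It only reads, it changes nothing.

```python
from pathlib import Path

def cmdline_blacklists(cmdline: str, module: str = "nouveau") -> bool:
    """True if the kernel command line contains rdblacklist=<module>.

    Assumption: the value may be a comma-separated list of module names.
    """
    for token in cmdline.split():
        key, _, value = token.partition("=")
        if key == "rdblacklist" and module in value.split(","):
            return True
    return False

def modprobe_blacklists(conf_text: str, module: str = "nouveau") -> bool:
    """True if modprobe.d-style text has an uncommented 'blacklist <module>' line."""
    for line in conf_text.splitlines():
        words = line.split("#", 1)[0].split()  # drop comments, then tokenize
        if len(words) >= 2 and words[0] == "blacklist" and words[1] == module:
            return True
    return False

def module_loaded(proc_modules: str, module: str = "nouveau") -> bool:
    """True if <module> is listed in /proc/modules-style text (name is column 1)."""
    return any(
        line.split()[0] == module for line in proc_modules.splitlines() if line.strip()
    )

def read_or_empty(path: Path) -> str:
    """Read a text file, returning '' if it is missing, unreadable or not text."""
    try:
        return path.read_text()
    except (OSError, UnicodeDecodeError):
        return ""

def check_host(module: str = "nouveau") -> dict:
    """Run the three checks against the running system."""
    conf_dir = Path("/etc/modprobe.d")
    files = sorted(conf_dir.glob("*")) if conf_dir.is_dir() else []
    confs = "\n".join(read_or_empty(p) for p in files if p.is_file())
    return {
        "kernel_cmdline": cmdline_blacklists(read_or_empty(Path("/proc/cmdline")), module),
        "modprobe_d": modprobe_blacklists(confs, module),
        "loaded_now": module_loaded(read_or_empty(Path("/proc/modules")), module),
    }

if __name__ == "__main__":
    print(check_host())
```

If "loaded_now" is still True after a reboot with both blacklists in place, the module is being pulled in by some path neither blacklist covers, which is consistent with what Adam and Mikhail describe.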
> >>
> >> Disclaimer: I work at NVIDIA, but I haven't touched OpenSUSE in forever.
> >>
> >> Cheers,
> >> Adam
> >>
> >> On Tue, Jul 16, 2013 at 10:29 AM, Mikhail Kuzminsky <mikky_m at mail.ru> wrote:
> >>> I want to test an NVIDIA GPU (PNY Tesla K20c) with our own application for future use in our cluster, but I ran into problems installing the NVIDIA driver (v.319.32) on OpenSUSE 12.3 (kernel 3.7.10-1.1).
> >>>
> >>> First of all, even before starting the driver installation, I get messages about BAR registers that look strange to me:
> >>> -----------------------from /var/log/messages------
> >>> 2013-07-04T01:43:43.666022+04:00 c6ws4 kernel: [ 0.421559] pci 0000:00:01.0: BAR 15: can't assign mem pref (size 0x18000000)
> >>> 2013-07-04T01:43:43.666024+04:00 c6ws4 kernel: [ 0.421563] pci 0000:00:01.0: BAR 14: assigned [mem 0xe1000000-0xe1ffffff]
> >>> 2013-07-04T01:43:43.666025+04:00 c6ws4 kernel: [ 0.421566] pci 0000:00:16.1: BAR 0: assigned [mem 0xe0001000-0xe000100f 64bit]
> >>> 2013-07-04T01:43:43.666026+04:00 c6ws4 kernel: [ 0.421576] pci 0000:01:00.0: BAR 1: can't assign mem pref (size 0x10000000)
> >>> 2013-07-04T01:43:43.666027+04:00 c6ws4 kernel: [ 0.421579] pci 0000:01:00.0: BAR 3: can't assign mem pref (size 0x2000000)
> >>> 2013-07-04T01:43:43.666027+04:00 c6ws4 kernel: [ 0.421581] pci 0000:01:00.0: BAR 0: assigned [mem 0xe1000000-0xe1ffffff]
> >>> 2013-07-04T01:43:43.666028+04:00 c6ws4 kernel: [ 0.421584] pci 0000:01:00.0: BAR 6: can't assign mem pref (size 0x80000)
> >>> 2013-07-04T01:43:43.666029+04:00 c6ws4 kernel: [ 0.421586] pci 0000:00:01.0: PCI bridge to [bus 01]
> >>> -----------------------------------------------------------------------------------------------
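The "can't assign mem pref" lines are easier to compare once the hex sizes are converted. A minimal Python sketch, assuming the message format shown in the excerpt above (the regex and function names are hypothetical), that extracts each failed assignment and prints its size in binary units:

```python
import re

# Matches lines such as
#   pci 0000:01:00.0: BAR 1: can't assign mem pref (size 0x10000000)
FAILED_BAR = re.compile(
    r"pci (?P<dev>[0-9a-f]{4}:[0-9a-f]{2}:[0-9a-f]{2}\.[0-7]): "
    r"BAR (?P<bar>\d+): can't assign mem(?: pref)? \(size (?P<size>0x[0-9a-f]+)\)"
)

def failed_bars(log_text: str) -> list:
    """Return (device, BAR index, size in bytes) for every failed assignment."""
    return [
        (m["dev"], int(m["bar"]), int(m["size"], 16))
        for m in FAILED_BAR.finditer(log_text)
    ]

def human(size: int) -> str:
    """Format a byte count using binary units (GiB/MiB/KiB)."""
    for unit, div in (("GiB", 1 << 30), ("MiB", 1 << 20), ("KiB", 1 << 10)):
        if size >= div:
            return f"{size / div:g} {unit}"
    return f"{size} B"

if __name__ == "__main__":
    sample = """\
pci 0000:00:01.0: BAR 15: can't assign mem pref (size 0x18000000)
pci 0000:00:01.0: BAR 14: assigned [mem 0xe1000000-0xe1ffffff]
pci 0000:01:00.0: BAR 1: can't assign mem pref (size 0x10000000)
pci 0000:01:00.0: BAR 3: can't assign mem pref (size 0x2000000)
pci 0000:01:00.0: BAR 6: can't assign mem pref (size 0x80000)
"""
    for dev, bar, size in failed_bars(sample):
        print(f"{dev} BAR {bar}: {human(size)} not assigned")
```

On these lines it reports 384 MiB for BAR 15 of the bridge 00:01.0, and 256 MiB, 32 MiB and 512 KiB for BARs 1, 3 and 6 of the K20c at 01:00.0. The 256 MiB BAR 1 is the region NVRM later reports as "BAR1 is 0M @ 0x0".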
> >>>
> >>> Could these be symptoms of a hardware/BIOS error (Supermicro X9SCA-F, latest BIOS v.2.0b)? I tried both BIOS settings, with "above 4G Decoding" enabled and disabled.
> >>>
> >>> It looks to me like the NVIDIA driver uses BAR 1 (see below). nvidia-installer.log also contained some messages that were unclear to me; the installer reports that the kernel interface of nvidia.ko was compiled, but then nvidia-installer.log contains
> >>>
> >>> --------------------------from nvidia-installer.log ----------------------------------
> >>> -> Kernel module load error: No such device
> >>> -> Kernel messages:
> >>> ...[ 25.286079] IPv6: ADDRCONF(NETDEV_CHANGE): eth1: link becomes ready
> >>> [ 1379.760532] nvidia: module license 'NVIDIA' taints kernel.
> >>> [ 1379.760536] Disabling lock debugging due to kernel taint
> >>> [ 1379.765158] nvidia 0000:01:00.0: enabling device (0140 -> 0142)
> >>> [ 1379.765165] NVRM: This PCI I/O region assigned to your NVIDIA device is invalid:
> >>> [ 1379.765165] NVRM: BAR1 is 0M @ 0x0 (PCI:0000:01:00.0)
> >>> [ 1379.765166] NVRM: The system BIOS may have misconfigured your GPU.
> >>> [ 1379.765169] nvidia: probe of 0000:01:00.0 failed with error -1
> >>> [ 1379.765177] NVRM: The NVIDIA probe routine failed for 1 device(s).
> >>> [ 1379.765178] NVRM: None of the NVIDIA graphics adapters were initialized!
> >>> ---------------------------------------------------------------------------------------------
> >>>
> >>> I also add an excerpt from lspci -v:
> >>>
> >>> 01:00.0 3D controller: NVIDIA Corporation GK107 [Tesla K20c] (rev a1)
> >>>          Subsystem: NVIDIA Corporation Device 0982
> >>>          Flags: fast devsel, IRQ 11
> >>>          Memory at e1000000 (32-bit, non-prefetchable) [disabled] [size=16M]
> >>>          Memory at <unassigned> (64-bit, prefetchable) [disabled]
> >>>          Memory at <unassigned> (64-bit, prefetchable) [disabled]
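The lspci -v excerpt shows the same picture from the device side: the 16M 32-bit region got an address, both 64-bit prefetchable regions are <unassigned>, and all three are [disabled]. A small Python sketch that pulls these fields out of lspci -v text, assuming the "Memory at ..." line layout shown in the excerpt (the function name is hypothetical):

```python
import re

# Assumes the "Memory at ADDR (NN-bit, [non-]prefetchable) [flags...]" layout
MEM_LINE = re.compile(
    r"^\s*Memory at (?P<addr>\S+) \((?P<width>\d+)-bit, "
    r"(?P<pf>non-prefetchable|prefetchable)\)(?P<rest>.*)$"
)
SIZE = re.compile(r"\[size=(\w+)\]")

def memory_regions(lspci_text: str) -> list:
    """Summarize each 'Memory at' line of lspci -v output as a dict."""
    regions = []
    for line in lspci_text.splitlines():
        m = MEM_LINE.match(line)
        if m is None:
            continue  # skip header, Subsystem, Flags, etc.
        size = SIZE.search(m["rest"])
        regions.append({
            "assigned": m["addr"] != "<unassigned>",
            "bits": int(m["width"]),
            "prefetchable": m["pf"] == "prefetchable",
            "disabled": "[disabled]" in m["rest"],
            "size": size.group(1) if size else None,  # lspci omits size when unassigned
        })
    return regions

if __name__ == "__main__":
    excerpt = """\
01:00.0 3D controller: NVIDIA Corporation GK107 [Tesla K20c] (rev a1)
         Memory at e1000000 (32-bit, non-prefetchable) [disabled] [size=16M]
         Memory at <unassigned> (64-bit, prefetchable) [disabled]
         Memory at <unassigned> (64-bit, prefetchable) [disabled]
"""
    for region in memory_regions(excerpt):
        print(region)
```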
> >>>
> >>> Do the kernel messages above mean that I have hardware/BIOS problems, or could it be an NVIDIA driver problem?
> >>>
> >>> Mikhail Kuzminsky
> >>> Computer Assistance to Chemical Research Center
> >>> Zelinsky Institute of Organic Chemistry
> >>> Moscow
> >>>
> >>>
> >>>
> >>>
> >>>
> >>> _______________________________________________
> >>> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> >>> To change your subscription (digest mode or unsubscribe) visit http://www.beowulf.org/mailman/listinfo/beowulf
> 

Mikhail Kuzminsky

