[Beowulf] NFS+XFS+SMP on kernel 2.6 (Update)
Suvendra Nath Dutta sdutta at cfa.harvard.edu
Tue Jun 21 07:10:21 PDT 2005
Update on this:
I upgraded the kernel to 2.6.11 and the machine is a lot less sluggish.
A lot more memory is available now.
Thanks for a much cheaper (and better) solution than buying a new
machine.
Suvendra.
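
For anyone wanting to check the same two things on their own head node (kernel release and free memory), here is a minimal Python sketch. The function names `kernel_version` and `mem_free_kb` and the `FIXED_IN` constant are made up for illustration; the 2.6.11 threshold is simply the version mentioned in this thread, not an authoritative fix boundary. Note that `MemFree` excludes page cache, so a low value on its own is not necessarily a problem.

```python
import re

# Version this thread reports as behaving better; treat it as an assumption.
FIXED_IN = (2, 6, 11)


def kernel_version(release):
    """Return the first three numeric fields of a kernel release string."""
    match = re.match(r"(\d+)\.(\d+)\.(\d+)", release)
    if match is None:
        raise ValueError(f"unrecognized kernel release: {release!r}")
    return tuple(int(part) for part in match.groups())


def mem_free_kb(meminfo_text):
    """Return the MemFree value (in kB) from /proc/meminfo-style text."""
    for line in meminfo_text.splitlines():
        if line.startswith("MemFree:"):
            return int(line.split()[1])
    raise ValueError("no MemFree line found")


if __name__ == "__main__":
    import os
    import platform

    try:
        running = kernel_version(platform.release())
        status = "ok" if running >= FIXED_IN else "older than 2.6.11"
        print("kernel", running, status)
    except ValueError as err:
        print(err)

    # /proc/meminfo only exists on Linux; skip quietly elsewhere.
    if os.path.exists("/proc/meminfo"):
        with open("/proc/meminfo") as handle:
            print("MemFree kB:", mem_free_kb(handle.read()))
```

Comparing tuples keeps the version check numeric, so 2.6.8 sorts before 2.6.11 (a plain string comparison would get that wrong).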
On Jun 15, 2005, at 5:48 PM, Joe Landman wrote:
> Eeek... nfs crashed ... atop xfs. You are running 2.6.8.1 with SuSE
> 9.1. Try upgrading to 9.3. 2.6.11 seems to have fixed many bugs on
> AMD64.
>
> Also, run xfs_check against that file system device. I had lots of
> problems with SuSE 9.1 crashing in general.
>
> Note also that you are using tg3. I have seen a fair number of
> tg3-initiated oopses on other machines. The bcm5700 driver seemed more
> stable to me.
>
> Joe
>
>
>
> I don't think this is a 4k page issue.
>
> Suvendra Nath Dutta wrote:
>> /var/log/messages
>> Jun 14 16:39:48 sauron kernel: ----------- [cut here ] ---------
>> [please bite here ] ---------
>> Jun 14 16:39:48 sauron kernel: Kernel BUG at debug:106
>> Jun 14 16:39:48 sauron kernel: invalid operand: 0000 [1] SMP
>> Jun 14 16:39:48 sauron kernel: CPU 1
>> Jun 14 16:39:48 sauron kernel: Modules linked in: e1000 tg3 subfs
>> dm_mod
>> Jun 14 16:39:48 sauron kernel: Pid: 10070, comm: nfsd Not tainted
>> 2.6.8.1-suse91-osmp
>> Jun 14 16:39:48 sauron kernel: RIP: 0010:[cmn_err+278/299]
>> <ffffffff802c9456>{cmn_err+278}
>> Jun 14 16:39:48 sauron kernel: RIP: 0010:[<ffffffff802c9456>]
>> <ffffffff802c9456>{cmn_err+278}
>> Jun 14 16:39:48 sauron kernel: RSP: 0018:00000100791d17b8 EFLAGS:
>> 00010246
>> Jun 14 16:39:48 sauron kernel: RAX: 0000000000000050 RBX:
>> 0000000000000000 RCX: ffffffff805b4ae8
>> Jun 14 16:39:48 sauron kernel: RDX: ffffffff805b4ae8 RSI:
>> 0000000000000001 RDI: 000001006e6aab30
>> Jun 14 16:39:48 sauron kernel: RBP: 0000010033f47ac0 R08:
>> 0000000000000001 R09: 0000000000000001
>> Jun 14 16:39:50 sauron kernel: R10: 0000000000000000 R11:
>> 0000000000000000 R12: 0000010033f47af0
>> Jun 14 16:39:50 sauron kernel: R13: 0000000098ee8d60 R14:
>> 000001007e169000 R15: 000001007cf53a38
>> Jun 14 16:39:50 sauron kernel: FS: 0000002a9588d6e0(0000)
>> GS:ffffffff806f5040(0000) knlGS:0000000062693bb0
>> Jun 14 16:39:50 sauron kernel: CS: 0010 DS: 0000 ES: 0000 CR0:
>> 000000008005003b
>> Jun 14 16:39:51 sauron kernel: CR2: 0000002a9558c000 CR3:
>> 0000000037eca000 CR4: 00000000000006e0
>> Jun 14 16:39:51 sauron kernel: Process nfsd (pid: 10070, threadinfo
>> 00000100791d0000, task 000001006e6aab30)
>> Jun 14 16:39:51 sauron kernel: Stack: 0000000000000001
>> 0000000000000293 0000003000000020 00000100791d18a8
>> Jun 14 16:39:51 sauron kernel: 00000100791d17e8
>> ffffffff80153b08 0000000000001000 ffffffff8017677a
>> Jun 14 16:39:51 sauron kernel: 0000010078a8d080
>> 0000010033f47ac0
>> Jun 14 16:39:51 sauron kernel: Call
>> Trace:<ffffffff80153b08>{find_get_page+24}
>> <ffffffff8017677a>{__find_get_block_slow+74}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff802c8ef8>{vn_purge+328}
>> <ffffffff80177e98>{unmap_underlying_metadata+8}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff802c7c99>{linvfs_alloc_inode+41}
>> <ffffffff8018e6a6>{iget_locked+230}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff802c91ec>{vn_initialize+124}
>> <ffffffff802a02b6>{xfs_iget+358}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff802c8fe4>{vn_remove+68} <ffffffff802b6b73>{xfs_vget+51}
>> Jun 14 16:39:51 sauron kernel: <ffffffff802c87d8>{vfs_vget+40}
>> <ffffffff802a9e41>{xlog_write+1057}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff802c77eb>{linvfs_get_dentry+59}
>> <ffffffff802186f0>{find_exported_dentry+64}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff8021bdf0>{nfsd_acceptable+0}
>> <ffffffff8047b011>{sock_alloc_send_pskb+113}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff80491b88>{rt_hash_code+56}
>> <ffffffff80493c10>{__ip_route_output_key+48}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff804819fd>{netif_receive_skb+381}
>> <ffffffffa0013327>{:tg3:tg3_enable_ints+23}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff8049a319>{ip_append_data+809}
>> <ffffffff8048f783>{qdisc_restart+35}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff8022084e>{exp_find_key+126}
>> <ffffffff80218d7b>{export_decode_fh+123}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff8021bc31>{fh_verify+961}
>> <ffffffff80135230>{autoremove_wake_function+0}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff80135230>{autoremove_wake_function+0}
>> <ffffffff8021d6d8>{nfsd_open+56}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff8021da3b>{nfsd_write+107}
>> <ffffffff8036e63f>{scsi_end_request+223}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff8036e84c>{scsi_io_completion+492}
>> <ffffffff8015b99e>{cache_flusharray+110}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff80504bd2>{ip_map_lookup+306}
>> <ffffffff805053a5>{svcauth_unix_accept+597}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff802252d1>{nfsd3_proc_write+241}
>> <ffffffff80218f60>{nfsd_dispatch+256}
>> Jun 14 16:39:51 sauron kernel:
>> <ffffffff80501123>{svc_process+947} <ffffffff80219220>{nfsd+0}
>> Jun 14 16:39:51 sauron kernel: <ffffffff80219465>{nfsd+581}
>> <ffffffff801332ee>{schedule_tail+14}
>> Jun 14 16:39:51 sauron kernel: <ffffffff801102a7>{child_rip+8}
>> <ffffffff80219220>{nfsd+0}
>> Jun 14 16:39:51 sauron kernel: <ffffffff80219220>{nfsd+0}
>> <ffffffff8011029f>{child_rip+0}
>> Jun 14 16:39:51 sauron kernel:
>> Jun 14 16:39:51 sauron kernel:
>> Jun 14 16:39:51 sauron kernel: Code: 0f 0b cc 63 53 80 ff ff ff ff 6a
>> 00 48 81 c4 e0 00 00 00 5b
>> Jun 14 16:39:51 sauron kernel: RIP <ffffffff802c9456>{cmn_err+278}
>> RSP <00000100791d17b8>
>> On Jun 15, 2005, at 10:57 AM, Paul Nowoczynski wrote:
>>> What kernel bug did you run into? Was it a page_allocation failure?
>>> paul
>>>
>>> Suvendra Nath Dutta wrote:
>>>
>>>> We set up a 160-node cluster with a dual-processor head node with
>>>> 2GB RAM. The head node also has two RAID devices attached to two
>>>> SCSI cards. These have an XFS filesystem on them and are NFS
>>>> exported to the cluster. The head node runs very low on memory
>>>> (only 7-8 MB free), and today I ran into a kernel bug that crashed
>>>> the system. Google suggests that I should upgrade to kernel 2.6.11,
>>>> but that sounds very unpleasant. I am thinking of moving the RAID
>>>> boxes to a different machine. Will separating the file server and
>>>> the head node give me back stability on the head node?
>>>>
>>>> Suvendra.
>>>>
>>>> _______________________________________________
>>>> Beowulf mailing list, Beowulf at beowulf.org
>>>> To change your subscription (digest mode or unsubscribe) visit
>>>> http://www.beowulf.org/mailman/listinfo/beowulf
>>>
>
> --
> Joseph Landman, Ph.D
> Founder and CEO
> Scalable Informatics LLC,
> email: landman at scalableinformatics.com
> web : http://www.scalableinformatics.com
> phone: +1 734 786 8423
> fax : +1 734 786 8452
> cell : +1 734 612 4615
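
The call trace quoted above is hard to read once the mailer has wrapped it. As a rough aid, the following Python sketch pulls the `<address>{symbol+offset}` entries out of such a log and notes which ones come from a loadable module (for example `:tg3:`). The name `parse_frames` and the label `"kernel"` for non-module symbols are illustrative choices, and the pattern is only assumed to match the 2.6-era x86_64 format shown in this thread.

```python
import re

# Matches entries such as <ffffffff802c8ef8>{vn_purge+328} and
# <ffffffffa0013327>{:tg3:tg3_enable_ints+23} (module name is optional).
FRAME = re.compile(r"<([0-9a-f]{16})>\{(?::(\w+):)?(\w+)\+(\d+)\}")


def parse_frames(log_text):
    """Return (module, symbol, offset) for every trace entry in log_text."""
    frames = []
    for match in FRAME.finditer(log_text):
        _address, module, symbol, offset = match.groups()
        # Entries without a :module: prefix are labeled "kernel" here.
        frames.append((module or "kernel", symbol, int(offset)))
    return frames


if __name__ == "__main__":
    sample = (
        "Jun 14 16:39:51 sauron kernel:\n"
        ">> <ffffffff802c8ef8>{vn_purge+328}\n"
        ">> <ffffffffa0013327>{:tg3:tg3_enable_ints+23}\n"
    )
    for module, symbol, offset in parse_frames(sample):
        print(f"{module:8s} {symbol}+{offset}")
```

Because the pattern is applied to the whole text rather than line by line, the quote markers and wrapped lines do not get in the way.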
