[Beowulf] XFS/EVMS/kernel 2.6.11

Suvendra Nath Dutta sdutta at cfa.harvard.edu
Wed Mar 15 13:46:10 PST 2006


I am using the above combination. At boot I get loads of the
following:

<3>device-mapper: dm-linear: Device lookup failed
<4>device-mapper: error adding target to table
<3>device-mapper: dm-linear: Device lookup failed
<4>device-mapper: error adding target to table
<3>device-mapper: dm-linear: Device lookup failed
<4>device-mapper: error adding target to table
<3>device-mapper: dm-linear: Device lookup failed
<4>device-mapper: error adding target to table
<3>device-mapper: dm-linear: Device lookup failed
<4>device-mapper: error adding target to table
<3>device-mapper: dm-linear: Device lookup failed
<4>device-mapper: error adding target to table
<3>device-mapper: dm-linear: Device lookup failed

Then, after running fine for a while, I suddenly started to get:
Mar 15 14:09:54 sauron kernel: XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1610 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffff80277e49
Mar 15 14:09:54 sauron kernel:
Mar 15 14:09:54 sauron kernel: Call Trace:<ffffffff80276383>{xfs_free_ag_extent+1251} <ffffffff80277e49>{xfs_free_extent+185}
Mar 15 14:09:54 sauron kernel:        <ffffffff802a1994>{xfs_efd_init+68} <ffffffff802856bd>{xfs_bmap_finish+253}
Mar 15 14:09:54 sauron kernel:        <ffffffff802aad30>{xfs_itruncate_finish+416} <ffffffff802bc479>{xfs_trans_alloc+217}
Mar 15 14:09:54 sauron kernel:        <ffffffff802c1adf>{xfs_inactive+591} <ffffffff80155887>{find_get_pages+119}
Mar 15 14:09:54 sauron kernel:        <ffffffff801607e3>{truncate_inode_pages+435} <ffffffff802d12df>{vn_rele+95}
Mar 15 14:09:54 sauron kernel:        <ffffffff802cfc32>{linvfs_clear_inode+18} <ffffffff8019324e>{clear_inode+142}
Mar 15 14:09:54 sauron kernel:        <ffffffff80193865>{generic_delete_inode+165} <ffffffff8019262e>{iput+126}
Mar 15 14:09:54 sauron kernel:        <ffffffff8018a206>{sys_unlink+262} <ffffffff8018c078>{sys_getdents+232}
Mar 15 14:09:54 sauron kernel:        <ffffffff8018b50f>{sys_fcntl+815} <ffffffff8010d54e>{system_call+126}
Mar 15 14:09:54 sauron kernel:
Mar 15 14:09:54 sauron kernel: xfs_force_shutdown(dm-1,0x8) called from line 4073 of file fs/xfs/xfs_bmap.c.  Return address = 0xffffffff802d0e28
Mar 15 14:09:54 sauron kernel: Filesystem "dm-1": Corruption of in-memory data detected.  Shutting down filesystem: dm-1
Mar 15 14:09:54 sauron kernel: Please umount the filesystem, and rectify the problem(s)

At this point the following ensues:
sauron:~ # ls /raid3/sdutta
/bin/ls: /raid3/sdutta: Input/output error

On reboot, the boot.msg has this:
<5>XFS mounting filesystem dm-1
<5>Starting XFS recovery on filesystem: dm-1 (dev: dm-1)
<1>XFS internal error XFS_WANT_CORRUPTED_GOTO at line 1583 of file fs/xfs/xfs_alloc.c.  Caller 0xffffffff80277e49
<4>
<4>Call Trace:<ffffffff80276383>{xfs_free_ag_extent+1251} <ffffffff80277e49>{xfs_free_extent+185}
<4>       <ffffffff802a1994>{xfs_efd_init+68} <ffffffff802bd56b>{xfs_trans_get_efd+43}
<4>       <ffffffff802b5f41>{xlog_recover_finish+401} <ffffffff802df7c1>{__up_write+49}
<4>       <ffffffff802b1ddb>{xfs_log_mount_finish+27} <ffffffff802b9584>{xfs_mountfs+2612}
<4>       <ffffffff802c93fd>{xfs_setsize_buftarg_flags+61} <ffffffff802bf380>{xfs_mount+2432}
<4>       <ffffffff802d0020>{linvfs_fill_super+0} <ffffffff802d0b28>{vfs_mount+40}
<4>       <ffffffff802d00d3>{linvfs_fill_super+179} <ffffffff802d0020>{linvfs_fill_super+0}
<4>       <ffffffff802e12b3>{snprintf+131} <ffffffff80542cb3>{__down_write+51}
<4>       <ffffffff802dff1e>{strlcpy+78} <ffffffff8017f7f5>{sget+949}
<4>       <ffffffff8017ebc0>{set_bdev_super+0} <ffffffff8017fe34>{get_sb_bdev+276}
<4>       <ffffffff8017faff>{do_kern_mount+111} <ffffffff80196daa>{do_mount+1642}
<4>       <ffffffff8011e2c2>{do_page_fault+1202} <ffffffff802e1f2e>{_atomic_dec_and_lock+46}
<4>       <ffffffff80188f7d>{link_path_walk+3581} <ffffffff8015acd4>{buffered_rmqueue+516}
<4>       <ffffffff8015aa80>{__get_free_pages+16} <ffffffff80196ebc>{sys_mount+156}
<4>       <ffffffff8010d54e>{system_call+126}
<5>Ending XFS recovery on filesystem: dm-1 (dev: dm-1)

Sooner or later the filesystem on this partition crashes again and does
not recover until the next reboot.

Has anyone else seen this behaviour? Is there a solution?

Suvendra.


