[Beowulf] strange problem with large file moving between servers

Jörg Saßmannshausen j.sassmannshausen at ucl.ac.uk
Sun Sep 21 07:22:21 PDT 2014

Dear all,

I have a rather strange problem with one of my file servers, which I recently 
upgraded in order to accommodate more disc space. 

The problem: I copied the files from the old file space to a temporary disc 
storage space using this rsync command:

rsync -vrltH -pgo --stats -D --numeric-ids -x oldserver:foo  tempspace:baa

I have been doing this for some years now and have never had any problems. 

As always, I ran md5sum afterwards to make sure there would be no problem 
later and that the user would not lose data. This time, the md5sum check 
failed for one rather large file (around 16 GB) after I moved the files from 
the temporary space back to the new destination using the same command as above.
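
The md5sum comparison described above can be scripted for a whole tree. The 
following is only a minimal sketch in Python, assuming both copies are 
reachable as local paths (for example via a mount); the function names and 
paths are hypothetical. It hashes in 1 MiB chunks so that a 16 GB file never 
has to fit in memory.

```python
import hashlib
import os


def md5_of_file(path, chunk_size=1 << 20):
    """Return the hex MD5 digest of a file, reading it in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def find_mismatches(src_root, dst_root):
    """Compare each regular file under src_root with its copy under dst_root.

    Returns the sorted relative paths whose MD5 differs or which are
    missing on the destination side. Symlinks are skipped.
    """
    bad = []
    for dirpath, _dirnames, filenames in os.walk(src_root):
        for name in filenames:
            src = os.path.join(dirpath, name)
            if os.path.islink(src) or not os.path.isfile(src):
                continue  # skip symlinks, device files, sockets etc.
            rel = os.path.relpath(src, src_root)
            dst = os.path.join(dst_root, rel)
            if not os.path.isfile(dst) or md5_of_file(src) != md5_of_file(dst):
                bad.append(rel)
    return sorted(bad)
```

Usage (hypothetical mount points): 
`print(find_mismatches("/mnt/oldserver/foo", "/mnt/newserver/baa"))`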

As I still had access to the old file space, I decided to transfer this file 
again from there. Strangely enough, rsync did not sync the file again, so I had 
to delete it first. Even after deleting the file and re-syncing it from the old 
source, the md5sum is wrong. 
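
One likely reason rsync skipped the file: by default rsync uses a "quick 
check" that treats a destination file as up to date when its size and 
modification time match the source, and the -t in the command above preserves 
the modification time. A corrupted copy of the same size therefore looks 
unchanged. The Python sketch below only mimics that decision to illustrate it 
(simplified: it compares whole-second mtimes); the helper names are 
hypothetical and this is not rsync's actual code.

```python
import hashlib
import os


def _md5(path):
    """MD5 of a file's contents, read in 1 MiB chunks."""
    h = hashlib.md5()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            h.update(chunk)
    return h.digest()


def quick_check_says_up_to_date(src, dst):
    """Simplified imitation of rsync's default check: same size and mtime."""
    s, d = os.stat(src), os.stat(dst)
    return s.st_size == d.st_size and int(s.st_mtime) == int(d.st_mtime)


def checksum_says_up_to_date(src, dst):
    """Imitation of a content-based check, as with rsync -c/--checksum."""
    return os.path.getsize(src) == os.path.getsize(dst) and _md5(src) == _md5(dst)
```

With the real tool, adding -c (--checksum) to the rsync command makes it 
compare checksums instead of size and mtime, at the cost of reading every file 
on both sides.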

Copying the file to a different file space did not cause this problem, i.e. the 
md5sum there is correct.
As it is a tar.gz file, I decided to decompress the original file on the 
different file server. That worked. The copy with the wrong md5sum did not 
decompress on the different file server; gunzip aborted with an error message. 
So that file is broken. 
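
The gunzip test above can also be done without writing the decompressed data 
anywhere; on the command line, `gzip -t file.tar.gz` tests integrity. As a 
minimal Python sketch (the function name is hypothetical), the standard gzip 
module can stream through the file and report a truncated stream, corrupt 
deflate data or a CRC mismatch:

```python
import gzip
import zlib


def gzip_is_intact(path, chunk_size=1 << 20):
    """Decompress a .gz file, discarding the output.

    Returns (True, None) if the whole stream decodes and the CRC matches,
    otherwise (False, exception). Note that OSError also covers an
    unreadable or missing file, which is reported the same way.
    """
    try:
        with gzip.open(path, "rb") as f:
            while f.read(chunk_size):
                pass  # only the integrity checks matter, not the data
    except (OSError, EOFError, zlib.error) as exc:
        return False, exc
    return True, None
```

Usage (hypothetical path): `print(gzip_is_intact("/mnt/newserver/baa/file.tar.gz"))`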

The setup:

Originally I was using an old Infortrend box with old PATA discs in it. 
This box is connected via SCSI to a frontend server, which exports the file 
space via iSCSI. The backend, i.e. the machine the users access, is a XEN 
guest on a different physical machine. The reason for this setup is that the 
frontend also acts as a backup server and I don't want people to have access 
to it. 
I then replaced the Infortrend box with a more recent model, which has SATA 
capabilities but still uses a SCSI connection to the frontend. The frontend is 
the same. I got a new controller for that box, as the old one was broken.  
There are no changes to the backend; it is still the same XEN guest on the 
same hardware.

What I cannot work out is why the old Infortrend box has no problems with 
this file while the newer one does. Also, when I copied over some files (again 
using the rsync command above), a few files did not copy correctly on the first 
attempt (again detected with md5sum) but did so later. 
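
To see where a bad copy differs from a good one (for example whether the 
damage is one small region or spread across the file), a block-by-block 
comparison can help; `cmp` only reports the first differing byte by default. 
This is a minimal Python sketch under stated assumptions: the function name is 
hypothetical and the 4096-byte block size is an arbitrary choice, not something 
tied to this hardware.

```python
def differing_blocks(path_a, path_b, block_size=4096):
    """Yield (offset, length) for every block whose bytes differ.

    Blocks are aligned to block_size. If one file is longer, each extra
    block present in only one file is also reported as differing.
    """
    offset = 0
    with open(path_a, "rb") as fa, open(path_b, "rb") as fb:
        while True:
            a = fa.read(block_size)
            b = fb.read(block_size)
            if not a and not b:
                break  # both files exhausted
            if a != b:
                yield offset, max(len(a), len(b))
            offset += max(len(a), len(b))
```

Usage (hypothetical paths): 
`for off, n in differing_blocks("/mnt/old/file.tar.gz", "/mnt/new/file.tar.gz"): print(off, n)`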

I find that highly alarming, as it means there seems to be a problem, at least 
with larger and/or some binary files. However, I am not sure where to look, as 
I am out of ideas. 

Could it be that there is a problem with the 'new' controller?
In all cases I was using ext4 as the file system, and I have not had any 
problems with it.

Does anybody have any thoughts on this?

All the best from a sunny London


P.S. To make things worse, I am off on a work-related trip from Monday onwards, 
and I have been working on this problem since Friday evening. 

Dr. Jörg Saßmannshausen, MRSC
University College London
Department of Chemistry
Gordon Street

email: j.sassmannshausen at ucl.ac.uk
web: http://sassy.formativ.net

Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html