[Beowulf] strange problem with large file moving between server
Jörg Saßmannshausen
j.sassmannshausen at ucl.ac.uk
Sun Sep 21 07:22:21 PDT 2014
Dear all,
I have a rather strange problem with one of my file servers, which I recently
upgraded in order to accommodate more disc space.
The problem: I copied the files from the old file space to a temporary disc
storage space using this rsync command:
rsync -vrltH -pgo --stats -D --numeric-ids -x oldserver:foo tempspace:baa
I have been doing this for some years now and never had any problems.
As always, I ran md5sum afterwards to be sure there is no problem later and
the user is not losing data. This time, for a rather large file (around 16 GB),
the md5sum check failed after I moved the files from the temp space back to the
new destination using the same command as above.
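
The check described above can be sketched as a checksum manifest that is
written on the source side and verified on the destination side. This is only
a sketch under assumptions: the function names make_manifest and
check_manifest are hypothetical, and it relies on find and md5sum as shipped
with a typical Linux system.

```shell
# Hypothetical helpers, shown for illustration only.

# Write an MD5 manifest for every regular file below directory $1 into file $2.
# Paths are stored relative to $1 so the manifest can be verified elsewhere.
# Keep the manifest file outside the tree being scanned.
make_manifest() {
    (cd "$1" && find . -type f -exec md5sum {} +) > "$2"
}

# Verify the files below directory $1 against manifest file $2.
# Returns non-zero if any file is missing or has a different checksum.
check_manifest() {
    (cd "$1" && md5sum -c -) < "$2"
}

# Usage (paths are placeholders):
#   make_manifest  /mnt/old/foo  /root/foo.md5
#   check_manifest /mnt/new/baa  /root/foo.md5
```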
Still having access to the old file space, I decided to copy this file again
from the old file space. Strangely enough, rsync did not sync the file again, so
I had to delete the file. Even after deleting the file and re-syncing it from
the old source, the md5sum is wrong.
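
One possible explanation for rsync not transferring the file again: by default
rsync skips files whose size and modification time already match on both sides
(its "quick check"), and since -t preserves modification times, a damaged copy
of the same length looks up to date. Adding --checksum (-c) makes rsync compare
file contents instead, at the cost of reading every file on both ends. The
sketch below only illustrates the size-and-mtime comparison; the function name
quick_check_equal is hypothetical and it assumes a stat that understands -c.

```shell
# Hypothetical helper: succeeds when two files have the same size and the same
# modification time (in whole seconds). This is roughly the criterion rsync
# uses to skip a file when --checksum is not given.
quick_check_equal() {
    [ "$(stat -c '%s %Y' "$1")" = "$(stat -c '%s %Y' "$2")" ]
}

# Usage (paths are placeholders):
#   quick_check_equal /mnt/old/foo/file.tar.gz /mnt/new/baa/file.tar.gz \
#       && echo "rsync would consider this file unchanged"
```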
Copying the file to a different file space did not cause this problem, i.e. the
md5sum is correct.
As it is a tar.gz file, I simply decided to decompress the original file on the
different file server. That worked. The file for which the md5sum is wrong did
not decompress on the different file server; gunzip aborted with an error
message. So the file is broken.
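
To narrow down where the broken copy goes wrong, the good and the bad copy can
be compared byte by byte; the offset of the first difference may hint at
whether the damage lines up with a block or stripe boundary. A minimal sketch,
assuming cmp and awk are available; the function name first_diff_offset is
hypothetical. (gzip -t file.tar.gz also tests an archive without unpacking it.)

```shell
# Hypothetical helper: print the 1-based offset of the first byte at which
# files $1 and $2 differ. Prints nothing if no differing byte is found.
first_diff_offset() {
    # cmp -l lists every differing byte as "offset octal1 octal2";
    # awk prints the offset from the first such line and then stops.
    cmp -l "$1" "$2" 2>/dev/null | awk 'NR == 1 { print $1; exit }'
}

# Usage (paths are placeholders):
#   first_diff_offset /mnt/old/foo/file.tar.gz /mnt/new/baa/file.tar.gz
```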
The setup:
Originally I was using an old Infortrend box which had old PATA discs in it.
This box is connected via SCSI to a frontend server which exports the file
space via iSCSI. The backend for that, i.e. the one the user is accessing, is
on a different physical machine and it is a Xen guest. The reason for that
setup is that the frontend also acts as a backup server and I don't want
people to have access to it.
I then exchanged the Infortrend box with a more recent model which supports
SATA discs but still has a SCSI connection to the frontend. The frontend is
the same. I got a new controller for that box as the old one was broken.
There are no changes in the backend; it is still the same Xen guest on the
same hardware.
What I cannot work out is why the old Infortrend box does not have any
problems with this file, whereas the newer one does. Also, when I copied over
some files (again using the rsync command above), a few files did not copy
correctly (again judging by md5sum) at the first attempt but did so later.
I find that highly alarming, as it means that at least for larger and/or some
binary files there seems to be a problem. However, I am not sure where to look,
as I am out of ideas.
Could it be there is a problem with the 'new' controller?
In all cases I was using ext4 as a file system and I did not have any problems
with that.
Does anybody have any thoughts on this?
All the best from a sunny London
Jörg
P.S. To make things worse, I am off on a work-related trip from Monday onwards
and I have been working on this problem since Friday evening.
--
*************************************************************
Dr. Jörg Saßmannshausen, MRSC
University College London
Department of Chemistry
Gordon Street
London
WC1H 0AJ
email: j.sassmannshausen at ucl.ac.uk
web: http://sassy.formativ.net
Please avoid sending me Word or PowerPoint attachments.
See http://www.gnu.org/philosophy/no-word-attachments.html