[Beowulf] strange problem with large file moving between server
dimitrisz at gmail.com
Thu Oct 2 02:35:13 PDT 2014
RAM somewhere could also be faulty. Have a look at the logs for any ECC
errors (both system memory and RAID controller) and memtest the boxes
involved for a couple of days. I would suggest some stress testing of the
new server if not done already.
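One cheap check before a multi-day memtest is to hash the same large file several times on the new box and see whether the digest stays stable. Below is a minimal sketch of that idea, not a replacement for memtest. The function names `file_digest` and `distinct_digests` are made up for illustration, and md5 is used only because it is what the original post used.

```python
import hashlib


def file_digest(path, algo="md5", chunk_size=1 << 20):
    """Hash a file in 1 MiB chunks so a 16 GB file never has to fit in RAM."""
    h = hashlib.new(algo)
    with open(path, "rb") as f:
        # iter() with a sentinel keeps reading until read() returns b"" (EOF).
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def distinct_digests(path, passes=5):
    """Hash the same file several times and return the set of digests seen.

    A healthy read path always yields exactly one digest.
    """
    return {file_digest(path) for _ in range(passes)}
```

If `distinct_digests` returns more than one value for an unchanged file, the reads themselves are unstable, which points at RAM, the controller or cabling rather than at rsync. Repeated reads may be served from the page cache, so this is only meaningful for files larger than RAM or when the cache is dropped between passes.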
On Sun, Sep 21, 2014 at 3:22 PM, Jörg Saßmannshausen <
j.sassmannshausen at ucl.ac.uk> wrote:
> Dear all,
> I got a rather strange problem with one of my file servers which I recently
> upgraded in order to accommodate more disc space.
> The problem: I copied the files from the old file space to a temporary disc
> storage space using this rsync command:
> rsync -vrltH -pgo --stats -D --numeric-ids -x oldserver:foo tempspace:baa
> I have been doing this for some years now and never had any problems.
> As always, I ran md5sum afterwards to be sure there is not a problem
> later and the user is not losing data. This time around, for a rather large
> file (around 16 GB), the md5sum failed after I moved the files from the temp
> space back to the new destination using the same command as above.
> Still having access to the old file space, I decided to copy this file
> again from the old file space. Strangely enough, rsync does not sync the
> file again, so I had to delete the file. Even after deleting the file and
> re-syncing it from the old source, the md5sum is wrong.
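A possible explanation for rsync not syncing the file again: by default rsync skips any file whose size and modification time already match on both sides. Since -t preserves the mtime, a corrupted copy of the right length looks up to date. rsync's --checksum (-c) option compares file contents instead. The sketch below only approximates that decision to show the difference and is not rsync's actual code. The names `quick_check_same` and `content_same` are hypothetical.

```python
import hashlib
import os


def quick_check_same(src, dst):
    """Approximate rsync's default quick check: equal size and equal mtime."""
    a, b = os.stat(src), os.stat(dst)
    return a.st_size == b.st_size and int(a.st_mtime) == int(b.st_mtime)


def content_same(src, dst, chunk_size=1 << 20):
    """Compare two files by content digest, reading in 1 MiB chunks."""
    def digest(path):
        h = hashlib.md5()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(chunk_size), b""):
                h.update(chunk)
        return h.hexdigest()

    return digest(src) == digest(dst)
```

A file that passes `quick_check_same` but fails `content_same` is exactly the case described above: the metadata says "unchanged" while the data differs.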
> Copying the file to a different file space did not cause this problem,
> i.e. the md5sum is correct.
> As it is a tar.gz file, I simply decided to decompress the original file
> on the different file server. That worked. The file where the md5sum is
> wrong did decompress on the different file server but crashed with an error
> message when I executed gunzip. So the file is broken.
> The setup:
> Originally I was using an old Infortrend box which had old PATA discs in
> it. This box is connected via SCSI to a frontend server which exports the
> file space via iSCSI. The backend for that, i.e. the one the user is
> accessing, is on a different physical machine and it is a XEN guest. The
> reason behind this setting is that the frontend is acting as a backup server
> and I don't want people to have access to it.
> I then exchanged the Infortrend box with a more recent model which has SATA
> capabilities but still has a SCSI connection to the frontend. The frontend
> is the same. I got a new controller for that box as the old one was broken.
> There are no changes in the backend; that is still the same XEN guest on
> the same hardware.
> What I cannot work out is why the old Infortrend box does not have any
> problems with the new file, while the newer one has a problem here. Also,
> when I copied over some files (again using the rsync command above) a few
> files did not copy correctly (again md5sum) in the first instance but did
> so later.
> I find that highly alarming as that means that at least for larger and/or
> binary files there seems to be a problem. However, I am not sure where to
> look at it as I am out of ideas.
> Could it be there is a problem with the 'new' controller?
> In all cases I was using ext4 as a file system and I did not have any
> problems with that.
> Anybody got some sentiments here?
> All the best from a sunny London
> P.S. To make things worse I am off on a work related trip from Monday
> and I have been working on that problem since Friday evening.
> Dr. Jörg Saßmannshausen, MRSC
> University College London
> Department of Chemistry
> Gordon Street
> WC1H 0AJ
> email: j.sassmannshausen at ucl.ac.uk
> web: http://sassy.formativ.net
> Please avoid sending me Word or PowerPoint attachments.
> See http://www.gnu.org/philosophy/no-word-attachments.html