[Beowulf] Re: Interesting

Robert G. Brown rgb at phy.duke.edu
Wed Nov 3 03:55:41 PDT 2010


On Wed, 3 Nov 2010, John Hearns wrote:

> What I think we should be doing is working towards a media-agnostic
> form of storing data.

Media agnostic with OPEN codices.  A great curse on all such IT
mechanisms is the closed codex.  ASCII works because it is utterly open,
as is its logical extension to UTF.  But what about music?  What about
movies?  What about books?  What about spreadsheets, processed text
documents, etc.?  Perhaps HTML5 will magically solve all such
problems, but I doubt it.

> A recognition that scientific data (and other forms, like movies and
> music etc.) will carry with them metadata and that the data will
> migrate through many types of physical media in its lifetime, and will
> from the outset have multiple copies made.
> I guess the HPC Grid computing types are doing this already, what I'm
> rather thinking about is a universal standard for this, and a way of
> carrying the metadata with the actual data in a way it cannot be lost.

I think this is all dead on correct, but bearing in mind the forces of
darkness arrayed on the other side of this, concerned with everything
from DRM to encryption to owning and controlling the codex, I personally
am not holding my breath.  There are also numerous purely technical
issues -- modulus problems, for example, in conversion between ogg and
mp3.  Switching between lossy compression algorithms produces artifacts
and a nonlinear degradation of information.  Similar issues arise when
dealing with old VGA vs 1080p and so on.  None of these will go away as
the technology evolves.  I'm not certain that this is a truly
solvable problem.
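
To make the "modulus" point a bit more concrete, here is a toy sketch in
Python.  It is NOT a real audio codec: quantize() and the step sizes 3
and 5 are made-up stand-ins for two lossy formats with incommensurate
quantization grids.  It only illustrates that encoding with one grid and
then another can land on a different (never closer) value than encoding
once with the final grid.

```python
def quantize(samples, step):
    """Toy lossy 'codec': round each sample to the nearest multiple of step."""
    return [step * round(x / step) for x in samples]


def max_error(reference, decoded):
    """Largest absolute difference between two equal-length sample lists."""
    return max(abs(r - d) for r, d in zip(reference, decoded))


# Hypothetical "signal": just the integers 0..29.
original = list(range(30))

# One generation of loss: straight to the step-5 grid.
once = quantize(original, 5)

# Two generations: step-3 grid first, then re-encode on the step-5 grid.
twice = quantize(quantize(original, 3), 5)

# Samples where the two routes disagree.
changed = [x for x, a, b in zip(original, once, twice) if a != b]

if __name__ == "__main__":
    print("samples that decode differently:", changed)
    print("max error, one generation :", max_error(original, once))
    print("max error, two generations:", max_error(original, twice))
```

The sample 2, for instance, goes straight to 0 on the step-5 grid, but
via the step-3 grid it goes 2 -> 3 -> 5.  Real codecs are vastly more
complicated, but the same kind of grid mismatch is behind the problem.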

> It's also funny that I use the term "lifetime"  - I guess in the past
> we all have assumed digital data will have an infinite lifetime, as
> discussed above it has come to pass that the decay of media, or
> reading apparatus being unavailable has made data have a finite
> lifetime.
>
> The real point I am making here is that with cloud type data storage
> over IP connections even in HPC we will be seeing data accessed not on
> SCSI volumes (be that direct SCSI, fibrechannel, iSCSI, RAID etc) but
> from an HTTP accessed object store. You might then say that "Hey -
> performance matters and that's why we still have SCSI" - I would
> counter that you will see home users accessing data via ADSL, business
> users via gigabit, and those HPC class systems will have 10 / 40 / 100
> gigabit interfaces.

All of which is groovy and I would never argue, but that doesn't address
the relative vulnerability of centralized data both to certain kinds of
attack and to other kinds of accidents.  Or to political control.  If
Google (ultimately) controls all the data, who controls Google?  What
happens if they use it for evil instead of for good?  How could one
*stop* them from using it for evil if they have your data and also
provide you with all of the software you are using to access that data?

Who will, after all, guard the guardians?

    rgb

Robert G. Brown	                       http://www.phy.duke.edu/~rgb/
Duke University Dept. of Physics, Box 90305
Durham, N.C. 27708-0305
Phone: 1-919-660-2567  Fax: 919-660-2525     email:rgb at phy.duke.edu




