[Beowulf] Large amounts of data to store and process
Douglas Eadline
deadline at eadline.org
Thu Mar 14 14:17:16 PDT 2019
> I don't want to interrupt the flow but I'm feeling cheeky. One word can
> solve everything "Fortran". There I said it.
Of course, but you forgot "now get off my lawn"
--
Doug
>
> Jeff
>
>
> On Thu, Mar 14, 2019, 17:03 Douglas Eadline <deadline at eadline.org> wrote:
>
>>
>> > Then given we are reaching these limitations how come we don't integrate
>> > certain things from the HPC world into every day computing so to speak.
>>
>> Scalable/parallel computing is hard and hard costs time and money.
>> In HPC the performance often justifies the means, in other
>> sectors the cost must justify the means.
>>
>> HPC has traditionally trickled down into other sectors. However,
>> many of the HPC problem types are not traditional computing
>> problems. This situation is changing a bit with things
>> like Hadoop/Spark/TensorFlow.
>>
>> --
>> Doug
>>
>>
>> >
>> > On 14/03/2019, 19:14, "Douglas Eadline" <deadline at eadline.org> wrote:
>> >
>> >
>> > > Hi Douglas,
>> > >
>> > > Isn't there quantum computing being developed in terms of CPUs at this
>> > > point?
>> >
>> > QC is (theoretically) unreasonably good at some things; at others
>> > there may be classical algorithms that work better. As far as I know,
>> > there has been no demonstration of "quantum
>> > supremacy" where a quantum computer is shown
>> > to be faster than a classical algorithm.
>> >
>> > Getting there, not there yet.
>> >
>> > BTW, if you want to know what is going on with QC
>> > read Scott Aaronson's blog
>> >
>> > https://www.scottaaronson.com/blog/
>> >
>> > I usually get through the first few paragraphs and
>> > then whoosh over my scientific pay grade
>> >
>> >
>> > > Also is it really about the speed any more rather than how
>> > > optimized the code is to take advantage of the multiple cores that a
>> > > system has?
>> >
>> > That is because the clock rate increase slowed to a crawl.
>> > Adding cores was a way to "offer" more performance, but introduced
>> > the "multi-core tax." That is, programming for multi-core is
>> > harder and costlier than for a single core. It is also much
>> > harder to optimize. In HPC we are lucky, we are used to
>> > designing MPI codes that scale with more cores (no matter
>> > where they live, same die, next socket, another server).
>> >
>> > Also, more cores usually means lower single core
>> > frequency to fit into a given power envelope (die shrinks help
>> > with this but based on everything I have read, we are about
>> > at the end of the line). It also means lower absolute memory
>> > BW per core, although more memory channels help a bit.
>> >
>> > --
>> > Doug
>> >
>> >
>> > >
>> > > On 13/03/2019, 22:22, "Douglas Eadline" <deadline at eadline.org> wrote:
>> > >
>> > >
>> > > I realize it is bad form to reply to one's own post, but
>> > > I forgot to mention something.
>> > >
>> > > Basically the HW performance parade is getting harder
>> > > to celebrate. Clock frequencies have been slowly
>> > > increasing while cores are multiplying rather quickly.
>> > > Single core performance boosts are mostly coming
>> > > from accelerators. Add to that the fact that speculation
>> > > technology, when managed for security, slows things down.
>> > >
>> > > What this means is that the focus on software performance
>> > > and optimization is going to increase because we can't just
>> > > buy new hardware and improve things anymore.
>> > >
>> > > I believe languages like Julia can help with this situation.
>> > > For a while.
>> > >
>> > > --
>> > > Doug
>> > >
>> > > >> Hi All,
>> > > >> Basically I have sat down with my colleague and we have opted to go down
>> > > >> the route of Julia with JuliaDB for this project. But here is an
>> > > >> interesting thought that I have been pondering: if Julia is an up and
>> > > >> coming fast language to work with for large amounts of data, how will
>> > > >> that affect HPC and the way it is currently used and HPC systems created?
>> > > >
>> > > >
>> > > > First, IMO good choice.
>> > > >
>> > > > Second, a short list of actual conversations.
>> > > >
>> > > > 1) "This code is written in Fortran." I have been met with
>> > > > puzzled looks when I say the word "Fortran." Then it
>> > > > comes, "... ancient language, why not port to modern ..."
>> > > > "If you are asking that question, young Padawan, you have
>> > > > much to learn, maybe try web pages."
>> > > >
>> > > > 2) I'll just use Python because it works on my Laptop.
>> > > > Later, "It will just run faster on a cluster, right?"
>> > > > and "My little Python program is now kind-of big and has
>> > > > become slow, should I use TensorFlow?"
>> > > >
>> > > > 3) <mcoy>
>> > > > "Dammit Jim, I don't want to learn/write Fortran, C, C++ and MPI.
>> > > > I'm a (fill in domain specific scientific/technical position)"
>> > > > </mcoy>
>> > > >
>> > > > My reply, "I agree and wish there was a better answer to that question.
>> > > > The computing industry has made great strides in HW with
>> > > > multi-core, clusters etc. Software tools have always lagged
>> > > > hardware. In the case of HPC it is a slow process and
>> > > > in HPC the whole programming "thing" is not as "easy" as
>> > > > it is in other sectors; warp drives and transporters
>> > > > take a little extra effort."
>> > > >
>> > > > 4) Then I suggest Julia, "I invite you to try Julia. It is
>> > > > easy to get started, fast, and can grow with your application."
>> > > > Then I might say, "In a way it is HPC BASIC, if you are old
>> > > > enough you will understand what I mean by that."
>> > > >
>> > > > The question with languages like Julia (or Chapel, etc) is:
>> > > >
>> > > > "How much performance are you willing to give up for convenience?"
>> > > >
>> > > > The goal is to keep the programmer close to the problem at hand
>> > > > and away from the nuances of the underlying hardware. Obviously
>> > > > the more performance needed, the closer you need to get to the hardware.
>> > > > This decision goes beyond software tools, there are all kinds
>> > > > of cost/benefits that need to be considered. And, then there
>> > > > is IO ...
>> > > >
>> > > > --
>> > > > Doug
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >> Regards,
>> > > >> Jonathan
>> > > >> -----Original Message-----
>> > > >> From: Beowulf <beowulf-bounces at beowulf.org> On Behalf Of Michael Di Domenico
>> > > >> Sent: 04 March 2019 17:39
>> > > >> Cc: Beowulf Mailing List <beowulf at beowulf.org>
>> > > >> Subject: Re: [Beowulf] Large amounts of data to store and process
>> > > >>
>> > > >> On Mon, Mar 4, 2019 at 8:18 AM Jonathan Aquilina <jaquilina at eagleeyet.net> wrote:
>> > > >>> As previously mentioned we don't really need to have anything indexed
>> > > >>> so I am thinking flat files are the way to go; my only concern is the
>> > > >>> performance of large flat files.
>> > > >> potentially, there are many factors in the work flow that ultimately
>> > > >> influence the decision as others have pointed out. my flat file example
>> > > >> is only one, where we just repeatedly blow through the files.
>> > > >>> Isn't that what HDFS is for, to deal with large flat files?
>> > > >> large is relative. a 256GB file isn't "large" anymore. i've pushed TB
>> > > >> files through hadoop and run the terabyte sort benchmark, and yes it can
>> > > >> be done in minutes (time-scale), but you need an astounding amount of
>> > > >> hardware to do it (the last benchmark paper i saw, it was something like
>> > > >> 1000 nodes). you can accomplish the same feat using less and less
>> > > >> complicated hardware/software
>> > > >> and if your devs are willing to adapt to the hadoop ecosystem, you're sunk
>> > > >> right off the dock.
>> > > >> to get a more targeted answer from the numerous smart people on the list,
>> > > >> you'd need to open up the app and workflow to us. there's just too many
>> > > >> variables
>> > > >> _______________________________________________
>> > > >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
>> > > >> To change your subscription (digest mode or unsubscribe) visit
>> > > >> http://www.beowulf.org/mailman/listinfo/beowulf
>> > > >
>> > > >
>> > > > --
>> > > > Doug
>> > > >
>> > > >
>> > > >
>> > > >
>> > > >
>> > >
>> > >
>> > > --
>> > > Doug
>> > >
>> > >
>> > >
>> > >
>> >
>> >
>> > --
>> > Doug
>> >
>> >
>> >
>> >
>>
>>
>> --
>> Doug
>>
>>
>
--
Doug