[Beowulf] Large amounts of data to store and process
Jeffrey Layton
laytonjb at gmail.com
Thu Mar 14 14:52:12 PDT 2019
Damn. I knew I forgot something. Now where are my glasses?
On Thu, Mar 14, 2019, 17:17 Douglas Eadline <deadline at eadline.org> wrote:
>
> > I don't want to interrupt the flow, but I'm feeling cheeky. One word can
> > solve everything: "Fortran". There, I said it.
>
> Of course, but you forgot "now get off my lawn"
>
> --
> Doug
>
> >
> > Jeff
> >
> >
> > On Thu, Mar 14, 2019, 17:03 Douglas Eadline <deadline at eadline.org> wrote:
> >
> >>
> >> > Then, given we are reaching these limitations, how come we don’t integrate
> >> > certain things from the HPC world into everyday computing, so to speak?
> >>
> >> Scalable/parallel computing is hard, and hard costs time and money.
> >> In HPC the performance often justifies the means; in other
> >> sectors the cost must justify the means.
> >>
> >> HPC has traditionally trickled down into other sectors. However,
> >> many of the HPC problem types are not traditional computing
> >> problems. This situation is changing a bit with things
> >> like Hadoop/Spark/TensorFlow.
> >>
> >> --
> >> Doug
> >>
> >>
> >> >
> >> > On 14/03/2019, 19:14, "Douglas Eadline" <deadline at eadline.org> wrote:
> >> >
> >> >
> >> > > Hi Douglas,
> >> > >
> >> > > Isn't there quantum computing being developed in terms of CPUs at
> >> > > this point?
> >> >
> >> > QC is (theoretically) unreasonably good at some things; for others
> >> > there may be classical algorithms that work better. As far as I know,
> >> > there has been no demonstration of "quantum
> >> > supremacy," where a quantum computer is shown
> >> > to be faster than a classical algorithm.
> >> >
> >> > Getting there, not there yet.
> >> >
> >> > BTW, if you want to know what is going on with QC
> >> > read Scott Aaronson's blog
> >> >
> >> > https://www.scottaaronson.com/blog/
> >> >
> >> > I usually get through the first few paragraphs and
> >> > then, whoosh, it is over my scientific pay grade.
> >> >
> >> >
> >> > > Also, is it really about the speed anymore, rather than how
> >> > > optimized the code is to take advantage of the multiple cores that a
> >> > > system has?
> >> >
> >> > That is because the clock rate increase slowed to a crawl.
> >> > Adding cores was a way to "offer" more performance, but it introduced
> >> > the "multi-core tax." That is, programming for multi-core is
> >> > harder and costlier than for a single core. It is also much
> >> > harder to optimize. In HPC we are lucky; we are used to
> >> > designing MPI codes that scale with more cores (no matter
> >> > where they live: same die, next socket, another server).
> >> >
> >> > Also, more cores usually means lower single-core
> >> > frequency to fit into a given power envelope (die shrinks help
> >> > with this, but based on everything I have read, we are about
> >> > at the end of the line). It also means lower absolute memory
> >> > BW per core, although more memory channels help a bit.
> >> >
> >> > --
> >> > Doug
> >> >
> >> >
> >> > >
> >> > > On 13/03/2019, 22:22, "Douglas Eadline" <deadline at eadline.org> wrote:
> >> > >
> >> > >
> >> > > I realize it is bad form to reply to one's own post, and
> >> > > I forgot to mention something.
> >> > >
> >> > > Basically, the HW performance parade is getting harder
> >> > > to celebrate. Clock frequencies have been increasing slowly
> >> > > while core counts are multiplying rather quickly.
> >> > > Single-core performance boosts are mostly coming
> >> > > from accelerators. Add to that the fact that speculative execution,
> >> > > when managed for security, slows things down.
> >> > >
> >> > > What this means is that the focus on software performance
> >> > > and optimization is going to increase, because we can't just
> >> > > buy new hardware and improve things anymore.
> >> > >
> >> > > I believe languages like Julia can help with this situation.
> >> > > For a while.
> >> > >
> >> > > --
> >> > > Doug
> >> > >
> >> > > >> Hi All,
> >> > > >> Basically, I have sat down with my colleague and we have opted to go
> >> > > >> down the route of Julia with JuliaDB for this project. But here is an
> >> > > >> interesting thought that I have been pondering: if Julia is an
> >> > > >> up-and-coming fast language for working with large amounts of data,
> >> > > >> how will that affect HPC, the way it is currently used, and the way
> >> > > >> HPC systems are created?
> >> > > >
> >> > > >
> >> > > > First, IMO good choice.
> >> > > >
> >> > > > Second, a short list of actual conversations:
> >> > > >
> >> > > > 1) "This code is written in Fortran." I have been met with
> >> > > > puzzled looks when I say the word "Fortran." Then it
> >> > > > comes: "... ancient language, why not port to modern ..."
> >> > > > If you are asking that question, young Padawan, you have
> >> > > > much to learn; maybe try web pages.
> >> > > >
> >> > > > 2) "I'll just use Python because it works on my laptop."
> >> > > > Later, "It will just run faster on a cluster, right?"
> >> > > > and "My little Python program is now kind of big and has
> >> > > > become slow; should I use TensorFlow?"
> >> > > >
> >> > > > 3) <mcoy>
> >> > > > "Dammit Jim, I don't want to learn/write Fortran, C, C++, and MPI.
> >> > > > I'm a (fill in domain-specific scientific/technical position)."
> >> > > > </mcoy>
> >> > > >
> >> > > > My reply: "I agree, and wish there was a better answer to that question.
> >> > > > The computing industry has made great strides in HW with
> >> > > > multi-core, clusters, etc. Software tools have always lagged
> >> > > > hardware. In the case of HPC it is a slow process, and
> >> > > > in HPC the whole programming "thing" is not as "easy" as
> >> > > > it is in other sectors; warp drives and transporters
> >> > > > take a little extra effort."
> >> > > >
> >> > > > 4) Then I suggest Julia: "I invite you to try Julia. It is
> >> > > > easy to get started, fast, and can grow with your application."
> >> > > > Then I might say, "In a way it is HPC BASIC; if you are old
> >> > > > enough you will understand what I mean by that."
> >> > > >
> >> > > > The question with languages like Julia (or Chapel, etc.) is:
> >> > > >
> >> > > > "How much performance are you willing to give up for convenience?"
> >> > > >
> >> > > > The goal is to keep the programmer close to the problem at hand
> >> > > > and away from the nuances of the underlying hardware. Obviously,
> >> > > > the more performance needed, the closer you need to get to the
> >> > > > hardware. This decision goes beyond software tools; there are all
> >> > > > kinds of costs and benefits that need to be considered. And then
> >> > > > there is IO ...
> >> > > >
> >> > > > --
> >> > > > Doug
> >> > > >
> >> > > >
> >> > > >> Regards,
> >> > > >> Jonathan
> >> > > >> -----Original Message-----
> >> > > >> From: Beowulf <beowulf-bounces at beowulf.org> On Behalf Of Michael Di Domenico
> >> > > >> Sent: 04 March 2019 17:39
> >> > > >> Cc: Beowulf Mailing List <beowulf at beowulf.org>
> >> > > >> Subject: Re: [Beowulf] Large amounts of data to store and process
> >> > > >>
> >> > > >> On Mon, Mar 4, 2019 at 8:18 AM Jonathan Aquilina <jaquilina at eagleeyet.net> wrote:
> >> > > >>> As previously mentioned, we don’t really need to have anything
> >> > > >>> indexed, so I am thinking flat files are the way to go. My only
> >> > > >>> concern is the performance of large flat files.
> >> > > >> potentially. there are many factors in the workflow that ultimately
> >> > > >> influence the decision, as others have pointed out. my flat file
> >> > > >> example is only one, where we just repeatedly blow through the files.
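The "repeatedly blow through the files" pattern can be sketched in Python as a sequential, chunked scan of a flat file. This is a minimal illustration under assumptions not stated in the thread: the file is read as raw bytes, the function name `scan_flat_file` and the 1 MiB chunk size are hypothetical, and counting newlines stands in for whatever per-record work the real workflow does.

```python
import sys


def scan_flat_file(path, chunk_size=1 << 20):
    """Sequentially stream a flat file and return (total_bytes, newline_count).

    Hypothetical helper: only one chunk (default 1 MiB) is held in memory at
    a time, so the file can be much larger than RAM. Counting newlines is a
    placeholder for real per-record processing.
    """
    total_bytes = 0
    newlines = 0
    with open(path, "rb") as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:  # an empty bytes object means end of file
                break
            total_bytes += len(chunk)
            newlines += chunk.count(b"\n")
    return total_bytes, newlines


if __name__ == "__main__" and len(sys.argv) > 1:
    # Usage: python scan_flat_file.py /path/to/large_file.dat
    nbytes, nlines = scan_flat_file(sys.argv[1])
    print(f"{nbytes} bytes, {nlines} lines")
```

Reading fixed-size binary chunks instead of iterating line by line keeps memory bounded even when individual lines are very long, and every byte lands in exactly one chunk, so a newline is never counted twice. Whether a single sequential scan like this is fast enough, or the data needs to be split across nodes (HDFS or otherwise), depends on the storage bandwidth and the workflow details discussed below.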
> >> > > >>> Isn't that what HDFS is for, to deal with large flat files?
> >> > > >> large is relative. a 256GB file isn't "large" anymore. i've pushed
> >> > > >> TB files through hadoop and run the terabyte sort benchmark, and yes,
> >> > > >> it can be done in minutes (time-scale), but you need an astounding
> >> > > >> amount of hardware to do it (the last benchmark paper i saw, it was
> >> > > >> something like 1000 nodes). you can accomplish the same feat using
> >> > > >> less and less complicated hardware/software,
> >> > > >> and if your devs are willing to adapt to the hadoop ecosystem, you're
> >> > > >> sunk right off the dock.
> >> > > >> to get a more targeted answer from the numerous smart people on the
> >> > > >> list, you'd need to open up the app and workflow to us. there are
> >> > > >> just too many variables.
> >> > > >> _______________________________________________
> >> > > >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> >> > > >> To change your subscription (digest mode or unsubscribe) visit
> >> > > >> http://www.beowulf.org/mailman/listinfo/beowulf
> >> > > >
> >> > > >
> >> > > > --
> >> > > > Doug
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > > >
> >> > >
> >> > >
> >> > > --
> >> > > Doug
> >> > >
> >> > >
> >> > >
> >> > >
> >> >
> >> >
> >> > --
> >> > Doug
> >> >
> >> >
> >> >
> >> >
> >>
> >>
> >> --
> >> Doug
> >>
> >>
> >
>
>
> --
> Doug
>
>