[Beowulf] Large amounts of data to store and process

Prentice Bisbal pbisbal at pppl.gov
Fri Mar 15 09:21:06 PDT 2019


Next to your cane.

On 3/14/19 5:52 PM, Jeffrey Layton wrote:
> Damn. I knew I forgot something. Now where are my glasses?
>
>
> On Thu, Mar 14, 2019, 17:17 Douglas Eadline <deadline at eadline.org> wrote:
>
>
>     > I don't want to interrupt the flow but I'm feeling cheeky. One word
>     > can solve everything: "Fortran". There, I said it.
>
>     Of course, but you forgot "now get off my lawn"
>
>     --
>     Doug
>
>     >
>     > Jeff
>     >
>     >
>     > On Thu, Mar 14, 2019, 17:03 Douglas Eadline <deadline at eadline.org> wrote:
>     >
>     >>
>     >> > Then, given we are reaching these limitations, how come we don't
>     >> > integrate certain things from the HPC world into everyday
>     >> > computing, so to speak?
>     >>
>     >> Scalable/parallel computing is hard, and hard costs time and money.
>     >> In HPC the performance often justifies the means; in other
>     >> sectors the cost must justify the means.
>     >>
>     >> HPC has traditionally trickled down into other sectors. However,
>     >> many of the HPC problem types are not traditional computing
>     >> problems. This situation is changing a bit with things
>     >> like Hadoop/Spark/TensorFlow.
>     >>
>     >> --
>     >> Doug
>     >>
>     >>
>     >> >
>     >> > On 14/03/2019, 19:14, "Douglas Eadline" <deadline at eadline.org> wrote:
>     >> >
>     >> >
>     >> >     > Hi Douglas,
>     >> >     >
>     >> >     > Isn't there quantum computing being developed in terms of
>     >> >     > CPUs at this point?
>     >> >
>     >> >     QC is (theoretically) unreasonably good at some things; for
>     >> >     others there may be classical algorithms that work better.
>     >> >     As far as I know, there has been no demonstration of
>     >> >     "quantum supremacy," where a quantum computer is shown
>     >> >     to be faster than a classical algorithm.
>     >> >
>     >> >     Getting there, not there yet.
>     >> >
>     >> >     BTW, if you want to know what is going on with QC,
>     >> >     read Scott Aaronson's blog:
>     >> >
>     >> > https://www.scottaaronson.com/blog/
>     >> >
>     >> >     I usually get through the first few paragraphs, and
>     >> >     then it goes whoosh, over my scientific pay grade.
>     >> >
>     >> >
>     >> >     > Also, is it really about the speed anymore, rather than how
>     >> >     > optimized the code is to take advantage of the multiple
>     >> >     > cores that a system has?
>     >> >
>     >> >     That is because the clock rate increase slowed to a crawl.
>     >> >     Adding cores was a way to "offer" more performance, but it
>     >> >     introduced the "multi-core tax." That is, programming for
>     >> >     multi-core is harder and costlier than for a single core.
>     >> >     It is also much harder to optimize. In HPC we are lucky: we
>     >> >     are used to designing MPI codes that scale with more cores,
>     >> >     no matter where they live (same die, next socket, another
>     >> >     server); a sketch follows below.
>     >> >
>     >> >     Also, more cores usually means lower single-core frequency
>     >> >     to fit into a given power envelope (die shrinks help with
>     >> >     this, but based on everything I have read, we are about at
>     >> >     the end of the line). It also means lower absolute memory
>     >> >     BW per core, although more memory channels help a bit.
>     >> >
>     >> >     --
>     >> >     Doug
>     >> >
>     >> >
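A minimal sketch of what "scales no matter where the cores live" looks like,
assuming the MPI.jl package; the file name and the toy problem are made up for
illustration. The same program runs unchanged whether the ranks share a die, a
socket, or a rack:

    # pi_sum.jl -- hypothetical example; run with: mpiexec -n 4 julia pi_sum.jl
    using MPI

    MPI.Init()
    comm  = MPI.COMM_WORLD
    rank  = MPI.Comm_rank(comm)     # this process's id, 0..n-1
    nproc = MPI.Comm_size(comm)     # total number of ranks

    # Each rank sums its own strided slice of 1/k^2 (converges to pi^2/6).
    N = 10_000_000
    local_sum = sum(1.0 / k^2 for k in (rank + 1):nproc:N)

    # Combine the partial sums across all ranks, wherever they live.
    total = MPI.Allreduce(local_sum, +, comm)

    rank == 0 && println("pi ~= ", sqrt(6 * total))
    MPI.Finalize()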
>     >> >     >
>     >> >     > On 13/03/2019, 22:22, "Douglas Eadline" <deadline at eadline.org> wrote:
>     >> >     >
>     >> >     >
>     >> >     >     I realize it is bad form to reply to one's own post,
>     >> >     >     but I forgot to mention something.
>     >> >     >
>     >> >     >     Basically, the HW performance parade is getting harder
>     >> >     >     to celebrate. Clock frequencies have been increasing
>     >> >     >     slowly while core counts are multiplying rather
>     >> >     >     quickly. Single-core performance boosts are mostly
>     >> >     >     coming from accelerators. Add to that the fact that
>     >> >     >     speculation technology, when managed for security,
>     >> >     >     slows things down.
>     >> >     >
>     >> >     >     What this means is that the focus on software
>     >> >     >     performance and optimization is going to increase,
>     >> >     >     because we can't just buy new hardware and improve
>     >> >     >     things anymore.
>     >> >     >
>     >> >     >     I believe languages like Julia can help with this
>     >> >     >     situation. For a while. (A small sketch follows
>     >> >     >     below.)
>     >> >     >
>     >> >     >     --
>     >> >     >     Doug
>     >> >     >
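A small sketch of what that software-optimization focus looks like in plain
Julia (illustrative only): the same arithmetic twice, but the second version
avoids an untyped global so the compiler can specialize and vectorize it.

    # Slow: the loop reads the untyped global `data`, so every
    # operation is dynamically dispatched.
    data = rand(10_000_000)
    function slow()
        s = 0.0
        for v in data
            s += v * v
        end
        return s
    end

    # Fast: pass the array as an argument; the compiler specializes
    # on its type and can use SIMD.
    function fast(x::Vector{Float64})
        s = 0.0
        @inbounds @simd for v in x
            s += v * v
        end
        return s
    end

    slow(); fast(data)   # warm up (compile) both
    @time slow()         # noticeably slower
    @time fast(data)     # near-C speed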
>     >> >     >     >> Hi All,
>     >> >     >     >> Basically, I have sat down with my colleague and we
>     >> >     >     >> have opted to go down the route of Julia with
>     >> >     >     >> JuliaDB for this project. But here is an interesting
>     >> >     >     >> thought I have been pondering: if Julia is an
>     >> >     >     >> up-and-coming fast language for working with large
>     >> >     >     >> amounts of data, how will that affect HPC, the way
>     >> >     >     >> it is currently used, and how HPC systems are
>     >> >     >     >> created? (A hypothetical JuliaDB sketch follows
>     >> >     >     >> below.)
>     >> >     >     >
>     >> >     >     >
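For reference, a hypothetical sketch of the JuliaDB route, assuming the
JuliaDB package is installed; the file name readings.csv and the column temp
are made up for illustration:

    using JuliaDB

    # Load a CSV (or a vector of CSVs) into a table; JuliaDB can
    # spread the load across Julia worker processes.
    t = loadtable("readings.csv")        # hypothetical file

    # Relational-style operations on the table.
    hot = filter(r -> r.temp > 30.0, t)  # hypothetical column :temp
    println(length(hot), " rows above 30 degrees")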
>     >> >     >     > First, IMO good choice.
>     >> >     >     >
>     >> >     >     > Second, a short list of actual conversations.
>     >> >     >     >
>     >> >     >     > 1) "This code is written in Fortran." I have been met
>     >> >     >     > with puzzled looks when I say the word "Fortran."
>     >> >     >     > Then it comes: "... ancient language, why not port to
>     >> >     >     > modern ..." If you are asking that question, young
>     >> >     >     > Padawan, you have much to learn; maybe try web pages.
>     >> >     >     >
>     >> >     >     > 2) "I'll just use Python because it works on my
>     >> >     >     > laptop." Later, "It will just run faster on a
>     >> >     >     > cluster, right?" and "My little Python program is now
>     >> >     >     > kind-of big and has become slow; should I use
>     >> >     >     > TensorFlow?"
>     >> >     >     >
>     >> >     >     > 3) <mccoy>
>     >> >     >     > "Dammit Jim, I don't want to learn/write Fortran, C,
>     >> >     >     > C++, and MPI. I'm a (fill in domain-specific
>     >> >     >     > scientific/technical position)."
>     >> >     >     > </mccoy>
>     >> >     >     >
>     >> >     >     > My reply: "I agree, and I wish there were a better
>     >> >     >     > answer to that question. The computing industry has
>     >> >     >     > made great strides in HW with multi-core, clusters,
>     >> >     >     > etc. Software tools have always lagged hardware. In
>     >> >     >     > the case of HPC it is a slow process, and in HPC the
>     >> >     >     > whole programming 'thing' is not as 'easy' as it is
>     >> >     >     > in other sectors; warp drives and transporters take a
>     >> >     >     > little extra effort."
>     >> >     >     >
>     >> >     >     > 4) Then I suggest Julia: "I invite you to try Julia.
>     >> >     >     > It is easy to get started, fast, and can grow with
>     >> >     >     > your application." Then I might say, "In a way it is
>     >> >     >     > HPC BASIC; if you are old enough you will understand
>     >> >     >     > what I mean by that." (A small "grows with you"
>     >> >     >     > sketch follows below.)
>     >> >     >     >
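A small sketch of the "grows with your application" idea in base Julia, no
packages; the function f is a stand-in for real work, and the parallel
version assumes Julia was started with threads (e.g. julia -t 8):

    f(x) = sqrt(x) * sin(x)          # stand-in for real work

    # Day 1: a BASIC-style one-liner.
    serial = sum(f, 1:1_000_000)

    # Later: split the range into one chunk per thread and sum the
    # chunks in parallel tasks.
    function parallel_sum(f, n; ntasks = Threads.nthreads())
        chunk = cld(n, ntasks)
        tasks = [Threads.@spawn sum(f, (1 + (k - 1) * chunk):min(k * chunk, n);
                                    init = 0.0)
                 for k in 1:ntasks]
        return sum(fetch, tasks)
    end

    parallel = parallel_sum(f, 1_000_000)
    println(serial ≈ parallel)       # true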
>     >> >     >     > The question with languages like Julia (or Chapel,
>     >> >     >     > etc.) is:
>     >> >     >     >
>     >> >     >     >   "How much performance are you willing to give up
>     >> >     >     >   for convenience?"
>     >> >     >     >
>     >> >     >     > The goal is to keep the programmer close to the
>     >> >     >     > problem at hand and away from the nuances of the
>     >> >     >     > underlying hardware. Obviously, the more performance
>     >> >     >     > needed, the closer you need to get to the hardware.
>     >> >     >     > This decision goes beyond software tools; there are
>     >> >     >     > all kinds of costs/benefits that need to be
>     >> >     >     > considered. And then there is IO ...
>     >> >     >     >
>     >> >     >     > --
>     >> >     >     > Doug
>     >> >     >     >
>     >> >     >     >
>     >> >     >     >
>     >> >     >     >
>     >> >     >     >
>     >> >     >     >
>     >> >     >     >
>     >> >     >     >> Regards,
>     >> >     >     >> Jonathan
>     >> >     >     >> -----Original Message-----
>     >> >     >     >> From: Beowulf <beowulf-bounces at beowulf.org> On
>     >> >     >     >> Behalf Of Michael Di Domenico
>     >> >     >     >> Sent: 04 March 2019 17:39
>     >> >     >     >> Cc: Beowulf Mailing List <beowulf at beowulf.org>
>     >> >     >     >> Subject: Re: [Beowulf] Large amounts of data to
>     >> >     >     >> store and process
>     >> >     >     >>
>     >> >     >     >> On Mon, Mar 4, 2019 at 8:18 AM Jonathan Aquilina
>     >> >     >     >> <jaquilina at eagleeyet.net> wrote:
>     >> >     >     >>> As previously mentioned, we don't really need to
>     >> >     >     >>> have anything indexed, so I am thinking flat files
>     >> >     >     >>> are the way to go; my only concern is the
>     >> >     >     >>> performance of large flat files.
>     >> >     >     >> potentially. there are many factors in the workflow
>     >> >     >     >> that ultimately influence the decision, as others
>     >> >     >     >> have pointed out. my flat file example is only one,
>     >> >     >     >> where we just repeatedly blow through the files. (a
>     >> >     >     >> streaming sketch follows below.)
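For what it's worth, a minimal sketch of that "blow through the files"
pattern in base Julia; the file layout is made up (one numeric value per
line), and memory use stays flat no matter how big the file is:

    # Stream a large flat file once, keeping only running totals.
    function scan(path)
        n, total, hi = 0, 0.0, -Inf
        open(path) do io
            for line in eachline(io)
                isempty(line) && continue
                v = parse(Float64, line)
                n += 1; total += v; hi = max(hi, v)
            end
        end
        return (count = n, mean = total / n, max = hi)
    end

    # scan("big_flat_file.dat")   # hypothetical path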
>     >> >     >     >>> Isn't that what HDFS is for, to deal with large
>     >> >     >     >>> flat files?
>     >> >     >     >> large is relative. a 256GB file isn't "large"
>     >> >     >     >> anymore. i've pushed TB files through hadoop and
>     >> >     >     >> run the terabyte sort benchmark, and yes it can be
>     >> >     >     >> done in minutes (time-scale), but you need an
>     >> >     >     >> astounding amount of hardware to do it (the last
>     >> >     >     >> benchmark paper i saw, it was something like 1000
>     >> >     >     >> nodes). you can accomplish the same feat using less
>     >> >     >     >> and less complicated hardware/software (the classic
>     >> >     >     >> out-of-core trick is sketched below),
>     >> >     >     >> and if your devs aren't willing to adapt to the
>     >> >     >     >> hadoop ecosystem, you're sunk right off the dock.
>     >> >     >     >> to get a more targeted answer from the numerous
>     >> >     >     >> smart people on the list, you'd need to open up the
>     >> >     >     >> app and workflow to us. there are just too many
>     >> >     >     >> variables.
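As an illustration of the single-machine route, a hedged sketch of the
classic external merge sort on line-oriented text (sort chunks that fit in
RAM, then k-way merge the sorted runs); the chunk size is made up:

    function external_sort(inpath, outpath; chunklines = 1_000_000)
        # Pass 1: sort RAM-sized chunks and spill them to temp files.
        runs = String[]
        open(inpath) do io
            while !eof(io)
                chunk = String[]
                while !eof(io) && length(chunk) < chunklines
                    push!(chunk, readline(io))
                end
                sort!(chunk)
                path = tempname()
                open(path, "w") do out
                    foreach(l -> println(out, l), chunk)
                end
                push!(runs, path)
            end
        end
        # Pass 2: k-way merge of the sorted runs.
        ios   = map(open, runs)
        heads = Union{Nothing,String}[eof(io) ? nothing : readline(io)
                                      for io in ios]
        open(outpath, "w") do out
            while any(h -> h !== nothing, heads)
                # Find the smallest current head line among live runs.
                i = 0
                for (k, h) in pairs(heads)
                    h === nothing && continue
                    (i == 0 || h < heads[i]) && (i = k)
                end
                println(out, heads[i])
                heads[i] = eof(ios[i]) ? nothing : readline(ios[i])
            end
        end
        foreach(close, ios)
        foreach(rm, runs)   # clean up the temporary run files
    end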
>     >> >     >     >
>     >> >     >     >
>     >> >     >     > --
>     >> >     >     > Doug
>     >> >     >     >
>     >> >     >     >
>     >> >     >     >
>     >> >     >     >
>     >> >     >     >
>     >> >     >
>     >> >     >
>     >> >     >     --
>     >> >     >     Doug
>     >> >     >
>     >> >     >
>     >> >     >
>     >> >     >
>     >> >
>     >> >
>     >> >     --
>     >> >     Doug
>     >> >
>     >> >
>     >> >
>     >> >
>     >>
>     >>
>     >> --
>     >> Doug
>     >>
>     >>
>     >
>
>
>     -- 
>     Doug
>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit https://beowulf.org/cgi-bin/mailman/listinfo/beowulf

-- 
Prentice Bisbal
Lead Software Engineer
Princeton Plasma Physics Laboratory
https://www.pppl.gov
