[Beowulf] Large amounts of data to store and process

John Hearns hearnsj at googlemail.com
Thu Mar 14 04:11:21 PDT 2019


Jonathan, a small correction if I may: Julia is not JIT (I asked on the
Julia Discourse). A much better description is Ahead of Time compilation.
Not really important, but the term JIT triggers a certain response with most people.


On Thu, 14 Mar 2019 at 07:31, Jonathan Aquilina <jaquilina at eagleeyet.net>
wrote:

> Hi All,
>
> What sets Julia apart is that it is not a compiled language but a Just In
> Time (JIT) language. I am still getting into it, but it seems to be geared
> toward complex and large data sets. As mentioned previously, I am still
> working with a colleague on this prototype. With Julia, at least, there is
> an IDE, so to speak. It is based on the Atom editor with a package that is
> installed specifically for Julia.
>
> I will obviously keep the list updated regarding Julia and my experiences
> with it, but from the little I have looked at the language, it is easy to
> write code for. It's still in its infancy, as the latest version, I
> believe, is 1.0.1.
>
> Regards,
>
> Jonathan
>
> From: Beowulf <beowulf-bounces at beowulf.org> On Behalf Of Scott Atchley
> Sent: 14 March 2019 01:17
> To: Douglas Eadline <deadline at eadline.org>
> Cc: Beowulf Mailing List <beowulf at beowulf.org>
> Subject: Re: [Beowulf] Large amounts of data to store and process
>
> I agree with your take about slower progress on the hardware front and
> that software has to improve. DOE funds several vendors to do research to
> improve technologies that will hopefully benefit HPC, in particular, as
> well as the general market. I am reviewing a vendor's latest report on
> micro-architectural techniques to improve performance (e.g., lower latency,
> higher bandwidth). For this study, they use a combination of DOE
> mini-apps/proxies as well as commercial benchmarks. The techniques that
> this vendor investigated showed potential improvements for commercial
> benchmarks but much less, if any, for the DOE apps, which are highly
> optimized.
>
> I will state that I know nothing about Julia, but I assume it is a
> higher-level language than C/C++ (or Fortran for numerical codes). I am
> skeptical that a higher-level language (assuming Julia is one) can help. I
> believe the vendor's techniques that I am reviewing benefited commercial
> benchmarks because they are less optimized than the DOE apps. Using a
> high-level language relies on the language's compiler/interpreter and
> runtime. The developer either has no idea what is happening or lacks the
> ability to improve it if profiling shows that the issue is in the runtime.
> I believe that if you need more performance, you will have to work for it
> in a lower-level language, and there is no more free lunch (i.e., hoping
> the latest hardware will do it for me).
>
> Hope I am wrong.
>
> On Wed, Mar 13, 2019 at 5:23 PM Douglas Eadline <deadline at eadline.org>
> wrote:
>
>
> I realize it is bad form to reply to one's own post, but
> I forgot to mention something.
>
> Basically, the HW performance parade is getting harder
> to celebrate. Clock frequencies have been increasing
> slowly while core counts are multiplying rather quickly.
> Single-core performance boosts are mostly coming
> from accelerators. Add to that the fact that speculation
> technology, when managed for security, slows things down.
>
> What this means is that the focus on software performance
> and optimization is going to increase, because we can't just
> buy new hardware and improve things anymore.
>
> I believe languages like Julia can help with this situation.
> For a while.
>
> --
> Doug
>
> >> Hi All,
> >> Basically, my colleague and I have sat down, and we have opted to go
> >> down the route of Julia with JuliaDB for this project. But here is an
> >> interesting thought that I have been pondering: if Julia is an
> >> up-and-coming fast language for working with large amounts of data, how
> >> will that affect HPC, the way it is currently used, and the way HPC
> >> systems are created?
> >
> >
> > First, IMO, good choice.
> >
> > Second, a short list of actual conversations.
> >
> > 1) "This code is written in Fortran." I have been met with
> > puzzled looks when I say the word "Fortran." Then it
> > comes, "... ancient language, why not port to modern ..."
> > "If you are asking that question, young Padawan, you have
> > much to learn; maybe try web pages."
> >
> > 2) "I'll just use Python because it works on my laptop."
> > Later, "It will just run faster on a cluster, right?"
> > and "My little Python program is now kind of big and has
> > become slow; should I use TensorFlow?"
> >
> > 3) <mcoy>
> > "Dammit Jim, I don't want to learn/write Fortran, C, C++ and MPI.
> > I'm a (fill in domain-specific scientific/technical position)"
> > </mcoy>
> >
> > My reply: "I agree, and I wish there were a better answer to that
> > question. The computing industry has made great strides in HW with
> > multi-core, clusters, etc. Software tools have always lagged
> > hardware. In HPC it is a slow process, and the whole programming
> > 'thing' is not as 'easy' as it is in other sectors; warp drives and
> > transporters take a little extra effort."
> >
> > 4) Then I suggest Julia: "I invite you to try Julia. It is
> > easy to get started, fast, and can grow with your application."
> > Then I might say, "In a way it is HPC BASIC; if you are old
> > enough, you will understand what I mean by that."
> >
> > The question with languages like Julia (or Chapel, etc) is:
> >
> >   "How much performance are you willing to give up for convenience?"
> >
> > The goal is to keep the programmer close to the problem at hand
> > and away from the nuances of the underlying hardware. Obviously,
> > the more performance needed, the closer you need to get to the hardware.
> > This decision goes beyond software tools; there are all kinds
> > of costs and benefits that need to be considered. And then there
> > is I/O ...
> >
> > --
> > Doug
> >
> >
> >> Regards,
> >> Jonathan
> >> -----Original Message-----
> >> From: Beowulf <beowulf-bounces at beowulf.org> On Behalf Of Michael Di Domenico
> >> Sent: 04 March 2019 17:39
> >> Cc: Beowulf Mailing List <beowulf at beowulf.org>
> >> Subject: Re: [Beowulf] Large amounts of data to store and process
> >>
> >> On Mon, Mar 4, 2019 at 8:18 AM Jonathan Aquilina <jaquilina at eagleeyet.net>
> >> wrote:
> >>> As previously mentioned, we don’t really need to have anything
> >>> indexed, so I am thinking flat files are the way to go; my only
> >>> concern is the performance of large flat files.
> >> potentially, there are many factors in the workflow that ultimately
> >> influence the decision, as others have pointed out.  my flat file
> >> example is only one, where we just repeatedly blow through the files.
> >>> Isn't that what HDFS is for, to deal with large flat files?
> >> large is relative.  a 256GB file isn't "large" anymore.  i've pushed TB
> >> files through hadoop and run the terabyte sort benchmark, and yes it can
> >> be done in minutes (time-scale), but you need an astounding amount of
> >> hardware to do it (the last benchmark paper i saw, it was something like
> >> 1000 nodes).  you can accomplish the same feat using less, and less
> >> complicated, hardware/software
> >> and if your devs are willing to adapt to the hadoop ecosystem, you're
> >> sunk right off the dock.
> >> to get a more targeted answer from the numerous smart people on the
> >> list, you'd need to open up the app and workflow to us.  there are just
> >> too many variables.
> >> _______________________________________________
> >> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> >> To change your subscription (digest mode or unsubscribe) visit
> >> http://www.beowulf.org/mailman/listinfo/beowulf
> >
> >
> > --
> > Doug
> >
> >
>
>
> --
> Doug
>
>