[Beowulf] Re: GPU Beowulf Clusters

Micha michf at post.tau.ac.il
Mon Feb 1 15:56:44 PST 2010


On 01/02/2010 22:54, richard.walsh at comcast.net wrote:
>
> Jon Forrest <jlforrest at berkeley.edu> wrote:
>
>  >On 2/1/2010 7:24 AM, richard.walsh at comcast.net wrote:
>  >
>  >> Coming in on this late, but to reduce this work load there is PGI's
>  >> version 10.0 compiler suite which supports accelerator compiler
>  >> directives. This will reduce the coding effort, but probably suffer
>  >> from the classical "if it is easy, it won't perform as well"
>  >> trade-off. My experience is limited, but a nice intro can be found at:
>  >
>  >I'm not sure how much traction such a thing will get.
>  >Let's say you have a big Fortran program that you want
>  >to port to CUDA. Let's assume you already know where the
>  >program spends its time, so you know which routines
>  >are good candidates for running on the GPU.
>  >
>  >Rather than rewriting the whole program in C[++],
>  >wouldn't it be easiest to leave all the non-CUDA
>  >parts of the program in Fortran, and then to call
>  >CUDA routines written in C[++]. Since the CUDA
>  >routines will have to be rewritten anyway, why
>  >write them in a language which would require
>  >purchasing yet another compiler?
>
> Mmm ... not sure I understand the response, but perhaps this response
> was to a different message ... ?? In any case, the PGI software supports
> accelerator directives for both C and Fortran, so for those languages I do
> not see a need to rewrite whole applications. The question presented is
> the same as always, what does the performance-programming effort function
> look like and how well does your code perform with directives to start
> with. The PGI models is also hardware generic and the code runs on
> the CPU in parallel when there is no GPU around I believe. What will
> gate interest is how well PGI compiler group does at delivering performance
> and how important portability is to the person developing the code.
>

As far as I know, PGI also offers CUDA Fortran, analogous to CUDA C, in
addition to the directive-based approach, but I have to admit that I don't
have any experience with it.

As for why you would spend money on a compiler when the code has to be
rewritten anyway: even an expensive compiler is cheap compared to a
programmer's time. Even at a cheap programmer's salary, the compiler costs
at most two weeks' pay.

On the other hand, you have a programmer who already knows Fortran and a piece
of code that is already written and debugged in Fortran. For quite a few
programs, a first unoptimized GPU version can be produced with very little work.

Just sorting out index-base bugs and memory-order bugs can cost you a lot more
than the compiler. Fortran arrays are 1-based by default, while C arrays are
0-based (actually, Fortran 90/95 lets you declare any index range for an
array). Fortran stores arrays in column-major order, while C uses row-major
order. Do you know how much headache that can bring into the porting?
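
To make those two bug classes concrete, here is a minimal sketch in C of the
offset arithmetic involved. The helper names (fortran_offset, c_offset,
sum_fortran_column) are made up for illustration and come from no library,
and the sketch assumes the default Fortran lower bound of 1:

```c
#include <stddef.h>

/* Flat offset of Fortran element A(i, j) for an array declared
   A(nrows, ncols): 1-based indices, column-major storage.
   Assumes the default lower bound of 1 in both dimensions. */
size_t fortran_offset(size_t i, size_t j, size_t nrows)
{
    return (j - 1) * nrows + (i - 1);
}

/* Flat offset of C element a[i][j] for an array declared
   a[nrows][ncols]: 0-based indices, row-major storage. */
size_t c_offset(size_t i, size_t j, size_t ncols)
{
    return i * ncols + j;
}

/* Sum column j (1-based) of a buffer that was filled by Fortran.
   In column-major storage the elements of one column are contiguous,
   so this loop walks memory with stride 1. */
double sum_fortran_column(const double *a, size_t nrows, size_t j)
{
    double s = 0.0;
    for (size_t i = 1; i <= nrows; ++i)
        s += a[fortran_offset(i, j, nrows)];
    return s;
}
```

A C routine that receives a Fortran array and indexes it with c_offset instead
of fortran_offset reads the transposed element (or a wrong one entirely for a
non-square array), without any compiler warning. That is the kind of bug I
mean.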

Translating MATLAB code into Fortran is also much easier than into C because
of these issues.

> HMPP make offers a similar proposition ...
>
> rbw
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org sponsored by Penguin Computing
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>



