[Beowulf] fftw2, mpi, from 32 bit to 64 and fortran

Peter St. John peter.st.john at gmail.com
Thu Aug 7 07:58:01 PDT 2008


Maybe in the 32-bit compile, a value is stored in a 64-bit register, and
when it gets "robbed" (to populate the missing value for an adjacent
variable) the 32 bits of backfill are taken, so the remaining value is good;
but in a 64-bit compile, all 64 bits are taken so the remainder is
rubbish. It would depend on both the compiler and the hardware, and the
takeaway is to not do that :-)
Peter

On 8/6/08, Gus Correa <gus at ldeo.columbia.edu> wrote:
>
> Hi Ricardo, David, Mark, and list
>
> If, as Ricardo says, he suppressed the 5th parameter ("use_work") on the call
> to rfftwnd_f77_mpi, which has 6 parameters, wouldn't it start mismatching
> pointers on the 5th parameter, instead of on the 2nd parameter ("n_fields")?
> I.e. "use_work" would take the value of "FFTW_NORMAL_ORDER",
> and "FFTW_NORMAL_ORDER" would get a random value (OS permitting),
> but the initial 4 parameters would be correct, right?
> In any case, there is little difference between this and what David said:
> the point of failure is different, but the nature is the same.
>
> However, it is interesting that somehow
> at runtime the program segfaults in 64-bits, but doesn't fail in 32-bits,
> although it most likely computes wrong stuff.
> Ricardo, have you ever QC'd the 32-bit output before you fixed/inserted
> "use_work"?
> If you hit a big lucky strike, the random value left at the FFTW_NORMAL_ORDER
> address matched your needs, and the result may be correct!   :)
>
> Anyway, somehow the program seems to behave differently,
> with the OS superego being more compliant (in a nasty sense) in 32-bits
> than it is in 64-bits.
> Does the OS paradoxically give less memory room for the stack in 64-bits,
> leading to the segfault?
> Or does it give the same room, but because the pointers are bigger the
> segfault is more likely?
> Or does the segfault happen somewhere else, not on the stack?
> Where?
> Why in 64-bits?
> Why not in 32 bits?
>
> Yes, as David noted about programming, here I also got and continue to get
> these bugs, particularly in Fortran programs where no parameter checking is
> enforced.
> And the nastier ones are those that don't segfault,
> then come back to haunt you when somebody looks at the output,
> if you are not careful enough to look at it before anybody else does.
>
> Cheers,
> Gus Correa
>
> Compiling is precise,
> running is imprecise!
>
> ... one more from your alter-ego P'ssoa ... :)
>
> --
> ---------------------------------------------------------------------
> Gustavo J. Ponce Correa, PhD - Email: gus at ldeo.columbia.edu
> Lamont-Doherty Earth Observatory - Columbia University
> P.O. Box 1000 [61 Route 9W] - Palisades, NY, 10964-8000 - USA
> ---------------------------------------------------------------------
>
>
> Lombard, David N wrote:
>
>  On Tue, Aug 05, 2008 at 02:57:42AM -0700, Ricardo Reis wrote:
>>
>>
>>> On Mon, 4 Aug 2008, Mark Kosmowski wrote:
>>>
>>>
>>>
>>>> So, why did the 32-bit test case work?  Shouldn't the same problem
>>>> crash both systems if it is a code issue?
>>>>
>>>>
>>>
>> Not necessarily given the error described below.
>>
>>
>>
>>> I asked the same question myself... The function interface is:
>>>
>>>  call rfftwnd_f77_mpi(plan_c2r, &
>>>       1, local_data, work, use_work, FFTW_NORMAL_ORDER)
>>>
>>> where use_work is an integer, value 1 if you use the work temporary
>>> array, 0 otherwise. This was the variable I wasn't passing.
>>>
>>>
>> ...
>>
>>
>>> The wrapper function for this is (from rfftw_f77_mpi.c):
>>>
>>> void F77_FUNC_(rfftwnd_f77_mpi,RFFTWND_F77_MPI)
>>> (rfftwnd_mpi_plan *p, int *n_fields, fftw_real *local_data,
>>>  fftw_real *work, int *use_work, int *ioutput_order)
>>>
>>>
>>
>>
>>
>>> .... So it must be a pointer issue revealed by the 64 bit, no? When I
>>> wasn't doing it "properly" the value of *ioutput_order wasn't set.
>>>
>>>
>>
>> The value of the first element of local_data was used for the n_fields
>> scalar.
>>
>> The work array was being laid down starting at the location of the
>> use_work scalar.
>>
>> The FFTW_NORMAL_ORDER value was being interpreted as use_work scalar.
>>
>> Finally, ioutput_order scalar was some random value.
>>
>> So, a lot was going wrong there.  It's just one of life's little, um,
>> pleasures that it looked like it was working for your 32-bit test case.
>> Don't worry, you'll likely do this again, as likely *every* one of us on
>> this list has, too.
>>
>> BTW, Fortran passes by reference; that's why all args are pointers.
>>
>>
>>
>
> _______________________________________________
> Beowulf mailing list, Beowulf at beowulf.org
> To change your subscription (digest mode or unsubscribe) visit
> http://www.beowulf.org/mailman/listinfo/beowulf
>