How can I compute the range of signed and unsigned types

James Cownie jcownie at etnus.com
Thu Apr 19 02:01:34 PDT 2001


For floating point you may also want to look at 

http://www.netlib.org/blas/machar.c

which appears to calculate most (if not all) of the interesting
properties of your floating-point representation. (Of course it may
get confused by x86 processors working in 80-bit intermediates unless
you're very careful how you compile it, so, as ever, YMMV.) Its header
comment reads:

 This subroutine is intended to determine the parameters of the
 floating-point arithmetic system specified below.  The
 determination of the first three uses an extension of an algorithm
 due to M. Malcolm, CACM 15 (1972), pp. 949-951, incorporating some,
 but not all, of the improvements suggested by M. Gentleman and S.
 Marovich, CACM 17 (1974), pp. 276-277.  An earlier version of this
 program was published in the book Software Manual for the
 Elementary Functions by W. J. Cody and W. Waite, Prentice-Hall,
 Englewood Cliffs, NJ, 1980.  The present program is a
 translation of the Fortran 77 program in W. J. Cody, "MACHAR:
 A subroutine to dynamically determine machine parameters".
 TOMS (14), 1988.
 
 Parameter values reported are as follows:
 
      ibeta   - the radix for the floating-point representation
      it      - the number of base ibeta digits in the floating-point
                significand
      irnd    - 0 if floating-point addition chops
                1 if floating-point addition rounds, but not in the
                  IEEE style
                2 if floating-point addition rounds in the IEEE style
                3 if floating-point addition chops, and there is
                  partial underflow
                4 if floating-point addition rounds, but not in the
                  IEEE style, and there is partial underflow
                5 if floating-point addition rounds in the IEEE style,
                  and there is partial underflow
      ngrd    - the number of guard digits for multiplication with
                truncating arithmetic.  It is
                0 if floating-point arithmetic rounds, or if it
                  truncates and only  it  base  ibeta digits
                  participate in the post-normalization shift of the
                  floating-point significand in multiplication;
                1 if floating-point arithmetic truncates and more
                  than  it  base  ibeta  digits participate in the
                  post-normalization shift of the floating-point
                  significand in multiplication.
      machep  - the largest negative integer such that
                1.0+FLOAT(ibeta)**machep .NE. 1.0, except that
                machep is bounded below by  -(it+3)
      negeps  - the largest negative integer such that
                1.0-FLOAT(ibeta)**negeps .NE. 1.0, except that
                negeps is bounded below by  -(it+3)
      iexp    - the number of bits (decimal places if ibeta = 10)
                reserved for the representation of the exponent
                (including the bias or sign) of a floating-point
                number
      minexp  - the largest in magnitude negative integer such that
                FLOAT(ibeta)**minexp is positive and normalized
      maxexp  - the smallest positive power of  ibeta  that overflows
      eps     - the smallest positive floating-point number such
                that  1.0+eps .NE. 1.0. In particular, if either
                ibeta = 2  or  IRND = 0, eps = FLOAT(ibeta)**machep.
                Otherwise,  eps = (FLOAT(ibeta)**machep)/2
      epsneg  - A small positive floating-point number such that
                1.0-epsneg .NE. 1.0. In particular, if ibeta = 2
                or  IRND = 0, epsneg = FLOAT(ibeta)**negeps.
                Otherwise,  epsneg = (ibeta**negeps)/2.  Because
                negeps is bounded below by -(it+3), epsneg may not
                be the smallest number that can alter 1.0 by
                subtraction.
      xmin    - the smallest non-vanishing normalized floating-point
                power of the radix, i.e.,  xmin = FLOAT(ibeta)**minexp
      xmax    - the largest finite floating-point number.  In
                particular  xmax = (1.0-epsneg)*FLOAT(ibeta)**maxexp
                Note - on some machines  xmax  will be only the
                second, or perhaps third, largest number, being
                too small by 1 or 2 units in the last digit of
                the significand.
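
To give a flavour of how the first few parameters can be probed, here is
a minimal sketch in C for type double. It is not the code from machar.c:
the function names probe_radix, probe_digits and probe_eps are made up
for illustration, and the loops only follow the general idea of
Malcolm's algorithm as described in the comment above. Every
intermediate result goes through a volatile double so that it is rounded
to double precision, which is one way of being careful about 80-bit
intermediates on x86. Note that probe_eps returns a power of the radix,
which matches the eps described above only when ibeta = 2 or irnd = 0.

```c
#include <float.h>

/* Radix of double arithmetic (ibeta in the list above). */
int probe_radix(void)
{
    volatile double a = 1.0, b = 1.0, t;

    /* Double a until adding 1.0 to it is no longer exact. */
    do {
        a = a + a;
        t = a + 1.0;
        t = t - a;
    } while (t == 1.0);

    /* Double b until a + b reaches the next representable value above a;
       the distance from a to that value is the radix. */
    do {
        b = b + b;
        t = a + b;
        t = t - a;
    } while (t == 0.0);

    return (int)t;
}

/* Number of base-radix digits in the significand (it in the list above). */
int probe_digits(int radix)
{
    volatile double b = 1.0, t;
    int it = 0;

    /* Count multiplications by the radix until b + 1.0 is inexact. */
    do {
        it++;
        b = b * radix;
        t = b + 1.0;
        t = t - b;
    } while (t == 1.0);

    return it;
}

/* Smallest power of the radix that still changes 1.0 when added to it. */
double probe_eps(int radix)
{
    volatile double eps = 1.0, t;

    for (;;) {
        double next = eps / radix;
        t = 1.0 + next;          /* stored, hence rounded to double */
        if (t == 1.0)
            break;
        eps = next;
    }
    return eps;
}
```

On a platform with IEEE 754 binary doubles I would expect the three
results to agree with FLT_RADIX, DBL_MANT_DIG and DBL_EPSILON from
<float.h>, which makes for an easy sanity check; on anything more exotic,
YMMV.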
 
-- Jim 

James Cownie	<jcownie at etnus.com>
Etnus, LLC.     +44 117 9071438
http://www.etnus.com




