Thursday, September 12, 2013

C Program to find the floating point IEEE 754 representation


A computer works with two broad kinds of numeric values: fixed point (integer) and floating point. Fixed point values are stored in memory directly in binary. Characters are stored the same way, as the binary form of their ASCII code.
For example:
The character ‘A’ is stored as 1000001, because 65 is the ASCII value of ‘A’. Floating point values, on the other hand, are stored in memory following the IEEE 754 standard. Whenever a program declares float a; the value of the variable ‘a’ is stored in memory according to IEEE 754 (this holds on virtually all current platforms).
The standard specifies single precision and double precision formats. In C, C++ and Java, the float and double data types correspond to single and double precision, which require 32 bits (4 bytes) and 64 bits (8 bytes) respectively to store the data.
Let's have a look at these precision formats.
Single Precision:-
It requires 32 bits to store. The format of single precision is as follows:
Sign – 1 bit
Exponent – 8 bits
Mantissa – 23 bits
In order to store a float value in computer memory, the following algorithm is used.
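Take as an example the float value 3948.125
  1. Convert 3948 to binary, i.e. 111101101100
  2. Convert .125 to binary by repeatedly multiplying by 2 and noting the integer part:
         0.125 x 2 = 0.25    0
         0.25  x 2 = 0.5     0
         0.5   x 2 = 1.0     1
     Reading the integer parts from top to bottom gives 0.001
     Now 3948.125 = 111101101100.001
  3. Normalize the number so that the binary point is placed just after the most significant 1 bit, i.e.
     111101101100.001 = 1.11101101100001 x 2^11
  4. For this number s = 0, as the number is positive.
     Exponent' = 11 and
     Mantissa = 11101101100001 (the bits after the binary point; the leading 1 is not stored)
  5. The bias used for single precision is 127, so
     Final exponent = exponent' + 127, i.e.
     E = 11 + 127 = 138 = 10001010 in binary.
  6. Final value (the mantissa is padded with zeros on the right to 23 bits):
     0 10001010 11101101100001000000000
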
In this format the number 3948.125 will be stored in main memory.

For double precision values, the following changes apply:
Total bits required – 64
Exponent – 11 bits
Mantissa – 52 bits
Bias value – 1023
Now, if you want to find the IEEE 754 representation of any floating point number, the following program can be used.

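#include <stdio.h>

/* Print the lowest i bits of n, most significant bit first. */
void binary(unsigned int n, int i)
{
    unsigned int k;
    for (i--; i >= 0; i--)
    {
        k = n >> i;
        if (k & 1)
            printf("1");
        else
            printf("0");
    }
}

/* The order of bit fields is implementation-defined. This layout
   (mantissa first) matches common little-endian compilers such as
   GCC on x86. It also assumes float and unsigned int are 32 bits. */
typedef union
{
    float f;
    struct
    {
        unsigned int mantissa : 23;
        unsigned int exponent : 8;
        unsigned int sign : 1;
    } field;
} myfloat;

int main(void)
{
    myfloat var;
    printf("Enter any float number: ");
    if (scanf("%f", &var.f) != 1)   /* stop if the input is not a number */
        return 1;
    printf("%d ", var.field.sign);
    binary(var.field.exponent, 8);
    printf(" ");
    binary(var.field.mantissa, 23);
    printf("\n");
    return 0;
}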
Explanation-
The function binary( ) prints the lowest ‘i’ bits of the number ‘n’ in binary format.
In C, structure members can be declared with a size in bits. These are known as bit fields. Because ‘float f’ and the structure ‘field’ are declared in the same union ‘myfloat’, they share the same memory: mantissa uses 23 bits, exponent uses 8 and sign uses 1. The variable ‘var’ is of type myfloat, so the mantissa can be accessed as ‘var.field.mantissa’. Here, field is the name of the internal structure. In this way the float value’s internal bits can be accessed separately as sign, exponent and mantissa.
Run the program and see the output for the example above!

2 comments:

  1. please provide verilog code also for the conversion of decimal to ieee 754 floating point.

  2. please let me know how to convert c code to matlab or verilog
