Floating Point Representation of Binary Numbers

In this tutorial, we will learn about the floating-point representation of binary numbers with the help of examples. By Saurabh Gupta. Last updated: May 10, 2023

Prerequisite: Number systems

Binary representation of the floating-point numbers

Very small and very large numbers in the decimal number system are usually written in scientific notation, as a number (the mantissa) multiplied by a power of 10. Examples are 6.27 * 10^-27 and 5.21 * 10^34. Binary numbers can be represented in the same way, as a mantissa multiplied by a power of 2. The exact format of this representation differs from machine to machine.
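As a quick illustration of the mantissa-and-exponent idea, Python's standard math.frexp function splits a number x into a fraction m (with 0.5 <= |m| < 1) and an integer e such that x = m * 2**e. This is only a sketch; the helper name binary_scientific is an illustrative choice, not part of any library.

```python
import math

def binary_scientific(x):
    """Return (mantissa, exponent) with x == mantissa * 2**exponent
    and 0.5 <= |mantissa| < 1 for nonzero x."""
    mantissa, exponent = math.frexp(x)
    return mantissa, exponent

m, e = binary_scientific(820.0)
print(m, e)  # 0.80078125 10
```

Here 0.80078125 is the binary fraction .110011010, so 820 = .110011010 * 2^10 in binary scientific notation.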

In the 16-bit format used in this tutorial, 10 bits hold the mantissa and 6 bits hold the exponent, whereas a 24-bit format uses 15 bits for the mantissa and 9 bits for the exponent.

The 16-bit format can be represented as:

Mantissa Part    Exponent Part
0110011010       101010

The mantissa is written in 2's complement form, so the MSB of the mantissa can be thought of as a sign bit. The binary point is assumed to lie immediately to the right of this sign bit. The 6 bits of the exponent can represent the values 0 to 63; to express negative exponents as well, (32)_10, i.e. (100000)_2, is added to the desired exponent before it is stored.

Excess-32 Representation

Excess-32 is a biased notation for the exponent field. In this notation, the stored 6-bit value is the desired exponent plus (32)_10, so exponents from -32 to +31 map to the stored values 0 to 63.

The following table illustrates the representation of the exponent part. Sums are kept to 6 bits, so any carry out of the leftmost bit is discarded.

Desired Exponent   2's complement notation   Excess-32 notation (in 6 bits)   Binary representation
-32                100000                    100000 + 100000 = 000000         000000
-31                100001                    100001 + 100000 = 000001         000001
-30                100010                    100010 + 100000 = 000010         000010
-15                110001                    110001 + 100000 = 010001         010001
0                  000000                    000000 + 100000 = 100000         100000
+1                 000001                    000001 + 100000 = 100001         100001
+15                001111                    001111 + 100000 = 101111         101111
+30                011110                    011110 + 100000 = 111110         111110
+31                011111                    011111 + 100000 = 111111         111111
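
The rows of the table can be reproduced with a short Python sketch. The names BIAS, to_excess32 and from_excess32 are illustrative choices; the only assumption taken from the text is that the stored 6-bit value equals the exponent plus 32.

```python
BIAS = 32  # excess-32: stored value = exponent + 32

def to_excess32(exponent):
    """Return the 6-bit excess-32 bit string for an exponent in -32..+31."""
    if not -BIAS <= exponent <= BIAS - 1:
        raise ValueError("exponent does not fit in 6-bit excess-32")
    return format(exponent + BIAS, "06b")

def from_excess32(bits):
    """Recover the exponent from a 6-bit excess-32 bit string."""
    return int(bits, 2) - BIAS

# Print the stored bits for the exponents listed in the table.
for e in (-32, -31, -30, -15, 0, 1, 15, 30, 31):
    print(f"{e:+4d} -> {to_excess32(e)}")
```

Adding the bias to the signed exponent directly gives the same bits as adding 100000 to the 2's complement form and discarding the carry.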

Consider again the 16-bit word shown earlier:

Mantissa Part    Exponent Part
0110011010       101010

The floating-point number stored in this word is decoded as follows:

The leftmost bit (MSB) is the sign bit '0', which indicates a positive number. The binary point is assumed to lie immediately after the sign bit. Thus,

Mantissa part: .110011010
Exponent part: 101010. In excess-32 notation, 32 has already been added, so
subtracting 100000 gives 001010 (i.e., 10 in decimal, so the exponent part is 2^10).
The number is N
    = +(.110011010)_2 * 2^10
    = +(1100110100.00)_2
    = +(820)_10
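
The decoding steps above can be sketched in Python. The function name decode16 and the choice of a bit string as input are assumptions made for illustration; the field widths (10-bit 2's complement mantissa, 6-bit excess-32 exponent) are the ones used in this tutorial.

```python
def decode16(word):
    """Decode a 16-bit string: 10-bit 2's complement mantissa followed by
    a 6-bit excess-32 exponent. Returns the value as a float."""
    if len(word) != 16 or set(word) - {"0", "1"}:
        raise ValueError("expected a string of 16 binary digits")
    mantissa_bits, exponent_bits = word[:10], word[10:]

    mantissa_int = int(mantissa_bits, 2)
    if mantissa_bits[0] == "1":   # sign bit set: 2's complement negative
        mantissa_int -= 1 << 10

    # The binary point sits right after the sign bit, leaving 9 fraction bits.
    mantissa = mantissa_int / 2**9
    exponent = int(exponent_bits, 2) - 32   # remove the excess-32 bias
    return mantissa * 2**exponent

print(decode16("0110011010101010"))  # 820.0
```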

Example 1: Express the decimal number (45365.125)_10 as a 16-bit floating-point number.

Solution

Binary equivalent of (45365.125)_10: 1011000100110101.001
Normalized binary form: .1011000100110101001 * 2^16
Mantissa: + .101100010 (only the first 9 bits after the binary point fit; the remaining bits are dropped)
Exponent: 010000 (value of the exponent is 16)
Excess-32 exponent: 010000 + 100000 = 110000

Since the number is positive, a sign bit '0' is placed in the MSB, ahead of the 9 mantissa bits.

So, the floating-point format will be 0 101100010 110000, i.e. 0101100010110000
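
The encoding steps of this example can be sketched in Python as follows. The function name encode16 is hypothetical, the sketch handles positive numbers only, and it truncates (rather than rounds) the mantissa to 9 fraction bits, which is what the worked example does.

```python
import math

def encode16(x):
    """Encode a positive number as a 16-bit string: sign bit '0',
    9 mantissa fraction bits, then a 6-bit excess-32 exponent."""
    if x <= 0:
        raise ValueError("this sketch handles positive numbers only")
    # x = fraction * 2**exponent with 0.5 <= fraction < 1
    fraction, exponent = math.frexp(x)
    if not -32 <= exponent <= 31:
        raise ValueError("exponent does not fit in 6-bit excess-32")
    mantissa = int(fraction * 2**9)   # keep 9 fraction bits, drop the rest
    return "0" + format(mantissa, "09b") + format(exponent + 32, "06b")

print(encode16(45365.125))  # 0101100010110000
```

Because only 9 mantissa bits survive, the resulting word stands for .101100010 * 2^16 = 45312, not the original 45365.125.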

Example 2: What floating-point number does the 16-bit word 0100101001101011 represent?

Solution

The leftmost bit (MSB) is the sign bit '0', which indicates a positive number. The binary point is assumed to lie immediately after the sign bit. Thus,

Mantissa part: .100101001
Exponent part: 101011. In excess-32 notation, 32 has already been added, so
subtracting 100000 gives 001011 (i.e., 11 in decimal, so the exponent part is 2^11).
The number is N
    = +(.100101001)_2 * 2^11
    = +(10010100100.0)_2
    = +(1188)_10








Copyright © 2024 www.includehelp.com. All rights reserved.