Floating Point Representation of Binary Numbers

In this tutorial, we will learn about the floating-point representation of binary numbers with the help of examples.

By Saurabh Gupta. Last updated: May 10, 2023

Prerequisite: Number systems

Binary representation of the floating-point numbers

Very small and very large numbers in the decimal number system are usually written in scientific notation: a number (the mantissa) multiplied by a power of 10. Examples are 6.27 * 10⁻²⁷ and 5.21 * 10³⁴. Binary numbers can be represented in the same way, as a mantissa multiplied by a power of 2. The exact format of this representation differs from machine to machine.

For example, a 16-bit machine may use 10 bits for the mantissa and 6 bits for the exponent, whereas a 24-bit machine may use 15 bits for the mantissa and 9 bits for the exponent.

The format of the 16-bit machine can be represented as:

Mantissa Part	Exponent Part
0110011010	101010

The mantissa is written in 2's complement form, so the MSB of the mantissa can be thought of as a sign bit. The binary point is assumed to lie immediately to the right of this sign bit. The 6 bits of the exponent can represent the values 0 to 63; to allow negative exponents as well, (32)₁₀, i.e. (100000)₂, is added to the desired exponent before it is stored.

Excess-32 Representation

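Storing the exponent with a fixed offset is a common way to represent floating-point numbers. In excess-32 notation, (32)₁₀ is added to the desired exponent, so the exponents -32 to +31 map onto the 6-bit values 0 to 63.

The following table illustrates the representation of the exponent part. The addition is carried out in 6 bits, so any carry out of the MSB is discarded.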

Desired Exponent	2's complement notation	Excess-32 notation (in 6 bits)	Binary representation
-32	100000	100000 +100000 = 000000	000000
-31	100001	100001 +100000 = 000001	000001
-30	100010	100010 +100000 = 000010	000010
-15	110001	110001 +100000 = 010001	010001
0	000000	000000 +100000 = 100000	100000
+1	000001	000001 +100000 = 100001	100001
+15	001111	001111 +100000 = 101111	101111
+30	011110	011110 +100000 = 111110	111110
+31	011111	011111 +100000 = 111111	111111
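The table rows can be reproduced with a few lines of Python. This is a minimal sketch, assuming a 6-bit exponent field as described above; the names `BIAS`, `encode_exponent` and `decode_exponent` are illustrative and do not come from the tutorial.

```python
BIAS = 32  # excess-32: the offset added to every exponent before storing it


def encode_exponent(exponent: int) -> str:
    """Return the 6-bit excess-32 field for an exponent in the range -32..31."""
    if not -32 <= exponent <= 31:
        raise ValueError("exponent does not fit in a 6-bit excess-32 field")
    return format(exponent + BIAS, "06b")


def decode_exponent(field: str) -> int:
    """Recover the true exponent from a 6-bit excess-32 field."""
    return int(field, 2) - BIAS


if __name__ == "__main__":
    # Print the same exponents that appear in the table.
    for e in (-32, -31, -30, -15, 0, 1, 15, 30, 31):
        print(f"{e:+4d} -> {encode_exponent(e)}")
```

Adding the offset to the signed integer directly gives the same result as the 6-bit 2's complement addition shown in the table, because the discarded carry is exactly 64.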

Consider again the 16-bit word shown earlier:

Mantissa Part	Exponent Part
0110011010	101010

The floating-point number stored in this format is decoded as follows.

The leftmost bit (MSB) is the sign bit '0', which indicates a positive number. The binary point is assumed to lie just after the sign bit. Thus,

In Mantissa Part: .110011010
In Exponent Part: 101010. In excess-32 notation, 32 has already been added, so
subtracting 100000 gives 001010 (i.e., 10 in decimal, so the exponent part is 2¹⁰).
The number is N
    = +(.110011010)₂ * 2¹⁰
    = +(1100110100.00)₂
    = +(820)₁₀
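The decoding steps can be sketched in Python as follows. This is an illustration under the assumptions stated in this tutorial (10-bit 2's complement mantissa with the binary point after the sign bit, 6-bit excess-32 exponent); the function name `decode_word` is hypothetical.

```python
def decode_word(bits: str) -> float:
    """Decode a 16-bit word: 10-bit 2's complement mantissa, 6-bit excess-32 exponent."""
    if len(bits) != 16 or set(bits) - {"0", "1"}:
        raise ValueError("expected a string of exactly 16 binary digits")

    mantissa_field, exponent_field = bits[:10], bits[10:]

    # Interpret the mantissa field as a signed 10-bit integer.
    raw = int(mantissa_field, 2)
    if mantissa_field[0] == "1":
        raw -= 1 << 10

    # The binary point sits right after the sign bit, leaving 9 fraction bits.
    mantissa = raw / (1 << 9)

    # Remove the excess-32 offset to get the true exponent.
    exponent = int(exponent_field, 2) - 32

    return mantissa * 2.0 ** exponent


if __name__ == "__main__":
    print(decode_word("0110011010101010"))  # the word decoded above
```

For the word above, the mantissa field is the integer 410, so the mantissa is 410/512 and the result is 410/512 * 2¹⁰ = 820.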

Example 1: Express the decimal number (45365.125)₁₀ as a 16-bit floating-point number.

Solution

Binary equivalent of (45365.125)₁₀: 1011000100110101.001
Binary format: .1011000100110101001 * 2¹⁶
Mantissa (truncated to the 9 available bits): + .101100010
Exponent: 010000 (the value of the exponent is 16)
Excess-32 exponent: 010000 + 100000 = 110000

Since the number is positive, a sign bit '0' is placed in the MSB of the mantissa.

So, the floating-point format will be 0101100010110000. Because the mantissa is truncated to 9 bits, this word holds only an approximation of the original number.
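The encoding procedure of this example can be sketched in Python. This is a minimal sketch, assuming positive inputs only and truncation (not rounding) of the mantissa, as in the worked solution; the name `encode_positive` is illustrative. It relies on the standard library function `math.frexp`, which splits a float into a fraction in [0.5, 1) and a power of 2.

```python
import math


def encode_positive(value: float) -> str:
    """Encode a positive number as sign bit + 9 mantissa bits + 6-bit excess-32 exponent."""
    if value <= 0:
        raise ValueError("this sketch handles positive numbers only")

    # frexp returns (fraction, exponent) with value == fraction * 2**exponent
    # and 0.5 <= fraction < 1, i.e. the binary form .1xxxx * 2^exponent.
    fraction, exponent = math.frexp(value)
    if not -32 <= exponent <= 31:
        raise ValueError("exponent does not fit in a 6-bit excess-32 field")

    # Keep the first 9 fraction bits and drop the rest (truncation).
    mantissa_bits = int(fraction * (1 << 9))

    sign = "0"  # positive number
    return sign + format(mantissa_bits, "09b") + format(exponent + 32, "06b")


if __name__ == "__main__":
    print(encode_positive(45365.125))
```

Decoding the resulting word gives .101100010 * 2¹⁶ = 45312 rather than 45365.125, which shows how much precision a 9-bit mantissa loses for a number of this size.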

Example 2: What floating-point number does the bit pattern 0100101001101011 represent?

Solution

The leftmost bit (MSB) is the sign bit '0', which indicates a positive number. The binary point is assumed to lie just after the sign bit. Thus,

In Mantissa Part: .100101001
In Exponent Part: 101011. In excess-32 notation, 32 has already been added, so
subtracting 100000 gives 001011 (i.e., 11 in decimal, so the exponent part is 2¹¹).
The number is N
    = +(.100101001)₂ * 2¹¹
    = +(10010100100.0)₂
    = +(1188)₁₀
