Data Representation and Number Systems

Objective

At the end of this chapter, you should be able to:

Represent non-negative whole numbers in different bases (radices).
Convert non-negative whole numbers from one base to another base.
Represent negative whole numbers in different representations: sign-and-magnitude, 1s complement, 2s complement and excess-N.
Represent real numbers in fixed-point representations.
Represent real numbers in floating-point representations, mainly the single-precision floating-point representation.

Positive Integer

Decimal Number Systems

The decimal number system is our usual number system¹. It is a weighted-positional number system. In other words, the actual value of a symbol depends on its position in the number.

It is used because we have ten fingers (decimal means proceeding by 10). In fact, 10 is a magic number here. Decimal uses 10 symbols called digits: 0, 1, 2, 3, 4, 5, 6, 7, 8 and 9. Each position has a weight that is a power of 10. This magic number 10 is called the base or the radix.

Consider the number 7594.36. To get the value of the number, we simply multiply each digit by its weight and add up the results. The weight is simply the base (i.e., 10) to the power of the position. Like a good programmer indexing an array, we give the first digit to the left of the decimal point (i.e., the dot .) position 0. From here, moving to the left increments the position by 1 and moving to the right decrements the position by 1. And yes, we can go into negative positions.

Weighted Positional Number System

(7594.36)₁₀

= (7 × 10³) + (5 × 10²) + (9 × 10¹) + (4 × 10⁰) + (3 × 10⁻¹) + (6 × 10⁻²)

= 7000 + 500 + 90 + 4 + 0.3 + 0.06

= 7594.36
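
The expansion above can be sketched in Python. This is only an illustration and not part of the original notes: the function name `positional_value` and the choice to pass the digits as two separate lists are assumptions, and `Fraction` is used so that the fractional digits are summed exactly instead of with floating-point rounding.

```python
from fractions import Fraction


def positional_value(int_digits, frac_digits, base):
    """Sum digit * base**position over all digits (hypothetical helper)."""
    total = Fraction(0)
    # Integer part: the rightmost digit has position 0, increasing leftwards.
    for position, digit in enumerate(reversed(int_digits)):
        total += digit * Fraction(base) ** position
    # Fractional part: the first digit after the point has position -1.
    for offset, digit in enumerate(frac_digits, start=1):
        total += digit * Fraction(base) ** (-offset)
    return total


value = positional_value([7, 5, 9, 4], [3, 6], 10)
print(float(value))
```

Passing a different `base` to the same function evaluates the same digit positions with different weights.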

Don't worry too much if the above exercise seems futile because the value we end with is exactly the same as the value we started with. It will make more sense after we discuss other bases. As a convention, we write the base as a subscript after the number (e.g., (7594.36)₁₀). If there is no subscript, we assume that the number is in base 10 unless the base is clear from the context (e.g., if we are talking about binary, then it makes more sense that the number we give is binary even without a subscript).

Other Number Systems

We can state other base systems using two sets of values: the weights, and the set of symbols together with the mapping from each symbol to a base 10 number. The mapping is to base 10 numbers because that is our usual number system.

Typically, the number of symbols matches the base number. However, this may not always be the case. Fortunately, for our purpose, the number of symbols will match the base number. As such, for simplicity, instead of specifying the mapping, we will simply list the symbols in increasing order. Each successive symbol is one more than the previous symbol. By definition, 0 is always 0.

Weight	Name	Symbols
2	Binary	Binary digits (bits): 0, 1
8	Octal	Octal digits: 0, 1, 2, 3, 4, 5, 6, 7
16	Hexadecimal	Hexadecimal digits: 0, 1, 2, 3, 4, 5, 6, 7, 8, 9, A, B, C, D, E, F

In fact, we can define an arbitrary base R where the weight is R. To simplify the naming, we simply call these numbers base R numbers or radix R numbers.
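
As a rough sketch of how a base R number maps back to base 10, the Python snippet below reads a string of symbols from left to right. The names `DIGITS` and `to_decimal` are illustrative assumptions, and the symbol order follows the table above (each successive symbol is one more than the previous). Multiplying the running value by the base before adding each digit is equivalent to summing each digit times its weight.

```python
DIGITS = "0123456789ABCDEF"  # symbols in increasing order, enough for bases 2..16


def to_decimal(text, base):
    """Interpret `text` as a non-negative whole number in the given base."""
    value = 0
    for symbol in text:
        digit = DIGITS.index(symbol)  # raises ValueError for unknown symbols
        if digit >= base:
            raise ValueError(f"symbol {symbol!r} is not valid in base {base}")
        # Shift the digits seen so far one position left, then add the new digit.
        value = value * base + digit
    return value


print(to_decimal("10100", 2), to_decimal("32", 8), to_decimal("32", 16))
```

Python's built-in `int(text, base)` performs the same conversion, so `int("32", 8)` can be used as a cross-check.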

Encoding

We can represent numbers in certain bases in some programming languages and/or software.

In C:

Octal: Prefix 0. e.g., 032 represents the octal number (32)₈.
Hexadecimal: Prefix 0x. e.g., 0x32 represents the hexadecimal number (32)₁₆.
Binary: Prefix 0b. e.g., 0b10100 represents the binary number (10100)₂.

Non-Standard

Some of these conventions may not be available in all compilers. In particular, the 0b binary prefix was not part of standard C before C23. However, the two most common compilers, GCC and Clang, support all of these.

In the MIPS simulators that we will use, hexadecimal is represented with the prefix 0x, e.g., 0x100 represents the hexadecimal number (100)₁₆.

The following values are all the same in Verilog:

8'b11110000: Binary number (11110000)₂.
8'hF0: Hexadecimal number (F0)₁₆.
8'd240: Decimal number (240)₁₀.
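
That these three notations denote the same value can be checked quickly. The sketch below uses Python rather than Verilog, purely for illustration; Python happens to accept the same 0b and 0x prefixes, and the variable names are assumptions.

```python
# The same value written in binary, hexadecimal and decimal.
binary_value = 0b11110000
hex_value = 0xF0
decimal_value = 240

all_equal = binary_value == hex_value == decimal_value
print(all_equal)

# Converting the value back to text in binary and hexadecimal.
print(bin(decimal_value), hex(decimal_value))
```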

Non-Standard Number System

For our purpose, the base/radix will be an integer of at least 2.

Base 1 and Non-Integer Base

In theory, there is a base 1 number system but using our discussion method, it is a corner case and will be very hard to describe.

The main problem with this number system is that we can only have a single symbol. Since we also assume that 0 means the same thing in every number system, that symbol has to be 0. But leading 0s are conventionally ignored, so we are now in a conundrum. In a base 1 number system, we want to have the following "numbers":

(0)₁ = 0
(1)₁ = 00
(2)₁ = 000
(3)₁ = 0000
(4)₁ = 00000

But alas, they are now all 0. The typical representation chosen is to use the length of a string.

(0)₁ = ""
(1)₁ = "0"
(2)₁ = "00"
(3)₁ = "000"
(4)₁ = "0000"
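
The string-length representation above can be sketched in a couple of lines of Python. The function names `to_unary` and `from_unary` are hypothetical and only meant to illustrate the idea.

```python
def to_unary(n):
    """Encode a non-negative whole number as a string of n zeros."""
    if n < 0:
        raise ValueError("only non-negative whole numbers are supported")
    return "0" * n


def from_unary(text):
    """Decode by counting symbols; the value is just the string length."""
    return len(text)


print([to_unary(n) for n in range(5)])
```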

In theory, we can also have a non-integer base such as base √2. In such cases, we will need to specify two values:

The base/radix.
The number of symbols.

Let's assume that we are using 2 symbols: 0 and 1. Then, we can translate (100010101)_√2 as usual:

(100010101)_√2

= (1 × (√2)⁸) + (1 × (√2)⁴) + (1 × (√2)²) + (1 × (√2)⁰)

= 16 + 4 + 2 + 1

= 23
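
A minimal sketch of this evaluation in Python is shown below. To avoid floating-point rounding of √2, it keeps the result as a pair (a, b) meaning a + b√2; the function name `sqrt2_value` and this pair representation are assumptions made for illustration. It relies on the fact that (√2)^i equals 2^(i/2) for even i and 2^((i-1)/2) × √2 for odd i.

```python
def sqrt2_value(bits):
    """Return (a, b) such that the base-sqrt(2) value of `bits` is a + b*sqrt(2)."""
    whole = 0      # contribution of the even positions
    root_part = 0  # coefficient of sqrt(2), from the odd positions
    for position, bit in enumerate(reversed(bits)):
        if bit == "1":
            weight = 2 ** (position // 2)
            if position % 2 == 0:
                whole += weight
            else:
                root_part += weight
    return whole, root_part


print(sqrt2_value("100010101"))
```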

As a general abstraction, given a base \(b\), the value of a whole number whose digit at position \(i\) is \(c_i\) is simply the following formula:

\(\sum^{\infty}_{i=0} c_i \times b^i\)

Quick Quiz

Question

Can you find a simple way to convert from base √2 to base 2? You may assume that the number in base √2 is rational.

Answer

Simply remove all the bits in odd-numbered positions. Assuming the number is rational, it will not contain multiples of √2. So, these positions will be 0.
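
The answer can be sketched in Python as follows. The function name `sqrt2_to_binary` is hypothetical, positions are counted from 0 at the rightmost bit, and the sketch assumes a whole number written with the symbols 0 and 1 only.

```python
def sqrt2_to_binary(bits):
    """Convert a rational base-sqrt(2) bit string to base 2."""
    from_right = bits[::-1]            # index i now holds the bit at position i
    if "1" in from_right[1::2]:        # odd positions carry multiples of sqrt(2)
        raise ValueError("number is not rational")
    return from_right[0::2][::-1]      # keep even positions, restore the order


binary = sqrt2_to_binary("100010101")
print(binary, int(binary, 2))
```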

Negative Integer

Negative values pose a unique problem. To get a glimpse of the problem, note that the symbol used to mark a negative number (e.g., in -123, the symbol is -) is not one of the numeric symbols (i.e., [0-9]). This is going to be a problem when the only symbols we have are numeric symbols¹. As you will see later, the common solutions to the representation of negative numbers fix the number of bits.

Each negative number representation that we will discuss solves a certain problem. In particular, each makes certain operations easier than others. As such, you should choose the representation that is best suited to your problem.

Real Numbers

Unfortunately, we cannot represent all real numbers. For example, the number \(\pi\) has infinitely many non-repeating digits. Since a computer is a finite machine, we can never represent such an infinite sequence of non-repeating digits².

So any representation of real numbers is by design an approximation. Again, there are several possible low-level representations, each with advantages and disadvantages. Choose whichever suits your current problem.
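
The approximation can be observed directly. The sketch below assumes that Python floats are IEEE 754 double-precision values (true for CPython on common platforms) and shows that the stored value of \(\pi\) is really a rational number with a power-of-two denominator. The variable names are illustrative.

```python
import math
from fractions import Fraction

# A float stores a finite binary fraction, so math.pi is only a rational
# approximation of pi. Fraction recovers the exact stored value.
approx_pi = Fraction(math.pi)
print(approx_pi)

denominator = approx_pi.denominator
is_power_of_two = (denominator & (denominator - 1)) == 0
print(is_power_of_two)

# Even simple decimal fractions are approximated, so this comparison fails.
sum_matches = (0.1 + 0.2 == 0.3)
print(sum_matches)
```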

¹ Or rather, symbols that can be interpreted as numeric symbols. Remember, a computer only recognises two symbols: 0 and 1. Both of these can be interpreted as numeric.
² If you think about it, there could be such a way: why not just represent the number symbolically, like a circle ○? The problem with this is that even mathematics is known to be incomplete. In fact, some numbers are known to be non-computable. So yes, unfortunately there is a limit to what we can represent.