## NATIONAL UNIVERSITY OF SINGAPORE

## CS2100 - COMPUTER ORGANISATION

(Semester 2: AY2020/21)

Time Allowed: 2 Hours

## INSTRUCTIONS TO CANDIDATES

1. This assessment paper consists of SIXTEEN (16) questions (excluding question 0) in THREE (3) parts and comprises ELEVEN (11) printed pages.
2. Answer ALL questions.
3. This is an OPEN BOOK assessment.
4. The maximum mark of this assessment is 100.

| Question | Max. mark |
| :---: | :---: |
| Q0 | 3 |
| Part A: Q1-6 | 12 |
| Part B: Q7 -11 | 15 |
| Part C: Q12 | 12 |
| Part C: Q13 | 13 |
| Part C: Q14 | 13 |
| Part C: Q15 | 18 |
| Part C: Q16 | 14 |
| Total | $\mathbf{1 0 0}$ |

5. You are to submit a single pdf file (size $\leq 20 \mathrm{MB}$ ) to your submission folder on LumiNUS.
6. Your submitted file should be named after your Student Number (eg: A1234567X.pdf) and your Student Number should be written on the first page of your file.
7. Do NOT write your name anywhere in your submitted file.

## You do not need to write any answer for question 0.

0 . (a) Submit a single pdf file into the submission folder.
(b) Name your pdf file with your Student Number (eg: A1234567X.pdf).
(c) Write your Student Number (not your name!) on the first page of your file.

## Part A: Multiple-Choice Questions [Total: 6×2=12 marks]

Each multiple-choice question ( MCQ ) is worth TWO marks and has exactly one correct answer. You may write your answers into the boxes in the Answer Sheets provided or if you are using your own paper, write your answers on a single line to conserve space. For example:

1. A
2. B
3. C
4. D

Please write in CAPITAL LETTERS.

1. Given the Boolean function $\boldsymbol{F}(\mathbf{A}, \mathbf{B}, \mathbf{C}, \mathbf{D})=\Sigma \boldsymbol{\Sigma}(\mathbf{0}, \mathbf{1}, \mathbf{2}, \mathbf{7}, \mathbf{1 5})+\Sigma x(\mathbf{3}, \mathbf{9}, \mathbf{1 1}, \mathbf{1 2}, \mathbf{1 3})$ where $x$ denotes don't-care, how many prime implicants (PIs) are there on the K-map of $F$ ?
A. 3
B. 4
C. 5
D. 6
E. None of the above.
2. Given the same Boolean function in question 1 above, how many essential prime implicants (EPIs) are there on the K-map of $F$ ?
A. 0
B. 1
C. 2
D. 3
E. None of the above.
3. A certain machine has 3 types of instructions: $A, B$ and $C$, which have opcode of 3 bits, 5 bits, and 7 bits respectively. Assuming that each type must have at least one instruction, and the encoding space for opcode is completely utilized, what is the maximum total number of instructions using an expanding opcode scheme?
A. 98
B. 108
C. 110
D. 120
E. 128
4. Study the MIPS code below. $\$ \mathrm{vO}$ contains a 7-bit value in its least significant bits; the rest of its bits are zeroes.

| addi \$s0, \$zero, 0 |  |  |
| :---: | :---: | :---: |
| addi \$t0, \$v0, 0 |  |  |
| addi \$t1, \$zero, 0 |  |  |
| Loop: andi \$t2, \$t0, 1 |  |  |
| xor \$t1, \$t1, \$t2 |  |  |
| addi \$s0, \$s0, 1 |  |  |
| s7ti \$t3, \$s0, 7 |  |  |
| beq \$t3, \$zero, out |  |  |
| sr1 \$t0, \$t0, 1 |  |  |
| j Loop |  |  |
| Out: | s11 | \$t1, \$t1, 7 |
|  | or | \$v0, \$v0, \$t1 |

What does the code do?
A. Writes the number of 1 's in the value in $\$ v 0$ to bit 7 of $\$ v 0$.
B. Writes 0 to bit 0 of $\$ v 0$ if there are odd numbers of 1 's in the value in $\$ v 0$, otherwise writes 1 to bit 0 of $\$ \mathrm{v} 0$.
C. Writes 0 to bit 0 of $\$ v 0$ if there are even numbers of 1 's in the value in $\$ v 0$, otherwise writes 1 to bit 0 of $\$ \mathrm{v} 0$.
D. Writes 0 to bit 7 of $\$ v 0$ if there are odd numbers of 1 's in the value in $\$ v 0$, otherwise writes 1 to bit 7 of $\$ \mathrm{v} 0$.
E. Writes 0 to bit 7 of $\$ v 0$ if there are even numbers of 1 's in the value in $\$ v 0$, otherwise writes 1 to bit 7 of $\$ v 0$.
5. How many instructions are executed by the code in question 4 when it completes?
A. 12
B. 47
C. 52
D. 54
E. 59
6. Given the Boolean expression below:

$$
A \cdot B^{\prime} \cdot\left(C+D^{\prime}+E\right)^{\prime} \cdot F+\left(C^{\prime}+F^{\prime}\right)^{\prime}+D \cdot F
$$

Which of the following is equivalent to the above expression?
A. $F$
B. $D \cdot F$
C. $\left(C^{\prime}+D^{\prime}\right) \cdot F$
D. $(C+D) \cdot F$
E. None of the above.

## Part B: Multiple-Response Questions [Total: $5 \times 3=15$ marks]

Each multiple-response question (MRQ) is worth THREE marks and may have one answer or multiple answers. Write out all correct answers. For example, if you think that A, B, C are the correct answers, write A, B, C.

Only if you get all the answers correct will you be awarded three marks. No partial credit will be given for partially correct answers.
You may write your answers into the boxes in the Answer Sheets provided or if you are using your own paper, write your answers on a single line to conserve space. For example:
7. $A, B$
8. B,D
9. C
10. $A, B, C, D$
11. B,D,C

Please write in CAPITAL LETTERS.
7. Yucan wrote the value 66512 in his CS2100 exam, but did not specify the base. Choose from the list below all possible interpretations of this value.
A. The 5-digit 7's complement of 1557 .
B. The 5 -digit 6 's complement of $331566_{6}$.
C. $66512{ }_{10}$.
D. $-33488_{10}$ in 5 -digit 10 's complement of base 10.
E. $-22377_{10}$ in 5 -digit 9 's complement of base 9 .
8. Which of the following statements are true about the MIPS datapath?
A. The register file can only read one register at a time.
B. The register file can only write to one register at a time.
C. A jump instruction (j) may not always be possible to jump to another instruction that is 16 instructions away from it.
D. The target address of a branch instruction is given by PC + Immed x 4 .
E. In the pipelined system, the PC contains the address of the instruction that is currently in the WB stage.
9. Which of the following MIPS instructions may be used to set the value of register $\mathbf{\$ t} \mathbf{t}$ to zero?
A. and \$t2, \$zero, \$zero
B. andi \$t2, \$t2, 0
C. xor $\$ t 2, \$ t 2, \$ t 2$
D. xori $\$ t 2, \$ t 2,0$
E. nor \$t2, \$zero, \$zero
10. If the datapath control generates the wrong signal for MemToReg, that is, it generates 0 when it should be 1 , and 1 when it should be 0 , which of the following instructions will be wrongly executed?
A. bne
B. sw
C. Iw
D. add
E. ori
11. Assuming that logical constants 0 and 1 are available but complemented literals are not, which of the following functions can be implemented using a single 4-bit magnitude comparator with no additional logic gates?
A. $F 1(A, B, C, D)=A \cdot B \cdot C$
B. $F 2(A, B, C, D)=\Sigma m(8,10,12,14)$
C. $F 3(A, B, C, D)=\Sigma m(0,15)$
D. $F 4(A, B, C, D)=\Sigma m(1,2,3,4)$

## Part C: There are 5 questions in this part [Total: 70 marks]

## Q12. Sequential circuits [12 marks]

Study the following sequential circuit with three JK flip-flops and an external input $x$. The states are represented by $A B C$. This circuit implements a machine with 5 valid states $A B C=000$ to 100 (or 0 to 4 in decimal). The other 3 states ( $A B C=101$ to 111, or 5 to 7 in decimal) are unused.

(a) Complete the state diagram below by drawing the arrows and labelling each arrow with the value of $x$. States are shown in decimal values. You need only fill in the transitions to the used states $0,1,2,3$ and 4 . You do not need to include the unused states 5,6 and 7 in your diagram. The transition from state 4 has been drawn for you.

(b) If you did your state table correctly, you would see that state 4 is unreachable from any of the other used states $0,1,2$ and 3 . Redesign the above sequential circuit by removing state 4, using two D flip-flops to replace the 3 JK flip-flops. Call you flip flops $F$ and $G$. Your circuit now has 4 used states $F G=00$ to 11 (or 0 to 3 in decimal). What are your flip-flop inputs $\boldsymbol{D F}$ and $\boldsymbol{D G}$ ?

## Q13. Combinational circuits [13 marks]

Note that logical constants 0 and 1 are available, but not complemented literals.
(a) Given the following Boolean function

$$
F(A, B, C, D)=\Sigma m(4,6,7,9,11,12,14,15)
$$

implement the function using a single 8:1 multiplexer as shown on the right without any additional logic gates. Once you have used a variable on a selector line, you should not use the same variable on the multiplexer inputs.

(b) Given the same Boolean function in part (a) above, implement the function using a single 4:1 multiplexer as shown on the right without any additional logic gates. Once you have used a variable on a selector line, you should not use the same variable on the multiplexer inputs.

(c) A Boolean function $G(A, B, C)$ is implemented using a $3 \times 8$ decoder with 1-enable and active high outputs, 2 XNOR gates and an AND gate as shown on the right. Replace this circuit with a single logic gate.

(d) The F-block contains two half adders and a full adder as shown below. Using a single F-block, without any additional logic gates, design a circuit that takes in a 3-bit unsigned binary number $X=X_{2} X_{1} X_{0}$ and a one-bit value $y$ to generate the output $Z_{3} Z_{2} Z_{1} Z_{0}$ which is a binary number representing the value $X+5 y$. You may only connect inputs to the 6 inwards arrows and use the 5 outwards arrows for your outputs.


Page 7 of 11

## Q14. MIPS [13 marks]

Study the following MIPS code on integer arrays $A$ and $B$ which contain the same number of elements. The following are the variable mappings:

- $\$ \mathrm{~s} 0=$ base address of array $A$
- \$s1 = base address of array $B$
- $\$$ s2 = size (number of elements in array $A$ )
- $\$ \mathrm{~s} 5$ = count

```
.data
A: .word 10,21,12,17,9,1,20,33
B: .word 100,3,20,15,2,2,65,11
size: .word 8
.text
main: la $s0, A # $s0 is the base address of array A
            la $s1, B # $sl is the base address of array B
                                # Box 1
#
    add $s5, $0, $0 # I1
    add $t0, $0, $0 # I2
loop: slt $t8, $t0, $s2 # I3
    beq $t8, $0, end # I4
    sll $t1, $t0, 2 # I5
    add $t3, $t1, $s0 # I6
    lw $s3, 0($t3) # I7
    andi $t9, $s3, 1 # I8
    beq $t9, $0, skip # I9
    add $t4, $t1, $s1 # I10
    lw $s4, 0($t4) # I11
    sub $s3, $s3, $s4 # I12
    sw $s3, 0($t3) # I13
    addi $s5, $s5, 1 # I14
skip: addi $t0, $t0, 1 # I15
    j loop # I16
# --------------------------------
luscall # system call to print
```

Q14. (continue...)
Note that pseudo-instructions la and li are allowed, but not other pseudo-instructions.
(a) Fill in Box 1 with MIPS instruction(s) to load the value of size into $\mathbf{\$} \mathbf{s} \mathbf{2}$.
(b) Fill in Box 2 with MIPS instruction(s) to prepare for the printing of the value of count (\$s5). [2]
(c) Using the variable names ( $A, B$, size, count) shown in the variable mappings above, write an equivalent $C$ code that corresponds to instructions 11 to 116 in the above MIPS code. You may use additional variable(s) if needed. You do not need to declare the variables in your C code.
(d) Write the instruction encoding of instruction I3 (slt \$t8, \$t0, \$s2). Write your answer in hexadecimal.
(e) Write the instruction encoding of instruction I9 (beq \$t9, \$0, skip). Write your answer in hexadecimal.
(f) Write the instruction encoding of instruction I16 (j loop), assuming that instruction I1 (add $\$ 55, \$ 0, \$ 0$ ) is stored at address $0 \times 10000 \mathrm{C} 50$. Write your answer in hexadecimal.

## Q15. Cache [18 marks]

Refer to the following MIPS code which is the same as the one in question 14. Here, we only look at instructions I1 to I16.

|  | add | \$s5, | \$0, \$0 | \# | I1 |
| :---: | :---: | :---: | :---: | :---: | :---: |
|  | add | \$t0, | \$0, \$0 | \# | I2 |
| loop: | slt | \$t8, | \$t0, \$s2 | \# | I3 |
|  | beq | \$t8, | \$0, end | \# | I4 |
|  | sll | \$t1, | \$t0, 2 | \# | I5 |
|  | add | \$t3, | \$t1, \$s0 | \# | I6 |
|  | lw | \$s3, | 0 (\$t3) | \# | I7 |
|  | andi | \$t9, | \$s3, 1 | \# | I8 |
|  | beq | \$t9, | \$0, skip | \# | I9 |
|  | add | \$t4, | \$t1, \$s1 | \# | I10 |
|  | lw | \$s4, | 0 (\$t4) | \# | I11 |
|  | sub | \$s3, | \$s3, \$s4 | \# | I12 |
|  | sw | \$s3, | 0 (\$t3) |  | I13 |
|  | addi | \$s5, | \$s5, 1 |  | I14 |
| skip: | addi | \$t0, | \$t0, 1 |  | I15 |
|  | j | loop |  | \# | I16 |
| end: |  |  |  |  |  |

Q15. (continue...)
For parts (a) to (d): You are given a direct-mapped data cache with 128 words in total and each block contains 4 words. You may assume the following:

- The data cache is used for lw instructions but not for sw instructions.
- Array $A$ starts at address $0 \times 5 F 000 C 00$ and all elements in array $A$ are positive odd integers.
- Array $B$ follows immediately after array $A$ in the memory. That is, if the last element of array $A$ is at address $x$, then the first element of array $B$ is at address $x+4$.
(a) How many bits are there in the index field? In the byte offset field?
(b) Given that array $A$ contains $\mathbf{3 2 0 0}$ elements and array $B$ contains the same number of elements, what is the hit rate of the data cache (i) for array $A$ and (ii) for array $B$ ? Note that you need to consider only the lw instructions but not the sw instructions.
(c) Given that array $A$ contains 3216 elements and array $B$ contains the same number of elements, what is the hit rate of the data cache (i) for array $A$ and (ii) for array $B$ ? Note that you need to consider only the Iw instructions but not the sw instructions.
(d) Given that array $A$ contains 3210 elements and array $B$ contains the same number of elements, what is the hit rate of the data cache (i) for array $A$ and (ii) for array $B$ ? Note that you need to consider only the lw instructions but not the sw instructions.

For parts (e) to (g): You are given a 2-way set-associative instruction cache with 16 words in total and each block contains 2 words, with LRU replacement policy. You may assume the following:

- There are 100 elements in array $A$ and the same number of elements in array $B$.
- All elements in array $A$ are positive odd integers.
(e) How many bits are there in the set index field? In the byte offset field?
(f) Assuming that instruction I1 (add \$s5, \$0, \$0) is stored at address 0x10000C50, how many hits in total are there in the instruction cache in the execution of the code? Consider only instructions I1 to I16.
(g) Assuming that instruction I1 (add $\$ \mathrm{~s} 5, \$ 0$, \$0) is stored at address 0x10000C54, how many hits in total are there in the instruction cache in the execution of the code? Consider only instructions I1 to I16.


## Q16. Pipelining [14 marks]

Refer to the following MIPS code which is the same as the one in question 14. Here, we only look at instructions I1 to I16.

| loop: | add | \$s5, | \$0, \$0 | \# | I1 |
| :---: | :---: | :---: | :---: | :---: | :---: |
|  | add | \$t0, | \$0, \$0 | \# | I2 |
|  | slt | \$t8, | \$t0, \$s2 | \# | I3 |
|  | beq | \$t8, | \$0, end | \# | I4 |
|  | sll | \$t1, | \$t0, 2 | \# | I5 |
|  | add | \$t3, | \$t1, \$s0 | \# | I6 |
|  | lw | \$s3, | 0 (\$t3) | \# | I7 |
|  | andi | \$t9, | \$s3, 1 | \# | I8 |
|  | beq | \$t9, | \$0, skip | \# | I9 |
|  | add | \$t4, | \$t1, \$s1 | \# | I10 |
|  | 1 w | \$s4, | 0 (\$t4) | \# | I11 |
|  | sub | \$s3, | \$s3, \$s4 | \# | I12 |
|  | Sw | \$s3, | 0 (\$t3) | \# | I13 |
| skip: | addi | \$s5, | \$s5, 1 | \# | I14 |
|  | addi | \$t0, | \$t0, 1 | \# | I15 |
|  | j | loop |  | \# | I16 |

Assuming a 5-stage MIPS pipeline and all elements in array $A$ are positive odd integers, answer the following questions. You need to count until the last stage of instruction I16.
(a) How many cycles does this code segment take to complete its execution in the first iteration (I1 to I16) in an ideal pipeline, that is, one with no delays?

For parts (b) to (d) below, given the assumption for each part, how many additional cycles does this code segment (I1 to I16) take to complete its execution in the first iteration as compared to an ideal pipeline? (For example, if part (a) takes 12 cycles and part (b) takes 20 cycles, you are to answer part (b) with the value 8 and not 20.)
(b) Assuming without forwarding and branch decision is made at MEM stage (stage 4). No branch prediction is made and no delayed branching is used.
(c) Assuming without forwarding and branch decision is made at ID stage (stage 2). No branch prediction is made and no delayed branching is used.
(d) Assuming with forwarding and branch decision is made at ID stage (stage 2). Branch prediction is made where the branch is predicted not taken, and no delayed branching is used.
(e) Assuming the setting in part (d) above and you are not allowed to modify any of the instructions, is it possible to reduce the additional delay cycles in part (d) by rearranging some instructions, and if possible, by how many cycles? Explain your answer. (Answer with no explanation will not be awarded any mark.)
=== END OF PAPER ===

