Mutations
Lab 3 (CS1101 AY2008/9 Semester 1)
This lab assignment is graded.
Date of release: 30 September 2008, Tuesday, 23:59.
Submission deadline: 8 October 2008, Wednesday, 23:59.
School of Computing, National University of Singapore

0 Introduction

This is your third graded lab. The fourth lab will be released on 14 October and the fifth and final lab will be released on 28th October.

This lab requires you to do two exercises. You are advised to review the material covered in chapters 1 through 5, and sections 10.1 through 10.6, and read Programming Style, Design and Marking Guidelines before attempting this lab assignment. You should not use syntax or constructs not covered in lectures; you shouldn't need to and may even be penalized for doing so (if the objective of the assignment is undermined). Unless a template is given, you should start each program afresh (ie. don't cut and paste and modify) so that you get sufficient practice with writing a complete Java program.

A word of advice: Code incrementally. Don't try to finish all parts of a program then compile it. Write your program in bits and pieces and compile frequently. Try to maintain a compilable program even while you are working on it. Submitting a compilable program that partially works is better than submitting an un-compilable one. This last point especially applies to the programming exams. Seriously, code incrementally.

Note that:

For more information on CourseMarker, please refer to CS1101X lab page.

For this lab, the number of submissions is set to 12. For subsequent labs, it may be reduced. Only your last submission will be graded.

If you have any questions, you may post your queries in the IVLE forum of your respective lecture group. Do not post your programs (partial or complete) in the forum before the deadline!


1 Exercise 1: Mutations (50%)

1.1 Learning objectives

1.2 Task

Deoxyribonucleic acid (DNA) is a nucleic acid that contains the genetic instructions used in the development and functioning of all known living organisms and some viruses. The main role of DNA molecules is the long-term storage of information. DNA is often compared to a set of blueprints or a recipe, since it contains the instructions needed to construct other components of cells, such as proteins and RNA molecules. The DNA segments that carry this genetic information are called genes, but other DNA sequences have structural purposes, or are involved in regulating the use of this genetic information. (Wikipedia).

During cell reproduction or under exposure to certain effects (such as ultraviolet light), mutations can occur to a segment of DNA. While most mutations can be disastrous (causing cancer), advantageous mutations can occur which results in beneficial evolutionary changes.

In this exercise we want to simulate the effects of mutations on a DNA string (with at most 1000 characters). For simplicity we will only consider a selected subset of chromosomal mutations. We would simulate the following (simplified) mutations:

Amplification (amp) : this inserts a duplicate copy of the specified region beside it;
Deletion (del) : this deletes the selected region;
Inversion (inv) : this reverses the orientation of the specified region.

The first line of the input consists of a string (with at most 1000 characters) that denotes the DNA sequence. The second line consists of a positive integer n denoting the number of mutations. The next n lines specify the mutations, each on a single line.

Each line representing a mutation consists of (1) the mutation command (which can be amp, del, or inv), (2) the start index (inclusive), and (3) the end index (inclusive). The first character in the DNA sequence has index 0. All commands are in lower case.

For example "amp 2 4" on "AGCTAGATT" results in "AGCTACTAGATT" as CTA is duplicated. The duplicated part is marked in blue for your reference.

The following is always true about the input and the DNA at every point of time:

The output consists of a single line containing the modified string.

You are not allowed to use String operations to solve the problem, see 1.5 Important notes. You are also not allowed to use the StringBuffer class at all. To be more specific, to perform operations, you should be working on a character array.

1.3 Sample run

Sample run using interactive input (user's input shown in blue; output shown in bold purple):

$ javac Mutations.java
$ java Mutations
AAAGGCCTTAGCTTAGATTACGATCG
1
amp 2 7
AAAGGCCTAGGCCTTAGCTTAGATTACGATCG

Another sample run is shown below

$ javac Mutations.java
$ java Mutations
AAAGGCCTTAGCTTAGATTACGATCG
2
amp 2 7
del 10 12
AAAGGCCTAGTTAGCTTAGATTACGATCG

Another sample run is shown below

$ javac Mutations.java
$ java Mutations
AAAGGCCTTAGCTTAGATTACGATCG
3
amp 2 7
del 10 12
del 0 2
GGCCTAGTTAGCTTAGATTACGATCG

Another sample run is shown below

$ javac Mutations.java
$ java Mutations
GGCCTTAGCTTAGATTACGATCG
1
inv 1 5
GTTCCGAGCTTAGATTACGATCG

You can try your own sample runs here:

1.4 Submission

Submit your program through CourseMarker.

1.5 Important notes

Strings are immutable - this means that once a String object has been created, it cannot be changed. When we perform string concatenation, we are actually creating a new string.

For example, if we have a loop as shown

String s = "ABCDE";
String output = "";
for (int i = 0; i < s.length; i++)
{
   output += "" + s.charAt(i) + s.charAt(i); 
}
System.out.println(output);

We are actually creating multiple new String objects at each iteration of the loop (at the bolded line)! This is very memory inefficient, especially for very long strings. The length of human DNA is about 3 billion base pairs (that is a very long string).


2 Exercise 2: Cache (50%)

2.1 Learning objectives

2.2 Task

In Computer Architecture, a cache is a collection of data duplicating some original values in the computer memory, where the original data is expensive to fetch (owing to longer access time) compared to the cost of reading the cache. In other words, a cache is a temporary storage area where frequently accessed data can be stored for rapid access.

An analogy of a cache would be a librarian. When requested for some books, a librarian would have to walk to the shelves to pick up the books. However, for frequently requested books, the librarian may put them into a bag she carries so that she can quickly produce them on request.

A computer typically has multiple levels of memory caches. However, to keep things simple (as we always do), we will only consider a computer with 1 level of cache to supplement the main memory. The main memory is equivalent to the library shelves, while the cache is like the librarian's bag.

When we look up the cache for some data and find it there, it is called a cache hit. Otherwise, it is called a cache miss. A cache hit requires only 20 nanoseconds. However in the event of a cache miss, since we have to copy the data from the main memory into the cache before we can read it from the cache, it would require 100 nanoseconds.

We assume the cache is initially empty. When the cache is full and we need to bring in some data from the main memory into the cache, we have to decide which existing item in the cache is to be replaced by the incoming item. We shall replace the least recently used item in the cache in this case.

We shall use an example below to illustrate the states of a cache. We assume that the cache can hold 8 items.

Initial state of the cache
Cache:          [1][5][9][8][2][7][12][13]
Timestamp:   [0][7][4][1][5][6][  3][  2]
Current time : 8

The cache array contains the items in each slot of the cache. The timestamp array denotes the last time a particular item was used. For example, item "1" (at index 0) was used at the 0th unit of time, and item "5" was used at the 7th unit of time, and so on. The current time is 8 now. If now the operating system (OS) requests for item "1", we see that there is a cache hit, and consequently, item 1 can be retrieved from the cache in 20 nanoseconds. We also update the timestamp of item "1" to 8 and increase the current time by 1 unit. See the new state of the cache below.

Cache after retrieving item 1.
Cache:          [1][5][9][8][2][7][12][13]
Timestamp:   [8][7][4][1][5][6][  3][  2]
Current time : 9

Now (at time 9), if the OS requests for item "14", we see that there is a cache miss. In this case, the OS has to fetch the data from the memory and put it into the cache which takes 100ns. Since the cache is already full, we replace the LEAST RECENTLY USED item with the newly retrieved item from the memory. In this case the least recently used item is item 8 with a timestamp of 1. See the new state of cache below.

Cache after retrieving item 14.
Cache:          [1][5][9][14][2][7][12][13]
Timestamp:   [8][7][4][  9][5][6][  3][  2]
Current time : 10

In this exercise, we want to simulate memory access on a cache with 8 slots, and a main memory of arbitrary large size. You may assume that all requested items are in the main memory and the initial state of the cache is all empty. The input is a series of LOAD commands (e.g. LOAD 12), with the last input being a STOP command. All commands are in upper case. Your output should be a statement denoting the total time required for all operations. Recall that a cache hit takes 20ns, and a cache miss takes 100ns.

The following is always true about this exercise:

2.3 Sample run

Sample run using interactive input (user's input shown in blue; output shown in bold purple):

$ javac Cache.java
$ java Cache
LOAD 19
LOAD 21
LOAD 3
LOAD 10
LOAD 7
STOP
Total time: 500ns.

Another sample run is shown below:

$ javac Cache.java
$ java Cache
LOAD 100
LOAD 300
LOAD 200
LOAD 300
LOAD 100
STOP
Total time: 340ns.

Another sample run is shown below:

$ javac Cache.java
$ java Cache
LOAD 3
LOAD 51
LOAD 24
LOAD 12
LOAD 3
LOAD 7
LOAD 51
LOAD 8
LOAD 90
LOAD 10
LOAD 5
LOAD 24
STOP
Total time: 1040ns.

You can try your own sample runs here:

2.4 Submission

Submit your program through CourseMarker.

2.5 Important notes

2.6 For discussion


3 Deadline

The deadline for handing in all programs is 8 October 2008, Wednesday, 23:59. Late submissions will NOT be accepted.




Mon Oct 6 12:01:00 SGT 2008