Lab #8 (AY2007/8 Semester 1)

Date of release: 23 October 2007, Tuesday, 23:59.

Submission deadline: 31 October 2007, Wednesday, 23:59.

School of Computing, National University of Singapore

You are advised to review the material covered in chapters 1 through 12.

This is an **ungraded** lab assignment, but you are encouraged to
complete at least Exercise 1. Exercise 2 is a challenger.

This lab assignment concerns **program efficiency**. While the study
of efficiency is a topic of a follow-up module (CS1102), usually under
the subject of **asymptotic analysis**, **complexity analysis**
or **analysis of algorithms** where concepts such as big-O notation
are introduced, here we shall contend ourselves with empirical results.
For those who are interested
in the theoretical aspects of asymptotic analysis of algorithms, there
are reading materials abound on the Internet. For group CS1101X, I
will cover this non-examinable topic at an introductory level if
I have spare time, to whet your appetite a little.

Do not use the following in your programs for this lab assignment. (If you are in doubt whether you can use certain features not mentioned below and are not yet covered in class, please consult your discussion leader or lecturer first):

- Recursion
- Inheritance and polymorphism
- Advanced data structures (such as, but not limited to, trees, heaps, graphs) that are not covered in CS1101.

You may assume that **all input data are correct**.

**Test your programs thoroughly with your own input data before you
submit to CourseMarker.** Do not use CourseMarker as a debugging tool.
For this lab, since it will not be graded, the number of submissions
is set to **30**.

- File input/output
- Random number
- Writing a basic sort
- Timing a program

In preparation for Exercise 1 in this assignment, you need to create a
text file `numbers` that contains many integers. The
text file should contain a positive value *n* on its first line.
This is to be followed by *n* lines of random non-negative integers
in the range [0, 20000].

Write a program `GenerateIntegerList.java` to create the
above text file `numbers`, Your program should preferably take
an inline argument for *n*, as shown below:

$ java GenerateIntegerList 5

instead of requesting for *n* during the run:

$ java GenerateIntegerList Please enter n: 5

The above examples should create the text file `numbers` that
contains 6 lines of integers, as shown below (the first line contains
5, while the subsequent lines contain the 5 random integers generated):

5 7843 13918 3607 16886 17699

Refer to Chapter 12 File Input and Output on how to create a text file, or you may use the output redirection operator '>' if you are using UNIX.

The 3 basic sorting techniques are **Bubble Sort**, **Selection Sort**,
and **Insertion Sort**. Bubble Sort and Selection Sort have been covered
in class; the algorithm of Insertion Sort on an *n*-element array A is
as follows: on pass *i* (1 ≤ *i* < *n*),
insert A[*i*] into the correct position in the sorted region to
its left: A[0] ... A[*i* - 1]. At the end of pass *n* - 1,
the array would be sorted.

Write a program `BasicSort.java` to implement one of the 3 basic
sorts. The array should be an integer array, and your program should
read the values from the text file `numbers` created earlier.
You may use the `Scanner` class for reading from a text file.
Two demonstration programs
DemoLab8v1.java
and
DemoLab8v2.java
show how you may read data from a text file
numbers
that contains some integers are given.

We usually assess the running time of a sort algorithm by counting
the number of comparisons or the number of swaps between elements.
In the worst case all the 3 basic sorts have
to make roughly *n* × (*n*-1) / 2 comparisons, or
O(*n*^{2}) comparisons in big-O notation.
Faster sorting techniques, such as
heap sort, merge sort, quick sort, just to name a few, have fewer
comparisons, and they are covered in CS1102 and other advanced modules.

To find out how fast your sort algorithm works, time your program.
(Refer to **section 6.10 Estimating the Execution Time**
on page 336 in your textbook.) You should run your program on
different arrays of various sizes, for example, 2000, 4000, 6000 elements,
etc. Plot a graph of *n* (array size) against the running time.

Sample run (output shown in
**bold purple**):

$ javac BasicSort.java $ java BasicSortArray size : 30000 Elapsed time: 3.309 sec.

Submit your program `BasicSort.java` through CourseMarker.

- As the basic sorts are slow, you will find that they take a long time for large arrays.
- There are plenty of reading materials on asymptotic analysis and big-O notation. You may read up Big O notation from Wikipedia.

- Writing efficient program

The most common way of representing **pixels** (picture elements)
is to use the RGB (red, green and blue) values, which range from 0 to 255.

Given a list of pixels represented by their RGB values, we may define
**similarity** as follows:
A list is of similarity measure ** D** if for all pixels in
this list,

Given the RGB values of *n* pixels, and the integer constants
*A*, *B*, *C*, and *D*, we want to find the
**size of the largest sublist of pixels** with similarity *D*.
Write a program `Pixels.java` to solve this problem.

The first line of the input contains 5 integers:
*n* (where 1 ≤ *n* ≤ 200), *A*, *B*,
*C*, and *D* (where 1 ≤ *A*, *B*, *C*
≤ 10000, and 1 ≤ *D* ≤ 10000000).

The following *n* lines each contains three integers
*r*, *g* and *b* (where 0 ≤ *r*, *g*,
*b* ≤ 255) representing the RGB values of a pixel.

Your program should output the size of the largest sublist of these
pixels of similarity measure *D*.

For example, given a list of 5 pixels:

3 7 5 4 8 6 5 9 7 4 9 7 8 9 6

and given also that *A* = *B* = *C* = 1, and *D* = 5,
if we pick a sublist that contains the first, second and fourth pixels
{(3,7,5), (4,8,6), (4,9,7)},
then *r*_{min} = 3,
*g*_{min} = 7, and
*b*_{min} = 5.
For the first pixel (3,7,5), the measure is
1×(3-3) +
1×(7-7) +
1×(5-5) = 0.
For the second pixel (4,8,6), the measure is
1×(4-3) +
1×(8-7) +
1×(6-5) = 3.
For the fourth pixel (4,9,7), the measure is
1×(4-3) +
1×(9-7) +
1×(7-5) = 5.

Since all the measures are less than or equal to *D* (5),
this sublist of 3 pixels satisfies the similarity requirement. However,
this is not the largest sublist possible. The largest sublist that satisfies
the similarity requirement consists of the second, third, fourth
and fifth pixels. (Verify this yourself.) Since the size of this sublist
is 4, and we cannot find a larger sublist that satisfies the similarity
requirement, the answer is 4.

This problem is a challenger because an inefficient algorithm would take too long. To have an indication of how fast your algorithm runs, you should time your program. A fast algorithm should be able to produce the result within 1 second for the most complicated data set on which your program will be tested.

Four test data sets used by CourseMarker for this task are included here for your testing. Study these data sets and their answers so that you know exactly what the question demands. Your program will be tested with 10 data sets.

Input:
set1 |
set2 |
set3 |
set9

Output (with timing on sunfire):
set1 |
set2 |
set3 |
set9

Output (with timing on Pentium IV):
set1 |
set2 |
set3 |
set9

(The timing information in the output above is only for your information, so that you have an idea how fast your algorithm works. Your program should only generate one line of output, containing a single integer.)

If your are able to clear these three test data sets, you may send your program directly to me (tantc @ comp.nus.edu.sg) before 1 November, 2007, 1pm for further testing. I will test your program in sunfire on more test data sets only after 1 November (therefore, you should ensure that your program works, instead of asking me to check whether it works). The first 3 students (from any CS1101 lecture group) who are able to pass all my test data sets within the stipulated running time of 1 second will each get a very small prize as a token of encouragement. (Selection is based on the first 3 successful submissions, and not on the speed of the programs.)

Please note that if you search hard enough in the Internet, you might be
able to find the solution. However, I trust that you would attempt
this __on your own__ without external help.

This problem can be solved using what you have learned in CS1101 so far. Programs that use advanced data structures (such as, but not limited to, trees, heaps and graphs) will not be accepted.

Sample run (user's input shown in
blue; output shown in
**bold purple**):

$ javac Pixels.java $ java Pixels 5 1 1 1 5 3 7 5 4 8 6 5 9 7 4 9 7 8 9 64

Submit your program `Pixels.java` through CourseMarker.
If it passes the four test data sets, you mail email the program
to Aaron (tantc @ comp.nus.edu.sg).

- An inefficient algorithm will probably not be able to compute the answer within the time limit.
- The 1-second time limit should be viewed as a guide. I am not going to distinguish a program that takes 0.9 second from another that takes 0.5 second. An efficient algorithm will solve all test data sets within 1 second on most machines, or maybe slightly longer on the slowest machine. On the other hand, an inefficient algorithm will take exceedingly long, probably many seconds or even minutes, to run.

This is an ungraded assignment, but a deadline is still set for
CourseMarker.
The deadline for handing in all programs is **31 October 2007,
Wednesday, 23:59**.

tantc@comp.nus.edu.sg

Tue Oct 16 08:55:53 SGT 2007