CS1101C Practical Exam

Session 1 (0800 - 0945 hours)

Word Frequency

Your C program file must be named freq.c; files with any other name will not be marked.

Some dictionaries have compiled a list of the top 100 most frequently occurring words in the English language. Being the curious sort, you would like to see how often these 100 words appear in a normal document.

You are given a text file called common.txt, which has been placed in your directory by the pesetup program. It contains a list of the top 100 most frequently occurring words, sorted in alphabetical order, with one word per line. The first ten words in the text file are:

a
about
all
an
and
are
as
at
be
been

... and so on.
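
As an illustration only, here is a minimal sketch of how the word list might be read into memory, assuming one word per line as described above. The names load_words, MAX_WORDS and MAX_WORD_LEN are hypothetical and not part of the exam specification.

```c
#include <stdio.h>

#define MAX_WORDS 100     /* assumption: common.txt holds at most 100 words */
#define MAX_WORD_LEN 30   /* no word exceeds 30 characters */

/* Reads up to MAX_WORDS whitespace-separated words from fp into words[].
   Returns the number of words actually read. */
int load_words(FILE *fp, char words[][MAX_WORD_LEN + 1])
{
    int n = 0;

    /* "%30s" stops after 30 characters, leaving room for the '\0'. */
    while (n < MAX_WORDS && fscanf(fp, "%30s", words[n]) == 1)
        n++;
    return n;
}
```

A caller would typically open the file with fopen("common.txt", "r"), pass the stream to load_words, and close it with fclose afterwards.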

Your program must read a document text file called document.txt and record the frequency of each of the 100 words found in common.txt. You can assume that each line in document.txt will never exceed 100 characters, and that none of the words in common.txt and document.txt will exceed 30 characters. Assume both common.txt and document.txt always exist. The file document.txt may contain multiple lines. Suppose document.txt contains:

Duplicate bridge is a game for people who are seeking intelligent
diversions in a social setting. It is both fast-paced and mentally
challenging. While each hand takes only 5 or 10 minutes to play, each
hand presents a new mystery to be solved. If you have never played
bridge before, we can help you get started.

Then the frequencies of the first ten words in common.txt are:

a 3
about 0
all 0
an 0
and 1
are 1
as 0
at 0
be 1
been 0

This is because the word a occurs three times, the word and occurs one time, the word are occurs one time, and the word be occurs one time, while the rest of the words (about, all, an, as, at, been) do not occur in document.txt.
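
One possible way to keep these counts is a parallel array of integers, one per common word. The sketch below assumes the words are already stored in a two-dimensional char array; the names find_word and count_token are hypothetical. It uses a simple linear search, although a binary search would also work because common.txt is sorted.

```c
#include <string.h>

#define MAX_WORD_LEN 30   /* no word exceeds 30 characters */

/* Returns the index of token in words[0..n-1], or -1 if it is not there. */
int find_word(const char *token, char words[][MAX_WORD_LEN + 1], int n)
{
    int i;

    for (i = 0; i < n; i++)
        if (strcmp(token, words[i]) == 0)
            return i;
    return -1;
}

/* Increments counts[i] if token equals words[i]; otherwise does nothing. */
void count_token(const char *token, char words[][MAX_WORD_LEN + 1],
                 int n, int counts[])
{
    int i = find_word(token, words, n);

    if (i >= 0)
        counts[i]++;
}
```

With the example document, calling count_token once for every word in the text would leave the count for "a" at 3 and the count for "about" at 0.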

Your program must read in common.txt, then read in document.txt (ignoring the case of the words), count the frequency of the 100 words, and write the output into the text file freqcount.txt. Assume that the list of delimiters is given by ",.?! :;()-" (do not forget the space). Overwrite the output file if it already exists. The first ten lines of the output file freqcount.txt (based on the example document.txt given) are shown below:

a 3
about 0
all 0
an 0
and 1
are 1
as 0
at 0
be 1
been 0
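
The tokenizing and output steps could be sketched as follows. This is an illustration under stated assumptions, not a model answer: the names to_lower, count_in_line and write_counts are hypothetical, and '\n' is added to the delimiter list on the assumption that lines are read with fgets, which keeps the trailing newline.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_WORD_LEN 30
/* The given delimiters, plus '\n' (assumption: lines come from fgets). */
#define DELIMS ",.?! :;()-\n"

/* Converts s to lower case in place, so that "It" and "it" compare equal. */
void to_lower(char *s)
{
    for (; *s != '\0'; s++)
        *s = (char)tolower((unsigned char)*s);
}

/* Counts the tokens in line that equal word (word must be lower case).
   The line is modified: it is lower-cased and then split by strtok. */
int count_in_line(char *line, const char *word)
{
    int count = 0;
    char *tok;

    to_lower(line);
    for (tok = strtok(line, DELIMS); tok != NULL; tok = strtok(NULL, DELIMS))
        if (strcmp(tok, word) == 0)
            count++;
    return count;
}

/* Writes one "word count" pair per line, matching the format shown above. */
void write_counts(FILE *out, char words[][MAX_WORD_LEN + 1],
                  const int counts[], int n)
{
    int i;

    for (i = 0; i < n; i++)
        fprintf(out, "%s %d\n", words[i], counts[i]);
}
```

Opening the output file with fopen("freqcount.txt", "w") truncates any existing file, which satisfies the overwrite requirement. Note that "fast-paced" is split into the two tokens "fast" and "paced" because '-' is a delimiter.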

Sample common.txt and document.txt files have been placed in your directory by the pesetup program.

We will test your program with other common.txt files and other document.txt files.

All the best!

Some useful UNIX commands (in case you forgot what you did in Lab 0):

  1. "ls -l": lists the files in the current directory in long format.
  2. "cp a.txt b.txt": copies a.txt to b.txt.
  3. "mv a.txt b.txt": moves / renames a.txt to b.txt.