CS1101C Practical Exam

Question 1 (xxxx - xxxx hours)

Hash Tables

The name of your C program file must be called hashtable.c, files with any other name will not be marked.

The deadline for this lab is Wednesday 12 November 2008, xx:xx:xx hours. Strictly no submissions will be accepted after the deadline.

Background

A hash table is a data structure that is used to quickly access a piece of data by using a “key”. Essentially a hash table consists of N bins labeled 0 to N-1. Given a key k, a hashing-function produces a value in the range of 0 to N-1, thus mapping the key k to one of the N bins. Data relating to k can then be placed in the bin. This concept is shown in the diagram below.

Initially, the hash table's bins are all empty. Each bin can contain one object (or one word in this case). If a word maps to a bin that is empty, we can simply place the word in that bin. If a word maps to a bin that already contains an existing word, there is a collision.

Given the infinite number of keys possible and the finite number of bins, it is very likely that more than one key will map to the same bin. This is called a “collision”, and we want to collect certain statistics about collisions.

TASK

Given a word k consisting of characters ci in ASCII (take each character's ASCII value), your hashing function f is defined as:

Here N is the number of bins, and modulo is the modulus operator. In your program, let N=50.

Assume that the text file is always called “test.txt” and that this file always exists. (It has been copied into your directory by the “pesetup” command.)

In this question you will write a program that will read, word-by-word, from a text file called “test.txt”. Simulate the insertion of each word in the order read from the file into the hash table. Once the entire file is read, print out the number of collisions, average number of collisions per bin, number of bins with collisions, and percentage of bins with collisions. Your program will produce an output similar to the following:

Number of collision for hash table = 37 collisions
Average number of collisions per bin = 0.74
Number of bins with collisions = 18
Percentage of bins with collisions = 36.00%

You may assume that the text file contains only letters, spaces, and newlines. There are no punctuation nor digits in the text file.

You may use this skeleton if you wish.

#include <stdio.h>

#define TESTFILE "test.txt"

#define N 50 /* Number of bins */

/* Hashing prototype */

/* First parameter is the key, second parameter is the # of bins.

Returns a bin number 0..N-1 */

int hash(char[], int);

int main()

{

return 0;

}

int hash(char str[], int n)

{

/* Hashing function. Returns a value between 0 and n-1 depending on the contents of str */

}

Important Notes

Do not use any structures or any form of dynamic memory allocation (using malloc or calloc) in your program, else no credit will be given.

Remember to submit your program frequently using the submit hashtable.c command, and check your submission using the check command.

All the best!

UNIX commands

Some useful UNIX commands (in case you forgot what you did in Lab 0):

  1. dir”: lists all the files in the directory.
  2. cp a.txt b.txt”: copies a.txt to b.txt.
  3. mv a.txt b.txt”: moves / renames a.txt to b.txt.
  4. cat a.txt”: shows the contents of a.txt.