CS1101C Practical Exam

Session 3 (1200 - 1345 hours)

HTML Tag Matching

The name of your C program file must be tags.c; files with any other name will not be marked.

The deadline for this lab is Wednesday 16 April 2008, 13:45:59 hours. Strictly no submissions will be accepted after the deadline.

Introduction

Non-matching HTML tags are a common problem for newcomers writing HTML. Wouldn't it be useful to have an automatic checker that verifies each opening HTML tag is matched by the appropriate closing HTML tag?

Your task is to write a simple program that reads a text file, checks whether its HTML tags match correctly, and displays an error message on the screen if they do not. The program terminates upon the first error it finds.

The Task

Every opening HTML tag <B> must be matched by a closing HTML tag </B>; similarly, <I> must be matched by </I>, and <U> by </U>. For simplicity, we are interested only in these six HTML tags.

Note that an HTML tag may contain more than one word, for example <A TARGET="_blank">. For simplicity, you may assume that for each HTML tag, the < and > characters are on the same line.
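Because a tag such as <A TARGET="_blank"> carries extra words, one option is to treat only the text between < and the first space or > as the tag name. The sketch below is one possible way to do this, not a prescribed solution; it assumes, as the handout allows, that < and > are on the same line and are used only for tags. The function next_tag and its parameters are hypothetical names chosen for illustration.

```c
#include <ctype.h>

/* Hypothetical helper (not part of the handout): finds the next tag in
   `line`, starting at index *pos, and copies its name into `name`.
   The name is the text after '<' up to the first space or '>', so
   <A TARGET="_blank"> gives "A" and </B> gives "/B".
   Returns 1 if a tag was found and 0 if the line has no more tags;
   *pos is advanced past the tag so the next call continues from there. */
int next_tag(const char *line, int *pos, char *name, int size)
{
    int i = *pos;
    int n = 0;

    while (line[i] != '\0' && line[i] != '<')    /* skip ordinary text */
        i++;
    if (line[i] == '\0') {
        *pos = i;
        return 0;
    }
    i++;                                         /* skip '<' */
    while (line[i] != '\0' && line[i] != '>' &&
           !isspace((unsigned char)line[i])) {
        if (n < size - 1)                        /* never overflow `name` */
            name[n++] = line[i];
        i++;
    }
    name[n] = '\0';
    while (line[i] != '\0' && line[i] != '>')    /* skip any attributes */
        i++;
    if (line[i] == '>')
        i++;
    *pos = i;
    return 1;
}
```

Calling next_tag repeatedly with the same pos visits every tag on a line in order; the relevant ones can then be picked out by name.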

Let's look at an example text file (containing HTML text) called text31.txt:


<HTML>
<HEAD>
<TITLE>CS1101C Lab 1 (Even Week)</TITLE>
</HEAD>

<BODY>
<H1>CS1101C Lab 1 (Even Week)</H1>

<H2>Binary to Decimal Conversion</H2>

The deadline for this lab question is <B>Friday 22 February 2008,
23:59:59 hours</b>.<P>

The name of your C program file must be called
<B><TT>bin2dec.c</TT></B>, files with any other name will not be
marked.<P>

<H3>Preliminary</H3>

Write a simple binary number to decimal number converter. You may find
<A TARGET="_blank">this website</A> useful.<P>

<H3>Sample Runs</H3>

Assuming that the executable is <TT>bin2dec</TT>, sample runs of the
program are shown below. User input is denoted in <U><B>bold</B></U>.<P>

<PRE>
$ <U><B><B><U>gcc -Wall bin2dec.c -o bin2dec</U></B></B></U>

$ <U><B>./bin2dec</B></U>

Enter binary number (maximum 10 digits): <U><B>0</B></U>
Equivalent decimal number: 0

$ <U><B>./bin2dec</B></U>

Enter binary number (maximum 10 digits): <U><B>1</B></U>
Equivalent decimal number: 1

$ <U><B>./bin2dec</B></U>

Enter binary number (maximum 10 digits): <U><B>11</B></U>
Equivalent decimal number: 3

$ <U><B>./bin2dec</B></U>

Enter binary number (maximum 10 digits): <U><B>01010101</B></U>
Equivalent decimal number: 85

$ <U><B>./bin2dec</B></U>

Enter binary number (maximum 10 digits): <U><B>1111101000</B></U>
Equivalent decimal number: 1000

$ <U><B>./bin2dec</B></U>

Enter binary number (maximum 10 digits): <U><B>1111111111</B></U>
Equivalent decimal number: 1023

$
</PRE>

<H3>Note and Ponder</H3>

<OL>
<LI>Assume that all user input is valid.<P>

<LI>Use %d instead of %i in your scanf format specifier. (Why?)<P>

<LI>Loops and the modulus operator may prove to be very useful.<P>

<LI>If your program does not work as you expect (logical errors), use
extra <B>printf</B> statements to print out all the values of your
variables to aid in your debugging.<P>

<LI>Most importantly, have lots of fun programming!<P>

</OL><P>

<HR>

This document, <i>index.html</I>, has been accessed 1205 times since
19-Feb-08 17:59:01 SGT.
This is the 10th time it has been accessed today.
<P>
A total of 689 different hosts have accessed this document in the last
52 days; your host, <I>suna0.comp.nus.edu.sg</I>, has accessed it 2
times.
<P>

<HR>

</BODY>
</HTML>

First, we extract only the relevant HTML tags, which gives:

<B> </B> <B> </B> <U> <B> </B> </U> <U> <B> <B> <U> </U> </B>
</B> </U> <U> <B> </B> </U> <U> <B> </B> </U> <U> <B> </B>
</U> <U> <B> </B> </U> <U> <B> </B> </U> <U> <B> </B> </U>
<U> <B> </B> </U> <U> <B> </B> </U> <U> <B> </B> </U> <U> <B>
</B> </U> <U> <B> </B> </U> <U> <B> </B> </U> <B> </B> <I>
</I> <I> </I>

The relevant opening HTML tags are <B>, <I>, and <U>.

The relevant closing HTML tags are </B>, </I>, and </U>.

Lowercase HTML tags (such as </b> and <i> in text31.txt) are converted to uppercase HTML tags before matching, so </b> matches <B>.
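One possible sketch of the case conversion and of picking out the six relevant tags is shown below. It assumes the tag name has already been extracted as a string such as "/b" or "TT"; the function names to_upper_tag and is_relevant are hypothetical.

```c
#include <ctype.h>
#include <string.h>

/* Converts a tag name such as "/b" to uppercase in place ("/B"). */
void to_upper_tag(char *name)
{
    int i;

    for (i = 0; name[i] != '\0'; i++)
        name[i] = (char)toupper((unsigned char)name[i]);
}

/* Returns 1 if `name` (already uppercase) is one of the six relevant
   tags B, I, U, /B, /I or /U, and 0 otherwise (e.g. TT, P, BODY). */
int is_relevant(const char *name)
{
    const char *p = name;

    if (*p == '/')                  /* closing tag: look past the slash */
        p++;
    return strcmp(p, "B") == 0 || strcmp(p, "I") == 0 ||
           strcmp(p, "U") == 0;
}
```

Comparing whole names rather than first letters keeps tags such as <BODY>, <UL> and <IMG> out of the list.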

Opening HTML Tags

The idea is to maintain a list of opening HTML tags, which is initially empty. Each opening HTML tag is added to the end of the list as it is read from the file.
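Since structures and malloc are not allowed, the list can be a plain char array plus a count. The sketch below assumes each relevant opening tag is stored as its single uppercase letter ('B', 'I' or 'U'); open_tag and MAX_OPEN are hypothetical names, and the limit of 100 comes from the assumptions stated later in this handout.

```c
#define MAX_OPEN 100   /* at most 100 opening tags in the list */

/* Adds the letter of an opening tag ('B', 'I' or 'U') to the end of the
   list. `list` is a plain char array and *count is the number of tags
   currently in it. Returns 0 (and adds nothing) if the list is full. */
int open_tag(char list[], int *count, char letter)
{
    if (*count >= MAX_OPEN)
        return 0;
    list[*count] = letter;
    (*count)++;
    return 1;
}
```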

Closing HTML Tags

Each closing HTML tag must match the most recent opening HTML tag that has not yet been matched. When we read a closing HTML tag, we compare it with the opening HTML tag at the end of the list: </B> must match <B>, </I> must match <I>, and </U> must match <U>. If it matches, we remove that opening HTML tag from the end of the list (in other words, the list is used as a stack). If it does not match, the program prints an error message and exits. The program terminates on the first error it finds; when the error is found on a particular line, it also prints that line number.
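A minimal sketch of the closing-tag check, assuming the same char-array list as above (letters 'B', 'I', 'U' with a count). Instead of printing directly, the hypothetical function close_tag writes the message into a caller-supplied buffer, using the wording from the sample output.

```c
#include <stdio.h>

/* Hypothetical helper for a closing tag </X>, where `letter` is 'B', 'I'
   or 'U'. If the tag at the end of the list matches, it is removed and 1
   is returned. Otherwise an error message is written to `msg` and 0 is
   returned, leaving the list unchanged. */
int close_tag(const char list[], int *count, char letter, int line_no,
              char *msg, size_t size)
{
    if (*count == 0) {                       /* no opening tags at all */
        snprintf(msg, size, "Error: Line %d: Cannot start with </%c>",
                 line_no, letter);
        return 0;
    }
    if (list[*count - 1] != letter) {        /* wrong tag at end of list */
        snprintf(msg, size, "Error: Line %d: </%c> does not match <%c>",
                 line_no, letter, list[*count - 1]);
        return 0;
    }
    (*count)--;                              /* matched: remove from end */
    return 1;
}
```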

There are three possible errors:

  1. Closing HTML tag does not match the opening HTML tag at the end of the list.

  2. There is a closing HTML tag but there are no opening HTML tags in the list.

  3. The end of the file has been reached, but there are still opening HTML tags in the list (unmatched opening HTML tags). Display only the opening HTML tag at the end of the list.
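The third error can only be detected once the whole file has been read. A sketch of that final check, assuming the char-array list used above; check_end is a hypothetical name, and the messages follow the sample output.

```c
#include <stdio.h>

/* Hypothetical end-of-file check. If opening tags remain in the list,
   only the one at the end of the list is reported. Returns 1 and the
   success message if the list is empty, 0 otherwise. */
int check_end(const char list[], int count, char *msg, size_t size)
{
    if (count > 0) {
        snprintf(msg, size, "Error: Reached end of file: Unmatched <%c>",
                 list[count - 1]);
        return 0;
    }
    snprintf(msg, size, "All tags are matching!");
    return 1;
}
```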

Text Files

Several sample text files called text31.txt, text32.txt, text33.txt, and text34.txt have been copied into your directory by the pesetup command.

Sample Output

Assuming that the executable is tags, sample runs of the program are shown below. The sample runs are self-explanatory. Follow the sample output precisely. User input is the command typed after each $ prompt.

We will test your program with other text files.

$ gcc -Wall tags.c -o tags

$ ./tags text31.txt

All tags are matching!

$ ./tags text32.txt 

Error: Line 31: </U> does not match <B>

$ ./tags text33.txt 

Error: Line 29: Cannot start with </U>

$ ./tags text34.txt 

Error: Reached end of file: Unmatched <B>

$

You may assume that for each HTML tag, the < and > characters are on the same line. You may also assume that the < and > characters are used only for HTML tags.

You may assume that there will be no more than 100 opening HTML tags in the list (but there may of course be more than 100 HTML tags in the text file).

You may assume that each line in the text file contains at most 80 characters (excluding the newline character).

You may assume that the text file always exists, and is always supplied as an argument on the command line. The name of the text file is obtained from this command line argument.
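Putting the procedure described in this handout together, one possible sketch reads the file line by line with fgets, counts lines, and applies the opening, closing and end-of-file rules. It is not the official solution: check_tags, MAX_OPEN and LINE_SIZE are hypothetical names, the "Too many open tags" message is invented for a case the handout rules out, and the function reports through a buffer rather than printing, so the caller decides how to display the result.

```c
#include <ctype.h>
#include <stdio.h>
#include <string.h>

#define MAX_OPEN  100   /* at most 100 opening tags in the list */
#define LINE_SIZE 256   /* lines hold at most 80 characters; the extra room
                           also absorbs a stray '\r' from Windows line endings */

/* Hypothetical checker: scans every line of fp, keeps the letters of the
   unmatched opening tags in a char array (no structures, no malloc), and
   writes the result message into msg. Returns 1 if all tags match and 0
   on the first error. */
int check_tags(FILE *fp, char *msg, size_t size)
{
    char line[LINE_SIZE];
    char list[MAX_OPEN];
    int count = 0;
    int line_no = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        char *p = line;

        line_no++;
        while ((p = strchr(p, '<')) != NULL) {
            int closing = 0;
            char letter;

            p++;                                    /* skip '<' */
            if (*p == '/') {
                closing = 1;
                p++;
            }
            letter = (char)toupper((unsigned char)*p);
            /* Only single-letter B, I and U count; skip <BR>, <TT>, <UL>. */
            if ((letter != 'B' && letter != 'I' && letter != 'U') ||
                (p[1] != '>' && !isspace((unsigned char)p[1])))
                continue;

            if (!closing) {
                if (count == MAX_OPEN) {
                    /* Excluded by the handout; message is hypothetical. */
                    snprintf(msg, size, "Error: Line %d: Too many open tags",
                             line_no);
                    return 0;
                }
                list[count++] = letter;             /* add to end of list */
            } else if (count == 0) {                /* error 2 */
                snprintf(msg, size, "Error: Line %d: Cannot start with </%c>",
                         line_no, letter);
                return 0;
            } else if (list[count - 1] != letter) { /* error 1 */
                snprintf(msg, size, "Error: Line %d: </%c> does not match <%c>",
                         line_no, letter, list[count - 1]);
                return 0;
            } else {
                count--;                            /* matched: remove it */
            }
        }
    }

    if (count > 0) {                                /* error 3 */
        snprintf(msg, size, "Error: Reached end of file: Unmatched <%c>",
                 list[count - 1]);
        return 0;
    }
    snprintf(msg, size, "All tags are matching!");
    return 1;
}
```

A main function would then open argv[1] with fopen, call check_tags, print msg followed by a newline, and close the file.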

Important Notes

Do not use any structures or any form of dynamic memory allocation (malloc or calloc) in your program; otherwise no credit will be given.

Remember to submit your program frequently using the submit tags.c command, and check your submission using the check command.

All the best!

UNIX commands

Some useful UNIX commands (in case you forgot what you did in Lab 0):

  1. "dir": lists all the files in the directory.
  2. "cp a.txt b.txt": copies a.txt to b.txt.
  3. "mv a.txt b.txt": moves / renames a.txt to b.txt.
  4. "cat a.txt": shows the contents of a.txt.