The deadline for this lab is Wednesday 16 April 2008, 13:45:59 hours. Strictly no submissions will be accepted after the deadline.
Your task is to write a simple program that reads in a text file, determines whether its HTML tags match correctly, and displays any errors on the screen. The program terminates upon the first error it finds.
Note that an HTML tag may consist of more than one word, for example <A TARGET="_blank">. For simplicity, you may assume that for each HTML tag, the < and > characters are on the same line.
Let's look at an example text file (containing HTML text) called text31.txt:
<HTML> <HEAD> <TITLE>CS1101C Lab 1 (Even Week)</TITLE> </HEAD> <BODY> <H1>CS1101C Lab 1 (Even Week)</H1> <H2>Binary to Decimal Conversion</H2> The deadline for this lab question is <B>Friday 22 February 2008, 23:59:59 hours</b>.<P> The name of your C program file must be called <B><TT>bin2dec.c</TT></B>, files with any other name will not be marked.<P> <H3>Preliminary</H3> Write a simple binary number to decimal number converter. You may find <A TARGET="_blank">this website</A> useful.<P> <H3>Sample Runs</H3> Assuming that the executable is <TT>bin2dec</TT>, sample runs of the program are shown below. User input is denoted in <U><B>bold</B></U>.<P> <PRE> $ <U><B><B><U>gcc -Wall bin2dec.c -o bin2dec</U></B></B></U> $ <U><B>./bin2dec</B></U> Enter binary number (maximum 10 digits): <U><B>0</B></U> Equivalent decimal number: 0 $ <U><B>./bin2dec</B></U> Enter binary number (maximum 10 digits): <U><B>1</B></U> Equivalent decimal number: 1 $ <U><B>./bin2dec</B></U> Enter binary number (maximum 10 digits): <U><B>11</B></U> Equivalent decimal number: 3 $ <U><B>./bin2dec</B></U> Enter binary number (maximum 10 digits): <U><B>01010101</B></U> Equivalent decimal number: 85 $ <U><B>./bin2dec</B></U> Enter binary number (maximum 10 digits): <U><B>1111101000</B></U> Equivalent decimal number: 1000 $ <U><B>./bin2dec</B></U> Enter binary number (maximum 10 digits): <U><B>1111111111</B></U> Equivalent decimal number: 1023 $ </PRE> <H3>Note and Ponder</H3> <OL> <LI>Assume that all user input is valid.<P> <LI>Use %d instead of %i in your scanf format specifier. (Why?)<P> <LI>Loops and the modulus operator may prove to be very useful.<P> <LI>If your program does not work as you expect (logical errors), use extra <B>printf</B> statements to print out all the values of your variables to aid in your debugging.<P> <LI>Most importantly, have lots of fun programming!<P> </OL><P> <HR> This document, <i>index.html</I>, has been accessed 1205 times since 19-Feb-08 17:59:01 SGT. 
This is the 10th time it has been accessed today. <P> A total of 689 different hosts have accessed this document in the last 52 days; your host, <I>suna0.comp.nus.edu.sg</I>, has accessed it 2 times. <P> <HR> </BODY> </HTML>
First, we extract only the relevant HTML tags, to get:
<B> </B> <B> </B> <U> <B> </B> </U> <U> <B> <B> <U> </U> </B> </B> </U> <U> <B> </B> </U> <U> <B> </B> </U> <U> <B> </B> </U> <U> <B> </B> </U> <U> <B> </B> </U> <U> <B> </B> </U> <U> <B> </B> </U> <U> <B> </B> </U> <U> <B> </B> </U> <U> <B> </B> </U> <U> <B> </B> </U> <U> <B> </B> </U> <B> </B> <I> </I> <I> </I>
The relevant opening HTML tags are <B>, <I>, and <U>.
The relevant closing HTML tags are </B>, </I>, and </U>.
Lowercase HTML tags are to be converted to uppercase HTML tags.
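There are three possible errors, each shown in the sample runs below: a closing tag that does not match the most recent unmatched opening tag, a closing tag that appears when there is no opening tag for it to match, and an opening tag that is still unmatched when the end of the file is reached.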
We will test your program with other text files.
$ gcc -Wall tags.c -o tags
$ ./tags text31.txt
All tags are matching!
$ ./tags text32.txt
Error: Line 31: </U> does not match <B>
$ ./tags text33.txt
Error: Line 29: Cannot start with </U>
$ ./tags text34.txt
Error: Reached end of file: Unmatched <B>
$
You may assume that for each HTML tag, the < and > characters are on the same line. You may also assume that the < and > characters are used only for HTML tags.
You may assume that there will be no more than 100 opening HTML tags in the list (but there may of course be more than 100 HTML tags in the text file).
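One possible way to use the 100-tag limit is a fixed-size stack of opening tags that have not been closed yet. The sketch below is only an illustration. The names tag_stack, feed_tag and finish are hypothetical, and it rests on two assumptions the text does not confirm: that "Cannot start with" is reported whenever a closing tag arrives while no opening tag is pending, and that the most recently opened tag is the one reported at the end of the file.

```c
#include <assert.h>
#include <stdio.h>

#define MAX_OPEN 100   /* at most 100 opening tags, so the stack cannot overflow */

/* Hypothetical stack of opening tags that have not been closed yet. */
struct tag_stack {
    char names[MAX_OPEN];   /* each entry is 'B', 'I' or 'U' */
    int  size;              /* number of pending opening tags */
};

/*
 * Feed one relevant tag to the matcher. Returns 1 if matching can
 * continue, or 0 after printing the first error.
 */
int feed_tag(struct tag_stack *s, char name, int is_closing, int line)
{
    if (!is_closing) {
        assert(s->size < MAX_OPEN);   /* relies on the stated assumption */
        s->names[s->size++] = name;   /* remember the opening tag */
        return 1;
    }
    if (s->size == 0) {
        /* Assumption: any closing tag with nothing pending is this error. */
        printf("Error: Line %d: Cannot start with </%c>\n", line, name);
        return 0;
    }
    if (s->names[s->size - 1] != name) {
        printf("Error: Line %d: </%c> does not match <%c>\n",
               line, name, s->names[s->size - 1]);
        return 0;
    }
    s->size--;   /* matched: discard the opening tag */
    return 1;
}

/* Call once at end of file. Returns 1 if every opening tag was closed. */
int finish(const struct tag_stack *s)
{
    if (s->size > 0) {
        /* Assumption: report the most recently opened unmatched tag. */
        printf("Error: Reached end of file: Unmatched <%c>\n",
               s->names[s->size - 1]);
        return 0;
    }
    printf("All tags are matching!\n");
    return 1;
}
```

A stack fits here because tags must close in the reverse of the order in which they were opened, so only the most recent pending opening tag ever needs to be compared.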
You may assume that each line in the text file contains at most 80 characters (excluding the newline character).
You may assume that the text file always exists, and is always supplied as an argument on the command line. The name of the text file is obtained from this command line argument.
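The line-length and file-name assumptions suggest reading the file one line at a time with fgets into a small fixed buffer. The sketch below only illustrates that idea. The function count_tags and its parameters are hypothetical, and it merely counts tags where a real solution would examine each one. It also checks the result of fopen, although the lab guarantees that the file exists.

```c
#include <assert.h>
#include <stdio.h>
#include <string.h>

#define LINE_BUF 82   /* 80 characters + newline + terminating '\0' */

/*
 * Hypothetical helper: read the file named by path line by line and
 * return the number of '<' ... '>' pairs found, or -1 if the file
 * cannot be opened. The number of lines read is stored in *line_count.
 */
int count_tags(const char *path, int *line_count)
{
    assert(path != NULL && line_count != NULL);

    FILE *fp = fopen(path, "r");
    if (fp == NULL)
        return -1;

    char line[LINE_BUF];
    int tags = 0;
    *line_count = 0;

    while (fgets(line, sizeof line, fp) != NULL) {
        (*line_count)++;   /* current line number, useful for error messages */

        const char *p = line;
        const char *open;
        while ((open = strchr(p, '<')) != NULL) {
            const char *close = strchr(open, '>');
            if (close == NULL)
                break;   /* not expected: '<' and '>' share a line */
            /* The tag text lies between open + 1 and close. */
            tags++;
            p = close + 1;
        }
    }

    fclose(fp);
    return tags;
}
```

In a complete program, main would pass argv[1] as path, since the file name is supplied as the first command line argument.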
Remember to submit your program frequently using the submit tags.c command, and check your submission using the check command.
All the best!