Assignment 16 - Finite Automaton
You can download this homework and set the permissions
with the commands
cd ~/public_html
wget http://www.comp.nus.edu.sg/~gem1501/assignment16.html
chmod 755 *.html
Outline
A finite automaton is an algorithm with a finite memory which
scans through an input string or text. The finite memory is
given by states, one of them being a starting state and several
accepting states. Furthermore, it has an update function which
computes from the old state and the current datum the new state.
For example, the following automaton checks whether in a string
there is an even number of vowels.
State-Names: Even, Odd;
Starting State: Even;
Accepting States: Even;
Rejecting States: Odd;
Alphabet: Lower case letters;
Update Function:
Old State New State at "a","e","i","o","u"; at other letters
Even Odd Even
Odd Even Odd
So New State ("a",Even) is Odd,
New State ("b",Even) is Even,
New State ("c",Odd) is Odd.
Sample Runs: Input seen so far State of Automaton
(1) - Even
w Even
wh Even
who Odd
whos Odd
whose Even
(2) - Even
w Even
wh Even
wha Odd
what Odd
Example (1) accepts and Example (2) rejects the input as
after processing the complete word "whose" the automaton
is in the accepting state Even and after processing the
complete word "what" the automaton is in the rejecting state Odd.
The type of finite automata dealt with here is called deterministic
since the function to compute the new state permits always only
one unique value; otherwise the automaton would be called
nondeterministic. For more information on finite automata, see
Wikipedia and
Panet Math.
The task
The automaton of this exercise does the following:
It scans through a text and checks whether this text has
the following minimum qualities:
- It consists of one or several sentences.
- Each sentence ends with a dot ("."), an exclamation mark ("!") or
a question mark ("?").
- Inside a sentence, a colon (":") or a comma (",") can occur.
- Each sentence starts with a word which has the first character
being an upper case letter.
- Spaces can occur after but not before sentence signs.
- A word can start with any letter followed by arbitrary many
lower case letters.
- After a word there is a space or one of the characters mentiioned
in points 2 and 3.
- Numbers can occur inside the sentence and every number is either
a zero ("0") or one of the digits "1", "2", "3", "4", "5", "6",
"7", "8", "9" followed by arbitrary many further digits.
- There should be no spaces at the beginning of the sentence and no
double spaces at all.
The task is to extend these rules as follows and to adapt the below
syntax checker accordingly.
- Semicolons (";") should be permitted and treated in the same way
as a comma or colon.
- Plus ("+") and minus ("-") signs might precede a number, but not
a zero.
- Hyphens are permitted, but only inside a word, so there should be
at least one letter before and at least one letter after the
hyphen.
- Quotations are marked by single quotes like 'this is a quotation'
and a quotation
starts with a quote-symbol after a space and finishes with
another quote-symbol either before a space or a sentence-sign.
- Inside a word or at the end of a word, there can be an omission
symbol, written as a single quote, as in the sample sentence
"His brothers' cars are green and he's going to his sister's house."
which should not be confused with quotes.
The automaton which does the syntax-check has the following states
which are listed together with a short intuitive explanation of their
meaning:
"e": error - this state is taken when an error is found;
"f": automaton is going to read the first character of sentence;
"c": automaton has seen a comma or similar sign (",", ":");
"d": automaton has seen a dot or similar sign (".", "!", "?");
"n": automaton is processing or has been processing a number;
"z": automaton has seen a single zero;
"s": automaton has seen a space;
"h": automaton has seen a hyphen ("-");
"p": automaton has seen a plus ("+") or a minus ("-");
"q": automaton is between two quotes of a quotation;
"r": automaton has seen the second quote of a quotation;
"w": automaton is processing or has been processing a word;
The function newstate(oldstate,symbol) computes how to update the
state of the automaton when it is processing a symbol. The only
thing which needs to be adapted is this function. Here some explanation
on an example how this function works. Consider the following sample
run:
My number is 007.
wwswwwwwwswwszeee
The sentence is syntactically incorrect.
In this example, the sentencecheck is already doing what it should
as in the specification above, it is said that no number except
0 itself should state with the symbol "0", so James Bond would
have to write his number as "0 0 7" in order to pass the syntax-check.
The sample run is a short representations for the following calls of the
function newstate() which always takes the old value of the state and
computes from it and the symbol above the new value of the state. The
initial value "f" is dropped in above representation, but the
second line gives the states after having processed the corresponding
part of the word above it, so "wwsw" are the states taken while processing
"My n" with "w" being the state after these four symbols have all been
processed. A longer representation of this expression would be the
following one.
"w" = newstate("f","M");
"w" = newstate("w","y");
"s" = newstate("w"," ");
"w" = newstate("s","n");
"w" = newstate("w","u");
"w" = newstate("w","m");
"w" = newstate("w","b");
"w" = newstate("w","e");
"w" = newstate("w","r");
"s" = newstate("w"," ");
"w" = newstate("s","i");
"w" = newstate("w","s");
"s" = newstate("w"," ");
"z" = newstate("s","0");
"e" = newstate("z","0");
"e" = newstate("e","7");
"e" = newstate("e",".");
This input is rejected, that is, classified as false by the
finite automaton. Indeed, it would reject any input where the
machine is not in the state "d" after having processed all
the symbols of the input. The most straightforward implementation
of the function newstate() would be a large table which tells
the function what value to assign for which state and which
input, this table would be finite as there are only finitely
many states and finitely many symbols. Somehow, the table
would be extremely large and difficult to typeset, therefore
it is much more convenient to write a Java Script function instead
as it is done in this exercise.
The function has a variable "rv" which contains the return-value
of the function which is passed back as the new state after it
finishes. The initial value of "rv" is "e", so that the error-state
is taken in the case that the function does not provide an explicit
case for what to do with the old state and symbol in the input.
The switch-command in the function has entries for all states
which need to be addressed, for example in the case of the state "f",
the entry looks like this.
case "f":
if (("A" <= symbol) && ("Z" >= symbol)) { rv = "w"; }
break;
So if the symbol is one of the upper case letters "A", "B", ..., "Z",
the variable "rv" which has the next state takes the value "w"
for indicating that a word is processed. Indeed, as a sentence
should start with a word having an upper case letter at the
beginning, this is exactly what the program should do. Other
options should make the automaton to take the error-state "e"
what indeed happens whenever it is not clearly overwritten.
The command "break" indicates that the case dealing with "f"
is completed. Here another example, namely that one inside a word.
case "w":
if (("a" <= symbol) && ("z" >= symbol)) { rv = "w"; }
if (symbol == " ") { rv = "s"; }
if ((symbol == ".") || (symbol == "?") || (symbol == "!")) { rv = "d"; }
if ((symbol == ",") || (symbol == ":")) { rv = "c"; }
break;
Here the update-rule is more complicated. There are four if-statements,
each dealing with another transfer of the state. If the symbol is
one of the letters "a", "b", ..., "z" then the next state is again "w".
If the symbol is a space (" ") then the next state is "s".
If the symbol is ".", "?", "!" then the next state is "d" which
stands for dot-like sentence-signs. If the symbol is "," or ":"
then the next state is "c" which stands for comma-like sentence-signs.
If none of these cases applies, the return-value will have the default "e"
and the finite automaton will go into the error-state.
In order to do the requested changes, one has to introduce such type
of environments for the states "h" and "p". Note that "p" is entered
from "s" when there is a "+" or "-" after a space. From "p" there
should follow one of the digits "1", "2", "3", "4", "5", "6", "7", "8",
"9" but not the digit "0". So "-325" and "+7711" are permitted but
not "+007", "-0" and "+ 88". The hyphen-state "h" should only be
reached from "w" if there is an hyphen instead of a letter and
after the hyphen, there should again be one of the letters
"a", "b", ..., "z". If not, the automaton should take the
state "e". Note that it depends from the context whether
the symbol "-" is a minus or a hyphen.
Furthermore, at several places, ";" has to be added
into the list of permitted symbols for a state-change.
Also the quote-symbol ("'") should be handled adequately.
Only single quotes occur. If they are after a space, they
indicate the beginning of a quotation and the automaton
should take the state "q". What follows is already there.
If they occur inside a word, they are not the beginning
of a quotation but just an omission symbol as in the
sentence "He's running home fast to meet his sister's boyfriend."
Therefore, quote-symbols inside a word should be treated like
letters. For the ease of programming, it can be assumed that
the first symbol of a sentence is not a quote-symbol and that
the above two cases are all which occur. Quote-symbol denoting
omnissions or genitives can be treated like lower-case letters
and do not need an extra state of the automaton.
Java Script Starts Here.
Java Script Ends Here.