[Lecture Notes by Prof Yuen and possibly others 2000-2001] ...while the following material is mainly of interest to theory inclined CS students; others are welcome to take a look...

2. Truth, Reality and Provability:

The idea of right and wrong permeates human life, and is studied in different ways by different professions. Law, Politics, Philosophy and Science are just some of these. Mathematics concerns itself with a particular kind of right and wrong: statements that are expressed in concise notations, and proved or disproved using a set of concisely stated rules. In contrast, Computer Science is concerned with whether or not a piece of software (running on particular hardware) will meet its intended purpose: if it does, the program is right; otherwise it is wrong. If the software and its purpose have both been specified in mathematics-like notations, then it may be possible to prove whether a program is right or wrong using standard mathematical methods. In other words, by using very rigorous notations for both programs and program specifications, we may have a way of proving that the programs meet the specifications and so must be correct. Mathematicians are convinced that this is the right, and indeed only, way to develop software, but so far they have not produced usable systems based on this principle. As they have not produced software that meets our purpose, they have not met our definition of "right", however successful they might have been at meeting their own definition.

But what do we mean by "proving" something to be right or wrong? In general, we do this by showing that it is consistent or inconsistent with something else known to be right. However, how do we know that the "something else" is right in the first place? There has to be some starting point - certain things whose validity is taken for granted. These are the axioms of a mathematical system. But axioms are statements about some thing or things that already exist.
Also, the rules which we use to show consistency or inconsistency of something with the axioms must also pre-exist. The question then arises as to which parts of a mathematical system are axioms, which are rules and which are the entities they apply to, and why these were chosen for study rather than some others. This is a philosophical issue, and in general the way one formulates a system relates to human experience. We know certain things exist around us; we know certain ways in which they operate; we know that certain statements can be assumed in general and then be used to produce specific statements about specific circumstances to help us handle problems. It is useful to have mathematical systems that concisely denote such knowledge and simplify the process of deducing specific from general. For example, Geometry postulates the entities of points and lines, the concepts of length, angle size and area, the axioms that through two given points only one straight line can be drawn, only one circle can be drawn with centre at one given point and passing through another point, etc., and the rules of allowing only unmarked straight edge and compass to draw lines, circles, angles, polygons etc. From this, one can derive Pythagoras's theorem about areas of squares whose sides are the sides of a right-angle triangle, and other theorems that can be used to help us lay out roads, farms, houses, etc. But there are different kinds of experience for us to relate abstract systems to. Euclidean Geometry has its Fifth Postulate, that one and only one parallel line can be drawn through a point outside a given line, based on common human experience gained on flat surfaces.
Riemann however made a different postulate, that no parallel line can be drawn, which produces a different system of Geometry, applicable to spherical surfaces and related spaces, and this turns out to be relevant to spatial structures on the cosmological scale: there is a match between the system and human experience in studying the universe. While this particular experience is confined to a very small number of human beings, it is no less important. Matching an abstract system with human experience of the real world is termed the process of interpretation: correspondences between the entities, axioms and rules of the abstract system and those of the real world are found, so that a result derived in the abstract system can be used to make a prediction about the real world, e.g. 1+1=2, so if we put one apple (or pear, orange, elephant) into a basket containing another apple (or pear, etc), then the basket would contain two apples (or ...). Here we have interpreted addition as "putting together and counting", and the numerals as "object-quantity representations". The interpretation works for identical or similar objects (if we put an apple into a basket containing an orange but no other fruits, we have a total of 2 fruits in the basket), but breaks down if we put an apple into a basket containing an elephant! (If you say "the basket contains two objects", what about the basket itself? Is it an object "contained in the basket"?) The wider the range of interpretation of a system, the more useful it is. However, the matching may not be very exact, and may only be usable within certain limits, such as Euclidean Geometry applying only to small, nearly flat parts of the earth and not to the global, never mind cosmological, scale. It may be found that, while the axioms have valid interpretations, some of the statements deduced from them do not. There can also be statements that appear to have correct interpretations, but cannot be derived from the axioms. 
Exercise: If I got 10 apples (kisses) from my son, and 10 from my daughter, I got 20 apples (kisses). If I got 10 apples (kisses) from my son, and gave 5 to my daughter, do I have 5 apples (kisses) left? Explain why addition works for both apples and kisses, but subtraction works for apples, not for kisses.

Now let us show some examples about the difficulty of inventing good abstract systems. Consider the following system, about entities called words that have meanings, i.e., they make statements about things, including words. Let us postulate something that looks very reasonable: A word either makes a correct statement about itself, or it does not. For example, "long" is not a long word and so it does not make a correct statement about itself; "short" is a short word, so it does make a correct statement about itself. "Tasty" is not tasty, but "beautiful" may be a beautiful word to some and not so to others. That alone does not violate the postulate, which only concerns the possibility of classifying words, not how a particular person would classify them. However, it turns out to be not possible to classify all words into the two groups. The act of classification immediately generates new meanings requiring new words: we need the words "selfdescribing" and "nonselfdescribing" to distinguish between words of the two classes. Now what class does the word "nonselfdescribing" belong to, or, is "nonselfdescribing" a selfdescribing word or not? If it is, then its property differs from its name, and so it must be a nonselfdescribing word. But if we assume "nonselfdescribing" is a nonselfdescribing word, then it would be describing itself, and must be considered a selfdescribing word. In either case, we have a contradiction. Now it might be thought that we can solve the problem by adding a third class "neither", into which we put "nonselfdescribing" plus any other words we have problems classifying into the first two classes.
But this does not work because other new words can be made up to defy classification, for example "notinclass1". We could also make up rules to forbid such new words, but then the system is incomplete because not all words and meanings are included in the classes. Exercise: Is "notinclass1" a selfdescribing word (class1 word)?

Consider another example. Suppose we have somehow managed to exhaustively find all true statements that could be made, and there are N such statements. Now immediately we can make the statement: The above N statements are all true. Is this statement true? If so, it contradicts the earlier assumption that all true statements have been exhaustively found. But if the new statement is false, it also contradicts the assumption that the N statements are true. Hence, we have to conclude that the new statement contradicts the postulate whether it is true or false, and the original postulate must be invalid. In fact, it should be clear that we can keep adding new statements like this and never reach the state of having found all true statements. Exercise: If the statement "The above N statements are all true" is not a new statement, but identical to one of the N true statements, then the above paradox would not arise. Consider this as a possible solution to the problem.

Suppose we make a new start with the assumption that the number of true statements is infinite. Immediately we see we can never find and prove all true statements, because each effort takes finite time. But suppose we keep trying for an infinite time? Would it not be possible at least in theory to find and prove all true statements? Perhaps we can reach a state when all future true statements follow from previous ones?
Unfortunately, this, even if possible, is not enough, because the important thing is being able to distinguish between true and false statements rather than just being able to list true ones, and this has turned out to be impossible for a system involving any kind of self-reference, such as the "words that describe words" example above. Set theory has the problem of "Does the set of all sets belong to itself?" and similar problems exist when we talk about "theorems about theorems" (metatheorems) or "machines processing machines" (metamachines). For a while, mathematicians hoped that at least number theory and arithmetic, if carefully constructed, would not have the self-referencing problem. Such hopes were rudely shattered when the great Austrian mathematician Gödel proved that, in any supposedly complete arithmetic system, one can always construct, by attaching numbers to theorems, a theorem about theorems that in effect asserts its own unprovability: if it could be proved it would be false, and if it were false it could be proved. In other words, it can be neither proved nor disproved, and hence a complete and consistent arithmetic system is inherently impossible. The result has since been extended.

Exercise: A King asked all prisoners to be brought to him one by one to speak a sentence. If the sentence is true, the prisoner will be hanged; if it is false, the prisoner will be beheaded. However, one prisoner said "I will be beheaded." Is this true or false? Explain the relation of this paradox with the well known example "This sentence is false." (The main point of this example is that it is not easy to find rules to prevent circular arguments leading nowhere. You can make a rule to forbid self-referencing sentences, but the prisoner's sentence is apparently not self-referencing. What if he changes the sentence to "I expect to be beheaded"? True or false?)

Exercise: Epimenides the Cretan says "All Cretans are liars". Discuss.
We do not mean to say that true and false cannot be determined, only that each system we invent will have limitations on its applicability. While this might come as a severe disappointment to theoreticians, in the real world there are already many restrictions on the usefulness of any tool, and having one more restriction imposed by inherent logic is no great setback. Transferred to Computer Science, the above examples convert into problems that cannot be solved algorithmically, a state of affairs that does not greatly perturb Computer Scientists, who have seen many apparently unsolvable problems and would not worry about a few more.

3. Formal Systems:

Now let us look at a particular method for producing rigorous mathematical systems in which we can distinguish between "right" and "wrong", the method of formal systems. A formal system consists of a set of symbolic items defined according to some syntactic rules. For convenience, four types of rules are separately stated:

1. Vocabulary: This is the set of basic symbols permitted in the formal system. For example, the system of arithmetic permits the numerals 0...9, the operators +, -, x and /, the equality sign =, etc., while Pascal (which is however not a good example of formal systems) permits the key words BEGIN, WHILE, etc., identifiers made up of at most 8 letters A...Z and numerals but beginning with a letter, the symbols :=, ;, etc.

2. Sentence rules: These are general rules for forming sentences using symbols of the vocabulary. Sentences formed this way could be TRUE or FALSE.

3. Axioms: These are sentences formed according to the sentence rules and immediately recognized as TRUE sentences - they are the starting points for deriving additional TRUE sentences.

4. Deduction rules: These are rules whereby symbols in a TRUE sentence may be replaced by other symbols to produce another TRUE sentence.
By starting with an axiom and applying these rules, new TRUE sentences are produced as items of the formal system. There are two main types of deduction rules: productions and re-writings. The former can only be used to replace whole sentences; the latter can be applied to either a part or the whole of a sentence. Re-writings may be context sensitive, i.e., applicable only if the symbol(s) being replaced appear in a sentence of a particular type. A production may be regarded as a context sensitive re-writing, applicable only when the symbols being replaced appear as the whole sentence. Deduction rules have to be consistent with the sentence rules, i.e., they do not end up producing symbol combinations that violate the sentence rules. Usually, the sentence rules are very simple format requirements, just to allow axioms to be constructed.

A TRUE sentence is also called a theorem, and demonstrating that a sentence is a TRUE one within a formal system, i.e., that it can be produced from an axiom of the system using its deduction rules, is called theorem proving. Thus, the proof of a theorem is just a sequence of TRUE sentences that starts with an axiom and ends with the theorem being proved, with each item being produced from the previous item according to one or more of the deduction rules. To be meaningful, a proof must be finite in length. The method of defining a formal system, with variations, is widely used. We shall look at three examples.

A. Addition: We have a formal system whose vocabulary consists of the symbols P, Q, and -. To present the other rules, we need the extra symbols x, y, z, etc., which stand for strings of -, including the empty string. (x etc. are part of the metasystem, a system for describing other systems. They are not part of the formal system itself, which only permits P, Q and - in sentences.)
The one sentence rule says sentences must be of the form xPyQz, and there is just one axiom: xPQx is a TRUE sentence for any x, and one deduction rule: if xPyQz is a TRUE sentence, so is xPy-Q-z. Since xPQx is TRUE, so is xP-Q-x, xP--Q--x, xP---Q---x, etc. Because x is any string of -, we have TRUE sentences P-Q-, -P--Q---, ----P---Q-------, etc. Note that the numbers of - before and after P always add up to the number after Q. Hence, the system captures the idea of addition.

We can define the system slightly differently, with TRUE sentences generated according to the following substitution rules: starting with xPRx, R may be replaced by either Q or -R-. Thus, xPQx is TRUE, and so are xP-Q-x (after applying each rule once) and xP--Q--x (after substituting -R- for R twice and Q for R once), etc. Or, the rules are presented as:

initial: xPRx
deduction rule: R ::= Q ! -R-

where ::= denotes "may be replaced by", and ! stands for alternative substitutions (i.e., OR). Such notations are used often in defining computer languages. Note that R is part of the metasystem only, not part of a sentence.

B. Multiplication: The vocabulary and sentence rule are as in A., but the axiom is xPQ and the deduction rule is: if xPyQz is a TRUE sentence, so is xPy-Qxz. Exercise: Show by examples that the system represents multiplication. Exercise: Reformulate the rules using R. (Note: the rule is context sensitive.)

C. Polynomials: The vocabulary is integer powers of x, including 0th and 1st powers, and real numbers. A TRUE sentence must be a sum of terms, each term being the product of a real number and a power of x. One way of defining the system is to use the rules: (1) A real number a,b,c,... by itself is a TRUE sentence; (2) x by itself is a TRUE sentence; (3) Given two TRUE sentences P and Q, P + Q, PQ, and (P) are all TRUE sentences. Since a and x are TRUE sentences, ax is a TRUE sentence, and since b and c are TRUE sentences, ax+b, (ax+b)x, (ax+b)x+c, etc., are all TRUE sentences.
Additional rules for adding and multiplying polynomials are needed to make them sums of terms, i.e., to rewrite (ax+b)x as ax^2+bx. Exercise: How would you change the rules to define two-variable polynomials?

However, it is equally feasible to use the following rules:

<poly> ::= <poly> + <term> ! <term>
<term> ::= <real> x^<integer>

Applying the left part of the first rule twice we get <poly> + <term> + <term>; applying the right part then produces <term> + <term> + <term>. Finally the second rule is applied to produce the polynomial. Note that we again need additional rules for simplifying polynomials to the canonical representation with increasing powers of x in the terms. Exercise: Formulate rules (1)-(3) using the above notation.

Computer languages frequently have formal definitions, at least for a major part of the syntax. Usually there is only one axiom, the <program> symbol, and we often see the definition:

<program> ::= <statement> ! <statement> ; <program>

To explain: a program may consist of one statement only, or a statement plus another program which itself consists of one statement or... In other words, a program consists of one or more statements, with ; in between successive statements. We should see more of this in Section 18.

Executing a program could also be looked at as something like a formal system: variable names appearing in the program are replaced by values and combined to produce results, i.e., two numbers joined by an arithmetic operator are replaced by one number, and function calls are replaced by the body of the function definition, with the arguments replaced by values supplied by the caller. E.g., given an expression like A+B, we first replace A and B by their values (say 3 and 5), and then replace 3+5 by 8; given the function f(x,y) = x+y, if we encounter the function call ...f(a,2)... in a program we would take the body x+y, and replace x by a and y by 2, producing a+2, which is then used to replace f(a,2). Unfortunately, in practical situations the correspondences are not so obvious.
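For the simple cases, though, the rewriting is easy to mimic. The following Python sketch (ours, not part of the notes; the names substitute, body, params are invented for illustration) performs the f(a,2) -> a+2 step by naive textual replacement - real interpreters work on parse trees and guard against name clashes, which plain string replacement does not:

```python
# Execution as rewriting: a call f(a, 2) with f(x, y) = x + y is replaced
# by the body of f with each formal parameter substituted by its argument.
def substitute(body, params, args):
    for param, arg in zip(params, args):
        body = body.replace(param, arg)   # naive textual substitution
    return body

print(substitute('x+y', ['x', 'y'], ['a', '2']))  # a+2
print(substitute('x+y', ['x', 'y'], ['3', '5']))  # 3+5 (a further rewriting step would replace 3+5 by 8)
```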
Only in some languages, in particular logic and functional programming languages, is there a close correspondence between program execution and theorem proving.

14. LAMBDA calculus:

Just as logic programming has its theoretical basis in proposition and predicate logic, functional programming has its basis in LAMBDA calculus. The basic construct of LAMBDA calculus is the form ^x(Exy) where x is one or more arguments and Exy is any expression involving these arguments and other variables denoted collectively by y. The form is applied under a set of simple substitution rules, under which the arguments x appearing in Exy may be replaced by constants, other variables, or expressions of variables (including LAMBDA expressions). Any function can be defined as a LAMBDA expression and be evaluated using the substitution rules. To take a few simple examples, consider the LAMBDA expression, with single argument x and expression x: ^x(x). We apply this to an argument A, which merely replaces x in the expression, producing A, or ^x(x) A -> A. Thus, ^x(x) is the identity function - applied to anything it produces the thing itself.

Now consider the nested two-argument LAMBDA expression ^x(^y(x)). Note that we have first a function of x, whose expression is ^y(x), which is a function of y. Applying this to two arguments A B in succession, we first substitute A for x in ^y(x), producing ^y(A). We then try to substitute B for y in the expression, but since it does not contain y, the result is just A. Hence, ^x(^y(x)) A B -> A. Now compare with a slightly different LAMBDA expression ^x(^y(y)) and apply it to two arguments A B: we get ^x(^y(y)) A B -> ^y(y) B -> B. In other words, ^x(^y(x)) and ^x(^y(y)) applied to two arguments would extract the first and second argument respectively. It is now possible to devise an IF function which takes three arguments.
If the first argument is TRUE, the function returns the second argument; if the first argument is FALSE, the third argument is returned. But first we have to define TRUE and FALSE in LAMBDA expressions. In fact, there are many alternative ways to define these, but one simple way is to use ^x(^y(x)) for TRUE and ^x(^y(y)) for FALSE, in which case IF is just the following LAMBDA expression: ^z(z A B). If we substitute TRUE, or ^x(^y(x)), for z, we get ^x(^y(x)) A B -> A, but if z is FALSE, or ^x(^y(y)), then we get ^x(^y(y)) A B -> B. However, to be strictly correct, IF should be a function of three arguments, and we should define it as ^z(^u(^v(z u v))), so that if z is TRUE, we get ^u(^v(u)), a function which produces A if u=A, regardless of what might be substituted for v; and if z is FALSE, then the IF function becomes ^u(^v(v)), a function which produces B if v=B, regardless of what is substituted for u. Note that these are identical to the TRUE/FALSE expressions as well as the "choose one from two" functions - another example of interpreting formal systems to obtain meaning.

Exercise: Work through the substitution process for ^z(^u(^v(z u v))) A B C for A = TRUE and A = FALSE step by step.

It is clear that the substitution process corresponds closely to a function call, but it is more fundamental in that, if one so wishes, one could use LAMBDA expressions to form the definitions of constants like TRUE and FALSE, primitive operators like IF, as well as arithmetic and Boolean operations, instead of assuming them to pre-exist. However, this is in practice not useful, and in the design of computer languages we always presume the existence of certain hardware and language primitives, so that LAMBDA expressions are written with them without first having to define them. In other words, we assume the computer will recognize TRUE, so that we need not write ^x(^y(x)), and we write (IF A B C), not ^z(^u(^v(z u v))) A B C.
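These encodings can be checked mechanically. Here is a direct translation into Python lambdas (our sketch; Python merely plays the role of the substitution rules):

```python
# TRUE  = ^x(^y(x)): choose the first of two arguments.
# FALSE = ^x(^y(y)): choose the second.
TRUE = lambda x: lambda y: x
FALSE = lambda x: lambda y: y

# IF = ^z(^u(^v(z u v))): apply the boolean z to the two branches u and v.
IF = lambda z: lambda u: lambda v: z(u)(v)

print(IF(TRUE)('A')('B'))   # A
print(IF(FALSE)('A')('B'))  # B
```

Note that IF does no real work: the boolean itself does the choosing, exactly as in the notes' observation that TRUE/FALSE are also the "choose one from two" functions.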
Note that the idea of higher order functions is naturally embodied in LAMBDA calculus: the system itself starts with a very simple notation and very simple operations, and uses these to construct other basic operations like IF, on top of which more complex functions can be built. Though LAMBDA expression functions have no names, it is possible to produce recursion by making use of a special combinator function (the fixed-point, or Y, combinator). However, this will not be discussed here. LAMBDA calculus is a powerful formal system with a simple set of sentence and deduction rules that can be used to produce expressions embodying sophisticated ideas of computation, and there are rigorous proofs of its important properties. This is why it forms a very satisfactory basis for functionally structured computer languages. The modern functional languages are basically LAMBDA calculus with a user-friendly appearance, or syntactic sugaring. It is also useful to mention that, before the invention of LAMBDA calculus, functions were formally defined in terms of sets of value pairs (argument, output). In such a framework, functions cannot be applied to themselves, since the idea of a set of sets leads to paradoxical results. Hence, recursion could not be readily included. In the LAMBDA calculus formulation, this problem does not arise.

19. Automata theory:

The theory of automata seeks to model the behaviour of any machine that responds to external stimulus by changing its internal state. The current state of the machine, and the value of the next input, uniquely determine the machine's next state. Out of all the possible states of the machine, a subset is regarded as final states. If a sequence of inputs changes the machine from its initial state to a final state, then the sequence is accepted as a TRUE sentence. If the sequence leaves the machine in a non-final state, then it is not a TRUE sentence.
To define a machine like this, one needs to specify:

(a) The set of possible values of the input stimulus;
(b) The set of possible states;
(c) The subset of final state(s);
(d) The transition matrix specifying the next state from the current state and the input value;
(e) The initial state.

Once these five items are given, the machine's behaviour can be completely determined. We are said to have defined an automaton. For example, consider a machine that will determine whether a received decimal integer is divisible by 3. We know that such a number, when all its digits are added together, produces another integer divisible by 3. Hence, if we add up all the digits modulo 3 as they arrive, and if the final sum is 0, then the received number is divisible by 3. To handle this with an automaton, it must have 3 states, one showing that the digits received so far add up to 0 modulo 3, one showing that the remainder is 1, and one more showing the remainder to be 2. A transition from state 0 to 1, 1 to 2, or 2 to 0 occurs if a digit of value 1, 4 or 7 is received; a transition from 0 to 2, 1 to 0 or 2 to 1 occurs if a digit of value 2, 5 or 8 arrives. No change of state occurs if 0, 3, 6 or 9 is received.

[Transition diagram: state 0 is both initial and final; digits 1, 4 and 7 move the machine one state forward (0 -> 1 -> 2 -> 0), digits 2, 5 and 8 move it two states forward (0 -> 2, 1 -> 0, 2 -> 1), and digits 0, 3, 6 and 9 leave the state unchanged.]

Hence, the machine has the transition matrix:

input:    0  1  2  3  4  5  6  7  8  9
state 0   0  1  2  0  1  2  0  1  2  0
state 1   1  2  0  1  2  0  1  2  0  1
state 2   2  0  1  2  0  1  2  0  1  2

An automaton (or more properly, a finite state deterministic automaton) has by itself a very simple behaviour; the intelligence lies in the transition matrix.
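The three-state machine fits in a few lines of Python (our sketch; the arithmetic form (state + digit) mod 3 is just a compact way of writing the transition matrix above):

```python
# Divisibility-by-3 automaton: states 0, 1, 2 hold the digit sum MOD 3;
# state 0 is both the initial state and the only final state.
def accepts(digits):
    state = 0
    for d in digits:
        state = (state + int(d)) % 3   # one step of the transition matrix
    return state == 0                  # did we end in the final state?

print(accepts('123'))   # True: 123 = 3 x 41
print(accepts('124'))   # False
```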
A program implementing the automaton returns T or NIL depending on whether the received input string is a TRUE sentence: a function Final checks whether State is one of the final states, and MATRIX takes the matrix Transition, the current state and an input value, returning the selected matrix element.

Exercise: Formulate the formal system for addition (i.e., the xPyQz system) as an automaton.

Exercise: Suppose we re-formulate the system as follows: x is a string made up of either - (x+) or + (x-) only (no mixture is allowed), and the starting axiom is x-PRx+, with the deduction rule R ::= Q ! -R+. Show that a simpler automaton can be formulated to accept sentences for addition.

There is a simple correspondence between the transition matrix of an automaton and the formal syntax of the language it recognizes. Taking the MOD 3 automaton as an example: suppose the first digit received is a 0 MOD 3 digit; then the machine expects the remaining digits to be a 0 MOD 3 number; if the first digit is 1 MOD 3, then a 2 MOD 3 number is needed to add up to a 0 MOD 3 number; if the first digit is 2 MOD 3, a 1 MOD 3 number is needed. A TRUE sentence may be generated by the deduction rule

0A ! 3A ! 6A ! 9A ! 2B ! 5B ! 8B ! 1C ! 4C ! 7C

where A denotes a 0 MOD 3 number, B a 1 MOD 3 number, and C a 2 MOD 3 number. Further, 0, 3, 6 or 9 alone is also a TRUE sentence, and may be added to the above rule. However, A is a number divisible by 3 and so is a TRUE sentence. Hence, the above deduction rule is actually

A ::= 0 ! 3 ! 6 ! 9 ! 0A ! 3A ! 6A ! 9A ! 2B ! 5B ! 8B ! 1C ! 4C ! 7C

We can also formulate rules for B and C: B may be 1, 4, or 7, or 1, 4, or 7 followed by a 0 MOD 3 value, or 0, 3, 6 or 9 followed by a 1 MOD 3 value, or ... Thus we have the rules

B ::= 1 ! 4 ! 7 ! 1A ! 4A ! 7A ! 0B ! 3B ! 6B ! 9B ! 2C ! 5C ! 8C
C ::= 2 ! 5 ! 8 ! 2A ! 5A ! 8A ! 1B ! 4B ! 7B ! 0C ! 3C ! 6C ! 9C

Comparing this with the transition matrix for the automaton given earlier, we see that there is a close correspondence.
Indeed, for any element of the transition matrix taking state y to state z on input x, we have a part of the deduction rule giving y ::= xz, meaning that initially the machine was expecting an input string represented by symbol y, but after receiving input x, its expectation changes to a shorter string represented by symbol z. However, if state z is final, then no further input is required and z need not be present, so the rule is y ::= x. And, if state z may be both final and non-final, then we have y ::= x ! xz. Note however that the above method does not work for the formal system for addition (PQ system), because the deduction replaces R with -R-, with a required input symbol on each side, rather than in front of R only. Indeed, there are serious limitations on the kind of formal languages that can be processed with finite deterministic automata.

Now we consider another kind of automaton, the non-deterministic type. The transition matrix, instead of specifying one unique new state given the current state and the input, specifies a number of alternative states. From each alternative state, the next input puts the machine into more alternative states, and so on. If, upon receiving the whole input string, any of these alternative paths leads to a final state, then the input string is accepted as a TRUE sentence.

[Transition diagram: a five-state non-deterministic automaton, with initial state 0, intermediate states 1, 2 and 3, and final state 4, that accepts strings made up of a, b and c ending with a symbol that also appears at the start of the string, e.g., aca, cac, accaa, bcab; its transition matrix is given below.]
The initial symbol puts the machine into one of three separate states, from which only the same symbol would put it into the final state. However, it would not be correct to just go into the final state upon receiving the same symbol again, because this may not be the end of the string. Hence, the machine has the alternative of staying in the same state and going to the final state later. A non-deterministic automaton has a transition matrix whose elements are sets of alternative states, rather than individual states. The matrix for the above machine is (here - denotes the empty set):

input:    a      b      c
state 0   {1}    {2}    {3}
state 1   {1 4}  {1}    {1}
state 2   {2}    {2 4}  {2}
state 3   {3}    {3}    {3 4}
state 4   -      -      -

The automaton algorithm is as follows. If the input has ended, the program checks each alternative state to see if it is final; if it reaches the end of the state list, then none of the states is final and the input string was not a TRUE sentence. If the input has not ended, it follows the successor set of states for each of the alternative current states and sees if any of them leads to the final state; if it reaches the end of the state list, it knows that none of the alternatives leads to the final state.

A fundamental theorem of automata theory is that any non-deterministic automaton has an equivalent deterministic automaton. This is achieved simply by defining sets of states of the former as states of the latter. To be more specific: if a machine can reach a set of states that includes a final state, then because it explores all alternatives, it will surely reach that final state. Hence, a set of states that includes a final state is as good as a final state, and instead of studying which individual states a machine comes from and goes to after receiving an input, we study all the alternative states it could have been in, and all the alternative states it could go to. In other words, we will specify a transition matrix from sets of states to sets of states.
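Both halves of this - the exploration of alternatives, and the set-of-states construction - can be sketched in Python (our sketch; the dictionary encodes the transition matrix above, and the function names are ours):

```python
# Transition matrix of the non-deterministic machine: for each state and
# input symbol, the set of alternative next states (set() = empty set).
MATRIX = {
    0: {'a': {1}, 'b': {2}, 'c': {3}},
    1: {'a': {1, 4}, 'b': {1}, 'c': {1}},
    2: {'a': {2}, 'b': {2, 4}, 'c': {2}},
    3: {'a': {3}, 'b': {3}, 'c': {3, 4}},
    4: {'a': set(), 'b': set(), 'c': set()},
}
FINAL = {4}

def accepts(string, states=frozenset({0})):
    """Non-deterministic acceptance: follow every alternative path."""
    if not string:                     # input ended: is any alternative final?
        return bool(states & FINAL)
    successors = set()
    for s in states:                   # successor sets of all current alternatives
        successors |= MATRIX[s][string[0]]
    return accepts(string[1:], frozenset(successors))

def determinize(nfa, start, inputs):
    """Subset construction: sets of alternative states become single states."""
    dfa, todo = {}, [frozenset({start})]
    while todo:
        S = todo.pop()
        if S in dfa:
            continue
        dfa[S] = {}
        for x in inputs:
            # union of the successors of every alternative in S
            T = frozenset(t for s in S for t in nfa[s][x])
            dfa[S][x] = T
            todo.append(T)
    return dfa

print(accepts('aca'), accepts('bcab'), accepts('abc'))  # True True False
print(len(determinize(MATRIX, 0, 'abc')))               # 7 reachable set-states
```

Running determinize here finds seven reachable set-states; the set {4} by itself never arises, since state 4 is only ever entered together with the state the machine could stay in.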
Taking the above matrix, we extend it by adding all the combinations that may arise. Note that the empty set combined with another set just reproduces that set.

                           input
       state           a       b       c
   0   {0}            {1}     {2}     {3}
   1   {1}            {1 4}   {1}     {1}
   2   {2}            {2}     {2 4}   {2}
   3   {3}            {3}     {3}     {3 4}
   4   {4}            -       -       -
   5   {1 4}          {1 4}   {1}     {1}
   6   {2 4}          {2}     {2 4}   {2}
   7   {3 4}          {3}     {3}     {3 4}

All the sets that include the final state 4, i.e., numbers 4, 5, 6 and 7, are final states of the new machine. To define a deterministic automaton, we only have to assign a number to each set of states and produce a new transition matrix:

                 input
     state     a   b   c
       0       1   2   3
       1       5   1   1
       2       2   6   2
       3       3   3   7
       4       -   -   -   (unreachable)
       5       5   1   1
       6       2   6   2
       7       3   3   7

Interestingly, state 4 cannot be reached from any state: it is only included in the other final states, and we really just need a 7 by 3 transition matrix.

Exercise: Verify that the machine will accept strings like aa, aaa, aba, aaba, etc., but reject bba, baa, etc.

Exercise: A more complex non-deterministic machine which will accept any string whose last symbol has appeared somewhere before, not necessarily as the first symbol, may be obtained from the above machine by changing the first line of the transition matrix from {1} {2} {3} to {0 1} {0 2} {0 3}, i.e., from the initial state 0 the machine has the alternative of staying in that state. Find the equivalent deterministic machine.

Now consider the language syntax processed by this automaton. We are looking for sentences starting and ending with a, or b, or c. Let us denote a string ending with a by A. Clearly, A may be just a, or it may be any symbol followed by another A. Similarly defining B and C, and writing S for a whole sentence, we have

  S ::= aA ! bB ! cC
  A ::= a ! aA ! bA ! cA
  B ::= b ! aB ! bB ! cB
  C ::= c ! aC ! bC ! cC

If we try to define an automaton from the above grammar, the result is the five state non-deterministic machine, with the states Initial, A, B, C, and Final.
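The subset construction itself is mechanical enough to sketch (again an illustration of our own, reusing the set-valued matrix from before). Starting from {0}, it collects every state-set that can actually be reached, confirming that only 7 arise, so the set {4} on its own never appears.

```python
NFA = {
    0: {"a": {1}, "b": {2}, "c": {3}},
    1: {"a": {1, 4}, "b": {1}, "c": {1}},
    2: {"a": {2}, "b": {2, 4}, "c": {2}},
    3: {"a": {3}, "b": {3}, "c": {3, 4}},
    4: {"a": set(), "b": set(), "c": set()},
}

def determinize(start=frozenset({0})):
    table, todo = {}, [start]
    while todo:
        S = todo.pop()
        if S in table:          # this state-set already processed
            continue
        table[S] = {}
        for x in "abc":
            # union of the successor sets of all states in S
            T = frozenset().union(*(NFA[s][x] for s in S))
            table[S][x] = T
            todo.append(T)
    return table

table = determinize()
print(len(table))                    # 7 reachable state-sets
print(sum(4 in S for S in table))    # 3 of them are final
```

The 7 reachable sets are {0}, {1}, {2}, {3}, {1 4}, {2 4} and {3 4}, matching rows 0-3 and 5-7 of the matrix above.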
The machine is non-deterministic because when it is in state A, expecting a string A, and receives a, it is not sure whether to go into Final and expect end of string, or to expect another A. Hence, both possibilities need to be explored. Similar uncertainty arises in states B and C.

To find the deterministic machine, let us invent a new symbol A' which may be either end of string _ or an A. We then have

  A ::= aA' ! bA ! cA
  B ::= aB ! bB' ! cB
  C ::= aC ! bC ! cC'

and the additional rules for A', B' and C':

  A' ::= _ ! A   etc.

Unfortunately, a rule like this cannot be used in defining an automaton, as a machine must change state because of an input. So we use the rule for A to change the above to

  A' ::= _ ! aA' ! bA ! cA
  B' ::= _ ! aB ! bB' ! cB
  C' ::= _ ! aC ! bC ! cC'

We now have a deterministic machine with 8 states (Initial, A, B, C, A', B', C', Final), the same as the one discovered earlier by manipulating the transition matrix. All this must look rather artificial, and indeed it is, because we already knew what we were looking for.

Exercise: Work out the transition matrix from the above deduction rules and show it to be the same.

The model provided by the theory of automata appears to be related to computers in two ways: machines with internal states like computer memory, and with the capability to process formal languages. The model is general and has few prerequisites and assumptions, and the properties and limitations of the model are generally applicable to all machines based on the model. Hence, automata theory was much studied in theoretical computer science in the hope of discovering general laws governing computers. However, our examples should make it clear that the model is not easily used to study any complex computation, for a very basic reason: the machine's present state, one single number, is meant to capture all its knowledge, and the state alone decides how the machine would respond to any input.
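As a check (our own sketch, not part of the notes), the deterministic machine can be read straight off these rules. A', B' and C' are the accepting states, since each may derive end of string; a separate Final state is not needed when simulating.

```python
# The deterministic machine read off the grammar: each rule like
# A ::= aA' ! bA ! cA becomes one row of the transition matrix.
DFA = {
    "Initial": {"a": "A",  "b": "B",  "c": "C"},
    "A":       {"a": "A'", "b": "A",  "c": "A"},
    "A'":      {"a": "A'", "b": "A",  "c": "A"},
    "B":       {"a": "B",  "b": "B'", "c": "B"},
    "B'":      {"a": "B",  "b": "B'", "c": "B"},
    "C":       {"a": "C",  "b": "C",  "c": "C'"},
    "C'":      {"a": "C",  "b": "C",  "c": "C'"},
}
ACCEPT = {"A'", "B'", "C'"}   # states that may derive end of string _

def accepts(string):
    state = "Initial"
    for x in string:
        state = DFA[state][x]
    return state in ACCEPT

print([w for w in ("aa", "aca", "bcab", "bba", "ab") if accepts(w)])
# ['aa', 'aca', 'bcab']
```

The accepted and rejected strings agree with the earlier exercise, as the equivalence of the two machines requires.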
In particular, the machine does not "remember" how it reached a state or what the past input string was. If such memory about past history is important, then you have to provide different states that are arrived at in different ways. By inventing enough states, we can have any content of memory and any past history represented by a different state, but this is obviously not a convenient or efficient way of storing knowledge, and results derived from automata theory must be re-cast into other forms before they can describe the behaviour of machines that represent knowledge and execution history in other ways. This recasting process is itself very complicated. For this and other reasons, automata theory today is a peripheral part of computer science, and is mainly of interest to mathematicians.

The Turing machine, also studied mainly by mathematicians these days, is a more powerful model because of its ability to change the input string and move back to a previously read/changed symbol, so that the input string also functions as its internal storage. Further, the Turing machine has greater theoretical importance because of its equivalence to other models of computation. For example, the inherent logical limitation on its capability turns out to correspond to similar limitations in predicate logic and LAMBDA calculus.
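To make the contrast concrete, here is a toy illustration of our own (not from the notes): a Turing-style machine recognising strings of the form a...ab...b with equal numbers of a's and b's, a language no finite automaton can accept. By overwriting symbols and moving back over the tape, the input string itself serves as the machine's storage.

```python
# A minimal Turing-style machine for a^n b^n: repeatedly mark the
# leftmost a as X, walk right to mark the leftmost b as Y, rewind,
# and accept when only X's and Y's (and the blank _) remain.

def tm_accepts(s):
    tape = list(s) + ["_"]            # input string plus a blank end marker
    pos, state = 0, "seek_a"
    transitions = {  # (state, symbol) -> (write, move, new state)
        ("seek_a", "a"): ("X", +1, "seek_b"),
        ("seek_a", "Y"): ("Y", +1, "check"),
        ("seek_a", "_"): ("_",  0, "accept"),
        ("seek_b", "a"): ("a", +1, "seek_b"),
        ("seek_b", "Y"): ("Y", +1, "seek_b"),
        ("seek_b", "b"): ("Y", -1, "rewind"),
        ("rewind", "a"): ("a", -1, "rewind"),
        ("rewind", "Y"): ("Y", -1, "rewind"),
        ("rewind", "X"): ("X", +1, "seek_a"),
        ("check",  "Y"): ("Y", +1, "check"),
        ("check",  "_"): ("_",  0, "accept"),
    }
    while state != "accept":
        key = (state, tape[pos])
        if key not in transitions:    # no rule: the machine halts, rejecting
            return False
        write, move, state = transitions[key]
        tape[pos] = write
        pos += move
    return True

print(tm_accepts("aabb"), tm_accepts("aab"))   # True False
```

The machine's finite control still knows only its current state, but because it can rewrite the tape and move both ways, the tape carries the count that a finite automaton would need unboundedly many states to remember.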