[Lecture Notes by Prof Yuen and possibly others 2000-2001] ...while the following material is mainly of interest to theory inclined CS students; others are welcome to take a look...

2. Truth, Reality and Provability:

The idea of right and wrong permeates human life, and is studied in different ways by different professions. Law, Politics, Philosophy and Science are just some of these. Mathematics concerns itself with a particular kind of right and wrong: statements that are expressed in concise notations, and proved or disproved using a set of concisely stated rules. In contrast, Computer Science is concerned with whether or not a piece of software (running on particular hardware) will meet its intended purpose: if it does, the program is right; otherwise it is wrong. If the software and its purpose have both been specified in mathematics-like notations, then it may be possible to prove whether a program is right or wrong using standard mathematical methods. In other words, by using very rigorous notations for both programs and program specifications, we may have a way of proving that the programs meet the specifications and so must be correct. Mathematicians are convinced that this is the right, and indeed only, way to develop software, but so far they have not produced usable systems based on this principle. As they have not produced software that meets our purpose, they have not met our definition of "right", however successful they might have been at meeting their own definition.

But what do we mean by "proving" something to be right or wrong? In general, we do this by showing that it is consistent or inconsistent with something else known to be right. However, how do we know that the "something else" is right in the first place? There has to be some starting point - certain things whose validity is taken for granted. These are the axioms of a mathematical system. But axioms are statements about some thing or things that already exist.
Also, the rules which we use to show consistency or inconsistency of something with the axioms must also pre-exist. The question then arises as to which parts of a mathematical system are axioms, which are rules and which are the entities they apply to, and why these were chosen for study rather than some others. This is a philosophical issue, and in general the way one formulates a system relates to human experience. We know certain things exist around us; we know certain ways in which they operate; we know that certain statements can be assumed in general and then be used to produce specific statements about specific circumstances to help us handle problems. It is useful to have mathematical systems that concisely denote such knowledge and simplify the process of deducing specific from general. For example, Geometry postulates the entities of points and lines, the concepts of length, angle size and area, the axioms that through two given points only one straight line can be drawn, only one circle can be drawn with centre at one given point and passing through another point, etc., and the rules of allowing only unmarked straight edge and compass to draw lines, circles, angles, polygons etc. From this, one can derive Pythagoras's theorem about areas of squares whose sides are the sides of a right-angle triangle, and other theorems that can be used to help us lay out roads, farms, houses, etc. But there are different kinds of experience for us to relate abstract systems to. Euclidean Geometry has its Fifth Postulate, that one and only one parallel line can be drawn through a point outside a given line, based on common human experience gained on flat surfaces.
Riemann however made a different postulate, that no parallel line can be drawn, which produces a different system of Geometry, applicable to spherical surfaces and related spaces, and this turns out to be relevant to spatial structures on the cosmological scale: there is a match between the system and human experience in studying the universe. While this particular experience is confined to a very small number of human beings, it is no less important. Matching an abstract system with human experience of the real world is termed the process of interpretation: correspondences between the entities, axioms and rules of the abstract system and those of the real world are found, so that a result derived in the abstract system can be used to make a prediction about the real world, e.g. 1+1=2, so if we put one apple (or pear, orange, elephant) into a basket containing another apple (or pear, etc), then the basket would contain two apples (or ...). Here we have interpreted addition as "putting together and counting", and the numerals as "object-quantity representations". The interpretation works for identical or similar objects (if we put an apple into a basket containing an orange but no other fruits, we have a total of 2 fruits in the basket), but breaks down if we put an apple into a basket containing an elephant! (If you say "the basket contains two objects", what about the basket itself? Is it an object "contained in the basket"?) The wider the range of interpretation of a system, the more useful it is. However, the matching may not be very exact, and may only be usable within certain limits, such as Euclidean Geometry applying only to small, nearly flat parts of the earth and not to the global, never mind cosmological, scale. It may be found that, while the axioms have valid interpretations, some of the statements deduced from them do not. There can also be statements that appear to have correct interpretations, but cannot be derived from the axioms. 
Exercise: If I got 10 apples (kisses) from my son, and 10 from my daughter, I got 20 apples (kisses). If I got 10 apples (kisses) from my son, and gave 5 to my daughter, do I have 5 apples (kisses) left? Explain why addition works for both apples and kisses, but subtraction works for apples, not for kisses.

Now let us show some examples about the difficulty of inventing good abstract systems. Consider the following system, about entities called words that have meanings, i.e., they make statements about things, including words. Let us postulate something that looks very reasonable: A word either makes a correct statement about itself, or it does not. For example, "long" is not a long word and so it does not make a correct statement about itself; "short" is a short word, so it does make a correct statement about itself. "Tasty" is not tasty, but "beautiful" may be a beautiful word to some and not so to others. That alone does not violate the postulate, which only concerns the possibility of classifying words, not how a particular person would classify them. However, it turns out to be not possible to classify all words into the two groups. The act of classification immediately generates new meanings requiring new words: we need the words "selfdescribing" and "nonselfdescribing" to distinguish between words of the two classes. Now what class does the word "nonselfdescribing" belong to, or, is "nonselfdescribing" a selfdescribing word or not? If it is, then its property differs from its name, and so it must be a nonselfdescribing word. But if we assume "nonselfdescribing" is a nonselfdescribing word, then it would be describing itself, and must be considered a selfdescribing word. In either case, we have a contradiction. Now it might be thought that we can solve the problem by adding a third class "neither", into which we put "nonselfdescribing" plus any other words we have problems classifying into the first two classes.
But this does not work because other new words can be made up to defy classification, for example "notinclass1". We could also make up rules to forbid such new words, but then the system is incomplete because not all words and meanings are included in the classes. Exercise: Is "notinclass1" a selfdescribing word (class1 word)?

Consider another example. Suppose we have somehow managed to exhaustively find all true statements that could be made, and there are N such statements. Now immediately we can make the statement: The above N statements are all true. Is this statement true? If so, it contradicts the earlier assumption that all true statements have been exhaustively found. But if the new statement is false, it also contradicts the assumption that the N statements are true. Hence, we have to conclude that the new statement contradicts the postulate whether it is true or false, and the original postulate must be invalid. In fact, it should be clear that we can keep adding new statements like this and never reach the state of having found all true statements. Exercise: If the statement "The above N statements are all true" is not a new statement, but identical to one of the N true statements, then the above paradox would not arise. Consider this as a possible solution to the problem.

Suppose we make a new start with the assumption that the number of true statements is infinite. Immediately we see we can never find and prove all true statements, because each effort takes finite time. But suppose we keep trying for an infinite time? Would it not be possible at least in theory to find and prove all true statements? Perhaps we can reach a state when all future true statements follow from previous ones?
Unfortunately, this, even if possible, is not enough, because the important thing is being able to distinguish between true and false statements rather than just being able to list true ones, and this has turned out to be impossible for a system involving any kind of self-reference, such as the "words that describe words" example above. Set theory has the problem of "Does the set of all sets belong to itself?" and similar problems exist when we talk about "theorems about theorems" (metatheorems) or "machines processing machines" (metamachines). For a while, mathematicians hoped that at least number theory and arithmetic, if carefully constructed, would not have the self-referencing problem. Such hopes were rudely shattered when the great Austrian mathematician Gödel proved that, in any supposedly complete arithmetic system, one can always construct, by attaching numbers to theorems, a theorem about theorems that in effect asserts its own unprovability: if it could be proved it would be false, and if it were false it could be proved. In other words, it can be neither proved nor disproved, and hence a complete and consistent arithmetic system is inherently impossible. The result has since been extended.

Exercise: A King asked all prisoners to be brought to him one by one to speak a sentence. If the sentence is true, the prisoner will be hanged; if it is false, the prisoner will be beheaded. However, one prisoner said "I will be beheaded." Is this true or false? Explain the relation of this paradox with the well known example "This sentence is false." (The main point of this example is that it is not easy to find rules to prevent circular arguments leading nowhere. You can make a rule to forbid self-referencing sentences, but the prisoner's sentence is apparently not self-referencing. What if he changes the sentence to "I expect to be beheaded"? True or false?)

Exercise: Epimenides the Cretan says "All Cretans are liars". Discuss.
We do not mean to say that true and false cannot be determined, only that each system we invent will have limitations on its applicability. While this might come as a severe disappointment to theoreticians, in the real world there are already many restrictions on the usefulness of any tool, and having one more restriction imposed by inherent logic is no great setback. Transferred to Computer Science, the above examples convert into problems that cannot be solved algorithmically, a state of affairs that does not greatly perturb Computer Scientists, who have seen many apparently unsolvable problems and would not worry about a few more.

3. Formal Systems:

Now let us look at a particular method for producing rigorous mathematical systems in which we can distinguish between "right" and "wrong", the method of formal systems. A formal system consists of a set of symbolic items defined according to some syntactic rules. For convenience, four types of rules are separately stated:

1. Vocabulary: This is the set of basic symbols permitted in the formal system. For example, the system of arithmetic permits the numerals 0...9, the operators +, -, x and /, the equality sign =, etc., while Pascal (which is however not a good example of formal systems) permits the key words BEGIN, WHILE, etc., identifiers made up of at most 8 letters A...Z and numerals but beginning with a letter, the symbols :=, ;, etc.

2. Sentence rules: These are general rules for forming sentences using symbols of the vocabulary. Sentences formed this way could be TRUE or FALSE.

3. Axioms: These are sentences formed according to the sentence rules and immediately recognized as TRUE sentences - they are the starting points for deriving additional TRUE sentences.

4. Deduction rules: These are rules whereby symbols in a TRUE sentence may be replaced by other symbols to produce another TRUE sentence.
By starting with an axiom and applying these rules, new TRUE sentences are produced as items of the formal system. There are two main types of deduction rules: productions and re-writings. The former can only be used to replace whole sentences; the latter can be applied to either a part or the whole of a sentence. Re-writings may be context sensitive, i.e., applicable only if the symbol(s) being replaced appear in a sentence of a particular type. A production may be regarded as a context sensitive re-writing, applicable only when the symbols being replaced appear as the whole sentence. Deduction rules have to be consistent with the sentence rules, i.e., they do not end up producing symbol combinations that violate the sentence rules. Usually, the sentence rules are very simple format requirements, just to allow axioms to be constructed.

A TRUE sentence is also called a theorem, and demonstrating that a sentence is a TRUE one within a formal system, i.e., that it can be produced from an axiom of the system using its deduction rules, is called theorem proving. Thus, the proof of a theorem is just a sequence of TRUE sentences that starts with an axiom and ends with the theorem being proved, with each item being produced from the previous item according to one or more of the deduction rules. To be meaningful, a proof must be finite in length. The method of defining a formal system, with variations, is widely used. We shall look at three examples.

A. Addition: We have a formal system whose vocabulary consists of the symbols P, Q, and -. To present the other rules, we need the extra symbols x, y, z, etc., which stand for strings of -, including the empty string. (x etc. are part of the metasystem, a system for describing other systems. They are not part of the formal system itself, which only permits P, Q and - in sentences.)
The one sentence rule says sentences must be of the form xPyQz, and there is just one axiom: xPQx is a TRUE sentence for any x, and one deduction rule: if xPyQz is a TRUE sentence, so is xPy-Q-z. Since xPQx is TRUE, so is xP-Q-x, xP--Q--x, xP---Q---x, etc. Because x is any string of -, we have TRUE sentences P-Q-, -P--Q---, ----P---Q-------, etc. Note that the numbers of - before and after P always add up to the number after Q. Hence, the system captures the idea of addition.

We can define the system slightly differently, with TRUE sentences generated according to the following substitution rules: starting with xPRx, R may be replaced by either Q or -R-. Thus, xPQx is TRUE, and so are xP-Q-x (after applying each rule once) and xP--Q--x (after substituting -R- for R twice and Q for R once), etc. Or, the rules are presented as:

initial: xPRx
deduction rule: R ::= Q ! -R-

where ::= denotes "may be replaced by", and ! stands for alternative substitutions (i.e., OR). Such notations are used often in defining computer languages. Note that R is part of the metasystem only, not part of a sentence.

B. Multiplication: The vocabulary and sentence rule are as in A., but the axiom is xPQ and the deduction rule is: if xPyQz is a TRUE sentence, so is xPy-Qxz. Exercise: Show by examples that the system represents multiplication. Exercise: Reformulate the rules using R. (Note: the rule is context sensitive.)

C. Polynomials: The vocabulary is integer powers of x, including 0th and 1st powers, and real numbers. A TRUE sentence must be a sum of terms, each term being the product of a real number and a power of x. One way of defining the system is to use the rules: (1) A real number a,b,c,... by itself is a TRUE sentence; (2) x by itself is a TRUE sentence; (3) Given two TRUE sentences P and Q, P + Q, PQ, and (P) are all TRUE sentences. Since a and x are TRUE sentences, ax is a TRUE sentence, and since b and c are TRUE sentences, ax+b, (ax+b)x, (ax+b)x+c, etc., are all TRUE sentences.
Additional rules for adding and multiplying polynomials are needed to make them sums of terms, i.e., to rewrite (ax+b)x as ax^2+bx. Exercise: How would you change the rules to define two-variable polynomials?

However, it is equally feasible to use the following rules:

<poly> ::= <poly> + <term> ! <term>
<term> ::= <real> x^<integer>

Applying the left part of the first rule twice we get <poly> + <term> + <term>; applying the right part then produces <term> + <term> + <term>. Finally the second rule is applied to produce the polynomial. Note that we again need additional rules for simplifying polynomials to the canonical representation with increasing powers of x in the terms. Exercise: Formulate rules (1)-(3) using the above notation.

Computer languages frequently have formal definitions, at least for a major part of the syntax. Usually there is only one axiom, the <program> symbol, and we often see the definition:

<program> ::= <statement> ! <statement> ; <program>

To explain: a program may consist of one statement only, or a statement plus another program which itself consists of one statement or... In other words, a program consists of one or more statements, with ; in between successive statements. We should see more of this in Section 18.

Executing a program could also be looked at as something like a formal system: variable names appearing in the program are replaced by values and combined to produce results, i.e., two numbers joined by an arithmetic operator are replaced by one number, and function calls are replaced by the body of the function definition, with the arguments replaced by values supplied by the caller. E.g., given an expression like A+B, we first replace A and B by their values (say 3 and 5), and then replace 3+5 by 8; given the function f(x,y) = x+y, if we encounter the function call ...f(a,2)... in a program we would take the body x+y, and replace x by a and y by 2, producing a+2, which is then used to replace f(a,2). Unfortunately, in practical situations the correspondences are not so obvious.
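For the simple cases, though, the rewriting is easy to mimic. The following Python sketch (ours, not part of the notes; the names substitute, body, params are invented for illustration) performs the f(a,2) -> a+2 step by naive textual replacement - real interpreters work on parse trees and guard against name clashes, which plain string replacement does not:

```python
# Execution as rewriting: a call f(a, 2) with f(x, y) = x + y is replaced
# by the body of f with each formal parameter substituted by its argument.
def substitute(body, params, args):
    for param, arg in zip(params, args):
        body = body.replace(param, arg)   # naive textual substitution
    return body

print(substitute('x+y', ['x', 'y'], ['a', '2']))  # a+2
print(substitute('x+y', ['x', 'y'], ['3', '5']))  # 3+5 (a further rewriting step would replace 3+5 by 8)
```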
Only in some languages, in particular logic and functional programming languages, is there a close correspondence between program execution and theorem proving.

14. LAMBDA calculus:

Just as logic programming has its theoretical basis in proposition and predicate logic, functional programming has its basis in LAMBDA calculus. The basic construct of LAMBDA calculus is the form ^x(Exy) where x is one or more arguments and Exy is any expression involving these arguments and other variables denoted collectively by y. The form is applied under a set of simple substitution rules, under which the arguments x appearing in Exy may be replaced by constants, other variables, or expressions of variables (including LAMBDA expressions). Any function can be defined as a LAMBDA expression and be evaluated using the substitution rules. To take a few simple examples, consider the LAMBDA expression, with single argument x and expression x: ^x(x). We apply this to an argument A, which merely replaces x in the expression, producing A, or ^x(x) A -> A. Thus, ^x(x) is the identity function - applied to anything it produces the thing itself.

Now consider the nested two-argument LAMBDA expression ^x(^y(x)). Note that we have first a function of x, whose expression is ^y(x), which is a function of y. Applying this to two arguments A B in succession, we first substitute A for x in ^y(x), producing ^y(A). We then try to substitute B for y in the expression, but since it does not contain y, the result is just A. Hence, ^x(^y(x)) A B -> A. Now compare with a slightly different LAMBDA expression ^x(^y(y)) and apply it to two arguments A B: we get ^x(^y(y)) A B -> ^y(y) B -> B. In other words, ^x(^y(x)) and ^x(^y(y)) applied to two arguments would extract the first and second argument respectively. It is now possible to devise an IF function which takes three arguments.
If the first argument is TRUE, the function returns the second argument; if the first argument is FALSE, the third argument is returned. But first we have to define TRUE and FALSE in LAMBDA expressions. In fact, there are many alternative ways to define these, but one simple way is to use ^x(^y(x)) for TRUE and ^x(^y(y)) for FALSE, in which case IF is just the following LAMBDA expression: ^z(z A B). If we substitute TRUE, or ^x(^y(x)), for z, we get ^x(^y(x)) A B -> A, but if z is FALSE, or ^x(^y(y)), then we get ^x(^y(y)) A B -> B. However, to be strictly correct, IF should be a function of three arguments, and we should define it as ^z(^u(^v(z u v))), so that if z is TRUE, we get ^u(^v(u)), a function which produces A if u=A, regardless of what might be substituted for v; and if z is FALSE, then the IF function becomes ^u(^v(v)), a function which produces B if v=B, regardless of what is substituted for u. Note that these are identical to the TRUE/FALSE expressions as well as the "choose one from two" functions - another example of interpreting formal systems to obtain meaning.

Exercise: Work through the substitution process for ^z(^u(^v(z u v))) A B C for A = TRUE and A = FALSE step by step.

It is clear that the substitution process corresponds closely to a function call, but it is more fundamental in that, if one so wishes, one could use LAMBDA expressions to form the definitions of constants like TRUE and FALSE, primitive operators like IF, as well as arithmetic and Boolean operations, instead of assuming them to pre-exist. However, this is in practice not useful, and in the design of computer languages we always presume the existence of certain hardware and language primitives, so that LAMBDA expressions are written with them without first having to define them. In other words, we assume the computer will recognize TRUE, so that we need not write ^x(^y(x)), and we write (IF A B C), not ^z(^u(^v(z u v))) A B C.
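These encodings can be checked mechanically. Here is a direct translation into Python lambdas (our sketch; Python merely plays the role of the substitution rules):

```python
# TRUE  = ^x(^y(x)): choose the first of two arguments.
# FALSE = ^x(^y(y)): choose the second.
TRUE = lambda x: lambda y: x
FALSE = lambda x: lambda y: y

# IF = ^z(^u(^v(z u v))): apply the boolean z to the two branches u and v.
IF = lambda z: lambda u: lambda v: z(u)(v)

print(IF(TRUE)('A')('B'))   # A
print(IF(FALSE)('A')('B'))  # B
```

Note that IF does no real work: the boolean itself does the choosing, exactly as in the notes' observation that TRUE/FALSE are also the "choose one from two" functions.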
Note that the idea of higher order functions is naturally embodied in LAMBDA calculus: the system itself starts with a very simple notation and very simple operations, and uses these to construct other basic operations like IF, on top of which more complex functions can be built. Though LAMBDA expression functions have no names, it is possible to produce recursion by making use of a special combinator function (the fixed-point, or Y, combinator). However, this will not be discussed here. LAMBDA calculus is a powerful formal system with a simple set of sentence and deduction rules that can be used to produce expressions embodying sophisticated ideas of computation, and there are rigorous proofs of its important properties. This is why it forms a very satisfactory basis for functionally structured computer languages. The modern functional languages are basically LAMBDA calculus with a user-friendly appearance, or syntactic sugaring. It is also useful to mention that, before the invention of LAMBDA calculus, functions were formally defined in terms of sets of value pairs (argument, output). In such a framework, functions cannot be applied to themselves, since the idea of a set of sets leads to paradoxical results. Hence, recursion could not be readily included. In the LAMBDA calculus formulation, this problem does not arise.

19. Automata theory:

The theory of automata seeks to model the behaviour of any machine that responds to external stimulus by changing its internal state. The current state of the machine, and the value of the next input, uniquely determine the machine's next state. Out of all the possible states of the machine, a subset is regarded as final states. If a sequence of inputs changes the machine from its initial state to a final state, then the sequence is accepted as a TRUE sentence. If the sequence leaves the machine in a non-final state, then it is not a TRUE sentence.
To define a machine like this, one needs to specify:

(a) The set of possible values of the input stimulus;
(b) The set of possible states;
(c) The subset of final state(s);
(d) The transition matrix specifying the next state from the current state and the input value;
(e) The initial state.

Once these five items are given, the machine's behaviour can be completely determined. We are said to have defined an automaton. For example, consider a machine that will determine whether a received decimal integer is divisible by 3. We know that such a number, when all its digits are added together, produces another integer divisible by 3. Hence, if we add up all the digits modulo 3 as they arrive, and if the final sum is 0, then the received number is divisible by 3. To handle this with an automaton, it must have 3 states, one showing that the digits received so far add up to 0 modulo 3, one showing that the remainder is 1, and one more showing the remainder to be 2. A transition from state 0 to 1, 1 to 2, or 2 to 0 occurs if a digit of value 1, 4 or 7 is received; a transition from 0 to 2, 1 to 0 or 2 to 1 occurs if a digit of value 2, 5 or 8 arrives. No change of state occurs if 0, 3, 6 or 9 is received.

[Transition diagram: state 0 is both initial and final; digits 1, 4 and 7 move the machine one state forward (0 -> 1 -> 2 -> 0), digits 2, 5 and 8 move it two states forward (0 -> 2, 1 -> 0, 2 -> 1), and digits 0, 3, 6 and 9 leave the state unchanged.]

Hence, the machine has the transition matrix:

input:    0  1  2  3  4  5  6  7  8  9
state 0   0  1  2  0  1  2  0  1  2  0
state 1   1  2  0  1  2  0  1  2  0  1
state 2   2  0  1  2  0  1  2  0  1  2

An automaton (or more properly, a finite state deterministic automaton) has by itself a very simple behaviour; the intelligence lies in the transition matrix.
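The three-state machine fits in a few lines of Python (our sketch; the arithmetic form (state + digit) mod 3 is just a compact way of writing the transition matrix above):

```python
# Divisibility-by-3 automaton: states 0, 1, 2 hold the digit sum MOD 3;
# state 0 is both the initial state and the only final state.
def accepts(digits):
    state = 0
    for d in digits:
        state = (state + int(d)) % 3   # one step of the transition matrix
    return state == 0                  # did we end in the final state?

print(accepts('123'))   # True: 123 = 3 x 41
print(accepts('124'))   # False
```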
A program implementing the automaton returns T or NIL depending on whether the received input string is a TRUE sentence: a function Final checks whether State is one of the final states, and MATRIX takes the matrix Transition, the current state and an input value, returning the selected matrix element.

Exercise: Formulate the formal system for addition (i.e., the xPyQz system) as an automaton.

Exercise: Suppose we re-formulate the system as follows: x is a string made up of either - (x+) or + (x-) only (no mixture is allowed), and the starting axiom is x-PRx+, with the deduction rule R ::= Q ! -R+. Show that a simpler automaton can be formulated to accept sentences for addition.

There is a simple correspondence between the transition matrix of an automaton and the formal syntax of the language it recognizes. Taking the MOD 3 automaton as an example: suppose the first digit received is a 0 MOD 3 digit; then the machine expects the remaining digits to be a 0 MOD 3 number; if the first digit is 1 MOD 3, then a 2 MOD 3 number is needed to add up to a 0 MOD 3 number; if the first digit is 2 MOD 3, a 1 MOD 3 number is needed. A TRUE sentence may be generated by the deduction rule

0A ! 3A ! 6A ! 9A ! 2B ! 5B ! 8B ! 1C ! 4C ! 7C

where A denotes a 0 MOD 3 number, B a 1 MOD 3 number, and C a 2 MOD 3 number. Further, 0, 3, 6 or 9 alone is also a TRUE sentence, and may be added to the above rule. However, A is a number divisible by 3 and so is a TRUE sentence. Hence, the above deduction rule is actually

A ::= 0 ! 3 ! 6 ! 9 ! 0A ! 3A ! 6A ! 9A ! 2B ! 5B ! 8B ! 1C ! 4C ! 7C

We can also formulate rules for B and C: B may be 1, 4, or 7, or 1, 4, or 7 followed by a 0 MOD 3 value, or 0, 3, 6 or 9 followed by a 1 MOD 3 value, or ... Thus we have the rules

B ::= 1 ! 4 ! 7 ! 1A ! 4A ! 7A ! 0B ! 3B ! 6B ! 9B ! 2C ! 5C ! 8C
C ::= 2 ! 5 ! 8 ! 2A ! 5A ! 8A ! 1B ! 4B ! 7B ! 0C ! 3C ! 6C ! 9C

Comparing this with the transition matrix for the automaton given earlier, we see that there is a close correspondence.
Indeed, for any element of the transition matrix taking state y to state z on input x, we have a part of the deduction rule giving y ::= xz, meaning that initially the machine was expecting an input string represented by symbol y, but after receiving input x, its expectation changes to a shorter string represented by symbol z. However, if state z is final, then no further input is required and z need not be present, so the rule is y ::= x. And, if state z may be both final and non-final, then we have y ::= x ! xz. Note however that the above method does not work for the formal system for addition (PQ system), because the deduction replaces R with -R-, with a required input symbol on each side, rather than in front of R only. Indeed, there are serious limitations on the kind of formal languages that can be processed with finite deterministic automata.

Now we consider another kind of automaton, the non-deterministic type. The transition matrix, instead of specifying one unique new state given the current state and the input, specifies a number of alternative states. From each alternative state, the next input puts the machine into more alternative states, and so on. If, upon receiving the whole input string, any of these alternative paths leads to a final state, then the input string is accepted as a TRUE sentence.

[Transition diagram: a five-state non-deterministic automaton, with initial state 0, intermediate states 1, 2 and 3, and final state 4, that accepts strings made up of a, b and c ending with a symbol that also appears at the start of the string, e.g., aca, cac, accaa, bcab; its transition matrix is given below.]
The initial symbol puts the machine into one of three separate states, from which only the same symbol would put it into the final state. However, it would not be correct to just go into the final state upon receiving the same symbol again, because this may not be the end of the string. Hence, the machine has the alternative of staying in the same state and going to the final state later. A non-deterministic automaton has a transition matrix whose elements are sets of alternative states, rather than individual states. The matrix for the above machine is (here - denotes the empty set):

input:    a      b      c
state 0   {1}    {2}    {3}
state 1   {1 4}  {1}    {1}
state 2   {2}    {2 4}  {2}
state 3   {3}    {3}    {3 4}
state 4   -      -      -

The automaton algorithm is as follows. If the input has ended, the program checks each alternative state to see if it is final; if it reaches the end of the state list, then none of the states is final and the input string was not a TRUE sentence. If the input has not ended, it follows the successor set of states for each of the alternative current states and sees if any of them leads to the final state; if it reaches the end of the state list, it knows that none of the alternatives leads to the final state.

A fundamental theorem of automata theory is that any non-deterministic automaton has an equivalent deterministic automaton. This is achieved simply by defining sets of states of the former as states of the latter. To be more specific: if a machine can reach a set of states that includes a final state, then because it explores all alternatives, it will surely reach that final state. Hence, a set of states that includes a final state is as good as a final state, and instead of studying which individual states a machine comes from and goes to after receiving an input, we study all the alternative states it could have been in, and all the alternative states it could go to. In other words, we will specify a transition matrix from sets of states to sets of states.
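Both halves of this - the exploration of alternatives, and the set-of-states construction - can be sketched in Python (our sketch; the dictionary encodes the transition matrix above, and the function names are ours):

```python
# Transition matrix of the non-deterministic machine: for each state and
# input symbol, the set of alternative next states (set() = empty set).
MATRIX = {
    0: {'a': {1}, 'b': {2}, 'c': {3}},
    1: {'a': {1, 4}, 'b': {1}, 'c': {1}},
    2: {'a': {2}, 'b': {2, 4}, 'c': {2}},
    3: {'a': {3}, 'b': {3}, 'c': {3, 4}},
    4: {'a': set(), 'b': set(), 'c': set()},
}
FINAL = {4}

def accepts(string, states=frozenset({0})):
    """Non-deterministic acceptance: follow every alternative path."""
    if not string:                     # input ended: is any alternative final?
        return bool(states & FINAL)
    successors = set()
    for s in states:                   # successor sets of all current alternatives
        successors |= MATRIX[s][string[0]]
    return accepts(string[1:], frozenset(successors))

def determinize(nfa, start, inputs):
    """Subset construction: sets of alternative states become single states."""
    dfa, todo = {}, [frozenset({start})]
    while todo:
        S = todo.pop()
        if S in dfa:
            continue
        dfa[S] = {}
        for x in inputs:
            # union of the successors of every alternative in S
            T = frozenset(t for s in S for t in nfa[s][x])
            dfa[S][x] = T
            todo.append(T)
    return dfa

print(accepts('aca'), accepts('bcab'), accepts('abc'))  # True True False
print(len(determinize(MATRIX, 0, 'abc')))               # 7 reachable set-states
```

Running determinize here finds seven reachable set-states; the set {4} by itself never arises, since state 4 is only ever entered together with the state the machine could stay in.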
Taking the above matrix, we extend it by adding all the combinations that may arise. Note that the empty set combined with another set just reproduces that set.

                           input
       state           a       b       c
   0   {0}            {1}     {2}     {3}
   1   {1}            {1 4}   {1}     {1}
   2   {2}            {2}     {2 4}   {2}
   3   {3}            {3}     {3}     {3 4}
   4   {4}            -       -       -
   5   {1 4}          {1 4}   {1}     {1}
   6   {2 4}          {2}     {2 4}   {2}
   7   {3 4}          {3}     {3}     {3 4}

All the sets that include the final state 4, i.e., numbers 4, 5, 6 and 7, are final states of the new machine. To define a deterministic automaton, we only have to assign a number to each set of states and produce a new transition matrix:

                 input
     state     a   b   c
       0       1   2   3
       1       5   1   1
       2       2   6   2
       3       3   3   7
       4       -   -   -   (unreachable)
       5       5   1   1
       6       2   6   2
       7       3   3   7

Interestingly, state 4 cannot be reached from any state: it is only included in the other final states, and we really just need a 7 by 3 transition matrix.

Exercise: Verify that the machine will accept strings like aa, aaa, aba, aaba, etc., but reject bba, baa, etc.

Exercise: A more complex non-deterministic machine which will accept any string whose last symbol has appeared somewhere before, not necessarily as the first symbol, may be obtained from the above machine by changing the first line of the transition matrix from {1} {2} {3} to {0 1} {0 2} {0 3}, i.e., from the initial state 0 the machine has the alternative of staying in that state. Find the equivalent deterministic machine.

Now consider the language syntax processed by this automaton. We are looking for sentences starting and ending with a, or b, or c. Let us denote a string ending with a by A. Clearly, A may be just a, or it may be any symbol followed by another A. Similarly defining B and C, and writing S for a whole sentence, we have

  S ::= aA ! bB ! cC
  A ::= a ! aA ! bA ! cA
  B ::= b ! aB ! bB ! cB
  C ::= c ! aC ! bC ! cC

If we try to define an automaton from the above grammar, the result is the five state non-deterministic machine, with the states Initial, A, B, C, and Final.
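The subset construction itself is mechanical enough to sketch (again an illustration of our own, reusing the set-valued matrix from before). Starting from {0}, it collects every state-set that can actually be reached, confirming that only 7 arise, so the set {4} on its own never appears.

```python
NFA = {
    0: {"a": {1}, "b": {2}, "c": {3}},
    1: {"a": {1, 4}, "b": {1}, "c": {1}},
    2: {"a": {2}, "b": {2, 4}, "c": {2}},
    3: {"a": {3}, "b": {3}, "c": {3, 4}},
    4: {"a": set(), "b": set(), "c": set()},
}

def determinize(start=frozenset({0})):
    table, todo = {}, [start]
    while todo:
        S = todo.pop()
        if S in table:          # this state-set already processed
            continue
        table[S] = {}
        for x in "abc":
            # union of the successor sets of all states in S
            T = frozenset().union(*(NFA[s][x] for s in S))
            table[S][x] = T
            todo.append(T)
    return table

table = determinize()
print(len(table))                    # 7 reachable state-sets
print(sum(4 in S for S in table))    # 3 of them are final
```

The 7 reachable sets are {0}, {1}, {2}, {3}, {1 4}, {2 4} and {3 4}, matching rows 0-3 and 5-7 of the matrix above.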
The machine is non-deterministic because when it is in state A, expecting a string A, and receives a, it is not sure whether to go into Final and expect end of string, or to expect another A. Hence, both possibilities need to be explored. Similar uncertainty arises in states B and C.

To find the deterministic machine, let us invent a new symbol A' which may be either end of string _ or an A. We then have

  A ::= aA' ! bA ! cA
  B ::= aB ! bB' ! cB
  C ::= aC ! bC ! cC'

and the additional rules for A', B' and C':

  A' ::= _ ! A   etc.

Unfortunately, a rule like this cannot be used in defining an automaton, as a machine must change state because of an input. So we use the rule for A to change the above to

  A' ::= _ ! aA' ! bA ! cA
  B' ::= _ ! aB ! bB' ! cB
  C' ::= _ ! aC ! bC ! cC'

We now have a deterministic machine with 8 states (Initial, A, B, C, A', B', C', Final), the same as the one discovered earlier by manipulating the transition matrix. All this must look rather artificial, and indeed it is, because we already knew what we were looking for.

Exercise: Work out the transition matrix from the above deduction rules and show it to be the same.

The model provided by the theory of automata appears to be related to computers in two ways: machines with internal states like computer memory, and with the capability to process formal languages. The model is general and has few prerequisites and assumptions, and the properties and limitations of the model are generally applicable to all machines based on the model. Hence, automata theory was much studied in theoretical computer science in the hope of discovering general laws governing computers. However, our examples should make it clear that the model is not easily used to study any complex computation, for a very basic reason: the machine's present state, one single number, is meant to capture all its knowledge, and the state alone decides how the machine would respond to any input.
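As a check (our own sketch, not part of the notes), the deterministic machine can be read straight off these rules. A', B' and C' are the accepting states, since each may derive end of string; a separate Final state is not needed when simulating.

```python
# The deterministic machine read off the grammar: each rule like
# A ::= aA' ! bA ! cA becomes one row of the transition matrix.
DFA = {
    "Initial": {"a": "A",  "b": "B",  "c": "C"},
    "A":       {"a": "A'", "b": "A",  "c": "A"},
    "A'":      {"a": "A'", "b": "A",  "c": "A"},
    "B":       {"a": "B",  "b": "B'", "c": "B"},
    "B'":      {"a": "B",  "b": "B'", "c": "B"},
    "C":       {"a": "C",  "b": "C",  "c": "C'"},
    "C'":      {"a": "C",  "b": "C",  "c": "C'"},
}
ACCEPT = {"A'", "B'", "C'"}   # states that may derive end of string _

def accepts(string):
    state = "Initial"
    for x in string:
        state = DFA[state][x]
    return state in ACCEPT

print([w for w in ("aa", "aca", "bcab", "bba", "ab") if accepts(w)])
# ['aa', 'aca', 'bcab']
```

The accepted and rejected strings agree with the earlier exercise, as the equivalence of the two machines requires.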
In particular, the machine does not "remember" how it reached a state or what the past input string was. If such memory about past history is important, then you have to provide different states that are arrived at in different ways. By inventing enough states, we can have any content of memory and any past history represented by a different state, but this is obviously not a convenient or efficient way of storing knowledge, and results derived from automata theory must be re-cast into other forms before they can describe the behaviour of machines that represent knowledge and execution history in other ways. This recasting process is itself very complicated. For this and other reasons, automata theory today is a peripheral part of computer science, and is mainly of interest to mathematicians.

The Turing machine, also studied mainly by mathematicians these days, is a more powerful model because of its ability to change the input string and move back to a previously read/changed symbol, so that the input string also functions as its internal storage. Further, the Turing machine has greater theoretical importance because of its equivalence to other models of computation. For example, the inherent logical limitation on its capability turns out to correspond to similar limitations in predicate logic and LAMBDA calculus.
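To make the contrast concrete, here is a toy illustration of our own (not from the notes): a Turing-style machine recognising strings of the form a...ab...b with equal numbers of a's and b's, a language no finite automaton can accept. By overwriting symbols and moving back over the tape, the input string itself serves as the machine's storage.

```python
# A minimal Turing-style machine for a^n b^n: repeatedly mark the
# leftmost a as X, walk right to mark the leftmost b as Y, rewind,
# and accept when only X's and Y's (and the blank _) remain.

def tm_accepts(s):
    tape = list(s) + ["_"]            # input string plus a blank end marker
    pos, state = 0, "seek_a"
    transitions = {  # (state, symbol) -> (write, move, new state)
        ("seek_a", "a"): ("X", +1, "seek_b"),
        ("seek_a", "Y"): ("Y", +1, "check"),
        ("seek_a", "_"): ("_",  0, "accept"),
        ("seek_b", "a"): ("a", +1, "seek_b"),
        ("seek_b", "Y"): ("Y", +1, "seek_b"),
        ("seek_b", "b"): ("Y", -1, "rewind"),
        ("rewind", "a"): ("a", -1, "rewind"),
        ("rewind", "Y"): ("Y", -1, "rewind"),
        ("rewind", "X"): ("X", +1, "seek_a"),
        ("check",  "Y"): ("Y", +1, "check"),
        ("check",  "_"): ("_",  0, "accept"),
    }
    while state != "accept":
        key = (state, tape[pos])
        if key not in transitions:    # no rule: the machine halts, rejecting
            return False
        write, move, state = transitions[key]
        tape[pos] = write
        pos += move
    return True

print(tm_accepts("aabb"), tm_accepts("aab"))   # True False
```

The machine's finite control still knows only its current state, but because it can rewrite the tape and move both ways, the tape carries the count that a finite automaton would need unboundedly many states to remember.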