Deterministic Finite Automaton (DFA)

Computer scientists have also done a lot of research in order to find easy mechanisms of computation which can be modified and improved in a very straightforward way. The easyness of the mechanisms can then be exploited in order to gain some insight on the abilities of these mechanisms and also to be able to check with computer programs whether such mechanisms do what they should. One of these mechanisms is that of a deterministic finite automaton (dfa). Its task is very simple: It should check whether a number (given as a string of its digits) or more general whether some string of symbols belongs to some given language L or not. This can of course only be done for languages L of a certain form.

A deterministic finite automaton consists of a set of finitely many states, which one might number 0,1,2,... and where one might assume that the machine is initially in the state 0.

The second ingredient of the DFA is an update function which tells for each state and symbol read what the succeeding state is.

Furthermore, some of the states are called accepting states; the other states are called rejecting states.

Now a DFA processes a word w by starting in the state 0 and updating the state for each symbol in the word according to the update function; if the state reached this way is "accepting" then the DFA accepts the word w else the DFA rejects the word w.

Multiples of 3

An example is the well-known rule to check whether a number is a multiple of 3. For this one starts with the value 0 in the state. For each digit, the new value of the state is the remainder by 3 of the sum of the old value of the state plus the value of the digit. The whole number is a multiple of 3 iff the state is 0 after processing all digits. Here two sample runs. The following button permits to try out a Javascript program implementing this automaton.

Multiples of a number p in general

The update function can be implemented more in general to check whether the input is the multiple of a fixed number p. The adjustment is the following: The reason that this algorithm works are the following rules for the remainder operation "%":
       (a * b) % p  =  ((a % p) * b) % p ;
       (a + b) % p  =  ((a % p) + b) % p .
Furthermore, the equation
       val(wd) = val(w)*10+val(d)
on the value of strings is combined with the above rules on the remainder to obtain the following rule:
       val(wd) % p = ( (val(w) % p) * 10 + val(d) ) % d .
In the automaton, val(w) % p is the old state and val(wd) % p is the new state. In the case of computing the remainders by 3 and 9, the multiplication with 10 can be omitted (as 10%3 and 10%9 are both 1), but in general, it is needed.

The following button permits to try out a Javascript program implementing this automaton. Note that the so implemented function also works when the number analised has much more digits than the precision with which Javascript is computing.

Optimising the size of an automaton

One goal of automata theory is not only to construct automata, but also to construct automata of small size. To check whether a number is a multiple of p needs in the above default algorithm p states. For certain numbers, it is possible to do this with much less states, an example would be to check whether a number is a multiple of 10000. The idea is here just to count the trailing zeroes in the input processed so far and when this number is at least 4 then the automaton accepts; as there are only finitely many states, the automaton does not go beyong 4 when counting. Try out the automaton here.



For the case of checking whether a number is a multiple of 25, the automaton can do with 5 states, but it is more complicated to be built and the saving of states is not as large as the case of multiples of 10000. Here the automaton: The following button activates the corresponding automaton.

General pattern matching

One can use finite automata not only for checking whether a number (given as a string of digits) is a multiple of a fixed number. Many simple patterns of numbers can be checked with a finite automaton. For example, does a number have only two non-zero digits? Or does a number contain a certain sequence of digits like "121"? The two optimised automata used that the multiples of 10000 and 25 follow in the decimal system certain patterns; they would not work in the binary or ternary system.

The automaton to check whether the sequence "121" has the states {0,1,2,3} where being in state k means that the first k digits of "121" have just been processed and the automaton remains in state 3 when it has processed at some point all 3 digits one after the other. The starting state is 0 and the only accepting state is 3. The state transition function can be given best by a table.

StateSuccessor at "1"Successor at "2" Successor at other digit
0100
1120
2300
3333

The button below permits to run this automaton.

Combining Automata

One could also whether there are automata which combine the search for certain patterns. Is there, for example, an automaton which accepts a string w exactly if (w represents a decimal number which is a multiple of 25) or (w represents a decimal number which is a multiple of 3 and which contains the pattern "121")? The idea is to run all three automata in parallel and to combine their sets of states to one set of triple states. The program to do this would look like this:
function combinedpattern()
  { var stateseq = "000"; var statea=0; var stateb=0; var statec=0;
    var word = prompt("Input number to be checked"); var k;
    for (k in word)
      { statea = update("mod25",statea,word[k]);
        stateb = update("mod3",stateb,word[k]);
        statec = update("121",statec,word[k]);
        stateseq = stateseq+"-"+statea+stateb+statec; }
    if ((acceptance("mod25",statea)) ||
        ((acceptance("mod3",stateb))&&(acceptance("121",statec))))
      { alert("DFA accepts "+word+" with statesequence "+stateseq); }
    else
      { alert("DFA rejects "+word+" with statesequence "+stateseq); }
    return; }
Here the function update(..,state.,word[k]) computes for the state of the corresponding simulated dfa the new value; the first input parameter tells the function which dfa to use and the variables statea, stateb, statec refer to the corresponding states. The triple (statea,stateb,statec) is then the state of the combined automaton. The initial state is (0,0,0). The variable stateseq keeps a protocol of the states when processing the input. The acceptance condition is that the first simulated automaton has to accept or both the second and the third simulated automaton have to accept. The first automaton uses the states 0,1,2,3,4 where 0,1 are accepting, the second automaton uses states 0,1,2 where 0 is accepting, the third automaton uses states 0,1,2,3 where 3 is accepting. So all states of the forms x03, 0yz, 1yz are accepting. See the sections above for a more detailed description of these autoamta.

Regular languages

A language L is called regular iff it is recognised by a finite automaton, that is, there is a finite automaton which accepts the words in L and rejects the words outside L. The last example of the combined pattern showed that unions and intersections of regular languages are again regular and that one can also form more complicated combinations of regular languages with the result being regular again. Furthermore, one can also show that the set-difference of regular languages is regular. There are two more operations with languages which can be done and which combine regular languages to a new regular language.

The concatenation of two strings x and y is just the string obtained by putting x and y one after the other. So if x = 110 and y = 212 then xy = 110212. Furthermore, the concatenation of two languages L and H is the set {xy: x in L and y in H}.

The Kleene Star. Given a language L, the new language L* consists of all strings formed by concatenating finitely many words from L. Note that the empty string is formed by concatenating 0 many words and that every member of L is obtained by forming a concatenation consisting just of one parameter. Then xy is the concatenation of two strings and xyz the concatenation of three strings x,y,z and so on. Any finite concatenation of strings from L is in L*. See the Wikipedia page of the Kleene star for further information.

Now the regular languages can be characterised as that class of languages which can be formed from several base sets which are finite sets of strings by combining these base sets with the Kleene star, the union, the intersection and the concatenation.

Further Reading. Regular languages can also be characterised as the languages recognised by non-deterministic finite automata. They furthermore form the lowest level in the Chomsky hierarchy introduced by Avram Noam Chomsky.