I will not discuss here finite state machines theory, just show how useful they are in real life, even in solving simple problems. ( to get more theory goto http://en.wikipedia.org/wiki/Finite_state_machine ).
Example No.1: Lets create finite state machine that extract integer from string. We assume that integer may start with '+', '-' or any decimal digit. Flex pattern for integers we will deal with is: [+-]?[1-9][0-9]* |"0"
VALID INTEGERS |
INVALID INTEGERS |
| 123 | 01 |
| +123 | 00 |
| -123 | +0 |
| 0 | -0 |
| ++1 |
Our FSM will receive string on input and produce integer number on output. If string is 'invalid' (doesn't represent integer number) it will throw exception. In order to create FSM we need to define states and input symbols (I call symbol class of input).
We will divide all input characters into 5 classes:
CLASS |
CHARACTERS |
| PLUS | '+' |
| MINUS | '-' |
| ZERO | '0' |
| NONZERO | '1','2','3','4','5','6','7','8','9' |
| OTHER | all other characters |
And have 5 states:
FAILURE |
START |
ZERO |
SIGN |
BODY |
Now we need to build state-input table that will contain the logic of FSM (how it reacts on input depending from state).
|
|
INPUT |
PLUS |
MINUS |
ZERO |
NONZERO |
OTHER |
STATE |
||||||
START |
SIGN |
SIGN (negative integer) |
ZERO |
BODY |
FAILURE |
|
SIGN |
FAILURE |
FAILURE |
FAILURE |
BODY |
FAILURE |
|
ZERO |
FAILURE |
FAILURE |
FAILURE |
FAILURE |
FAILURE |
|
BODY |
FAILURE |
FAILURE |
BODY |
BODY |
FAILURE |
|
Let's take for example cell START-NONZERO. This cell contains name of state that FSM will reach when being in START state it will receive symbol from sequence: '1','2','3','4','5','6','7','8','9'.
Another cell: ZERO-PLUS. It contains FAILURE state what means that when we are in ZERO state ('0' was received in the beginning of string) and we get '+', given pattern breaks integer number construction rules. Of course, integer cannot contain substring "0+".
And, now we finally get the VB.NET source for this FSM:
'
' IntDFA.vb - DFA to parse signed integers
'
' Project: Deterministic Finite Automata With VB.NET
' Author: S.Zabinskis
' February, 2008
'
Public Class TIntParser
Private Enum TSymbol
Minus
Plus
Zero
NonZero
Other
End Enum
Private Enum TState
Failure
Start
Sign
Zero
Body
End Enum
Private Sub New()
End Sub
Public Shared Function Parse(ByVal s As String) As Int32
Dim positive As Boolean = True
Dim state As TState = TState.Start
Dim value As Int32 = 0
For Each c As Char In s.ToCharArray()
Dim symbol = GetSymbol(c)
Select Case (state.ToString() & ":" & symbol.ToString()).ToUpper()
Case "START:OTHER"
state = TState.Failure
Case "START:ZERO"
state = TState.Zero
Case "START:PLUS"
state = TState.Sign
Case "START:MINUS"
positive = False
state = TState.Sign
Case "START:NONZERO"
value = Asc(c) - Asc("0")
state = TState.Body
Case "SIGN:PLUS", "SIGN:MINUS", "SIGN:ZERO", "SIGN:OTHER"
state = TState.Failure
Case "SIGN:NONZERO"
state = TState.Body
Case "ZERO:ZERO", "ZERO:MINUS", "ZERO:PLUS", "ZERO:NONZERO", "ZERO:OTHER"
state = TState.Failure
Case "BODY:PLUS", "BODY:MINUS", "BODY:OTHER"
state = TState.Failure
Case "BODY:ZERO", "BODY:NONZERO"
value = value * 10 + Asc(c) - Asc("0")
End Select
If state = TState.Failure Then
Throw New Exception("Invalid integer")
End If
Next
If Not positive Then
value = 0 - value
End If
Return value
End Function
Private Shared Function GetSymbol(ByVal c As Char) As TSymbol
Dim symbol As TSymbol = TSymbol.Other
Select Case c
Case "+"
symbol = TSymbol.Plus
Case "-"
symbol = TSymbol.Minus
Case "0"
symbol = TSymbol.Zero
Case Else
If Char.IsDigit(c) Then
symbol = TSymbol.NonZero
End If
End Select
Return symbol
End Function
End Class |
You may see that I use concatenated state and input symbol - (state.ToString() & ":" & symbol.ToString()).ToUpper()
to determine what action to choose. For example: "START:ZERO". Of course it isn't optimal for performance but it is very clear and readable.
And, at last simple program to test our FSM:
Module Module1
Sub Main()
Console.WriteLine("str=""12345"" N=" & TIntParser.Parse("12345"))
Console.WriteLine("str=""-12345"" N=" & TIntParser.Parse("-12345"))
Console.WriteLine("str=""+12345"" N=" & TIntParser.Parse("+12345"))
Console.WriteLine("str=""1234500"" N=" & TIntParser.Parse("1234500"))
Console.WriteLine("str=""0"" N=" & TIntParser.Parse("0"))
Console.WriteLine("str=""00"" N=" & TIntParser.Parse("00"))
Console.ReadLine()
End Sub
End Module
|
We can see that strings: '12345', '-12345', '+12345', '1234500' and '0' are valid integers, and the last '00' is invalid - parser will throw exception.
Example No.2: Extract quoted (single and double) strings from text.
If we should apply it to previous sentence it should return strings: '12345', '-12345', '+12345', '1234500', '0', '00'.
CLASS |
CHARACTERS |
| SNGLQUOTE | ' (single quote) |
| DBLQUOTE | " (double quote) |
| CRLF | chr(13)+chr(10) |
| OTHER | all other characters |
And have 4 states:
FAILURE |
START |
SNGLQUOTE |
DBLQUOTE |
Transition matrix for this FSM:
| INPUT | SNGLQUOTE |
DBLQUOTE |
CRLF |
OTHER |
|
| STATE | |||||
START |
SNGLQUOTE |
DBLQUOTE (start of double quoted literal) |
START
(ignore CRLF) |
START (ignore other characters) |
|
SNGLQUOTE |
START (end of single quoted literal) |
SNGLQUOTE |
SNGLQUOTE |
SNGLQUOTE |
|
DBLQUOTE |
DBLQUOTE |
START
(end of single quoted literal) |
DBLQUOTE |
DBLQUOTE |
|
'
' LiteralDFA.vb - DFA to extract literals from text.
'
' Project: Deterministic Finite Automata With VB.NET
' Author: S.Zabinskis
' February, 2008
'
Public Class TLiteralDFA
Private Enum TSymbol
SnglQuote
DblQuote
CrLf
Other
End Enum
Private Enum TState
Failure
Start
DblQuote
SnglQuote
End Enum
Private Sub New()
End Sub
Public Shared Function Parse(ByVal s As String) As List(Of String)
Dim literals As New List(Of String)
Dim state As TState = TState.Start
Dim value As String = String.Empty
For Each c As Char In s.ToCharArray()
Dim symbol = GetSymbol(c)
Console.WriteLine((state.ToString() & ":" & symbol.ToString()).ToUpper())
Select Case (state.ToString() & ":" & symbol.ToString()).ToUpper()
Case "START:OTHER", "START:CRLF"
state = TState.Start
Case "START:SNGLQUOTE"
state = TState.SnglQuote
value = String.Empty
Case "START:DBLQUOTE"
state = TState.DblQuote
value = String.Empty
Case "SNGLQUOTE:SNGLQUOTE"
literals.Add(value)
state = TState.Start
Case "SNGLQUOTE:DBLQUOTE", "SNGLQUOTE:OTHER"
value &= c
Case "DBLQUOTE:DBLQUOTE"
literals.Add(value)
state = TState.Start
Case "DBLQUOTE:SNGLQUOTE", "DBLQUOTE:OTHER"
value &= c
Case "SNGLQUOTE:CRLF", "DBLQUOTE:CRLF"
' just ignore CRLF
End Select
Next
If state = TState.DblQuote Or state = TState.SnglQuote Then
Throw New Exception("Unterminated literal")
End If
Return literals
End Function
Private Shared Function GetSymbol(ByVal c As Char) As TSymbol
Dim symbol As TSymbol = TSymbol.Other
Select Case c
Case "'"
symbol = TSymbol.SnglQuote
Case """"
symbol = TSymbol.DblQuote
Case vbCrLf
symbol = TSymbol.CrLf
End Select
Return symbol
End Function
End Class |