DETERMINISTIC FINITE STATE MACHINES WITH VB.NET

I will not discuss here finite state machines theory, just show how useful they are in real life, even in solving simple problems. ( to get more theory goto http://en.wikipedia.org/wiki/Finite_state_machine ).

Example No.1: Lets create finite state machine that extract integer from string. We assume that integer may start with '+', '-' or any decimal digit. Flex pattern for integers we will deal with is: [+-]?[1-9][0-9]* |"0"

VALID INTEGERS
INVALID INTEGERS
123 01
+123 00
-123 +0
0 -0
  ++1

Our FSM will receive string on input and produce integer number on output. If string is 'invalid' (doesn't represent integer number) it will throw exception. In order to create FSM we need to define states and input symbols (I call symbol class of input).
We will divide all input characters into 5 classes:

CLASS
CHARACTERS
PLUS '+'
MINUS '-'
ZERO '0'
NONZERO '1','2','3','4','5','6','7','8','9'
OTHER all other characters

And have 5 states:

FAILURE
START
ZERO
SIGN
BODY

Now we need to build state-input table that will contain the logic of FSM (how it reacts on input depending from state).

 

 

INPUT
PLUS
MINUS
ZERO
NONZERO
OTHER
STATE
START

SIGN
(positive integer)

SIGN
(negative integer)
ZERO
BODY
FAILURE
SIGN
FAILURE
FAILURE
FAILURE
BODY
FAILURE
ZERO
FAILURE
FAILURE
FAILURE
FAILURE
FAILURE
BODY
FAILURE
FAILURE
BODY
BODY
FAILURE

Let's take for example cell START-NONZERO. This cell contains name of state that FSM will reach when being in START state it will receive symbol from sequence: '1','2','3','4','5','6','7','8','9'.
Another cell: ZERO-PLUS. It contains FAILURE state what means that when we are in ZERO state ('0' was received in the beginning of string) and we get '+', given pattern breaks integer number construction rules. Of course, integer cannot contain substring "0+".

And, now we finally get the VB.NET source for this FSM:

 
' 
'  IntDFA.vb - DFA to parse signed integers
'
'  Project: Deterministic Finite Automata With VB.NET
'  Author: S.Zabinskis
'  February, 2008
'

Public Class TIntParser


    Private Enum TSymbol
        Minus
        Plus
        Zero
        NonZero
        Other
    End Enum


    Private Enum TState
        Failure
        Start
        Sign
        Zero
        Body
    End Enum


    Private Sub New()
    End Sub


    Public Shared Function Parse(ByVal s As String) As Int32
        Dim positive As Boolean = True
        Dim state As TState = TState.Start
        Dim value As Int32 = 0
        For Each c As Char In s.ToCharArray()
            Dim symbol = GetSymbol(c)
            Select Case (state.ToString() & ":" & symbol.ToString()).ToUpper()
                Case "START:OTHER"
                    state = TState.Failure

                Case "START:ZERO"
                    state = TState.Zero

                Case "START:PLUS"
                    state = TState.Sign

                Case "START:MINUS"
                    positive = False
                    state = TState.Sign

                Case "START:NONZERO"
                    value = Asc(c) - Asc("0")
                    state = TState.Body

                Case "SIGN:PLUS", "SIGN:MINUS", "SIGN:ZERO", "SIGN:OTHER"
                    state = TState.Failure

                Case "SIGN:NONZERO"
                    state = TState.Body

                Case "ZERO:ZERO", "ZERO:MINUS", "ZERO:PLUS", "ZERO:NONZERO", "ZERO:OTHER"
                    state = TState.Failure

                Case "BODY:PLUS", "BODY:MINUS", "BODY:OTHER"
                    state = TState.Failure

                Case "BODY:ZERO", "BODY:NONZERO"
                    value = value * 10 + Asc(c) - Asc("0")

            End Select
            If state = TState.Failure Then
                Throw New Exception("Invalid integer")
            End If
        Next
        If Not positive Then
            value = 0 - value
        End If
        Return value
    End Function

    Private Shared Function GetSymbol(ByVal c As Char) As TSymbol
        Dim symbol As TSymbol = TSymbol.Other
        Select Case c
            Case "+"
                symbol = TSymbol.Plus

            Case "-"
                symbol = TSymbol.Minus

            Case "0"
                symbol = TSymbol.Zero

            Case Else
                If Char.IsDigit(c) Then
                    symbol = TSymbol.NonZero
                End If

        End Select
        Return symbol
    End Function

End Class
 

You may see that I use concatenated state and input symbol -

(state.ToString() & ":" & symbol.ToString()).ToUpper()
to determine what action to choose. For example: "START:ZERO". Of course it isn't optimal for performance but it is very clear and readable.
And, at last simple program to test our FSM:

 
Module Module1

    Sub Main()
        Console.WriteLine("str=""12345"" N=" & TIntParser.Parse("12345"))
        Console.WriteLine("str=""-12345"" N=" & TIntParser.Parse("-12345"))
        Console.WriteLine("str=""+12345"" N=" & TIntParser.Parse("+12345"))
        Console.WriteLine("str=""1234500"" N=" & TIntParser.Parse("1234500"))
        Console.WriteLine("str=""0"" N=" & TIntParser.Parse("0"))
        Console.WriteLine("str=""00"" N=" & TIntParser.Parse("00"))
        Console.ReadLine()
    End Sub
End Module

We can see that strings: '12345', '-12345', '+12345', '1234500' and '0' are valid integers, and the last '00' is invalid - parser will throw exception.

Example No.2: Extract quoted (single and double) strings from text.
If we should apply it to previous sentence it should return strings: '12345', '-12345', '+12345', '1234500', '0', '00'.

CLASS
CHARACTERS
SNGLQUOTE ' (single quote)
DBLQUOTE " (double quote)
CRLF chr(13)+chr(10)
OTHER all other characters

And have 4 states:

FAILURE
START
SNGLQUOTE
DBLQUOTE

Transition matrix for this FSM:

  INPUT
SNGLQUOTE
DBLQUOTE
CRLF
OTHER
STATE
START

SNGLQUOTE
(start of single quoted literal)

DBLQUOTE
(start of double quoted literal)
START
(ignore CRLF)
START
(ignore other characters)
SNGLQUOTE
START
(end of single quoted literal)
SNGLQUOTE
SNGLQUOTE
SNGLQUOTE
DBLQUOTE
DBLQUOTE
START
(end of single quoted literal)
DBLQUOTE
DBLQUOTE

 

 
' 
'  LiteralDFA.vb - DFA to extract literals from text.
'
'  Project: Deterministic Finite Automata With VB.NET
'  Author: S.Zabinskis
'  February, 2008
'

Public Class TLiteralDFA


    Private Enum TSymbol
        SnglQuote
        DblQuote
        CrLf
        Other
    End Enum


    Private Enum TState
        Failure
        Start
        DblQuote
        SnglQuote
    End Enum


    Private Sub New()
    End Sub


    Public Shared Function Parse(ByVal s As String) As List(Of String)
        Dim literals As New List(Of String)
        Dim state As TState = TState.Start
        Dim value As String = String.Empty
        For Each c As Char In s.ToCharArray()
            Dim symbol = GetSymbol(c)
            Console.WriteLine((state.ToString() & ":" & symbol.ToString()).ToUpper())
            Select Case (state.ToString() & ":" & symbol.ToString()).ToUpper()
                Case "START:OTHER", "START:CRLF"
                    state = TState.Start

                Case "START:SNGLQUOTE"
                    state = TState.SnglQuote
                    value = String.Empty

                Case "START:DBLQUOTE"
                    state = TState.DblQuote
                    value = String.Empty

                Case "SNGLQUOTE:SNGLQUOTE"
                    literals.Add(value)
                    state = TState.Start

                Case "SNGLQUOTE:DBLQUOTE", "SNGLQUOTE:OTHER"
                    value &= c

                Case "DBLQUOTE:DBLQUOTE"
                    literals.Add(value)
                    state = TState.Start

                Case "DBLQUOTE:SNGLQUOTE", "DBLQUOTE:OTHER"
                    value &= c

                Case "SNGLQUOTE:CRLF", "DBLQUOTE:CRLF"
                    ' just ignore CRLF

            End Select
        Next
        If state = TState.DblQuote Or state = TState.SnglQuote Then
            Throw New Exception("Unterminated literal")
        End If
        Return literals
    End Function


    Private Shared Function GetSymbol(ByVal c As Char) As TSymbol
        Dim symbol As TSymbol = TSymbol.Other
        Select Case c
            Case "'"
                symbol = TSymbol.SnglQuote

            Case """"
                symbol = TSymbol.DblQuote

            Case vbCrLf
                symbol = TSymbol.CrLf

        End Select
        Return symbol
    End Function

End Class