Skip to main content

A python framework for formal grammars

Project description

pyformlang

Build Status pypi Documentation

A python library to manipulate formal grammar. In general, it can be used to better understand algorithms in a formal way.

Installation

pip3 install pyformlang

Sources

Most algorithms come from Introduction to Automata Theory, Languages, and Computation (2nd edition) by John E. Hopcroft, Rajeev Motwani and Jeferey D. Ullman.

Indexed grammars come from the original paper Index Grammars - An Extension of Context-free grammars by Alfred V. Aho.

On the implementation of Hopcroft minimization algorithm: Implementation of Hopcroft's Algorithm, Hang Zhou

Usage

Regular expression

pyformlang contains a basic regex reader. The available operators are:

  • The concatenation, the default operator, which can by represented either by a space or a dot (.)
  • The union, represented either by | or +
  • The kleene star, represented by *

It is also possible to use parenthesis. Symbols different from the space, ., |, +, *, (, ) and $ can be part of the alphabet, consecutive characters being consider as a single symbol. The epsilon symbol can either be epsilon or $.

from pyformlang.regular_expression import Regex

regex = Regex("(a-|a a b)*")

Finite Automata

pyformlang contains several finite automata, all of them being equivalent in the languages they can describe. In general, the states have to be represented by a pyformlang.finite_automaton.State object and the symbols by a pyformlang.finite_automaton.Symbol. When the class is not ambiguous, raw values can also be used. In addition, epsilon transitions are elements of the class: pyformlang.finite_automaton.Epsilon.

Deterministic Automata

These represent deterministic automata, i.e. with only one possible next state possible at each stage and no epsilon transitions.

from pyformlang.finite_automaton import DeterministicFiniteAutomaton
from pyformlang.finite_automaton import State
from pyformlang.finite_automaton import Symbol

# Declaration of the DFA
dfa = DeterministicFiniteAutomaton()

# Creation of the states
state0 = State(0)
state1 = State(1)
state2 = State(2)
state3 = State(3)

# Creation of the symbols
symb_a = Symbol("a")
symb_b = Symbol("b")
symb_c = Symbol("c")
symb_d = Symbol("d")

# Add a start state
dfa.add_start_state(state0)

# Add two final states
dfa.add_final_state(state2)
dfa.add_final_state(state3)

# Create transitions
dfa.add_transition(state0, symb_a, state1)
dfa.add_transition(state1, symb_b, state1)
dfa.add_transition(state1, symb_c, state2)
dfa.add_transition(state1, symb_d, state3)

# Check if a word is accepted
dfa.accepts([symb_a, symb_b, symb_c])

Non Deterministic Automata

The representation of non deterministic automata, i.e. automata with possibly several next states at each stage but no epsilon transitions.

from pyformlang.finite_automaton import NondeterministicFiniteAutomaton
from pyformlang.finite_automaton import State
from pyformlang.finite_automaton import Symbol

# Definition of the NFA
nfa = NondeterministicFiniteAutomaton()

# Declare the states
state0 = State(0)
state1 = State(1)
state2 = State(2)
state3 = State(3)
state4 = State(4)

# Declare the symbols
symb_a = Symbol("a")
symb_b = Symbol("b")
symb_c = Symbol("c")
symb_d = Symbol("d")

# Add a start state
nfa.add_start_state(state0)
# Add a final state
nfa.add_final_state(state4)
nfa.add_final_state(state3)
# Add the transitions
nfa.add_transition(state0, symb_a, state1)
nfa.add_transition(state1, symb_b, state1)
nfa.add_transition(state1, symb_c, state2)
nfa.add_transition(state1, symb_d, state3)
nfa.add_transition(state1, symb_c, state4)
nfa.add_transition(state1, symb_b, state4)

# Check if a word is accepted
nfa.accepts([symb_a, symb_b, symb_c])

# Check if a NFA is deterministic
nfa.is_deterministic() # False

# Get the equivalent deterministic automaton
dfa = nfa.to_deterministic()

Epsilon Non Deterministic Automata

It represents a non deterministic automaton where epsilon transitions are allowed.

from pyformlang.finite_automaton import EpsilonNFA, State, Symbol, Epsilon

# Declaration of the symbols and the states
epsilon = Epsilon()
plus = Symbol("+")
minus = Symbol("-")
point = Symbol(".")
digits = [Symbol(x) for x in range(10)]
states = [State("q" + str(x)) for x in range(6)]

# Creattion of the Epsilon NFA
enfa = EpsilonNFA()
enfa.add_start_state(states[0])
enfa.add_final_state(states[5])
enfa.add_transition(states[0], epsilon, states[1])
enfa.add_transition(states[0], plus, states[1])
enfa.add_transition(states[0], minus, states[1])
for digit in digits:
    enfa.add_transition(states[1], digit, states[1])
    enfa.add_transition(states[1], digit, states[4])
    enfa.add_transition(states[2], digit, states[3])
    enfa.add_transition(states[3], digit, states[3])
enfa.add_transition(states[1], point, states[2])
enfa.add_transition(states[4], point, states[3])
enfa.add_transition(states[3], epsilon, states[5])

# Checks if a word is accepted
enfa.accepts([plus, digits[1], point, digits[9]])

Regex and Finite Automaton

As regex and finite automaton are equivalent, one can turn one into the other.

from pyformlang.finite_automaton import EpsilonNFA, State, Symbol, Epsilon

enfa = EpsilonNFA()
state0 = State(0)
state1 = State(1)
symb_a = Symbol("0")
symb_b = Symbol("1")
enfa.add_start_state(state0)
enfa.add_final_state(state1)
enfa.add_transition(state0, symb_a, state0)
enfa.add_transition(state1, symb_b, state0)
enfa.add_transition(state1, symb_b, state1)

# Turn a finite automaton into a regex...
regex = enfa.to_regex()
# And turn it back into an epsilon non deterministic automaton
enfa2 = regex.to_epsilon_nfa()

Context-Free Grammar

We represent here context-free grammars. Like finite automata, one needs to use the classes pyformlang.cfg.Variable and pyformlang.cfg.Terminal to represent variables and terminals. The productions need to be represented as pyformlang.cfg.Production. In addition, epsilon terminals are members of pyformlang.cfg.Epsilon.

from pyformlang.cfg import Production, Variable, Terminal, CFG, Epsilon

# Creation of variables
var_useless = Variable("USELESS")
var_S = Variable("S")
var_B = Variable("B")

# Creation of terminals
ter_a = Terminal("a")
ter_b = Terminal("b")
ter_c = Terminal("c")

# Creation of productions
p0 = Production(var_S, [ter_a, var_S, var_B])
p1 = Production(var_useless, [ter_a, var_S, var_B])
p2 = Production(var_S, [var_useless])
p4 = Production(var_B, [ter_b])
p5 = Production(var_useless, [])

# Creation of the CFG
cfg = CFG({var_useless, var_S}, {ter_a, ter_b}, var_S, {p0, p1, p2, p4, p5})

# Check for containment
cfg.contains([Epsilon()])
cfg.contains([ter_a, ter_b])

Push-Down Automata

For a Push-Down Automata, there are there objects: pyformlang.pda.State which represents a state, pyformlang.pda.Symbol which represents a symbol and pyformlang.pda.StackSymbol which represents a stack symbol.

PDA can either accept by final state or by empty stack. Function are provided to transform one kind into the other.

from pyformlang.pda import PDA, State, StackSymbol, Symbol, Epsilon

# Declare states
q = State("#STARTTOFINAL#")
q0 = State("q0")

# Declare symbols
e = Symbol("e")
i = Symbol("i")

# Declare stack symbols
Z = StackSymbol("Z")
Z0 = StackSymbol("Z0")

# Create the PDA
pda = PDA(states={q, q0},
          input_symbols={i, e},
          stack_alphabet={Z, Z0},
          start_state=q,
          start_stack_symbol=Z0,
          final_states={q0})

# Add transitions
pda.add_transition(q, i, Z, q, (Z, Z))
pda.add_transition(q, i, Z0, q, (Z, Z0))
pda.add_transition(q, e, Z, q, [])
pda.add_transition(q, Epsilon(), Z0, q0, [])

# Transformation to a PDA accepting by empty stack
pda_empty_stack = pda.to_empty_stack()
# Transformation to a PDA accepting by final state
pda_final_state = pda_empty_stack.to_final_state()

CFG and PDA

As CFG and PDA are equivalent, one can turn one into the other, but needs to be careful about whether the PDA accepts on empty stack and final state. The conversions between CFG and PDA are done when the PDA is accepting by empty stack

from pyformlang.cfg import Production, Variable, Terminal, CFG

ter_a = Terminal("a")
ter_b = Terminal("b")
ter_c = Terminal("c")
var_S = Variable("S")
productions = {Production(var_S, [ter_a, var_S, ter_b]),
               Production(var_S, [ter_c])}
cfg = CFG(productions=productions, start_symbol=var_S)

# Convert into a PDA accepting by final state
pda_empty_stack = cfg.to_pda()
# Go to final state
pda_final_state = pda_empty_stack.to_final_state()
# Go back to empty stack, necessary to transform into a CFG
pda_empty_stack = pda_final_state.to_empty_stack()
# Transform the PDA into a CFG
cfg = pda_empty_stack.to_cfg()

Indexed Grammars

Indexed grammars are grammars which have a stack which can be duplicated. In an indexed grammar, rules can take 4 forms (sigma is the stack):

  • EndRule: This simply turns a Variable into a terminal, for example A[sigma]->a
  • ProductionRule: We push something on the stack, for example A[sigma]->B[f sigma]
  • ConsumptionRule: We consume something from the stack, for example A[f sigma] -> C[sigma]
  • DuplicationRule: We duplicate the stack, for example A[sigma] -> B[sigma] C[sigma]
from pyformlang.indexed_grammar import Rules
from pyformlang.indexed_grammar import ConsumptionRule
from pyformlang.indexed_grammar import EndRule
from pyformlang.indexed_grammar import ProductionRule
from pyformlang.indexed_grammar import DuplicationRule
from pyformlang.indexed_grammar import IndexedGrammar

l_rules = []

# Initialization rules
l_rules.append(ProductionRule("S", "Cinit", "end"))
l_rules.append(ProductionRule("Cinit", "C", "b"))
l_rules.append(ConsumptionRule("end", "C", "T"))
l_rules.append(EndRule("T", "epsilon"))

# C[cm sigma] -> cm C[sigma]
l_rules.append(ConsumptionRule("cm", "C", "B0"))
l_rules.append(DuplicationRule("B0", "A0", "C"))
l_rules.append(EndRule("A0", "cm"))

rules = Rules(l_rules)
i_grammar = IndexedGrammar(rules)
self.assertTrue(i_grammar.is_empty())

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyformlang-0.1.12.tar.gz (54.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

pyformlang-0.1.12-py3-none-any.whl (78.8 kB view details)

Uploaded Python 3

File details

Details for the file pyformlang-0.1.12.tar.gz.

File metadata

  • Download URL: pyformlang-0.1.12.tar.gz
  • Upload date:
  • Size: 54.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.29.0 CPython/3.6.8

File hashes

Hashes for pyformlang-0.1.12.tar.gz
Algorithm Hash digest
SHA256 4cfa993caec46008b9c389dbf3c9f788862e5480a25342330b75026a2dd23cd4
MD5 cff581d669be606f13cbeda61e4570c0
BLAKE2b-256 27207b05ecef7534069a62c546b9923fef076557eec851b88e771cda2834588d

See more details on using hashes here.

File details

Details for the file pyformlang-0.1.12-py3-none-any.whl.

File metadata

  • Download URL: pyformlang-0.1.12-py3-none-any.whl
  • Upload date:
  • Size: 78.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.12.1 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.6.3 requests-toolbelt/0.8.0 tqdm/4.29.0 CPython/3.6.8

File hashes

Hashes for pyformlang-0.1.12-py3-none-any.whl
Algorithm Hash digest
SHA256 6dc5abe815bf42408d1da6aa6f8d8775c2aabf411648023173868a988a072bbb
MD5 91eb01fab9ba775f4d55fc67a1e0a8b7
BLAKE2b-256 571ed5bbe51aa0df182dd6afc910bd869403e22646d0f6c7732d04c7a2eff388

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page