Skip to main content

Python library for identifying (learning) DFAs (automata) from labeled examples.

Project description

dfa-identify

Python library for identifying (learning) minimal DFAs from labeled examples by reduction to SAT.

Build Status PyPI version License: MIT

Table of Contents

Installation

If you just need to use dfa, you can just run:

$ pip install dfa

For developers, note that this project uses the poetry python package/dependency management tool. Please familarize yourself with it and then run:

$ poetry install

Usage

dfa_identify is centered around the find_dfa and find_dfas function. Both take in sequences of accepting and rejecting "words", where are word is a sequence of arbitrary python objects.

  1. find_dfas returns all minimally sized (no DFAs exist of size smaller) consistent with the given labeled data.

  2. find_dfa returns an arbitrary (first) minimally sized DFA.

The returned DFA object is from the dfa library.

from dfa_identify import find_dfa


accepting = ['a', 'abaa', 'bb']
rejecting = ['abb', 'b']
    
my_dfa = find_dfa(accepting=accepting, rejecting=rejecting)

assert all(my_dfa.label(x) for x in accepting)
assert all(not my_dfa.label(x) for x in rejecting)

Because words are sequences of arbitrary python objects, the identification problem, with a ↦ 0 and b ↦ 1, is given below:

accepting = [[0], [0, 'z', 0, 0], ['z', 'z']]
rejecting = [[0, 'z', 'z'], ['z']]

my_dfa = find_dfa(accepting=accepting, rejecting=rejecting)

Active learning

There are also active variants of find_dfa and find_dfas called find_dfa_active and find_dfas_active resp.

An example from the unit tests:

import dfa
from dfa_identify import find_dfa_active


def oracle(word):
    return sum(word) % 2 == 0

lang = find_dfa_active(alphabet=[0, 1],
                       oracle=oracle,
                       n_queries=20)
assert lang == dfa.DFA(
    inputs=[0,1],
    label=lambda s: s,
    transition=lambda s, c: s ^ bool(c),
    start=True
)

# Can also send in positive and negative examples:
lang = find_dfa_active(alphabet=[0, 1],
                       positive=[(0,), (0,0)],
                       negative=[(1,), (1,0)],
                       oracle=oracle,
                       n_queries=20)

Learning Decomposed DFAs

The is also support for learning decomposed DFAs following, Learning Deterministic Finite Automata Decompositions from Examples and Demonstrations. FMCAD`22. These are tuples of DFAs whose labels are combined to determine the acceptence / rejection of string, e.g., via conjunction or disjunction.

Similar to learning a monolithic dfa, this functionality can be accessed using find_decomposed_dfas. For example:

from dfa_identify import find_decomposed_dfas
accepting = ['y', 'yy', 'gy', 'bgy', 'bbgy', 'bggy']
rejecting = ['', 'r', 'ry', 'by', 'yr', 'gr', 'rr', 'rry', 'rygy']

# --------------------------------------
# 1. Learn a disjunctive decomposition.
# --------------------------------------
gen_dfas = find_decomposed_dfas(accepting=accepting,
                                rejecting=rejecting,
                                n_dfas=2,
                                order_by_stutter=True,
                                decompose_via="disjunction")
dfas = next(gen_dfas)

monolithic = dfas[0] | dfas[1]  # Build DFA that is the union of languages.
assert all(monolithic.label(w) for w in accepting)
assert not any(monolithic.label(w) for w in rejecting)

# Each dfa must reject a rejecting string.
assert all(all(~d.label(w) for d in dfas) for w in rejecting)
# At least one dfa must accept an accepting string.
assert all(any(d.label(w) for d in dfas) for w in accepting)

# --------------------------------------
# 2. Learn a conjunctive decomposition.
# --------------------------------------
gen_dfas = find_decomposed_dfas(accepting=accepting,
                                rejecting=rejecting,
                                n_dfas=2,
                                order_by_stutter=True,
                                decompose_via="conjunction")
dfas = next(gen_dfas)

monolithic = dfas[0] & dfas[1]  # Build DFA that is the union of languages.
assert all(monolithic.label(w) for w in accepting)
assert not any(monolithic.label(w) for w in rejecting)

Minimality

There are two forms of "minimality" supported by dfa-identify.

  1. By default, dfa-identify returns DFAs that have the minimum number of states required to seperate the accepting and rejecting set.
  2. If the order_by_stutter flag is set to True, then the find_dfas (lazily) orders the DFAs so that the number of self loops (stuttering transitions) appearing the DFAs decreases. find_dfa thus returns a DFA with the most number of self loops given the minimal number of states.

Encoding

This library currently uses the encodings outlined in Heule, Marijn JH, and Sicco Verwer. "Exact DFA identification using SAT solvers." International Colloquium on Grammatical Inference. Springer, Berlin, Heidelberg, 2010. and Ulyantsev, Vladimir, Ilya Zakirzyanov, and Anatoly Shalyto. "Symmetry Breaking Predicates for SAT-based DFA Identification.".

The key difference is in the use of the symmetry breaking clauses. Two kinds are exposed.

  1. clique (Heule 2010): Partially breaks symmetries by analyzing conflict graph.
  2. bfs (Ulyantsev 2016): Breaks all symmetries so that each model corresponds to a unique DFA.

Goals and related libraries

There are many other python libraries that perform DFA and other automata inference.

  1. DFA-Inductor-py - State of the art passive inference via reduction to SAT (as of 2019).
  2. z3gi: Uses SMT backed passive learning algorithm.
  3. lstar: Active learning algorithm based L* derivative.

The primary goal of this library is to loosely track the state of the art in passive SAT based inference while providing a simple implementation and API.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dfa_identify-3.13.0.tar.gz (17.9 kB view details)

Uploaded Source

Built Distribution

dfa_identify-3.13.0-py3-none-any.whl (19.7 kB view details)

Uploaded Python 3

File details

Details for the file dfa_identify-3.13.0.tar.gz.

File metadata

  • Download URL: dfa_identify-3.13.0.tar.gz
  • Upload date:
  • Size: 17.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.11.8 Linux/6.6.19

File hashes

Hashes for dfa_identify-3.13.0.tar.gz
Algorithm Hash digest
SHA256 8beec037b81e6c0f40c67fbf1c565dd0a3d4373300315eccee032744e0e6e13c
MD5 5c6644c435883e17640dc60154890221
BLAKE2b-256 2a3da6ff9c528291772412fc55abfa793cac5500aba741de2a62c6033287b4e2

See more details on using hashes here.

File details

Details for the file dfa_identify-3.13.0-py3-none-any.whl.

File metadata

  • Download URL: dfa_identify-3.13.0-py3-none-any.whl
  • Upload date:
  • Size: 19.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.2 CPython/3.11.8 Linux/6.6.19

File hashes

Hashes for dfa_identify-3.13.0-py3-none-any.whl
Algorithm Hash digest
SHA256 4e701e25782d87ccf9c0d68f7d4bdac0f123d2bbed4c8dee1c0a2c706f5f6038
MD5 871aa4b7d0f36b820898c3bb3a9b835d
BLAKE2b-256 7770022ca336f0a029ddb2525fe3de5f3f3cfdcf7a4800872d8c42d26f3b5566

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page