
emic


Epsilon Machine Inference & Characterization

A Python framework for constructing and analyzing epsilon-machines based on computational mechanics.

📚 Documentation | 🚀 Getting Started

What is an Epsilon-Machine?

An epsilon-machine (ε-machine) is the minimal, optimal predictor of a stochastic process. Introduced by James Crutchfield and collaborators, ε-machines capture the intrinsic computational structure hidden in sequential data.

Key concepts:

  • Causal states: Equivalence classes of histories that yield identical predictions
  • Statistical complexity (Cμ): The entropy of the causal state distribution — a measure of structural complexity
  • Entropy rate (hμ): The irreducible randomness in the process

ε-machines reveal the emic structure of a process — the computational organization that exists within the system itself, not imposed from outside.
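To make these quantities concrete, here is a hand calculation for the Golden Mean process (defined below: no consecutive 1s), in plain Python without the emic API. The two causal states and their transition probabilities follow directly from the process definition; the stationary distribution is solved by hand.

```python
import math

# Golden Mean process (p = 0.5): two causal states.
# State A: emit 0 (stay in A) or 1 (go to B), each with probability 0.5.
# State B: emit 0 and return to A with probability 1 (no consecutive 1s).

# Stationary distribution over causal states, solved from
# pi_A = 0.5 * pi_A + pi_B  and  pi_A + pi_B = 1:
pi_A, pi_B = 2 / 3, 1 / 3

# Statistical complexity: Shannon entropy of the causal-state distribution.
c_mu = -(pi_A * math.log2(pi_A) + pi_B * math.log2(pi_B))

# Entropy rate: state-weighted entropy of each state's outgoing transitions.
# State A is a fair coin (1 bit); state B is deterministic (0 bits).
h_mu = pi_A * 1.0 + pi_B * 0.0

print(f"C_mu = {c_mu:.4f}")  # ≈ 0.9183 bits
print(f"h_mu = {h_mu:.4f}")  # ≈ 0.6667 bits per symbol
```

These are the same values the library's analysis routines should recover from a correctly inferred Golden Mean machine.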

Features

  • 🔮 Inference: Reconstruct ε-machines using multiple algorithms (CSSR, CSM, BSI, Spectral, NSD)
  • 📊 Analysis: Compute complexity measures (Cμ, hμ, excess entropy E, crypticity χ)
  • 🎲 Sources: Built-in stochastic process generators (Golden Mean, Even Process, Biased Coin, Periodic) with noise transforms (BitFlipNoise)
  • 🔗 Pipeline: Composable >> operator for source → inference → analysis workflows
  • 🧪 Experiments: CLI and framework for reproducible algorithm benchmarking
  • 📈 Visualization: State diagram rendering with Graphviz
  • 📝 Export: LaTeX tables, TikZ diagrams, DOT, Mermaid, and JSON formats
  • 🧩 Extensible: Protocol-based architecture for custom algorithms and sources

Installation

pip install emic

Or install from source with uv:

git clone https://github.com/johnazariah/emic.git
cd emic
uv sync --dev

Quick Start

from emic.sources import GoldenMeanSource, TakeN
from emic.inference import CSSR, CSSRConfig
from emic.analysis import analyze

# Generate data from the Golden Mean process (no consecutive 1s)
source = GoldenMeanSource(p=0.5, _seed=42)
data = TakeN(10_000)(source)

# Infer the epsilon-machine using CSSR
config = CSSRConfig(max_history=5, significance=0.001)
result = CSSR(config).infer(data)

# Analyze the inferred machine
summary = analyze(result.machine)
print(f"States: {len(result.machine.states)}")
print(f"Statistical Complexity: Cμ = {summary.statistical_complexity:.4f}")
print(f"Entropy Rate: hμ = {summary.entropy_rate:.4f}")

Pipeline Composition

Chain operations using the >> operator:

from emic.sources import GoldenMeanSource, TakeN
from emic.inference import CSSR, CSSRConfig
from emic.analysis import analyze

# Compose source and transforms
source = GoldenMeanSource(p=0.5, _seed=42)
data = source >> TakeN(10_000)

# Infer and analyze
config = CSSRConfig(max_history=5, significance=0.001)
result = CSSR(config).infer(data)
summary = analyze(result.machine)

print(summary)

Built-in Sources

Process        Description                       True States
------------   -------------------------------   -----------------
Golden Mean    No consecutive 1s allowed         2
Even Process   Even number of 1s between 0s      2
Biased Coin    i.i.d. Bernoulli process          1
Periodic       Deterministic repeating pattern   n (period length)
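As an illustration of the constraint the Golden Mean process encodes, here is a direct simulation in plain Python (independent of emic's source classes, whose internals are not shown here):

```python
import random

def golden_mean(n, p=0.5, seed=42):
    """Emit n symbols from the Golden Mean process: after a 1, a 0 is forced."""
    rng = random.Random(seed)
    out = []
    prev = 0
    for _ in range(n):
        if prev == 1:
            s = 0  # state B: a 1 must be followed by a 0
        else:
            s = 1 if rng.random() < p else 0  # state A: biased coin flip
        out.append(s)
        prev = s
    return out

seq = golden_mean(10_000)
assert "11" not in "".join(map(str, seq))  # the defining constraint holds
```

A correctly inferred ε-machine for this data should recover exactly the two states A and B used above.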

Experiments

Run reproducible experiments to evaluate algorithm performance:

# Run all experiments with parallel execution
emic-experiment --all --parallel 4

# Quick mode for development
emic-experiment --quick

# List available experiments
emic-experiment --list

Algorithm Accuracy (January 2026)

Algorithm   State Count Accuracy   Cμ Error
---------   --------------------   --------
Spectral    85% (100% at N≥10K)    0.15
CSSR        82%                    0.05
NSD         73%                    0.12
CSM         39%                    0.10
BSI         32%                    0.53

See the Experiments Guide for full details.

Project Status

Core implementation complete — The framework is functional with:

  • Multiple inference algorithms: CSSR, CSM, BSI, Spectral, NSD
  • Full analysis suite (Cμ, hμ, excess entropy E, crypticity χ)
  • Synthetic and empirical data sources with noise transforms
  • Pipeline composition
  • 429 tests with 82%+ coverage
  • Deep dive documentation: CSSR, Spectral Learning, Complexity Measures, Working with Real Data

📚 Full documentation available

Testing

All 429 tests are catalogued in the Testing Register, each with a plain English intent and classified by kind:

Kind       Count   What it verifies
--------   -----   ----------------
Fact       280     Deterministic structural truths — immutability, validation, construction
Theory     73      Mathematical relationships from computational mechanics — Cμ, hμ, E, χ values
Property   30      Invariants across inputs — reproducibility, algorithm agreement, stochastic validity

Test categories:

  • Unit tests — Types, analysis measures, all 5 inference algorithms, sources, transforms, output formats
  • Golden tests — Algorithms verified against analytically known ε-machines (Golden Mean, Even Process, Biased Coin, Periodic)
  • Integration tests — Pipeline composition from source through inference to analysis
  • Machine invariant tests — Every algorithm's output validated for stochastic correctness (transition sums ≤ 1.0)
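A minimal version of such an invariant check might look like the following sketch. The dict-of-dicts transition representation and the function name are hypothetical, not emic's actual types:

```python
def check_substochastic(transitions, tol=1e-9):
    """Verify each state's outgoing transition probabilities sum to at most 1.

    `transitions` maps state -> {(symbol, next_state): probability}.
    """
    for state, edges in transitions.items():
        if any(p < 0 for p in edges.values()):
            raise ValueError(f"state {state!r}: negative probability")
        total = sum(edges.values())
        if total > 1.0 + tol:
            raise ValueError(f"state {state!r}: outgoing mass {total} exceeds 1")
    return True

# Golden Mean machine: states A and B over the alphabet {0, 1}
machine = {
    "A": {("0", "A"): 0.5, ("1", "B"): 0.5},
    "B": {("0", "A"): 1.0},
}
assert check_substochastic(machine)
```

Running a check like this over every algorithm's output catches probability-mass bugs that structural tests alone would miss.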

Pre-commit hooks enforce that the testing register is updated whenever tests change.

Etymology

The name emic works on multiple levels:

  1. Acronym: Epsilon Machine Inference & Characterization
  2. Linguistic: In linguistics/anthropology, emic refers to analysis from within the system — understanding structure on its own terms. This resonates with computational mechanics: ε-machines reveal the intrinsic structure of a process.
  3. Phonetic: Pronounced "EE-mik" or "EH-mik" — a nod to "ε-machine"


Contributing

Contributions are welcome! See the Contributing Guide for details.

License

MIT License — see LICENSE for details.

Author

John Azariah (@johnazariah)

Download files

Download the file for your platform.

Source Distribution

emic-0.5.4.tar.gz (473.3 kB)

Built Distribution

emic-0.5.4-py3-none-any.whl (100.5 kB)

File details

Details for the file emic-0.5.4.tar.gz.

File metadata

  • Download URL: emic-0.5.4.tar.gz
  • Size: 473.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for emic-0.5.4.tar.gz
Algorithm Hash digest
SHA256 fdd3e9bda229123d4f01e54d6ceea0a002be3ba4e3366f9cf7e7e7cb6140e3fa
MD5 b3126374b84232fb47efbd5c1e87eb0e
BLAKE2b-256 2623b8af0e35267cefff49530c2820a566a82b8ed35b2dd1fd970281d4701d42


Provenance

The following attestation bundles were made for emic-0.5.4.tar.gz:

Publisher: release.yml on johnazariah/emic

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file emic-0.5.4-py3-none-any.whl.

File metadata

  • Download URL: emic-0.5.4-py3-none-any.whl
  • Size: 100.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for emic-0.5.4-py3-none-any.whl
Algorithm Hash digest
SHA256 01865fe1dc7cecda6ecb77ac7afc8fb08f5de9ee5fd3188ee1bcf4c6f99783bf
MD5 0cff9905a9a51ee996aa0cbcc9620366
BLAKE2b-256 51186b80b4cbfa9c1235e211d104560b4152fdb333322b7318f6a76aeab5a1fa


Provenance

The following attestation bundles were made for emic-0.5.4-py3-none-any.whl:

Publisher: release.yml on johnazariah/emic

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
