A high-performance Rust library for weighted finite-state transducers with Python bindings

ArcWeight

A high-performance Rust library for weighted finite-state transducers with comprehensive semiring support.

ArcWeight provides efficient algorithms for constructing, combining, and optimizing weighted finite-state transducers (WFSTs), making it suitable for natural language processing, speech recognition, and computational linguistics applications.

Features

  • Core FST Operations: Composition, determinization, minimization, closure, union, concatenation
  • Advanced Algorithms: Shortest path, weight pushing, epsilon removal, pruning, synchronization
  • Rich Semiring Support: Tropical, log, probability, boolean, integer, product, and Gallic weights
  • Multiple FST Implementations: Vector-based, constant, compact, lazy evaluation, and cached
  • Type-Safe Design: Zero-cost abstractions with trait-based polymorphism
  • OpenFST Compatible: Read and write OpenFST format files
  • Python Bindings: Full-featured Python API via PyO3 for easy integration
  • Pure Rust: Memory-safe implementation with no C++ dependencies
  • Parallel Processing: Optional Rayon-based parallelization for large FSTs
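
To make the semiring terminology above concrete, here is a minimal, self-contained sketch of the tropical semiring, where "plus" is `min` and "times" is ordinary addition. The `Semiring` trait, the `Tropical` type, and the `path_weight` helper are hypothetical names written for this illustration; they are not ArcWeight's own types, and the library's actual trait may look different.

```rust
/// Illustrative only: a hand-rolled semiring trait, NOT ArcWeight's API.
pub trait Semiring: Copy + PartialEq + std::fmt::Debug {
    /// Identity for `plus` and annihilator for `times`.
    fn zero() -> Self;
    /// Identity for `times`.
    fn one() -> Self;
    /// Combines the weights of alternative paths.
    fn plus(self, other: Self) -> Self;
    /// Combines the weights of consecutive arcs on one path.
    fn times(self, other: Self) -> Self;
}

/// Tropical weight (hypothetical type): a cost where lower is better.
#[derive(Clone, Copy, Debug, PartialEq)]
pub struct Tropical(pub f64);

impl Semiring for Tropical {
    fn zero() -> Self {
        Tropical(f64::INFINITY)
    }
    fn one() -> Self {
        Tropical(0.0)
    }
    fn plus(self, other: Self) -> Self {
        // Alternatives: keep the cheaper one.
        Tropical(self.0.min(other.0))
    }
    fn times(self, other: Self) -> Self {
        // Sequence: costs add up.
        Tropical(self.0 + other.0)
    }
}

/// Weight of a single path: the `times`-product of its arc weights.
pub fn path_weight<W: Semiring>(arcs: &[W]) -> W {
    arcs.iter().fold(W::one(), |acc, &w| acc.times(w))
}
```

With this sketch, `Tropical(3.0).plus(Tropical(5.0))` selects the cheaper alternative, while `Tropical(3.0).times(Tropical(5.0))` accumulates cost along a path. Swapping in a different `impl Semiring` changes what the same generic algorithm computes, which is the general idea behind trait-based weight types.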

Quick Start

Add ArcWeight to your Cargo.toml:

[dependencies]
arcweight = "0.2"

Basic Example

use arcweight::prelude::*;

fn main() -> Result<(), Box<dyn std::error::Error>> {
    // Create a simple FST
    let mut fst = VectorFst::<TropicalWeight>::new();

    // Add states
    let s0 = fst.add_state();
    let s1 = fst.add_state();
    let s2 = fst.add_state();

    // Set start and final states
    fst.set_start(s0);
    fst.set_final(s2, TropicalWeight::one());

    // Add arcs
    fst.add_arc(s0, Arc::new(1, 1, TropicalWeight::one(), s1));
    fst.add_arc(s1, Arc::new(2, 2, TropicalWeight::one(), s2));

    // Perform operations
    let minimized = minimize(&fst)?;

    println!("Original states: {}", fst.num_states());
    println!("Minimized states: {}", minimized.num_states());

    Ok(())
}
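
To show what the weights in an FST like the one above mean, the following sketch computes single-source shortest distances over a plain list of arcs using tropical semantics (costs add along a path, and the minimum is taken over alternatives). Under the standard definition of the tropical semiring, the multiplicative identity is the cost 0.0, so a path built only from identity weights has total cost 0.0. This sketch deliberately avoids the ArcWeight API: `ToyArc` and `shortest_distance` are hypothetical names, and the Bellman-Ford-style relaxation is a textbook stand-in, not necessarily the algorithm the library uses. It assumes `start < num_states` and no negative-cost cycles.

```rust
/// A toy arc: (source state, destination state, tropical cost).
/// Labels are omitted because only weights affect the distance.
/// Hypothetical type for illustration; not part of ArcWeight.
pub type ToyArc = (usize, usize, f64);

/// Shortest distance from `start` to every state, by repeated relaxation.
/// Unreachable states keep the distance `f64::INFINITY` (tropical zero).
pub fn shortest_distance(num_states: usize, start: usize, arcs: &[ToyArc]) -> Vec<f64> {
    let mut dist = vec![f64::INFINITY; num_states];
    dist[start] = 0.0; // tropical one: the empty path costs nothing

    // At most num_states - 1 rounds are needed without negative cycles.
    for _ in 1..num_states {
        let mut changed = false;
        for &(src, dst, cost) in arcs {
            let candidate = dist[src] + cost; // tropical "times"
            if candidate < dist[dst] {
                dist[dst] = candidate; // tropical "plus" (min)
                changed = true;
            }
        }
        if !changed {
            break; // distances have converged
        }
    }
    dist
}
```

For example, calling `shortest_distance(3, 0, &[(0, 1, 1.0), (1, 2, 2.0), (0, 2, 5.0)])` prefers the two-arc route to state 2 (cost 3.0) over the direct arc (cost 5.0).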

Python Bindings

ArcWeight also provides Python bindings for easy integration into Python projects:

pip install arcweight

import arcweight

# Create a new FST
fst = arcweight.VectorFst()

# Add states
s0 = fst.add_state()
s1 = fst.add_state()

# Set start state
fst.set_start(s0)

# Add an arc: from s0 to s1, input=1, output=1, weight=1.0
fst.add_arc(s0, 1, 1, 1.0, s1)

# Set final state
fst.set_final(s1, 0.5)

# Perform operations
minimized = arcweight.minimize(fst)

# Compose two FSTs (fst is composed with itself here so the snippet is self-contained)
composed = arcweight.compose(fst, fst)

The Python API provides full access to all FST operations and algorithms. See the Python bindings documentation for more details.

Examples

ArcWeight includes comprehensive examples demonstrating real-world applications:

# String edit distance
cargo run --example edit_distance

# Spell checking and correction
cargo run --example spell_checking

# Morphological analysis
cargo run --example morphological_analyzer

# Phonological rules
cargo run --example phonological_rules

# Text normalization
cargo run --example number_date_normalizer

See the examples/ directory for complete implementations with detailed explanations.

Documentation

Minimum Supported Rust Version (MSRV)

ArcWeight requires Rust 1.85.0 or later.

The MSRV is explicitly tested in CI and will only be increased in minor version updates. When the MSRV is increased, the previous two stable releases will still be supported for six months.

Performance

ArcWeight is designed for high performance:

  • Zero-copy arc iteration minimizes allocations
  • Cache-friendly data structures optimize memory access
  • Optional parallel algorithms leverage multi-core processors
  • Automatic algorithm selection based on FST properties

Run benchmarks on your system:

cargo bench

Contributing

We welcome contributions! Please see CONTRIBUTING.md for guidelines.

Quick checklist:

  • Follow existing code style (run cargo fmt)
  • Add tests for new functionality (run cargo test)
  • Update documentation for public APIs (run cargo doc)
  • Ensure all CI checks pass (run cargo clippy)

Getting Help

License

Licensed under the Apache License, Version 2.0. See LICENSE for details.

Citation

If you use ArcWeight in your research, please cite:

@software{arcweight,
  author = {White, Aaron Steven},
  title = {ArcWeight: A Rust Library for Weighted Finite-State Transducers},
  url = {https://github.com/aaronstevenwhite/arcweight},
  doi = {10.5281/zenodo.17371992},
  year = {2025}
}

References

ArcWeight implements algorithms based on:

Acknowledgments

This library was architected and implemented with the help of Claude Code.
