Python bindings for ChemFST - a high-performance chemical name search library

ChemFST

Python bindings for ChemFST: a high-performance chemical name search library using Finite State Transducers (FSTs).

Features

  • Memory-efficient indexing using Finite State Transducers
  • Extremely fast prefix searches for autocomplete functionality
  • Case-insensitive substring searches for finding chemical names
  • Memory-mapped file access for optimal performance
  • Native Rust implementation with Python bindings

Installation

pip install chemfst

Requires Python 3.11 or higher.

Quick Start

from chemfst import ChemicalFST, build_fst

# Build an FST index from a text file of chemical names.
# The .fst file is not shipped with the package, so it must be generated first.
build_fst("data/chemical_names.txt", "data/chemical_names.fst")

# Load the FST for searching
fst = ChemicalFST("data/chemical_names.fst")

# Prefix search (autocomplete)
matches = fst.prefix_search("acet", max_results=10)
print(f"Chemicals starting with 'acet': {matches}")

# Substring search
matches = fst.substring_search("benz", max_results=10)
print(f"Chemicals containing 'benz': {matches}")

Input Format

The input file should contain one chemical name per line:

acetone
benzene
methanol
ethanol
...
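If the names come from another source (a database export, a spreadsheet column), a small helper can produce a file in this one-name-per-line layout. The following is a minimal sketch using only the standard library. The function name `write_names_file` is hypothetical, and the choices to strip whitespace, drop duplicates, sort the names, and write UTF-8 are assumptions rather than documented requirements of `build_fst`.

```python
from pathlib import Path
from typing import Iterable


def write_names_file(names: Iterable[str], path: str) -> int:
    """Write one chemical name per line and return how many names were written."""
    # Assumption: stripping, de-duplicating, and sorting is harmless for the
    # index builder; the package documentation does not state whether it is needed.
    cleaned = sorted({name.strip() for name in names if name.strip()})
    Path(path).write_text("".join(f"{name}\n" for name in cleaned), encoding="utf-8")
    return len(cleaned)
```

The resulting file can then be passed as `input_path` to `build_fst`.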

API Reference

build_fst(input_path, output_path)

Create an FST index from a text file of chemical names. The .fst file is not distributed with the package, so it must be generated locally before searching.

  • input_path: Path to text file containing chemical names (one per line)
  • output_path: Path where the FST index will be saved
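Because the index has to be generated locally, an application may want to build it only when it is missing or stale. The sketch below shows one way to do that. `ensure_index` is a hypothetical helper, not part of ChemFST, and it takes the builder as a parameter so that it does not assume anything about `build_fst` beyond the two path arguments documented above.

```python
from pathlib import Path
from typing import Callable


def ensure_index(names_path: str, fst_path: str,
                 builder: Callable[[str, str], object]) -> bool:
    """Build the index only if it is missing or older than the names file.

    Returns True if the builder was called, False if the existing index was reused.
    """
    names, index = Path(names_path), Path(fst_path)
    # Reuse the index when it exists and is at least as new as its source file.
    if index.exists() and index.stat().st_mtime >= names.stat().st_mtime:
        return False
    index.parent.mkdir(parents=True, exist_ok=True)
    builder(str(names), str(index))
    return True
```

With the package installed, a call such as `ensure_index("data/chemical_names.txt", "data/chemical_names.fst", build_fst)` would rebuild the index only after the names file changes.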

ChemicalFST(fst_path)

Initialize a chemical name search engine.

  • fst_path: Path to the FST index file

Methods

  • prefix_search(prefix, max_results=100): Find chemical names starting with the given prefix, returning at most max_results names (default 100)

  • substring_search(substring, max_results=100): Find chemical names containing the given substring, returning at most max_results names (default 100)
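The two methods can be combined into a single autocomplete helper that prefers prefix matches and fills any remaining slots with substring matches. This is a sketch under assumptions: `suggest` and the `NameIndex` protocol are hypothetical names, and both search methods are assumed to return a list of strings, which the Quick Start output suggests but the reference does not state.

```python
from typing import List, Protocol


class NameIndex(Protocol):
    # Mirrors the two documented methods; the list-of-strings return type is an assumption.
    def prefix_search(self, prefix: str, max_results: int = 100) -> List[str]: ...
    def substring_search(self, substring: str, max_results: int = 100) -> List[str]: ...


def suggest(index: NameIndex, query: str, limit: int = 10) -> List[str]:
    """Return up to `limit` names: prefix matches first, then substring matches."""
    query = query.strip()
    if not query or limit <= 0:
        return []
    results = list(index.prefix_search(query, max_results=limit))
    if len(results) < limit:
        seen = set(results)
        # Prefix hits can reappear among substring hits, so skip names already listed.
        for name in index.substring_search(query, max_results=limit):
            if name not in seen:
                results.append(name)
                seen.add(name)
                if len(results) == limit:
                    break
    return results
```

Any object exposing the two methods can be passed as `index`, so a `ChemicalFST` instance should work here if the return-type assumption holds.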

Performance

ChemFST uses memory mapping and Finite State Transducers to achieve excellent performance:

  • Fast loading: The FST is memory-mapped, not fully loaded into memory
  • Low memory usage: Compact FST representation uses minimal RAM
  • Quick prefix searches: Typically under 1 ms per query
  • Efficient substring searches: Intended to be faster than regex scans or database lookups
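Latency depends on the dataset and hardware, so it is worth measuring these claims on your own index. The harness below is a minimal sketch using only the standard library. `time_calls` is a hypothetical helper and it reports no reference numbers of its own.

```python
import statistics
import time
from typing import Callable, Dict


def time_calls(fn: Callable[[], object], repeats: int = 200) -> Dict[str, float]:
    """Call `fn` repeatedly and report per-call latency in milliseconds."""
    samples = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn()
        samples.append((time.perf_counter() - start) * 1000.0)
    return {
        "median_ms": statistics.median(samples),
        "max_ms": max(samples),
    }
```

For example, `time_calls(lambda: fst.prefix_search("acet", max_results=10))` would time the prefix query from the Quick Start, assuming `fst` has been loaded as shown there.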

Building from Source

git clone https://github.com/username/ChemFST
cd ChemFST
pip install maturin
maturin develop

License

MIT

Credits

ChemFST is built using the fst Rust crate by BurntSushi for the Finite State Transducer implementation.

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • chemfst-0.1.2-cp311-abi3-win_amd64.whl (192.4 kB): CPython 3.11+, Windows x86-64
  • chemfst-0.1.2-cp311-abi3-manylinux_2_34_x86_64.whl (331.9 kB): CPython 3.11+, manylinux (glibc 2.34+) x86-64
  • chemfst-0.1.2-cp311-abi3-macosx_11_0_arm64.whl (290.5 kB): CPython 3.11+, macOS 11.0+ ARM64

File details

Details for the file chemfst-0.1.2-cp311-abi3-win_amd64.whl.

File metadata

  • Size: 192.4 kB
  • Tags: CPython 3.11+, Windows x86-64
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for chemfst-0.1.2-cp311-abi3-win_amd64.whl
Algorithm Hash digest
SHA256 d07729252650486174874f1a4d4763a4907da32cb02d3db94cb7a7eda64e3e25
MD5 21569346c470008aa57dbf953c24afb1
BLAKE2b-256 0ce6a09a587147c35b66532d059dc909ca12c9689b4837732b8ad98ab599a12d

Provenance

The following attestation bundles were made for chemfst-0.1.2-cp311-abi3-win_amd64.whl:

Publisher: publish-pypi.yml on esrehmki/chemfst

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file chemfst-0.1.2-cp311-abi3-manylinux_2_34_x86_64.whl.

File hashes

Hashes for chemfst-0.1.2-cp311-abi3-manylinux_2_34_x86_64.whl
Algorithm Hash digest
SHA256 e7317480b95eb5d609aa7959b95f021c070c9f7d2e9b75e686a1f8c0118da8bb
MD5 f371db9249430d6382a4cd095c646f75
BLAKE2b-256 a93e7fd71d0a6e30e56d9864e2f2edf52b57806b32f80cdacd9960fc2a7667c7

Provenance

The following attestation bundles were made for chemfst-0.1.2-cp311-abi3-manylinux_2_34_x86_64.whl:

Publisher: publish-pypi.yml on esrehmki/chemfst

File details

Details for the file chemfst-0.1.2-cp311-abi3-macosx_11_0_arm64.whl.

File hashes

Hashes for chemfst-0.1.2-cp311-abi3-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 6a9020ac210b195659217a59b982aca30bc5cfa25b9b486f82e8eb17880e9b9e
MD5 08b5f78dcb452be63102de3525f756a2
BLAKE2b-256 b324152cc5dfee96c13ddc896b514d97181baad7609f04da6caf59a950081a60

Provenance

The following attestation bundles were made for chemfst-0.1.2-cp311-abi3-macosx_11_0_arm64.whl:

Publisher: publish-pypi.yml on esrehmki/chemfst
