Python bindings for ChemFST - a high-performance chemical name search library

ChemFST Python

Python bindings for ChemFST: a high-performance chemical name search library using Finite State Transducers (FSTs).

Features

  • Memory-efficient indexing using Finite State Transducers
  • Extremely fast prefix searches for autocomplete functionality
  • Case-insensitive substring searches for finding chemical names
  • Memory-mapped file access for optimal performance
  • Native Rust implementation with Python bindings
  • Comprehensive logging integrated with Python's logging system

Installation

pip install chemfst

Requires Python 3.11 or higher.

Quick Start

from chemfst import ChemicalFST, build_fst
import logging

# Optional: Configure logging to see operation details
logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(name)s] %(levelname)s: %(message)s')

# Build an FST index from chemical names (one name per line)
build_fst("data/chemical_names.txt", "data/chemical_names.fst")

# Load the FST for searching
fst = ChemicalFST("data/chemical_names.fst")

# Prefix search (autocomplete)
matches = fst.prefix_search("acet", max_results=10)
print(f"Chemicals starting with 'acet': {matches}")

# Substring search
matches = fst.substring_search("benz", max_results=10)
print(f"Chemicals containing 'benz': {matches}")

# Optional: preload all data into memory to speed up later searches
count = fst.preload()
print(f"Preloaded {count} entries")

API Reference

build_fst(input_path, output_path)

Create an FST index from a text file containing chemical names (one per line).

ChemicalFST(fst_path)

Initialize a chemical name search engine from an FST file.

Methods:

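  • prefix_search(prefix, max_results=100) - Find names starting with prefix, returning at most max_results matches
  • substring_search(substring, max_results=100) - Find names containing substring (case-insensitive), returning at most max_results matches
  • preload() - Load all data into memory for faster searches; returns the number of entries loaded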
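
To make the semantics of the two query types concrete, here is a small pure-Python model over a sorted list. It is a stand-in for illustration, not how ChemFST is implemented: the real library uses an FST in Rust. The model assumes prefix matching is case-sensitive (the source only states that substring search is case-insensitive) and that `max_results` simply caps the number of returned names.

```python
from bisect import bisect_left


def prefix_search(sorted_names, prefix, max_results=100):
    """Names starting with `prefix`, found by a range scan over sorted keys."""
    results = []
    i = bisect_left(sorted_names, prefix)  # index of the first key >= prefix
    while i < len(sorted_names) and len(results) < max_results:
        if not sorted_names[i].startswith(prefix):
            break  # left the contiguous block of matching keys
        results.append(sorted_names[i])
        i += 1
    return results


def substring_search(sorted_names, substring, max_results=100):
    """Names containing `substring`, compared case-insensitively."""
    needle = substring.lower()
    results = []
    for name in sorted_names:
        if len(results) >= max_results:
            break
        if needle in name.lower():
            results.append(name)
    return results


if __name__ == "__main__":
    names = sorted(["acetone", "acetic acid", "benzene", "Nitrobenzene", "methanol"])
    print(prefix_search(names, "acet"))     # ['acetic acid', 'acetone']
    print(substring_search(names, "BENZ"))  # ['Nitrobenzene', 'benzene']
```

The prefix case only touches the contiguous run of keys that share the prefix, while the substring case has to consider every name; ordered-key structures such as FSTs exploit the same ordering for prefix queries.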

Logging

ChemFST integrates with Python's standard logging module to provide detailed operation insights.

Basic Logging Setup

import logging
import chemfst

logging.basicConfig(level=logging.INFO)
# ChemFST operations will now generate log messages

Log Levels

  • ERROR: File errors, operation failures
  • INFO: Operation summaries, result counts, timing
  • DEBUG: Detailed parameters, internal operations
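
The effect of these levels can be checked with the standard library alone. The sketch below attaches a list-collecting handler to the `chemfst` logger (the logger name used elsewhere in this README) and emits simulated records; `ListHandler` and `capture_chemfst_logs` are hypothetical helpers, and in real use the records would come from ChemFST operations rather than from the demo calls.

```python
import logging


class ListHandler(logging.Handler):
    """Collect log records as strings in a list (handy in tests or notebooks)."""

    def __init__(self):
        super().__init__()
        self.lines = []

    def emit(self, record):
        self.lines.append(f"{record.levelname}: {record.getMessage()}")


def capture_chemfst_logs(level=logging.INFO):
    """Attach a ListHandler to the 'chemfst' logger and return the handler."""
    logger = logging.getLogger("chemfst")
    logger.setLevel(level)
    handler = ListHandler()
    logger.addHandler(handler)
    return handler


if __name__ == "__main__":
    handler = capture_chemfst_logs(logging.INFO)
    log = logging.getLogger("chemfst")
    # Simulated records, standing in for messages ChemFST would emit.
    log.debug("detailed parameters")  # dropped: below the INFO threshold
    log.info("operation summary")
    log.error("file not found")
    print(handler.lines)  # ['INFO: operation summary', 'ERROR: file not found']
```

Setting the level to `logging.DEBUG` instead would let the first record through as well.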

Advanced Logging

import logging

# DEBUG level for development
logging.getLogger('chemfst').setLevel(logging.DEBUG)

# Custom formatting, written to a log file
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s [%(name)s] %(levelname)s: %(message)s',
    filename='chemfst.log'
)

Example Log Output

2024-01-15 10:30:15 [chemfst] INFO: Building FST from input file: data/chemicals.txt
2024-01-15 10:30:15 [chemfst] INFO: Read 50000 chemical names from input file
2024-01-15 10:30:16 [chemfst] INFO: Successfully built FST with 50000 entries
2024-01-15 10:30:20 [chemfst] INFO: Prefix search for 'acet' found 3 results (checked 3 entries)
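
Lines in this format are easy to post-process, for example to collect result counts per query. The parser below is a sketch that assumes the `%(asctime)s [%(name)s] %(levelname)s: %(message)s` layout and the message wording shown in the sample above; the actual wording may differ between versions, and `parse_line` and `search_summaries` are hypothetical helper names.

```python
import re

# Timestamp, [logger], LEVEL: message. Milliseconds after the comma are optional.
LINE_RE = re.compile(
    r"^(?P<time>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})(?:,\d{3})? "
    r"\[(?P<logger>[^\]]+)\] (?P<level>[A-Z]+): (?P<message>.*)$"
)
# Matches messages such as "Prefix search for 'acet' found 3 results ...".
SEARCH_RE = re.compile(r"search for '(?P<query>.*)' found (?P<found>\d+) results")


def parse_line(line):
    """Split one log line into its fields, or return None if it does not match."""
    match = LINE_RE.match(line.strip())
    return match.groupdict() if match else None


def search_summaries(lines):
    """Yield (query, result_count) for every line that reports a search."""
    for line in lines:
        record = parse_line(line)
        if record is None:
            continue
        match = SEARCH_RE.search(record["message"])
        if match:
            yield match.group("query"), int(match.group("found"))


if __name__ == "__main__":
    sample = [
        "2024-01-15 10:30:16 [chemfst] INFO: Successfully built FST with 50000 entries",
        "2024-01-15 10:30:20 [chemfst] INFO: Prefix search for 'acet' found 3 results (checked 3 entries)",
    ]
    print(list(search_summaries(sample)))  # [('acet', 3)]
```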

Performance

  • Fast loading: FST files are memory-mapped, so the index does not have to be read fully into memory
  • Low memory usage: Compact FST representation
  • Quick searches: Typically < 1ms for prefix searches
  • Efficient substring searches: Designed to be faster than regex scans or database lookups

Performance logging is available at the DEBUG level to help with optimization.
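
Timings depend on the data set and hardware, so it is worth measuring on your own index. The harness below is a minimal sketch using only the standard library; `time_calls` is a hypothetical helper, and the demo times a plain list scan as a stand-in so the snippet runs without an FST file.

```python
import statistics
import time


def time_calls(func, queries, repeats=5):
    """Return the median wall-clock time per call, in milliseconds."""
    samples = []
    for _ in range(repeats):
        for query in queries:
            start = time.perf_counter()
            func(query)
            samples.append((time.perf_counter() - start) * 1000.0)
    return statistics.median(samples)


if __name__ == "__main__":
    # Stand-in search over a small list. With a loaded index, pass something
    # like `lambda q: fst.prefix_search(q, max_results=10)` instead.
    names = ["acetone", "benzene", "ethanol", "methanol"]
    median_ms = time_calls(
        lambda q: [n for n in names if n.startswith(q)],
        ["acet", "benz"],
    )
    print(f"median: {median_ms:.4f} ms per query")
```

The median is used rather than the mean so that a single slow first call, for instance while pages of a memory-mapped file are still being faulted in, does not dominate the result.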

Input Format

Chemical names file (one per line):

acetone
benzene
methanol
ethanol
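
A file in this format can be produced from any list of names. The helper below is a sketch: `write_names_file` is a hypothetical name, UTF-8 is an assumed encoding, and stripping blank lines and duplicates is a precaution, since this README does not say how `build_fst` treats them.

```python
from pathlib import Path


def write_names_file(names, path):
    """Write unique, non-empty names to `path`, one per line; return the count."""
    seen = set()
    cleaned = []
    for name in names:
        name = name.strip()
        if name and name not in seen:
            seen.add(name)
            cleaned.append(name)
    # One name per line, each terminated by a newline.
    Path(path).write_text("".join(n + "\n" for n in cleaned), encoding="utf-8")
    return len(cleaned)


if __name__ == "__main__":
    count = write_names_file(
        ["acetone", " benzene ", "", "acetone", "methanol"],
        "chemical_names.txt",
    )
    print(f"Wrote {count} names")  # Wrote 3 names
```

The resulting file can then be passed as the input path to `build_fst`.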

Development

Building from Source

git clone https://github.com/esrehmki/ChemFST
cd ChemFST/chemfst-py
pip install maturin
maturin develop

Running Tests

python -m pytest python/tests/ -v

Examples

See python/examples/ for complete usage examples including logging configuration.

License

MIT
