Python bindings for ChemFST - a high-performance chemical name search library
ChemFST Python
Python bindings for ChemFST: a high-performance chemical name search library using Finite State Transducers (FSTs).
Features
- Memory-efficient indexing using Finite State Transducers
- Extremely fast prefix searches for autocomplete functionality
- Case-insensitive substring searches for finding chemical names
- Memory-mapped file access for optimal performance
- Native Rust implementation with Python bindings
- Comprehensive logging integrated with Python's logging system
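To make the two search modes concrete, here is a minimal pure-Python sketch of their expected semantics, using a sorted list and `bisect` in place of an FST. It is not how ChemFST is implemented. The helper names `prefix_search_reference` and `substring_search_reference` are hypothetical, and treating prefix matching as case-sensitive is an assumption (the feature list only states that substring search is case-insensitive).

```python
from bisect import bisect_left


def prefix_search_reference(sorted_names, prefix, max_results=100):
    """Return up to max_results names from a sorted list that start with prefix."""
    results = []
    # Binary search finds the first name >= prefix; matches are contiguous from there.
    start = bisect_left(sorted_names, prefix)
    for name in sorted_names[start:]:
        if not name.startswith(prefix) or len(results) >= max_results:
            break
        results.append(name)
    return results


def substring_search_reference(names, substring, max_results=100):
    """Return up to max_results names containing substring, ignoring case."""
    needle = substring.lower()
    results = []
    for name in names:
        if len(results) >= max_results:
            break
        if needle in name.lower():
            results.append(name)
    return results


names = sorted(["acetone", "acetic acid", "benzene", "ethanol", "nitrobenzene"])
print(prefix_search_reference(names, "acet"))
print(substring_search_reference(names, "BENZ"))
```

A slow reference like this can be handy for cross-checking results from a real index on a small sample of names.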
Installation
pip install chemfst
Requires Python 3.11 or higher.
Quick Start
from chemfst import ChemicalFST, build_fst
import logging
# Optional: Configure logging to see operation details
logging.basicConfig(level=logging.INFO, format='%(asctime)s [%(name)s] %(levelname)s: %(message)s')
# Build an FST index from chemical names (one name per line)
build_fst("data/chemical_names.txt", "data/chemical_names.fst")
# Load the FST for searching
fst = ChemicalFST("data/chemical_names.fst")
# Prefix search (autocomplete)
matches = fst.prefix_search("acet", max_results=10)
print(f"Chemicals starting with 'acet': {matches}")
# Substring search
matches = fst.substring_search("benz", max_results=10)
print(f"Chemicals containing 'benz': {matches}")
# Preload for better performance
count = fst.preload()
print(f"Preloaded {count} entries")
API Reference
build_fst(input_path, output_path)
Create an FST index from a text file containing chemical names (one per line).
ChemicalFST(fst_path)
Initialize a chemical name search engine from an FST file.
Methods:
- prefix_search(prefix, max_results=100) - Find names starting with prefix
- substring_search(substring, max_results=100) - Find names containing substring
- preload() - Load all data into memory for faster searches
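The two search methods can be combined into a simple autocomplete helper: take prefix matches first, then fill any remaining slots with substring matches. The sketch below relies only on the method names and parameters listed above, and assumes both methods return a list of strings. `suggest` and `StubIndex` are hypothetical names; the stub stands in for a loaded `ChemicalFST` so the example runs without an index file.

```python
def suggest(index, text, limit=10):
    """Return up to `limit` suggestions: prefix matches first, then substring matches."""
    suggestions = list(index.prefix_search(text, max_results=limit))
    if len(suggestions) < limit:
        seen = set(suggestions)
        for name in index.substring_search(text, max_results=limit):
            if name not in seen:
                suggestions.append(name)
                seen.add(name)
            if len(suggestions) >= limit:
                break
    return suggestions


class StubIndex:
    """Hypothetical in-memory stand-in exposing the same two search methods."""

    def __init__(self, names):
        self._names = sorted(names)

    def prefix_search(self, prefix, max_results=100):
        return [n for n in self._names if n.startswith(prefix)][:max_results]

    def substring_search(self, substring, max_results=100):
        needle = substring.lower()
        return [n for n in self._names if needle in n.lower()][:max_results]


index = StubIndex(["benzene", "benzoic acid", "nitrobenzene", "acetone"])
print(suggest(index, "benz", limit=3))
```

Because `suggest` only depends on the two method names, an object created with `ChemicalFST("data/chemical_names.fst")` could be passed in place of the stub under the same assumptions.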
Logging
ChemFST integrates with Python's standard logging module to provide detailed operation insights.
Basic Logging Setup
import logging
import chemfst
logging.basicConfig(level=logging.INFO)
# ChemFST operations will now generate log messages
Log Levels
- ERROR: File errors, operation failures
- INFO: Operation summaries, result counts, timing
- DEBUG: Detailed parameters, internal operations
Advanced Logging
# DEBUG level for development
logging.getLogger('chemfst').setLevel(logging.DEBUG)
# Custom formatting
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s [%(name)s] %(levelname)s: %(message)s',
    filename='chemfst.log'
)
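If you prefer to keep ChemFST messages separate from the rest of your application's logs, a dedicated handler on the `chemfst` logger is one option. The sketch below uses only the standard `logging` module and assumes the library logs under the `chemfst` logger name, as the example output suggests. The two messages are simulated here purely to show the level filtering; real messages would come from the library.

```python
import io
import logging

# Send records from the "chemfst" logger to a dedicated stream handler.
stream = io.StringIO()  # stands in for a file or other sink
handler = logging.StreamHandler(stream)
handler.setFormatter(logging.Formatter("[%(name)s] %(levelname)s: %(message)s"))

logger = logging.getLogger("chemfst")
logger.setLevel(logging.INFO)
logger.addHandler(handler)
logger.propagate = False  # keep these records out of the root logger's handlers

# Simulated messages, emitted here only to demonstrate the INFO threshold.
logger.info("simulated operation summary")
logger.debug("simulated internal detail")  # dropped: below the INFO threshold

print(stream.getvalue(), end="")
```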
Example Log Output
2024-01-15 10:30:15 [chemfst] INFO: Building FST from input file: data/chemicals.txt
2024-01-15 10:30:15 [chemfst] INFO: Read 50000 chemical names from input file
2024-01-15 10:30:16 [chemfst] INFO: Successfully built FST with 50000 entries
2024-01-15 10:30:20 [chemfst] INFO: Prefix search for 'acet' found 3 results (checked 3 entries)
Performance
- Fast loading: Memory-mapped FST files, no full loading required
- Low memory usage: Compact FST representation
- Quick searches: Typically < 1ms for prefix searches
- Efficient substring searches: Faster than regex or database lookups
Performance logging is available at the DEBUG level to help with optimization.
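To check these timings on your own data, a small helper around `time.perf_counter` is usually enough. The sketch below is an illustration only: `time_call` and `scan` are hypothetical names, and `scan` is a plain list scan used as a stand-in workload so the example runs without an index file.

```python
import time


def time_call(func, *args, repeat=100, **kwargs):
    """Call func repeatedly; return (last_result, best_seconds_per_call)."""
    best = float("inf")
    result = None
    for _ in range(repeat):
        start = time.perf_counter()
        result = func(*args, **kwargs)
        best = min(best, time.perf_counter() - start)
    return result, best


# Stand-in data set of synthetic names.
names = [f"compound-{i}" for i in range(10_000)]


def scan(prefix, max_results=10):
    """Linear scan used only as a placeholder for a real search call."""
    return [n for n in names if n.startswith(prefix)][:max_results]


result, seconds = time_call(scan, "compound-99", max_results=5)
print(f"{len(result)} results, best of 100 runs: {seconds * 1000:.3f} ms")
```

With a loaded index, passing `fst.prefix_search` instead of `scan` would time the real search in the same way.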
Input Format
Chemical names file (one per line):
acetone
benzene
methanol
ethanol
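A names file in this format can be produced from any iterable of strings. The sketch below is a minimal example using only the standard library; `write_names_file` is a hypothetical helper. Whether `build_fst` requires sorted or unique input is not stated here, so sorting and deduplicating are conservative assumptions rather than documented requirements.

```python
from pathlib import Path
import tempfile


def write_names_file(names, path):
    """Write unique, non-empty names to `path`, one per line, sorted; return the count."""
    cleaned = sorted({name.strip() for name in names if name.strip()})
    Path(path).write_text("".join(f"{name}\n" for name in cleaned), encoding="utf-8")
    return len(cleaned)


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "chemical_names.txt"
    count = write_names_file(["ethanol", " acetone ", "", "benzene", "ethanol"], target)
    lines = target.read_text(encoding="utf-8").splitlines()

print(count, lines)
```

The resulting file could then be passed as the `input_path` argument of `build_fst`.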
Development
Building from Source
git clone https://github.com/username/ChemFST
cd ChemFST/chemfst-py
pip install maturin
maturin develop
Running Tests
python -m pytest python/tests/ -v
Examples
See python/examples/ for complete usage examples including logging configuration.
License
MIT