Python port of Fidel Tools - Amharic language preprocessing toolkit
fidel-tools
Overview
fidel-tools is the Python package for the Fidel Tools NLP suite. It wraps the native Rust core library (core-native) through PyO3 bindings built with maturin, bringing near-native speed to Amharic preprocessing in Python. It provides an API that mirrors the JavaScript package, idiomatic snake_case method names, and a spaCy-compatible tokenizer.
Features
- High-Performance Rust Core: Employs PyO3 native extensions for fast homophone mapping, labialized string expansion, and gemination collapsing.
- Pythonic & JS Symmetrical APIs: Exposes both Pythonic (snake_case) and JavaScript-style (camelCase) method names on the Pipeline class.
- spaCy Integration: Easy integration of tokenizers into spaCy language pipelines.
- Stopword & Stemmer Engines: Morphology-aware boundary stopword filtering and light stemming.
- Transliteration & Term Indexer: SERA/Felig ASCII schemes and TF-IDF document/query indexers.
Installation
pip install fidel-tools
Quick Start
Basic Pipeline Usage
import fidel_tools as fidel
# 1. Load the pre-configured Amharic language pack
am_pack = fidel.get_amharic_pack()
# 2. Instantiate the pipeline
nlp = fidel.Pipeline(am_pack)
# 3. Perform pre-processing operations
normalized = nlp.normalize("ሐኪም ኀይሉ በልቷልልል!")
cleaned = nlp.remove_stopwords("ያወጣውን የተጨማሪ እሴት")
stemmed = nlp.stem("ልጆቻቸውን")
print(normalized) # "ሃኪም ሃይሉ በልቱዋልል!"
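To make the three normalization steps named under Features more concrete (homophone mapping, labialized expansion, gemination collapsing), here is a minimal pure-Python sketch. It is not the library's implementation: the real rules live in the Rust core and may differ. The function name `toy_normalize`, the two tiny lookup tables, and the "collapse runs of three or more to two" rule are assumptions inferred only from the sample input and output shown above.

```python
import re

# Hypothetical lookup tables covering only the characters in the sample
# sentence; the actual tables in the Rust core are assumed to be much larger.
HOMOPHONES = {"ሐ": "ሃ", "ኀ": "ሃ"}
LABIALIZED = {"ቷ": "ቱዋ"}


def toy_normalize(text: str) -> str:
    """Illustrative stand-in for Pipeline.normalize (an assumption, not the real code)."""
    # 1. Map homophone characters onto one canonical form.
    text = "".join(HOMOPHONES.get(ch, ch) for ch in text)
    # 2. Expand labialized characters into a two-character sequence.
    text = "".join(LABIALIZED.get(ch, ch) for ch in text)
    # 3. Collapse any run of three or more identical characters down to two.
    return re.sub(r"(.)\1{2,}", r"\1\1", text)


print(toy_normalize("ሐኪም ኀይሉ በልቷልልል!"))  # "ሃኪም ሃይሉ በልቱዋልል!"
```

The sketch only reproduces the sample above; for real text, use `nlp.normalize` from the pipeline.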
spaCy Tokenizer Integration
import spacy
from fidel_tools import Pipeline, get_spacy_tokenizer, get_amharic_pack
# Create spaCy model
nlp = spacy.blank("am")
# Configure tokenizer
pipeline = Pipeline(get_amharic_pack())
nlp.tokenizer = get_spacy_tokenizer(nlp, pipeline)
# Process text
doc = nlp("ይህ የመጀመሪያው ዓረፍተ ነገር ነው።")
print([token.text for token in doc])
API Reference
Pipeline Methods
Supports both camelCase (symmetrical to JS) and snake_case signatures.
- normalize(text: str) -> str: Normalizes characters and collapses geminations.
- sentence_tokenize(text: str) -> list: Tokenizes text into sentences.
- stem(word: str) -> str: Extracts the base form of a word.
- remove_stopwords(corpus: str) -> str: Removes stopwords.
- text_analyze(corpus: str) -> str: Expands abbreviations and strips punctuation/numbers.
- felig_transliterate(word: str, lang: str) -> str: Felig transliteration.
- sera_transliterate(word: str, lang: str) -> str: SERA transliteration.
- index_documents(docs: list) -> dict: Indexes document dictionaries.
- index_query(query: str) -> dict: Indexes a single query string.
- weigh_terms(index: dict, type_of_index: str) -> dict: Calculates TF-IDF weights.
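The README does not spell out the dictionary shapes used by index_documents and weigh_terms, so the following is only a sketch of the general TF-IDF idea behind them, written in plain Python. The function name `toy_tf_idf`, the list-of-token-lists input, and the exact formula (relative term frequency times the natural log of inverse document frequency) are assumptions for illustration and may not match what the library computes.

```python
import math
from collections import Counter


def toy_tf_idf(docs: list) -> list:
    """Return one {term: weight} dict per document (hypothetical helper)."""
    n_docs = len(docs)
    # Document frequency: in how many documents each term appears.
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    weights = []
    for tokens in docs:
        tf = Counter(tokens)
        total = len(tokens)
        # weight = (count / doc length) * ln(N / document frequency)
        weights.append({
            term: (count / total) * math.log(n_docs / df[term])
            for term, count in tf.items()
        })
    return weights


docs = [["ሰላም", "ዓለም"], ["ሰላም", "ኢትዮጵያ"]]
weights = toy_tf_idf(docs)
print(weights[0])
```

A term that occurs in every document gets weight 0 under this formula, while a term unique to one document gets the highest weight in it.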
License
Licensed under the MIT License.