Python port of Fidel Tools - Amharic language preprocessing toolkit

fidel-tools


Overview

fidel-tools is the Python package for the Fidel Tools NLP suite. It wraps the core Rust library (core-native), built with maturin and exposed through PyO3 bindings, to bring near-native speed to Amharic preprocessing in Python. It provides an API that mirrors the JavaScript package (camelCase), Pythonic snake_case equivalents, and a spaCy-compatible tokenizer.


Features

  • High-Performance Rust Core: Uses PyO3 native extensions for fast homophone mapping, labialized string expansion, and gemination collapsing.
  • Pythonic and JS-Symmetrical APIs: Exposes both Pythonic (snake_case) and JavaScript-style (camelCase) method names on the Pipeline class.
  • spaCy Integration: Provides a tokenizer that plugs into spaCy language pipelines.
  • Stopword and Stemmer Engines: Morphology-aware boundary stopword filtering and light stemming.
  • Transliteration and Term Indexer: SERA/Felig ASCII transliteration schemes and TF-IDF document/query indexers.

Installation

pip install fidel-tools
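
To confirm that the install succeeded without importing the native extension, one option is to query the installed distribution metadata from the standard library. This is a minimal sketch; the helper name installed_version is hypothetical and not part of fidel-tools.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


# Prints the version string if fidel-tools is installed, otherwise None.
print(installed_version("fidel-tools"))
```

A None result means pip did not install the package into the interpreter that is running the check, which is a common cause of import errors when several Python environments are present.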

Quick Start

Basic Pipeline Usage

import fidel_tools as fidel

# 1. Load the pre-configured Amharic language pack
am_pack = fidel.get_amharic_pack()

# 2. Instantiate the pipeline
nlp = fidel.Pipeline(am_pack)

# 3. Perform pre-processing operations
normalized = nlp.normalize("ሐኪም ኀይሉ በልቷልልል!")
cleaned = nlp.remove_stopwords("ያወጣውን የተጨማሪ እሴት")
stemmed = nlp.stem("ልጆቻቸውን")

print(normalized)  # "ሃኪም ሃይሉ በልቱዋልል!"
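
The output above suggests three steps: homophone mapping, labialized-character expansion, and gemination collapsing. A minimal pure-Python sketch of those steps follows. It is not the library's Rust implementation: the two lookup tables hold only the characters seen in the example, and the rule of collapsing runs of three or more identical characters down to two is an assumption inferred from that single example.

```python
import re

# Illustrative tables only, inferred from the Quick Start example (assumption).
HOMOPHONES = {"ሐ": "ሃ", "ኀ": "ሃ"}
LABIALIZED = {"ቷ": "ቱዋ"}


def normalize_sketch(text):
    # Step 1: map homophone characters to a single canonical character.
    text = "".join(HOMOPHONES.get(ch, ch) for ch in text)
    # Step 2: expand each labialized character into a two-character sequence.
    text = "".join(LABIALIZED.get(ch, ch) for ch in text)
    # Step 3: collapse runs of three or more identical characters to two
    # (assumed rule; the real pipeline may differ).
    return re.sub(r"(.)\1{2,}", r"\1\1", text)


print(normalize_sketch("ሐኪም ኀይሉ በልቷልልል!"))
```

Running each step as a separate pass keeps the tables independent, so a fuller homophone or labialization table can be dropped in without touching the collapsing rule.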

spaCy Tokenizer Integration

import spacy
from fidel_tools import Pipeline, get_spacy_tokenizer, get_amharic_pack

# Create spaCy model
nlp = spacy.blank("am")

# Configure tokenizer
pipeline = Pipeline(get_amharic_pack())
nlp.tokenizer = get_spacy_tokenizer(nlp, pipeline)

# Process text
doc = nlp("ይህ የመጀመሪያው ዓረፍተ ነገር ነው።")
print([token.text for token in doc])

API Reference

Pipeline Methods

Every method is exposed under both a snake_case name and a camelCase alias that mirrors the JS API. The snake_case signatures are:

  • normalize(text: str) -> str: Normalizes characters and collapses geminations.
  • sentence_tokenize(text: str) -> list: Tokenizes text into sentences.
  • stem(word: str) -> str: Extracts the base form of a word.
  • remove_stopwords(corpus: str) -> str: Removes stopwords.
  • text_analyze(corpus: str) -> str: Expands abbreviations and strips punctuation/numbers.
  • felig_transliterate(word: str, lang: str) -> str: Felig transliteration.
  • sera_transliterate(word: str, lang: str) -> str: SERA transliteration.
  • index_documents(docs: list) -> dict: Indexes document dictionaries.
  • index_query(query: str) -> dict: Indexes a single query string.
  • weigh_terms(index: dict, type_of_index: str) -> dict: Calculates TF-IDF weights.
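
To make the indexing methods above more concrete, here is a minimal pure-Python sketch of TF-IDF weighting. It uses the textbook formula (relative term frequency times log(N / document frequency)); the formula, the function name tf_idf, and the dict layout are assumptions for illustration and may not match what weigh_terms computes or returns.

```python
import math
from collections import Counter


def tf_idf(docs):
    """docs maps a document id to its list of tokens; returns per-document weights."""
    n_docs = len(docs)
    # Document frequency: how many documents contain each term at least once.
    df = Counter(term for tokens in docs.values() for term in set(tokens))
    weights = {}
    for doc_id, tokens in docs.items():
        counts = Counter(tokens)
        weights[doc_id] = {
            # Relative term frequency scaled by inverse document frequency.
            term: (count / len(tokens)) * math.log(n_docs / df[term])
            for term, count in counts.items()
        }
    return weights


example = {"d1": ["ሃኪም", "ቤት"], "d2": ["ሃኪም", "ልጅ"]}
print(tf_idf(example))
```

In this sketch a term that appears in every document gets weight 0, since log(N / N) = 0, while terms unique to one document score highest.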

License

Licensed under the MIT License.

Source Distributions

No source distribution files are available for this release.

Built Distributions

  • fidel_tools-0.1.9-cp38-abi3-win_amd64.whl (156.5 kB): CPython 3.8+, Windows x86-64
  • fidel_tools-0.1.9-cp38-abi3-manylinux_2_34_x86_64.whl (293.3 kB): CPython 3.8+, manylinux (glibc 2.34+), x86-64
  • fidel_tools-0.1.9-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (289.4 kB): CPython 3.8+, manylinux (glibc 2.17+), ARM64
  • fidel_tools-0.1.9-cp38-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl (488.8 kB): CPython 3.8+, macOS 10.12+ universal2 (ARM64, x86-64)

File details

fidel_tools-0.1.9-cp38-abi3-win_amd64.whl

  • Size: 156.5 kB
  • Tags: CPython 3.8+, Windows x86-64
  • Uploaded using Trusted Publishing: Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12
  • SHA256: 098d3ed016855dc7e7d1c8ab9cf262a66e9aadb9ecc300e6c2d01533f5e692c3
  • MD5: e700b3bc5ab263757e24ae048bf90fc6
  • BLAKE2b-256: 7ba53b7b2f543cf7095e2d391c9250957fb337cb65a0db1bb58e26285e0aa18c

fidel_tools-0.1.9-cp38-abi3-manylinux_2_34_x86_64.whl

  • SHA256: a0efc5c8cca7c9231c3a2536ff8558d393df6c273a2a2a17235e422e70369c2e
  • MD5: 94eed943d3a8bdd7b64ca32937f90234
  • BLAKE2b-256: a76c1cf70ada60ee355aec432310107ae3a0982cddf52a9221a87e790e646ffb

fidel_tools-0.1.9-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl

  • SHA256: 61d7dc1721faf8580df78bc89420f057de0c27e026b579b7b2ef988e0f0c3ddd
  • MD5: 9f12319b197d6ec4c03ea1f777656241
  • BLAKE2b-256: 2a0cc5e42cf3233c266f5d687ff4416bf7f84372845c39bf3fb2b84952175297

fidel_tools-0.1.9-cp38-abi3-macosx_10_12_x86_64.macosx_11_0_arm64.macosx_10_12_universal2.whl

  • SHA256: 11bf62b49ce77edc7c8f976516e56665363a90ffe56076ee228d72c97b779f0b
  • MD5: 5dd644281557a6e8c7b72eb2081fc705
  • BLAKE2b-256: a7acb1b6c5aebcf72cb8404208c691f0b23dfb797b0bada496e1cc9f7d086b38

Provenance

Attestation bundles were made for all four wheels listed above.

Publisher: publish-npm.yml on Yehonatal/fidel-tools

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.