Fast, modern, and modular Persian text preprocessing library.
farsflow
farsflow is a small library for Persian text preprocessing, designed for practical NLP and LLM workflows.
🚀 Features
- Deterministic pipeline (same input -> same output)
- Safe character normalization
- Joiner (ZWNJ) fixes
- Whitespace & punctuation cleanup
- Modular processors (use only the components you need)
- Sample tests based on real-world text
- Zero dependencies
📦 Installation
pip install farsflow
✨ Quick Start
import farsflow as ff
text = "سلام دنیا! این یك تست است که می نویسم ۴۵۶"
cleaned = ff.clean(text)
print(cleaned)
Expected output: سلام دنیا! این یک تست است که می‌نویسم 456
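To make the character normalization in the example above more concrete, here is a minimal standalone sketch. It is not farsflow's implementation: the names `CHAR_MAP` and `normalize_chars` are hypothetical, and it assumes that normalization means mapping Arabic kaf and yeh to their Persian forms and converting Persian and Arabic-Indic digits to ASCII.

```python
# Illustrative sketch only; NOT farsflow's actual implementation.
# All names here (CHAR_MAP, normalize_chars) are hypothetical.

CHAR_MAP = {
    "\u0643": "\u06A9",  # ARABIC LETTER KAF -> ARABIC LETTER KEHEH (Persian kaf)
    "\u064A": "\u06CC",  # ARABIC LETTER YEH -> ARABIC LETTER FARSI YEH
}

# Persian digits (U+06F0..U+06F9) and Arabic-Indic digits (U+0660..U+0669)
# are both mapped to ASCII digits.
for i in range(10):
    CHAR_MAP[chr(0x06F0 + i)] = str(i)
    CHAR_MAP[chr(0x0660 + i)] = str(i)

# Build the translation table once so every call behaves identically.
_TABLE = str.maketrans(CHAR_MAP)


def normalize_chars(text: str) -> str:
    """Replace each mapped character; leave everything else untouched."""
    return text.translate(_TABLE)


# Arabic kaf in the input, Persian digits at the end.
sample = "\u06CC\u0643 \u06F4\u06F5\u06F6"
normalized = normalize_chars(sample)
```

Because the mapping is a fixed one-to-one table, the function is deterministic and running it a second time changes nothing.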
🧩 Pipeline Components
farsflow ships with a set of modular, composable components:
- Normalizer — safe character normalization
- JoinerFixer — fixes ZWNJ usage without over-correction
- SpaceCleaner — trims redundant whitespace and punctuation spacing
- Pipeline — orchestrates components in a deterministic order
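As a rough illustration of the kind of work a joiner fixer and a space cleaner do, the sketch below uses plain regular expressions. It is an assumption-laden example, not the library's code: `fix_mi_prefix` and `clean_spaces` are hypothetical names, and the ZWNJ rule covers only the verbal prefixes «می» and «نمی» written as separate words.

```python
import re

# Illustrative sketch only; NOT farsflow's actual implementation.
ZWNJ = "\u200c"  # ZERO WIDTH NON-JOINER

# A standalone "mi" (U+0645 U+06CC) or "nemi" (U+0646 U+0645 U+06CC) prefix,
# followed by spaces and then another letter from the Arabic block.
# The lookbehind stops us from touching words that merely END in these letters.
_MI_PREFIX = re.compile(
    r"(?<![\u0600-\u06FF\u200c])(\u0646?\u0645\u06CC) +(?=[\u0600-\u06FF])"
)

_MULTI_SPACE = re.compile(r"[ \t]+")
# Spaces directly before Latin or Persian punctuation marks.
_SPACE_BEFORE_PUNCT = re.compile(r" +([!?.,:;\u060C\u061B\u061F])")


def fix_mi_prefix(text: str) -> str:
    """Replace the space after a detached prefix with a ZWNJ (naive heuristic)."""
    return _MI_PREFIX.sub(lambda m: m.group(1) + ZWNJ, text)


def clean_spaces(text: str) -> str:
    """Collapse runs of spaces/tabs, drop spaces before punctuation, trim ends."""
    text = _MULTI_SPACE.sub(" ", text)
    text = _SPACE_BEFORE_PUNCT.sub(r"\1", text)
    return text.strip()
```

The deliberately narrow pattern reflects the "without over-correction" goal: a rule that fires only on a detached prefix is less likely to damage words it does not understand.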
You can customize the pipeline:
from farsflow import Pipeline, Normalizer, JoinerFixer
pipeline = Pipeline(
    Normalizer(),
    JoinerFixer(),
)
text = "متن تستی"
cleaned = pipeline(text)
print(cleaned)
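For readers curious how a callable pipeline of this shape can be put together, here is a minimal sketch. `SimplePipeline` is a hypothetical stand-in, not farsflow's `Pipeline` class; it assumes each processor is a plain callable that takes a string and returns a string.

```python
from typing import Callable

# Illustrative sketch only; NOT farsflow's actual Pipeline class.
Processor = Callable[[str], str]


class SimplePipeline:
    """Apply processors one after another, always in the order given."""

    def __init__(self, *processors: Processor) -> None:
        # A tuple keeps the order fixed after construction.
        self._processors = tuple(processors)

    def __call__(self, text: str) -> str:
        for processor in self._processors:
            text = processor(text)
        return text


def collapse_spaces(text: str) -> str:
    """Hypothetical processor: squeeze any whitespace run to a single space."""
    return " ".join(text.split())


pipeline = SimplePipeline(str.strip, collapse_spaces)
result = pipeline("  متن    تستی  ")
```

Fixing the processor order at construction time is one simple way to get the "same input, same output" behavior listed under Features.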
🧪 Testing
pytest
# or:
pytest path/to/test_file.py
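A test file picked up by `pytest` might look roughly like the sketch below. The `clean` function here is a local stand-in so the example runs on its own; in a real test you would use `import farsflow as ff` and call `ff.clean` instead. The test names and sample cases are hypothetical.

```python
# Illustrative sketch of a pytest-style test module (hypothetical contents).

def clean(text: str) -> str:
    # Stand-in for ff.clean so this file is self-contained.
    return " ".join(text.split())


# (raw input, expected cleaned output) pairs.
CASES = [
    ("  سلام   دنیا  ", "سلام دنیا"),
    ("متن\tتستی", "متن تستی"),
]


def test_expected_output():
    for raw, expected in CASES:
        assert clean(raw) == expected


def test_deterministic():
    # Same input must give the same output on every call.
    for raw, _ in CASES:
        assert clean(raw) == clean(raw)
```

pytest collects any function whose name starts with `test_`, so no extra registration is needed.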
🗺 Roadmap (v0.2.0)
- formalize the behavior of "ff.clean" as a safe and deterministic baseline
- add optional normalization controls (e.g. digit normalization)
- add optional noise-cleaning processors for emoji and URLs
- introduce simple profiles (ff.llm.clean, ff.embedding.query, ff.embedding.index) built on top of the same core
- expand real-world test cases to ensure stable behavior across informal, mixed, and noisy Persian text
📄 License
MIT License — see LICENSE.
🤝 Contributing
Contributions are welcome.
Please open an issue or submit a pull request on GitHub.
📝 Changelog
See CHANGELOG for version history.
File details
Details for the file farsflow-0.1.0.tar.gz.
File metadata
- Download URL: farsflow-0.1.0.tar.gz
- Upload date:
- Size: 8.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0e153362cc52c4f3c9e1e528227339b6f8a92556dbf789c659f5d6d845d1a4a0 |
| MD5 | f3e735e5a81763525d8d182666f0e961 |
| BLAKE2b-256 | 50d6688e29e97aaeffa544e93aed8f1427ea77de10d6d15c2967a35c5c3c06b6 |
File details
Details for the file farsflow-0.1.0-py3-none-any.whl.
File metadata
- Download URL: farsflow-0.1.0-py3-none-any.whl
- Upload date:
- Size: 8.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 895b4ea4f6d8b1c94ae88dc7154a9b4699fbcc4cf8947ae849746cca06458970 |
| MD5 | dd1a479c66dd0a038f86dd4e4dcde512 |
| BLAKE2b-256 | 59a5dfb19c1aeab8ed5c84f721ba997fac4bcf9b03027b8c413f6808118849cb |