TransFuzzy is a robust transliteration system that bridges the gap between Indic scripts and the Latin alphabet.
TransFuzzy
TransFuzzy is a Python package for multilingual personal-name matching across Latin and several Indic scripts. It exposes the same matching pipeline through a CLI and a Flask API, and it supports switching between the bundled dataset and user-managed datasets.
What It Does
- Accepts names in Latin, Devanagari, Telugu, Tamil, Kannada, Malayalam, Gujarati, and Gurmukhi.
- Transliterates non-Latin input before matching.
- Scores candidate names with phonetic, edit-distance, and embedding-based features.
- Returns the best matches through transfuzzy predict or the HTTP API.
- Lets you upload, activate, list, and delete datasets without modifying package files.
Installation
From PyPI
pip install transfuzzy
Local development setup
uv sync
Python 3.11+ is required.
Runtime Notes
TransFuzzy currently loads the sentence-transformers/all-MiniLM-L6-v2 model during module import. On a fresh machine, the first CLI, API, or test run may download model files from Hugging Face before the command can complete.
That has two practical consequences:
- The first run can be noticeably slower than later runs.
- Offline or restricted-network environments can fail before the CLI help text, API startup, or tests finish loading.
Quick Start
Start the API server
transfuzzy
or
transfuzzy serve --port 3000
By default, the Flask server listens on http://localhost:3000 and opens that URL in your default browser on startup.
Query from the CLI
transfuzzy predict "Rahul"
Limit results:
transfuzzy predict "Rahul" --top 5
Return JSON:
transfuzzy predict "Rahul" --json
Use a specific text dataset file directly:
transfuzzy predict "Rahul" --db .\names.txt --top 5 --json
Supported Input Scripts
Examples of valid input:
Rahul
राहुल
రాహుల్
The output is transliterated back to the original script when the input was converted from a supported Indic script.
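The README does not say how TransFuzzy decides that an input needs transliteration. One plausible approach, sketched here as an assumption rather than the package's actual logic, is to check each character against the Unicode blocks of the seven supported Indic scripts. detect_script and needs_transliteration are hypothetical names.

```python
# Unicode block ranges for the supported Indic scripts (per the Unicode standard).
INDIC_BLOCKS = {
    "devanagari": (0x0900, 0x097F),
    "gurmukhi": (0x0A00, 0x0A7F),
    "gujarati": (0x0A80, 0x0AFF),
    "tamil": (0x0B80, 0x0BFF),
    "telugu": (0x0C00, 0x0C7F),
    "kannada": (0x0C80, 0x0CFF),
    "malayalam": (0x0D00, 0x0D7F),
}


def detect_script(text: str) -> str:
    """Return the first Indic script found in text; anything else counts as "latin"."""
    for ch in text:
        cp = ord(ch)
        for script, (lo, hi) in INDIC_BLOCKS.items():
            if lo <= cp <= hi:
                return script
    return "latin"


def needs_transliteration(text: str) -> bool:
    """Hypothetical check: only inputs containing an Indic script get transliterated."""
    return detect_script(text) != "latin"
```

With this rule, Rahul stays as-is, while राहुल is detected as Devanagari and రాహుల్ as Telugu.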
CLI Reference
transfuzzy
Starts the API server on port 3000 and opens the browser automatically.
transfuzzy serve
Run the API server explicitly.
transfuzzy serve --port 3000
Use --no-browser to skip opening the browser:
transfuzzy serve --port 3000 --no-browser
transfuzzy predict
Find similar names for a single input.
transfuzzy predict <name> [--top N] [--json] [--db PATH]
Arguments:
- <name>: required input string.
- --top: maximum number of matches to return. Default: 10.
- --json: print a JSON object with a similar_names field.
- --db: use a dataset file path directly instead of the active managed dataset.
transfuzzy db
Manage datasets stored in the TransFuzzy home directory.
Add a dataset:
transfuzzy db add .\names.txt
List managed datasets:
transfuzzy db list
Set the active dataset:
transfuzzy db use names.txt
Delete a managed dataset:
transfuzzy db delete names.txt
API Reference
POST /similar_names
Request body:
{
"name": "Rahul"
}
Success response:
{
"similar_names": ["Rahul", "Raahul", "Rahool"]
}
Validation errors are returned as JSON with an error field and the appropriate HTTP status code.
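A minimal client sketch using only the Python standard library. It assumes the server is running locally on the default port 3000 and that the success body has the similar_names shape shown above; similar_names_request, error_detail, and similar_names are hypothetical helper names, not part of TransFuzzy.

```python
import json
import urllib.error
import urllib.request

BASE_URL = "http://localhost:3000"  # default address of `transfuzzy serve`


def similar_names_request(name: str, base_url: str = BASE_URL) -> urllib.request.Request:
    """Build the POST /similar_names request with a JSON body."""
    return urllib.request.Request(
        f"{base_url}/similar_names",
        data=json.dumps({"name": name}).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def error_detail(raw: bytes) -> str:
    """Pull the "error" field out of an error body, falling back to raw text."""
    try:
        payload = json.loads(raw)
    except ValueError:  # not JSON (also covers undecodable bytes)
        return raw.decode("utf-8", "replace")
    if isinstance(payload, dict):
        return str(payload.get("error", payload))
    return str(payload)


def similar_names(name: str, base_url: str = BASE_URL, timeout: float = 60.0) -> list[str]:
    req = similar_names_request(name, base_url)
    try:
        with urllib.request.urlopen(req, timeout=timeout) as resp:
            return json.loads(resp.read())["similar_names"]
    except urllib.error.HTTPError as exc:
        # Validation failures arrive with an error status and an "error" field.
        raise RuntimeError(f"HTTP {exc.code}: {error_detail(exc.read())}") from exc
```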
POST /upload_db
Uploads a dataset file using multipart/form-data with the field name file.
Success response shape:
{
"message": "Dataset 'demo.txt' uploaded",
"dataset_name": "demo.txt",
"active_db": null
}
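Uploading needs a multipart/form-data body with the dataset in the field named file. This standard-library sketch encodes that body by hand; upload_db_request and upload_db are hypothetical helper names, and the text/plain part type is an assumption.

```python
import json
import urllib.request
import uuid
from pathlib import Path

BASE_URL = "http://localhost:3000"


def upload_db_request(filename: str, content: bytes,
                      base_url: str = BASE_URL) -> urllib.request.Request:
    """Encode one file under the form field name "file" as multipart/form-data."""
    boundary = uuid.uuid4().hex
    disposition = f'Content-Disposition: form-data; name="file"; filename="{filename}"\r\n'
    body = b"".join([
        f"--{boundary}\r\n".encode("ascii"),
        disposition.encode("utf-8"),
        b"Content-Type: text/plain; charset=utf-8\r\n\r\n",
        content,
        f"\r\n--{boundary}--\r\n".encode("ascii"),
    ])
    return urllib.request.Request(
        f"{base_url}/upload_db",
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )


def upload_db(path: str, base_url: str = BASE_URL) -> dict:
    """Upload a local dataset file and return the parsed JSON response."""
    file_path = Path(path)
    req = upload_db_request(file_path.name, file_path.read_bytes(), base_url)
    with urllib.request.urlopen(req, timeout=60) as resp:
        return json.loads(resp.read())
```

The sample response above shows active_db as null, which suggests an upload does not by itself make the dataset active; POST /use_db does that.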
GET /list_dbs
Returns the stored managed datasets and the active dataset name.
{
"datasets": ["demo.txt"],
"active_db": "demo.txt"
}
POST /use_db
Sets the active managed dataset.
Request body:
{
"name": "demo.txt"
}
DELETE /delete_db
Deletes a managed dataset.
Request body:
{
"name": "demo.txt"
}
Dataset Management
There are two ways to provide names:
- Pass a file path directly with --db.
- Store datasets with transfuzzy db ... or the dataset API routes, and switch the active dataset.
Managed datasets are stored under:
%USERPROFILE%\.transfuzzy\datasets
The active dataset name is stored in:
%USERPROFILE%\.transfuzzy\config.json
To override the base directory, set:
$env:TRANSFUZZY_HOME = "C:\path\to\custom-home"
Each dataset should contain one name per line.
How Matching Works
The current pipeline is:
Input name
-> transliteration to Latin when needed
-> candidate pair generation from the selected dataset
-> feature computation
- Soundex ratio
- Metaphone ratio
- Levenshtein ratio
- Jaro-Winkler similarity
- Cosine similarity
- Euclidean similarity
- Manhattan similarity
- Pearson similarity
-> trained model ranking
-> optional transliteration back to the input script
Project Structure
src/transfuzzy/
├── app.py Flask app and HTTP routes
├── cli.py CLI entrypoint
├── core/
│ ├── config.py package constants and paths
│ ├── db_manager.py managed dataset storage
│ └── pipeline.py top-level matching pipeline
├── datasets/
│ └── default.txt bundled dataset asset
├── db/ packaged training/runtime artifacts
├── dir/ feature generation and training scripts
├── static/ browser-side assets
├── templates/ HTML templates
└── utils/ helper and response utilities
Development
Run the app locally:
uv run transfuzzy serve
Run tests:
uv run python -m unittest discover -s tests -v
Run training-related scripts:
uv run python src/transfuzzy/dir/enrich_data.py
uv run python src/transfuzzy/dir/train_model.py
More development notes are in docs/DEVELOPMENT.md.
Current Limitations
- Import-time model loading makes commands and tests depend on model availability.
- The package metadata says the project is a transliteration system, while the implementation is broader name matching.
- The repository contains packaged model and dataset artifacts in src/transfuzzy/db, so development and release size are coupled to those files.
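One way to lift the import-time limitation, offered as a suggestion rather than a description of how TransFuzzy works today, is to defer the model load until an embedding is first needed. LazyModel, _load_minilm, and EMBEDDER are hypothetical names; the loader uses the real sentence_transformers.SentenceTransformer class, imported inside the function.

```python
import threading
from typing import Any, Callable


class LazyModel:
    """Defer an expensive load until get() is first called, then reuse it."""

    def __init__(self, loader: Callable[[], Any]) -> None:
        self._loader = loader
        self._lock = threading.Lock()  # Flask may serve requests on several threads
        self._loaded = False
        self._value: Any = None

    def get(self) -> Any:
        if not self._loaded:
            with self._lock:
                if not self._loaded:  # re-check after acquiring the lock
                    self._value = self._loader()
                    self._loaded = True
        return self._value


def _load_minilm() -> Any:
    # Importing here keeps sentence-transformers, and any Hugging Face
    # download, out of module import time.
    from sentence_transformers import SentenceTransformer
    return SentenceTransformer("sentence-transformers/all-MiniLM-L6-v2")


EMBEDDER = LazyModel(_load_minilm)  # nothing is imported or downloaded yet
```

With this pattern, CLI help text and tests that never compute embeddings would not need the model; the first EMBEDDER.get().encode(...) call would pay the load cost instead.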
License
MIT. See LICENSE.
File details
Details for the file transfuzzy-0.1.2.tar.gz.
File metadata
- Download URL: transfuzzy-0.1.2.tar.gz
- Upload date:
- Size: 22.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bd29e58df9e5e9e56d1c304ba3665f03a965bdbdcc2dcf92408b1bacd61bc84b |
| MD5 | 3773bc5ad75aded4a7817e8989f544b3 |
| BLAKE2b-256 | b8536988f62dddaf7f7e3438b5eaf2edac8dbe2eea2695bb0c69c773766981f9 |
File details
Details for the file transfuzzy-0.1.2-py3-none-any.whl.
File metadata
- Download URL: transfuzzy-0.1.2-py3-none-any.whl
- Upload date:
- Size: 21.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 82929858d8ac483fdea2a27d3fc380456deb27748e69a0e08b0fc29b6077b456 |
| MD5 | ce6cdc45059776317825569874b4a2cb |
| BLAKE2b-256 | 0b6241758e481a83b41a2b0d9ca8f5b202a54c19bfdc2ec297108d936774fb47 |