TransFuzzy

TransFuzzy is a Python package for multilingual personal-name matching across Latin and several Indic scripts. It exposes the same matching pipeline through a CLI and a Flask API, and it supports switching between the bundled dataset and user-managed datasets.

What It Does

  • Accepts names in Latin, Devanagari, Telugu, Tamil, Kannada, Malayalam, Gujarati, and Gurmukhi.
  • Transliterates non-Latin input before matching.
  • Scores candidate names with phonetic, edit-distance, and embedding-based features.
  • Returns the best matches through transfuzzy predict or the HTTP API.
  • Lets you upload, activate, list, and delete datasets without modifying package files.

Installation

From PyPI

pip install transfuzzy

Local development setup

uv sync

Python 3.11+ is required.

Runtime Notes

TransFuzzy currently loads the sentence-transformers/all-MiniLM-L6-v2 model during module import. On a fresh machine, the first CLI, API, or test run may download model files from Hugging Face before the command can complete.

That has two practical consequences:

  • The first run can be noticeably slower than later runs.
  • In offline or restricted-network environments, the import can fail before the CLI prints its help text, the API server starts, or the tests begin running.
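One workaround for offline or CI machines (this relies on standard Hugging Face cache behavior and is an assumption about your environment, not a TransFuzzy feature) is to point the cache at a persistent location and warm it once while online:

```shell
# Optional: keep the Hugging Face cache in a known, persistent location.
# HF_HOME is a standard Hugging Face environment variable; the path is hypothetical.
export HF_HOME=/path/to/hf-cache

# Pre-download the model once while online so later runs hit the cache.
python -c "from sentence_transformers import SentenceTransformer; SentenceTransformer('sentence-transformers/all-MiniLM-L6-v2')"
```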

Quick Start

Start the API server

transfuzzy

or

transfuzzy serve --port 3000

The Flask server listens on http://localhost:3000 and opens that URL in your default browser on startup.

Query from the CLI

transfuzzy predict "Rahul"

Limit results:

transfuzzy predict "Rahul" --top 5

Return JSON:

transfuzzy predict "Rahul" --json

Use a specific text dataset file directly:

transfuzzy predict "Rahul" --db .\names.txt --top 5 --json

Supported Input Scripts

Examples of valid input:

Rahul
राहुल
రాహుల్

The output is transliterated back to the original script when the input was converted from a supported Indic script.
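As a rough illustration of how an input's script can be identified, the standard unicodedata module classifies the first character by its Unicode name (this is illustrative only, not TransFuzzy's internal detection logic):

```python
import unicodedata

def script_of(name: str) -> str:
    """Guess the script from the Unicode name of the first character."""
    return unicodedata.name(name[0]).split()[0]

script_of("Rahul")    # -> "LATIN"
script_of("राहुल")     # -> "DEVANAGARI"
script_of("రాహుల్")    # -> "TELUGU"
```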

CLI Reference

transfuzzy

Starts the API server on port 3000 and opens the browser automatically.

transfuzzy serve

Run the API server explicitly.

transfuzzy serve --port 3000

Use --no-browser to skip opening the browser:

transfuzzy serve --port 3000 --no-browser

transfuzzy predict

Find similar names for a single input.

transfuzzy predict <name> [--top N] [--json] [--db PATH]

Arguments:

  • <name>: required input string.
  • --top: maximum number of matches to return. Default: 10.
  • --json: print a JSON object with similar_names.
  • --db: use a dataset file path directly instead of the active managed dataset.

transfuzzy db

Manage datasets stored in the TransFuzzy home directory.

Add a dataset:

transfuzzy db add .\names.txt

List managed datasets:

transfuzzy db list

Set the active dataset:

transfuzzy db use names.txt

Delete a managed dataset:

transfuzzy db delete names.txt
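The four subcommands above map onto a simple files-plus-config layout. Here is a minimal sketch of what such a managed-dataset store can look like; DatasetStore is illustrative and is NOT TransFuzzy's actual db_manager implementation:

```python
import json
import shutil
import tempfile
from pathlib import Path

class DatasetStore:
    """Illustrative managed-dataset store with the layout the db
    subcommands imply. Not TransFuzzy's real db_manager."""

    def __init__(self, home: Path):
        self.datasets = home / "datasets"
        self.config = home / "config.json"
        self.datasets.mkdir(parents=True, exist_ok=True)

    def add(self, path: Path) -> None:
        shutil.copy(path, self.datasets / path.name)            # db add

    def list(self) -> list[str]:
        return sorted(p.name for p in self.datasets.iterdir())  # db list

    def use(self, name: str) -> None:
        self.config.write_text(json.dumps({"active_db": name})) # db use

    def delete(self, name: str) -> None:
        (self.datasets / name).unlink()                         # db delete

# Demo against a throwaway directory:
home = Path(tempfile.mkdtemp())
(home / "names.txt").write_text("Rahul\n", encoding="utf-8")
store = DatasetStore(home)
store.add(home / "names.txt")
store.use("names.txt")
```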

API Reference

POST /similar_names

Request body:

{
  "name": "Rahul"
}

Success response:

{
  "similar_names": ["Rahul", "Raahul", "Rahool"]
}

Validation errors are returned as JSON with an error field and the appropriate HTTP status code.
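The endpoint can be called with a small stdlib client. The sketch below only builds and parses the documented request and response shapes; the URL assumes the default port from transfuzzy serve, and this is not a client bundled with the package:

```python
import json
from urllib import request

API_URL = "http://localhost:3000/similar_names"  # default serve port

def build_request(name: str) -> request.Request:
    """Construct the POST described above; send with urllib.request.urlopen."""
    body = json.dumps({"name": name}).encode("utf-8")
    return request.Request(API_URL, data=body,
                           headers={"Content-Type": "application/json"},
                           method="POST")

def parse_similar_names(raw: bytes) -> list[str]:
    """Pull similar_names out of a success response body."""
    return json.loads(raw)["similar_names"]

req = build_request("Rahul")
# matches = parse_similar_names(request.urlopen(req).read())  # needs a running server
```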

POST /upload_db

Uploads a dataset file using multipart/form-data with the field name file.

Success response shape:

{
  "message": "Dataset 'demo.txt' uploaded",
  "dataset_name": "demo.txt",
  "active_db": null
}

GET /list_dbs

Returns the stored managed datasets and the active dataset name.

{
  "datasets": ["demo.txt"],
  "active_db": "demo.txt"
}

POST /use_db

Request body:

{
  "name": "demo.txt"
}

DELETE /delete_db

Request body:

{
  "name": "demo.txt"
}

Dataset Management

There are two ways to provide names:

  1. Pass a file path directly with --db.
  2. Store datasets with transfuzzy db ... or the dataset API routes and switch the active dataset.

Managed datasets are stored under:

%USERPROFILE%\.transfuzzy\datasets

The active dataset name is stored in:

%USERPROFILE%\.transfuzzy\config.json

To override the base directory, set:

$env:TRANSFUZZY_HOME = "C:\path\to\custom-home"

Each dataset should contain one name per line.
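The resolution order and file format described above can be sketched as follows; transfuzzy_home here is illustrative and is not a function the package exports:

```python
import os
import tempfile
from pathlib import Path

def transfuzzy_home() -> Path:
    """Mirror the documented lookup: TRANSFUZZY_HOME if set, else ~/.transfuzzy."""
    return Path(os.environ.get("TRANSFUZZY_HOME", str(Path.home() / ".transfuzzy")))

# A dataset file is plain UTF-8 text, one name per line:
dataset = Path(tempfile.mkdtemp()) / "names.txt"
dataset.write_text("Rahul\nRaahul\nRahool\n", encoding="utf-8")
```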

How Matching Works

The current pipeline is:

Input name
-> transliteration to Latin when needed
-> candidate pair generation from the selected dataset
-> feature computation
   - Soundex ratio
   - Metaphone ratio
   - Levenshtein ratio
   - Jaro-Winkler similarity
   - Cosine similarity
   - Euclidean similarity
   - Manhattan similarity
   - Pearson similarity
-> trained model ranking
-> optional transliteration back to the input script
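To make the feature step concrete, here is a stdlib-only sketch of two of the listed features and a naive ranking. The real pipeline uses proper phonetic algorithms, sentence-transformer embeddings, and a trained model; averaging two crude proxies is only a stand-in for that:

```python
import math
from collections import Counter
from difflib import SequenceMatcher

def levenshtein_ratio(a: str, b: str) -> float:
    # difflib's ratio stands in for a true Levenshtein ratio here.
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()

def bigrams(s: str) -> Counter:
    return Counter(s[i:i + 2] for i in range(len(s) - 1))

def cosine_similarity(a: str, b: str) -> float:
    # Character-bigram cosine similarity, a crude proxy for the
    # embedding-based cosine feature.
    va, vb = bigrams(a.lower()), bigrams(b.lower())
    dot = sum(va[g] * vb[g] for g in va)
    norm = (math.sqrt(sum(v * v for v in va.values()))
            * math.sqrt(sum(v * v for v in vb.values())))
    return dot / norm if norm else 0.0

def score(query: str, candidate: str) -> float:
    # TransFuzzy feeds its features to a trained model; a plain average
    # replaces that ranking step in this sketch.
    return (levenshtein_ratio(query, candidate)
            + cosine_similarity(query, candidate)) / 2

ranked = sorted(["Ramesh", "Rahul", "Raahul"],
                key=lambda c: score("Rahul", c), reverse=True)
# ranked -> ["Rahul", "Raahul", "Ramesh"]
```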

Project Structure

src/transfuzzy/
├── app.py              Flask app and HTTP routes
├── cli.py              CLI entrypoint
├── core/
│   ├── config.py       package constants and paths
│   ├── db_manager.py   managed dataset storage
│   └── pipeline.py     top-level matching pipeline
├── datasets/
│   └── default.txt     bundled dataset asset
├── db/                 packaged training/runtime artifacts
├── dir/                feature generation and training scripts
├── static/             browser-side assets
├── templates/          HTML templates
└── utils/              helper and response utilities

Development

Run the app locally:

uv run transfuzzy serve

Run tests:

uv run python -m unittest discover -s tests -v

Run training-related scripts:

uv run python src/transfuzzy/dir/enrich_data.py
uv run python src/transfuzzy/dir/train_model.py

More development notes are in docs/DEVELOPMENT.md.

Current Limitations

  • Import-time model loading makes commands and tests depend on model availability.
  • The package metadata says the project is a transliteration system, while the implementation is broader name matching.
  • The repository contains packaged model and dataset artifacts in src/transfuzzy/db, so development and release size are coupled to those files.

License

MIT. See LICENSE.
