Fuzzy matching lookup for CSV/Excel/SQL datasets (Arabic + English)

Maintainers

mohamed1410

Project description

FuzzyLookup

Fuzzy string matching for CSV, Excel, and SQL datasets — built for Arabic and English names.

pip install fuzzylookup

Features

Arabic-aware normalization — strips diacritics, unifies alef variants, teh marbuta, alef maqsura
Positional name scoring — "محمد كمال" and "كمال محمد" score differently (name_aware=True)
Multiple sources — CSV, Excel, Parquet, Feather, pandas DataFrame, SQL (sqlite3 / SQLAlchemy)
fuzzy_merge() — fuzzy join between two DataFrames, like pd.merge() with a score threshold
~10x faster on large datasets via a blocking index (first-token prefix bucketing)
Five scorers: ratio, partial, token_sort, token_set, wratio
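
To give a feel for why the choice of scorer matters, here is a minimal sketch that contrasts a plain character-level ratio with a token-sorted one. It uses the standard library's difflib as a stand-in, so the function names `ratio` and `token_sort` below are illustrative only and the numbers will not match what the package (or rapidfuzz) returns.

```python
from difflib import SequenceMatcher


def ratio(a: str, b: str) -> float:
    """Plain character-level similarity on a 0-100 scale (difflib stand-in)."""
    return 100.0 * SequenceMatcher(None, a, b).ratio()


def token_sort(a: str, b: str) -> float:
    """Sort whitespace-separated tokens first, so word order is ignored."""
    return ratio(" ".join(sorted(a.split())), " ".join(sorted(b.split())))


print(ratio("mohamed kamal", "kamal mohamed"))       # below 100: order matters
print(token_sort("mohamed kamal", "kamal mohamed"))  # 100.0: order ignored
```

A token-sorted scorer is the more forgiving choice when name order is unreliable; a plain ratio is stricter.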

Quick Start

Lookup from a file

from fuzzylookup import FuzzyLookup

fl = FuzzyLookup("customers.csv", column="name", name_aware=True)

# Single lookup
fl.lookup("محمد كمال", top_n=3, min_score=70)
# [{'name': 'محمد كمال عبد الرحمن', 'score': 83.4, '_index': 0}, ...]

# Best match only
fl.lookup_best("احمد سعيد", min_score=70)

# Batch lookup
fl.lookup_many(["محمد", "أحمد", "علي"], top_n=1, min_score=70)

From SQL

import sqlite3
from fuzzylookup import FuzzyLookup

con = sqlite3.connect("customers.db")
fl = FuzzyLookup(
    source=None,
    column="name",
    connection=con,
    sql_query="SELECT * FROM customers WHERE active = 1",
    name_aware=True,
)
fl.lookup("محمد كمال", top_n=3)
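
To try the SQL path without an existing customers.db, one option is a throwaway in-memory database. The sketch below uses only the standard library; the table layout and the three sample rows are made up for illustration and are not part of the package.

```python
import sqlite3

# Build a throwaway in-memory database with the columns the query expects.
con = sqlite3.connect(":memory:")
con.execute(
    "CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT, active INTEGER)"
)
con.executemany(
    "INSERT INTO customers (name, active) VALUES (?, ?)",
    [
        ("محمد كمال عبد الرحمن", 1),  # made-up sample rows
        ("أحمد سعيد", 1),
        ("علي حسن", 0),
    ],
)
con.commit()

# The same SELECT as in the example above: only active customers come back.
rows = con.execute("SELECT * FROM customers WHERE active = 1").fetchall()
names = [row[1] for row in rows]
print(len(names))  # 2
```

The `con` object built here could then be passed as `connection=con` exactly as in the example above.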

Fuzzy merge — join two DataFrames

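from fuzzylookup import fuzzy_merge

# crm_df and master_df are pandas DataFrames loaded beforehand;
# "cust_name" and "name" are the columns holding the names to match.
result = fuzzy_merge(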
    crm_df, master_df,
    left_on="cust_name",
    right_on="name",
    min_score=80,
    name_aware=True,
)

Or from a FuzzyLookup instance — uses the blocking index automatically:

master = FuzzyLookup("master.csv", column="name", name_aware=True)

result = master.merge(
    crm_df,
    other_on="cust_name",
    min_score=80,
    return_columns=["account_no", "cust_name"],
)

API Reference

`FuzzyLookup(source, column, ...)`

Parameter	Type	Default	Description
`source`	str / Path / DataFrame / None	—	File path, DataFrame, or None for SQL
`column`	str	—	Column to match against
`scorer`	str	`"wratio"`	`ratio` / `partial` / `token_sort` / `token_set` / `wratio`
`normalize_arabic`	bool	`True`	Strip diacritics, normalize alef/teh marbuta/alef maqsura
`name_aware`	bool	`False`	Positional name scoring
`encoding`	str	`"utf-8"`	CSV encoding
`sql_query`	str	`None`	SQL SELECT (required when `connection=` is used)
`connection`	connection	`None`	sqlite3 or SQLAlchemy connection
`use_blocking`	bool	`True`	Enable blocking index (~10x speedup)
`block_prefix_len`	int	`2`	Prefix length for blocking buckets
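
The normalization that `normalize_arabic` describes can be approximated with a single translation table. This is a sketch under assumptions, not the package's code: the exact set of alef variants and diacritics it handles is not documented here, so the ranges below (U+064B to U+0652 plus U+0670 for diacritics, and four alef forms) are a guess at a reasonable minimum.

```python
# Code points are written as escapes so the mapping is unambiguous.
_ALEF_VARIANTS = "\u0622\u0623\u0625\u0671"  # alef with madda / hamza above / hamza below / wasla
_DIACRITICS = [chr(c) for c in range(0x064B, 0x0653)] + ["\u0670"]

_TABLE = {ord(ch): "\u0627" for ch in _ALEF_VARIANTS}  # any alef variant -> bare alef
_TABLE[0x0629] = "\u0647"  # teh marbuta -> heh
_TABLE[0x0649] = "\u064A"  # alef maqsura -> yeh
_TABLE.update({ord(ch): None for ch in _DIACRITICS})  # diacritics are deleted


def normalize_arabic(text: str) -> str:
    """Return text with diacritics removed and letter variants unified."""
    return text.translate(_TABLE)


print(normalize_arabic("\u0623\u062D\u0645\u062F") == "\u0627\u062D\u0645\u062F")  # True
```

Normalizing both the query and the reference column the same way is what lets spelling variants of one name compare as equal.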

`.lookup(query, top_n, min_score, columns)`

Returns a list of dicts, one per match, each containing the row data plus `score` (0–100) and `_index`.

`.lookup_best(query, min_score, columns)`

Returns the best match as a single dict, or None if no candidate reaches min_score.

`.lookup_many(queries, top_n, min_score, columns)`

Batch lookup. Returns a dict that maps each query to its list of matches.
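
The return shapes of the three lookup methods can be illustrated with a small stand-in. The sketch below is not the package's implementation: it scores with difflib, works on a made-up list called `NAMES`, and assumes that `_index` is the row position in the reference data.

```python
from difflib import SequenceMatcher
from typing import Dict, List, Optional

NAMES = ["Mohamed Kamal", "Ahmed Said", "Ali Hassan"]  # made-up reference column


def _score(a: str, b: str) -> float:
    return round(100 * SequenceMatcher(None, a.lower(), b.lower()).ratio(), 1)


def lookup(query: str, top_n: int = 3, min_score: float = 70) -> List[dict]:
    """Score every row, keep those at or above min_score, best first."""
    scored = [
        {"name": name, "score": _score(query, name), "_index": i}
        for i, name in enumerate(NAMES)
    ]
    kept = [m for m in scored if m["score"] >= min_score]
    return sorted(kept, key=lambda m: m["score"], reverse=True)[:top_n]


def lookup_best(query: str, min_score: float = 70) -> Optional[dict]:
    """Single best match, or None when nothing clears the threshold."""
    matches = lookup(query, top_n=1, min_score=min_score)
    return matches[0] if matches else None


def lookup_many(queries, top_n: int = 1, min_score: float = 70) -> Dict[str, List[dict]]:
    """One entry per query; queries without matches map to an empty list."""
    return {q: lookup(q, top_n=top_n, min_score=min_score) for q in queries}
```

Whether the real `lookup_many` keeps unmatched queries with an empty list, as this sketch does, is an assumption worth checking before relying on it.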

`.merge(other, other_on, min_score, top_n, return_columns, return_score)`

Fuzzy-joins the reference dataset against another DataFrame (`other`), matching on its `other_on` column.

`fuzzy_merge(left, right, left_on, right_on, ...)`

Parameter	Default	Description
`min_score`	`80.0`	Minimum score threshold
`scorer`	`"wratio"`	Matching algorithm
`normalize_arabic`	`True`	Arabic normalization
`name_aware`	`False`	Positional scoring
`top_n`	`1`	Top N matches per left row
`suffixes`	`("_left","_right")`	Suffix for overlapping columns
`return_score`	`True`	Add `fuzzy_score` column
`use_blocking`	`True`	Enable blocking index
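
How `min_score`, `suffixes`, and the `fuzzy_score` column interact can be sketched on plain lists of dicts. Everything below is a hypothetical stand-in (`fuzzy_join`, the sample rows, difflib scoring); in particular, dropping left rows that have no match is an assumption of this sketch, not documented behaviour of `fuzzy_merge`.

```python
from difflib import SequenceMatcher


def score(a: str, b: str) -> float:
    return 100.0 * SequenceMatcher(None, a.lower(), b.lower()).ratio()


def fuzzy_join(left, right, left_on, right_on, min_score=80.0,
               suffixes=("_left", "_right")):
    """Attach the best-scoring right row to each left row that clears min_score."""
    out = []
    for lrow in left:
        best = max(right, key=lambda r: score(lrow[left_on], r[right_on]), default=None)
        if best is None:
            continue
        s = score(lrow[left_on], best[right_on])
        if s < min_score:
            continue  # unmatched left rows are dropped in this sketch
        merged = {}
        # Columns present on both sides get a suffix, as in pd.merge().
        for key, value in lrow.items():
            merged[key + suffixes[0] if key in best else key] = value
        for key, value in best.items():
            merged[key + suffixes[1] if key in lrow else key] = value
        merged["fuzzy_score"] = round(s, 1)
        out.append(merged)
    return out


crm = [{"cust_name": "Mohamed Kamal", "id": 1},
       {"cust_name": "Unknown Person", "id": 2}]
master = [{"name": "Mohamad Kamal", "id": 10, "account_no": "A-1"}]
result = fuzzy_join(crm, master, left_on="cust_name", right_on="name")
print(len(result))  # 1
```

Only the first CRM row survives here, because the second one has no master row above the threshold.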

Arabic Name Matching

fl = FuzzyLookup("names.csv", column="name", name_aware=True)

# Normalized automatically before matching:
# أحمد  →  احمد   (alef variants)
# فاطمة →  فاطمه  (teh marbuta)
# موسى  →  موسي   (alef maqsura)
# مُحَمَّد → محمد   (diacritics removed)

# Positional scoring:
# "محمد كمال" vs "محمد كمال"  →  100   ✓ exact
# "محمد كمال" vs "كمال محمد"  →  ~55   ✗ wrong order penalized
# "محمد كمال" vs "محمد علي"   →  ~65   ~ first token matches
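
One way to get order-sensitive behaviour like the positional scores above is to compare tokens slot by slot. The sketch below is an illustration of that idea with difflib, not the package's scoring formula, so its absolute numbers differ from the approximate values quoted above; only the ordering of the three cases is meant to carry over.

```python
from difflib import SequenceMatcher


def positional_score(query: str, candidate: str) -> float:
    """Compare tokens position by position and average the similarities (0-100)."""
    q_tokens, c_tokens = query.split(), candidate.split()
    slots = max(len(q_tokens), len(c_tokens))
    if slots == 0:
        return 0.0
    total = 0.0
    for q_tok, c_tok in zip(q_tokens, c_tokens):
        total += SequenceMatcher(None, q_tok, c_tok).ratio()
    # Tokens without a counterpart add nothing, so length mismatches are penalized.
    return 100.0 * total / slots


exact = positional_score("محمد كمال", "محمد كمال")
swapped = positional_score("محمد كمال", "كمال محمد")
first_only = positional_score("محمد كمال", "محمد علي")
print(swapped < first_only < exact)  # True
```

Because each token is only compared with the token in the same position, swapping first and last name lowers the score even though the same words are present.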

Performance

The blocking index buckets rows by the first 2 characters of the first name token (`block_prefix_len`, default 2), which cuts the candidate pool per query from the full dataset to roughly 10%.

Dataset	Without blocking	With blocking	Speedup
500 queries × 10,000 rows	26s	2.1s	12x
2,000 queries × 10,000 rows	~104s	~8s	~12x

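Disable blocking with use_blocking=False if first tokens are very inconsistent, for example when the first characters are often misspelled or missing, because such rows land in a different bucket than the query.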
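
The bucketing idea can be sketched in a few lines. This is a simplified illustration, not the package's index: `block_key` and `build_index` are hypothetical helpers, the names are made up, and lowercasing the prefix is an assumption of the sketch.

```python
from collections import defaultdict


def block_key(name: str, prefix_len: int = 2) -> str:
    """First prefix_len characters of the first token, lowercased."""
    tokens = name.split()
    return tokens[0][:prefix_len].lower() if tokens else ""


def build_index(names, prefix_len: int = 2):
    """Map each prefix to the row positions that share it."""
    index = defaultdict(list)
    for i, name in enumerate(names):
        index[block_key(name, prefix_len)].append(i)
    return index


names = ["Mohamed Kamal", "Maha Adel", "Ahmed Said", "Ali Hassan", "Mohsen Ali"]
index = build_index(names)

# Only rows in the query's bucket are scored, instead of all five.
print(index.get(block_key("Mohamad Kamal"), []))  # [0, 4]

# A typo in the first characters selects an empty bucket: the match is missed.
print(index.get(block_key("Nohamed Kamal"), []))  # []
```

The second query shows the trade-off: the speedup comes from never scoring rows outside the bucket, so an error in the prefix hides the correct row entirely.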

Requirements

Python ≥ 3.8
pandas ≥ 1.3
rapidfuzz ≥ 3.0
openpyxl ≥ 3.0

License

MIT

Release history

0.2.0 (this version): Jun 7, 2026
0.0.1: Jun 4, 2026
0.0.0: Jun 7, 2026

Download files

Source Distribution

fuzzylookup-0.2.0.tar.gz (10.0 kB)

Uploaded Jun 7, 2026 Source

Built Distribution

fuzzylookup-0.2.0-py3-none-any.whl (10.6 kB)

Uploaded Jun 7, 2026 Python 3

File details

Details for the file fuzzylookup-0.2.0.tar.gz.

File metadata

Download URL: fuzzylookup-0.2.0.tar.gz
Upload date: Jun 7, 2026
Size: 10.0 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fuzzylookup-0.2.0.tar.gz
Algorithm	Hash digest
SHA256	`b9be49dabbf51993f4c9252d0f30d46346be510bf2407277d639a81d0a0b66b8`
MD5	`a645cca8a6f811d783e167d677d676f0`
BLAKE2b-256	`0744fb8e5f138053a15cdda43574e31320c2033f515102186e5f2cda3ebacd35`
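
A downloaded archive can be checked against the SHA256 digest listed above with the standard library. The helper names `sha256_of` and `matches_published_hash` are illustrative, and the local file path is whatever location the archive was saved to.

```python
import hashlib

# SHA256 digest listed above for fuzzylookup-0.2.0.tar.gz.
EXPECTED = "b9be49dabbf51993f4c9252d0f30d46346be510bf2407277d639a81d0a0b66b8"


def sha256_of(path: str, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large archives need not fit in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str) -> bool:
    """True when the file's SHA256 equals the digest listed above."""
    return sha256_of(path) == EXPECTED


# Usage, assuming the archive was saved to the current directory:
# matches_published_hash("fuzzylookup-0.2.0.tar.gz")
```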

Provenance

The following attestation bundles were made for fuzzylookup-0.2.0.tar.gz:

Publisher: python-publish.yml on Moda141/Fuzzylookup

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fuzzylookup-0.2.0.tar.gz
- Subject digest: b9be49dabbf51993f4c9252d0f30d46346be510bf2407277d639a81d0a0b66b8
- Sigstore transparency entry: 1744924391
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: Moda141/Fuzzylookup@2d722f145f88266dee22b3aeb350478ee3757a04
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Moda141
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@2d722f145f88266dee22b3aeb350478ee3757a04
- Trigger Event: workflow_dispatch

File details

Details for the file fuzzylookup-0.2.0-py3-none-any.whl.

File metadata

Download URL: fuzzylookup-0.2.0-py3-none-any.whl
Upload date: Jun 7, 2026
Size: 10.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for fuzzylookup-0.2.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`8b80e31b7896774b7f0f9439ea0e7bac6fa00380320232b2a993489701910e00`
MD5	`84d7cde16193d92dfd7e7638740b0fa0`
BLAKE2b-256	`b3dfc1555b03e09c1b96502489654e0208c7bb043565e591b03bf2aceb60a25b`

Provenance

The following attestation bundles were made for fuzzylookup-0.2.0-py3-none-any.whl:

Publisher: python-publish.yml on Moda141/Fuzzylookup

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: fuzzylookup-0.2.0-py3-none-any.whl
- Subject digest: 8b80e31b7896774b7f0f9439ea0e7bac6fa00380320232b2a993489701910e00
- Sigstore transparency entry: 1744924458
- Sigstore integration time: Jun 7, 2026
Source repository:
- Permalink: Moda141/Fuzzylookup@2d722f145f88266dee22b3aeb350478ee3757a04
- Branch / Tag: refs/heads/main
- Owner: https://github.com/Moda141
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: python-publish.yml@2d722f145f88266dee22b3aeb350478ee3757a04
- Trigger Event: workflow_dispatch
