
Character trigram fuzzy set.

Project description

Character trigram fuzzy set implementation providing cosine similarity-based fuzzy matching.
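
To make the idea concrete, here is a minimal sketch of cosine similarity over character trigram counts. It is not this library's implementation: the helper names `trigrams` and `cosine_similarity` are hypothetical, and details such as padding, case folding, or indexing may well differ in the real code.

```python
from collections import Counter
from math import sqrt


def trigrams(text):
    """Count the overlapping 3-character windows in text (hypothetical helper)."""
    return Counter(text[i:i + 3] for i in range(len(text) - 2))


def cosine_similarity(a, b):
    """Cosine of the angle between the trigram count vectors of a and b."""
    ta, tb = trigrams(a), trigrams(b)
    # Dot product over the trigrams of a; a Counter returns 0 for missing keys.
    dot = sum(count * tb[gram] for gram, count in ta.items())
    norm = sqrt(sum(c * c for c in ta.values())) * sqrt(sum(c * c for c in tb.values()))
    # Strings shorter than three characters have no trigrams, so score them 0.
    return dot / norm if norm else 0.0
```

In this sketch, 'bryan' and 'ryan' share two trigrams ('rya' and 'yan') and score about 0.82, while 'bryan' and 'brian' share none and score 0. That blind spot is one reason a second measure can be useful on top of trigram matching.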

This library does that one thing on iterables of strings. Anything beyond that (Levenshtein distance, scoring, bigram fallback, etc.) is left as an exercise for the reader.
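
As one example of such an exercise, the sketch below re-ranks a list of candidate strings by Levenshtein distance. The functions `levenshtein` and `rerank` are hypothetical helpers, not part of this library, and the candidates are assumed to be plain strings, since the shape of what `get` returns is not described here.

```python
def levenshtein(a, b):
    """Edit distance between a and b (insertions, deletions, substitutions)."""
    previous = list(range(len(b) + 1))  # distances from '' to each prefix of b
    for i, ca in enumerate(a, start=1):
        current = [i]  # distance from a[:i] to ''
        for j, cb in enumerate(b, start=1):
            current.append(min(
                previous[j] + 1,               # delete ca
                current[j - 1] + 1,            # insert cb
                previous[j - 1] + (ca != cb),  # substitute, or free on a match
            ))
        previous = current
    return previous[-1]


def rerank(query, candidates):
    """Order candidate strings by edit distance to query, closest first."""
    return sorted(candidates, key=lambda c: levenshtein(query, c))
```

The sort is stable, so candidates at the same distance keep their incoming order; any trigram-based ordering would therefore survive as a tie-breaker.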

Usage

import os.path
from timeit import timeit
import requests

# Retrieve a file containing around 470,000 English words,
# downloading it only if it is not already cached locally
url = 'https://github.com/dwyl/english-words/raw/master/words.txt'
words_path = os.path.expanduser('~/words.txt')
if not os.path.isfile(words_path):
    r = requests.get(url, stream=True)
    r.raise_for_status()  # fail early instead of saving an error page
    with open(words_path, 'wb') as f:
        for chunk in r.iter_content(chunk_size=1024):
            if chunk:
                f.write(chunk)

# Usage
import charactertrigramfuzzyset as ctfs
# Read one word per line, closing the file when done
with open(words_path, 'r') as f:
    items = [line.rstrip() for line in f]
fs = ctfs.CharacterTrigramFuzzySet(items)
fs.get('bryan')

# Profiling: timeit returns the total time in seconds for all 1000 calls,
# so divide by the call count to get the per-call figure.
# Generally around 10-20 ms per call on my machine.
number = 1000
total = timeit("fs.get('bryan')", globals={'fs': fs}, number=number)
print('{:.1f} ms per call'.format(total / number * 1000))
