A Fuzzy Matching Approach for Clustering Strings

Fuzz Up [W.I.P.]

fuzzup offers (1) a simple approach for clustering string entities based on Levenshtein distance, combining fuzzy matching with a simple rule-based clustering method.
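To make the idea concrete, here is a minimal pure-Python sketch of Levenshtein-based fuzzy matching followed by a simple clustering rule. It does not use fuzzup and is not fuzzup's implementation: the function names, the 0-100 similarity scale, the `cutoff` parameter and the transitive grouping rule are all assumptions chosen for illustration.

```python
def levenshtein(a: str, b: str) -> int:
    """Edit distance (insertions, deletions, substitutions) via dynamic programming."""
    if len(a) < len(b):
        a, b = b, a  # keep the shorter string in the inner loop
    previous = list(range(len(b) + 1))
    for i, char_a in enumerate(a, start=1):
        current = [i]
        for j, char_b in enumerate(b, start=1):
            cost = 0 if char_a == char_b else 1
            current.append(min(previous[j] + 1,          # deletion
                               current[j - 1] + 1,       # insertion
                               previous[j - 1] + cost))  # substitution
        previous = current
    return previous[-1]


def similarity(a: str, b: str) -> float:
    """Case-insensitive similarity in [0, 100]; 100 means identical strings."""
    longest = max(len(a), len(b))
    if longest == 0:
        return 100.0
    return 100.0 * (1 - levenshtein(a.lower(), b.lower()) / longest)


def cluster_strings(strings, cutoff=70.0):
    """Hypothetical rule: strings whose similarity is >= cutoff share a cluster,
    and the relation is applied transitively (union-find)."""
    unique = list(dict.fromkeys(strings))  # drop duplicates, keep order
    parent = {s: s for s in unique}

    def find(s):
        while parent[s] != s:
            parent[s] = parent[parent[s]]  # path compression
            s = parent[s]
        return s

    for i, a in enumerate(unique):
        for b in unique[i + 1:]:
            if similarity(a, b) >= cutoff:
                parent[find(b)] = find(a)

    clusters = {}
    for s in unique:
        clusters.setdefault(find(s), []).append(s)
    return list(clusters.values())


names = ["Joe Biden", "Joe Bidne", "Copenhagen", "Copenhagn"]
print(cluster_strings(names))
```

The pairwise comparison is quadratic in the number of unique strings, which is usually acceptable for the entities found in a single document but would need a blocking step for large vocabularies.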

fuzzup also provides (2) functions for computing the prominence of the entity clusters produced by (1).
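The text does not define how prominence is measured, so the following is only one plausible reading: rank each cluster by how many mentions it accounts for. The function name, the input shapes and the mention-count measure are assumptions for illustration, not fuzzup's API.

```python
from collections import Counter


def rank_clusters_by_mentions(mentions, clusters):
    """Rank clusters by total mention count (an assumed notion of prominence).

    mentions: every entity string as it occurred, duplicates included.
    clusters: lists of distinct strings that were grouped together.
    """
    counts = Counter(mentions)
    ranked = []
    for members in clusters:
        ranked.append({
            # Use the most frequent spelling as the cluster's label.
            "representative": max(members, key=lambda m: counts[m]),
            "members": list(members),
            "mentions": sum(counts[m] for m in members),
        })
    ranked.sort(key=lambda c: c["mentions"], reverse=True)
    return ranked


mentions = ["Joe Biden", "Joe Biden", "Joe Bidne", "Copenhagen"]
clusters = [["Copenhagen"], ["Joe Biden", "Joe Bidne"]]
for cluster in rank_clusters_by_mentions(mentions, clusters):
    print(cluster["representative"], cluster["mentions"])
```

Other measures, such as weighting by position in the text or by model confidence, would fit the same structure.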

fuzzup is designed specifically to work with the output of the Hugging Face transformers NER pipeline.
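For orientation, when the transformers NER pipeline is run with an aggregation strategy, it returns a list of dictionaries with the keys `entity_group`, `score`, `word`, `start` and `end`. The sketch below uses hand-written sample predictions in that shape (the values are made up for illustration) and a hypothetical helper that collects the entity strings per entity type, which is the kind of input a string-clustering step would need. It does not call transformers or fuzzup.

```python
from collections import defaultdict

# Hand-written sample in the shape of aggregated NER pipeline output
# for the text "Joe Biden visited Copenhagen." (scores are made up).
sample_predictions = [
    {"entity_group": "PER", "score": 0.99, "word": "Joe Biden", "start": 0, "end": 9},
    {"entity_group": "LOC", "score": 0.98, "word": "Copenhagen", "start": 18, "end": 28},
]


def words_by_entity_group(predictions, min_score=0.0):
    """Hypothetical helper: collect predicted strings per entity type,
    skipping predictions whose score is below min_score."""
    grouped = defaultdict(list)
    for prediction in predictions:
        if prediction["score"] >= min_score:
            grouped[prediction["entity_group"]].append(prediction["word"])
    return dict(grouped)


print(words_by_entity_group(sample_predictions))
```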

Installation guide

fuzzup can be installed from the Python Package Index (PyPI) with:

pip install fuzzup

If you want the development version, install directly from GitHub.

Workflow

... COMING SOON!

Background

fuzzup is developed as part of Ekstra Bladet’s activities on Platform Intelligence in News (PIN). PIN is an industrial research project carried out in collaboration between the Technical University of Denmark, the University of Copenhagen and Copenhagen Business School, with funding from Innovation Fund Denmark. The project runs from 2020 to 2023 and develops recommender systems and natural language processing systems geared for news publishing, some of which, like fuzzup, are open source.

Read more

The detailed documentation and motivation for fuzzup including code references and extended workflow examples can be accessed here.

Contact

We hope that you will find fuzzup useful.

Please direct any questions and feedback to us!

If you want to contribute (which we encourage you to), open a PR.

If you encounter a bug or want to suggest an enhancement, please open an issue.
