
A Python library for accurate and scalable data deduplication and entity resolution

Project description

dedupe is a library that uses machine learning to perform deduplication and entity resolution quickly on structured data. dedupe is the open-source engine for dedupe.io.

dedupe will help you:

  • remove duplicate entries from a spreadsheet of names and addresses

  • link a list with customer information to another with order history, even without unique customer IDs

  • take a database of campaign contributions and figure out which ones were made by the same person, even if the names were entered slightly differently for each record

dedupe takes in human training data and comes up with the best rules for your dataset to quickly and automatically find similar records, even with very large databases.
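
To make the idea of finding similar records concrete, here is a toy sketch that flags pairs of names whose spelling is close. It is not dedupe's API or its learned model: it uses only Python's standard-library difflib, and the helper names (`similarity`, `candidate_duplicates`) and the fixed 0.85 threshold are assumptions chosen for illustration. dedupe, by contrast, derives its rules from human training data.

```python
from difflib import SequenceMatcher
from itertools import combinations


def similarity(a: str, b: str) -> float:
    """Return a 0..1 similarity ratio between two normalized strings."""
    # Normalize case and surrounding whitespace before comparing.
    return SequenceMatcher(None, a.lower().strip(), b.lower().strip()).ratio()


def candidate_duplicates(names, threshold=0.85):
    """Return index pairs whose names are at least `threshold` similar.

    Hypothetical helper for illustration; the threshold is hand-picked,
    not learned from labeled examples.
    """
    pairs = []
    for (i, a), (j, b) in combinations(enumerate(names), 2):
        if similarity(a, b) >= threshold:
            pairs.append((i, j))
    return pairs


names = ["Jonathan Smith", "Jonathon Smith", "Maria Garcia"]
print(candidate_duplicates(names))  # prints [(0, 1)]
```

Comparing every pair grows quadratically with the number of records, so this toy approach would not carry over to very large databases.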


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

dedupe_fork_eccovia-2.0.14.tar.gz (113.3 kB)

Uploaded Source

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

dedupe_fork_eccovia-2.0.14-cp312-cp312-musllinux_1_1_x86_64.whl (153.4 kB)

Uploaded CPython 3.12, musllinux: musl 1.1+ x86-64

dedupe_fork_eccovia-2.0.14-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl (151.0 kB)

Uploaded CPython 3.12, manylinux: glibc 2.17+ x86-64, manylinux: glibc 2.5+ x86-64
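
The wheel file names above follow the standard `{distribution}-{version}(-{build tag})?-{python tag}-{abi tag}-{platform tag}.whl` convention, where tag fields may hold several dot-separated values. A minimal sketch of splitting such a name into its parts follows; `parse_wheel_filename` is a hypothetical helper written for this illustration, not part of dedupe or pip.

```python
def parse_wheel_filename(filename: str) -> dict:
    """Split a wheel file name into its components.

    Hypothetical helper: handles the 5-field form and the 6-field form
    that carries an optional build tag.
    """
    if not filename.endswith(".whl"):
        raise ValueError(f"not a wheel file name: {filename!r}")

    parts = filename[: -len(".whl")].split("-")
    if len(parts) == 5:
        distribution, version, python_tag, abi_tag, platform_tag = parts
        build_tag = None
    elif len(parts) == 6:
        distribution, version, build_tag, python_tag, abi_tag, platform_tag = parts
    else:
        raise ValueError(f"unexpected number of fields in {filename!r}")

    return {
        "distribution": distribution,
        "version": version,
        "build_tag": build_tag,
        # Compressed tag sets are joined with "." inside one field.
        "python_tags": python_tag.split("."),
        "abi_tags": abi_tag.split("."),
        "platform_tags": platform_tag.split("."),
    }


info = parse_wheel_filename(
    "dedupe_fork_eccovia-2.0.14-cp312-cp312-musllinux_1_1_x86_64.whl"
)
print(info["python_tags"], info["platform_tags"])
# prints ['cp312'] ['musllinux_1_1_x86_64']
```

Applied to the manylinux wheel, the same function yields four platform tags, which is why that single file is listed as compatible with both glibc 2.5+ and glibc 2.17+.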

File details

Details for the file dedupe_fork_eccovia-2.0.14.tar.gz.

File metadata

  • Download URL: dedupe_fork_eccovia-2.0.14.tar.gz
  • Upload date:
  • Size: 113.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for dedupe_fork_eccovia-2.0.14.tar.gz
Algorithm Hash digest
SHA256 b9d5531a4dd9d12c3c37cc49a3cf37aed54d7a41b2abc20af048a24b3b506182
MD5 62c68eb473586340d6f18b6c7c7c596e
BLAKE2b-256 e8dc3c66061d446c417097021e6e19b83440df57795a28a6f59ee7e4780369af

See more details on using hashes here.
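
One way to check a downloaded file against a published SHA256 digest is to hash it locally and compare the result. The sketch below uses only Python's standard-library hashlib; the function names `sha256_of_file` and `verify_sha256` are hypothetical, and the usage comment assumes the source distribution has been saved in the current directory.

```python
import hashlib


def sha256_of_file(path: str, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_sha256(path: str, expected_hex: str) -> bool:
    """Return True if the file's SHA256 digest matches `expected_hex`."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Usage (assumes the file was downloaded to the current directory):
# verify_sha256(
#     "dedupe_fork_eccovia-2.0.14.tar.gz",
#     "b9d5531a4dd9d12c3c37cc49a3cf37aed54d7a41b2abc20af048a24b3b506182",
# )
```

A mismatch means the local file differs from the one that was uploaded, so it should not be installed.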

File details

Details for the file dedupe_fork_eccovia-2.0.14-cp312-cp312-musllinux_1_1_x86_64.whl.

File hashes

Hashes for dedupe_fork_eccovia-2.0.14-cp312-cp312-musllinux_1_1_x86_64.whl
Algorithm Hash digest
SHA256 9692aaca00a3e373db4b1a02d4001eac707344cb7c6db52e52dac95bff87b875
MD5 131809f22f2b4254da63f0c5a2c77c97
BLAKE2b-256 bdafbd3864d5e54a3b9b384c1e4bf31c670e87390eb24d0f86fe00205aa9e6ec


File details

Details for the file dedupe_fork_eccovia-2.0.14-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for dedupe_fork_eccovia-2.0.14-cp312-cp312-manylinux_2_5_x86_64.manylinux1_x86_64.manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 b12c64ec96ba8e25f175803f104fabf2d2d53d308c2a2ba8b50481b2d1758c9b
MD5 aabf4d105c2b051882c73a53f0e5ddcd
BLAKE2b-256 a359b9d9dfd9ba88a40f92e28ffb5711f2375a612902fea528317e5cc71d0b07
