Project description

Mismo

Goals

Use Ibis as the core

Building on Ibis brings two benefits that are key to record linkage:

  • Ability to work with datasets that are larger than memory
  • Ability to use multiple backends (e.g. DuckDB on a single node, or BigQuery or Spark for distributed workloads)
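A rough illustration of why pushing work into a backend matters, using only Python's standard library: the sketch below uses sqlite3 as a stand-in for a real backend such as DuckDB and expresses blocking (generating candidate pairs of records that share a key) as a SQL self-join. The engine does the pairing and only the resulting pairs come back into Python. This is not how mismo or Ibis spell it; the table and columns (records, record_id, name, zip) are hypothetical.

```python
import sqlite3


def candidate_pairs(conn: sqlite3.Connection) -> list[tuple[int, int]]:
    """Return (left_id, right_id) pairs of records that share a zip code.

    The join runs inside the database engine; only the resulting
    pairs are pulled back into Python.
    """
    query = """
        SELECT l.record_id, r.record_id
        FROM records AS l
        JOIN records AS r
          ON l.zip = r.zip
         AND l.record_id < r.record_id  -- skip self-pairs and mirrored duplicates
        ORDER BY l.record_id, r.record_id
    """
    return conn.execute(query).fetchall()


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE records (record_id INTEGER, name TEXT, zip TEXT)")
    conn.executemany(
        "INSERT INTO records VALUES (?, ?, ?)",
        [
            (1, "Alice Smith", "10001"),
            (2, "Alicia Smith", "10001"),
            (3, "Bob Jones", "94105"),
            (4, "Robert Jones", "94105"),
            (5, "Carol White", "60601"),
        ],
    )
    print(candidate_pairs(conn))  # [(1, 2), (3, 4)]
```

Pointing sqlite3.connect at a file path instead of ":memory:" keeps the records on disk. Ibis plays a similar role one level up: it compiles a single expression into backend-specific queries, which is what lets the same logic target DuckDB, BigQuery, or Spark.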

Thoughtful, composable API

Use a duck-typing approach so that users can plug in their own components. For example, a "Blocker" is any object with a block method of the expected signature. This makes mismo somewhat more complicated than dedupe or splink, but much more flexible.
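One way to express this kind of duck-typed plug-in point in Python is a typing.Protocol. In the minimal sketch below, the block(left, right) signature, the dict-based record shape, and the ExactKeyBlocker and generate_candidates names are all assumptions made for illustration, not mismo's actual API. The point is that ExactKeyBlocker never inherits from Blocker; having a compatible method is enough.

```python
from typing import Protocol, runtime_checkable

# Hypothetical record shape for this sketch: a dict with an "id" plus fields.
Record = dict[str, str]
Pair = tuple[str, str]


@runtime_checkable
class Blocker(Protocol):
    """Structural interface: any object with a compatible `block` method qualifies.

    This signature is an illustrative assumption, not mismo's actual one.
    """

    def block(self, left: list[Record], right: list[Record]) -> list[Pair]: ...


class ExactKeyBlocker:
    """Pairs records whose values for `key` are identical.

    It does not inherit from Blocker; it only provides a matching method.
    """

    def __init__(self, key: str) -> None:
        self.key = key

    def block(self, left: list[Record], right: list[Record]) -> list[Pair]:
        # Index the right side by key so each left record meets only matching rows.
        index: dict[str, list[str]] = {}
        for rec in right:
            index.setdefault(rec[self.key], []).append(rec["id"])
        return [(rec["id"], rid) for rec in left for rid in index.get(rec[self.key], [])]


def generate_candidates(left: list[Record], right: list[Record], blocker: Blocker) -> list[Pair]:
    # isinstance on a runtime_checkable Protocol only checks that `block` exists,
    # not that its signature matches.
    if not isinstance(blocker, Blocker):
        raise TypeError("blocker must provide a block(left, right) method")
    return blocker.block(left, right)


left = [{"id": "a1", "zip": "10001"}, {"id": "a2", "zip": "94105"}]
right = [{"id": "b1", "zip": "10001"}, {"id": "b2", "zip": "60601"}]
print(generate_candidates(left, right, ExactKeyBlocker("zip")))  # [('a1', 'b1')]
```

The trade-off mentioned above shows up even here: the caller gets freedom to pass any object, but the contract lives in documentation and type hints rather than in a base class that enforces it.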

Extras

  • More ergonomic model persistence than dedupe (splink does a good job here)
  • Support for deterministic results via a random_state parameter (unlike dedupe)
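Both extras come down to keeping a component's configuration explicit. The sketch below shows one way to get them in plain Python: a hypothetical PairSampler (not a mismo class) that takes a random_state seed, so repeated runs draw the same training pairs, and that saves and restores its configuration as JSON.

```python
import json
import random
from dataclasses import asdict, dataclass
from typing import Optional

Pair = tuple[str, str]


@dataclass
class PairSampler:
    """Hypothetical component (not part of mismo) that samples record pairs.

    The same random_state always produces the same sample, and the whole
    configuration round-trips through JSON.
    """

    n_samples: int
    random_state: Optional[int] = None  # None means a fresh, unseeded sample

    def sample(self, pairs: list[Pair]) -> list[Pair]:
        # A private Random instance is seeded per call, so results neither
        # depend on nor disturb the global `random` module state.
        rng = random.Random(self.random_state)
        return rng.sample(pairs, self.n_samples)

    def to_json(self) -> str:
        # Save plain configuration only, which keeps the file human-readable.
        return json.dumps(asdict(self))

    @classmethod
    def from_json(cls, text: str) -> "PairSampler":
        return cls(**json.loads(text))


pairs = [(f"a{i}", f"b{i}") for i in range(10)]
sampler = PairSampler(n_samples=3, random_state=42)
restored = PairSampler.from_json(sampler.to_json())

# Reloading the saved configuration reproduces exactly the same sample.
assert restored == sampler
assert restored.sample(pairs) == sampler.sample(pairs)
print(restored.sample(pairs))
```

Persisting configuration rather than pickled objects is a design choice here: the saved JSON stays readable and does not break when class internals change.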

License

mismo is distributed under the terms of the LGPL-3.0-or-later license.

Download files

Source Distribution

mismo-0.1.0.tar.gz (24.8 kB)

Built Distribution

mismo-0.1.0-py3-none-any.whl (33.6 kB)

File details

Details for the file mismo-0.1.0.tar.gz.

File metadata

  • Download URL: mismo-0.1.0.tar.gz
  • Size: 24.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.7.0 CPython/3.11.1

File hashes

Hashes for mismo-0.1.0.tar.gz
  • SHA256: f2b1842108e93caff57969d82c6b6a5ec21250056f9b6094059e152635914e81
  • MD5: 60fdb9797a796345fb56af3df4fb3afe
  • BLAKE2b-256: 18ec881ccd2cc1f9fc8536082648b1736b2546c6b4c983bee81b2435cd7bf15e

File details

Details for the file mismo-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: mismo-0.1.0-py3-none-any.whl
  • Size: 33.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: pdm/2.7.0 CPython/3.11.1

File hashes

Hashes for mismo-0.1.0-py3-none-any.whl
  • SHA256: 698f3656d0a7dc8289500664f94e1ed053900b030fde96341150160df8f3f222
  • MD5: 44a63250ac169adbbe71a6ce60f58375
  • BLAKE2b-256: 08e3de2bc5158942c49b17dfadb43efdc0f18e4a8cf01375788ce9e769e40cf6
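The published digests can be checked locally before installing. A minimal sketch using hashlib, assuming the wheel has already been downloaded into the current directory (that location is an assumption):

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        # Read fixed-size chunks until read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    # hexdigest() is lowercase, so normalise the expected value before comparing.
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    wheel = Path("mismo-0.1.0-py3-none-any.whl")  # assumed download location
    expected = "698f3656d0a7dc8289500664f94e1ed053900b030fde96341150160df8f3f222"
    if wheel.exists():
        print("OK" if verify(wheel, expected) else "MISMATCH")
    else:
        print(f"{wheel} not found")
```

pip can also enforce this itself in hash-checking mode: put a line such as mismo==0.1.0 --hash=sha256:<digest> in a requirements file and install it with pip install --require-hashes -r requirements.txt.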
