Mismo
Goals
Use Ibis as the core
This gives a few benefits that are key to record linkage:
- Ability to use datasets that are larger than memory
- Ability to use multiple backends (e.g. `duckdb` for single node, or `bigquery` or `spark` for distributed)
Thoughtful, composable API
Use a duck-typing approach to allow users to plug in their own components. For example, a "Blocker" has a `block` method with a certain signature. This makes mismo a bit more complicated than `dedupe` or `splink`, but it will be much more flexible.
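The duck-typing idea can be sketched in plain Python. This is only an illustration under stated assumptions: the `Blocker` protocol, its `block(left, right)` signature, `SameKeyBlocker`, and `candidate_pairs` are hypothetical names, and the records are plain dicts rather than Ibis tables. mismo's actual interface may differ.

```python
from typing import Iterable, Protocol, runtime_checkable

Record = dict[str, str]
Pair = tuple[Record, Record]


@runtime_checkable
class Blocker(Protocol):
    """Hypothetical interface: any object with a `block` method qualifies."""

    def block(self, left: Iterable[Record], right: Iterable[Record]) -> list[Pair]:
        ...


class SameKeyBlocker:
    """A user-defined component. Note that it does not inherit from Blocker."""

    def __init__(self, key: str) -> None:
        self.key = key

    def block(self, left: Iterable[Record], right: Iterable[Record]) -> list[Pair]:
        # Index the right-hand records by the blocking key,
        # then pair each left record with the records sharing its key.
        index: dict[str, list[Record]] = {}
        for rec in right:
            index.setdefault(rec[self.key], []).append(rec)
        return [(l, r) for l in left for r in index.get(l[self.key], [])]


def candidate_pairs(blocker: Blocker, left, right) -> list[Pair]:
    # Only the presence of a `block` method matters, not the class hierarchy.
    return blocker.block(left, right)


left = [{"name": "Ann", "zip": "99501"}, {"name": "Bob", "zip": "99701"}]
right = [{"name": "Anne", "zip": "99501"}, {"name": "Cat", "zip": "10001"}]
pairs = candidate_pairs(SameKeyBlocker("zip"), left, right)
```

Because the check is structural, a user can swap in any object that provides `block` without subclassing anything from the library.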
Extras
- More ergonomic model persistence than `dedupe`. `splink` did a good job here.
- Support determinism using `random_state` (unlike `dedupe`)
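As a rough illustration of what determinism via `random_state` could look like, here is a standard-library sketch. The `sample_pairs` function is a hypothetical helper, not part of mismo. The only detail taken from the text above is the `random_state` parameter name.

```python
import random
from itertools import combinations


def sample_pairs(records, n, random_state=None):
    """Draw n candidate pairs; the same random_state yields the same sample."""
    # A local generator keeps the global `random` state untouched.
    rng = random.Random(random_state)
    all_pairs = list(combinations(records, 2))
    return rng.sample(all_pairs, n)


records = ["a", "b", "c", "d", "e"]
first = sample_pairs(records, 3, random_state=42)
second = sample_pairs(records, 3, random_state=42)
# `first` and `second` are identical because both runs used the same seed.
```

Passing the seed explicitly, instead of relying on global state, is what makes a training or sampling run repeatable.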
License
`mismo` is distributed under the terms of the LGPL-3.0-or-later license.
Download files
Source Distribution
`mismo-0.1.0.tar.gz` (24.8 kB)
Built Distribution
`mismo-0.1.0-py3-none-any.whl` (33.6 kB)
File details
Details for the file `mismo-0.1.0.tar.gz`.
File metadata
- Download URL: mismo-0.1.0.tar.gz
- Upload date:
- Size: 24.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.7.0 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | f2b1842108e93caff57969d82c6b6a5ec21250056f9b6094059e152635914e81
MD5 | 60fdb9797a796345fb56af3df4fb3afe
BLAKE2b-256 | 18ec881ccd2cc1f9fc8536082648b1736b2546c6b4c983bee81b2435cd7bf15e
File details
Details for the file `mismo-0.1.0-py3-none-any.whl`.
File metadata
- Download URL: mismo-0.1.0-py3-none-any.whl
- Upload date:
- Size: 33.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: pdm/2.7.0 CPython/3.11.1
File hashes
Algorithm | Hash digest
---|---
SHA256 | 698f3656d0a7dc8289500664f94e1ed053900b030fde96341150160df8f3f222
MD5 | 44a63250ac169adbbe71a6ce60f58375
BLAKE2b-256 | 08e3de2bc5158942c49b17dfadb43efdc0f18e4a8cf01375788ce9e769e40cf6