Mismo
Goals
Use Ibis as the core
This gives a few benefits that are key to record linkage:
- Ability to use datasets that are larger than memory
- Ability to use multiple backends (e.g. duckdb for single node, or bigquery or spark for distributed)
Thoughtful, composable API
Use a duck-typing approach to allow users to plug in their own components. For example, a "Blocker" is any object that has a block method with a certain signature.
This makes mismo a bit more complicated than dedupe or splink, but it will be much more flexible.
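The duck-typing idea above can be sketched with a typing.Protocol. This is only an illustration, not mismo's actual API: the block signature, the plain lists of dicts standing in for tables, and the names ExactKeyBlocker and candidate_pairs are all assumptions made for this example.

```python
from itertools import product
from typing import Iterable, Protocol, runtime_checkable


@runtime_checkable
class Blocker(Protocol):
    """Hypothetical protocol: any object with a block() method qualifies."""

    def block(
        self, left: list[dict], right: list[dict]
    ) -> Iterable[tuple[dict, dict]]:
        ...


class ExactKeyBlocker:
    """A user-defined component. Note that it does not inherit from Blocker."""

    def __init__(self, key: str) -> None:
        self.key = key

    def block(self, left, right):
        # Keep only the candidate pairs whose values for `key` are equal.
        return [
            (l, r) for l, r in product(left, right) if l[self.key] == r[self.key]
        ]


def candidate_pairs(blocker: Blocker, left, right):
    """Library-side code relies only on block(), not on a base class."""
    return list(blocker.block(left, right))


left = [{"id": 1, "zip": "99501"}, {"id": 2, "zip": "99701"}]
right = [{"id": 3, "zip": "99501"}, {"id": 4, "zip": "10001"}]
pairs = candidate_pairs(ExactKeyBlocker("zip"), left, right)
```

Because the check is structural, any class with a compatible block method can be passed in without subclassing anything from the library.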
Extras
- More ergonomic model persistence than dedupe. splink did a good job here.
- Support determinism using random_state (unlike dedupe)
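As a rough sketch of what determinism via random_state usually means: the same seed yields the same result on every run. The sample_pairs helper below is hypothetical and is not part of mismo. It only shows the general pattern of seeding a local generator instead of relying on global random state.

```python
import random
from typing import Optional


def sample_pairs(
    n_records: int, n_pairs: int, random_state: Optional[int] = None
) -> list[tuple[int, int]]:
    """Hypothetical helper: draw random pairs of distinct record indices."""
    # A local generator keeps results independent of the global random module.
    rng = random.Random(random_state)
    pairs = []
    for _ in range(n_pairs):
        a, b = rng.sample(range(n_records), 2)
        pairs.append((a, b))
    return pairs


# The same random_state gives the same sample on every call.
first = sample_pairs(1000, 5, random_state=42)
second = sample_pairs(1000, 5, random_state=42)
```

Leaving random_state as None falls back to a nondeterministic seed, so reproducibility is opt-in.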
License
mismo is distributed under the terms of the LGPL-3.0-or-later license.