A flexible record linkage framework that enables matching between multiple datasets using both exact and fuzzy matching techniques.
mi-chainlink
A flexible framework for entity resolution and record linkage that uses DuckDB as its database engine. It builds on the work of Who Owns Chicago by the Mansueto Institute for Urban Innovation, including the work of Kevin Bryson, Ana (Anita) Restrepo Lachman, Caitlin P., Joaquin Pinto, and Divij Sinha.
This package enables you to load data from various sources, clean and standardize entity names and addresses, and create links between entities based on exact and fuzzy matching techniques.
Source: https://github.com/mansueto-institute/mi-chainlink
Documentation: https://mansueto-institute.github.io/mi-chainlink/
Issues: https://github.com/mansueto-institute/mi-chainlink/issues
Overview
This framework helps you solve the entity resolution problem by:
- Loading data from multiple sources into a DuckDB database
- Cleaning and standardizing entity names and addresses
- Creating exact matches between entities based on names and addresses
- Generating fuzzy matches using TF-IDF similarity
- Exporting the resulting linked data for further analysis
The system is designed to be configurable through YAML files and supports incremental updates to an existing database.
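To make the exact and fuzzy matching steps in the overview concrete, here is a minimal sketch in plain Python. It is not mi-chainlink's implementation: the function names (`clean_name`, `trigrams`, `tfidf_vectors`, `cosine`), the cleaning rules, the choice of character trigrams as TF-IDF terms, and the sample names are all assumptions made for illustration, and the package itself does this work in DuckDB.

```python
import math
import re
from collections import Counter


def clean_name(name: str) -> str:
    """Uppercase, drop punctuation, collapse whitespace (illustrative rules only)."""
    name = re.sub(r"[^A-Z0-9 ]", " ", name.upper())
    return " ".join(name.split())


def trigrams(text: str) -> list[str]:
    """Split a string into overlapping character trigrams."""
    padded = f" {text} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def tfidf_vectors(names: list[str]) -> list[dict[str, float]]:
    """Build L2-normalized TF-IDF vectors over character trigrams."""
    counts = [Counter(trigrams(n)) for n in names]
    doc_freq = Counter(gram for c in counts for gram in c)
    n_docs = len(names)
    vectors = []
    for c in counts:
        # Smoothed inverse document frequency, weighted by term count.
        vec = {g: tf * (math.log((1 + n_docs) / (1 + doc_freq[g])) + 1) for g, tf in c.items()}
        norm = math.sqrt(sum(w * w for w in vec.values())) or 1.0
        vectors.append({g: w / norm for g, w in vec.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two already-normalized sparse vectors."""
    return sum(w * b.get(g, 0.0) for g, w in a.items())


# Hypothetical sample data standing in for two source datasets.
left = ["Acme Holdings, LLC", "Lakeview Properties Inc."]
right = ["ACME HOLDINGS LLC", "Lakeveiw Properties Inc", "Northside Realty"]

left_clean = [clean_name(n) for n in left]
right_clean = [clean_name(n) for n in right]

# Exact matches: names that are identical after cleaning.
exact = [(l, r) for l, lc in zip(left, left_clean) for r, rc in zip(right, right_clean) if lc == rc]

# Fuzzy matches: the best-scoring right-hand candidate for each left-hand name.
vectors = tfidf_vectors(left_clean + right_clean)
left_vecs, right_vecs = vectors[:len(left)], vectors[len(left):]
fuzzy = []
for l, lv in zip(left, left_vecs):
    scores = [cosine(lv, rv) for rv in right_vecs]
    best = max(range(len(right)), key=scores.__getitem__)
    fuzzy.append((l, right[best], round(scores[best], 3)))

print(exact)
print(fuzzy)
```

In this sketch the misspelled "Lakeveiw Properties Inc" fails the exact comparison but still scores highest among the fuzzy candidates for "Lakeview Properties Inc.", because most of its trigrams are shared. A real pipeline would also apply a similarity threshold before accepting a fuzzy link.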
Installation
The package is available on PyPI. Install it with pip or uv:
pip install mi-chainlink
uv add mi-chainlink
Usage
Command Line Interface
# Run interactive session
chainlink
# Run with path to config yaml
chainlink path/to/config.yaml
Python Interface
from chainlink import chainlink

# Call signature, with parameter types and defaults shown for reference:
chainlink(
    config: dict,  # dict with config details
    config_path: str | Path = DIR / "configs/config.yaml",  # path where the config dict is stored after processing
)