A flexible record linkage framework that enables matching between multiple datasets using both exact and fuzzy matching techniques.

mi-chainlink

A powerful, flexible framework for entity resolution and record linkage that uses DuckDB as its database engine. It builds on the Who Owns Chicago project by the Mansueto Institute for Urban Innovation, including the work of Kevin Bryson, Ana (Anita) Restrepo Lachman, Caitlin P., Joaquin Pinto, and Divij Sinha.

This package enables you to load data from various sources, clean and standardize entity names and addresses, and create links between entities based on exact and fuzzy matching techniques.

Source: https://github.com/mansueto-institute/mi-chainlink

Documentation: https://mansueto-institute.github.io/mi-chainlink/

Issues: https://github.com/mansueto-institute/mi-chainlink/issues

Overview

This framework helps you solve the entity resolution problem by:

  1. Loading data from multiple sources into a DuckDB database
  2. Cleaning and standardizing entity names and addresses
  3. Creating exact matches between entities based on names and addresses
  4. Generating fuzzy matches using TF-IDF similarity
  5. Exporting the resulting linked data for further analysis
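
To make steps 2 and 3 concrete, here is a minimal sketch of name standardization followed by exact matching, written in plain Python. It is an illustration only and does not reflect mi-chainlink's actual implementation: the `clean_name` and `exact_matches` helpers, the suffix list, and the sample records are all hypothetical.

```python
import re
from collections import defaultdict

# Hypothetical suffix list for illustration; not taken from mi-chainlink.
SUFFIXES = {"llc", "inc", "corp", "co", "ltd"}


def clean_name(raw: str) -> str:
    """Lowercase, replace punctuation with spaces, and drop common corporate suffixes."""
    tokens = re.sub(r"[^a-z0-9 ]", " ", raw.lower()).split()
    return " ".join(t for t in tokens if t not in SUFFIXES)


def exact_matches(left: dict[str, str], right: dict[str, str]) -> list[tuple[str, str]]:
    """Pair record ids from two datasets whose cleaned names are identical."""
    # Index the right-hand dataset by cleaned name for constant-time lookups.
    index = defaultdict(list)
    for rid, name in right.items():
        key = clean_name(name)
        if key:  # skip names that clean down to nothing
            index[key].append(rid)

    pairs = []
    for lid, name in left.items():
        for rid in index.get(clean_name(name), []):
            pairs.append((lid, rid))
    return pairs


# Made-up sample records keyed by a record id.
parcels = {"p1": "Acme Holdings, LLC", "p2": "Blue Door Inc."}
licenses = {"l1": "ACME HOLDINGS LLC", "l2": "Red Door Inc"}

print(exact_matches(parcels, licenses))  # [('p1', 'l1')]
```

Cleaning before comparing is what lets "Acme Holdings, LLC" and "ACME HOLDINGS LLC" link exactly, even though the raw strings differ.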

The system is designed to be configurable through YAML files and supports incremental updates to an existing database.
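
The fuzzy matching step listed in the overview relies on TF-IDF similarity. The sketch below shows one common way to compute it, using character trigrams, a smoothed IDF weight, and cosine similarity, with only the standard library. The tokenization, the weighting formula, and the sample names are assumptions made for illustration; the package itself may differ in each of these choices.

```python
import math
from collections import Counter


def trigrams(text: str) -> list[str]:
    """Split a name into overlapping, space-padded character trigrams."""
    padded = f"  {text.lower()} "
    return [padded[i:i + 3] for i in range(len(padded) - 2)]


def tfidf_vectors(names: list[str]) -> list[dict[str, float]]:
    """Build one L2-normalized TF-IDF vector per name."""
    docs = [Counter(trigrams(n)) for n in names]
    n_docs = len(docs)
    # Document frequency: in how many names does each trigram occur?
    df = Counter(gram for doc in docs for gram in doc)

    vectors = []
    for doc in docs:
        # Smoothed IDF, so rare trigrams weigh more than common ones.
        vec = {
            gram: tf * (math.log((1 + n_docs) / (1 + df[gram])) + 1.0)
            for gram, tf in doc.items()
        }
        norm = math.sqrt(sum(w * w for w in vec.values()))
        vectors.append({gram: w / norm for gram, w in vec.items()})
    return vectors


def cosine(a: dict[str, float], b: dict[str, float]) -> float:
    """Cosine similarity of two already-normalized sparse vectors."""
    if len(a) > len(b):
        a, b = b, a  # iterate over the smaller vector
    return sum(w * b.get(gram, 0.0) for gram, w in a.items())


# Made-up names; the first two differ by one character.
names = ["acme holdings", "acme holding", "blue door realty"]
vecs = tfidf_vectors(names)

sim_close = cosine(vecs[0], vecs[1])
sim_far = cosine(vecs[0], vecs[2])
print(f"near-duplicate: {sim_close:.3f}, unrelated: {sim_far:.3f}")
```

In practice, pairs scoring above a chosen threshold would be kept as candidate fuzzy matches. The threshold is a tuning decision and is not specified here.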

Installation

The package is available on PyPI. You can install it with either pip or uv:

# Install with pip
pip install mi-chainlink

# Or add it to a uv project
uv add mi-chainlink

Usage

Command Line Interface

# Run interactive session
chainlink

# Run with path to config yaml
chainlink path/to/config.yaml

Python Interface

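from pathlib import Path

from chainlink import chainlink

# Placeholder: fill in your own config details (the schema is not shown here).
config: dict = {}

chainlink(
    config=config,  # dict with config details
    config_path=Path("configs/config.yaml"),  # where the dict is stored after processing; defaults to DIR / "configs/config.yaml"
)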


