Python wrapper for the Julia package CorrectMatch.jl
CorrectMatch
A thin Python wrapper around the Julia module CorrectMatch.jl, to estimate uniqueness from small population samples.
Installation
Install CorrectMatch using your favorite package manager, e.g., pip:
pip install correctmatch
or uv:
uv add correctmatch
We use JuliaCall to seamlessly run Julia code from Python. The Julia binary and its dependencies, including CorrectMatch.jl, will be automatically installed on first use.
Usage
This module estimates uniqueness in a population whose individuals are described by multiple discrete attributes. For instance, the following array is a sample of 1,000 records with five discrete attributes:
>>> import numpy as np
>>> arr = np.random.randint(1, 5, size=(1000, 5))
>>> arr[:3, :]
array([[1, 1, 1, 3, 1],
       [3, 3, 2, 3, 3],
       [3, 3, 4, 3, 2]])
We can compute the empirical uniqueness and correctness of the 1,000 records in this sample directly:
>>> import correctmatch
>>> correctmatch.uniqueness(arr) # empirical uniqueness for 1,000 records
0.371
>>> correctmatch.correctness(arr) # empirical correctness for 1,000 records
0.637
We can also estimate these metrics for a population of 1,000 individuals (or 10,000) by fitting a copula model to the observed records and sampling from it:
>>> fitted_model = correctmatch.fit_model(arr)
>>> fitted_arr = correctmatch.sample_model(fitted_model, 1000)
>>> fitted_arr[:3, :]
array([[4, 2, 1, 4, 1],
       [4, 2, 3, 2, 3],
       [1, 3, 1, 3, 1]])
>>> correctmatch.uniqueness(fitted_arr)
0.373
>>> correctmatch.correctness(fitted_arr)
0.639
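To give an intuition for what sampling from a copula model involves, here is a minimal standard-library sketch. It assumes a Gaussian copula over two discrete attributes; the README only says "copula model", so the Gaussian choice, the function name `sample_gaussian_copula`, and its parameters are illustrative assumptions and not the CorrectMatch.jl implementation.

```python
import random
from bisect import bisect_left
from itertools import accumulate
from statistics import NormalDist


def sample_gaussian_copula(marginals, rho, n, seed=0):
    """Draw n records with two discrete attributes coupled by a Gaussian copula.

    marginals: two lists of category probabilities (each list sums to 1).
    rho: correlation of the latent bivariate normal, strictly between -1 and 1.
    Hypothetical helper for illustration only.
    """
    rng = random.Random(seed)
    std_normal = NormalDist()
    # Cumulative probabilities of each marginal, used as inverse-CDF tables.
    cdfs = [list(accumulate(m)) for m in marginals]
    records = []
    for _ in range(n):
        # Two correlated standard normals (2x2 Cholesky factor written out).
        z1 = rng.gauss(0.0, 1.0)
        z2 = rho * z1 + (1.0 - rho ** 2) ** 0.5 * rng.gauss(0.0, 1.0)
        record = []
        for z, cdf in zip((z1, z2), cdfs):
            u = std_normal.cdf(z)  # uniform on (0, 1)
            # First category whose cumulative probability reaches u.
            idx = min(bisect_left(cdf, u), len(cdf) - 1)
            record.append(idx + 1)  # 1-based categories, as in the array above
        records.append(tuple(record))
    return records


# Two attributes with four equally likely values and a strong dependence.
sample = sample_gaussian_copula([[0.25] * 4, [0.25] * 4], rho=0.95, n=1000)
print(sample[:3])
```

The marginals control how often each value appears, while `rho` controls how strongly the two attributes move together; a fitted model would estimate both from the observed records.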
CorrectMatch can also estimate uniqueness and correctness directly from pandas DataFrames, including those with categorical or string columns:
>>> import pandas as pd
>>> df = pd.DataFrame(arr, columns=['A', 'B', 'C', 'D', 'E'])
>>> correctmatch.uniqueness(df)
0.371
>>> df_cat = pd.DataFrame({
...     'color': pd.Categorical(['red', 'blue', 'green', 'red', 'blue']),
...     'size': pd.Categorical(['S', 'M', 'L', 'S', 'S'])
... })
>>> correctmatch.uniqueness(df_cat)
0.6
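The empirical metrics above can be reproduced by hand on small inputs. The sketch below assumes that uniqueness is the fraction of records whose attribute combination appears exactly once, and that correctness is the average chance of picking the right record among identical ones (1/k for a combination shared by k records). These definitions and the helper names are assumptions made for illustration, not the package's code.

```python
from collections import Counter


def empirical_uniqueness(records):
    """Fraction of records whose attribute combination occurs exactly once."""
    rows = [tuple(r) for r in records]
    counts = Counter(rows)
    return sum(1 for row in rows if counts[row] == 1) / len(rows)


def empirical_correctness(records):
    """Mean of 1/k over records, where k records share the same combination."""
    rows = [tuple(r) for r in records]
    counts = Counter(rows)
    return sum(1.0 / counts[row] for row in rows) / len(rows)


# Same five records as df_cat above: (color, size).
rows = [("red", "S"), ("blue", "M"), ("green", "L"), ("red", "S"), ("blue", "S")]
print(empirical_uniqueness(rows))   # 0.6, matching correctmatch.uniqueness(df_cat)
print(empirical_correctness(rows))  # 0.8 under the assumed definition
```

Three of the five records are unique, and the two identical ("red", "S") records each have a one-in-two chance of being matched correctly, which gives (3 + 2 × 0.5) / 5 = 0.8.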
Individual-level metrics
Beyond population-level metrics, CorrectMatch can use a fitted model to estimate the uniqueness and correctness of a specific individual within a population of a given size, for example 1,000 records:
>>> model = correctmatch.fit_model(arr)
>>> individual = arr[0] # or df.iloc[0] for DataFrames
>>> correctmatch.individual_uniqueness(model, individual, 1000)
0.39545972037740124
>>> correctmatch.individual_correctness(model, individual, 1000)
0.652110111566283
These functions estimate how likely a specific record is to be unique or correctly re-identified in a population.
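As a rough guide to how such individual-level numbers arise, the sketch below assumes that the n records of the population are drawn independently and that the model assigns probability p to the individual's attribute combination. Under those assumptions, the record is unique with probability (1 - p)^(n - 1), and its expected chance of being correctly matched is (1 - (1 - p)^n) / (n p). The function names and the parameter `p` are hypothetical and are not part of the correctmatch API.

```python
import math


def uniqueness_from_probability(p, n):
    """Probability that none of the other n - 1 records share the combination."""
    return (1.0 - p) ** (n - 1)


def correctness_from_probability(p, n):
    """Expected value of 1 / (1 + K), with K ~ Binomial(n - 1, p) other matches."""
    return (1.0 - (1.0 - p) ** n) / (n * p)


def correctness_by_summation(p, n):
    """Same expectation computed term by term, as a check on the closed form."""
    return sum(
        math.comb(n - 1, k) * p ** k * (1.0 - p) ** (n - 1 - k) / (k + 1)
        for k in range(n)
    )


# A combination as likely as any other among 4**5 = 1024 equally likely ones.
p, n = 1 / 1024, 1000
print(uniqueness_from_probability(p, n))
print(correctness_from_probability(p, n))
```

Rarer combinations (smaller p) push both values toward 1, while larger populations (larger n) push them toward 0.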
In the demo/ folder, we have compiled more examples with real-world datasets.
License
GNU General Public License v3.0
See LICENSE for the full text.
Patent-pending code. Additional support and details are available for commercial uses.