Python wrapper for the Julia package CorrectMatch.jl

CorrectMatch

A thin Python wrapper around the Julia module CorrectMatch.jl, to estimate uniqueness from small population samples.

Installation

Install CorrectMatch with your preferred package manager, for example pip:

pip install correctmatch

or uv:

uv add correctmatch

We use JuliaCall to seamlessly run Julia code from Python. The Julia binary and its dependencies, including CorrectMatch.jl, will be automatically installed on first use.

Usage

This module estimates the uniqueness of records in a population described by multiple discrete attributes. For instance, the following array is a sample of 1,000 rows with five discrete attributes, each taking integer values from 1 to 4:

>>> import numpy as np
>>> arr = np.random.randint(1, 5, size=(1000, 5))
>>> arr[:3, :]
array([[1, 1, 1, 3, 1],
       [3, 3, 2, 3, 3],
       [3, 3, 4, 3, 2]])

We can compute the empirical uniqueness and correctness of these 1,000 records directly:

>>> import correctmatch

>>> correctmatch.uniqueness(arr)  # empirical uniqueness for 1,000 records
0.371
>>> correctmatch.correctness(arr)  # empirical correctness for 1,000 records
0.637

To estimate the same metrics for a population of 1,000 or 10,000 individuals, we fit a copula model to the observed records, sample a synthetic population of the desired size from the fitted model, and measure it:

>>> fitted_model = correctmatch.fit_model(arr)
>>> fitted_arr = correctmatch.sample_model(fitted_model, 1000)
>>> fitted_arr[:3, :]
array([[4, 2, 1, 4, 1],
       [4, 2, 3, 2, 3],
       [1, 3, 1, 3, 1]])
>>> correctmatch.uniqueness(fitted_arr)
0.373
>>> correctmatch.correctness(fitted_arr)
0.639
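As a cross-check that needs no Julia runtime, the empirical metrics of a sample can be reproduced with NumPy alone. The sketch below is not the library's implementation. It assumes that empirical uniqueness is the fraction of rows whose attribute combination appears exactly once, and that empirical correctness is the average of 1 / (number of rows sharing the same combination), which equals the number of distinct combinations divided by the number of rows. The function names `empirical_uniqueness` and `empirical_correctness` are hypothetical.

```python
import numpy as np


def empirical_uniqueness(records: np.ndarray) -> float:
    """Fraction of rows whose attribute combination appears exactly once.

    Assumed definition, for illustration only.
    """
    _, counts = np.unique(records, axis=0, return_counts=True)
    return float(np.sum(counts == 1)) / len(records)


def empirical_correctness(records: np.ndarray) -> float:
    """Mean over rows of 1 / (size of the row's group of identical rows).

    Each group of identical rows contributes exactly 1 to the sum, so this
    is the number of distinct combinations divided by the number of rows.
    Assumed definition, for illustration only.
    """
    _, counts = np.unique(records, axis=0, return_counts=True)
    return len(counts) / len(records)


# Toy sample: the combination (1, 1) appears twice, the other two once.
toy = np.array([[1, 1], [1, 1], [2, 3], [4, 4]])
print(empirical_uniqueness(toy))   # 0.5  (2 unique rows out of 4)
print(empirical_correctness(toy))  # 0.75 (3 distinct combinations out of 4)
```

Under these assumed definitions, a uniformly random sample of 1,000 rows over 4^5 = 1,024 possible combinations would be expected to give values close to the 0.371 and 0.637 reported above.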

CorrectMatch can also estimate uniqueness and correctness directly from pandas DataFrames, including those with categorical or string columns:

>>> import pandas as pd
>>> df = pd.DataFrame(arr, columns=['A', 'B', 'C', 'D', 'E'])
>>> correctmatch.uniqueness(df)
0.371
>>> df_cat = pd.DataFrame({
...     'color': pd.Categorical(['red', 'blue', 'green', 'red', 'blue']),
...     'size': pd.Categorical(['S', 'M', 'L', 'S', 'S'])
... })
>>> correctmatch.uniqueness(df_cat)
0.6
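The 0.6 value for the categorical example can be checked by hand with pandas alone. This minimal sketch assumes that empirical uniqueness is the share of rows whose combination of values is not shared with any other row. It uses plain string columns in place of `pd.Categorical` for simplicity and does not call CorrectMatch.

```python
import pandas as pd

df_check = pd.DataFrame({
    "color": ["red", "blue", "green", "red", "blue"],
    "size": ["S", "M", "L", "S", "S"],
})

# keep=False flags every row that shares its full combination with another row.
is_shared = df_check.duplicated(keep=False)

# Rows 0 and 3 are both (red, S); the other three rows are unique.
uniqueness = float((~is_shared).mean())
print(uniqueness)  # 0.6
```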

Individual-level metrics

Beyond population-level metrics, CorrectMatch can use a fitted model to estimate the uniqueness and correctness of a specific individual within a population of a given size, here 1,000 records:

>>> model = correctmatch.fit_model(arr)
>>> individual = arr[0]  # or df.iloc[0] for DataFrames
>>> correctmatch.individual_uniqueness(model, individual, 1000)
0.39545972037740124
>>> correctmatch.individual_correctness(model, individual, 1000)
0.652110111566283

These functions estimate how likely a specific record is to be unique or correctly re-identified in a population.
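To give some intuition for these two quantities, here is a minimal sketch under a simplifying assumption: the other records in the population are drawn independently, and each matches the individual's record with a known probability `p`. It is not the library's implementation, the helper names are hypothetical, and in practice `p` would come from the fitted model. Under that assumption, the record is unique when none of the other n - 1 draws matches it, and the chance of a correct match is the expected value of 1 / (1 + K), where K is the binomially distributed number of other matching records.

```python
def individual_uniqueness_from_prob(p: float, n: int) -> float:
    """Probability that none of the other n - 1 independent records matches."""
    return (1.0 - p) ** (n - 1)


def individual_correctness_from_prob(p: float, n: int) -> float:
    """Expected 1 / (1 + K) with K ~ Binomial(n - 1, p).

    The closed form of that expectation is (1 - (1 - p)**n) / (n * p).
    """
    if p == 0.0:
        return 1.0  # limit of the closed form as p tends to 0
    return (1.0 - (1.0 - p) ** n) / (n * p)


# Five attributes with four equally likely values each: p = 1 / 4**5.
p_uniform = 1 / 4**5
print(individual_uniqueness_from_prob(p_uniform, 1000))
print(individual_correctness_from_prob(p_uniform, 1000))
```

For a uniform distribution over 1,024 combinations and n = 1000, these expressions evaluate to roughly 0.377 and 0.639, in the same range as the 0.395 and 0.652 shown above for a model fitted to random data.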

In the demo/ folder, we have compiled more examples with real-world datasets.

License

GNU General Public License v3.0

See LICENSE for the full text.

Patent-pending code. Additional support and details are available for commercial uses.

Download files

Source Distribution

correctmatch-1.4.1.tar.gz (188.8 kB view details)

Uploaded Source

Built Distribution

correctmatch-1.4.1-py3-none-any.whl (16.4 kB view details)

Uploaded Python 3

File details

Details for the file correctmatch-1.4.1.tar.gz.

File metadata

  • Download URL: correctmatch-1.4.1.tar.gz
  • Size: 188.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.18

File hashes

Hashes for correctmatch-1.4.1.tar.gz
Algorithm Hash digest
SHA256 306e25ba12b2568d75ce611c6057db3e9773497975a66f4d6461fc28484951d9
MD5 8178998b56521055ae925b3703bc619a
BLAKE2b-256 c00d3c1839f8341ef20010941acb9a6f93f37c53347d95880d7fc6c6dd342b9a

File details

Details for the file correctmatch-1.4.1-py3-none-any.whl.

File metadata

  • Download URL: correctmatch-1.4.1-py3-none-any.whl
  • Size: 16.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.18

File hashes

Hashes for correctmatch-1.4.1-py3-none-any.whl
Algorithm Hash digest
SHA256 da3168f43ab0918fd21097cffb863c343252c59d06483ad9fdd8619ded8614d5
MD5 a87f1b8c1be18078fadb191846365cef
BLAKE2b-256 880dc7f8f8f63e49bd9eb5be9acfa59efd840fb53f7af55d1f97ba684390d68a
