Base abstract data types for Entity Resolution

matchescu-base

This Python package includes common abstract data types (adt package), utilities (common package) and generic data extraction algorithms for entity resolution. These abstractions are used in other packages such as:

  • matchescu-reference-extraction: extracts entity references from data sources
  • matchescu-reference-stores: stores entity references efficiently
  • matchescu-comparison-space-generation: generates the comparison space used for matching or clustering
  • matchescu-matching: various methods of scoring the similarity of entity references
  • matchescu-clustering: various methods of scoring the colocation of entity references
  • matchescu-profile-assembly: algorithms used to build concrete entity profiles from specific data structures (tuples, lists or graphs)

On its own, the package can serve as a foundation for other structured approaches to entity resolution, particularly those based on the Resolvi reference architecture.
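To make the stages named in the package list above more concrete, here is a minimal, self-contained sketch of how reference extraction, comparison space generation, matching and clustering can fit together. Every name in it (`EntityReference`, `extract_references`, `comparison_space`, `jaccard`, `cluster`) is hypothetical and chosen for illustration only; none of it is the API of matchescu-base or of its sibling packages, and the similarity measure and clustering strategy are assumptions.

```python
from dataclasses import dataclass
from itertools import combinations
from typing import Hashable, Iterable, Iterator

# NOTE: all names below are hypothetical. They illustrate the pipeline stages
# only and are NOT part of matchescu-base or any other matchescu package.


@dataclass(frozen=True)
class EntityReference:
    """A record that refers to some real-world entity."""

    id: Hashable
    attributes: tuple[str, ...]


def extract_references(
    rows: Iterable[tuple[Hashable, tuple[str, ...]]],
) -> list[EntityReference]:
    """Reference extraction: turn raw (id, values) rows into references."""
    return [EntityReference(id=row_id, attributes=values) for row_id, values in rows]


def comparison_space(
    refs: list[EntityReference],
) -> Iterator[tuple[EntityReference, EntityReference]]:
    """Comparison space generation: here, naively, every unordered pair."""
    return combinations(refs, 2)


def jaccard(left: EntityReference, right: EntityReference) -> float:
    """Matching: Jaccard similarity over case-folded attribute values."""
    a = {value.lower() for value in left.attributes}
    b = {value.lower() for value in right.attributes}
    if not a and not b:
        return 0.0
    return len(a & b) / len(a | b)


def cluster(refs: list[EntityReference], threshold: float) -> list[set[Hashable]]:
    """Clustering: single-link grouping of pairs scoring at least `threshold`."""
    parent = {ref.id: ref.id for ref in refs}

    def find(item: Hashable) -> Hashable:
        # Follow parent links to the cluster root, compressing the path.
        while parent[item] != item:
            parent[item] = parent[parent[item]]
            item = parent[item]
        return item

    for left, right in comparison_space(refs):
        if jaccard(left, right) >= threshold:
            parent[find(left.id)] = find(right.id)

    groups: dict[Hashable, set[Hashable]] = {}
    for ref in refs:
        groups.setdefault(find(ref.id), set()).add(ref.id)
    return list(groups.values())
```

Comparing every pair is quadratic in the number of references, which is why comparison space generation is treated as a stage of its own in the list above; a real pipeline would normally prune the pairs before scoring them.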

Set up dev environment

  1. (optional) install pyenv
  2. install Python 3.11
  3. install Poetry
  4. clone this repository
  5. install the project dependencies with Poetry from the repository root:
$ cd <REPO_ROOT>
$ poetry install

Run tests

$ poetry run pytest

Activate virtual environment

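$ poetry shell

(Poetry 2.0 and later no longer bundle this command; it is provided by the poetry-plugin-shell plugin.)

-or-

$ source .venv/bin/activate

(This path applies when Poetry is configured to create the virtual environment inside the project, i.e. virtualenvs.in-project is enabled.)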

Download files

Source Distribution

matchescu_base-0.26.0.tar.gz (7.7 kB)

Uploaded Source

Built Distribution

matchescu_base-0.26.0-py3-none-any.whl (11.3 kB)

Uploaded Python 3

File details

Details for the file matchescu_base-0.26.0.tar.gz.

File metadata

  • Download URL: matchescu_base-0.26.0.tar.gz
  • Upload date:
  • Size: 7.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.13.5 Linux/6.17.0-1008-azure

File hashes

Hashes for matchescu_base-0.26.0.tar.gz
Algorithm    Hash digest
SHA256       ec42648ec1e8e0857c54a8ff03b5f52f10942aa92b6900d18c8cf4787d29c3a0
MD5          37886a15209707d259e62e2b53f33e3e
BLAKE2b-256  1ca5839e68902bbb554000a7ec33b431a19fe9448d5d918005f4e68781eae027

File details

Details for the file matchescu_base-0.26.0-py3-none-any.whl.

File metadata

  • Download URL: matchescu_base-0.26.0-py3-none-any.whl
  • Upload date:
  • Size: 11.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.13.5 Linux/6.17.0-1008-azure

File hashes

Hashes for matchescu_base-0.26.0-py3-none-any.whl
Algorithm    Hash digest
SHA256       4d02984fd41b80fca4ba07b7b6968aa902c38dbd9d62f4b75fc007cc0176ce6a
MD5          2bfe7f9019423a1821a5fc5f94ab36a0
BLAKE2b-256  5a48818d2d1f4db1e76554e501f115b3c998ed8ed1eb78d0fe0e84258e9359d0
