Extract references from a multitude of data sources

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. Its main concepts are:

  • a generic attribute-based data record implementation (values can be accessed by str or int key),
  • various data_sources which support reading records from different data stores, and
  • generic extraction_engines that convert data records to entity references.
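
To make the first concept concrete, here is a minimal sketch of a record whose values can be read by attribute name or by position. It is not the package's implementation: the `Record` class and its constructor arguments are hypothetical names chosen for illustration.

```python
from typing import Any, Sequence, Union


class Record:
    """Hypothetical attribute-based record; not the matchescu implementation."""

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._names = tuple(names)
        self._values = tuple(values)
        # Map each attribute name to its position for str-key lookups.
        self._index = {name: i for i, name in enumerate(self._names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        # An int key is treated as a position, a str key as an attribute name.
        if isinstance(key, int):
            return self._values[key]
        return self._values[self._index[key]]

    def __len__(self) -> int:
        return len(self._values)


record = Record(["id", "name", "price"], [1, "widget", 9.99])
print(record["name"], record[2])
```

Keeping a name-to-position map next to the value tuple means both kinds of lookup read from the same underlying data.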

Development

Run the following commands to set up a development environment and run the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Describe the typed columns expected in the CSV file.
traits = list(
    Traits()
    .int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it is passed as-is.
data_source = CsvDataSource("./path/to/csv/file", traits)
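
The trait names in the example above suggest that each CSV column is read as a particular type. The sketch below shows that general idea using only the Python standard library. It does not use or reproduce the matchescu API: `COLUMN_PARSERS`, `parse_currency`, and `read_typed_rows` are hypothetical names, and the currency format is an assumption.

```python
import csv
import io
from decimal import Decimal
from typing import Any, Callable, Dict, Iterator, TextIO


def parse_currency(text: str) -> Decimal:
    # Assumed format: optional leading currency symbol, comma thousands separators.
    return Decimal(text.strip().lstrip("$€£").replace(",", ""))


# Hypothetical column-to-parser mapping mirroring the columns in the usage example.
COLUMN_PARSERS: Dict[str, Callable[[str], Any]] = {
    "id": int,
    "name": str,
    "description": str,
    "manufacturer": str,
    "price": parse_currency,
}


def read_typed_rows(stream: TextIO) -> Iterator[Dict[str, Any]]:
    # csv.DictReader takes the column names from the header row.
    for row in csv.DictReader(stream):
        yield {col: parse(row[col]) for col, parse in COLUMN_PARSERS.items()}


sample = io.StringIO(
    "id,name,description,manufacturer,price\n"
    '1,Widget,A small widget,Acme,"$1,299.50"\n'
)
rows = list(read_typed_rows(sample))
print(rows[0])
```

Parsing prices into `Decimal` rather than `float` avoids binary rounding errors when values are later compared.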

Source Distribution

matchescu_reference_extraction-0.9.2.tar.gz (4.1 kB)

File details

Details for the file matchescu_reference_extraction-0.9.2.tar.gz.

File metadata

  • Download URL: matchescu_reference_extraction-0.9.2.tar.gz
  • Upload date:
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.13.5 Linux/6.17.0-1010-azure

File hashes

Hashes for matchescu_reference_extraction-0.9.2.tar.gz
Algorithm    Hash digest
SHA256       d69eaae859990c3ca9398aa07f3c99890f7b9b32908223ae6cc2a88511a96974
MD5          6721501cc28c5ae8b7726279e5f29ee6
BLAKE2b-256  dac9dbead33088545def9dcef5710837145d7879e3600def4c3b5641612e28dd

File details

Details for the file matchescu_reference_extraction-0.9.2-py3-none-any.whl.

File hashes

Hashes for matchescu_reference_extraction-0.9.2-py3-none-any.whl
Algorithm    Hash digest
SHA256       3a318e82034579249764f78710ac1a478dc88014e3fa2a92ecbbcf3c335bada9
MD5          f321e4f8755d8fb2712c07ca4ef7df45
BLAKE2b-256  b2d8c558b7dd8803c8301a0ca232ee4136b5b017353c73efe2ae872000d7ee19
