Extract references from a multitude of data sources

Project description

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. It is built around three concepts:

  • a generic, attribute-based data record whose values can be accessed by str or int key,
  • various data_sources, which read records from different data stores, and
  • generic extraction_engines, which convert data records to entity references.
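
To make the first concept concrete, here is a minimal sketch of a record that can be read by attribute name (str) or by position (int). The Record class below is hypothetical and written only for illustration; it is not the package's actual record type, whose name and constructor are not shown in this README.

```python
from typing import Any, Iterable, Union


class Record:
    """Hypothetical attribute-based record; not the package's actual class."""

    def __init__(self, names: Iterable[str], values: Iterable[Any]) -> None:
        self._names = list(names)
        self._values = list(values)
        if len(self._names) != len(self._values):
            raise ValueError("names and values must have the same length")
        # Map each attribute name to its position for name-based lookup.
        self._index = {name: i for i, name in enumerate(self._names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        # A str key is resolved through the name index; an int key is a position.
        if isinstance(key, str):
            return self._values[self._index[key]]
        return self._values[key]

    def __len__(self) -> int:
        return len(self._values)


record = Record(["id", "name", "price"], [1, "widget", "9.99"])
assert record["name"] == record[1] == "widget"
```

Storing values positionally and keeping a separate name-to-position map lets both access styles share one underlying list.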

Development

Run the following commands to set up a development environment and confirm that the test suite passes (pyenv and Poetry must already be installed).

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

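# Declare the typed traits for the CSV columns.
traits = list(
    Traits().int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it is passed as-is; the variable is not named
# csv to avoid shadowing the standard library module.
csv_source = CsvDataSource("./path/to/csv/file", traits)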
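
For readers unfamiliar with typed column descriptions, the sketch below shows the general idea using only the standard library: each named column is paired with a converter, and every CSV row is converted as it is read. The CONVERTERS mapping, the read_typed_rows function, and the sample data are all assumptions made for illustration. They mirror the int/string/currency split in the usage example but do not reproduce the package's Traits or CsvDataSource behavior.

```python
import csv
import io
from decimal import Decimal

# Hypothetical column-to-converter mapping; it mirrors the int/string/currency
# split in the usage example but is not the package's Traits API.
CONVERTERS = {
    "id": int,
    "name": str,
    "description": str,
    "manufacturer": str,
    # Assumption: currency values may carry a leading dollar sign.
    "price": lambda text: Decimal(text.lstrip("$")),
}


def read_typed_rows(stream):
    """Yield one dict per CSV row with each known column converted."""
    for row in csv.DictReader(stream):
        yield {col: convert(row[col]) for col, convert in CONVERTERS.items()}


# Made-up sample data, used only to exercise the sketch.
sample = io.StringIO(
    "id,name,description,manufacturer,price\n"
    "1,Widget,A small widget,Acme,$9.99\n"
)
rows = list(read_typed_rows(sample))
```

Using Decimal rather than float for the price column avoids binary rounding errors in monetary values.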
