Extract references from a multitude of data sources

Project description

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. Its main concepts are:

  • a generic, attribute-based data record whose values can be accessed by str or int key,
  • data_sources, which read records from different data stores, and
  • generic extraction_engines, which convert data records to entity references.
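
To illustrate the first concept, here is a minimal sketch of a record that can be read by either a str key (attribute name) or an int key (position). The class name SketchRecord and its constructor are hypothetical and are not part of this package's API; the package's own record type may be built quite differently.

```python
from collections.abc import Sequence
from typing import Any, Union


class SketchRecord:
    """Hypothetical record: values are reachable by attribute name or by position."""

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._values = tuple(values)
        # Map each attribute name to its position so both key kinds share one store.
        self._index = {name: pos for pos, name in enumerate(names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        if isinstance(key, int):
            return self._values[key]
        return self._values[self._index[key]]

    def __len__(self) -> int:
        return len(self._values)


record = SketchRecord(["id", "name", "price"], [1, "widget", "9.99"])
print(record["name"], record[1])  # both lookups reach the same value
```

Keeping a single tuple of values plus a name-to-position map means the two access styles can never disagree about a value.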

Development

Run the following commands to set up a working environment (Python 3.12 via pyenv, dependencies via Poetry) and confirm it by running the tests.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

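from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Declare how each CSV column should be interpreted.
traits = list(
    Traits()
    .int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it is passed as-is.
# The name csv_source avoids shadowing the standard library's csv module.
csv_source = CsvDataSource("./path/to/csv/file", traits)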
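
For orientation, the sketch below shows the kind of CSV file the usage example appears to expect, and one plausible reading of the trait kinds. It uses only the standard library. The sample rows, the CONVERTERS mapping, and the read_typed_rows helper are all hypothetical, and treating a currency column as a Decimal is an assumption; the package's actual conversions may differ.

```python
import csv
import io
from decimal import Decimal

# Hypothetical sample data using the column names from the usage example.
SAMPLE_CSV = """id,name,description,manufacturer,price
1,Widget,A small widget,Acme,9.99
2,Gadget,A useful gadget,Globex,24.50
"""

# Assumed meaning of each trait kind; not taken from the package itself.
CONVERTERS = {
    "id": int,
    "name": str,
    "description": str,
    "manufacturer": str,
    "price": Decimal,
}


def read_typed_rows(text: str) -> list[dict]:
    """Parse CSV text and convert each column with its assumed converter."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {column: CONVERTERS[column](value) for column, value in row.items()}
        for row in reader
    ]


rows = read_typed_rows(SAMPLE_CSV)
for row in rows:
    print(row["id"], row["name"], row["price"])
```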

Source Distribution

matchescu_reference_extraction-0.9.5.tar.gz (4.2 kB view details)

Uploaded Source

Built Distribution

File details

Details for the file matchescu_reference_extraction-0.9.5.tar.gz.

File metadata

  • Download URL: matchescu_reference_extraction-0.9.5.tar.gz
  • Upload date:
  • Size: 4.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/2.1.3 CPython/3.13.5 Linux/6.17.0-1010-azure

File hashes

Hashes for matchescu_reference_extraction-0.9.5.tar.gz
Algorithm    Hash digest
SHA256       099b3686a22a1c60cec375d2f2ca4eaa889bb2e7661a8c15e325e284a7f85807
MD5          a315ead262a8f3757bff5f93131b40f1
BLAKE2b-256  3a02420d1910ca3da541a21cba2ce08e1b00e0932782440ac2405811acbec120

File details

Details for the file matchescu_reference_extraction-0.9.5-py3-none-any.whl.

File hashes

Hashes for matchescu_reference_extraction-0.9.5-py3-none-any.whl
Algorithm    Hash digest
SHA256       95076377fe21aa56b1615cb8f6cf4aafc90f57f0d1ffa70af4726ee4469e19b9
MD5          0020dfdcf702c4da126d11f75eab7e71
BLAKE2b-256  38aac1c30790f0d9d763b9ffd5eabe04d4cbb3ed1e56ac3100b2bcd02dbc8bcc
