
Extract references from a multitude of data sources

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. Its main concepts are:

  • a generic attribute-based data record implementation (values can be accessed by str or int key),
  • various data_sources which support reading records from different data stores, and
  • generic extraction_engines that convert data records to entity references.
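
To make the first concept concrete, here is a minimal sketch of a record whose values can be read either by attribute name or by position. The Record class below is hypothetical and exists only for illustration; it is not the package's actual implementation, whose class names and signatures may differ.

```python
from typing import Any, Sequence, Union


class Record:
    """Hypothetical attribute-based record (illustration only)."""

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._names = list(names)
        self._values = list(values)
        # Map each attribute name to its position for str-key lookups.
        self._index = {name: pos for pos, name in enumerate(self._names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        # A str key is resolved to a position first; an int key is used directly.
        if isinstance(key, str):
            return self._values[self._index[key]]
        return self._values[key]

    def __len__(self) -> int:
        return len(self._values)


record = Record(["id", "name"], [1, "widget"])
print(record["name"], record[0])
```

Supporting both key types lets the same record serve sources with named columns and sources where only the position of a value is known.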

Development

Run the following commands to set up the development environment and run the test suite (they assume pyenv and Poetry are already installed).

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

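# Declare the typed columns to read from the CSV file.
traits = list(
    Traits().int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it is passed as-is.
csv = CsvDataSource("./path/to/csv/file", traits)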
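
If it helps to see the idea behind typed column declarations outside the package, the following sketch uses only the Python standard library. It does not call matchescu at all: the CONVERTERS mapping, the read_typed_rows helper, the sample data and the way the currency column is parsed are all assumptions made for illustration, and the package itself may handle these values differently.

```python
import csv
import io
from decimal import Decimal

# Hypothetical converters, one per column. They stand in for the idea of
# trait declarations and are NOT part of the matchescu API.
CONVERTERS = {
    "id": int,
    "name": str,
    # Assumption: currency values may carry a leading "$" sign.
    "price": lambda text: Decimal(text.lstrip("$")),
}

# Made-up sample data, used only to exercise the sketch.
SAMPLE = "id,name,price\n1,widget,$9.99\n2,gadget,12.50\n"


def read_typed_rows(text):
    """Parse CSV text and convert each listed column with its converter."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {column: convert(row[column]) for column, convert in CONVERTERS.items()}
        for row in reader
    ]


rows = read_typed_rows(SAMPLE)
print(rows[0])
```

Declaring the column types up front keeps the parsing rules in one place, which is the same motivation as building the traits list before constructing the data source.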

