Extract references from a multitude of data sources

Project description

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. It is built around three concepts:

  • a generic, attribute-based data record implementation whose values can be accessed by str or int key,
  • various data_sources that read records from different data stores, and
  • generic extraction_engines that convert data records into entity references.
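
To make the three concepts concrete, here is a minimal standard-library sketch of how they could fit together. It is not the package's implementation: Record, extract_references and the sample data are hypothetical names invented for this illustration, and a plain list stands in for a data source.

```python
from dataclasses import dataclass
from typing import Any, Iterable, Iterator, Sequence, Tuple, Union


@dataclass(frozen=True)
class Record:
    """Hypothetical attribute-based record: values are reachable by name or position."""

    names: Tuple[str, ...]
    values: Tuple[Any, ...]

    def __getitem__(self, key: Union[str, int]) -> Any:
        # An int key is a position; a str key is looked up by attribute name.
        if isinstance(key, int):
            return self.values[key]
        return self.values[self.names.index(key)]


def extract_references(
    records: Iterable[Record], attributes: Sequence[str]
) -> Iterator[Tuple[Any, ...]]:
    """Hypothetical extraction step: project each record onto the chosen attributes."""
    for record in records:
        yield tuple(record[name] for name in attributes)


# Stand-in for a data source: any iterable of records will do here.
source = [
    Record(("id", "name", "price"), (1, "lamp", "19.99")),
    Record(("id", "name", "price"), (2, "desk", "120.00")),
]

# Each entity reference keeps only the attributes relevant for matching.
references = list(extract_references(source, ["id", "name"]))
```

In this sketch source[0]["name"] and source[0][1] return the same value, which is the kind of dual str/int access the first bullet describes.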

Development

Run the following commands to set up a development environment and check that the test suite passes.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

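from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Declare a type for each attribute that should be read from the file.
traits = list(
    Traits()
    .int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)

# traits is already a list, so it can be passed to the data source as is.
data_source = CsvDataSource("./path/to/csv/file", traits)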
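
The usage example assigns a type to each column. As a rough illustration of that idea only, the standard-library sketch below converts CSV columns according to a per-column kind. It does not use or reproduce the package's internals: CONVERTERS, COLUMN_KINDS, read_typed_rows and the sample row are hypothetical, and the currency parsing rule (strip a leading symbol and thousands separators) is an assumption.

```python
import csv
import io
from decimal import Decimal

# Hypothetical converters, one per kind named in the usage example.
CONVERTERS = {
    "int": int,
    "string": str,
    # Assumed currency format: optional leading symbol, commas as thousands separators.
    "currency": lambda text: Decimal(text.lstrip("$€£").replace(",", "")),
}

# Hypothetical column-to-kind mapping mirroring the usage example.
COLUMN_KINDS = {
    "id": "int",
    "name": "string",
    "description": "string",
    "manufacturer": "string",
    "price": "currency",
}


def read_typed_rows(text_stream):
    """Yield one dict per CSV row, converting every known column to its declared kind."""
    for row in csv.DictReader(text_stream):
        yield {
            column: CONVERTERS[COLUMN_KINDS[column]](value)
            for column, value in row.items()
            if column in COLUMN_KINDS
        }


# In-memory sample so the sketch runs without a file on disk.
sample = io.StringIO(
    "id,name,description,manufacturer,price\n"
    "1,lamp,desk lamp,Acme,$19.99\n"
)
rows = list(read_typed_rows(sample))
```

Using Decimal rather than float for the price column avoids binary rounding errors when monetary values are later compared.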

Download files

Source Distribution

matchescu_reference_extraction-0.4.0.tar.gz (4.6 kB)

Uploaded Source

Built Distribution

File details

Details for the file matchescu_reference_extraction-0.4.0.tar.gz.

File hashes

Hashes for matchescu_reference_extraction-0.4.0.tar.gz
Algorithm Hash digest
SHA256 b23af7395e4246adab15d676f1d9751ea21b2bcadd5b3fb0267096210c77e074
MD5 f995354c26ee5c8b2680a8bae2e746a1
BLAKE2b-256 af5f2eb898d1a3ad58f24d0a5e4ea7c4e071e199929345caf4349d6b8eee1f35

File details

Details for the file matchescu_reference_extraction-0.4.0-py3-none-any.whl.

File hashes

Hashes for matchescu_reference_extraction-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 ce22d5a5919063a49e15694b5c4efaf0666a77e615d50f2e3813a44a19836c9e
MD5 fcb7aaefc2aa0d2f2de71860f1af42e6
BLAKE2b-256 fd7cb5772d31ee9fff771be150ad79a1e68eecc6b2c384760057497e190e455e
