Extract references from a multitude of data sources

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. It is built around three concepts:

  • a generic, attribute-based data record implementation (values can be accessed by str or int key),
  • various data_sources which support reading records from different data stores, and
  • generic extraction_engines that convert data records to entity references.
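
To make the three concepts above more concrete, here is a minimal, self-contained sketch. It does not use this package's classes: `Record` and `extract_reference` are hypothetical names invented for illustration, and the real implementations may differ. It only shows the idea of a record that can be read by attribute name or by position, and of an extraction step that turns such a record into an entity reference.

```python
from typing import Any, Iterable, Union


class Record:
    """Hypothetical attribute-based record (not the package's actual class)."""

    def __init__(self, names: Iterable[str], values: Iterable[Any]) -> None:
        self._names = list(names)
        self._values = list(values)
        if len(self._names) != len(self._values):
            raise ValueError("names and values must have the same length")
        # Map each attribute name to its position for str-key lookups.
        self._index = {name: i for i, name in enumerate(self._names)}

    def __getitem__(self, key: Union[int, str]) -> Any:
        if isinstance(key, str):
            return self._values[self._index[key]]
        return self._values[key]

    def __len__(self) -> int:
        return len(self._values)


def extract_reference(record: Record, attributes: list) -> tuple:
    """Toy extraction step: project a record onto the selected attributes."""
    return tuple(record[name] for name in attributes)


record = Record(["id", "name", "price"], [1, "widget", "9.99"])
reference = extract_reference(record, ["name", "price"])
```

In this sketch `record["name"]` and `record[1]` return the same value, and `reference` is a plain tuple holding only the attributes selected for matching.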

Development

Run the following commands to set up a working environment and run the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

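from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Declare how each CSV column should be interpreted.
traits = list(
    Traits().int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# `traits` is already a list, so it can be passed as is.
# The variable is not named `csv` to avoid shadowing the standard library module.
data_source = CsvDataSource("./path/to/csv/file", traits)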
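
As a rough illustration of what typed column traits amount to, the sketch below reads CSV rows with the standard library only and converts each column according to its declared kind. It is an assumption about the general idea, not a description of how `CsvDataSource` or `Traits` work internally: `CONVERTERS` and `read_typed_rows` are hypothetical names, and the currency parsing rule (strip `$` and thousands separators) is chosen purely for the example.

```python
import csv
import io
from decimal import Decimal

# Hypothetical per-column converters mirroring the int/string/currency
# declarations in the usage example above.
CONVERTERS = {
    "id": int,
    "name": str,
    "description": str,
    "manufacturer": str,
    # Assumed currency format: optional "$" and "," thousands separators.
    "price": lambda text: Decimal(text.replace("$", "").replace(",", "")),
}


def read_typed_rows(handle):
    """Yield one dict per CSV row with values converted by column."""
    for row in csv.DictReader(handle):
        yield {column: convert(row[column]) for column, convert in CONVERTERS.items()}


# In-memory sample data so the sketch runs without a file on disk.
sample = io.StringIO(
    "id,name,description,manufacturer,price\n"
    '1,Widget,A small widget,Acme,"$1,299.50"\n'
)
rows = list(read_typed_rows(sample))
```

Here `rows[0]["id"]` is an `int` and `rows[0]["price"]` is a `Decimal`, which avoids the rounding problems of storing prices as floats.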
