Extract references from a multitude of data sources

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. Its main concepts are:

  • a generic attribute-based data record implementation (data can be accessed by str or int key),
  • various data_sources that support reading records from different data stores, and
  • generic extraction_engines that convert data records to entity references.

Development

Run the following commands to install Python 3.12, install the project dependencies with Poetry, and run the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Describe the type of each CSV column.
traits = list(
    Traits().int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it is passed as-is.
csv = CsvDataSource("./path/to/csv/file", traits)
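
To illustrate the idea of typed columns in the usage example above, here is a rough sketch that uses only the Python standard library. It assumes that traits conceptually map column names to value converters; the CONVERTERS table, the read_typed_rows helper, and the dollar-sign handling are all hypothetical and do not describe how the package itself works.

```python
import csv
import io
from decimal import Decimal

# Hypothetical column-to-converter table; an illustration of the idea behind
# traits, not the package's implementation.
CONVERTERS = {
    "id": int,
    "name": str,
    "price": lambda text: Decimal(text.lstrip("$")),
}


def read_typed_rows(stream):
    """Yield one dict per CSV row, converting each known column."""
    for row in csv.DictReader(stream):
        # Columns without a registered converter are kept as strings.
        yield {col: CONVERTERS.get(col, str)(value) for col, value in row.items()}


sample = io.StringIO("id,name,price\n1,widget,$9.99\n")
rows = list(read_typed_rows(sample))
print(rows)
```

Using Decimal rather than float for the price column avoids binary rounding errors when monetary values are compared later.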
