Extract references from a multitude of data sources

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. Its main concepts are:

  • a generic attribute-based data record implementation (data is accessible by str or int key),
  • various data_sources, which support reading records from different data stores, and
  • generic extraction_engines, which convert data records to entity references.
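
To illustrate the first concept, here is a minimal sketch of a record that can be read either by attribute name or by position. `SketchRecord` is a hypothetical name invented for this example and is not the package's own class; it only shows the access pattern described above.

```python
from typing import Any, Sequence, Union


class SketchRecord:
    """Hypothetical attribute-based record; not part of matchescu."""

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._values = list(values)
        # Map each attribute name to its position for lookups by str key.
        self._index = {name: pos for pos, name in enumerate(names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        # An int key is treated as a position, a str key as an attribute name.
        if isinstance(key, int):
            return self._values[key]
        return self._values[self._index[key]]

    def __len__(self) -> int:
        return len(self._values)


record = SketchRecord(["id", "name"], [1, "widget"])
print(record["name"], record[0])
```

Keeping a name-to-position index next to the value list is one simple way to make both kinds of lookup cheap; the package may well do this differently.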

Development

Run the following commands to set up a development environment and run the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

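from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Declare the type of each CSV column that should be read.
traits = list(
    Traits().int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# `traits` is already a list, so it is passed to the data source as-is.
# The variable is named `data_source` to avoid shadowing the stdlib `csv` module.
data_source = CsvDataSource("./path/to/csv/file", traits)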
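
The package's internals are not shown here, so the following is only a sketch of the idea behind column traits: each named column is paired with a converter that is applied when a row is read. `COLUMN_CONVERTERS` and `read_typed_rows` are hypothetical names, the standard-library `csv` module stands in for `CsvDataSource`, and representing currency as `Decimal` is an assumption.

```python
import csv
import io
from decimal import Decimal
from typing import Any, Callable, Dict, Iterator, TextIO

# Hypothetical mapping from column name to converter (an assumption,
# not the library's own representation of traits).
COLUMN_CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "id": int,
    "name": str,
    "description": str,
    "manufacturer": str,
    "price": Decimal,
}


def read_typed_rows(stream: TextIO) -> Iterator[Dict[str, Any]]:
    """Yield one dict per CSV row with each declared column converted."""
    for row in csv.DictReader(stream):
        yield {
            column: convert(row[column])
            for column, convert in COLUMN_CONVERTERS.items()
        }


# In-memory sample data, made up for this example.
sample = io.StringIO(
    "id,name,description,manufacturer,price\n"
    "1,widget,a small widget,acme,9.99\n"
)
rows = list(read_typed_rows(sample))
print(rows[0]["id"], rows[0]["price"])
```

Declaring column types up front, as the `Traits` builder does, keeps the conversion rules in one place instead of scattering casts through the code that consumes the records.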
