Extract references from a multitude of data sources

Project description

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. Its main concepts are:

  • a generic attribute-based data record implementation whose values can be accessed by str or int key,
  • various data_sources, which support reading records from different data stores, and
  • generic extraction_engines, which convert data records to entity references.
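
To make the first concept concrete, here is a minimal sketch of a record whose values can be read either by attribute name (str key) or by position (int key). The Record class below is hypothetical and written only for illustration; it is not the package's actual implementation, and the attribute names and values are made up.

```python
from collections.abc import Iterator, Sequence
from typing import Any, Union


class Record:
    """Hypothetical record type; NOT the package's actual class."""

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._values = tuple(values)
        # Map each attribute name to its position for str-key lookups.
        self._index = {name: i for i, name in enumerate(names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        # An int key is a position; a str key is an attribute name.
        if isinstance(key, int):
            return self._values[key]
        return self._values[self._index[key]]

    def __len__(self) -> int:
        return len(self._values)

    def __iter__(self) -> Iterator[Any]:
        return iter(self._values)


# Made-up sample data, for illustration only.
record = Record(["id", "name", "price"], [1, "widget", "9.99"])
```

With this sketch, record["name"] and record[1] refer to the same value, which is the dual access described in the list above.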

Development

Run the following commands to install Python 3.12 and the project dependencies, then run the test suite to confirm the environment works.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Declare a type for each column: int, string, or currency.
traits = list(
    Traits()
    .int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it is passed as-is.
csv = CsvDataSource("./path/to/csv/file", traits)
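
Trying the snippet above requires a CSV file with the same columns. The sketch below writes such a file using only the Python standard library. The file name products.csv and the rows are made up, and the header row is an assumption: this page does not say whether CsvDataSource expects one.

```python
import csv
import tempfile
from pathlib import Path

# Column names taken from the traits in the usage example.
COLUMNS = ["id", "name", "description", "manufacturer", "price"]

# Made-up sample rows, for illustration only.
ROWS = [
    [1, "widget", "a small widget", "ACME", "9.99"],
    [2, "gadget", "a large gadget", "ACME", "19.50"],
]


def write_sample_csv(path: Path) -> Path:
    """Write the header and the sample rows to path, then return path."""
    # newline="" lets the csv module control line endings.
    with path.open("w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(COLUMNS)  # assumed header row
        writer.writerows(ROWS)
    return path


# Write the file into a fresh temporary directory.
sample_path = write_sample_csv(Path(tempfile.mkdtemp()) / "products.csv")
```

The resulting sample_path can then be used in place of "./path/to/csv/file".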

Download files

Source Distribution

matchescu_reference_extraction-0.6.1.tar.gz (4.1 kB)

Built Distribution

matchescu_reference_extraction-0.6.1-py3-none-any.whl

File details

Details for the file matchescu_reference_extraction-0.6.1.tar.gz.

File hashes

Hashes for matchescu_reference_extraction-0.6.1.tar.gz

Algorithm    Hash digest
SHA256       f7cc5b999a2094496d2c7237b479531b94d9a154b47ccee8adef24cb9f167a98
MD5          dcc39b66522df3ec83c3bc14d22bec9b
BLAKE2b-256  f387a5f2fee9bda35331a1ddf038178ae59b584b53affa2bf81c2ad9d6e6debe

File details

Details for the file matchescu_reference_extraction-0.6.1-py3-none-any.whl.

File hashes

Hashes for matchescu_reference_extraction-0.6.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       2f9b2d07da47834ad8d10616d58dd0e71bfbe699f139d521f671fc1f56b773c6
MD5          e93697e3303bc8f0172a73d10eb4f7d2
BLAKE2b-256  e5de88ded89e344e30183349904f6f279e648ef4e6a30218abc93247dc72f020
