Extract references from a multitude of data sources

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. It is built around three concepts:

  • a generic, attribute-based data record implementation whose values can be accessed by str or int key,
  • various data_sources, which support reading records from different data stores, and
  • generic extraction_engines, which convert data records to entity references.

Development

Run the following commands to install Python 3.12 with pyenv, install the project dependencies with Poetry, and run the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Describe the columns to read and the type of each one.
traits = list(
    Traits()
    .int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it is passed as-is.
csv_source = CsvDataSource("./path/to/csv/file", traits)
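
As a rough illustration of the idea behind the snippet above (declaring a type per column and applying it while reading), the following sketch uses only the Python standard library. The CONVERTERS mapping and the read_typed_rows function are hypothetical names; this is not the package's Traits or CsvDataSource implementation, and the use of Decimal for currency is an assumption.

```python
import csv
import io
from decimal import Decimal

# Hypothetical column-to-converter mapping; not the package's Traits API.
CONVERTERS = {
    "id": int,
    "name": str,
    "price": Decimal,  # assumption: currency values parsed as Decimal
}


def read_typed_rows(text_stream, converters):
    """Yield one dict per CSV row, converting each listed column."""
    for row in csv.DictReader(text_stream):
        yield {column: convert(row[column]) for column, convert in converters.items()}


# In-memory sample data standing in for a CSV file on disk.
sample = io.StringIO("id,name,price\n1,keyboard,49.99\n2,mouse,19.50\n")
rows = list(read_typed_rows(sample, CONVERTERS))
```

Columns that are not listed in the mapping are skipped in this sketch, which keeps the resulting rows limited to the declared attributes.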
