Extract references from a multitude of data sources

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. It is built around three concepts:

  • a generic attribute-based data record implementation, whose values can be accessed by str or int key,
  • various data_sources, which support reading records from different data stores, and
  • generic extraction_engines, which convert data records to entity references.
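
To make the first bullet concrete, the sketch below shows one way a record could expose the same value by attribute name or by position. `SketchRecord` is a hypothetical class written only for illustration. It is not the package's record implementation, whose API is not shown here.

```python
from typing import Any, Sequence, Union


class SketchRecord:
    """Hypothetical record: values reachable by column name (str) or position (int)."""

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._values = list(values)
        # Map each column name to its position so str lookups reuse the value list.
        self._index = {name: pos for pos, name in enumerate(names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        if isinstance(key, int):
            return self._values[key]
        return self._values[self._index[key]]

    def __len__(self) -> int:
        return len(self._values)


record = SketchRecord(["id", "name"], [1, "widget"])
# Both lookups reach the same stored value.
same_value = record["name"] == record[1]
```

Keeping a single value list and a name-to-position index means both key types read from the same storage, so the two views cannot drift apart.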

Development

Run the following commands to set up the development environment and verify it by running the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

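# Declare the type of each column the data source should read.
traits = list(
    Traits()
    .int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# `traits` is already a list, so it can be passed as-is.
csv = CsvDataSource("./path/to/csv/file", traits)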
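
The trait list in the usage example assigns a type to each column. As a rough illustration of that idea using only the standard library, the sketch below converts CSV columns with per-column converters. The `CONVERTERS` mapping and the `read_typed_rows` helper are hypothetical and do not reflect how `Traits` or `CsvDataSource` work internally. Parsing the currency column as `Decimal` is likewise an assumption.

```python
import csv
import io
from decimal import Decimal

# Hypothetical per-column converters mirroring the column types in the usage example.
CONVERTERS = {
    "id": int,
    "name": str,
    "description": str,
    "manufacturer": str,
    "price": Decimal,  # assumption: currency values are parsed as exact decimals
}


def read_typed_rows(text_stream):
    """Yield one dict per CSV row, converting each known column to its type."""
    for row in csv.DictReader(text_stream):
        # Columns without a registered converter are kept as strings.
        yield {col: CONVERTERS.get(col, str)(val) for col, val in row.items()}


sample = io.StringIO(
    "id,name,description,manufacturer,price\n"
    "1,widget,a small widget,ACME,9.99\n"
)
rows = list(read_typed_rows(sample))
```

Declaring column types once, up front, keeps the parsing rules in one place instead of scattering conversions across the code that consumes the records.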

Download files

Source Distribution

matchescu_reference_extraction-0.6.3.tar.gz (4.1 kB)
