Extract references from a multitude of data sources

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. Its main concepts are:

  • a generic attribute-based data record implementation (values can be accessed by str or int key),
  • various data_sources which support reading records from different data stores, and
  • generic extraction_engines that convert data records to entity references.
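To illustrate the first concept, here is a minimal sketch of a record whose values can be read by str or int key. The class name SketchRecord and its constructor are hypothetical and written only for illustration; they are not this package's actual record class or API.

```python
from typing import Any, Sequence, Union


class SketchRecord:
    """Hypothetical record: values are reachable by attribute name or position."""

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._values = list(values)
        # Map each attribute name to its position so both key kinds resolve
        # to the same underlying value.
        self._index = {name: i for i, name in enumerate(names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        if isinstance(key, str):
            return self._values[self._index[key]]
        return self._values[key]

    def __len__(self) -> int:
        return len(self._values)


record = SketchRecord(["id", "name"], [1, "widget"])
```

With this sketch, record["name"] and record[1] return the same value, which is the access pattern the first bullet describes.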

Development

Run the following commands to set up the development environment and verify it by running the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Declare how each CSV column should be interpreted.
traits = list(
    Traits().int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it can be passed directly.
csv = CsvDataSource("./path/to/csv/file", traits)
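As a rough illustration of the idea behind typed column traits, the sketch below converts CSV text into typed rows using only the Python standard library. It assumes that a trait amounts to a per-column converter. The CONVERTERS mapping, the read_typed_rows helper, and the sample data are all hypothetical, and this is not how the package itself is implemented.

```python
import csv
import io
from decimal import Decimal

# Hypothetical column-to-converter mapping; the column names mirror the
# usage example, and Decimal stands in for a currency type.
CONVERTERS = {
    "id": int,
    "name": str,
    "price": Decimal,
}


def read_typed_rows(text):
    """Parse CSV text and convert each value with its column's converter."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {column: CONVERTERS[column](value) for column, value in row.items()}
        for row in reader
    ]


# Made-up sample data, for illustration only.
SAMPLE = "id,name,price\n1,widget,9.99\n2,gadget,19.50\n"
rows = read_typed_rows(SAMPLE)
```

Decimal is used for the price column because binary floats cannot represent most decimal currency amounts exactly.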
