Extract references from a multitude of data sources

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. It is built around three concepts:

  • a generic attribute-based data record implementation whose values can be accessed by str or int key,
  • various data_sources that read records from different data stores, and
  • generic extraction_engines that convert data records into entity references.
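
To make the first and last concepts concrete, here is a minimal sketch of a record that can be read by str or int key, plus a function that turns such a record into an entity reference. It does not use this package's classes: `Record` and `extract_reference` are hypothetical names, and representing an entity reference as a tuple of attribute values is an assumption made only for illustration.

```python
from typing import Any, Iterable, Tuple, Union


class Record:
    """Hypothetical attribute-based record (not the package's own class)."""

    def __init__(self, names: Iterable[str], values: Iterable[Any]) -> None:
        self._names = list(names)
        self._values = list(values)
        if len(self._names) != len(self._values):
            raise ValueError("names and values must have the same length")
        # Map each attribute name to its position for str-key lookups.
        self._index = {name: i for i, name in enumerate(self._names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        # A str key is resolved through the name index, an int key is positional.
        if isinstance(key, str):
            return self._values[self._index[key]]
        return self._values[key]

    def __len__(self) -> int:
        return len(self._values)


def extract_reference(record: Record, attributes: Iterable[str]) -> Tuple[Any, ...]:
    """Assumed shape of an entity reference: a tuple of selected attribute values."""
    return tuple(record[name] for name in attributes)


record = Record(["id", "name", "price"], [1, "widget", "9.99"])
reference = extract_reference(record, ["id", "name"])
```

Here `record["name"]` and `record[1]` return the same value, so callers can use whichever key type suits the data store they read from.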

Development

Run the following commands to set up a working development environment and run the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

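# Describe how each CSV column should be interpreted.
traits = list(
    Traits()
    .int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it can be passed to the data source directly.
data_source = CsvDataSource("./path/to/csv/file", traits)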
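
The usage example above suggests that traits tell the data source how to interpret each column. The sketch below illustrates that idea with the standard library only, assuming that is what traits are for. It is not how CsvDataSource is implemented: `PARSERS`, `SAMPLE` and `read_typed_rows` are hypothetical names, and mapping the currency trait to `Decimal` is an assumption.

```python
import csv
import io
from decimal import Decimal

# Hypothetical column-to-parser mapping that mirrors the traits declared above.
PARSERS = {
    "id": int,
    "name": str,
    "description": str,
    "manufacturer": str,
    "price": Decimal,  # assumption: currency values are parsed as exact decimals
}

# Made-up sample data standing in for a CSV file on disk.
SAMPLE = (
    "id,name,description,manufacturer,price\n"
    "1,Widget,A small widget,Acme,9.99\n"
)


def read_typed_rows(stream):
    """Read CSV rows and convert each column with its declared parser."""
    reader = csv.DictReader(stream)
    return [
        {column: PARSERS[column](value) for column, value in row.items()}
        for row in reader
    ]


rows = read_typed_rows(io.StringIO(SAMPLE))
```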

