Extract references from a multitude of data sources

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. It is built around three concepts:

  • a generic attribute-based data record implementation (values can be accessed by str or int key),
  • various data_sources, which support reading records from different data stores, and
  • generic extraction_engines, which convert data records to entity references.
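
To make the first concept concrete, here is a minimal sketch of a record whose values can be read by attribute name (str) or by position (int). The SketchRecord class is hypothetical and written only for illustration; it is not the package's actual record implementation, whose API may differ.

```python
from typing import Any, Iterable, Union


class SketchRecord:
    """Hypothetical record: values are addressable by name or by position."""

    def __init__(self, names: Iterable[str], values: Iterable[Any]) -> None:
        self._names = list(names)
        self._values = list(values)
        if len(self._names) != len(self._values):
            raise ValueError("names and values must have the same length")

    def __getitem__(self, key: Union[str, int]) -> Any:
        # An int key is treated as a position, a str key as an attribute name.
        if isinstance(key, int):
            return self._values[key]
        try:
            return self._values[self._names.index(key)]
        except ValueError:
            raise KeyError(key) from None

    def __len__(self) -> int:
        return len(self._values)


record = SketchRecord(["id", "name"], [1, "widget"])
by_name = record["name"]
by_position = record[1]
```

Both lookups return the same value, so downstream code can address attributes by whichever key is more convenient.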

Development

Run the following commands to set up a development environment and verify it by running the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

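# Declare the type of each CSV column that should be read.
traits = list(
    Traits().int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it can be passed to the data source as-is.
csv = CsvDataSource("./path/to/csv/file", traits)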
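
The idea of attaching a type to each column can be illustrated with the standard library alone. The sketch below is not how the package works internally: CONVERTERS and read_typed_rows are hypothetical names, and the assumption that a currency value is a number with an optional leading "$" is made up for this example.

```python
import csv
import io
from decimal import Decimal

# Hypothetical per-column converters standing in for the traits declared above.
CONVERTERS = {
    "id": int,
    "name": str,
    # Assumption for this sketch: prices look like "$9.99" or "9.99".
    "price": lambda text: Decimal(text.lstrip("$")),
}


def read_typed_rows(stream):
    """Yield one dict per CSV row, with each listed column converted to its type."""
    for row in csv.DictReader(stream):
        yield {column: convert(row[column]) for column, convert in CONVERTERS.items()}


# An in-memory file keeps the example self-contained.
sample = io.StringIO("id,name,price\n1,widget,$9.99\n2,gadget,12.50\n")
rows = list(read_typed_rows(sample))
```

Each row comes back with an int id, a str name, and a Decimal price, which is the kind of typed record that an extraction step could then turn into an entity reference.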
