Extract references from a multitude of data sources

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. Its main concepts are:

  • a generic attribute-based data record implementation (values can be accessed by str or int key),
  • various data_sources which support reading records from different data stores, and
  • generic extraction_engines that convert data records to entity references.
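
To make the first concept concrete, here is a minimal sketch of a record whose values can be read either by attribute name (str) or by position (int). The class name SketchRecord and its constructor are hypothetical and exist only for illustration; they are not this package's API, and the package's own record type may differ.

```python
from typing import Any, Iterable, Union


class SketchRecord:
    """Hypothetical record (not the package's class): lookup by name or position."""

    def __init__(self, names: Iterable[str], values: Iterable[Any]) -> None:
        self._names = list(names)
        self._values = list(values)
        if len(self._names) != len(self._values):
            raise ValueError("names and values must have the same length")
        # Map each attribute name to the position of its value.
        self._index = {name: i for i, name in enumerate(self._names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        # A str key is resolved through the name index; an int key is positional.
        if isinstance(key, str):
            return self._values[self._index[key]]
        return self._values[key]

    def __len__(self) -> int:
        return len(self._values)


record = SketchRecord(["id", "name", "price"], [1, "widget", "9.99"])
by_name = record["name"]   # lookup by attribute name
by_position = record[1]    # lookup by position, same value as above
```

Keeping one list of values plus a name-to-position index means both kinds of key resolve to the same stored value, so the two access styles cannot drift apart.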

Development

Run the following commands to set up a development environment and confirm that the test suite passes.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

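from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Declare the columns to read, grouped by the kind of value each one holds.
traits = list(
    Traits().int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it is passed to the data source as-is.
csv = CsvDataSource("./path/to/csv/file", traits)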
