Extract references from a multitude of data sources

matchescu-reference-extraction

This package implements the entity reference extraction subsystem of an entity resolution pipeline. Its main concepts are:

  • a generic, attribute-based data record whose values can be accessed by str or int key,
  • various data_sources, which read records from different data stores, and
  • generic extraction_engines, which convert data records to entity references.
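
To make the three concepts concrete, here is a minimal sketch of how a record with str or int key access could feed a reference extraction step. The names Record and extract_reference are hypothetical and chosen for illustration only; they are not the package's actual classes, and the representation of an entity reference as a tuple is an assumption.

```python
from typing import Any, Sequence, Tuple, Union


class Record:
    """Hypothetical attribute-based record (not the package's class).

    Values can be looked up by attribute name (str) or by position (int).
    """

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._values = list(values)
        # Map each attribute name to its position for name-based lookups.
        self._index = {name: i for i, name in enumerate(names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        if isinstance(key, int):
            return self._values[key]
        return self._values[self._index[key]]

    def __len__(self) -> int:
        return len(self._values)


def extract_reference(record: Record, attributes: Sequence[str]) -> Tuple[Any, ...]:
    """Assumed extraction step: keep only the attributes that describe the entity."""
    return tuple(record[name] for name in attributes)


record = Record(["id", "name", "price"], [1, "Widget", "9.99"])
reference = extract_reference(record, ["name", "price"])
```

Here record["name"] and record[1] return the same value, and reference holds only the selected attributes.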

Development

Run the following commands to set up a development environment (Python 3.12 via pyenv, dependencies via Poetry) and to verify it by running the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Declare which columns to read and how to interpret each one.
traits = list(
    Traits()
    .int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it can be passed as-is.
data_source = CsvDataSource("./path/to/csv/file", traits)
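
The README does not show what a data source does with its traits. As a rough mental model only, the sketch below uses the standard library to read CSV rows and convert each column according to a declared type. The CONVERTERS mapping and read_typed_rows function are hypothetical stand-ins, and the currency parsing rule (strip "$" and thousands separators) is an assumption, not the package's documented behavior.

```python
import csv
import io
from decimal import Decimal

# Hypothetical column-to-converter mapping; it stands in for the traits
# passed to CsvDataSource and is not the package's API.
CONVERTERS = {
    "id": int,
    "name": str,
    "description": str,
    "manufacturer": str,
    # Assumed currency rule: drop the "$" sign and thousands separators.
    "price": lambda text: Decimal(text.replace("$", "").replace(",", "")),
}


def read_typed_rows(handle):
    """Yield one dict per CSV row, converting each known column."""
    for row in csv.DictReader(handle):
        yield {
            column: CONVERTERS.get(column, str)(value)
            for column, value in row.items()
        }


# In-memory sample data, so the sketch runs without a file on disk.
sample = io.StringIO(
    "id,name,description,manufacturer,price\n"
    '1,Widget,A small widget,Acme,"$1,299.50"\n'
)
rows = list(read_typed_rows(sample))
```

Each resulting row is a plain dict with id as an int and price as a Decimal, which is the kind of typed record an extraction engine could then turn into an entity reference.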
