Extract references from a multitude of data sources

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. It is built around three concepts:

  • a generic, attribute-based data record whose values can be accessed by str or int key,
  • various data_sources which support reading records from different data stores, and
  • generic extraction_engines that convert data records to entity references.

Development

Run the following commands to set up a development environment and check that the tests pass.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

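# Describe how the listed columns should be interpreted (int, string, currency).
traits = list(
    Traits().int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it is passed as-is; the variable is named
# data_source to avoid shadowing the standard library csv module.
data_source = CsvDataSource("./path/to/csv/file", traits)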
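
As a rough illustration of the idea behind typed column traits, the sketch below reads CSV text with the standard library and converts each column with its own converter. It does not use matchescu at all: the CONVERTERS mapping, the read_typed_rows helper, and the sample data are assumptions made for this example, and the package's own parsing rules (for currency in particular) may differ.

```python
import csv
import io
from decimal import Decimal

# Hypothetical converters, one per column; not taken from matchescu.
CONVERTERS = {
    "id": int,
    "name": str,
    # Assumed currency format: optional leading "$" and "," thousands separators.
    "price": lambda text: Decimal(text.lstrip("$").replace(",", "")),
}

# Made-up sample data standing in for a CSV file on disk.
SAMPLE = 'id,name,price\n1,widget,$9.99\n2,gadget,"$1,250.00"\n'


def read_typed_rows(text: str) -> list[dict]:
    """Parse CSV text and convert every column with its converter."""
    reader = csv.DictReader(io.StringIO(text))
    return [
        {column: CONVERTERS[column](value) for column, value in row.items()}
        for row in reader
    ]


rows = read_typed_rows(SAMPLE)
```

Decimal is used for the price column because binary floats cannot represent most decimal amounts exactly.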
