Extract references from a multitude of data sources

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. Its main concepts are:

  • a generic attribute-based data record implementation (values can be accessed by str or int key),
  • various data_sources, which support reading records from different data stores, and
  • generic extraction_engines, which convert data records to entity references.
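
To make the first and last concepts concrete, here is a minimal sketch of a record that can be read by str or int key, plus a function that projects a record onto chosen attributes. The names Record and extract_reference are hypothetical and exist only for illustration; this is not the package's implementation, and the package's actual classes may look quite different.

```python
from typing import Any, Sequence, Union


class Record:
    """Hypothetical attribute-based record: values reachable by name or position."""

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._values = list(values)
        # Map each attribute name to its position for str-key lookups.
        self._index = {name: i for i, name in enumerate(names)}

    def __getitem__(self, key: Union[int, str]) -> Any:
        if isinstance(key, str):
            return self._values[self._index[key]]
        return self._values[key]

    def __len__(self) -> int:
        return len(self._values)


def extract_reference(record: Record, attributes: Sequence[str]) -> tuple:
    """Hypothetical extraction step: keep only the chosen attributes."""
    return tuple(record[name] for name in attributes)


row = Record(["id", "name", "price"], [1, "widget", "9.99"])
reference = extract_reference(row, ["id", "name"])
```

In this sketch, row["name"] and row[1] return the same value, and reference holds only the id and name of the record.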

Development

Run the following commands to set up a development environment and verify it by running the tests.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Declare the expected type of each CSV column.
traits = list(
    Traits()
    .int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it is passed as-is.
data_source = CsvDataSource("./path/to/csv/file", traits)
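
To try the usage snippet, you need a CSV file with the columns named in the traits. The sketch below writes such a file using only the Python standard library and reads it back to check its shape. The file name products.csv and the sample rows are made up for illustration, and the header row is an assumption: this README does not say whether CsvDataSource expects one.

```python
import csv
import tempfile
from pathlib import Path

# Column names taken from the traits in the usage snippet.
fieldnames = ["id", "name", "description", "manufacturer", "price"]

# Made-up sample rows, for illustration only.
rows = [
    {"id": 1, "name": "widget", "description": "a small widget",
     "manufacturer": "Acme", "price": "9.99"},
    {"id": 2, "name": "gadget", "description": "a handy gadget",
     "manufacturer": "Acme", "price": "19.50"},
]

# Write the file into a fresh temporary directory.
csv_path = Path(tempfile.mkdtemp()) / "products.csv"
with csv_path.open("w", newline="", encoding="utf-8") as f:
    writer = csv.DictWriter(f, fieldnames=fieldnames)
    writer.writeheader()  # assumption: the data source expects a header row
    writer.writerows(rows)

# Read the file back to confirm it has the expected columns.
with csv_path.open(newline="", encoding="utf-8") as f:
    loaded = list(csv.DictReader(f))
```

The resulting path, str(csv_path), can then stand in for "./path/to/csv/file" in the snippet above.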


