Extract references from a multitude of data sources


matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. Its main concepts are:

  • a generic attribute-based data record implementation (can access data by str or int key),
  • various data_sources which support reading records from different data stores, and
  • generic extraction_engines that convert data records to entity references.
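
To make the first concept concrete, here is a minimal sketch of a record whose values can be read by attribute name (str) or by position (int). The `Record` class below is hypothetical and written only for illustration; it is not the class this package ships, and the attribute names in the example are made up.

```python
from collections.abc import Sequence
from typing import Any, Union


class Record:
    """Hypothetical attribute-based record (illustration only, not the package's class)."""

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._values = list(values)
        # Map each attribute name to its position so name lookups reuse the same storage.
        self._index = {name: position for position, name in enumerate(names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        # An int key is a position; a str key is an attribute name.
        if isinstance(key, int):
            return self._values[key]
        return self._values[self._index[key]]

    def __len__(self) -> int:
        return len(self._values)


record = Record(["id", "name", "price"], [1, "widget", 9.99])
by_name = record["name"]
by_position = record[1]
```

Both lookups return the same value because the name is resolved to a position first; an unknown name raises `KeyError` and an out-of-range position raises `IndexError`.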

Development

Run the following commands to install Python 3.12, install the project dependencies, and run the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Describe the columns to read and the type of each one.
traits = list(
    Traits()
    .int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it can be passed as-is.
data_source = CsvDataSource("./path/to/csv/file", traits)
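
The idea of declaring a type per column can be sketched with the standard library alone. The example below does not use this package and makes no claim about how `CsvDataSource` or `Traits` work internally; `CONVERTERS` and `read_typed_rows` are hypothetical names, and treating the currency column as a `Decimal` is an assumption.

```python
import csv
import io
from decimal import Decimal

# Hypothetical column-to-converter mapping that mirrors the columns in the usage example.
CONVERTERS = {
    "id": int,
    "name": str,
    "description": str,
    "manufacturer": str,
    "price": Decimal,  # assumption: a currency value is parsed as a decimal number
}


def read_typed_rows(stream):
    """Yield one dict per CSV row, converting each known column to its declared type."""
    for row in csv.DictReader(stream):
        # Columns without a declared converter are kept as strings.
        yield {column: CONVERTERS.get(column, str)(value) for column, value in row.items()}


# In-memory sample data so the sketch runs without a file on disk.
sample = io.StringIO(
    "id,name,description,manufacturer,price\n"
    "1,widget,small part,Acme,9.99\n"
)
rows = list(read_typed_rows(sample))
```

Using `Decimal` rather than `float` avoids binary rounding error on monetary values, which is why it is the assumed target type here.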

