Extract references from a multitude of data sources

Project description

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. It is built around three main concepts:

  • a generic, attribute-based data record implementation whose values can be accessed by either a str or an int key,
  • various data_sources which support reading records from different data stores, and
  • generic extraction_engines that convert data records to entity references.
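
To make the first concept concrete, here is a minimal sketch of a record whose values can be read by column name or by position. The Record class below is hypothetical and written only for illustration; it is not the package's own record type, and the real implementation may differ.

```python
from typing import Any, Sequence, Union


class Record:
    """Hypothetical record: values addressable by column name or by position."""

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._values = list(values)
        # Map each column name to its position so str keys resolve to an index.
        self._index = {name: i for i, name in enumerate(names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        if isinstance(key, str):
            return self._values[self._index[key]]
        return self._values[key]

    def __len__(self) -> int:
        return len(self._values)


record = Record(["id", "name"], [1, "widget"])
by_name = record["name"]  # lookup by str key
by_position = record[1]   # lookup by int key, same value
```

Keeping a name-to-position index alongside a plain list is one simple way to support both key types without storing the values twice.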

Development

Run the following commands to set up a development environment and verify it by running the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

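from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Declare which columns to read and how each one should be interpreted.
traits = list(
    Traits()
    .int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it is passed to the data source as-is.
csv = CsvDataSource("./path/to/csv/file", traits)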
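
For readers unfamiliar with the idea of typed columns, the sketch below illustrates it using only the Python standard library. It does not use or describe the internals of CsvDataSource or Traits; the CONVERTERS mapping, the read_typed_rows helper, the sample data, and the assumption that prices carry a leading "$" are all hypothetical.

```python
import csv
import io
from decimal import Decimal

# Hypothetical column-to-converter mapping, standing in for the traits above.
CONVERTERS = {
    "id": int,
    "name": str,
    "description": str,
    "manufacturer": str,
    # Assumption: prices are written with a leading currency symbol.
    "price": lambda text: Decimal(text.lstrip("$")),
}


def read_typed_rows(handle):
    """Yield one dict per CSV row, with each known column converted."""
    for row in csv.DictReader(handle):
        yield {name: convert(row[name]) for name, convert in CONVERTERS.items()}


# In-memory sample data so the example runs without a file on disk.
sample = io.StringIO(
    "id,name,description,manufacturer,price\n"
    "1,widget,a small widget,ACME,$9.99\n"
)
rows = list(read_typed_rows(sample))
```

Decimal is used for the price column because binary floating point cannot represent most decimal fractions exactly, which matters when monetary values are compared.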

Download files

Source Distribution

matchescu_reference_extraction-0.7.1.tar.gz (4.1 kB)

Built Distribution

matchescu_reference_extraction-0.7.1-py3-none-any.whl

File details

Details for the file matchescu_reference_extraction-0.7.1.tar.gz.

File metadata

  • Download URL: matchescu_reference_extraction-0.7.1.tar.gz
  • Size: 4.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.8.4 CPython/3.12.7 Linux/6.11.0-1012-azure

File hashes

Hashes for matchescu_reference_extraction-0.7.1.tar.gz
Algorithm    Hash digest
SHA256       2a617b40148b8cb9d5245a7b0e263e459e128aec3c02e04b80efb6226cb25fd1
MD5          6a7d3de6fb1925fd7786cff87ad92268
BLAKE2b-256  573dce0a1a4e340aec5c58ca8ffa8d04e9be15d2e99eec3fd669df1c90aae40a

File details

Details for the file matchescu_reference_extraction-0.7.1-py3-none-any.whl.

File hashes

Hashes for matchescu_reference_extraction-0.7.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       8a2ad4a133e8ccab019c7e4ecc5f973e96d589fd7374b62e8fde665b0f797aef
MD5          11b4f44d7aa800c571d28c5b332831e8
BLAKE2b-256  96ebbcd443e5badc6c3abcb3d43750ee94cc590914309d799bc0aae1f3bc9e45
