Extract references from a multitude of data sources

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. Its main concepts are:

  • a generic attribute-based data record implementation (values can be accessed by str or int key),
  • various data_sources, which support reading records from different data stores, and
  • generic extraction_engines, which convert data records to entity references.
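
To make the first bullet concrete, here is a minimal sketch of a record whose values can be read either by attribute name or by position. `SketchRecord` is a hypothetical name used for illustration only; it is not the class this package ships, and the real interface may differ.

```python
from typing import Any, Sequence, Union


class SketchRecord:
    """Hypothetical attribute-based record; not the package's own class."""

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._names = tuple(names)
        self._values = tuple(values)
        # Map each attribute name to its position for str-key lookups.
        self._index = {name: i for i, name in enumerate(self._names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        # A str key is resolved through the name index; an int key is positional.
        if isinstance(key, str):
            return self._values[self._index[key]]
        return self._values[key]

    def __len__(self) -> int:
        return len(self._values)


record = SketchRecord(["id", "name", "price"], [1, "widget", "9.99"])
assert record["name"] == record[1] == "widget"
```

Keeping a name-to-position index next to a plain tuple of values is one simple way to support both key types without storing the data twice.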

Development

Run the following commands to set up a development environment and verify it by running the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

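from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Describe the columns to read: an int id, three string columns and a currency price.
traits = list(
    Traits()
    .int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# Create the data source from the CSV file path and the traits declared above.
data_source = CsvDataSource("./path/to/csv/file", traits)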
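
The snippet above stops once the data source is constructed. As a rough, stdlib-only illustration of the idea behind typed traits (declaring which columns are integers, strings or currency amounts, then converting values while reading), the sketch below parses CSV text with `csv.DictReader`. The `CONVERTERS` mapping and the `read_typed_rows` function are hypothetical; they neither use nor mirror this package's API, and parsing currency as `Decimal` is an assumption made for the example.

```python
import csv
import io
from decimal import Decimal
from typing import Any, Callable, Dict, Iterator

# Hypothetical column-to-type mapping, loosely analogous to the traits above.
CONVERTERS: Dict[str, Callable[[str], Any]] = {
    "id": int,
    "name": str,
    "description": str,
    "manufacturer": str,
    "price": Decimal,  # assumption: currency values are plain decimal strings
}


def read_typed_rows(text: str) -> Iterator[Dict[str, Any]]:
    """Yield one dict per CSV row, converting each known column."""
    for row in csv.DictReader(io.StringIO(text)):
        # Columns without a declared converter are kept as strings.
        yield {col: CONVERTERS.get(col, str)(val) for col, val in row.items()}


sample = (
    "id,name,description,manufacturer,price\n"
    "1,widget,a small widget,ACME,9.99\n"
)
rows = list(read_typed_rows(sample))
```

Here `rows` holds one dictionary per data row, with `id` converted to `int` and `price` to `Decimal`.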
