Extract references from a multitude of data sources

Project description

matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. Its main concepts are:

  • a generic attribute-based data record implementation (can access data by str or int key),
  • various data_sources which support reading records from different data stores, and
  • generic extraction_engines that convert data records to entity references.
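
To make the first concept concrete, here is a minimal sketch of a record whose values can be read by attribute name (str) or by position (int). The class name SketchRecord and its constructor are hypothetical and chosen for illustration only; they are not the package's actual record implementation.

```python
from typing import Any, Sequence, Union


class SketchRecord:
    """Hypothetical record: values addressable by attribute name or position."""

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._values = list(values)
        # Map each attribute name to its position for str-key lookups.
        self._index = {name: i for i, name in enumerate(names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        # An int key is a position; a str key is resolved through the index.
        if isinstance(key, int):
            return self._values[key]
        return self._values[self._index[key]]

    def __len__(self) -> int:
        return len(self._values)


record = SketchRecord(["id", "name", "price"], [1, "widget", "9.99"])
by_name = record["name"]   # lookup by str key
by_position = record[1]    # lookup by int key, same value
```

Keeping a name-to-position index next to a plain list is one simple way to serve both key types from the same storage.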

Development

Run the following commands to set up a development environment and run the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Declare how each CSV column should be interpreted.
traits = list(
    Traits()
    .int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it can be passed to the data source directly.
csv = CsvDataSource("./path/to/csv/file", traits)

Download files

Source Distribution

matchescu_reference_extraction-0.9.0.tar.gz (4.1 kB)

Built Distribution

matchescu_reference_extraction-0.9.0-py3-none-any.whl

File details

Details for the file matchescu_reference_extraction-0.9.0.tar.gz.

File hashes

Hashes for matchescu_reference_extraction-0.9.0.tar.gz
Algorithm Hash digest
SHA256 7bc8fd96293a67ced56cd912c3f1547f600dd4f93b86a0bd9878df5fd563efbd
MD5 6bcfac2c475e172767be64dad05f9804
BLAKE2b-256 6742a6d328acb23737f3061dc37f84b412e709062312ccb53c43fdb61d3072ad

File details

Details for the file matchescu_reference_extraction-0.9.0-py3-none-any.whl.

File hashes

Hashes for matchescu_reference_extraction-0.9.0-py3-none-any.whl
Algorithm Hash digest
SHA256 33dcfcc2ccc253d4482f49907802b4b77c10badbccf263e3007d0b9eaf614454
MD5 9e6fe202453741e15e823d080f6eb3dd
BLAKE2b-256 c53ee40cfdad87bc2ffc8c1b541d3f351864ae005870f18661290d347e947649
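
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only Python's standard hashlib module; the helper names sha256_of and matches_published_digest are hypothetical, and the file path in the usage note is an assumption about where the download was saved.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, calling matches_published_digest("./matchescu_reference_extraction-0.9.0.tar.gz", "7bc8fd96293a67ced56cd912c3f1547f600dd4f93b86a0bd9878df5fd563efbd") should return True only if the local file is byte-for-byte the source distribution listed above.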
