
Extract references from a multitude of data sources

Project description


matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. Its main concepts are:

  • a generic attribute-based data record implementation whose values can be accessed by str or int key,
  • various data_sources that read records from different data stores, and
  • generic extraction_engines that convert data records to entity references.
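
To illustrate the first concept, here is a minimal sketch of a record whose values can be read by attribute name or by position. The class name `SimpleRecord` and its constructor are hypothetical and are not taken from the package's API.

```python
from typing import Any, Iterator, Sequence, Union


class SimpleRecord:
    """Hypothetical attribute-based record; not the matchescu implementation."""

    def __init__(self, names: Sequence[str], values: Sequence[Any]) -> None:
        if len(names) != len(values):
            raise ValueError("names and values must have the same length")
        self._names = list(names)
        self._values = list(values)
        # Map each attribute name to its position for str-key lookups.
        self._index = {name: i for i, name in enumerate(self._names)}

    def __getitem__(self, key: Union[str, int]) -> Any:
        # A str key is resolved to a position; an int key is used directly.
        if isinstance(key, str):
            return self._values[self._index[key]]
        return self._values[key]

    def __len__(self) -> int:
        return len(self._values)

    def __iter__(self) -> Iterator[Any]:
        return iter(self._values)


record = SimpleRecord(["id", "name"], [1, "widget"])
print(record["name"], record[1])
```

Both lookups in the last line resolve to the same value, because `"name"` is the attribute at position 1.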

Development

Run the following commands to set up a development environment and verify it by running the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

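from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Declare the typed columns to read from the CSV file.
traits = list(
    Traits()
    .int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# traits is already a list, so it is passed without a second list() call.
data_source = CsvDataSource("./path/to/csv/file", traits)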
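
Trying the usage snippet requires a CSV file whose columns match the declared traits. The sketch below writes such a file with the standard library and reads it back with plain Python, converting each column to the type its trait name suggests. It does not call matchescu. The file name `products.csv`, the sample rows, and the use of `Decimal` for the price column are assumptions made for illustration.

```python
import csv
import tempfile
from decimal import Decimal
from pathlib import Path

# Hypothetical sample data; column names mirror the traits in the usage snippet.
FIELDS = ["id", "name", "description", "manufacturer", "price"]
ROWS = [
    {"id": "1", "name": "Widget", "description": "A small widget",
     "manufacturer": "Acme", "price": "9.99"},
    {"id": "2", "name": "Gadget", "description": "A large gadget",
     "manufacturer": "Globex", "price": "24.50"},
]

# Write the sample file into a fresh temporary directory.
csv_path = Path(tempfile.mkdtemp()) / "products.csv"
with csv_path.open("w", newline="", encoding="utf-8") as handle:
    writer = csv.DictWriter(handle, fieldnames=FIELDS)
    writer.writeheader()
    writer.writerows(ROWS)

# Read the file back; "id" becomes an int and "price" a Decimal (assumed types).
with csv_path.open(newline="", encoding="utf-8") as handle:
    parsed = [
        {**row, "id": int(row["id"]), "price": Decimal(row["price"])}
        for row in csv.DictReader(handle)
    ]

print(parsed[0]["id"], parsed[0]["price"])
```

The path in `csv_path` could then be passed as the first argument to `CsvDataSource` in place of `./path/to/csv/file`.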


