
Extract references from a multitude of data sources

Project description


matchescu-reference-extraction

This package implements an entity reference extraction subsystem for entity resolution. The main concepts that are relevant here are:

  • a generic attribute-based data record implementation (values can be accessed by str or int key),
  • various data_sources that read records from different data stores, and
  • generic extraction_engines that convert data records into entity references.

Development

Run the following commands to set up a development environment (Python 3.12 plus the project's dependencies) and run the test suite.

$ pyenv install 3.12
$ poetry install
$ poetry run pytest

When you contribute code, open a new feature/* or hotfix/* branch.

Usage

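from matchescu.data_sources import CsvDataSource
from matchescu.extraction import Traits

# Declare how each CSV column should be interpreted.
traits = list(
    Traits().int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
# Create a data source that reads the CSV file using these traits.
# (traits is already a list, so it can be passed directly.)
csv = CsvDataSource("./path/to/csv/file", traits)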
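The page does not show what Traits produces or how CsvDataSource consumes it. The sketch below is one plausible reading of the fluent chain above, assuming each call registers a converter for the listed columns and returns the builder so calls can be chained. TraitsSketch, apply_traits and the currency parsing rule are hypothetical and not part of matchescu.

```python
from collections.abc import Callable, Iterator, Mapping
from decimal import Decimal
from typing import Any

# A trait pairs a column name with a function converting its raw text value.
Trait = tuple[str, Callable[[str], Any]]


def _parse_currency(raw: str) -> Decimal:
    # Assumption: drop a leading currency symbol and thousands separators.
    return Decimal(raw.strip().lstrip("$€£").replace(",", ""))


class TraitsSketch:
    """Hypothetical fluent builder loosely modelled on the Traits() chain.
    Each method records converters for some columns and returns self."""

    def __init__(self) -> None:
        self._traits: list[Trait] = []

    def _add(self, keys: list[str], convert: Callable[[str], Any]) -> "TraitsSketch":
        self._traits.extend((key, convert) for key in keys)
        return self

    def int(self, keys: list[str]) -> "TraitsSketch":
        return self._add(keys, int)  # builtin int, not this method

    def string(self, keys: list[str]) -> "TraitsSketch":
        return self._add(keys, str.strip)

    def currency(self, keys: list[str]) -> "TraitsSketch":
        return self._add(keys, _parse_currency)

    def __iter__(self) -> Iterator[Trait]:
        return iter(self._traits)


def apply_traits(row: Mapping[str, str], traits: list[Trait]) -> dict[str, Any]:
    """Convert one raw row (e.g. from csv.DictReader) using the traits."""
    return {key: convert(row[key]) for key, convert in traits}


# Usage: the same chain as in the example above, applied to one raw row.
traits = list(
    TraitsSketch().int(["id"])
    .string(["name", "description", "manufacturer"])
    .currency(["price"])
)
row = {"id": "7", "name": " Widget ", "description": "small",
       "manufacturer": "Acme", "price": "$1,299.00"}
converted = apply_traits(row, traits)
# converted["id"] == 7 and converted["price"] == Decimal("1299.00")
```

Returning self from every method is what enables the chained style; iterating the builder (as list(...) does) yields the accumulated (column, converter) pairs.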



Download files


Source Distribution

matchescu_reference_extraction-0.2.0.tar.gz (4.6 kB)

Built Distribution

matchescu_reference_extraction-0.2.0-py3-none-any.whl

File details

Details for the file matchescu_reference_extraction-0.2.0.tar.gz.


File hashes

Hashes for matchescu_reference_extraction-0.2.0.tar.gz
Algorithm     Hash digest
SHA256        1b3005f052bcb1bdc2caebda7e40e4898b3210940c0b5778e0cbb3e0c7103844
MD5           23364687198cdd84539bc7e6d7650ffb
BLAKE2b-256   f344181e14d765e4588b7db8e8d417d201d1a0abd5e3a3cbd319aea3b801eba2


File details

Details for the file matchescu_reference_extraction-0.2.0-py3-none-any.whl.


File hashes

Hashes for matchescu_reference_extraction-0.2.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        055dccfe9ed2f860204640d37ad93d88b2f9ecf4ad2be69536faaeada1c3606e
MD5           53790b79dbd68b2f50c56c23f824b077
BLAKE2b-256   c3420be1e1f1e6b358d7f7ba25c64dec695ec5c972d022f1a565f2be05411a13

