A tool to enrich any OCDM-compliant knowledge graph by finding new identifiers and deduplicating entities

These details have not been verified by PyPI

Project description

OC GraphEnricher

OC GraphEnricher enriches OpenCitations Data Model (OCDM) compliant knowledge graphs by finding missing identifiers and deduplicating entities.

Documentation: https://opencitations.github.io/oc_graphenricher/

Quick start

pip install oc-graphenricher

from oc_ocdm.graph.graph_set import GraphSet
from oc_ocdm.reader import Reader
from rdflib import Graph

from oc_graphenricher.enricher import GraphEnricher
from oc_graphenricher.deduplication import GraphDeduplicator
from oc_graphenricher.storage import single_file_storage

# Parse the input N-Triples file into an rdflib graph.
graph = Graph().parse("data/input.nt", format="nt11")

# Import the parsed triples into an OCDM GraphSet.
reader = Reader()
graph_set = GraphSet(base_iri="https://w3id.org/oc/meta/")
reader.import_entities_from_graph(
    graph_set,
    graph,
    enable_validation=False,
    resp_agent="https://w3id.org/oc/meta/prov/pa/2",
)

# Enrich the graph set with missing identifiers.
GraphEnricher(
    graph_set=graph_set,
    storage=single_file_storage(
        graph_path="enriched.json",
        provenance_path="provenance.json",
    ),
).enrich()
# Deduplicate entities and save the result.
GraphDeduplicator(
    graph_set=graph_set,
    storage=single_file_storage(
        graph_path="deduplicated.json",
        provenance_path="provenance.json",
    ),
).deduplicate_and_save()

By default, GraphDeduplicator does not merge contributor roles merely because their author names are similar. To opt in to that behavior, pass merge_similar_named_contributors=True.

Use deduplicate() instead of deduplicate_and_save() when another application needs to manage storage or provenance output itself.

Use preferred_survivors with a set of entity URIs to keep selected entities when duplicate clusters are merged. Without a preferred survivor, duplicate clusters keep the entity with more functional metadata. Ties use URI order.

Use merge_clusters() when another application has already selected the merge clusters, for example from a reviewed CSV. The mapping key is the surviving entity URI and the values are the URIs to merge into it. This method does not discover or merge any extra duplicates outside the provided mapping.
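
The survivor rule and the merge_clusters() mapping shape described above can be sketched in plain Python. This is an illustration only, not the library's implementation: the helper names choose_survivor and clusters_from_csv, the metadata_counts argument, the CSV column names survivor and duplicate, the use of sets for the mapping values, and the example.org URIs are all hypothetical. The sketch also assumes that "URI order" means ascending lexicographic order and that, if a cluster contains several preferred survivors, the first in that order wins; the source does not specify either point.

```python
import csv
import io


def choose_survivor(cluster, metadata_counts, preferred_survivors=frozenset()):
    """Pick the URI that survives when a duplicate cluster is merged.

    `metadata_counts` (URI -> number of functional metadata values) is a
    hypothetical stand-in for whatever the library measures internally.
    """
    # A preferred survivor takes precedence over the metadata comparison.
    preferred = sorted(uri for uri in cluster if uri in preferred_survivors)
    if preferred:
        return preferred[0]
    # More metadata wins; ties fall back to ascending URI order (assumed).
    return min(cluster, key=lambda uri: (-metadata_counts.get(uri, 0), uri))


def clusters_from_csv(text):
    """Build {survivor URI: set of URIs to merge into it} from CSV text.

    Assumes hypothetical column names `survivor` and `duplicate`.
    """
    mapping = {}
    for row in csv.DictReader(io.StringIO(text)):
        mapping.setdefault(row["survivor"], set()).add(row["duplicate"])
    return mapping


# Made-up URIs for illustration.
A = "https://example.org/br/1"
B = "https://example.org/br/2"

tie_winner = choose_survivor({A, B}, {A: 2, B: 2})
richer_winner = choose_survivor({A, B}, {A: 2, B: 5})
forced_winner = choose_survivor({A, B}, {A: 2, B: 5}, preferred_survivors={A})

reviewed = "survivor,duplicate\n" + A + "," + B + "\n"
clusters = clusters_from_csv(reviewed)
```

A mapping shaped like clusters is what the text above describes as the input to merge_clusters(); check the documentation for the exact argument name and accepted value types before passing it in.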

For configuration options and usage details, see the documentation.

License

Distributed under the ISC License. See LICENSE.

To cite the latest version of this software (2.1.8), use this BibTeX entry:

@software{oc-graphenricher-2.1.8,
author = {Gabriele Pisciotta and Arcangelo Massari and Elia Rizzetto and Arianna Moretti and Ilaria De Dominicis and Silvio Peroni and Simone Persiani and Davide Brembilla},
title = {oc-graphenricher},
url = {https://archive.softwareheritage.org/swh:1:snp:5f58a0cd8d71190be26edbf4fcf9535dbc49c693;origin=https://github.com/opencitations/oc_graphenricher},
version = {2.1.8},
year = {2026}
}

Project details

Release history

Version	Release date
2.1.9 (this version)	Jul 6, 2026
2.1.8	Jul 6, 2026
2.1.7	Jul 6, 2026
2.1.6	Jul 6, 2026
2.1.5	Jul 6, 2026
2.1.4	Jul 6, 2026
2.1.3	Jul 5, 2026
2.1.2	Jul 4, 2026
2.1.1	Jul 4, 2026
2.1.0	Jul 4, 2026
2.0.0	Jul 4, 2026
1.0.1	Jul 3, 2026
1.0.0	Jun 25, 2026
0.2.5	Oct 9, 2023
0.2.3	May 6, 2021
0.2.2	May 1, 2021
0.2.1	Apr 14, 2021
0.2.0	Apr 13, 2021
0.1.1	Apr 12, 2021
0.1.0	Apr 12, 2021

Download files

Source Distribution

oc_graphenricher-2.1.9.tar.gz (23.5 kB)

Uploaded Jul 6, 2026 Source

Built Distribution

oc_graphenricher-2.1.9-py3-none-any.whl (26.4 kB)

Uploaded Jul 6, 2026 Python 3

File details

Details for the file oc_graphenricher-2.1.9.tar.gz.

File metadata

Download URL: oc_graphenricher-2.1.9.tar.gz
Upload date: Jul 6, 2026
Size: 23.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.27 {"installer":{"name":"uv","version":"0.11.27","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for oc_graphenricher-2.1.9.tar.gz
Algorithm	Hash digest
SHA256	`d4d20362c34d12fb70c685e965ef024615784169c6af227c6c6ec4fa168db598`
MD5	`f025d31d19f75c223344f92c598d7257`
BLAKE2b-256	`5413b213544b9dfc2007aa586146e8c624b55ad258a04ff60ac62891bec51af8`
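
A downloaded file can be checked against a published digest with Python's standard hashlib module. The following is a minimal sketch; the helper names sha256_of and matches_published_digest are hypothetical and not part of oc-graphenricher or PyPI tooling.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        # iter() with a sentinel stops when read() returns empty bytes.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 with a published hex digest, ignoring case."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, after downloading the source distribution into the working directory, matches_published_digest("oc_graphenricher-2.1.9.tar.gz", "d4d20362c34d12fb70c685e965ef024615784169c6af227c6c6ec4fa168db598") should return True if the file is intact.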

File details

Details for the file oc_graphenricher-2.1.9-py3-none-any.whl.

File metadata

Download URL: oc_graphenricher-2.1.9-py3-none-any.whl
Upload date: Jul 6, 2026
Size: 26.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.11.27 {"installer":{"name":"uv","version":"0.11.27","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for oc_graphenricher-2.1.9-py3-none-any.whl
Algorithm	Hash digest
SHA256	`1f59c268e7c2d8801cc4e956e362c53278de1c57c1c53bb633fea87b0add5737`
MD5	`7aa909a9033114f0c36dca847759ee1f`
BLAKE2b-256	`fc45a395d5f9507a99b888854fdfc2f967181347b4ae83bdbafa9ee57b281a21`
