rdflib-ocdm

A Python library that extends RDFLib to support the OpenCitations Provenance Model, providing provenance tracking and change management capabilities for RDF data.

Overview

rdflib-ocdm is designed to be fully compatible with RDFLib while adding specialized functionality for handling provenance information according to the OpenCitations Provenance Model. It provides mechanisms for:

  • Tracking provenance information for RDF entities
  • Managing entity snapshots to record changes over time
  • Handling entity creation, modification, and merging with proper provenance
  • Storing and retrieving RDF data with provenance information

The library is a backward-compatible extension of RDFLib: it adds support for the OpenCitations Provenance Model, with a particular focus on provenance tracking and change management.

oc_ocdm is a Python interface designed specifically for creating and managing bibliographic data according to the OpenCitations Data Model. rdflib-ocdm, by contrast, can be considered a subset of both RDFLib and oc_ocdm. Its key advantage is that it lets you use the OpenCitations provenance model with any type of data, not just bibliographic data.

Key Features

  • Extended Graph Classes: OCDMGraph and OCDMConjunctiveGraph that inherit from RDFLib's Graph and ConjunctiveGraph classes
  • Provenance Tracking: Automatic generation of provenance information when entities are created or modified
  • Snapshot Management: Creation and management of snapshot entities to record the state of entities at different points in time
  • Counter Handlers: Various implementations for managing entity identifiers (in-memory, filesystem, SQLite)
  • Storer: Utilities for storing RDF data in various formats and endpoints
  • Domain Agnostic: Unlike oc_ocdm, which is specific to bibliographic data, rdflib-ocdm can be used with any type of RDF data
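
To give a rough picture of what a counter handler does, the sketch below keeps one counter per entity in memory and uses it to number successive snapshots. It is not the library's InMemoryCounterHandler: the class `SketchCounter`, its method names, and the `/prov/se/<n>` IRI pattern (borrowed from the OpenCitations Data Model) are assumptions made for illustration.

```python
from collections import defaultdict


class SketchCounter:
    """Hypothetical in-memory counter: one running number per entity IRI."""

    def __init__(self):
        # Entities that have never been seen implicitly start at 0.
        self._counters = defaultdict(int)

    def increment(self, entity):
        """Advance the counter for `entity` and return the new value."""
        self._counters[entity] += 1
        return self._counters[entity]

    def read(self, entity):
        """Return the current value without changing it."""
        return self._counters.get(entity, 0)


def next_snapshot_iri(counter, entity):
    """Build the next snapshot IRI (assumed pattern: <entity>/prov/se/<n>)."""
    return f"{entity}/prov/se/{counter.increment(entity)}"


counter = SketchCounter()
first = next_snapshot_iri(counter, "https://example.org/resource")
second = next_snapshot_iri(counter, "https://example.org/resource")
```

A filesystem or SQLite variant would presumably keep the same numbers in persistent storage, so that numbering continues across separate runs.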

Installation

pip install rdflib-ocdm

Usage

Basic Usage

from rdflib import URIRef, Literal
from rdflib_ocdm.ocdm_graph import OCDMGraph
from rdflib_ocdm.counter_handler.in_memory_counter_handler import InMemoryCounterHandler
from rdflib_ocdm.storer import Storer

# Create a new OCDM graph with a counter handler
counter_handler = InMemoryCounterHandler()
g = OCDMGraph(counter_handler)

# Add triples with provenance tracking
resp_agent = URIRef("https://orcid.org/0000-0002-8420-0696")
primary_source = URIRef("https://api.crossref.org/")
g.add((URIRef("https://example.org/resource"), 
       URIRef("http://purl.org/dc/terms/title"), 
       Literal("Example Resource")),
       resp_agent=resp_agent,
       primary_source=primary_source)

# Generate provenance information
g.generate_provenance()

# Store the graph
storer = Storer(g, output_format="json-ld")
storer.store_graphs_in_file("output.json")
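
As a rough mental model of what a creation snapshot records, the sketch below assembles one as a plain dictionary keyed by PROV-O and Dublin Core predicates used in the OpenCitations Data Model. It is an illustration only: the helper `sketch_creation_snapshot` is hypothetical, the snapshot IRI pattern and description text are assumptions, and the exact triples that generate_provenance() emits may differ.

```python
from datetime import datetime, timezone

PROV = "http://www.w3.org/ns/prov#"
DCTERMS = "http://purl.org/dc/terms/"


def sketch_creation_snapshot(entity, resp_agent, primary_source, when=None):
    """Hypothetical helper: describe the first snapshot of `entity` as a dict."""
    when = when or datetime.now(timezone.utc)
    return {
        # Assumed IRI pattern for the first snapshot of an entity.
        "@id": f"{entity}/prov/se/1",
        PROV + "specializationOf": entity,
        PROV + "generatedAtTime": when.isoformat(),
        PROV + "wasAttributedTo": resp_agent,
        PROV + "hadPrimarySource": primary_source,
        # Assumed wording for the human-readable description.
        DCTERMS + "description": f"The entity '{entity}' has been created.",
    }


snapshot = sketch_creation_snapshot(
    "https://example.org/resource",
    resp_agent="https://orcid.org/0000-0002-8420-0696",
    primary_source="https://api.crossref.org/",
    when=datetime(2024, 1, 1, tzinfo=timezone.utc),
)
```

The point of the sketch is the shape of the record: who was responsible, where the data came from, and when the snapshot was generated are all attached to the snapshot rather than to the entity itself.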

Working with Existing Data

When working with pre-existing RDF data, you need to establish a baseline state from which changes can be tracked. The preexisting_finished method serves this critical purpose:

  1. It marks a specific point in time as the baseline state of your graph
  2. It creates a snapshot of this baseline state for each entity
  3. When you later call generate_provenance(), the system will calculate the differences (deltas) between the current state and this baseline

This delta calculation is essential for accurate provenance tracking, as it allows the system to record exactly what changed, when it changed, and who made the change.

from rdflib import URIRef, Literal
from rdflib_ocdm.ocdm_graph import OCDMGraph
from rdflib_ocdm.counter_handler.in_memory_counter_handler import InMemoryCounterHandler

# Create a graph and load existing data
g = OCDMGraph(InMemoryCounterHandler())
g.parse("existing_data.ttl", format="turtle")

# Mark the current state as the baseline for delta calculation
# This is crucial for the system to know what's "original" vs. what's "changed"
resp_agent = URIRef("https://orcid.org/0000-0002-8420-0696")
primary_source = URIRef("https://example.org/data-source")
g.preexisting_finished(resp_agent=resp_agent, primary_source=primary_source)

# Now you can make changes to the graph
g.add((URIRef("https://example.org/resource"), 
       URIRef("http://purl.org/dc/terms/description"), 
       Literal("Updated description")),
       resp_agent=resp_agent,
       primary_source=primary_source)

# Generate provenance information that will calculate and record the deltas
# between the baseline state and the current state
g.generate_provenance()

# Get the provenance graphs that contain the delta information
prov_graphs = g.get_provenance_graphs()
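
The delta idea described in this section can be pictured with plain Python sets: triples present now but absent from the baseline were added, and triples in the baseline but missing now were removed. This is a sketch of the concept, not the library's implementation; the function `compute_delta` and the tuple representation of triples are assumptions.

```python
def compute_delta(baseline, current):
    """Return (added, removed) triples between two states of one entity."""
    baseline, current = set(baseline), set(current)
    added = current - baseline      # present now, absent from the baseline
    removed = baseline - current    # present in the baseline, absent now
    return added, removed


RESOURCE = "https://example.org/resource"
TITLE = "http://purl.org/dc/terms/title"
DESCRIPTION = "http://purl.org/dc/terms/description"

# State captured when the baseline was established.
baseline = {(RESOURCE, TITLE, "Example Resource")}

# State after a description was added.
current = baseline | {(RESOURCE, DESCRIPTION, "Updated description")}

added, removed = compute_delta(baseline, current)
```

In this sketch, `added` holds only the new description triple and `removed` is empty. With an empty baseline, every loaded triple would instead show up as an addition, which is why the baseline has to be established before any changes are made.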

Running Tests

The project includes a comprehensive test suite to ensure functionality and maintain code quality. To run the tests locally:

Prerequisites

  • Poetry (https://python-poetry.org/) for dependency management
  • Docker for running test databases (used by some tests)

Setup

  1. Clone the repository and install dependencies:

git clone https://github.com/opencitations/rdflib-ocdm.git
cd rdflib-ocdm
poetry install --with dev

  2. Start the test databases (if needed):

# On Linux/macOS
./test/start-test-databases.sh

# On Windows
.\test\start-test-databases.ps1

Running Tests

Run the tests with coverage:

poetry run python -m coverage run --rcfile=test/coverage/.coveragerc 

Generate and view the coverage report:

poetry run coverage report  # Console output
poetry run coverage html    # HTML report (available in htmlcov/ directory)

Cleanup

After running tests, stop the test databases:

# On Linux/macOS
./test/stop-test-databases.sh

# On Windows
.\test\stop-test-databases.ps1

Note: On Linux/macOS, you may need to make the test scripts executable before running them. Use the following command:

chmod +x test/start-test-databases.sh test/stop-test-databases.sh
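
If you drive the test setup from Python rather than from a shell, a small helper can pick the right script for the current platform. This is a convenience sketch, not part of the project: the function name `database_script` is hypothetical, while the script paths are the ones given in this section.

```python
import platform
from pathlib import PurePosixPath, PureWindowsPath


def database_script(action, system=None):
    """Return the path of the start/stop script for the given platform.

    `action` is "start" or "stop"; `system` defaults to platform.system().
    """
    if action not in ("start", "stop"):
        raise ValueError(f"unknown action: {action!r}")
    system = system or platform.system()
    if system == "Windows":
        # PowerShell script, written with backslash separators.
        return str(PureWindowsPath("test") / f"{action}-test-databases.ps1")
    # Linux and macOS share the same shell script.
    return str(PurePosixPath("test") / f"{action}-test-databases.sh")
```

The returned path is relative to the repository root, so it should be run from there, for example via `subprocess.run`.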

Contributing

Please see CONTRIBUTING.md for guidelines on how to contribute to this project, including commit message conventions and how to trigger different types of releases.

References

  • Persiani, S., Daquino, M., Peroni, S. (2022). A Programming Interface for Creating Data According to the SPAR Ontologies and the OpenCitations Data Model. In: Groth, P., et al. The Semantic Web. ESWC 2022. Lecture Notes in Computer Science, vol 13261. Springer, Cham. https://doi.org/10.1007/978-3-031-06981-9_18

License

ISC License

Related Projects

  • oc_ocdm: A Python library for importing, creating, modifying, and exporting RDF data structures compliant with the OpenCitations Data Model (OCDM v2.0.1). It provides a specialized interface for working with bibliographic data according to the OpenCitations specifications.

  • time-agnostic-library: A Python library that enables time-travel queries on RDF datasets compliant with the OpenCitations provenance model. It allows users to query different versions of the data at specific points in time, supporting version materialization and various structured query types across versions and deltas.

  • heritrace: HERITRACE (Heritage Enhanced Repository Interface for Tracing, Research, Archival Curation, and Engagement) is a semantic editor designed for GLAM professionals (galleries, libraries, archives, museums). It enables non-technical domain experts to enrich and edit metadata with robust semantic capabilities, focusing on user-friendliness, provenance tracking, and change management.
