Python bindings for the PurRDF RDF 1.2 kernel, GTS carrier, SHACL, and slice tooling

PurRDF logo — a black cat holding an RDF triple

PurRDF

The RDF 1.2 toolkit with a purr: primitives, codecs, SPARQL, SHACL, ShEx, and graph transport.

One RDF engine. One behavior. Every language.

CI · crates.io · PyPI · npm · DOI: 10.67342/pkg8gpp4no/v1 · License: MIT OR Apache-2.0 · MSRV 1.96


Why does this exist?

RDF tooling fragments along two axes.

Across languages: every ecosystem has its own parser, with its own bugs, its own corner-case interpretations, and its own subset of the spec. Move a graph from a Rust service to a Python pipeline to a browser and you have silently changed what the data means three times.

Across time: RDF 1.2 — triple terms, reifiers, base-direction literals — is where the standard is going, and almost no incumbent library carries it.

PurRDF exists so that a graph is the same graph everywhere. It is a from-scratch, dependency-light Rust core — parser to SPARQL engine to SHACL validator to binary transport — carried verbatim into Python, WebAssembly/JavaScript, and C. There are deliberately no Cargo feature flags anywhere in the workspace (CI enforces this): a data carrier must not have optional behavior, so every consumer gets the same byte-identical semantics.

PurRDF is the data backbone of the GMEOW stack and the reference home of the GTS graph-transport engine, but it assumes nothing about your ontology or application.

What's inside

  • RDF 1.2 primitives — an immutable, value-interned dataset IR (TermId space, string arena, copy-on-write mutation), with triple terms in object position, reifier/annotation side-tables, and base-direction literals (rdf:dirLangString).
  • Native codecs — first-party parsers/serializers for Turtle, TriG, N-Triples, N-Quads, RDF/XML, JSON-LD (star), and YAML-LD; byte-deterministic output.
  • Canonicalization — W3C RDFC-1.0 dataset canonicalization, tested against the W3C fixture suite.
  • SPARQL 1.1/1.2 — native parser → algebra → multiset evaluator over the interned IR (property paths, aggregates, EXISTS decorrelation, cost-based BGP planning, injectable SERVICE federation), gated by the W3C SPARQL 1.1 conformance harness. Results in SPARQL JSON/XML/CSV/TSV.
  • SHACL validation — a native validator with the complete SHACL Core feature set (all constraint components, full property paths, qualified value shapes, property pairs), SHACL-SPARQL constraints/targets on the native engine, and scoped SHACL 1.2 draft support for reifier shapes — 114/120 passing on the vendored W3C test suite (the 6 ledgered are custom-component and pre-binding-semantics gaps).
  • ShEx 2.1 — a from-scratch ShExC + ShExJ schema layer and validator gated against the official shexTest suite: 1,051/1,051 attempted validation tests, zero expected-failures (imports/semantic-actions staged next), 99/99 negative syntax, 14/14 negative structure. See docs/CONFORMANCE.md.
  • GTS graph transport — a single-file, content-addressed, append-only container for RDF 1.2 graphs and the binaries they reference: BLAKE3-chained CBOR segments, deterministic fold, COSE signing/encryption, pure-Rust crypto (wasm-friendly). Spec in docs/GTS-SPEC.md, frozen cross-language conformance vectors in vectors/.
  • Slices, mappings, and provenance — a manifest-based slice catalog with content-addressed artifact IDs, an explicit RDF↔GTS loss ledger (generated/rdf-loss-matrix.json), SSSOM mapping TSV support, and an FnO function-catalog codec.
  • Zero-dependency foundations: purrdf-iri (RFC 3987/3986) and purrdf-xsd (XSD 1.1 value space) have no runtime dependencies at all; purrdf-events (the object-safe ingestion seam) has none either.
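The GTS bullet above describes an append-only container whose segments are chained by hash. The sketch below illustrates that general idea only; it is not the GTS format. As loudly stated assumptions, it encodes payloads as JSON rather than CBOR, and it uses hashlib.blake2b from the Python standard library as a stand-in for BLAKE3, which the standard library does not provide. The names segment_id, append_segment, and verify_chain are hypothetical.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous id" for the first segment


def segment_id(prev: str, payload: dict) -> str:
    """Content address of a segment: hash of the predecessor id plus the payload."""
    # sort_keys and fixed separators give a deterministic byte encoding.
    body = json.dumps(payload, sort_keys=True, separators=(",", ":")).encode("utf-8")
    # Assumption: blake2b stands in for BLAKE3 here.
    return hashlib.blake2b(prev.encode("ascii") + body, digest_size=32).hexdigest()


def append_segment(log: list, payload: dict) -> str:
    """Append a segment that commits to everything written before it."""
    prev = log[-1]["id"] if log else GENESIS
    sid = segment_id(prev, payload)
    log.append({"id": sid, "prev": prev, "payload": payload})
    return sid


def verify_chain(log: list) -> bool:
    """Recompute every id from the start; any edit to history breaks the chain."""
    prev = GENESIS
    for seg in log:
        if seg["prev"] != prev or seg["id"] != segment_id(prev, seg["payload"]):
            return False
        prev = seg["id"]
    return True


log = []
append_segment(log, {"add": ['<https://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .']})
append_segment(log, {"add": ["<https://example.org/alice> <http://xmlns.com/foaf/0.1/knows> <https://example.org/bob> ."]})
print(verify_chain(log))  # True for an untampered log

# Rewriting an earlier segment changes its recomputed id, so verification fails.
log[0]["payload"]["add"].append("<https://example.org/x> <https://example.org/y> <https://example.org/z> .")
print(verify_chain(log))  # False
```

Because each id covers the previous id, a reader can detect a modified or reordered segment by recomputing the chain from the first segment. A real container would also need signing, which this sketch leaves out.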

Quickstart

Rust

cargo add purrdf

use purrdf::{parse_dataset, serialize_dataset, RdfDatasetBuilder, RdfLiteral, SerializeGraph};

// Build a dataset in interned TermId space.
let mut b = RdfDatasetBuilder::new();
let alice = b.intern_iri("https://example.org/alice");
let knows = b.intern_iri("http://xmlns.com/foaf/0.1/knows");
let bob = b.intern_iri("https://example.org/bob");
let name = b.intern_iri("http://xmlns.com/foaf/0.1/name");
let hi = b.intern_literal(RdfLiteral::simple("Alice"));
b.push_quad(alice, knows, bob, None);
b.push_quad(alice, name, hi, None);
let ds = b.freeze().expect("freeze");

// Serialize to any native codec and parse back, losslessly.
let ttl = serialize_dataset(&ds, "text/turtle", SerializeGraph::Dataset).unwrap();
let back = parse_dataset(&ttl, "text/turtle", None).unwrap();
assert_eq!(back.quad_count(), 2);

Python

pip install purrdf

import purrdf

quads = purrdf.parse(
    '<https://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .',
    purrdf.RdfFormat.TURTLE,
)

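from purrdf_native import shacl, shex

# my_shapes and my_data are strings you supply; the parameter names
# suggest Turtle for the shapes and N-Triples for the data.
report = shacl.validate(shapes_ttl=my_shapes, data_nt=my_data)
print(report["conforms"])

# my_schema_shexc (ShExC) and my_data_ttl (Turtle) are strings you supply.
# Each pair names a focus node and the shape to validate it against.
result = shex.validate(my_schema_shexc, my_data_ttl,
                       [("https://example.org/alice", "https://example.org/PersonShape")])
print(result["conforms"])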

The Python package also ships an rdflib compatibility layer and GTS relational exports (gts_to_sqlite, gts_to_duckdb, gts_to_parquet).
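The SHACL call in the quickstart takes two strings that the snippet leaves undefined. As a sketch of what such inputs might look like, the block below defines a small shapes graph in Turtle and one data triple in N-Triples. SHAPES_TTL, DATA_NT, the ex:PersonShape shape, and the helper validate_if_available are all hypothetical; only the shacl.validate(shapes_ttl=..., data_nt=...) call itself comes from the quickstart above.

```python
# Hypothetical shapes graph: every subject of foaf:name must have
# at most one name, and that name must be an xsd:string.
SHAPES_TTL = """\
@prefix sh:   <http://www.w3.org/ns/shacl#> .
@prefix xsd:  <http://www.w3.org/2001/XMLSchema#> .
@prefix foaf: <http://xmlns.com/foaf/0.1/> .
@prefix ex:   <https://example.org/> .

ex:PersonShape a sh:NodeShape ;
    sh:targetSubjectsOf foaf:name ;
    sh:property [
        sh:path foaf:name ;
        sh:datatype xsd:string ;
        sh:maxCount 1 ;
    ] .
"""

# Hypothetical data graph: a single N-Triples statement.
DATA_NT = '<https://example.org/alice> <http://xmlns.com/foaf/0.1/name> "Alice" .\n'


def validate_if_available():
    """Run the quickstart's SHACL call if purrdf_native is importable, else return None."""
    try:
        from purrdf_native import shacl
    except ImportError:
        return None
    return shacl.validate(shapes_ttl=SHAPES_TTL, data_nt=DATA_NT)


if __name__ == "__main__":
    report = validate_if_available()
    if report is None:
        print("purrdf_native is not installed")
    else:
        print(report["conforms"])
```

The import is deferred into the helper so the file can be read and loaded without the package installed.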

JavaScript / WebAssembly

An RDF/JS-shaped API (DataFactory / Dataset / Stream) over the same engine, including the RDF 1.2 features no incumbent RDF/JS library carries (triple terms and base-direction literals):

import { ready, DataFactory, Dataset } from "@blackcatinformatics/purrdf";

await ready(); // one-time async wasm instantiation

const f = new DataFactory();
const rtl = f.directionalLiteral("مرحبا", "ar", "rtl");

const ds = new Dataset();
ds.add(f.quad(f.namedNode("https://ex/s"), f.namedNode("https://ex/says"), rtl));

const nq = ds.serialize("nquads");           // directions survive the round-trip
const reparsed = Dataset.parse(nq, "nquads");

See crates/rdf-wasm (make wasm-pkg builds the ESM package).
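The round-trip above depends on the base direction surviving serialization. As a language-neutral illustration, here is a small Python sketch of the lexical form involved, assuming the RDF 1.2 draft syntax in which the direction follows the language tag after a double hyphen (for example @ar--rtl). The functions format_dir_literal and parse_dir_literal are hypothetical helpers, not part of any PurRDF API.

```python
import re

# Assumed RDF 1.2 N-Triples/N-Quads form: "lexical"@lang--dir, with "--dir" optional.
_LITERAL = re.compile(
    r'^"(?P<lex>(?:[^"\\]|\\.)*)"'
    r"@(?P<lang>[A-Za-z]+(?:-[A-Za-z0-9]+)*)"
    r"(?:--(?P<dir>ltr|rtl))?$"
)


def format_dir_literal(lexical: str, lang: str, direction=None) -> str:
    """Write a language-tagged literal, with an optional base direction."""
    if direction not in (None, "ltr", "rtl"):
        raise ValueError("direction must be 'ltr', 'rtl', or None")
    escaped = (
        lexical.replace("\\", "\\\\")
        .replace('"', '\\"')
        .replace("\n", "\\n")
        .replace("\r", "\\r")
    )
    suffix = f"--{direction}" if direction else ""
    return f'"{escaped}"@{lang}{suffix}'


def parse_dir_literal(text: str):
    """Split a literal into (lexical, lang, direction); direction may be None."""
    m = _LITERAL.match(text)
    if m is None:
        raise ValueError(f"not a language-tagged literal: {text!r}")
    # Unescaping is omitted to keep the sketch short.
    return m.group("lex"), m.group("lang"), m.group("dir")


term = format_dir_literal("مرحبا", "ar", "rtl")
print(term)
print(parse_dir_literal(term))
```

A parser that predates RDF 1.2 would typically reject or mangle the direction suffix, which is the kind of silent change between ecosystems the project description warns about.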

C

libpurrdf (crates/rdf-capi) exposes parse, serialize, pattern iteration, copy-on-write mutation, SPARQL, and GTS round-trips behind a panic-safe C ABI with a committed header (include/purrdf.h) that CI checks for drift. Built with cargo-c: make capi-build.

Crate map

Crate | What it is
purrdf | Umbrella facade: the RDF surface at the root, slice and shapes as modules. Start here.
purrdf-rdf | RDF 1.2 implementation: native codecs, GTS adapters, describe, canonicalization entry points.
purrdf-core | The kernel: interned IR, diagnostics, store traits, provenance, loss ledger, RDFC-1.0.
purrdf-gts | GTS container engine: reader, writer, fold, verify, COSE sign/encrypt.
purrdf-sparql-algebra | SPARQL 1.1/1.2 parser → query algebra AST.
purrdf-sparql-eval | Multiset SPARQL evaluator in interned TermId space.
purrdf-sparql-results | SPARQL results JSON/XML/CSV/TSV, plus a provenance-carrying extension.
purrdf-shapes | SHACL validation engine (full Core + SHACL-SPARQL).
purrdf-shex | ShEx 2.1: ShExC/ShExJ schemas and validation.
purrdf-slice | Slice catalog: manifests, typed artifacts, ownership/dependency analysis.
purrdf-iri | Zero-dependency IRI/URI parsing, resolution, normalization, CURIEs.
purrdf-xsd | Zero-dependency XSD 1.1 value space with SPARQL numeric promotion.
purrdf-events | Zero-dependency object-safe RDF event sink/source seam.
purrdf-wasm | The wasm32 engine behind the purrdf ESM package.
purrdf-capi | libpurrdf C ABI (unpublished; built via cargo-c).
purrdf-sparql-conformance | W3C SPARQL conformance harness (unpublished).

Fast by measurement, not by assertion

The IR keeps every term once in a string arena addressed by copyable NonZeroU32 ids, hashes with fixed-key ahash everywhere hot, and freezes datasets into Box<[QuadRow]> tables with lazy ordinal permutation indexes (~4 bytes/quad per axis). Performance claims are backed by criterion benchmarks rather than adjectives — crates/rdf-core/benches/ir_layout.rs measures AoS vs. SoA vs. predicate-adjacency layouts (allocation counts, high-water mark, end-to-end latency), and the shipped layout is whichever wins. Run them with make bench.
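The layout described above can be illustrated with a toy model: each distinct term is stored once and addressed by a small integer, rows hold only ids, and an index is a sorted list of row ordinals rather than a copy of the rows. This Python sketch shows the idea only and makes no claim about the Rust implementation; the Interner class and rows_with_predicate are hypothetical names. Ids start at 1 to echo the non-zero id space.

```python
import bisect


class Interner:
    """Stores each distinct term string once and hands out integer ids."""

    def __init__(self):
        self._ids = {}        # term string -> id
        self._arena = [None]  # id -> term string; slot 0 unused so ids start at 1

    def intern(self, term: str) -> int:
        tid = self._ids.get(term)
        if tid is None:
            tid = len(self._arena)
            self._arena.append(term)
            self._ids[term] = tid
        return tid

    def resolve(self, tid: int) -> str:
        return self._arena[tid]


ALICE = "https://example.org/alice"
BOB = "https://example.org/bob"
NAME = "http://xmlns.com/foaf/0.1/name"
KNOWS = "http://xmlns.com/foaf/0.1/knows"

terms = Interner()
triples = [(ALICE, NAME, '"Alice"'), (ALICE, KNOWS, BOB), (BOB, NAME, '"Bob"')]

# "Freeze": an immutable table of id rows (graph component omitted for brevity).
frozen = tuple((terms.intern(s), terms.intern(p), terms.intern(o)) for s, p, o in triples)

# Permutation index: row ordinals sorted by (p, s, o). One integer per row.
pos_index = sorted(range(len(frozen)), key=lambda i: (frozen[i][1], frozen[i][0], frozen[i][2]))
p_keys = [frozen[i][1] for i in pos_index]


def rows_with_predicate(p_id: int) -> list:
    """Binary-search the index for the contiguous run of rows with this predicate."""
    lo = bisect.bisect_left(p_keys, p_id)
    hi = bisect.bisect_right(p_keys, p_id)
    return [frozen[i] for i in pos_index[lo:hi]]


for s, p, o in rows_with_predicate(terms.intern(NAME)):
    print(terms.resolve(s), terms.resolve(p), terms.resolve(o))
```

Storing one ordinal per row per ordering is presumably where a figure like 4 bytes per quad per axis comes from, if ordinals are 32-bit; that reading is an inference from the paragraph above, not a statement from the source.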

Conformance

Every engine is gated by its official test suite, vendored and frozen in-repo — full scoreboard and how-to-run in docs/CONFORMANCE.md:

Engine | Suite | Result
ShEx 2.1 validation | shexTest v2.1.0 (vectors/shexTest/) | 1,051 / 1,051 attempted, 0 xfail
ShEx schemas / negative syntax / structure | shexTest v2.1.0 | 425/425 · 99/99 · 14/14
SHACL | W3C data-shapes (vectors/shacl/) | 114 / 120 (6 ledgered)
SHACL (first-party frozen corpus) | crates/shapes/corpus/ | 48 / 48
SPARQL 1.1 | W3C suite via purrdf-sparql-conformance | green, xfail-ledgered
RDFC-1.0 | W3C canonicalization fixtures | green
GTS | frozen cross-language vectors (vectors/) | byte-exact
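Results such as "114 / 120 (6 ledgered)" and "green, xfail-ledgered" imply a gate that tolerates known failures only when they are recorded. One way such a gate could work is sketched below; the function gate, the test names, and the rule that stale ledger entries also fail the gate are assumptions for illustration, and the project's actual harness may behave differently.

```python
def gate(results: dict, ledger: set) -> list:
    """Return a list of problems; an empty list means the suite is green."""
    failures = {name for name, passed in results.items() if not passed}
    problems = [f"unexpected failure: {n}" for n in sorted(failures - ledger)]
    # Assumed rule: a ledgered test that now passes is reported too,
    # so the ledger cannot silently go stale.
    problems += [f"stale ledger entry: {n}" for n in sorted(ledger - failures)]
    return problems


# Hypothetical results: five tests, one known failure recorded in the ledger.
results = {"t1": True, "t2": True, "t3": False, "t4": True, "t5": True}
ledger = {"t3"}

passed = sum(results.values())
print(f"{passed} / {len(results)} ({len(ledger)} ledgered)")
print(gate(results, ledger))

# A new regression is not in the ledger, so the gate reports it.
results["t5"] = False
print(gate(results, ledger))
```

Under this scheme a score below 100% can still be green, because every failure is accounted for, while any failure outside the ledger blocks the build.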

Development

make metadata   # regenerate + verify generated artifacts
make check      # fmt, build, tests, hygiene gates
make bench      # criterion benchmarks

Releases are tag-driven with OIDC trusted publishing (crates.io and PyPI), with build-provenance attestations and SPDX SBOMs — see docs/RELEASE.md.

The GMEOW family

PurRDF is the library layer of a small family of linked-data projects:

  • gmeow-ontology — the GMEOW reasoning-centric super-vocabulary and its publishing toolchain (PurRDF's primary consumer).
  • gmeow-gts — the GTS specification and its multi-language engines; PurRDF hosts the Rust engine.

Extraction history and source commits: PROVENANCE.md. Brand assets and usage: docs/BRAND.md.

License

Licensed under either of Apache License 2.0 or MIT license at your option, as described in LICENSING.md.

If you use PurRDF in research, please cite it — see CITATION.cff.

