RDF-StarBase
A blazingly fast RDF★ database with native provenance tracking
RDF-StarBase is a native RDF★ platform for storing, querying, and visualizing assertions about data — not just data itself. Every triple carries full provenance: who said it, when, how confident they were, and which process generated it.
Key Features
- Blazingly Fast — Built on Polars with Rust-speed DataFrame operations
- Native RDF-Star — First-class support for quoted triples and statement metadata
- Full Provenance — Every assertion tracked with source, timestamp, confidence, process
- Competing Claims — See ALL assertions, not just the "winning" one
- SPARQL-Star — Query with standard SPARQL syntax + provenance extensions
- Assertion Registry — Track data sources, APIs, and mappings as first-class entities
- REST API — FastAPI-powered web interface with interactive docs
- Graph Visualization — React + D3.js frontend for exploring knowledge graphs
- Parquet Persistence — Efficient columnar storage for analytics workloads
Why RDF-StarBase?
Traditional databases store values.
Traditional catalogs store descriptions.
RDF-StarBase stores assertions about reality.
When your CRM says customer.age = 34 and your Data Lake says customer.age = 36, most systems silently overwrite one value with the other. RDF-StarBase keeps both, letting you:
- See competing claims side-by-side
- Filter by source, confidence, or recency
- Maintain full audit trails
- Let downstream systems choose which to trust
Installation
PyPI (Recommended)
pip install "rdf-starbase[web]" # Include REST API dependencies (quotes keep the extras spec intact in shells like zsh)
Or for minimal installation:
pip install rdf-starbase
Docker (Quickest Start)
Run the complete stack (frontend + backend + database) in a single container:
docker run -d \
--name rdfstarbase \
-p 8000:8000 \
-v rdfstarbase-data:/data/repositories \
ontusdev/rdf-starbase:latest
Open http://localhost:8000/app/ to access the web interface.
Or use docker-compose:
services:
rdfstarbase:
image: ontusdev/rdf-starbase:latest
ports:
- "8000:8000"
volumes:
- rdfstarbase-data:/data/repositories
environment:
- PYTHONUNBUFFERED=1
volumes:
rdfstarbase-data:
Start with docker-compose up -d.
Features included in the Docker image:
- Monaco SPARQL editor with syntax highlighting
- Schema browser with class/property exploration
- Import/export UI for Turtle, RDF/XML, N-Triples, JSON-LD
- Interactive graph visualization with D3.js
- REST API at http://localhost:8000/docs
From Source
git clone https://github.com/ontus/rdf-starbase.git
cd rdf-starbase
pip install -e ".[dev]"
Quick Start
from rdf_starbase import TripleStore, ProvenanceContext
# Create a store
store = TripleStore()
# Add triples with provenance
prov = ProvenanceContext(
source="CRM_System",
confidence=0.85,
process="api_sync"
)
store.add_triple(
"http://example.org/customer/123",
"http://xmlns.com/foaf/0.1/name",
"Alice Johnson",
prov
)
# Query with provenance filtering
results = store.get_triples(
subject="http://example.org/customer/123",
min_confidence=0.8
)
# Detect competing claims
claims = store.get_competing_claims(
subject="http://example.org/customer/123",
predicate="http://example.org/age"
)
🔍 SPARQL-Star Queries
from rdf_starbase import execute_sparql
# Standard SPARQL
results = execute_sparql(store, """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?name WHERE {
<http://example.org/customer/123> foaf:name ?name
}
""")
# With provenance extensions
results = execute_sparql(store, """
SELECT ?s ?p ?o WHERE {
?s ?p ?o .
FILTER_CONFIDENCE(>= 0.9)
FILTER_SOURCE("CRM_System")
}
""")
# ASK queries
exists = execute_sparql(store, """
ASK WHERE {
<http://example.org/customer/123> <http://xmlns.com/foaf/0.1/name> ?name
}
""") # Returns: True
Advanced Query Features
# OPTIONAL - include data when available
results = execute_sparql(store, """
PREFIX foaf: <http://xmlns.com/foaf/0.1/>
SELECT ?person ?name ?email WHERE {
?person foaf:name ?name .
OPTIONAL { ?person foaf:mbox ?email }
}
""")
# UNION - combine multiple patterns
results = execute_sparql(store, """
SELECT ?entity ?label WHERE {
{ ?entity rdfs:label ?label }
UNION
{ ?entity foaf:name ?label }
}
""")
# BIND - computed values
results = execute_sparql(store, """
SELECT ?product ?price ?taxed WHERE {
?product ex:price ?price .
BIND(?price * 1.1 AS ?taxed)
}
""")
# Aggregates with GROUP BY
results = execute_sparql(store, """
SELECT ?source (COUNT(*) AS ?count) (AVG(?confidence) AS ?avg_conf) WHERE {
?s ?p ?o .
}
GROUP BY ?source
HAVING (COUNT(*) > 10)
""")
# CONSTRUCT - generate new triples
results = execute_sparql(store, """
CONSTRUCT {
?person foaf:knows ?other .
}
WHERE {
?person ex:worksAt ?company .
?other ex:worksAt ?company .
FILTER(?person != ?other)
}
""")
# INSERT DATA - add new triples
execute_sparql(store, """
INSERT DATA {
<http://example.org/alice> foaf:name "Alice" .
<http://example.org/alice> foaf:age 30 .
}
""")
# DELETE DATA - remove specific triples
execute_sparql(store, """
DELETE DATA {
<http://example.org/alice> foaf:age 30 .
}
""")
# DELETE WHERE - remove matching patterns
execute_sparql(store, """
DELETE WHERE {
<http://example.org/alice> foaf:knows ?anyone .
}
""")
# DELETE/INSERT WHERE - update values atomically
execute_sparql(store, """
DELETE { ?s ex:status "active" }
INSERT { ?s ex:status "archived" }
WHERE { ?s ex:status "active" }
""")
# Property paths - navigate graph relationships
results = execute_sparql(store, """
SELECT ?ancestor WHERE {
<http://example.org/alice> foaf:knows+ ?ancestor . # One or more hops
}
""")
results = execute_sparql(store, """
SELECT ?connected WHERE {
<http://example.org/alice> (foaf:knows|foaf:worksWith)* ?connected . # Zero or more via knows OR worksWith
}
""")
results = execute_sparql(store, """
SELECT ?knower WHERE {
?knower ^foaf:knows <http://example.org/bob> . # Inverse: who knows Bob?
}
""")
# Time-travel queries - query historical state
results = execute_sparql(store, """
SELECT ?s ?name WHERE {
?s foaf:name ?name .
}
AS OF "2025-01-15T00:00:00Z"
""")
# ASK with time-travel
existed = execute_sparql(store, """
ASK WHERE {
<http://example.org/alice> foaf:name ?name .
}
AS OF "2024-06-01"
""") # Returns: True if Alice existed on that date
📊 Named Graph Management
RDF-StarBase supports named graphs (graph containers/clusters) with the full set of SPARQL 1.1 Update graph management operations:
from rdf_starbase import execute_sparql
# CREATE GRAPH - create a new named graph
execute_sparql(store, """
CREATE GRAPH <http://example.org/graphs/customers>
""")
# LOAD - load RDF data from a file into a graph
execute_sparql(store, """
LOAD <file:///data/customers.ttl>
INTO GRAPH <http://example.org/graphs/customers>
""")
# Or load from HTTP
execute_sparql(store, """
LOAD <https://example.org/data/products.ttl>
INTO GRAPH <http://example.org/graphs/products>
""")
# COPY - copy all triples from one graph to another
execute_sparql(store, """
COPY GRAPH <http://example.org/graphs/customers>
TO GRAPH <http://example.org/graphs/customers_backup>
""")
# MOVE - move triples (copy then clear source)
execute_sparql(store, """
MOVE GRAPH <http://example.org/graphs/staging>
TO GRAPH <http://example.org/graphs/production>
""")
# ADD - add triples to another graph (merge)
execute_sparql(store, """
ADD GRAPH <http://example.org/graphs/updates>
TO GRAPH <http://example.org/graphs/main>
""")
# CLEAR - remove all triples from a graph (graph still exists)
execute_sparql(store, """
CLEAR GRAPH <http://example.org/graphs/temp>
""")
# DROP - delete a graph and all its triples
execute_sparql(store, """
DROP GRAPH <http://example.org/graphs/old_data>
""")
# Special graph targets
execute_sparql(store, "CLEAR DEFAULT") # Clear default graph
execute_sparql(store, "DROP NAMED") # Drop all named graphs
execute_sparql(store, "CLEAR ALL") # Clear everything
# SILENT mode - don't fail if graph doesn't exist
execute_sparql(store, """
DROP SILENT GRAPH <http://example.org/graphs/maybe_exists>
""")
# List all named graphs
graphs = store.list_graphs()
print(graphs) # ['http://example.org/graphs/customers', 'http://example.org/graphs/products']
Querying Named Graphs
# FROM clause - restrict query to specific graph
results = execute_sparql(store, """
SELECT ?customer ?name
FROM <http://example.org/graphs/customers>
WHERE {
?customer foaf:name ?name
}
""")
# FROM with multiple graphs (union of datasets)
results = execute_sparql(store, """
SELECT ?entity ?label
FROM <http://example.org/graphs/customers>
FROM <http://example.org/graphs/products>
WHERE {
?entity rdfs:label ?label
}
""")
# GRAPH pattern - query specific named graph in WHERE clause
results = execute_sparql(store, """
SELECT ?customer ?name WHERE {
GRAPH <http://example.org/graphs/customers> {
?customer foaf:name ?name
}
}
""")
# GRAPH with variable - discover which graph contains data
results = execute_sparql(store, """
SELECT ?graph ?entity ?name WHERE {
GRAPH ?graph {
?entity foaf:name ?name
}
}
""")
# Combined patterns - default graph + specific named graph
results = execute_sparql(store, """
SELECT ?person ?friend ?friendName WHERE {
?person foaf:knows ?friend .
GRAPH <http://example.org/graphs/profiles> {
?friend foaf:name ?friendName
}
}
""")
# FROM NAMED - specify available named graphs for GRAPH patterns
results = execute_sparql(store, """
SELECT ?g ?s ?name
FROM NAMED <http://example.org/graphs/customers>
FROM NAMED <http://example.org/graphs/employees>
WHERE {
GRAPH ?g { ?s foaf:name ?name }
}
""")
⭐ RDF-Star: Quoted Triples
RDF-Star allows you to make statements about statements:
# The assertion "Alice knows Bob" is claimed by Wikipedia
store.add_quoted_triple(
subject="<<http://example.org/alice http://xmlns.com/foaf/0.1/knows http://example.org/bob>>",
predicate="http://example.org/assertedBy",
obj="http://dbpedia.org/resource/Wikipedia",
provenance=prov
)
Query with SPARQL-Star:
SELECT ?who WHERE {
<< ?person foaf:knows ?other >> ex:assertedBy ?who
}
Competing Claims Detection
# Multiple systems report different ages
crm_prov = ProvenanceContext(source="CRM", confidence=0.85)
lake_prov = ProvenanceContext(source="DataLake", confidence=0.92)
store.add_triple(customer, "http://example.org/age", 34, crm_prov)
store.add_triple(customer, "http://example.org/age", 36, lake_prov)
# See all competing values
claims = store.get_competing_claims(customer, "http://example.org/age")
print(claims)
# shape: (2, 4)
# ┌────────┬──────────┬────────────┬─────────────────────┐
# │ object │ source │ confidence │ timestamp │
# ├────────┼──────────┼────────────┼─────────────────────┤
# │ 36 │ DataLake │ 0.92 │ 2026-01-16 03:00:00 │
# │ 34 │ CRM │ 0.85 │ 2026-01-16 02:00:00 │
# └────────┴──────────┴────────────┴─────────────────────┘
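Because get_competing_claims returns a Polars DataFrame (as the printed shape above shows), resolution policies are ordinary DataFrame operations. A minimal sketch, assuming the column names shown in the output above:
import polars as pl
# Prefer the highest-confidence claim, breaking ties by recency
# (column names as in the printed table above)
best = claims.sort(["confidence", "timestamp"], descending=[True, True]).head(1)
# Or trust a single source outright
crm_only = claims.filter(pl.col("source") == "CRM")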
Persistence
# Save to Parquet (columnar, fast, compressible)
store.save("knowledge_graph.parquet")
# Load back
loaded_store = TripleStore.load("knowledge_graph.parquet")
Architecture
┌─────────────────────────────────────────────────────────────────────┐
│ RDF-StarBase │
├─────────────────────────────────────────────────────────────────────┤
│ React + D3.js Frontend │ REST API (FastAPI) │
├──────────────────────────────┼──────────────────────────────────────┤
│ SPARQL-Star Parser │ Query Executor │ Assertion Registry │
├─────────────────────────────────────────────────────────────────────┤
│ Triple Store (Polars DataFrames) │
├─────────────────────────────────────────────────────────────────────┤
│ Parquet I/O │ Provenance Tracking │ Competing Claims Detection │
└─────────────────────────────────────────────────────────────────────┘
Core Stack:
- Polars — Rust-powered DataFrames for blazing performance
- FastAPI — Modern async REST API framework
- pyparsing — SPARQL-Star parser
- Pydantic — Data model validation
- D3.js — Graph visualization
- PyArrow — Parquet persistence
Performance
RDF-StarBase leverages Polars' Rust backend for:
- Vectorized operations on millions of triples
- Lazy evaluation for query optimization
- Zero-copy reads from Parquet
- Parallel execution across cores
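As an illustration, here is a minimal sketch of the kind of lazy pipeline this enables over a persisted triple table. The direct pl.scan_parquet access and the column names (confidence, source) are assumptions for illustration, following the provenance schema shown above:
import polars as pl
# Lazy scan: filter predicates are pushed down, so only matching
# row groups are ever read from the Parquet file
lf = pl.scan_parquet("knowledge_graph.parquet")
per_source = (
    lf.filter(pl.col("confidence") >= 0.9)  # vectorized predicate
    .group_by("source")                     # parallel hash aggregation
    .agg(pl.len().alias("n_triples"))
    .collect()                              # executes across all cores
)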
Web API
Start the server:
# Using uvicorn directly
uvicorn rdf_starbase.web:app --reload
# Or with the module
python -m rdf_starbase.web
Then open:
- API Docs: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
REST Endpoints
| Endpoint | Method | Description |
|---|---|---|
| /triples | GET | Query triples with filters |
| /triples | POST | Add new triple with provenance |
| /triples/{subject}/claims | GET | Get competing claims |
| /sparql | POST | Execute SPARQL-Star query |
| /sources | GET/POST | Manage data sources |
| /graph/nodes | GET | Visualization data |
| /graph/edges | GET | Graph edges |
| /stats | GET | Database statistics |
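For example, a query can be issued against the /sparql endpoint with any HTTP client. The JSON payload shape below is a plausible sketch, not the confirmed schema; check the interactive docs at /docs for the authoritative request format:
import httpx
# Hypothetical request body; verify the exact schema at /docs
resp = httpx.post(
    "http://localhost:8000/sparql",
    json={"query": "SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10"},
)
resp.raise_for_status()
print(resp.json())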
🤖 AI Grounding API
A specialized API layer designed for AI/LLM consumption, separate from the UI visualization endpoints:
| Endpoint | Method | Description |
|---|---|---|
| /ai/query | POST | Structured fact retrieval with provenance for RAG |
| /ai/verify | POST | Verify if a claim is supported by the knowledge base |
| /ai/context/{iri} | GET | Get all facts about an entity with citations |
| /ai/materialize | POST | Trigger reasoning and persist inferences |
| /ai/inferences | GET | List materialized inferences |
| /ai/health | GET | AI API health check |
Why a Separate AI API?
| Aspect | UI API (/graph/*) | AI Grounding API (/ai/*) |
|---|---|---|
| Consumer | D3.js visualization | LLM tool calls / agents |
| Response format | Nodes + edges for rendering | Facts + provenance + citations |
| Query pattern | Browsing, neighborhood exploration | Precise fact lookup, verification |
| Filtering | Limit by count, visual simplicity | Confidence threshold, freshness |
Example: Grounding an AI Response
import httpx
# 1. Query relevant facts for RAG
response = httpx.post("http://localhost:8000/ai/query", json={
"subject": "http://example.org/customer/123",
"min_confidence": "high", # high (>=0.9), medium (>=0.7), low (>=0.5), any
"max_age_days": 30, # Only recent facts
})
facts = response.json()["facts"]
# 2. Verify a claim before stating it
verify = httpx.post("http://localhost:8000/ai/verify", json={
"subject": "http://example.org/customer/123",
"predicate": "http://xmlns.com/foaf/0.1/age",
"expected_object": "34",
})
result = verify.json()
if result["claim_supported"]:
print(f"Claim verified with {result['confidence']:.0%} confidence")
elif result["has_conflicts"]:
print("Warning: Competing claims exist!")
print(result["recommendation"])
# 3. Get full entity context (percent-encode the IRI so it survives URL routing)
from urllib.parse import quote
iri = quote("http://example.org/customer/123", safe="")
context = httpx.get(f"http://localhost:8000/ai/context/{iri}")
entity_facts = context.json()["facts"]
related = context.json()["related_entities"]
Inference Materialization
Materialize RDFS/OWL inferences with provenance tracking:
# Run reasoning engine and persist inferred triples
response = httpx.post("http://localhost:8000/ai/materialize", json={
"enable_rdfs": True, # RDFS entailment rules
"enable_owl": True, # OWL 2 RL rules
"max_iterations": 100,
})
print(f"Inferred {response.json()['triples_inferred']} triples")
# Query inferred facts (marked with source='reasoner')
inferences = httpx.get("http://localhost:8000/ai/inferences")
for fact in inferences.json()["inferences"]:
print(f"Inferred: {fact['subject']} {fact['predicate']} {fact['object']}")
📋 Assertion Registry
Track data sources as first-class entities:
from rdf_starbase import AssertionRegistry, SourceType
registry = AssertionRegistry()
# Register a data source
source = registry.register_source(
name="CRM_Production",
source_type=SourceType.API,
uri="https://api.crm.example.com/v2",
owner="sales-team",
tags=["production", "customer-data"],
)
# Track sync runs
run = registry.start_sync(source.id)
# ... perform sync ...
registry.complete_sync(run.id, records_processed=1000)
# Get sync history
history = registry.get_sync_history(source.id)
🧪 Development
# Install dev dependencies
pip install -e ".[dev]"
# Run tests
pytest tests/ -v
# Run with coverage
pytest tests/ --cov=src/rdf_starbase
# Format code
black src/ tests/
ruff check src/ tests/
📊 Frontend (React + D3)
cd frontend
npm install
npm run dev
Then open http://localhost:3000 (the dev server proxies API requests to :8000)
📚 Examples
See the examples/ directory:
- quickstart.py — Core features demonstration
- competing_claims.py — Handling conflicting data from multiple sources
- sparql_queries.py — SPARQL-Star query examples
- registry_demo.py — Assertion Registry usage
🗺️ Roadmap
✅ Completed (MVP)
- Native RDF-Star storage
- Provenance tracking (source, timestamp, confidence, process)
- Competing claims detection
- SPARQL-Star parser (SELECT, ASK, FILTER, ORDER BY, LIMIT, OFFSET)
- SPARQL-Star executor with Polars backend
- Provenance filter extensions
- Parquet persistence
- Assertion Registry (datasets, APIs, mappings)
- REST API with FastAPI
- React + D3 graph visualization
✅ Completed (Advanced Query Features)
- OPTIONAL patterns (left outer joins)
- UNION patterns (combine result sets)
- MINUS patterns (set difference)
- FILTER expressions (comparisons, boolean logic, regex, string functions)
- BIND clauses (variable assignment, expressions, functions)
- VALUES inline data
- Aggregate functions (COUNT, SUM, AVG, MIN, MAX, GROUP_CONCAT, SAMPLE)
- GROUP BY and HAVING
- CONSTRUCT queries (template-based triple generation)
- DESCRIBE queries (resource description)
- SPARQL UPDATE (INSERT DATA, DELETE DATA, DELETE WHERE, DELETE/INSERT WHERE)
- OWL reasoning (rdfs:subClassOf, owl:sameAs, owl:inverseOf, owl:TransitiveProperty)
- Property path queries (/, |, ^, *, +, ?)
- Time-travel queries (AS OF "2025-01-15T00:00:00Z")
- AI Grounding API (/ai/query, /ai/verify, /ai/context)
- Inference materialization (/ai/materialize, /ai/inferences)
- Named Graph Management (CREATE, DROP, CLEAR, LOAD, COPY, MOVE, ADD)
- FROM clause dataset specification
- GRAPH pattern queries
🔜 Next
- Trust scoring and decay
🚀 Future
- Federation across instances
- Governance workflows
📄 License
MIT License — see LICENSE for details.
🙏 Acknowledgments
- Polars — The lightning-fast DataFrame library
- RDF-Star Working Group — For the specification
- FastAPI — Modern Python web framework
- D3.js — Data visualization library
- pyparsing — Parser combinators for Python
RDF-StarBase — The place where enterprises store beliefs, not just data.
Project details
Download files
File details
Details for the file rdf_starbase-0.2.0.tar.gz.
File metadata
- Download URL: rdf_starbase-0.2.0.tar.gz
- Upload date:
- Size: 288.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e0952f14978e44bed7a9b9e25a3bd7399bc3bcfccb3b05c88805fa68707e5f06 |
| MD5 | 9c99501dc9a5e4578636b5cac0b78dce |
| BLAKE2b-256 | efe1ab49e569d28e78080532a7329dd6157942f7ad18f4972185bd27939bf44f |
File details
Details for the file rdf_starbase-0.2.0-py3-none-any.whl.
File metadata
- Download URL: rdf_starbase-0.2.0-py3-none-any.whl
- Upload date:
- Size: 167.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 92161512f391bac0980f31b1c702d9372cdc0605318902b5399559128dbffe91 |
| MD5 | cab823f42c83c5b3e8c20b8e31a45f29 |
| BLAKE2b-256 | eb6495485aca3a625e6583c69d3b8d59cc3ba54da2800e189238f5dc4ce9fbba |