SPARQLMojo

An SQLAlchemy-like ORM for SPARQL endpoints with Pydantic validation. Currently in beta, so there may be breaking changes.

Features

  • Declarative RDF models using Python classes with Pydantic validation
  • Type-safe field definitions with automatic validation
  • A session layer for querying and updating SPARQL endpoints
  • A query compiler that converts Pythonic queries to SPARQL
  • Session identity map to prevent duplicate instances and ensure consistency
  • PREFIX management system for namespace handling with short-form IRIs
  • Language-tagged literal support for multilingual text data
  • Property path support with ORM-like convenience methods, including inverse paths for reverse relationship traversal
  • Field-level filtering with intuitive syntax and automatic datatype casting for numeric comparisons
  • String filtering on IRI fields with chainable str(), lower(), upper() methods for case-insensitive matching
  • Ontology-aware models with SchemaRegistry for automatic inverse relationship discovery via owl:inverseOf
  • InverseField for clean, semantic reverse relationship navigation with automatic fallback to SPARQL ^ operator
  • Class hierarchy support with automatic polymorphic queries — querying a base class returns all subclass instances without any extra configuration

Installation

# Install dependencies
poetry install

# Or install the package in editable mode
pip install -e .

Version

Check the installed version:

import sparqlmojo
print(sparqlmojo.__version__)  # e.g. 0.15.2

Or from the command line:

python -c "import sparqlmojo; print(sparqlmojo.__version__)"

Versioning Workflow

This project uses semantic versioning with automated releases. See the Release Process section for details on creating releases.

Usage

from typing import Annotated

from sparqlmojo import (
    Condition,
    InverseField,
    IRIField,
    LiteralField,
    Model,
    ObjectPropertyField,
    RDF_TYPE,
    SchemaRegistry,
    Session,
    SPARQLCompiler,
    SubjectField,
)


class Person(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="schema:Person")]
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("schema:name")] = None
    age: Annotated[int | None, LiteralField("schema:age")] = None
    knows: Annotated[str | None, ObjectPropertyField("schema:knows", range_="Person")] = None


# Create a session
s = Session(endpoint="http://example.org/sparql")

# For endpoints with separate read/write URLs (e.g., Fuseki):
# s = Session(
#     endpoint="http://example.org/sparql",           # For SELECT queries
#     write_endpoint="http://example.org/update"      # For INSERT/DELETE/UPDATE
# )

# Configure HTTP method for SELECT queries (see "HTTP Method Configuration" below):
# s = Session(endpoint="http://example.org/sparql", query_method="GET")

# Build and compile a query
q = s.query(Person).filter(Condition("age", ">", 30)).limit(5)
sparql = SPARQLCompiler.compile_query(q)
print(sparql)

# Create an instance with validation
bob = Person(iri="http://example.org/bob", name="Bob", age=28)
s.add(bob)
s.commit()

# Pydantic validates types automatically
try:
    invalid = Person(iri="http://example.org/alice", name="Alice", age="not a number")  # Raises ValidationError
except Exception as e:
    print(f"Validation error: {e}")

HTTP Method Configuration

SPARQLMojo supports configurable HTTP methods for SPARQL SELECT queries. By default, POST is used to avoid URL length limitations with large queries.

Query Methods

  • POST (default): uses HTTP POST for SELECT queries. Recommended for most cases; avoids URL length issues.
  • GET: uses HTTP GET for SELECT queries. Required by some read-only endpoints; allows better HTTP caching.

Configuration

from sparqlmojo import Session

# Default: Always use POST (safest option)
session = Session(endpoint="http://example.org/sparql")
# or explicitly:
session = Session(endpoint="http://example.org/sparql", query_method="POST")

# Use GET (for endpoints that require it or for caching benefits)
session = Session(endpoint="http://example.org/sparql", query_method="GET")

When to Use Each Mode

POST (Default)

  • Recommended for most applications
  • No risk of HTTP 414 "URI Too Long" errors
  • Works with queries of any size, including large VALUES clauses
  • Some proxies/CDNs may not cache POST requests

GET

  • Better HTTP caching (responses can be cached by proxies)
  • Required by some read-only SPARQL endpoints
  • Risk of HTTP 414 errors with large queries (URLs > 2000 characters)
  • Query is visible in server access logs (potential security consideration)

Note: UPDATE queries (INSERT, DELETE) always use POST regardless of this setting, as required by the SPARQL protocol.

Identity Map

SPARQLMojo now includes a Session identity map to prevent duplicate instances and ensure consistency:

# First retrieval creates new instance
person1 = session.get(Person, "http://example.org/bob")

# Second retrieval returns the SAME instance (not a duplicate)
person2 = session.get(Person, "http://example.org/bob")

assert person1 is person2  # True - same object reference

# Changes to one reference are visible in all references
person1.name = "Robert"
print(person2.name)  # "Robert" - same object

Benefits

  • Memory Efficiency: Uses weak references for automatic garbage collection
  • Consistency: All operations on the same entity work with the same object
  • Performance: Avoids creating duplicate objects for the same entity
  • Automatic Management: No manual cache management required

Manual Cache Management

# Remove specific instance from identity map
session.expunge(person)

# Clear all instances from identity map
session.expunge_all()

PREFIX Management System

SPARQLMojo now includes a comprehensive PREFIX management system for namespace handling:

Features

  • Built-in Common Prefixes: schema, foaf, rdf, rdfs, owl, xsd, dc, dcterms, skos, ex
  • Custom Prefix Registration: Add your own namespace prefixes
  • Short-form IRI Support: Use schema:Person instead of full IRIs
  • Automatic PREFIX Declarations: SPARQL queries include proper PREFIX clauses
  • IRI Expansion/Contraction: Convert between short-form and full IRIs

Usage

from typing import Annotated

from sparqlmojo import IRIField, LiteralField, Model, RDF_TYPE, Session, SubjectField

# Define model with short-form IRIs
class Person(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="schema:Person")]
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("schema:name")] = None
    age: Annotated[int | None, LiteralField("schema:age")] = None

# Create session with built-in prefix registry
session = Session()

# Register custom prefix
session.register_prefix("my", "http://example.org/my/")

# Query generation with automatic PREFIX declarations
query = session.query(Person)
sparql = query.compile()
# Generates: PREFIX schema: <http://schema.org/> ...

# IRI expansion/contraction
expanded = session.expand_iri("schema:Person")  # "http://schema.org/Person"
contracted = session.contract_iri("http://schema.org/Person")  # "schema:Person"

Benefits

  • Improved Developer Experience: No need to write full IRIs everywhere
  • Better Readability: Code is more concise and understandable
  • Easy Maintenance: Update namespace URIs in one place
  • Standards Compliance: Generates proper SPARQL PREFIX declarations

Language-Tagged Literals

SPARQLMojo supports language-tagged literals via LangString and MultiLangString fields for multilingual text data with BCP 47 language tag validation.
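
A minimal sketch of what a multilingual model might look like. The LangString constructor, its value/lang parameters, and the field wiring below are assumptions based on the feature description, not confirmed API; see the full documentation for the actual interface.

from typing import Annotated

from sparqlmojo import LangString, LiteralField, Model, SubjectField

class Book(Model):
    iri: Annotated[str, SubjectField()]
    # Hypothetical: a single language-tagged title, e.g. "Der Prozess"@de
    title: Annotated[LangString | None, LiteralField("schema:name")] = None

# Hypothetical constructor with a BCP 47 language tag
book = Book(iri="http://example.org/book1", title=LangString(value="Der Prozess", lang="de"))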

Full documentation

Collection Fields

SPARQLMojo supports collection fields (LiteralList, LangStringList, IRIList, TypedLiteralList) for aggregating multiple values from multi-valued RDF properties into Python lists, with support for filtering, size limiting, and efficient multi-field queries.
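
As a rough illustration, a model might aggregate a multi-valued property like this; the exact constructor signatures of LiteralList and IRIList are assumptions, so treat this as a sketch rather than the documented interface.

from typing import Annotated

from sparqlmojo import IRIList, LiteralList, Model, SubjectField

class Author(Model):
    iri: Annotated[str, SubjectField()]
    # Hypothetical: collect every schema:name value for the subject into a Python list
    names: Annotated[list[str], LiteralList("schema:name")] = []
    # Hypothetical: collect the IRIs of related works linked via schema:hasPart
    works: Annotated[list[str], IRIList("schema:hasPart")] = []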

Full documentation

UPDATE Operations

SPARQLMojo supports UPDATE operations with dirty tracking, as well as batch inserts, updates, and deletes with automatic chunking for large datasets.
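
A sketch of how dirty tracking and batch writes might be used. session.get, session.add, and session.commit appear elsewhere in this README; add_all and delete are assumed method names modeled on the SQLAlchemy-style surface, and the comments describe the documented intent rather than verified output.

# Dirty tracking: mutate a managed instance, then commit the change
person = session.get(Person, "http://example.org/bob")
person.age = 29        # instance is tracked as dirty (per the dirty-tracking feature)
session.commit()       # emits a SPARQL update for the changed triple

# Batch insert with automatic chunking for large datasets (add_all is an assumed name)
people = [Person(iri=f"http://example.org/p{i}", name=f"Person {i}") for i in range(1000)]
session.add_all(people)
session.commit()

# Delete (assumed method name)
session.delete(person)
session.commit()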

Full documentation

Running Tests

# Run all tests
poetry run pytest

# Run specific test file
poetry run pytest tests/test_basic.py

See Also: Test Fixtures Documentation for comprehensive documentation of shared fixtures, test models, and test organization.

Test Dataset

The project includes a comprehensive library management test dataset in tests/fixtures/library.ttl with Books, Users, and Checkout Records, along with worked examples showing how Python model instances translate to RDF triples.

Full documentation

Limitations

This is a prototype with several intentional limitations:

  • No transaction support: Simple staging mechanism for inserts only
  • No conflict resolution: Basic operations only
  • Not production-ready: Focuses on demonstrating design patterns

For real-world use, consider adding:

  • Proper literal typing
  • Better parsing of results
  • Streaming results and pagination
  • Transaction support

Known Issues and Risks

Pydantic Internal API Dependency

SPARQLMojo uses Pydantic's internal ModelMetaclass to enable the intuitive field-level filtering syntax:

# This clean syntax is powered by the custom metaclass
query.filter(Person.name == "Alice")
query.filter(Product.price > 100)

The Risk: The metaclass is imported from Pydantic's private internal API:

from pydantic._internal._model_construction import ModelMetaclass as PydanticModelMetaclass

The _internal prefix indicates this is not part of Pydantic's public API and could change without notice in any Pydantic release. According to the Pydantic maintainers, they "want to be able to refactor the ModelMetaclass without it being considered a breaking change."

What This Means:

  • ⚠️ No stability guarantees: The metaclass implementation may change in minor/patch releases
  • ⚠️ No deprecation warnings: Changes won't be announced in advance
  • ⚠️ Potential breakage: Any Pydantic update could require code changes

Mitigation Strategy:

  1. Pin Pydantic version carefully in production environments
  2. Test thoroughly after any Pydantic updates before upgrading
  3. Fallback available: If the metaclass breaks, fall back to the less elegant method-based approach:
    # Alternative syntax that doesn't depend on private APIs
    query.filter(Person._get_field_filter("name") == "Alice")
    

Why We Use It Anyway: The UX benefit of the SQLAlchemy-like syntax is significant for a prototype focused on design clarity. For production use, consider the risk-reward tradeoff for your specific needs.

VALUES Clause Support

SPARQLMojo supports the SPARQL VALUES clause for efficient query constraints with explicit value sets, via both an ORM-style field-reference API and a dict-style API for multi-variable bindings.
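
A hypothetical sketch of the two styles mentioned above; the values() method name and its argument shapes are guesses, so check the full documentation for the real signatures.

# ORM-style field reference: constrain results to an explicit set of names (assumed API)
q = session.query(Person).values(Person.name, ["Alice", "Bob"])

# Dict-style API for multi-variable bindings, one dict per VALUES row (assumed API)
q = session.query(Person).values([
    {"name": "Alice", "age": 30},
    {"name": "Bob", "age": 42},
])

print(SPARQLCompiler.compile_query(q))  # the compiled query should contain a VALUES block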

Full documentation

Property Paths

SPARQLMojo supports SPARQL property paths for advanced relationship traversal, with ORM-like convenience methods (transitive, zero_or_more, inverse, etc.) and a PropertyPath escape hatch for complex expressions.
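
For orientation, the convenience methods and the PropertyPath escape hatch might be used roughly as follows; the method names come from the feature list, but the exact call signatures are assumptions.

from sparqlmojo import PropertyPath

# Transitive traversal: matches schema:knows+ (one or more hops)
q = session.query(Person).filter(Person.knows.transitive() == "http://example.org/alice")

# Inverse traversal: matches ^schema:knows (reverses the direction of the relationship)
q = session.query(Person).filter(Person.knows.inverse() == "http://example.org/alice")

# Escape hatch for arbitrary path expressions (constructor argument is a guess)
path = PropertyPath("schema:knows/schema:worksFor")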

Full documentation

Ontology-Aware Models with SchemaRegistry

SPARQLMojo provides ontology-aware modeling through SchemaRegistry, enabling automatic inverse relationship discovery via owl:inverseOf and compile-time schema validation (domain, range, cardinality).
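
A rough sketch of how an ontology might be registered so that an InverseField can be resolved via owl:inverseOf. SchemaRegistry and InverseField are exported by the package (see the Usage imports above), but the load_ontology method name and the InverseField constructor arguments here are assumptions.

from typing import Annotated

from sparqlmojo import InverseField, Model, SchemaRegistry, SubjectField

# Load ontology triples so owl:inverseOf pairs can be discovered (method name is a guess)
registry = SchemaRegistry()
registry.load_ontology("ontology.ttl")

class Employer(Model):
    iri: Annotated[str, SubjectField()]
    # Navigate the reverse of schema:worksFor; per the feature list this falls back
    # to the SPARQL ^ operator when no owl:inverseOf declaration exists
    employees: Annotated[list[str], InverseField("schema:worksFor")] = []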

Full documentation

Class Hierarchy Support

SPARQLMojo supports rdfs:subClassOf class hierarchies — querying a base class automatically returns all registered subclass instances via polymorphic VALUES ?type queries, with no extra configuration required.
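
As an illustration, declaring a subclass model and querying the base class might look like this; the only assumption beyond the API shown earlier in this README is that plain Python subclassing is what registers the hierarchy.

from typing import Annotated

from sparqlmojo import IRIField, LiteralField, Model, RDF_TYPE, SubjectField

class CreativeWork(Model):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="schema:CreativeWork")]
    iri: Annotated[str, SubjectField()]
    name: Annotated[str | None, LiteralField("schema:name")] = None

class Book(CreativeWork):
    rdf_type: Annotated[str, IRIField(RDF_TYPE, default="schema:Book")]

# Per the description, querying the base class also matches schema:Book instances,
# compiled as a polymorphic VALUES ?type { schema:CreativeWork schema:Book } pattern
sparql = session.query(CreativeWork).compile()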

Full documentation

Field-Level Filtering

SPARQLMojo provides intuitive field-level filtering similar to SQLAlchemy, with Python comparison operators, automatic datatype casting, chainable string methods for IRI fields, and logical operators (and_, or_, not_).
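
The comparison-operator form already appears in the Usage section; the sketch below adds the logical combinators and chainable string methods named above, with the and_/or_/not_ import location and the exact chaining behavior treated as assumptions.

from sparqlmojo import and_, not_, or_

# Comparison operator with automatic numeric casting
q = session.query(Person).filter(Person.age > 30)

# Logical combinators (import path is a guess)
q = session.query(Person).filter(or_(Person.age < 18, Person.age >= 65))
q = session.query(Person).filter(and_(Person.age > 30, not_(Person.name == "Alice")))

# Case-insensitive match on an IRI field via chainable string methods (assumed behavior)
q = session.query(Person).filter(Person.knows.str().lower() == "http://example.org/alice")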

Full documentation

Release Process

SPARQLMojo uses a tag-based release workflow with automated CHANGELOG management and Codeberg Releases.

Workflow Overview

  1. During Development: Update CHANGELOG.md in the [Unreleased] section when creating merge requests
  2. Accumulate Changes: Multiple MRs can add to [Unreleased] before a release
  3. Create Release: Tag the commit to trigger automated release creation

For Contributors (Merge Request Time)

When creating a merge request, update CHANGELOG.md under the [Unreleased] section:

## [Unreleased]

### Fixed
- Issue #123: Fixed bug in query compilation

### Added
- New feature for advanced filtering

### Changed
- Improved performance of batch operations

Follow Keep a Changelog format with sections:

  • Fixed - Bug fixes
  • Added - New features
  • Changed - Changes to existing functionality
  • Deprecated - Soon-to-be removed features
  • Removed - Removed features
  • Security - Security fixes

For Maintainers (Release Time)

When ready to release a new version:

# 1. Preview release notes and create tag
./scripts/tag-release.sh v0.12.0

# 2. Push the tag to trigger CI/CD automation
git push origin v0.12.0

The CI/CD workflow (.gitea/workflows/release.yml) automatically:

  • Extracts release notes from [Unreleased] section
  • Updates CHANGELOG.md ([Unreleased] → [0.12.0] - 2026-03-05)
  • Adds new empty [Unreleased] section at the top
  • Commits and pushes CHANGELOG update to main
  • Creates Codeberg release with extracted notes

Manual Alternative (if CI/CD unavailable):

# 1. Create and push tag
git tag v0.12.0 && git push origin v0.12.0

# 2. Run publish script manually
./scripts/publish-release.sh v0.12.0

# 3. Push CHANGELOG update
git push origin main

Release Scripts

  • tag-release.sh - Create annotated tag with release notes preview
  • publish-release.sh - Update CHANGELOG and publish to Codeberg
  • create-release.sh - Legacy all-in-one script (use tag-release.sh instead)

See scripts/README.md for detailed documentation.

Version Format

Use semantic versioning: vMAJOR.MINOR.PATCH

  • MAJOR: Breaking changes
  • MINOR: New features (backward compatible)
  • PATCH: Bug fixes (backward compatible)

Examples: v0.11.0, v1.0.0, v1.2.3

Dependencies

  • pydantic>=2.12.4 - Data validation and type checking
  • SPARQLWrapper>=2.0.0 - SPARQL endpoint communication
  • rdflib>=6.0.0 - RDF graph parsing and manipulation

Key Benefits of Pydantic Integration

  • Type Safety: Fields are validated at runtime against their type annotations
  • Better IDE Support: Full autocomplete and type hints in modern IDEs
  • Clear Error Messages: Pydantic provides detailed validation errors
  • Automatic Coercion: Compatible types are automatically converted (e.g., "123" → 123 for int fields)
  • Extra Field Protection: Unknown fields are rejected by default

License

This project is licensed under the MIT License - see the LICENSE file for details.
