OntoCast 
Agentic ontology-assisted framework for semantic triple extraction
Overview
OntoCast is a framework for extracting semantic triples (creating a Knowledge Graph) from documents using an agentic, ontology-driven approach. It combines ontology management, natural language processing, and knowledge graph serialization to turn unstructured text into structured, queryable data.
Key Features
- Ontology-Guided Extraction: Ensures semantic consistency and co-evolves ontologies
- Entity Disambiguation: Resolves references across document chunks
- Multi-Format Support: Handles text, JSON, PDF, and Markdown
- Semantic Chunking: Splits text based on semantic similarity
- MCP Compatibility: Implements Model Context Protocol (MCP) endpoints
- RDF Output: Produces standardized RDF/Turtle (see the example after this list)
- Triple Store Integration: Supports Neo4j (n10s) and Apache Fuseki
- Hierarchical Configuration: Type-safe configuration system with environment variable support
- CLI Parameters: Flexible command-line interface with a --skip-ontology-critique option
- Automatic LLM Caching: Built-in response caching for improved performance and cost reduction
- GraphUpdate Operations: Token-efficient SPARQL-based updates instead of full graph regeneration
- Budget Tracking: Comprehensive tracking of LLM usage and triple generation metrics
- Ontology Versioning: Automatic semantic versioning with hash-based lineage tracking
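To give a feel for the RDF/Turtle output, here is a tiny hypothetical fragment parsed with rdflib; the vocabulary below is made up for the example, since the real output depends on the ontology OntoCast derives from your documents:
from rdflib import Graph
# Hypothetical Turtle fragment of the kind OntoCast emits
ttl = """
@prefix ex: <http://example.org/> .
ex:AcmeCorp a ex:Company ;
    ex:hasCEO ex:JaneDoe ;
    ex:foundedIn "1999" .
"""
g = Graph()
g.parse(data=ttl, format="turtle")
print(len(g))  # -> 3 triples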
Applications
OntoCast can be used for:
- Knowledge Graph Construction: Build domain-specific or general-purpose knowledge graphs from documents
- Semantic Search: Power search and retrieval with structured triples
- GraphRAG: Enable retrieval-augmented generation over knowledge graphs (e.g., with LLMs)
- Ontology Management: Automate ontology creation, validation, and refinement
- Data Integration: Unify data from diverse sources into a semantic graph
Installation
uv add ontocast
# or
pip install ontocast
Quick Start
1. Configuration
Create a .env file with your configuration:
# LLM Configuration
LLM_PROVIDER=openai
LLM_API_KEY=your-api-key-here
LLM_MODEL_NAME=gpt-4o-mini
LLM_TEMPERATURE=0.1
# Server Configuration
PORT=8999
MAX_VISITS=3
RECURSION_LIMIT=1000
ESTIMATED_CHUNKS=30
# Path Configuration
ONTOCAST_WORKING_DIRECTORY=/path/to/working
ONTOCAST_ONTOLOGY_DIRECTORY=/path/to/ontologies
ONTOCAST_CACHE_DIR=/path/to/cache
# Optional: Triple Store Configuration
FUSEKI_URI=http://localhost:3032/test
FUSEKI_AUTH=admin:password
FUSEKI_DATASET=ontocast
# Optional: Skip ontology critique
SKIP_ONTOLOGY_DEVELOPMENT=false
# Optional: Maximum triples allowed in ontology graph (set empty for unlimited)
ONTOLOGY_MAX_TRIPLES=10000
2. Start Server
ontocast \
--env-path .env \
--working-directory /path/to/working \
--ontology-directory /path/to/ontologies
3. Process Documents
curl -X POST http://localhost:8999/process -F "file=@document.pdf"
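The same request from Python, as a minimal sketch using the requests library (the port comes from the configuration above):
import requests
# Upload a document to the /process endpoint, mirroring the curl call above
with open("document.pdf", "rb") as f:
    response = requests.post(
        "http://localhost:8999/process",
        files={"file": ("document.pdf", f)},
    )
response.raise_for_status()
print(response.text)  # extracted triples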
4. API Endpoints
The OntoCast server provides the following endpoints:
- POST /process: Process documents and extract semantic triples
  curl -X POST http://localhost:8999/process -F "file=@document.pdf"
- POST /flush: Flush/clean triple store data
  # Clean all datasets (Fuseki) or entire database (Neo4j)
  curl -X POST http://localhost:8999/flush
  # Clean specific Fuseki dataset
  curl -X POST "http://localhost:8999/flush?dataset=my_dataset"
  Note: For Fuseki, you can specify a dataset query parameter to clean a specific dataset. If omitted, all datasets are cleaned. For Neo4j, the dataset parameter is ignored and all data is deleted.
- GET /health: Health check endpoint
- GET /info: Service information endpoint
LLM Caching
OntoCast includes automatic LLM response caching to improve performance and reduce API costs. Caching is enabled by default and requires no configuration.
Cache Locations
- Tests: .test_cache/llm/ in the current working directory
- Windows: %USERPROFILE%\AppData\Local\ontocast\llm\
- Unix/Linux: ~/.cache/ontocast/llm/ (or $XDG_CACHE_HOME/ontocast/llm/)
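The Unix/Linux default follows the XDG convention. For reference, a lookup like the following reproduces it (an illustration only, not OntoCast's internal code):
import os
from pathlib import Path
def default_cache_dir(app: str = "ontocast") -> Path:
    # $XDG_CACHE_HOME wins if set; otherwise fall back to ~/.cache
    base = os.environ.get("XDG_CACHE_HOME") or Path.home() / ".cache"
    return Path(base) / app / "llm"
print(default_cache_dir())  # e.g. /home/user/.cache/ontocast/llm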
Benefits
- Faster Execution: Repeated queries return cached responses instantly
- Cost Reduction: Identical requests don't hit the LLM API
- Offline Capability: Tests can run without API access if responses are cached
- Transparent: No configuration required - works automatically
Custom Cache Directory
To use a custom cache directory, set the ONTOCAST_CACHE_DIR environment variable (see the configuration table below). The cache location itself is resolved automatically when the tool is created:
from ontocast.tool.llm import LLMTool
# Cache directory is managed automatically by Cacher,
# honoring ONTOCAST_CACHE_DIR when it is set
llm_tool = LLMTool.create(
    config=llm_config
)
Configuration System
OntoCast uses a hierarchical configuration system built on Pydantic BaseSettings:
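For intuition, here is a minimal sketch of what a BaseSettings-backed configuration class looks like; the field names mirror the variables below, but the exact ToolConfig/ServerConfig layout is an assumption:
from pydantic_settings import BaseSettings, SettingsConfigDict
class LLMSettings(BaseSettings):
    # values resolve from the environment first, then from .env
    model_config = SettingsConfigDict(env_file=".env", extra="ignore")
    llm_api_key: str
    llm_provider: str = "openai"
    llm_model_name: str = "gpt-4o-mini"
    llm_temperature: float = 0.1
settings = LLMSettings()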
Environment Variables
| Variable | Description | Default | Required |
|---|---|---|---|
| LLM_API_KEY | API key for LLM provider | - | Yes |
| LLM_PROVIDER | LLM provider (openai, ollama) | openai | No |
| LLM_MODEL_NAME | Model name | gpt-4o-mini | No |
| LLM_TEMPERATURE | Temperature setting | 0.1 | No |
| ONTOCAST_WORKING_DIRECTORY | Working directory path | - | Yes |
| ONTOCAST_ONTOLOGY_DIRECTORY | Ontology files directory | - | No |
| PORT | Server port | 8999 | No |
| MAX_VISITS | Maximum visits per node | 3 | No |
| SKIP_ONTOLOGY_DEVELOPMENT | Skip ontology critique | false | No |
| ONTOLOGY_MAX_TRIPLES | Maximum triples allowed in ontology graph | 10000 | No |
| SKIP_FACTS_RENDERING | Skip facts rendering and go straight to aggregation | false | No |
| ONTOCAST_CACHE_DIR | Custom cache directory for LLM responses | Platform default | No |
Triple Store Configuration
# Fuseki (Preferred)
FUSEKI_URI=http://localhost:3032/test
FUSEKI_AUTH=admin:password
FUSEKI_DATASET=dataset_name
# Neo4j (Alternative)
NEO4J_URI=bolt://localhost:7689
NEO4J_AUTH=neo4j:password
CLI Parameters
# Skip ontology critique step
ontocast --skip-ontology-critique
# Process only first N chunks (for testing)
ontocast --head-chunks 5
Triple Store Setup
OntoCast supports multiple triple store backends with automatic fallback:
- Apache Fuseki (Recommended) - Native RDF with SPARQL support
- Neo4j with n10s - Graph database with RDF capabilities
- Filesystem (Fallback) - Local file-based storage
When multiple triple stores are configured, Fuseki is preferred over Neo4j.
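Conceptually, backend selection works like this sketch (illustrative only; the function name is hypothetical and the real logic lives inside OntoCast):
import os
def pick_backend() -> str:
    # Fuseki first, then Neo4j, then the filesystem fallback
    if os.environ.get("FUSEKI_URI"):
        return "fuseki"
    if os.environ.get("NEO4J_URI"):
        return "neo4j"
    return "filesystem"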
Quick Setup with Docker
Fuseki:
cd docker/fuseki
cp .env.example .env
# Edit .env with your values
docker compose --env-file .env up -d fuseki
Neo4j:
cd docker/neo4j
cp .env.example .env
# Edit .env with your values
docker compose --env-file .env up -d neo4j
See Triple Store Setup for detailed instructions.
Documentation
- Quick Start Guide - Get started quickly
- Configuration System - Detailed configuration guide
- Triple Store Setup - Triple store configuration
- User Guide - Core concepts and workflow
- API Reference - Detailed API documentation
Recent Changes
Ontology Management Improvements
- Automatic Versioning: Semantic version increment based on change analysis (MAJOR/MINOR/PATCH)
- Hash-Based Lineage: Git-style versioning with parent hashes for tracking ontology evolution
- Multiple Version Storage: Versions stored as separate named graphs in Fuseki triple stores
- Timestamp Tracking: updated_at field tracks when the ontology was last modified
- Smart Version Analysis: Analyzes ontology changes (classes, properties, instances) to determine the appropriate version bump (see the sketch after this list)
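A hypothetical sketch of the bump decision, assuming the usual semantic-versioning reading of breaking vs. additive changes (OntoCast's actual analysis may weigh more signals):
def version_bump(removed: set[str], added: set[str]) -> str:
    # removing classes/properties breaks consumers -> MAJOR;
    # purely additive changes -> MINOR; everything else -> PATCH
    if removed:
        return "MAJOR"
    if added:
        return "MINOR"
    return "PATCH"
print(version_bump(set(), {"ex:Company"}))  # -> MINOR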
GraphUpdate System
- Token Efficiency: LLM outputs structured SPARQL operations (insert/delete) instead of full TTL graphs
- Incremental Updates: Only changes are generated, dramatically reducing token usage
- Structured Operations: TripleOp operations with explicit prefix declarations for precise updates
- SPARQL Generation: Automatic conversion of operations to executable SPARQL queries (see the sketch after this list)
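A hypothetical sketch of the idea; the field names on TripleOp here are illustrative, not OntoCast's exact schema:
from dataclasses import dataclass
@dataclass
class TripleOp:
    action: str                # "insert" or "delete"
    triples: list[str]         # triple patterns, e.g. "ex:a ex:b ex:c ."
    prefixes: dict[str, str]   # prefix -> namespace IRI
def to_sparql(op: TripleOp) -> str:
    # render explicit prefix declarations, then the data block
    header = "\n".join(f"PREFIX {p}: <{iri}>" for p, iri in op.prefixes.items())
    keyword = "INSERT DATA" if op.action == "insert" else "DELETE DATA"
    body = "\n  ".join(op.triples)
    return f"{header}\n{keyword} {{\n  {body}\n}}"
op = TripleOp("insert", ["ex:AcmeCorp a ex:Company ."], {"ex": "http://example.org/"})
print(to_sparql(op))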
Budget Tracking
- LLM Statistics: Tracks API calls, characters sent/received for cost monitoring
- Triple Metrics: Tracks ontology and facts triples generated per operation
- Summary Reports: Budget summaries logged at end of processing
- Integrated Tracking: Budget tracker integrated into AgentState for clean dependency injection (sketched below)
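A minimal sketch of what such counters might look like; the field and method names are assumptions, not OntoCast's API:
from dataclasses import dataclass
@dataclass
class BudgetCounters:
    # illustrative counters; the real tracker lives inside AgentState
    llm_calls: int = 0
    chars_sent: int = 0
    chars_received: int = 0
    def record(self, prompt: str, response: str) -> None:
        self.llm_calls += 1
        self.chars_sent += len(prompt)
        self.chars_received += len(response)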
Configuration System Overhaul
- Hierarchical Configuration: New ToolConfig and ServerConfig structure
- Environment Variables: Support for .env files and environment variables
- Type Safety: Full type safety with Python 3.12 union syntax
- API Key: Changed from OPENAI_API_KEY to LLM_API_KEY for consistency
- Dependency Injection: Removed global variables, implemented proper DI
Enhanced Features
- CLI Parameters: New --skip-ontology-critique and --skip-facts-rendering parameters
- RDFGraph Operations: Improved __iadd__ method with proper prefix binding
- Triple Store Management: Better separation between filesystem and external stores
- Serialization Interface: Unified serialize() method for storing Ontology and RDFGraph objects
- Error Handling: Improved error handling and validation
See CHANGELOG.md for complete details.
Examples
Basic Usage
from ontocast.config import Config
from ontocast.toolbox import ToolBox
# Load configuration
config = Config()
# Initialize tools
tools = ToolBox(config)
# Process documents
# ... (use tools for processing)
Server Usage
# Start server with custom configuration
ontocast \
--env-path .env \
--working-directory /data/working \
--ontology-directory /data/ontologies \
--skip-ontology-critique \
--head-chunks 10
Contributing
We welcome contributions! Please see our Contributing Guide for details.
License
This project is licensed under the Apache License 2.0 - see the LICENSE file for details.
Support
- Documentation: docs/
- Issues: GitHub Issues
- Discussions: GitHub Discussions