🧠 Semantica - An Open Source Framework for building Semantic Layers and Knowledge Engineering

🧠 Semantica

Open-Source Semantic Layer & Knowledge Engineering Framework

Python 3.8+ • License: MIT

⭐ Give us a Star • 🍴 Fork us • 💬 Join our Discord

Transform Chaos into Intelligence. Build AI systems that are explainable, traceable, and trustworthy, not black boxes.


🚀 Why Semantica?

Semantica bridges the semantic gap between text similarity and true meaning. It's the semantic intelligence layer that makes your AI agents auditable, explainable, and compliant.

Perfect for high-stakes domains where mistakes have real consequences.


⚡ Get Started in 30 Seconds

pip install semantica
from semantica.semantic_extract import NERExtractor
from semantica.kg import GraphBuilder

# Extract entities and build knowledge graph
ner = NERExtractor(method="ml", model="en_core_web_sm")
entities = ner.extract("Apple Inc. was founded by Steve Jobs in 1976.")
kg = GraphBuilder().build({"entities": entities, "relationships": []})

print(f"Built KG with {len(kg.get('entities', []))} entities")

📖 Full Quick Start • 🍳 Cookbook Examples • 💬 Join Discord • ⭐ Star Us


Core Value Proposition

| Trustworthy | Explainable | Auditable |
|---|---|---|
| Conflict detection & validation | Transparent reasoning paths | Complete provenance tracking |
| Rule-based governance | Entity relationships & ontologies | Source-level provenance |
| Production-grade QA | Multi-hop graph reasoning | Audit-ready compliance |

Key Features & Benefits

Not Just Another Agentic Framework

Semantica complements LangChain, LlamaIndex, AutoGen, CrewAI, Google ADK, Agno, and other frameworks to enhance your agents with:

| Feature | Benefit |
|---|---|
| Auditable | Complete provenance tracking with full audit trails |
| Explainable | Transparent reasoning paths with entity relationships |
| Provenance-Aware | Source-level provenance from documents to responses |
| Validated | Built-in conflict detection, deduplication, QA |
| Governed | Rule-based validation and semantic consistency |

Perfect For High-Stakes Use Cases

| 🏥 Healthcare | 💰 Finance | ⚖️ Legal |
|---|---|---|
| Clinical decisions | Fraud detection | Evidence-backed research |
| Drug interactions | Regulatory compliance | Contract analysis |
| Patient safety | Risk assessment | Case law reasoning |

| 🔒 Cybersecurity | 🏛️ Government | 🏭 Infrastructure | 🚗 Autonomous |
|---|---|---|---|
| Threat attribution | Policy decisions | Power grids | Decision logs |
| Incident response | Classified info | Transportation | Safety validation |

Powers Your AI Stack

  • GraphRAG Systems - Retrieval with graph reasoning and hybrid search
  • AI Agents - Trustworthy, accountable multi-agent systems with semantic memory
  • Reasoning Models - Explainable AI decisions with reasoning paths
  • Enterprise AI - Governed, auditable platforms for compliance

Integrations

  • Docling Support - Document parsing with table extraction (PDF, DOCX, PPTX, XLSX)
  • AWS Neptune - Amazon Neptune graph database support with IAM authentication
  • Custom Ontology Import - Import existing ontologies (OWL, RDF, Turtle, JSON-LD)

Built for environments where every answer must be explainable and governed.


🚨 The Problem: The Semantic Gap

Most AI systems fail in high-stakes domains because they operate on text similarity, not meaning.

Understanding the Semantic Gap

The semantic gap is the fundamental disconnect between what AI systems can process (text patterns, vector similarities) and what high-stakes applications require (semantic understanding, meaning, context, and relationships).

Traditional AI approaches:

  • Rely on statistical patterns and text similarity
  • Cannot understand relationships between entities
  • Cannot reason about domain-specific rules
  • Cannot explain why decisions were made
  • Cannot trace back to original sources with confidence

High-stakes AI requires:

  • Semantic understanding of entities and their relationships
  • Domain knowledge encoded as formal rules (ontologies)
  • Explainable reasoning paths
  • Source-level provenance
  • Conflict detection and resolution

Semantica bridges this gap by providing a semantic intelligence layer that transforms unstructured data into validated, explainable, and auditable knowledge.
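
To make the gap concrete, here is a minimal, library-independent sketch in plain Python (it does not use Semantica's API). It assumes token-overlap (Jaccard) similarity as a stand-in for "text similarity" and hand-written triples as a stand-in for extracted relationships. The sentences and the `jaccard` helper are illustrative only.

```python
def jaccard(a: str, b: str) -> float:
    """Token-overlap similarity between two sentences (0.0 to 1.0)."""
    tokens_a, tokens_b = set(a.lower().split()), set(b.lower().split())
    return len(tokens_a & tokens_b) / len(tokens_a | tokens_b)


# Two sentences that look alike on the surface but say opposite things.
s1 = "aspirin treats headache in adult patients"
s2 = "aspirin causes headache in adult patients"
similarity = jaccard(s1, s2)

# The same statements as explicit (subject, predicate, object) triples.
t1 = ("aspirin", "treats", "headache")
t2 = ("aspirin", "causes", "headache")

# Same entity pair, different predicate: worth flagging for review.
same_entities = (t1[0], t1[2]) == (t2[0], t2[2])
needs_review = same_entities and t1[1] != t2[1]

print(f"surface similarity: {similarity:.2f}, needs review: {needs_review}")
```

A similarity-only system sees two near-identical sentences, while the triple form exposes the differing relationship between the same two entities.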

What Organizations Have vs What They Need

| Current State | Required for High-Stakes AI |
|---|---|
| PDFs, DOCX, emails, logs | Formal domain rules (ontologies) |
| APIs, databases, streams | Structured and validated entities |
| Conflicting facts and duplicates | Explicit semantic relationships |
| Siloed systems with no lineage | Explainable reasoning paths |
| | Source-level provenance |
| | Audit-ready compliance |

The Cost of Missing Semantics

  • Decisions cannot be explained - No transparency in AI reasoning
  • Errors cannot be traced - No way to debug or improve
  • Conflicts go undetected - Contradictory information causes failures
  • Compliance becomes impossible - No audit trails for regulations

Trustworthy AI requires semantic accountability.


🆚 Semantica vs Traditional RAG

| Feature | Traditional RAG | Semantica |
|---|---|---|
| Reasoning | ❌ Black-box answers | ✅ Explainable reasoning paths |
| Provenance | ❌ No provenance | ✅ Source-level provenance |
| Search | ⚠️ Vector similarity only | ✅ Semantic + graph reasoning |
| Quality | ❌ No conflict handling | ✅ Explicit contradiction detection |
| Safety | ⚠️ Unsafe for high-stakes | ✅ Designed for governed environments |
| Compliance | ❌ No audit trails | ✅ Audit-ready provenance |

🧩 Semantica Architecture

1️⃣ Input Layer - Governed Ingestion

  • 📄 Multiple Formats - PDFs, DOCX, HTML, JSON, CSV, Excel, PPTX
  • 🔧 Docling Support - Docling parser for table extraction
  • 💾 Data Sources - Databases, APIs, streams, archives, web content
  • 🎨 Media Support - Image parsing with OCR, audio/video metadata extraction
  • 📊 Single Pipeline - Unified ingestion with metadata and source tracking

2️⃣ Semantic Layer - Trust & Reasoning Engine

  • 🔍 Entity Extraction - NER, normalization, classification
  • 🔗 Relationship Discovery - Triplet generation, semantic links
  • 📐 Ontology Induction - Automated domain rule generation
  • 🔄 Deduplication - Jaro-Winkler similarity, conflict resolution
  • ✅ Quality Assurance - Conflict detection, validation
  • 📊 Provenance Tracking - Source, time, confidence metadata
  • 🧠 Reasoning Traces - Explainable inference paths
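
The deduplication bullet above names Jaro-Winkler similarity. As a rough illustration of what that metric computes, here is a textbook implementation in plain Python. It is a sketch, not Semantica's own code; the function names and the default `prefix_scale=0.1` are the conventional choices, assumed here for illustration.

```python
def jaro(s1: str, s2: str) -> float:
    """Jaro similarity: based on matching characters and transpositions."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters count as matching only within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        lo, hi = max(0, i - window), min(len2, i + window + 1)
        for j in range(lo, hi):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    transpositions = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                transpositions += 1
            k += 1
    transpositions //= 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1: str, s2: str, prefix_scale: float = 0.1) -> float:
    """Jaro similarity boosted for strings sharing a prefix (up to 4 chars)."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1[:4], s2[:4]):
        if a != b:
            break
        prefix += 1
    return base + prefix * prefix_scale * (1 - base)


score = jaro_winkler("MARTHA", "MARHTA")
print(f"{score:.4f}")
```

In a deduplication step, pairs of entity names scoring above a chosen threshold (the QA example in this README uses `similarity_threshold=0.85`) would be treated as candidate duplicates.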

3️⃣ Output Layer - Auditable Knowledge Assets

  • 📊 Knowledge Graphs - Queryable, temporal, explainable
  • 📐 OWL Ontologies - HermiT/Pellet validated, custom ontology import support
  • 🔢 Vector Embeddings - FastEmbed by default
  • ☁️ AWS Neptune - Amazon Neptune graph database support
  • 🔍 Provenance - Every AI response links back to:
    • 📄 Source documents
    • 🏷️ Extracted entities & relations
    • 📐 Ontology rules applied
    • 🧠 Reasoning steps used
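
One way to picture the provenance links listed above is as a record attached to each response. The sketch below uses a plain Python dataclass. The `ProvenanceRecord` class, its field names, and the sample values (including the file name `company_history.pdf`) are hypothetical and are not Semantica's data model.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class ProvenanceRecord:
    """Hypothetical audit record tying one answer back to its evidence."""
    answer: str
    source_documents: list = field(default_factory=list)
    entities: list = field(default_factory=list)
    ontology_rules: list = field(default_factory=list)
    reasoning_steps: list = field(default_factory=list)
    confidence: float = 1.0


record = ProvenanceRecord(
    answer="Steve Jobs founded Apple Inc.",
    source_documents=["company_history.pdf"],  # hypothetical source file
    entities=["Apple Inc.", "Steve Jobs"],
    ontology_rules=["Organization foundedBy Person"],
    reasoning_steps=["(Apple Inc.) -[founded_by]-> (Steve Jobs)"],
    confidence=0.92,
)

# Serialize for an audit log.
audit_entry = json.dumps(asdict(record), indent=2)
print(audit_entry)
```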

🏥 Built for High-Stakes Domains

Designed for domains where mistakes have real consequences and every decision must be accountable:

  • 🏥 Healthcare & Life Sciences - Clinical decision support, drug interaction analysis, medical literature reasoning, patient safety compliance
  • 💰 Finance & Risk - Fraud detection, regulatory compliance (SOX, GDPR, MiFID II), credit risk assessment, algorithmic trading validation
  • ⚖️ Legal & Compliance - Evidence-backed legal research, contract analysis, regulatory change management, case law reasoning
  • 🔒 Cybersecurity & Intelligence - Threat attribution, incident response, security audit trails, intelligence analysis
  • 🏛️ Government & Defense - Governed AI systems, policy decisions, classified information handling, defense intelligence
  • 🏭 Critical Infrastructure - Power grid management, transportation safety, water treatment, emergency response
  • 🚗 Autonomous Systems - Self-driving vehicles, drone navigation, robotics safety, industrial automation

👥 Who Uses Semantica?

  • 🤖 AI / ML Engineers - Building explainable GraphRAG & agents
  • ⚙️ Data Engineers - Creating governed semantic pipelines
  • 📊 Knowledge Engineers - Managing ontologies & KGs at scale
  • 🏢 Enterprise Teams - Requiring trustworthy AI infrastructure
  • 🛡️ Risk & Compliance Teams - Needing audit-ready systems

📦 Installation

Install from PyPI (Recommended)

pip install semantica
# or
pip install "semantica[all]"

Install from Source (Development)

# Clone and install in editable mode
git clone https://github.com/Hawksight-AI/semantica.git
cd semantica
pip install -e .

# Or with all optional dependencies
pip install -e ".[all]"

# Development setup
pip install -e ".[dev]"

📚 Resources

New to Semantica? Check out the Cookbook for hands-on examples!

✨ Core Capabilities

| Data Ingestion | Semantic Extract | Knowledge Graphs | Ontology |
|---|---|---|---|
| Multiple Formats | Entity & Relations | Graph Analytics | Auto Generation |

| Context | GraphRAG | LLM Providers | Pipeline |
|---|---|---|---|
| Agent Memory, Context Graph, Context Retriever | Hybrid RAG | 100+ LLMs | Parallel Workers |

| QA | Reasoning |
|---|---|
| Conflict Resolution | Rule-based Inference |

Universal Data Ingestion

Multiple file formats • PDF, DOCX, HTML, JSON, CSV, databases, feeds, archives

from semantica.ingest import FileIngestor, WebIngestor, DBIngestor

file_ingestor = FileIngestor(recursive=True)
web_ingestor = WebIngestor(max_depth=3)
db_ingestor = DBIngestor(connection_string="postgresql://...")

sources = []
sources.extend(file_ingestor.ingest("documents/"))
sources.extend(web_ingestor.ingest("https://example.com"))
sources.extend(db_ingestor.ingest(query="SELECT * FROM articles"))

print(f"Ingested {len(sources)} sources")

Cookbook: Data Ingestion

Document Parsing & Processing

Multi-format parsing • Docling Support • Text normalization • Intelligent chunking

from semantica.parse import DocumentParser, DoclingParser
from semantica.normalize import TextNormalizer
from semantica.split import TextSplitter

# Standard parsing
parser = DocumentParser()
parsed = parser.parse("document.pdf", format="auto")

# Parsing with Docling (for complex layouts/tables)
# Requires: pip install docling
docling_parser = DoclingParser(enable_ocr=True)
result = docling_parser.parse("complex_table.pdf")

print(f"Text (Markdown): {result['full_text'][:100]}...")
print(f"Extracted {len(result['tables'])} tables")
for i, table in enumerate(result['tables']):
    print(f"Table {i+1} headers: {table.get('headers', [])}")

# Normalize text
normalizer = TextNormalizer()
normalized = normalizer.normalize(parsed, clean_html=True, normalize_entities=True)

# Split into chunks
splitter = TextSplitter(method="token", chunk_size=1000, chunk_overlap=200)
chunks = splitter.split(normalized)

Cookbook: Document Parsing • Data Normalization • Chunking & Splitting
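
The `chunk_size` and `chunk_overlap` parameters in the example above follow the usual sliding-window idea. The sketch below shows that idea in plain Python on a list of tokens. It is an illustration under the assumption of a simple fixed-step window, not the actual `TextSplitter` implementation, and `split_with_overlap` is a hypothetical helper.

```python
def split_with_overlap(tokens, chunk_size, chunk_overlap):
    """Split a token list into windows that share `chunk_overlap` tokens."""
    if chunk_overlap >= chunk_size:
        raise ValueError("chunk_overlap must be smaller than chunk_size")
    step = chunk_size - chunk_overlap
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + chunk_size])
        # Stop once the window has reached the end of the input.
        if start + chunk_size >= len(tokens):
            break
    return chunks


tokens = [f"t{i}" for i in range(10)]
chunks = split_with_overlap(tokens, chunk_size=4, chunk_overlap=2)
for chunk in chunks:
    print(chunk)
```

Each chunk repeats the tail of the previous one, so a sentence that straddles a boundary still appears intact in at least one chunk.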

Semantic Intelligence Engine

Entity & Relation Extraction • NER, Relationships, Events, Triplets with LLM Enhancement

from semantica.semantic_extract import NERExtractor, RelationExtractor

text = "Apple Inc., founded by Steve Jobs in 1976, acquired Beats Electronics for $3 billion."

# Extract entities
ner_extractor = NERExtractor(method="ml", model="en_core_web_sm")
entities = ner_extractor.extract(text)

# Extract relationships
relation_extractor = RelationExtractor(method="dependency", model="en_core_web_sm")
relationships = relation_extractor.extract(text, entities=entities)

print(f"Entities: {len(entities)}, Relationships: {len(relationships)}")

Cookbook: Entity Extraction • Relation Extraction • Advanced Extraction

Knowledge Graph Construction

Production-Ready KGs • Entity Resolution • Temporal Support • Graph Analytics

from semantica.semantic_extract import NERExtractor, RelationExtractor
from semantica.kg import GraphBuilder

# Extract entities and relationships
ner_extractor = NERExtractor(method="ml", model="en_core_web_sm")
relation_extractor = RelationExtractor(method="dependency", model="en_core_web_sm")

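# Same sample sentence as in the extraction example above
text = "Apple Inc., founded by Steve Jobs in 1976, acquired Beats Electronics for $3 billion."
entities = ner_extractor.extract(text)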
relationships = relation_extractor.extract(text, entities=entities)

# Build knowledge graph
builder = GraphBuilder()
kg = builder.build({"entities": entities, "relationships": relationships})

print(f"Nodes: {len(kg.get('entities', []))}, Edges: {len(kg.get('relationships', []))}")

Cookbook: Building Knowledge Graphs • Graph Analytics

Embeddings & Vector Store

FastEmbed by default • Multiple backends • Semantic search

from semantica.embeddings import EmbeddingGenerator
from semantica.vector_store import VectorStore

# Generate embeddings
embedding_gen = EmbeddingGenerator(model_name="sentence-transformers/all-MiniLM-L6-v2", dimension=384)
embeddings = embedding_gen.generate_embeddings(chunks, data_type="text")

# Store in vector database
vector_store = VectorStore(backend="faiss", dimension=384)
vector_store.store_vectors(vectors=embeddings, metadata=[{"text": chunk} for chunk in chunks])

# Search
results = vector_store.search(query="supply chain", top_k=5)

Cookbook: Embedding Generation • Vector Store
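
For intuition about what a vector search does, here is a small brute-force version in plain Python: it ranks stored vectors by cosine similarity to a query vector. This is a conceptual sketch only. Backends such as FAISS use optimized indexes, and the toy 3-dimensional vectors and the `top_k` helper are made up for illustration.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(query_vec, vectors, metadata, k=5):
    """Return the k (score, metadata) pairs most similar to the query."""
    scored = [(cosine(query_vec, v), m) for v, m in zip(vectors, metadata)]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]


# Toy 3-dimensional "embeddings"; real models produce hundreds of dimensions.
vectors = [[1.0, 0.0, 0.0], [0.0, 1.0, 0.0], [0.7, 0.7, 0.0]]
metadata = [
    {"text": "supply chain delays"},
    {"text": "quarterly earnings"},
    {"text": "supplier earnings"},
]

hits = top_k([1.0, 0.1, 0.0], vectors, metadata, k=2)
for score, meta in hits:
    print(f"{score:.3f}  {meta['text']}")
```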

Graph Store & Triplet Store

Neo4j, FalkorDB, Amazon Neptune • SPARQL queries • RDF triplets

from semantica.graph_store import GraphStore
from semantica.triplet_store import TripletStore

# Graph Store (Neo4j, FalkorDB)
graph_store = GraphStore(backend="neo4j", uri="bolt://localhost:7687", user="neo4j", password="password")
graph_store.add_nodes([{"id": "n1", "labels": ["Person"], "properties": {"name": "Alice"}}])

# Amazon Neptune Graph Store (OpenCypher via HTTP with IAM Auth)
neptune_store = GraphStore(
    backend="neptune",
    endpoint="your-cluster.us-east-1.neptune.amazonaws.com",
    port=8182,
    region="us-east-1",
    iam_auth=True,  # Uses AWS credential chain (boto3, env vars, or IAM role)
)

# Node Operations
neptune_store.add_nodes([
    {"labels": ["Person"], "properties": {"id": "alice", "name": "Alice", "age": 30}},
    {"labels": ["Person"], "properties": {"id": "bob", "name": "Bob", "age": 25}},
])

# Query Operations
result = neptune_store.execute_query("MATCH (p:Person) RETURN p.name, p.age")

# Triplet Store (Blazegraph, Jena, RDF4J)
triplet_store = TripletStore(backend="blazegraph", endpoint="http://localhost:9999/blazegraph")
triplet_store.add_triplet({"subject": "Alice", "predicate": "knows", "object": "Bob"})
results = triplet_store.execute_query("SELECT ?s ?p ?o WHERE { ?s ?p ?o } LIMIT 10")

Cookbook: Graph Store • Triplet Store
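
The SPARQL query in the example above, `?s ?p ?o`, matches every triple because all three positions are variables. The following in-memory sketch mimics that pattern matching in plain Python, with `None` playing the role of a variable. It is illustrative only and unrelated to how Blazegraph, Jena, or RDF4J execute queries; the `match` helper and sample triples are hypothetical.

```python
def match(triples, subject=None, predicate=None, obj=None):
    """Return triples matching the pattern; None is a wildcard (like ?s)."""
    return [
        (s, p, o)
        for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]


triples = [
    ("Alice", "knows", "Bob"),
    ("Bob", "knows", "Charlie"),
    ("Alice", "works_at", "Acme"),
]

everything = match(triples)                  # like: ?s ?p ?o
who_knows = match(triples, predicate="knows")  # like: ?s knows ?o
alice_job = match(triples, subject="Alice", predicate="works_at")

print(len(everything), who_knows, alice_job)
```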

Ontology Generation & Management

6-Stage LLM Pipeline • Automatic OWL Generation • HermiT/Pellet Validation • Custom Ontology Import (OWL, RDF, Turtle, JSON-LD)

from semantica.ontology import OntologyGenerator
from semantica.ingest import ingest_ontology

# Generate ontology automatically
generator = OntologyGenerator(llm_provider="openai", model="gpt-4")
ontology = generator.generate_from_documents(sources=["domain_docs/"])

# Or import your existing ontology
custom_ontology = ingest_ontology("my_ontology.ttl")  # Supports OWL, RDF, Turtle, JSON-LD
print(f"Classes: {len(custom_ontology.classes)}")

Cookbook: Ontology

Context Engineering & Memory Systems

Persistent Memory โ€ข Context Graph โ€ข Context Retriever โ€ข Hybrid Retrieval (Vector + Graph) โ€ข Production Graph Store (Neo4j) โ€ข Entity Linking โ€ข Multi-Hop Reasoning

import os

from semantica.context import AgentContext, ContextGraph, ContextRetriever
from semantica.vector_store import VectorStore
from semantica.graph_store import GraphStore
from semantica.llms import Groq

# Initialize Context with Hybrid Retrieval (Graph + Vector)
context = AgentContext(
    vector_store=VectorStore(backend="faiss"),
    knowledge_graph=GraphStore(backend="neo4j"), # Optional: Use persistent graph
    hybrid_alpha=0.75  # 75% weight to Knowledge Graph, 25% to Vector
)

# Build Context Graph from entities and relationships
graph_stats = context.build_graph(
    entities=kg.get('entities', []),
    relationships=kg.get('relationships', []),
    link_entities=True
)

# Store memory with automatic entity linking
context.store(
    "User is building a RAG system with Semantica",
    metadata={"priority": "high", "topic": "rag"}
)

# Use Context Retriever for hybrid retrieval
retriever = context.retriever  # Access underlying ContextRetriever
results = retriever.retrieve(
    query="What is the user building?",
    max_results=10,
    use_graph_expansion=True
)

# Retrieve with context expansion
results = context.retrieve("What is the user building?", use_graph_expansion=True)

# Query with reasoning and LLM-generated responses
llm_provider = Groq(model="llama-3.1-8b-instant", api_key=os.getenv("GROQ_API_KEY"))
reasoned_result = context.query_with_reasoning(
    query="What is the user building?",
    llm_provider=llm_provider,
    max_hops=2
)

Core Components:

  • ContextGraph: Builds and manages context graphs from entities and relationships for enhanced retrieval
  • ContextRetriever: Performs hybrid retrieval combining vector search, graph traversal, and memory for optimal context relevance
  • AgentContext: High-level interface integrating Context Graph and Context Retriever for GraphRAG applications
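
The `hybrid_alpha=0.75` comment in the example above describes a 75/25 weighting between graph and vector signals. A plausible reading is a weighted sum of per-result scores, sketched below in plain Python. How Semantica actually combines scores internally is an assumption here; `hybrid_scores` and the document ids are hypothetical.

```python
def hybrid_scores(graph_scores, vector_scores, alpha=0.75):
    """Blend two score maps: alpha weights the graph, (1 - alpha) the vectors."""
    ids = set(graph_scores) | set(vector_scores)
    return {
        doc_id: alpha * graph_scores.get(doc_id, 0.0)
        + (1 - alpha) * vector_scores.get(doc_id, 0.0)
        for doc_id in ids
    }


graph_scores = {"doc_a": 0.9, "doc_b": 0.2}   # e.g. from graph traversal
vector_scores = {"doc_b": 0.8, "doc_c": 0.6}  # e.g. from embedding similarity

combined = hybrid_scores(graph_scores, vector_scores, alpha=0.75)
ranked = sorted(combined, key=combined.get, reverse=True)
print(ranked)
```

With a high `alpha`, a result strongly supported by the graph outranks one that only scores well on vector similarity.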

Related Components: Vector Store • Embedding Generation • Advanced Vector Store

Knowledge Graph-Powered RAG (GraphRAG)

Vector + Graph Hybrid Search • Multi-Hop Reasoning • LLM-Generated Responses • Semantic Re-ranking

from semantica.context import AgentContext
from semantica.llms import Groq, OpenAI, LiteLLM
from semantica.vector_store import VectorStore
import os

# Initialize GraphRAG with hybrid retrieval
context = AgentContext(
    vector_store=VectorStore(backend="faiss"),
    knowledge_graph=kg
)

# Configure LLM provider (supports Groq, OpenAI, HuggingFace, LiteLLM)
llm_provider = Groq(
    model="llama-3.1-8b-instant",
    api_key=os.getenv("GROQ_API_KEY")
)

# Query with multi-hop reasoning and LLM-generated responses
result = context.query_with_reasoning(
    query="What IPs are associated with security alerts?",
    llm_provider=llm_provider,
    max_results=10,
    max_hops=2
)

print(f"Response: {result['response']}")
print(f"Reasoning Path: {result['reasoning_path']}")
print(f"Confidence: {result['confidence']:.3f}")

Key Features:

  • Multi-Hop Reasoning: Traverses knowledge graph up to N hops to find related entities
  • LLM-Generated Responses: Natural language answers grounded in graph context
  • Reasoning Trace: Shows entity relationship paths used in reasoning
  • Multiple LLM Providers: Supports Groq, OpenAI, HuggingFace, and LiteLLM (100+ LLMs)

Cookbook: GraphRAG • Real-Time Anomaly Detection
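
Multi-hop reasoning with a `max_hops` limit can be pictured as a bounded breadth-first traversal that records the path taken. The sketch below is a plain-Python illustration of that idea, not Semantica's traversal code; `multi_hop_paths` and the sample edges (alert, IP, subnet, team) are invented for the example.

```python
from collections import deque


def multi_hop_paths(edges, start, max_hops=2):
    """Breadth-first traversal returning every path of 1..max_hops edges."""
    adjacency = {}
    for subject, predicate, obj in edges:
        adjacency.setdefault(subject, []).append((predicate, obj))

    paths = []
    queue = deque([(start, [start])])
    while queue:
        node, path = queue.popleft()
        hops = len(path) // 2  # path alternates node, predicate, node, ...
        if hops == max_hops:
            continue
        for predicate, neighbor in adjacency.get(node, []):
            if neighbor in path[::2]:
                continue  # skip nodes already on this path to avoid cycles
            new_path = path + [predicate, neighbor]
            paths.append(new_path)
            queue.append((neighbor, new_path))
    return paths


edges = [
    ("alert_17", "triggered_by", "10.0.0.5"),
    ("10.0.0.5", "belongs_to", "subnet_a"),
    ("subnet_a", "managed_by", "team_x"),
]

paths = multi_hop_paths(edges, "alert_17", max_hops=2)
for path in paths:
    print(" -> ".join(path))
```

Each returned path doubles as a reasoning trace: it lists the entities and relationships that connect the starting node to the result.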

LLM Providers Module

Unified LLM Interface • 100+ LLM Support via LiteLLM • Clean Imports • Multiple Providers

from semantica.llms import Groq, OpenAI, HuggingFaceLLM, LiteLLM
import os

# Groq
groq = Groq(
    model="llama-3.1-8b-instant",
    api_key=os.getenv("GROQ_API_KEY")
)
response = groq.generate("What is AI?")

# OpenAI
openai = OpenAI(
    model="gpt-4",
    api_key=os.getenv("OPENAI_API_KEY")
)
response = openai.generate("What is AI?")

# HuggingFace - Local models
hf = HuggingFaceLLM(model_name="gpt2")
response = hf.generate("What is AI?")

# LiteLLM - Unified interface to 100+ LLMs
litellm = LiteLLM(
    model="openai/gpt-4o",  # or "anthropic/claude-sonnet-4-20250514", "groq/llama-3.1-8b-instant", etc.
    api_key=os.getenv("OPENAI_API_KEY")
)
response = litellm.generate("What is AI?")

# Structured output
structured = groq.generate_structured("Extract entities from: Apple Inc. was founded by Steve Jobs.")

Supported Providers:

  • Groq: Inference with Llama models
  • OpenAI: GPT-3.5, GPT-4, and other OpenAI models
  • HuggingFace: Local LLM inference with Transformers
  • LiteLLM: Unified interface to 100+ LLM providers (OpenAI, Anthropic, Azure, Bedrock, Vertex AI, and more)

Reasoning & Inference Engine

Rule-based Inference • Forward/Backward Chaining • Rete Algorithm • Explanation Generation

from semantica.reasoning import Reasoner

# Initialize Reasoner
reasoner = Reasoner()

# Define rules and facts
rules = ["IF Parent(?a, ?b) AND Parent(?b, ?c) THEN Grandparent(?a, ?c)"]
facts = ["Parent(Alice, Bob)", "Parent(Bob, Charlie)"]

# Infer new facts (Forward Chaining)
inferred = reasoner.infer_facts(facts, rules)
print(f"Inferred: {inferred}") # ['Grandparent(Alice, Charlie)']

# Explain reasoning
from semantica.reasoning import ExplanationGenerator
explainer = ExplanationGenerator()
# ... generate explanation for inferred facts

Cookbook: Reasoning • Rete Engine
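
To show what forward chaining does with the Grandparent rule above, here is a naive fixed-point implementation in plain Python. It is a teaching sketch, not the `Reasoner` class and not a Rete network: facts and rules are written as tuples rather than the string syntax used above, and all function names are hypothetical.

```python
def unify(pattern, fact, binding):
    """Match one premise against one fact; return extended bindings or None."""
    if len(pattern) != len(fact) or pattern[0] != fact[0]:
        return None
    result = dict(binding)
    for term, value in zip(pattern[1:], fact[1:]):
        if term.startswith("?"):
            if result.setdefault(term, value) != value:
                return None  # variable already bound to a different value
        elif term != value:
            return None
    return result


def match_all(premises, known, binding=None):
    """Yield every variable binding that satisfies all premises."""
    binding = {} if binding is None else binding
    if not premises:
        yield binding
        return
    for fact in known:
        extended = unify(premises[0], fact, binding)
        if extended is not None:
            yield from match_all(premises[1:], known, extended)


def forward_chain(facts, rules):
    """Apply rules until no new facts appear; return only the derived facts."""
    known = set(facts)
    changed = True
    while changed:
        changed = False
        for premises, conclusion in rules:
            # Materialize bindings first so `known` is not modified mid-iteration.
            for binding in list(match_all(premises, known)):
                new_fact = tuple(binding.get(term, term) for term in conclusion)
                if new_fact not in known:
                    known.add(new_fact)
                    changed = True
    return known - set(facts)


rules = [
    ([("Parent", "?a", "?b"), ("Parent", "?b", "?c")], ("Grandparent", "?a", "?c")),
]
facts = {("Parent", "Alice", "Bob"), ("Parent", "Bob", "Charlie")}

inferred = forward_chain(facts, rules)
print(inferred)
```

This version rescans every fact on every pass. Rete-style engines avoid that by caching partial matches, which matters once rule and fact counts grow.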

Pipeline Orchestration & Parallel Processing

Orchestrator-Worker Pattern • Parallel Execution • Scalable Processing

from semantica.pipeline import PipelineBuilder, ExecutionEngine

# ingest_data, extract_entities, and build_graph are your own step functions (not defined here)
pipeline = PipelineBuilder() \
    .add_step("ingest", "custom", func=ingest_data) \
    .add_step("extract", "custom", func=extract_entities) \
    .add_step("build", "custom", func=build_graph) \
    .build()

result = ExecutionEngine().execute_pipeline(pipeline, parallel=True)
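
The orchestrator-worker idea can be sketched with the standard library alone: an orchestrator hands each input to a pool of workers, and every worker runs the same ordered list of steps. The code below is an assumption-laden illustration using `concurrent.futures`, not the `PipelineBuilder` or `ExecutionEngine` internals. `run_pipeline` and the three toy steps are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def run_pipeline(items, steps, max_workers=4):
    """Run every item through all steps, processing items in parallel."""
    def run_one(item):
        for step in steps:
            item = step(item)
        return item

    # The pool plays the orchestrator: it distributes items to worker threads.
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return list(pool.map(run_one, items))  # map preserves input order


# Toy steps standing in for ingest / extract / build stages.
def normalize(text):
    return text.strip().lower()


def tokenize(text):
    return text.split()


def count_tokens(tokens):
    return len(tokens)


documents = ["  Apple Inc. ", "Steve Jobs founded Apple"]
results = run_pipeline(documents, [normalize, tokenize, count_tokens])
print(results)
```

Threads suit I/O-bound steps such as fetching documents or calling APIs; CPU-bound steps would typically use a process pool instead.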

Production-Ready Quality Assurance

Enterprise-Grade QA • Conflict Detection • Deduplication

from semantica.deduplication import DuplicateDetector
from semantica.conflicts import ConflictDetector

entities = kg.get("entities", [])
conflicts = ConflictDetector().detect_conflicts(entities)
duplicates = DuplicateDetector(similarity_threshold=0.85).detect_duplicates(entities)

print(f"Conflicts: {len(conflicts)} | Duplicates: {len(duplicates)}")

Cookbook: Conflict Detection & Resolution • Deduplication
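
A simple form of conflict detection groups observations by entity and attribute, then flags any attribute that carries more than one distinct value across sources. The plain-Python sketch below illustrates that idea. It is not the `ConflictDetector` implementation, and the entity dictionaries (with `id` and `source` keys) are an assumed shape chosen for the example.

```python
from collections import defaultdict


def detect_conflicts(entities, key="id"):
    """Flag attributes that have different values for the same entity."""
    observed = defaultdict(lambda: defaultdict(set))
    for entity in entities:
        for attr, value in entity.items():
            if attr in (key, "source"):
                continue
            observed[entity[key]][attr].add((value, entity.get("source")))

    conflicts = []
    for entity_id, attrs in observed.items():
        for attr, observations in attrs.items():
            distinct_values = {value for value, _ in observations}
            if len(distinct_values) > 1:
                conflicts.append({
                    "id": entity_id,
                    "attribute": attr,
                    "observations": sorted(observations, key=str),
                })
    return conflicts


# Hypothetical extracted entities from two source documents.
entities = [
    {"id": "apple", "founded": 1976, "source": "report_a.pdf"},
    {"id": "apple", "founded": 1977, "source": "report_b.pdf"},
    {"id": "apple", "founder": "Steve Jobs", "source": "report_a.pdf"},
]

conflicts = detect_conflicts(entities)
print(conflicts)
```

Keeping the source next to each conflicting value lets a reviewer, or a resolution rule, decide which document to trust.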

Visualization & Export

Interactive graphs • Multi-format export • Graph analytics

from semantica.visualization import KGVisualizer
from semantica.export import GraphExporter

# Visualize knowledge graph
viz = KGVisualizer(layout="force")
fig = viz.visualize_network(kg, output="interactive")
fig.show()

# Export to multiple formats
exporter = GraphExporter()
exporter.export(kg, format="json", output_path="graph.json")
exporter.export(kg, format="graphml", output_path="graph.graphml")

Cookbook: Visualization • Export
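
For readers unfamiliar with GraphML, the sketch below writes a tiny graph in that format using only the standard library. It is not what `GraphExporter` produces. The `to_graphml` helper is hypothetical, and the inner shape of the entity and relationship dictionaries (`id`, `source`, `target`) is assumed for illustration.

```python
import xml.etree.ElementTree as ET


def to_graphml(kg):
    """Serialize a dict-based graph to a minimal GraphML string."""
    root = ET.Element("graphml", xmlns="http://graphml.graphdrawing.org/xmlns")
    graph = ET.SubElement(root, "graph", id="G", edgedefault="directed")
    for entity in kg["entities"]:
        ET.SubElement(graph, "node", id=entity["id"])
    for rel in kg["relationships"]:
        ET.SubElement(graph, "edge", source=rel["source"], target=rel["target"])
    return ET.tostring(root, encoding="unicode")


# Assumed dictionary shape, for illustration only.
sample_kg = {
    "entities": [{"id": "apple"}, {"id": "steve_jobs"}],
    "relationships": [{"source": "steve_jobs", "target": "apple"}],
}

graphml = to_graphml(sample_kg)
print(graphml)
```

Edge labels and node properties are omitted here; GraphML stores those in `<key>` and `<data>` elements.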

Seed Data Integration

Foundation data • Entity resolution • Domain knowledge

from semantica.seed import SeedDataManager

seed_manager = SeedDataManager()
seed_manager.seed_data.entities = [
    {"id": "s1", "text": "Supplier A", "type": "Supplier", "source": "foundation", "verified": True}
]

# Use seed data for entity resolution (extracted_entities comes from an earlier extraction step)
resolved = seed_manager.resolve_entities(extracted_entities)

Cookbook: Seed Data

🚀 Quick Start

For comprehensive examples, see the Cookbook with interactive notebooks!

from semantica.semantic_extract import NERExtractor, RelationExtractor
from semantica.kg import GraphBuilder
from semantica.context import AgentContext, ContextGraph
from semantica.vector_store import VectorStore

# Extract entities and relationships
ner_extractor = NERExtractor(method="ml", model="en_core_web_sm")
relation_extractor = RelationExtractor(method="dependency", model="en_core_web_sm")

text = "Apple Inc. was founded by Steve Jobs in 1976."
entities = ner_extractor.extract(text)
relationships = relation_extractor.extract(text, entities=entities)

# Build knowledge graph
builder = GraphBuilder()
kg = builder.build({"entities": entities, "relationships": relationships})

# Query using GraphRAG
vector_store = VectorStore(backend="faiss", dimension=384)
context_graph = ContextGraph()
context_graph.build_from_entities_and_relationships(
    entities=kg.get('entities', []),
    relationships=kg.get('relationships', [])
)
context = AgentContext(vector_store=vector_store, knowledge_graph=context_graph)

results = context.retrieve("Who founded Apple?", max_results=5)
print(f"Found {len(results)} results")

Cookbook: Your First Knowledge Graph

🎯 Use Cases

Enterprise Knowledge Engineering - Unify data sources into knowledge graphs, breaking down silos.

AI Agents & Autonomous Systems - Build agents with persistent memory and semantic understanding.

Multi-Format Document Processing - Process multiple formats through a unified pipeline.

Data Pipeline Processing - Build scalable pipelines with parallel execution.

Intelligence & Security - Analyze networks, threat intelligence, forensic analysis.

Finance & Trading - Fraud detection, market intelligence, risk assessment.

Biomedical - Drug discovery, medical literature analysis.

🍳 Semantica Cookbook

Interactive Jupyter Notebooks designed to take you from beginner to expert.

View Full Cookbook

Featured Recipes

| Recipe | Description |
|---|---|
| GraphRAG Complete | Build a production-ready Graph Retrieval Augmented Generation system. Features Graph Validation, Hybrid Retrieval, and Logical Inference. |
| RAG vs. GraphRAG | Side-by-side comparison. Demonstrates the Reasoning Gap and how GraphRAG solves it with Inference Engines. |
| First Knowledge Graph | Go from raw text to a queryable knowledge graph in 20 minutes. |
| Real-Time Anomalies | Detect anomalies in streaming data using temporal knowledge graphs and pattern detection. |

Core Tutorials

Industry Use Cases (14 Cookbooks)

Domain-Specific Cookbooks showcasing real-world applications with real data sources, advanced chunking strategies, temporal KGs, GraphRAG, and comprehensive Semantica module integration:

Biomedical

Finance

  • Financial Data Integration MCP - Alpha Vantage API, MCP servers, seed data, real-time ingestion
  • Fraud Detection - Transaction streams, temporal KGs, pattern detection, conflict resolution, Context Graph, Context Retriever, GraphRAG with Groq LLM

Blockchain

Cybersecurity

Intelligence & Law Enforcement

Renewable Energy

Supply Chain

Explore Use Case Examples - See real-world implementations in finance, biomedical, cybersecurity, and more. 14 comprehensive domain-specific cookbooks with real data sources, advanced chunking strategies, temporal KGs, GraphRAG, and full Semantica module integration.

🔬 Advanced Features

Docling Integration - Document parsing with table extraction for PDFs, DOCX, PPTX, and XLSX files. Supports OCR and multiple export formats.

AWS Neptune Support - Amazon Neptune graph database integration with IAM authentication and OpenCypher queries.

Custom Ontology Import - Import existing ontologies (OWL, RDF, Turtle, JSON-LD, N3) and extend Schema.org, FOAF, Dublin Core, or custom ontologies.

Incremental Updates - Real-time stream processing with Kafka, RabbitMQ, Kinesis for live updates.

Multi-Language Support - Process multiple languages with automatic detection.

Advanced Reasoning - Forward/backward chaining, Rete-based pattern matching, and automated explanation generation.

Graph Analytics - Centrality, community detection, path finding, temporal analysis.

Custom Pipelines - Build custom pipelines with parallel execution.

API Integration - Integrate external APIs for entity enrichment.

See Advanced Examples - Advanced extraction, graph analytics, reasoning, and more.

🗺️ Roadmap

Q1 2026

  • Core framework (v1.0)
  • GraphRAG engine
  • 6-stage ontology pipeline
  • Advanced reasoning v2 (Rete, Forward/Backward Chaining)
  • Quality Assurance module and related features
  • Enhanced multi-language support
  • Evals
  • Real-time streaming improvements

Q2 2026

  • Multi-modal processing

🤝 Community & Support

Join Our Community

| Channel | Purpose |
|---|---|
| Discord | Real-time help, showcases |
| GitHub Discussions | Q&A, feature requests |

Learning Resources

Enterprise Support

Enterprise support, professional services, and commercial licensing will be available in the future. For now, we offer community support through Discord and GitHub Discussions.

Current Support:

Future Enterprise Offerings:

  • Professional support with SLA
  • Enterprise licensing
  • Custom development services
  • Priority feature requests
  • Dedicated support channels

Stay tuned for updates!

🤝 Contributing

How to Contribute

# Fork and clone
git clone https://github.com/your-username/semantica.git
cd semantica

# Create branch
git checkout -b feature/your-feature

# Install dev dependencies
pip install -e ".[dev,test]"

# Make changes and test
pytest tests/
black semantica/
flake8 semantica/

# Stage, commit, and push
git add -A
git commit -m "Add feature"
git push origin feature/your-feature

Contribution Types

  1. Code - New features, bug fixes
  2. Documentation - Improvements, tutorials
  3. Bug Reports - Create issue
  4. Feature Requests - Request feature

📜 License

Semantica is licensed under the MIT License - see the LICENSE file for details.

Built by the Semantica Community

GitHub • Discord
