Self-Healing Knowledge Graph for RAG Pipelines - pip-installable library
Sentinel: Self-Healing Temporal Knowledge Graph
Sentinel is an autonomous knowledge graph that automatically scrapes, extracts, stores, and maintains structured knowledge from the web. It uses AI to understand content, tracks changes over time, and heals itself when information becomes stale.
Important: New Python Package Release In Progress
A new release of the Python package is currently in progress and may take some time to complete.
Sentinel is a work in progress. We are working to improve stability, add new features, and refine the API. The core functionality is ready for testing, but expect breaking changes and rapid updates.
We are building it and making it better every day!
Key Features
- Autonomous: Automatically scrapes, extracts, and updates knowledge
- Temporal: Tracks how knowledge evolves over time
- Self-Healing: Detects and updates stale information automatically
- AI-Powered: Uses LLMs to extract entities and relationships
- Graph-Based: Stores knowledge in a Neo4j temporal graph
- Web Scraping: Intelligent scraping with Firecrawl or a local fallback
- Developer-Friendly: Simple Python API and CLI tool
- Beautiful UI: 3D graph visualization with Next.js
Quick Start
Installation
```bash
pip install sentinel-core
```
Setup
```bash
# Interactive setup wizard
sentinel init

# Or create the .env file manually
# (quoting 'EOF' stops the shell from expanding $ characters in the password)
cat > .env << 'EOF'
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=your-password
OLLAMA_MODEL=ollama/phi3
EOF
```
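Before starting the services, it can help to confirm that `.env` actually defines the four keys above. The sketch below is an illustrative, stdlib-only parser for simple `KEY=VALUE` lines; `read_env_file` and `missing_keys` are hypothetical helpers, not Sentinel's own loader, and unlike fuller `.env` parsers they do not strip quotes or handle `export` prefixes.

```python
from __future__ import annotations

from pathlib import Path

# Keys used in the .env example above
REQUIRED_KEYS = ("NEO4J_URI", "NEO4J_USER", "NEO4J_PASSWORD", "OLLAMA_MODEL")


def read_env_file(path: str | Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so passwords containing "=" survive
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(path: str | Path) -> list[str]:
    """Return the required keys that are absent or empty in the .env file."""
    values = read_env_file(path)
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling `missing_keys(".env")` returns an empty list when all four keys are set; any names it returns still need values.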
Start Services
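```bash
# Start Neo4j (NEO4J_AUTH must match NEO4J_USER/NEO4J_PASSWORD in .env)
docker run -d -p 7687:7687 -p 7474:7474 \
  -e NEO4J_AUTH=neo4j/your-password \
  neo4j:latest

# Start Ollama (for the local LLM); `ollama serve` stays in the foreground,
# so run `ollama pull` from a second terminal
ollama serve
ollama pull phi3
```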
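Both services can take a few seconds to accept connections after they start. The stdlib-only sketch below waits for their ports before you run `sentinel watch`. It assumes the default ports: 7687 for Neo4j Bolt (as mapped in the `docker run` command) and 11434, Ollama's default. `wait_for_port` and `check_services` are hypothetical helpers, not part of Sentinel, and they only confirm that a TCP port accepts connections, not that credentials or models are correct (`sentinel status` may already report more).

```python
from __future__ import annotations

import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0, interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds, False after `timeout` seconds."""
    deadline = time.monotonic() + timeout
    while True:
        try:
            # Connect and immediately close; no Neo4j or Ollama protocol is spoken
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)


# Neo4j Bolt port from the docker command; 11434 is Ollama's default port
SERVICES = {"Neo4j (Bolt)": 7687, "Ollama": 11434}


def check_services(host: str = "localhost", timeout: float = 30.0) -> dict[str, bool]:
    """Map each service name to whether its port became reachable within `timeout`."""
    return {name: wait_for_port(host, port, timeout) for name, port in SERVICES.items()}
```

Once both containers and `ollama serve` are up, `check_services()` should report `True` for each entry; a `False` usually means the service is still starting or is bound to a different port.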
Your First Knowledge Graph
```bash
# Process a URL
sentinel watch https://stripe.com/pricing

# Check status
sentinel status

# View in the UI (run from a clone of the repository; see Development below)
cd sentinel_platform/ui
npm install && npm run dev
# Visit http://localhost:3000
```
Usage
Python API
```python
import asyncio

from sentinel_core import Sentinel, GraphManager, GraphExtractor
from sentinel_core.scraper import get_scraper


async def main():
    # Initialize the graph store, scraper, and LLM extractor
    graph = GraphManager()
    scraper = get_scraper()
    extractor = GraphExtractor(model_name="ollama/phi3")
    sentinel = Sentinel(graph, scraper, extractor)

    try:
        # Scrape, extract, and store a URL
        result = await sentinel.process_url("https://example.com")
        print(f"Extracted {result['extracted_nodes']} nodes!")

        # Query the graph
        snapshot = graph.get_graph_snapshot()
        print(f"Total: {snapshot['metadata']['node_count']} nodes")
    finally:
        # Close the Neo4j connection even if processing fails
        graph.close()


asyncio.run(main())
```
CLI Tool
```bash
# Show version
sentinel version

# Check system status
sentinel status

# Process a URL
sentinel watch https://example.com

# Run healing cycle
sentinel heal --days 7

# Interactive setup
sentinel init
```
Use Cases
1. Product Pricing Monitoring
Track pricing changes across competitors automatically.
```python
# Inside an async function, with `sentinel` created as in the Python API example
urls = [
    "https://stripe.com/pricing",
    "https://paypal.com/pricing",
    "https://square.com/pricing",
]
for url in urls:
    await sentinel.process_url(url)
```
2. Documentation Tracking
Monitor documentation changes for your favorite libraries.
```python
docs = {
    "React": "https://react.dev/learn",
    "Next.js": "https://nextjs.org/docs",
}
for name, url in docs.items():
    print(f"Indexing {name} docs")
    await sentinel.process_url(url)

# Re-check pages not verified in the last 7 days and update any that changed
await sentinel.run_healing_cycle(days_threshold=7)
```
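Keeping tracked documentation fresh means running the healing cycle repeatedly. A cron job that calls `sentinel heal --days 7` is one option; the sketch below is an in-process alternative using `asyncio`, assuming nothing else in your deployment already schedules healing. Only `run_healing_cycle(days_threshold=...)` comes from the examples above; `heal_periodically` and its parameters are hypothetical.

```python
from __future__ import annotations

import asyncio


async def heal_periodically(
    sentinel,
    days_threshold: int = 7,
    interval_seconds: float = 24 * 60 * 60,
    max_runs: int | None = None,
) -> int:
    """Run a healing cycle every `interval_seconds`; stop after `max_runs` if given."""
    runs = 0
    while max_runs is None or runs < max_runs:
        # Re-check pages that have not been verified within `days_threshold` days
        await sentinel.run_healing_cycle(days_threshold=days_threshold)
        runs += 1
        if max_runs is not None and runs >= max_runs:
            break
        await asyncio.sleep(interval_seconds)
    return runs
```

Inside `main()` from the Python API example, `await heal_periodically(sentinel)` runs one cycle per day until the task is cancelled.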
3. News Aggregation
Build a knowledge graph from multiple news sources.
```python
news_sources = [
    "https://techcrunch.com/",
    "https://theverge.com/",
]
for url in news_sources:
    await sentinel.process_url(url)
```
4. Research Paper Tracking
Track research papers and their citations.
```python
papers = [
    "https://arxiv.org/abs/2303.08774",  # GPT-4
    "https://arxiv.org/abs/2005.14165",  # GPT-3
]
for paper in papers:
    await sentinel.process_url(paper)
```
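The loops above stop at the first URL that raises, for example a site that blocks the scraper. The sketch below keeps going and records failures instead; the optional per-URL timeout keeps one slow page from stalling the batch. `process_all` is a hypothetical helper that uses only `process_url` from the examples above, and it processes URLs one at a time because it is not documented whether `process_url` is safe to run concurrently.

```python
from __future__ import annotations

import asyncio
from typing import Any, Iterable


async def process_all(
    sentinel, urls: Iterable[str], timeout: float | None = None
) -> tuple[dict[str, Any], dict[str, Exception]]:
    """Process URLs sequentially, collecting failures instead of stopping at the first one."""
    results: dict[str, Any] = {}
    failures: dict[str, Exception] = {}
    for url in urls:
        try:
            # wait_for with timeout=None waits until process_url finishes
            results[url] = await asyncio.wait_for(sentinel.process_url(url), timeout)
        except Exception as exc:  # includes timeouts; cancellation still propagates
            failures[url] = exc
    return results, failures
```

For example, `results, failures = await process_all(sentinel, news_sources, timeout=120)` leaves every reachable source processed and lists the rest in `failures`.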
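Architecture
Sentinel combines three components, coordinated by the `Sentinel` orchestrator: a scraper (Firecrawl, with a local fallback), an LLM-based graph extractor, and a Neo4j temporal graph store.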
Documentation
- User Guide - Start Here!
- Quick Start Guide
- CLI Reference
- Usage Examples
Limitations & Best Practices
1. Reliability & Hallucinations
LLMs can occasionally "hallucinate" relationships or misinterpret complex DOM structures. Sentinel mitigates this by:
- Using Firecrawl: Converts complex JS/HTML into clean Markdown, reducing noise.
- Structured Extraction: Uses `instructor` to enforce strict Pydantic schemas for nodes and edges.
- Verification: The `heal` command re-verifies content hashes before any costly LLM extraction.
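In Sentinel itself, schema enforcement is handled by `instructor` and Pydantic models. The stdlib-only sketch below illustrates the same idea of rejecting malformed LLM output before it reaches the graph. The field names (`id`, `label`, `source`, `target`, `relation`) are hypothetical, not Sentinel's actual schema, and the dangling-edge check is an extra illustration of catching hallucinated relationships, not documented Sentinel behavior.

```python
from __future__ import annotations

import json
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    id: str
    label: str


@dataclass(frozen=True)
class Edge:
    source: str
    target: str
    relation: str


def _require_str(item: object, key: str) -> str:
    """Return item[key] if it is a non-empty string, else raise ValueError."""
    if not isinstance(item, dict):
        raise ValueError(f"expected an object, got {item!r}")
    value = item.get(key)
    if not isinstance(value, str) or not value:
        raise ValueError(f"expected a non-empty string for {key!r}, got {value!r}")
    return value


def parse_extraction(raw: str) -> tuple[list[Node], list[Edge]]:
    """Validate LLM JSON output; every failure surfaces as a ValueError."""
    data = json.loads(raw)  # json.JSONDecodeError is a subclass of ValueError
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object with 'nodes' and 'edges'")
    nodes = [Node(_require_str(n, "id"), _require_str(n, "label")) for n in data.get("nodes", [])]
    known_ids = {node.id for node in nodes}
    edges: list[Edge] = []
    for e in data.get("edges", []):
        edge = Edge(_require_str(e, "source"), _require_str(e, "target"), _require_str(e, "relation"))
        # A relationship to an entity that was never extracted is a likely hallucination
        if edge.source not in known_ids or edge.target not in known_ids:
            raise ValueError(f"edge references an unknown node: {edge}")
        edges.append(edge)
    return nodes, edges
```

Raising on the first problem keeps the sketch short; a production pipeline would more likely re-prompt the model or drop only the offending items.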
2. Self-Healing Mechanism
Sentinel uses a hash-based change detection strategy:
- Monitor: Checks for nodes that haven't been verified within `days_threshold` days (default: 7).
- Scrape & Hash: Re-scrapes the URL and computes a SHA-256 hash of the content.
- Diff: Compares the new hash with the hash stored in Neo4j.
  - Match: Updates the `last_verified` timestamp (zero LLM cost).
  - Mismatch: Triggers a full LLM extraction and graph update.
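The monitor and diff steps above can be sketched in a few lines. This is a minimal, in-memory illustration, assuming each tracked URL keeps a content hash and a `last_verified` time; `TrackedPage`, `is_stale`, and `check_page` are hypothetical names, and in Sentinel this state lives in Neo4j and the mismatch branch hands off to LLM extraction.

```python
from __future__ import annotations

import hashlib
from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TrackedPage:
    """Hypothetical stand-in for the per-URL state Sentinel stores in Neo4j."""
    url: str
    content_hash: str
    last_verified: datetime


def content_hash(content: str) -> str:
    """SHA-256 hex digest of the scraped content."""
    return hashlib.sha256(content.encode("utf-8")).hexdigest()


def is_stale(page: TrackedPage, now: datetime, days_threshold: int = 7) -> bool:
    # Monitor step: last verification is older than `days_threshold` days
    return now - page.last_verified > timedelta(days=days_threshold)


def check_page(page: TrackedPage, fresh_content: str, now: datetime) -> str:
    """Diff step: 'verified' if the content is unchanged, else 're-extract'."""
    if content_hash(fresh_content) == page.content_hash:
        page.last_verified = now  # Match: only the timestamp changes, no LLM call
        return "verified"
    # Mismatch: the caller runs LLM extraction, then stores the new hash
    return "re-extract"
```

Leaving the stored hash untouched on a mismatch means a failed extraction is retried on the next cycle instead of being silently marked as current.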
3. Cost & Scale
- LLM Costs: Frequent updates on large sites can be expensive. Use `days_threshold` (`sentinel heal --days N` on the CLI) to control how often pages are re-checked.
- Storage: The temporal graph grows over time. Sentinel does not currently auto-prune old versions, so we recommend periodically archiving old `VALID_TO` relationships if storage is a concern.
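The archiving rule can be expressed as a simple selection: keep current facts and recently superseded ones, and move anything that stopped being valid before a cutoff. The sketch below shows only that selection on in-memory data, assuming each relationship carries a `valid_to` timestamp that is `None` while the fact is current. `TemporalEdge`, `split_for_archive`, and the 90-day retention are hypothetical; against Neo4j you would export the archived relationships before deleting them, typically with a Cypher query.

```python
from __future__ import annotations

from dataclasses import dataclass
from datetime import datetime, timedelta


@dataclass
class TemporalEdge:
    """Hypothetical relationship record; valid_to is None while the fact is current."""
    source: str
    target: str
    relation: str
    valid_to: datetime | None = None


def split_for_archive(
    edges: list[TemporalEdge], now: datetime, retention_days: int = 90
) -> tuple[list[TemporalEdge], list[TemporalEdge]]:
    """Return (keep, archive); archive holds edges that ended before the cutoff."""
    cutoff = now - timedelta(days=retention_days)
    keep: list[TemporalEdge] = []
    archive: list[TemporalEdge] = []
    for edge in edges:
        # Current facts and recently superseded ones stay in the live graph
        if edge.valid_to is not None and edge.valid_to < cutoff:
            archive.append(edge)
        else:
            keep.append(edge)
    return keep, archive
```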
Development
Setup Development Environment
```bash
# Clone the repository
git clone https://github.com/Om7035/Sentinel-The-Self-Healing-Knowledge-Graph
cd Sentinel-The-Self-Healing-Knowledge-Graph

# Create a virtual environment
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

# Install dependencies (editable install, including the "all" extra)
pip install -e ".[all]"

# Run tests
pytest tests/
```
Project Structure
```
sentinel/
├── sentinel_core/          # Core library (pip-installable)
│   ├── scraper/            # Web scraping (Firecrawl + Local)
│   ├── graph_store.py      # Neo4j temporal graph
│   ├── graph_extractor.py  # LLM-based extraction
│   └── orchestrator.py     # Main Sentinel class
├── sentinel_platform/      # Demo platform
│   ├── api/                # FastAPI backend
│   └── ui/                 # Next.js frontend
├── tests/                  # Test suite
├── docs/                   # Documentation
└── sentinel_cli.py         # CLI tool
```
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (`git checkout -b feature/AmazingFeature`)
- Commit your changes (`git commit -m 'Add some AmazingFeature'`)
- Push to the branch (`git push origin feature/AmazingFeature`)
- Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built with LangChain, Neo4j, and FastAPI
- Inspired by the need for self-maintaining knowledge systems
- Special thanks to the open-source community
Contact
- Author: Om Kawale
- Email: speedtech602@gmail.com
- GitHub: @Om7035
- Project: Sentinel
Star History
If you find Sentinel useful, please consider giving it a star! ⭐
Made with ❤️ by Om Kawale
Download files
Source Distribution: sentinel_core-0.1.7.tar.gz
Built Distribution: sentinel_core-0.1.7-py3-none-any.whl
File details
Details for the file sentinel_core-0.1.7.tar.gz.
File metadata
- Download URL: sentinel_core-0.1.7.tar.gz
- Upload date:
- Size: 34.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 62a19b58704edeaeee5f855125f0b27b2270264e4ab84880a1eb535de9a75a8b |
| MD5 | 7fedf4b53cb3b938abd28a16dbe01ccb |
| BLAKE2b-256 | 2535cdea90d86824d30484b8b6926041fc2d0d74439a22e13d4718873f6c6626 |
File details
Details for the file sentinel_core-0.1.7-py3-none-any.whl.
File metadata
- Download URL: sentinel_core-0.1.7-py3-none-any.whl
- Upload date:
- Size: 38.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8aadcf2eff6d92740a9771c4448601dce8d2eca2d6bc1371f04534b5fd808fa8 |
| MD5 | a89eda3c4b401f51fef0f46581241083 |
| BLAKE2b-256 | 6f973cb6e36fad1ffe5f084827932902cd3b056c1055eba0630fa17e83f16469 |
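The SHA256 digests above let you check a downloaded 0.1.7 file before installing it. The sketch below hashes a file in chunks and compares it with the published digest for its file name; `sha256_of`, `verify_download`, and `EXPECTED_SHA256` are illustrative names, and the digests are copied from the tables above.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digests published above for release 0.1.7
EXPECTED_SHA256 = {
    "sentinel_core-0.1.7.tar.gz": "62a19b58704edeaeee5f855125f0b27b2270264e4ab84880a1eb535de9a75a8b",
    "sentinel_core-0.1.7-py3-none-any.whl": "8aadcf2eff6d92740a9771c4448601dce8d2eca2d6bc1371f04534b5fd808fa8",
}


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 of a file, read in chunks so large files are not loaded at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: str | Path) -> bool:
    """True only if the file name is known and its SHA256 matches the published digest."""
    expected = EXPECTED_SHA256.get(Path(path).name)
    return expected is not None and sha256_of(path) == expected
```

For example, after `pip download sentinel-core==0.1.7 --no-deps -d dist`, call `verify_download` on the file pip saved in `dist/`. pip can also enforce this itself: add `--hash=sha256:<digest>` entries (listing both digests) to a requirements line and install with `pip install --require-hashes -r requirements.txt`.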