bot-knows
A framework-agnostic Python library for building graph-backed personal knowledge bases from chat data.
Features
- Multi-source Chat Ingestion: Import chats from ChatGPT, Claude, and custom JSON formats
- Semantic Topic Extraction: LLM-powered topic extraction with confidence scores
- Intelligent Deduplication: Embedding-based semantic deduplication with configurable thresholds
- Graph-backed Knowledge Base: Neo4j-powered relationship graph for topics and messages
- Evidence-weighted Recall: Spaced repetition-inspired recall system with decay and reinforcement
- Pluggable Infrastructure: Bring your own storage, graph database, or LLM provider
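The recall feature combines evidence reinforcement with time-based decay. As a rough illustration of how such a score could behave, here is a minimal sketch; the function `recall_score` and its parameters (`half_life_days`, the exact decay and reinforcement formulas) are assumptions for illustration, not bot-knows' actual implementation:

```python
from datetime import datetime, timedelta

# Hypothetical sketch of decay-plus-reinforcement recall scoring;
# bot-knows' actual formula and parameter names may differ.
def recall_score(evidence_count: int, last_seen: datetime,
                 now: datetime, half_life_days: float = 30.0) -> float:
    """Score decays exponentially since the last evidence; each new piece
    of evidence reinforces the baseline, saturating below 1.0."""
    age_days = (now - last_seen).total_seconds() / 86400
    decay = 0.5 ** (age_days / half_life_days)   # exponential time decay
    reinforcement = 1 - 0.5 ** evidence_count    # saturating evidence boost
    return decay * reinforcement

now = datetime(2025, 1, 31)
fresh = recall_score(4, now - timedelta(days=1), now)
stale = recall_score(4, now - timedelta(days=90), now)
# The same topic scores high when seen yesterday and low after 90 days,
# so a call like get_due_topics(threshold=0.3) would surface the stale one.
print(fresh > 0.3 > stale)
```

Under this model, a topic drops below a due-threshold purely by aging, and importing fresh chats that mention it pushes the score back up.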
Requirements
- Python >= 3.13
- MongoDB (storage) - or custom storage implementation
- Neo4j (graph database) - or custom graph implementation
- Redis (optional, for caching)
- OpenAI or Anthropic API key (for LLM features) - or custom LLM implementation
Installation
```shell
pip install bot-knows
```
Or with uv:
```shell
uv add bot-knows
```
Quick Start
The `BotKnows` class is the main orchestrator. It accepts implementation classes for storage, graph database, and LLM providers; configuration is loaded automatically from environment variables.
Using Built-in Infrastructure
```python
from bot_knows import (
    BotKnows,
    MongoStorageRepository,
    Neo4jGraphRepository,
    OpenAIProvider,
    ChatGPTAdapter,
)

async def main():
    # Config is loaded from .env automatically
    async with BotKnows(
        storage_class=MongoStorageRepository,
        graphdb_class=Neo4jGraphRepository,
        llm_class=OpenAIProvider,
    ) as bk:
        # Import ChatGPT conversations
        result = await bk.insert_chats("conversations.json", ChatGPTAdapter)
        print(f"Imported {result.chats_new} chats, {result.topics_created} topics")

        # Query the knowledge base
        topics = await bk.get_chat_topics(chat_id)
        due_topics = await bk.get_due_topics(threshold=0.3)
```
Available Implementations
Storage:
- `MongoStorageRepository` - MongoDB-based storage

Graph Database:
- `Neo4jGraphRepository` - Neo4j graph database

LLM Providers:
- `OpenAIProvider` - OpenAI API (GPT models + embeddings)
- `AnthropicProvider` - Anthropic API (Claude models)

Import Adapters:
- `ChatGPTAdapter` - ChatGPT export format
- `ClaudeAdapter` - Claude export format
- `GenericJSONAdapter` - Custom JSON format
Custom Implementations
You can provide your own implementations by implementing the required interfaces. Set `config_class = None` on your class and pass configuration via the `*_custom_config` parameters.
Interfaces
- `StorageInterface` - Persistent storage for chats, messages, topics, evidence, and recall state
- `GraphServiceInterface` - Graph database operations for the knowledge graph
- `LLMInterface` - LLM interactions for classification and topic extraction
- `EmbeddingServiceInterface` - Text embedding generation
Example: Custom Storage Implementation
```python
from bot_knows import (
    BotKnows,
    StorageInterface,
    Neo4jGraphRepository,
    OpenAIProvider,
    ChatGPTAdapter,
)

class MyCustomStorage:
    """Custom storage implementation."""

    config_class = None  # Signals custom config

    @classmethod
    async def from_dict(cls, config: dict) -> "MyCustomStorage":
        """Factory method for custom config."""
        return cls(connection_string=config["connection_string"])

    def __init__(self, connection_string: str):
        self.conn = connection_string

    # Implement all StorageInterface methods...
    async def save_chat(self, chat): ...
    async def get_chat(self, chat_id): ...
    # ... etc

async with BotKnows(
    storage_class=MyCustomStorage,
    graphdb_class=Neo4jGraphRepository,
    llm_class=OpenAIProvider,
    storage_custom_config={"connection_string": "postgresql://..."},
) as bk:
    result = await bk.insert_chats("data.json", ChatGPTAdapter)
```
Example: Custom LLM Provider
```python
from bot_knows import (
    BotKnows,
    LLMInterface,
    MongoStorageRepository,
    Neo4jGraphRepository,
    ChatGPTAdapter,
)

class MyLLMProvider:
    """Custom LLM provider (e.g., local model, different API)."""

    config_class = None

    @classmethod
    async def from_dict(cls, config: dict) -> "MyLLMProvider":
        return cls(api_url=config["api_url"], model=config["model"])

    def __init__(self, api_url: str, model: str):
        self.api_url = api_url
        self.model = model

    # Implement LLMInterface methods
    async def classify_chat(self, first_pair, last_pair): ...
    async def extract_topics(self, user_content, assistant_content): ...
    async def normalize_topic_name(self, name): ...

    # Implement EmbeddingServiceInterface if used as embedding provider
    async def embed(self, texts): ...

async with BotKnows(
    storage_class=MongoStorageRepository,
    graphdb_class=Neo4jGraphRepository,
    llm_class=MyLLMProvider,
    llm_custom_config={"api_url": "http://localhost:8000", "model": "llama3"},
) as bk:
    result = await bk.insert_chats("data.json", ChatGPTAdapter)
```
Configuration
Configuration is loaded from environment variables. See `.env.example` for all available options.
Key environment variables:
- `MONGODB_URI` - MongoDB connection string
- `NEO4J_URI`, `NEO4J_USER`, `NEO4J_PASSWORD` - Neo4j connection
- `OPENAI_API_KEY` - OpenAI API key
- `ANTHROPIC_API_KEY` - Anthropic API key
- `DEDUP_HIGH_THRESHOLD`, `DEDUP_LOW_THRESHOLD` - Deduplication thresholds
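A minimal `.env` could look like the following sketch; every value here is a placeholder (URIs, credentials, and the threshold numbers are illustrative assumptions, not project defaults), so consult `.env.example` for the authoritative names and defaults:

```shell
# Placeholder values - replace with your own; see .env.example
MONGODB_URI=mongodb://localhost:27017/bot_knows
NEO4J_URI=bolt://localhost:7687
NEO4J_USER=neo4j
NEO4J_PASSWORD=changeme
OPENAI_API_KEY=sk-...
DEDUP_HIGH_THRESHOLD=0.9
DEDUP_LOW_THRESHOLD=0.7
```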
Architecture
```
Input Sources (ChatGPT, Claude, Custom JSON)
        ↓
Import Adapters (normalize to ChatIngest)
        ↓
Domain Processing
  ├── Chat identity resolution
  ├── One-time chat classification
  └── Message creation & ordering
        ↓
Topic Extraction
  ├── LLM-based extraction
  ├── Semantic deduplication
  └── Evidence append
        ↓
Graph Updates (Neo4j)
```
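The "Semantic deduplication" step above, paired with the `DEDUP_HIGH_THRESHOLD` / `DEDUP_LOW_THRESHOLD` settings, suggests the common two-threshold pattern: very similar embeddings merge into an existing topic, clearly dissimilar ones create a new topic, and the band in between is ambiguous. The sketch below illustrates that pattern; `cosine`, `dedup_decision`, and the "review" handling are hypothetical names, not the library's actual API:

```python
import math

def cosine(a: list[float], b: list[float]) -> float:
    """Cosine similarity between two embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    na = math.sqrt(sum(x * x for x in a))
    nb = math.sqrt(sum(x * x for x in b))
    return dot / (na * nb)

# Hypothetical two-threshold decision; bot-knows' exact logic may differ.
def dedup_decision(sim: float, high: float = 0.9, low: float = 0.7) -> str:
    if sim >= high:
        return "merge"    # same topic: attach evidence to the existing node
    if sim <= low:
        return "create"   # distinct topic: add a new node to the graph
    return "review"       # ambiguous band: defer (e.g. ask the LLM)

print(dedup_decision(cosine([1.0, 0.0], [0.99, 0.05])))  # near-duplicate
```

Two thresholds instead of one avoid both over-merging distinct topics and flooding the graph with near-duplicates: only confident matches merge automatically.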
Retrieval API
```python
async with BotKnows(...) as bk:
    # Get messages for a chat
    messages = await bk.get_messages_for_chat(chat_id)

    # Get topics for a chat
    topic_ids = await bk.get_chat_topics(chat_id)

    # Get related topics
    related = await bk.get_related_topics(topic_id, limit=10)

    # Get topic evidence
    evidence = await bk.get_topic_evidence(topic_id)

    # Spaced repetition recall
    recall_state = await bk.get_recall_state(topic_id)
    due_topics = await bk.get_due_topics(threshold=0.3)
    all_states = await bk.get_all_recall_states()
```
Development
```shell
# Install with dev dependencies
uv sync --dev

# Run tests
uv run pytest

# Type checking
uv run mypy src/

# Linting
uv run ruff check src/
```
Future Plans
The built-in infrastructure will be extended with additional providers:
- Storage: PostgreSQL, SQLite
- Graph: Amazon Neptune, TigerGraph, MemGraph
- LLM: Google Gemini, Ollama, HuggingFace
Contributing
Contributions are welcome! If you'd like to add a new infrastructure implementation:
- Implement the appropriate interface (`StorageInterface`, `GraphServiceInterface`, `LLMInterface`, or `EmbeddingServiceInterface`)
- Add a `config_class` for environment-based configuration (or set it to `None` for custom config)
- Implement the `from_config` class method (or `from_dict` if `config_class` is `None`)
- Add tests for your implementation
- Submit a pull request
License
MIT