Skip to main content

Framework-agnostic Python library for graph-backed personal knowledge bases from chat data

Project description

bot-knows

A framework-agnostic Python library for building graph-backed personal knowledge bases from chat data. Implemented with Claude Code (model: claude-opus-4-5).

Features

  • Multi-source Chat Ingestion: Import chats from ChatGPT, Claude, and custom JSON formats
  • Semantic Topic Extraction: LLM-powered topic extraction with confidence scores
  • Intelligent Deduplication: Embedding-based semantic deduplication with configurable thresholds
  • Graph-backed Knowledge Base: Neo4j-powered relationship graph for topics and messages
  • Evidence-weighted Recall: Spaced repetition-inspired recall system with decay and reinforcement
  • Pluggable Infrastructure: Bring your own storage, graph database, or LLM provider

Requirements

  • Python >= 3.13
  • MongoDB (storage) - or custom storage implementation
  • Neo4j (graph database) - or custom graph implementation
  • Redis (optional, for caching)
  • OpenAI or Anthropic API key (for LLM features) - or custom LLM implementation

Installation

pip install bot-knows

Or with uv:

uv add bot-knows

Optional Dependencies

Install with optional dependencies for specific infrastructure:

# With pip - install specific extras
pip install bot-knows[mongo,neo4j,openai]

# With uv
uv add bot-knows[mongo,neo4j,openai]

Available extras:

  • mongo - MongoDB storage (motor)
  • neo4j - Neo4j graph database
  • redis - Redis caching
  • taskiq - Task queue support
  • openai - OpenAI LLM provider
  • anthropic - Anthropic LLM provider

Quick Start

The BotKnows class is the main orchestrator that accepts implementation classes for storage, graph database, and LLM providers. Configuration is automatically loaded from environment variables.

Using Built-in Infrastructure

from bot_knows import (
    BotKnows,
    MongoStorageRepository,
    Neo4jGraphRepository,
    OpenAIProvider,
    ChatGPTAdapter,
)

async def main():
    # Config is loaded from .env automatically
    async with BotKnows(
        storage_class=MongoStorageRepository,
        graphdb_class=Neo4jGraphRepository,
        llm_class=OpenAIProvider,
    ) as bk:
        # Import ChatGPT conversations
        result = await bk.insert_chats("conversations.json", ChatGPTAdapter)
        print(f"Imported {result.chats_new} chats, {result.topics_created} topics")

        # Query the knowledge base
        topics = await bk.get_chat_topics(chat_id)
        due_topics = await bk.get_due_topics(threshold=0.3)

Available Implementations

Storage:

  • MongoStorageRepository - MongoDB-based storage

Graph Database:

  • Neo4jGraphRepository - Neo4j graph database

LLM Providers:

  • OpenAIProvider - OpenAI API (GPT models + embeddings)
  • AnthropicProvider - Anthropic API (Claude models)

Import Adapters:

  • ChatGPTAdapter - ChatGPT export format
  • ClaudeAdapter - Claude export format
  • GenericJSONAdapter - Custom JSON format

Custom Implementations

You can provide your own implementations by implementing the required interfaces. Set config_class = None on your class and pass configuration via the *_custom_config parameters.

Interfaces

  • StorageInterface - Persistent storage for chats, messages, topics, evidence, and recall state
  • GraphServiceInterface - Graph database operations for the knowledge graph
  • LLMInterface - LLM interactions for classification and topic extraction
  • EmbeddingServiceInterface - Text embedding generation

Example: Custom Storage Implementation

from bot_knows import BotKnows, StorageInterface, Neo4jGraphRepository, OpenAIProvider

class MyCustomStorage:
    """Custom storage implementation."""

    config_class = None  # Signals custom config

    @classmethod
    async def from_dict(cls, config: dict) -> "MyCustomStorage":
        """Factory method for custom config."""
        return cls(connection_string=config["connection_string"])

    def __init__(self, connection_string: str):
        self.conn = connection_string

    # Implement all StorageInterface methods...
    async def save_chat(self, chat): ...
    async def get_chat(self, chat_id): ...
    # ... etc

async with BotKnows(
    storage_class=MyCustomStorage,
    graphdb_class=Neo4jGraphRepository,
    llm_class=OpenAIProvider,
    storage_custom_config={"connection_string": "postgresql://..."},
) as bk:
    result = await bk.insert_chats("data.json", ChatGPTAdapter)

Example: Custom LLM Provider

from bot_knows import BotKnows, LLMInterface, MongoStorageRepository, Neo4jGraphRepository

class MyLLMProvider:
    """Custom LLM provider (e.g., local model, different API)."""

    config_class = None

    @classmethod
    async def from_dict(cls, config: dict) -> "MyLLMProvider":
        return cls(api_url=config["api_url"], model=config["model"])

    def __init__(self, api_url: str, model: str):
        self.api_url = api_url
        self.model = model

    # Implement LLMInterface methods
    async def classify_chat(self, first_pair, last_pair): ...
    async def extract_topics(self, user_content, assistant_content): ...
    async def normalize_topic_name(self, name): ...

    # Implement EmbeddingServiceInterface if used as embedding provider
    async def embed(self, texts): ...

async with BotKnows(
    storage_class=MongoStorageRepository,
    graphdb_class=Neo4jGraphRepository,
    llm_class=MyLLMProvider,
    llm_custom_config={"api_url": "http://localhost:8000", "model": "llama3"},
) as bk:
    result = await bk.insert_chats("data.json", ChatGPTAdapter)

Configuration

Configuration is loaded from environment variables. See .env.example for all available options.

Key environment variables:

  • MONGODB_URI - MongoDB connection string
  • NEO4J_URI, NEO4J_USER, NEO4J_PASSWORD - Neo4j connection
  • OPENAI_API_KEY - OpenAI API key
  • ANTHROPIC_API_KEY - Anthropic API key
  • DEDUP_HIGH_THRESHOLD, DEDUP_LOW_THRESHOLD - Deduplication thresholds

Architecture

Input Sources (ChatGPT, Claude, Custom JSON)
        ↓
Import Adapters (normalize to ChatIngest)
        ↓
Domain Processing
  ├── Chat identity resolution
  ├── One-time Chat classification
  ├── Message creation & ordering
        ↓
Topic Extraction
  ├── LLM-based extraction
  ├── Semantic deduplication
  ├── Evidence append
        ↓
Graph Updates (Neo4j)

Retrieval API

async with BotKnows(...) as bk:
    # Get messages for a chat
    messages = await bk.get_messages_for_chat(chat_id)

    # Get topics for a chat
    topic_ids = await bk.get_chat_topics(chat_id)

    # Get related topics
    related = await bk.get_related_topics(topic_id, limit=10)

    # Get topic evidence
    evidence = await bk.get_topic_evidence(topic_id)

    # Spaced repetition recall
    recall_state = await bk.get_recall_state(topic_id)
    due_topics = await bk.get_due_topics(threshold=0.3)
    all_states = await bk.get_all_recall_states()

Development

# Install with dev dependencies
uv sync --dev

# Install with dev and optional dependencies
uv sync --dev --extra mongo --extra neo4j --extra openai

# Install all extras
uv sync --dev --all-extras

# Run tests
uv run pytest

# Type checking
uv run mypy src/

# Linting
uv run ruff check src/

Future Plans

The built-in infrastructure will be extended with additional providers:

  • Storage: PostgreSQL, SQLite
  • Graph: Amazon Neptune, TigerGraph, MemGraph
  • LLM: Google Gemini, Ollama, HuggingFace

Contributing

Contributions are welcome! If you'd like to add a new infrastructure implementation:

  1. Implement the appropriate interface (StorageInterface, GraphServiceInterface, LLMInterface, or EmbeddingServiceInterface)
  2. Add a config_class for environment-based configuration (or set to None for custom config)
  3. Implement the from_config class method (or from_dict if config_class is None)
  4. Add tests for your implementation
  5. Submit a pull request

License

MIT

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

bot_knows-0.1.2.tar.gz (54.9 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

bot_knows-0.1.2-py3-none-any.whl (68.1 kB view details)

Uploaded Python 3

File details

Details for the file bot_knows-0.1.2.tar.gz.

File metadata

  • Download URL: bot_knows-0.1.2.tar.gz
  • Upload date:
  • Size: 54.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.30 {"installer":{"name":"uv","version":"0.9.30","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for bot_knows-0.1.2.tar.gz
Algorithm Hash digest
SHA256 fa70d80e39d026f96a0815476d4930d0d2940269266d9303836625cc6257c73f
MD5 0749d9b9f4de0213fc239e0f081fdecc
BLAKE2b-256 7e3c59af94a4be67cf8fc6bf3a0bdde01a54200fc92485db58a42344ab0f0cc9

See more details on using hashes here.

File details

Details for the file bot_knows-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: bot_knows-0.1.2-py3-none-any.whl
  • Upload date:
  • Size: 68.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.9.30 {"installer":{"name":"uv","version":"0.9.30","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for bot_knows-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 b417d7fdfa23d4bed243985441068a72579790055516eb9a93562e9a0c3e25c9
MD5 9e2662aff1844a15dc8b112e83804040
BLAKE2b-256 53f2460d459302cf196f4d3ad0d3ac428538dc39010f10ff40147fab817b4a16

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page