A library that enables a hallucination-free conversations with code repositories.

These details have not been verified by PyPI

Project links

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language

Project description

Doc2Talk

A fast, context-aware code documentation chat interface that provides intelligent responses based on your codebase and documentation.

Features

Knowledge Graph Indexing: Indexes both code and documentation for comprehensive understanding
Smart Context Management: Intelligently decides when to fetch new context or use existing context
Lazy Initialization: Fast startup with on-demand index building
Repository Caching: Efficiently caches GitHub repositories for faster repeated access
Custom LLM Support: Configure different models for decisions and responses
High-Performance Storage: Uses optimized serialization for 100x faster loading times
Terminal UI: Rich terminal interface with interactive streaming responses
Session Management: Save and restore chat sessions for continuous workflows

Installation

# Install from PyPI
pip install doc2talk

# Install from source
git clone https://github.com/unclecode/doc2talk.git
cd doc2talk
pip install -e .

Quick Start

Command Line Interface

# Specify code and documentation sources (required)
doc2talk --code /path/to/code --docs /path/to/docs

# Use a GitHub repository
doc2talk --code https://github.com/username/repo/tree/main/src --docs https://github.com/username/repo/tree/main/docs

# Exclude specific patterns
doc2talk --exclude "*/tests/*" --exclude "*/node_modules/*"

# Session management
doc2talk --list                     # List all chat sessions
doc2talk --continue SESSION_ID      # Continue existing session
doc2talk --delete SESSION_ID        # Delete a session

Python API

from doc2talk import Doc2Talk

# Create instance with sources (required)
doc2talk = Doc2Talk(
    code_source="https://github.com/unclecode/doc2talk/tree/main/src/doc2talk",
    docs_source="https://github.com/unclecode/doc2talk/tree/main/docs",
    exclude_patterns=["*/tests/*"]
)

# Lazy loading (faster initialization)
doc2talk = Doc2Talk(code_source="/path/to/code")
# Build index only when needed (chat operations automatically build the index)
doc2talk.build_index()  # Or explicitly build index before chatting

# Ask questions and get responses
response = doc2talk.chat("How does the crawler work?")
print(response)

# Streaming responses (synchronous)
for chunk in doc2talk.chat_stream("What are the main components?"):
    print(chunk, end="", flush=True)
    
# Streaming responses (asynchronous)
async for chunk in doc2talk.chat_stream_async("How does it work?"):
    print(chunk, end="", flush=True)

# Customize history and context limits
doc2talk = Doc2Talk(
    code_source="/path/to/code",
    max_history=100,  # Keep up to 100 messages (default is 50)
    max_contexts=10   # Keep up to 10 contexts (default is 5)
)

# Custom LLM configurations
from doc2talk.models import LLMConfig

# Configure custom models with specific parameters
doc2talk = Doc2Talk(
    code_source="/path/to/code",
    # Model for context decisions
    decision_llm_config=LLMConfig(
        model="gpt-3.5-turbo",  # Cheaper model for decisions
        temprature=0.2,         # Lower temperature for consistency
        max_tokens=200          # Limit token usage
    ),
    # Model for response generation
    generation_llm_config=LLMConfig(
        model="gpt-4o",         # Better model for responses
        temprature=0.7,         # Higher temperature for creativity
        max_tokens=1000         # Allow longer responses
    )
)

# Session management
print(f"Current session ID: {doc2talk.session_id}")
sessions = Doc2Talk.list_sessions()
Doc2Talk.delete_session("session_id")

# Loading from existing index
doc2talk = Doc2Talk.from_index(
    "/path/to/index.c4ai",
    max_history=75,
    max_contexts=3
)

How It Works

Doc2Talk combines several powerful components:

DocGraph: A knowledge graph engine that:
- Indexes Python code (classes, functions) and Markdown docs
- Uses BM25 search algorithm for accurate retrieval
- Supports both local paths and GitHub repositories
- Maintains optimized serialization for fast loading
Contextual Decision Making:
- Uses LLMs to determine when to fetch new context
- Decides between replacement, addition, or reuse of context
- Avoids redundant context fetching
Repository Management:
- Caches repositories at ~/.doctalk/repos/
- Intelligently handles multiple paths from the same repository
- Auto-cleans old repositories not accessed in 30 days
Index Caching:
- Stores indexed knowledge graphs at ~/.doctalk/index/
- Uses compressed binary format for 50-70% smaller files
- Memory-mapped files for efficient partial loading

Directory Structure

~/.doctalk/
├── index/          # Cached knowledge graphs
├── repos/          # Cached GitHub repositories
└── sessions/       # Saved chat sessions

Usage Examples

Creating a New Chat

Start a new chat session by running the CLI:

doc2talk --code /path/to/code --docs /path/to/docs

The first time you run it, it will:

Clone the repository if it's a GitHub URL (or use cached version if available)
Build the knowledge graph (or load from cache if available)
Present you with an interactive chat interface

Asking Questions

Simply type your questions in the terminal. Doc2Talk will:

Analyze your question
Determine if new context is needed
Search the knowledge graph for relevant information
Generate a response based on the retrieved context

Example questions:

"How does the crawling functionality work?"
"Can you explain the BM25 search implementation?"
"What's the structure of the DocGraph class?"

Advanced Context Management

Doc2Talk uses three context modes:

New: Completely replaces existing context with fresh search results
Additional: Adds new search results to existing context
None: Uses existing context without searching

The system automatically decides which mode to use based on your questions.

Advanced Features

Custom LLM Models

Doc2Talk supports using different models for different tasks:

Context Decisions: Use a cheaper model to decide when to update context
Response Generation: Use a more powerful model to generate responses

This allows you to balance cost and quality.

Lazy Loading

Doc2Talk initializes quickly by using lazy loading:

Creating a Doc2Talk instance is nearly instantaneous
The knowledge graph is only built when needed
You can control when to build the index with build_index()

This provides faster startup time for applications.

Performance Optimization

Doc2Talk is designed for speed and efficiency:

Cached repositories avoid repeated cloning
Optimized serialization for 100x faster loading
Memory-mapped reading for efficient resource usage
Smart context decisions to minimize redundant searches
Lazy initialization for faster application startup

Contributing

Contributions are welcome! Here are some areas for improvement:

Adding support for more repository types (GitLab, Bitbucket)
Extending language support beyond Python and Markdown
Improving the semantic search capabilities
Enhancing the terminal UI

License

MIT License

Project details

These details have not been verified by PyPI

Project links

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Programming Language

Release history Release notifications | RSS feed

This version

0.1.1

Mar 14, 2025

0.1.0

Mar 14, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

doc2talk-0.1.1.tar.gz (31.6 kB view details)

Uploaded Mar 14, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

doc2talk-0.1.1-py3-none-any.whl (28.0 kB view details)

Uploaded Mar 14, 2025 Python 3

File details

Details for the file doc2talk-0.1.1.tar.gz.

File metadata

Download URL: doc2talk-0.1.1.tar.gz
Upload date: Mar 14, 2025
Size: 31.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for doc2talk-0.1.1.tar.gz
Algorithm	Hash digest
SHA256	`5ac28bc6a2577ea801b42be7c13cd90874b9b25fc5241be98bb188ea1d549370`
MD5	`31e59261e35422ab70f49c7ac23c9d28`
BLAKE2b-256	`a50bb643673cdb156f1640fa812f7fd5a670ce6a4652356e9af1cd65c6a952c1`

See more details on using hashes here.

File details

Details for the file doc2talk-0.1.1-py3-none-any.whl.

File metadata

Download URL: doc2talk-0.1.1-py3-none-any.whl
Upload date: Mar 14, 2025
Size: 28.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.10.13

File hashes

Hashes for doc2talk-0.1.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`76d72cf2882163ed26054c5df6c7b7a64b96d5515608ce7a016293816340fa52`
MD5	`f303206866e14c61987f6afcbff3800f`
BLAKE2b-256	`6260b10b796c7bdb47bfdb67734b148b0df45daa7aa169cce1fc18de8f2fe02c`

See more details on using hashes here.

doc2talk 0.1.1

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

Doc2Talk

Features

Installation

Quick Start

Command Line Interface

Python API

How It Works

Directory Structure

Usage Examples

Creating a New Chat

Asking Questions

Advanced Context Management

Advanced Features

Custom LLM Models

Lazy Loading

Performance Optimization

Contributing

License

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes