Deep Knowledge introduces a powerful multi-agent system for creating deep, comprehensive summaries of complex content.
Deep Knowledge
The Problem
Current summarization tools on the market typically generate shallow, surface-level summaries that fail to capture the full complexity of the content. These summaries often:
- Miss the deeper connections between concepts
- Ignore the hierarchical structure of information
- Extract only the most obvious points
- Lack contextual understanding
- Fail to identify underlying frameworks or mental models
As a result, these tools provide little value for comprehensive learning, critical analysis, or deep content exploration.
Our Approach: Multi-Agent Summarization
Deep Knowledge introduces a powerful multi-agent system for creating deep, comprehensive summaries of complex content. Instead of treating summarization as a single task, we break it down into a coordinated pipeline of specialized agents:
- Mind Map Agent: Analyzes the structure and concepts of the content, creating both a structural and conceptual map
- Summary Architect: Designs a modular summary structure with specific instructions for each component
- Content Synthesizer: Generates each module following the architect's specifications
This approach ensures that summaries retain the original content's structure while revealing deeper patterns, frameworks, and connections.
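As a rough illustration of how the three agents hand work to each other, here is a minimal sketch that models each agent as a single prompt to a generic `llm(prompt) -> str` callable. The names `run_pipeline` and `PipelineResult`, the prompt wording, and the "one module instruction per line" plan format are all hypothetical assumptions for this example, not the library's internals.

```python
from dataclasses import dataclass, field
from typing import Callable

# Hypothetical LLM interface: a prompt string in, a completion string out.
LLM = Callable[[str], str]


@dataclass
class PipelineResult:
    mind_map: str
    module_specs: list[str]
    modules: list[str] = field(default_factory=list)

    @property
    def output(self) -> str:
        # Join the generated modules into one summary document.
        return "\n\n".join(self.modules)


def run_pipeline(content: str, llm: LLM) -> PipelineResult:
    # Stage 1 (Mind Map Agent): map the structure and concepts of the content.
    mind_map = llm(f"Build a structural and conceptual mind map of:\n{content}")

    # Stage 2 (Summary Architect): plan the modules, one instruction per line
    # (an assumed output format for this sketch).
    plan = llm(f"Design summary modules, one instruction per line, from:\n{mind_map}")
    module_specs = [line.strip() for line in plan.splitlines() if line.strip()]

    result = PipelineResult(mind_map=mind_map, module_specs=module_specs)

    # Stage 3 (Content Synthesizer): write each module from its instruction.
    # Note that the full content is sent with every module request.
    for spec in module_specs:
        result.modules.append(llm(f"Instruction: {spec}\nSource:\n{content}"))
    return result
```

Passing the LLM in as a plain callable keeps the sketch testable with a stub function; in practice it would wrap a chat model.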
Key Features
- Deep structured summarization: Creates summaries that capture both structure and conceptual depth
- Smart content handling: Automatically processes various document types with OCR detection
- Token-aware processing: Intelligently manages content to work within model context limitations
- Langchain integration: Seamlessly works with Langchain chat models
- Visual mind mapping: Generates comprehensive mind maps to visualize content structure
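The README does not describe how token-aware processing works internally. One common approach is to pack paragraphs greedily into chunks that fit a token budget; the sketch below shows that idea under the assumption of roughly 4 characters per token. The function names are hypothetical, and a real implementation would more likely count tokens with the target model's tokenizer.

```python
# Rough heuristic: about 4 characters per token for English text (assumption).
CHARS_PER_TOKEN = 4


def estimate_tokens(text: str) -> int:
    # Ceiling division so any non-empty text counts as at least one token.
    return -(-len(text) // CHARS_PER_TOKEN)


def chunk_by_tokens(text: str, max_tokens: int) -> list[str]:
    """Greedily pack paragraphs into chunks that stay within max_tokens.

    A paragraph larger than the budget is hard-split on character boundaries.
    """
    max_chars = max_tokens * CHARS_PER_TOKEN
    chunks: list[str] = []
    current = ""
    for para in (p for p in text.split("\n\n") if p.strip()):
        pieces = [para[i:i + max_chars] for i in range(0, len(para), max_chars)]
        for piece in pieces:
            candidate = f"{current}\n\n{piece}" if current else piece
            if estimate_tokens(candidate) <= max_tokens:
                current = candidate  # Still fits: keep growing this chunk.
            else:
                chunks.append(current)  # Budget exceeded: close the chunk.
                current = piece
    if current:
        chunks.append(current)
    return chunks
```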
Installation
pip install deep-knowledge
Usage Example
from deep_knowledge.summary import Summary
# From a file path
summary = Summary(input_path="my_book.pdf")
summary.run()
print(summary.output)
# From text content
text_content = "..."
summary = Summary(input_content=text_content)
summary.run()
print(summary.output)
Streamlit Demo
We created a Streamlit demo to showcase the Deep Knowledge summarization pipeline. To run the demo, execute the following commands from the repository root:
export PYTHONPATH=$PYTHONPATH:"$(pwd)"
streamlit run demo/streamlit_app.py
Langchain Integration
Deep Knowledge integrates smoothly with Langchain chat models. You can provide your own Langchain chat model, or use the "auto" option, which selects a model based on the API keys available in your environment:
from langchain_openai import ChatOpenAI
from deep_knowledge.summary import Summary
# Using a specific Langchain model
llm = ChatOpenAI(model_name="gpt-4o", temperature=0)
summary = Summary(llm=llm, input_path="article.pdf")
# Using auto mode (automatically selects from available API keys)
summary = Summary(llm="auto", input_path="article.pdf")
In "auto" mode, the system prioritizes:
- Google Gemini models (if GOOGLE_API_KEY is available)
- OpenAI models (if OPENAI_API_KEY is available)
This allows for easy experimentation with different LLM providers.
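The priority order above amounts to "first provider whose key is set wins". A minimal sketch of that selection rule follows; `pick_provider` and `PROVIDER_PRIORITY` are hypothetical names, and this is not the library's actual code. It takes the environment as a parameter so it can be tested without touching real variables.

```python
import os
from collections.abc import Mapping

# Priority order as stated in the README: Gemini first, then OpenAI.
PROVIDER_PRIORITY = [
    ("google", "GOOGLE_API_KEY"),
    ("openai", "OPENAI_API_KEY"),
]


def pick_provider(env: Mapping[str, str] = os.environ) -> str:
    """Return the first provider whose API key is set and non-empty."""
    for provider, key_name in PROVIDER_PRIORITY:
        if env.get(key_name):
            return provider
    raise RuntimeError("No supported API key found; set GOOGLE_API_KEY or OPENAI_API_KEY.")
```

The returned provider name would then map to a concrete chat model class, for example `ChatGoogleGenerativeAI` from `langchain_google_genai` or `ChatOpenAI` from `langchain_openai`; which default model names "auto" mode uses is not specified here.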
Flexible Input Options
Deep Knowledge accepts multiple input formats:
# From a file path
summary = Summary(input_path="document.pdf")
# From raw text content
summary = Summary(input_content="Your content here...")
# From Langchain Document objects
from langchain_core.documents import Document
documents = [Document(page_content="Content 1"), Document(page_content="Content 2")]
summary = Summary(input_documents=documents)
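One way to think about these three options is that they all reduce to a list of documents before summarization starts. The sketch below illustrates that normalization step under two assumptions: exactly one input option is supplied, and only plain-text files are read. `normalize_inputs` and `Doc` are hypothetical; `Doc` is a minimal stand-in for LangChain's `Document` (which also exposes `page_content` and `metadata`).

```python
from dataclasses import dataclass, field
from pathlib import Path
from typing import Optional, Sequence


@dataclass
class Doc:
    # Minimal stand-in for langchain_core.documents.Document.
    page_content: str
    metadata: dict = field(default_factory=dict)


def normalize_inputs(
    input_path: Optional[str] = None,
    input_content: Optional[str] = None,
    input_documents: Optional[Sequence[Doc]] = None,
) -> list[Doc]:
    """Reduce the three input options to one list of documents."""
    given = [x is not None for x in (input_path, input_content, input_documents)]
    if sum(given) != 1:
        raise ValueError("Provide exactly one of input_path, input_content, input_documents.")
    if input_documents is not None:
        return list(input_documents)
    if input_content is not None:
        return [Doc(page_content=input_content)]
    # Plain-text files only in this sketch; PDF, DOCX, and EPUB need real loaders.
    path = Path(input_path)
    return [Doc(page_content=path.read_text(encoding="utf-8"), metadata={"source": str(path)})]
```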
The library supports various file formats including PDF, DOCX, TXT, Markdown, and EPUB, with automatic OCR detection for scanned documents.
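The README does not say how scanned documents are detected. A widely used heuristic is to check how much text the embedded text layer yields per page: scanned pages usually yield little or none. The sketch below implements that heuristic; the function name and both thresholds are assumptions chosen for illustration.

```python
from typing import Sequence

# Assumed threshold: pages with less extractable text than this look like images.
MIN_CHARS_PER_PAGE = 50


def needs_ocr(page_texts: Sequence[str], min_chars: int = MIN_CHARS_PER_PAGE,
              max_sparse_ratio: float = 0.5) -> bool:
    """Guess whether a document is scanned from its per-page text layer."""
    if not page_texts:
        return True  # No text layer at all.
    sparse = sum(1 for t in page_texts if len(t.strip()) < min_chars)
    # Flag the document when more than max_sparse_ratio of pages are near-empty.
    return sparse / len(page_texts) > max_sparse_ratio
```

For a PDF, the per-page strings could come from a library such as pypdf (`[p.extract_text() for p in PdfReader(path).pages]`).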
Roadmap
Token Management Optimization
The current implementation can be expensive because it sends the full content to every module-generation step. Future improvements will include:
- Having the Summary Architect return metadata identifying the content sections each module needs, so that only those sections are sent
In the meantime, Google Gemini Flash models allow free or low-cost experimentation.
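To make the planned saving concrete, here is a sketch that compares sending the full content to every module against sending only the sections each module references. `ModuleSpec`, `context_for`, and the section IDs are hypothetical, and token counts use the same rough 4-characters-per-token assumption as a stand-in for a real tokenizer.

```python
from dataclasses import dataclass


@dataclass
class ModuleSpec:
    # Hypothetical architect output: an instruction plus the sections it needs.
    instruction: str
    section_ids: list[str]


def context_for(spec: ModuleSpec, sections: dict[str, str]) -> str:
    """Concatenate only the sections this module references, in plan order."""
    return "\n\n".join(sections[sid] for sid in spec.section_ids if sid in sections)


def approx_tokens(text: str) -> int:
    return -(-len(text) // 4)  # ~4 characters per token, a rough assumption


sections = {"ch1": "a" * 4000, "ch2": "b" * 4000, "ch3": "c" * 4000}
plan = [
    ModuleSpec("Summarize chapter 1", ["ch1"]),
    ModuleSpec("Compare chapters 2 and 3", ["ch2", "ch3"]),
]

full = "\n\n".join(sections.values())
full_cost = approx_tokens(full) * len(plan)  # current behavior: full content per module
sliced_cost = sum(approx_tokens(context_for(s, sections)) for s in plan)
```

With this toy plan, slicing roughly halves the input tokens; the real saving depends on how much each module's sections overlap.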
Interactive Refinement and Follow-ups
The current pipeline operates as a single-pass process, but users often need to refine outputs based on initial results. Planned improvements include:
- Enabling conversational interactions to shape the summarization process
- Allowing targeted refinement of specific modules without regenerating the entire summary
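Targeted refinement means regenerating one module while reusing the others. A minimal sketch of that idea, assuming the summary is held as a list of module strings and the LLM is a generic `llm(prompt) -> str` callable; `refine_module` and its prompt are hypothetical.

```python
from typing import Callable

# Hypothetical LLM interface: a prompt string in, a completion string out.
LLM = Callable[[str], str]


def refine_module(modules: list[str], index: int, feedback: str, llm: LLM) -> list[str]:
    """Return a new module list where only modules[index] is regenerated.

    Other modules are copied unchanged, so exactly one LLM call is made.
    """
    prompt = f"Revise this section.\nFeedback: {feedback}\nSection:\n{modules[index]}"
    updated = list(modules)  # Copy so the caller's list is not mutated.
    updated[index] = llm(prompt)
    return updated
```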
Per-Step LLM Configuration
Currently, a single LLM handles every step. A useful feature would be to allow experimentation with a different LLM for each step, for example:
- gpt-4o for the Mind Map Agent
- o1 for the Summary Architect
- gemini-2.0-flash for the Content Synthesizer
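Since this feature is still on the roadmap, the following is only a sketch of what a per-step configuration could look like: a shared default model plus optional overrides keyed by step name. `StepModels`, the step identifiers, and the validation rule are hypothetical; the model names are the ones listed above.

```python
from dataclasses import dataclass, field

# Hypothetical step identifiers for the three agents.
STEPS = ("mind_map", "architect", "synthesizer")


@dataclass
class StepModels:
    """Per-step model configuration that falls back to a shared default."""
    default: str
    overrides: dict[str, str] = field(default_factory=dict)

    def __post_init__(self) -> None:
        # Catch typos in step names early instead of silently using the default.
        unknown = set(self.overrides) - set(STEPS)
        if unknown:
            raise ValueError(f"Unknown steps: {sorted(unknown)}")

    def model_for(self, step: str) -> str:
        return self.overrides.get(step, self.default)


config = StepModels(
    default="gpt-4o",  # used by the Mind Map Agent here
    overrides={"architect": "o1", "synthesizer": "gemini-2.0-flash"},
)
```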
Improved Reproducibility
We're working on enhancing the system's reliability when LLMs don't precisely follow the expected output format, including:
- More robust output parsing
- Fallback strategies for format deviations
- Better error handling and recovery
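As an example of what more robust parsing with a fallback could involve, the sketch below tries several increasingly lenient ways to recover JSON from a model reply: the raw text, any fenced code block, and finally the outermost brace-delimited span. It assumes the expected output is JSON, which the README does not state; `parse_json_reply` is a hypothetical helper.

```python
import json
import re
from typing import Any, Optional

# Matches a fenced block (three backticks, optional "json" tag) in the reply.
FENCE_RE = re.compile(r"`{3}(?:json)?\s*(.*?)`{3}", re.DOTALL)


def parse_json_reply(reply: str, fallback: Optional[Any] = None) -> Any:
    """Parse JSON from an LLM reply, tolerating common format deviations."""
    candidates = [reply.strip()]
    candidates += [m.strip() for m in FENCE_RE.findall(reply)]
    # Last resort: the span from the first "{" to the last "}".
    start, end = reply.find("{"), reply.rfind("}")
    if start != -1 and end > start:
        candidates.append(reply[start:end + 1])
    for text in candidates:
        try:
            return json.loads(text)
        except json.JSONDecodeError:
            continue  # Try the next, more lenient candidate.
    return fallback
```

Returning a caller-supplied fallback instead of raising lets the pipeline decide whether to retry the step or continue with a default.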
Enhanced OCR Capabilities
Plans to expand OCR options include:
- Supporting additional OCR providers
- Implementing local OCR processing options
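Supporting several OCR providers, including local ones, suggests a pluggable design. The sketch below shows one possibility: a registry of backends keyed by name, tried in order of preference. The registry, the `bytes -> str` backend signature, and the dummy backend are all hypothetical.

```python
from typing import Callable, Dict

# Hypothetical OCR backend signature: raw image bytes in, extracted text out.
OcrBackend = Callable[[bytes], str]

_REGISTRY: Dict[str, OcrBackend] = {}


def register_ocr(name: str) -> Callable[[OcrBackend], OcrBackend]:
    """Decorator that registers an OCR backend under a provider name."""
    def decorator(fn: OcrBackend) -> OcrBackend:
        _REGISTRY[name] = fn
        return fn
    return decorator


def run_ocr(image: bytes, preferred: list[str]) -> str:
    """Use the first provider in the preference list that is registered."""
    for name in preferred:
        backend = _REGISTRY.get(name)
        if backend is not None:
            return backend(image)
    raise LookupError(f"No OCR backend registered among {preferred}")


@register_ocr("local-dummy")
def _dummy_backend(image: bytes) -> str:
    # Placeholder; a real local backend might wrap Tesseract via pytesseract.
    return f"<{len(image)} bytes of image data>"
```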
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
This project is licensed under the Apache 2.0 License - see the LICENSE file for details.
Download files
File details
Details for the file deep_knowledge-0.1.9.tar.gz.
File metadata
- Download URL: deep_knowledge-0.1.9.tar.gz
- Upload date:
- Size: 24.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9fbbfa74d86243c3f505c0fa39d02459ae8a733ea457dee09d57dfc011573987 |
| MD5 | 11508def40107c2ec269172eeb7984c4 |
| BLAKE2b-256 | 4fc7f918ba1d268af0f454b54729793394e86dd271dd86fb41821371309388a7 |
File details
Details for the file deep_knowledge-0.1.9-py3-none-any.whl.
File metadata
- Download URL: deep_knowledge-0.1.9-py3-none-any.whl
- Upload date:
- Size: 24.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.6.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 6a943c5f102c433e8fa9cda6798792b1cfe0663b733cf66444f2177c038414af |
| MD5 | 4d2aa9967d73c43fdc253deeb3d90c00 |
| BLAKE2b-256 | 7f5ef0659047da1915c13e4dfcf5b1b9b8aea76f074920539cf930d337527b1d |
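The published SHA256 digests can be used to check that a downloaded file is intact. A minimal sketch using only the standard library; `sha256_of` and `verify_download` are illustrative helper names, and the expected value is the source-distribution digest from the table for deep_knowledge-0.1.9.tar.gz.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for deep_knowledge-0.1.9.tar.gz.
EXPECTED_SDIST_SHA256 = "9fbbfa74d86243c3f505c0fa39d02459ae8a733ea457dee09d57dfc011573987"


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in chunks so large downloads need little memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path: Path, expected: str) -> bool:
    # hexdigest() is lowercase, so normalize the expected value too.
    return sha256_of(path) == expected.lower()
```

For example, `verify_download(Path("deep_knowledge-0.1.9.tar.gz"), EXPECTED_SDIST_SHA256)` should return True for an untampered download. pip can also enforce this automatically through its hash-checking mode (a `--hash=sha256:...` option on the requirement line in a requirements file).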