Pure-Python port of the Knowledge Engine backend
Knowledge Engine Backend
A pure-Python search and knowledge retrieval engine designed to run in-process. It combines focused web crawling, vector-based indexing, and LLM-powered answer generation.
👥 For Humans
Features
- In-Process Architecture: Runs directly within your Python application. No external server orchestration required.
- Smart Crawling: Includes a politeness-aware focused crawler with frontier management.
- Vector Search: Local vector store for semantic search capabilities.
- LLM Integration: Seamless integration with Ollama (local) and OpenAI (cloud) for RAG (Retrieval-Augmented Generation).
- Configurable: Fully customizable via environment variables or configuration objects.
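To make the "politeness-aware crawler with frontier management" idea concrete, here is a minimal sketch of a frontier that combines a priority queue, a visited set, and a per-host delay. The class name `PoliteFrontier`, its methods, and the fixed-delay policy are illustrative assumptions; the package's own frontier.py may work differently.

```python
import heapq
import time
from urllib.parse import urlsplit


class PoliteFrontier:
    """Toy crawl frontier (hypothetical): priority queue + visited set + per-host delay."""

    def __init__(self, delay_seconds=1.0, clock=time.monotonic):
        self._heap = []            # entries: (priority, sequence, url)
        self._seen = set()         # every URL ever enqueued
        self._next_allowed = {}    # host -> earliest time it may be requested again
        self._delay = delay_seconds
        self._clock = clock        # injectable so the delay can be tested without sleeping
        self._seq = 0              # tie-breaker that keeps insertion order stable

    def add(self, url, priority=0):
        """Enqueue a URL once; lower priority values are served first."""
        if url in self._seen:
            return False
        self._seen.add(url)
        heapq.heappush(self._heap, (priority, self._seq, url))
        self._seq += 1
        return True

    def pop_ready(self):
        """Return the best URL whose host is outside its delay window, else None."""
        now = self._clock()
        deferred = []
        chosen = None
        while self._heap:
            entry = heapq.heappop(self._heap)
            host = urlsplit(entry[2]).netloc
            if self._next_allowed.get(host, float("-inf")) <= now:
                self._next_allowed[host] = now + self._delay
                chosen = entry[2]
                break
            deferred.append(entry)
        for entry in deferred:     # put back the URLs that were skipped over
            heapq.heappush(self._heap, entry)
        return chosen
```

A crawler loop would call `pop_ready()` repeatedly and sleep briefly whenever it returns `None` while URLs are still queued.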
Installation
- Clone the repository:
git clone <repository-url>
cd knowledge_engine_backend
- Install dependencies (a virtual environment is recommended):
pip install -e .
For development dependencies (testing, linting):
pip install -e ".[dev]"
Quick Start
from knowledge_engine_backend import KnowledgeEngineApp
# Initialize the application
app = KnowledgeEngineApp()
# 1. Start crawling a target website (runs in background)
app.start_crawl("https://example.com")
# 2. Search the indexed content
results = app.search("example domain")
print(results)
# 3. Generate an answer using LLM (RAG)
# Note: Ensure LLM environment variables are set correctly
# answer = app.generate("What is the main purpose of this website?")
# print(answer)
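Because the crawl is described as running in the background, a search issued right away may not find anything yet. This README does not say whether `search` waits for the crawl, so the helper below is only a sketch under that assumption: it polls any search callable until it returns a non-empty result or a timeout expires. The name `wait_for_results` is hypothetical and not part of the package.

```python
import time


def wait_for_results(search_fn, query, timeout=30.0, interval=1.0):
    """Call search_fn(query) until it returns something truthy or time runs out."""
    deadline = time.monotonic() + timeout
    while True:
        results = search_fn(query)
        if results:
            return results
        if time.monotonic() >= deadline:
            return results  # still empty; the caller decides what to do next
        time.sleep(interval)
```

With the Quick Start objects this could be used as `results = wait_for_results(app.search, "example domain")`.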
Configuration
Configure the engine using environment variables or a Config object.
| Variable | Default | Description |
|---|---|---|
| LLM_PROVIDER | ollama | Provider backend (ollama or openai) |
| LLM_BASE_URL | http://localhost:11434 | API endpoint for the LLM (e.g. /api/generate for Ollama) |
| LLM_MODEL | qwen3:1.7b | Model name to use |
| LLM_API_KEY | - | API key (required for OpenAI) |
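As an illustration of how these variables and defaults fit together, the sketch below reads them into a small settings object and applies the one rule stated in the table (an API key is required for OpenAI). `LLMSettings` and `load_llm_settings` are hypothetical names for this example; they are not the package's Config object, whose fields are not documented here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class LLMSettings:
    """Hypothetical settings holder; not the package's own Config class."""
    provider: str = "ollama"
    base_url: str = "http://localhost:11434"
    model: str = "qwen3:1.7b"
    api_key: Optional[str] = None


def load_llm_settings(env: Mapping[str, str] = os.environ) -> LLMSettings:
    """Read the four LLM_* variables, falling back to the documented defaults."""
    defaults = LLMSettings()
    settings = LLMSettings(
        provider=env.get("LLM_PROVIDER", defaults.provider).strip().lower(),
        base_url=env.get("LLM_BASE_URL", defaults.base_url).rstrip("/"),
        model=env.get("LLM_MODEL", defaults.model),
        api_key=env.get("LLM_API_KEY") or None,
    )
    if settings.provider not in ("ollama", "openai"):
        raise ValueError(f"unsupported LLM_PROVIDER: {settings.provider!r}")
    if settings.provider == "openai" and not settings.api_key:
        raise ValueError("LLM_API_KEY is required when LLM_PROVIDER=openai")
    return settings
```

Passing a plain dict instead of `os.environ` keeps the loader easy to test without touching the real environment.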
🤖 For AI Agents
This section provides structural and contextual information to assist AI agents in understanding, maintaining, and extending this codebase.
Project Structure
- knowledge_engine_backend/: Core package source code.
  - app.py: High-level entry point (KnowledgeEngineApp). Facade for the system.
  - engine.py: Orchestrator; the KnowledgeEngine class connecting components.
  - fetcher.py: Handles HTTP requests, content extraction, and parsing.
  - frontier.py: Manages crawl queues, prioritization, and visited sets.
  - storage.py: File-based persistence layer for raw data.
  - search.py: Vector embedding and similarity search logic.
  - provider.py: LLM interface implementations (OpenAI, Ollama).
  - config.py: Configuration schema and loading logic.
- scripts/: Utility and test scripts.
  - e2e_ollama.py: End-to-end testing script proving the full pipeline with Ollama.
- data/: Directory for storing raw JSON data artifacts from crawls.
- tests/: Pytest suite for unit and integration testing.
Component Architecture
The KnowledgeEngine follows a modular component-based architecture:
- Frontier (frontier.py): Supplies URLs to be processed.
- Fetcher (fetcher.py): Downloads and parses HTML content.
- Storage (storage.py): Saves raw document data.
- VectorStore (search.py): Indexes document embeddings for retrieval.
- LLMProvider (provider.py): Interfaces with AI models for generation tasks.
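The retrieval half of this pipeline can be sketched in a few lines. The example below covers only the VectorStore and prompt-building steps, and it swaps in a toy word-count "embedding" with cosine similarity in place of a real embedding model. `ToyVectorStore`, `embed`, `cosine`, and `build_prompt` are all hypothetical names, not the API of search.py or provider.py.

```python
import math
import re
from collections import Counter


def embed(text):
    """Toy embedding: lowercase word counts (a stand-in for a real model)."""
    return Counter(re.findall(r"[a-z0-9]+", text.lower()))


def cosine(a, b):
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


class ToyVectorStore:
    """In-memory index of (url, text, vector) triples."""

    def __init__(self):
        self._docs = []

    def add(self, url, text):
        self._docs.append((url, text, embed(text)))

    def search(self, query, k=3):
        """Return up to k (url, text) pairs, most similar first, skipping zero scores."""
        q = embed(query)
        scored = [(cosine(q, vec), url, text) for url, text, vec in self._docs]
        scored.sort(key=lambda item: item[0], reverse=True)
        return [(url, text) for score, url, text in scored[:k] if score > 0]


def build_prompt(question, hits):
    """Assemble a RAG prompt from the retrieved passages."""
    context = "\n".join(f"[{url}] {text}" for url, text in hits)
    return f"Answer using only this context:\n{context}\n\nQuestion: {question}"
```

In a full run, the Fetcher's parsed text would be passed to `add`, and the string from `build_prompt` would be handed to whichever LLM provider is configured.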
Development & Maintenance Tasks
Running Tests:
Execute the test suite using pytest. Ensure dev dependencies are installed.
pytest
Running E2E Scripts: To verify the full pipeline with Ollama (requires Ollama running locally):
python scripts/e2e_ollama.py "What is the capital of France?" --source "https://en.wikipedia.org/wiki/France"
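Since the E2E script needs a local Ollama server, a quick reachability check before running it can save a confusing failure. The sketch below queries Ollama's model-list endpoint (`/api/tags`) and reports whether it answered. The function name `ollama_is_up` is hypothetical, and the base URL is assumed to be the default from the configuration table.

```python
import http.client
import urllib.request


def ollama_is_up(base_url="http://localhost:11434", timeout=2.0):
    """Return True if Ollama's model-list endpoint answers with HTTP 200."""
    url = base_url.rstrip("/") + "/api/tags"
    try:
        with urllib.request.urlopen(url, timeout=timeout) as response:
            return response.status == 200
    except (OSError, http.client.HTTPException):
        # Connection refused, timeout, DNS failure, HTTP error status, or a bad reply.
        return False
```

For example, `if not ollama_is_up(): raise SystemExit("start Ollama first")` could precede the E2E run.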
Code Style:
- The project allows from __future__ import annotations.
- Type hinting is encouraged for all new methods.