Skip to main content

RAG and knowledge base integration for wish

Project description

wish-knowledge

Knowledge base management and RAG (Retrieval-Augmented Generation) for the wish penetration testing platform.

Overview

wish-knowledge provides a comprehensive knowledge management system that integrates security knowledge bases like HackTricks into wish. It enables intelligent context-aware suggestions and technique recommendations during penetration testing.

Key Features

  • Knowledge Base Integration: Built-in support for HackTricks and custom knowledge sources
  • RAG Implementation: Retrieval-Augmented Generation for contextual AI responses
  • Vector Search: Efficient semantic search using embeddings
  • Dynamic Updates: Automatic knowledge base synchronization
  • Multi-source Support: Combine multiple knowledge sources

Installation

# Install dependencies in development environment
uv sync --dev

# Install as package (future release)
pip install wish-knowledge

Quick Start

Basic Usage Example

from wish_knowledge import KnowledgeManager, Retriever
from wish_models import SessionMetadata

# Initialize knowledge manager
knowledge_manager = KnowledgeManager()

# Initialize retriever
retriever = Retriever(knowledge_manager)

# Search for relevant knowledge
query = "SQL injection testing web application"
results = await retriever.search(query, top_k=5)

for result in results:
    print(f"Title: {result.title}")
    print(f"Score: {result.relevance_score}")
    print(f"Content: {result.content[:200]}...")
    print("---")

# Get context-aware suggestions
context = {
    "current_mode": "enum",
    "target_type": "web_application",
    "discovered_services": ["http", "https"]
}
suggestions = await retriever.get_contextual_suggestions(context)

Knowledge Base Management

from wish_knowledge import KnowledgeBase, HackTricksLoader

# Load HackTricks knowledge base
hacktricks = HackTricksLoader()
await hacktricks.load()

# Create custom knowledge base
custom_kb = KnowledgeBase(name="Custom Techniques")
custom_kb.add_article(
    title="Advanced SQLi Techniques",
    content="...",
    tags=["sql", "injection", "web"],
    category="exploitation"
)

# Combine knowledge bases
knowledge_manager.add_knowledge_base(hacktricks)
knowledge_manager.add_knowledge_base(custom_kb)

Architecture

Core Components

KnowledgeManager

Central manager for all knowledge bases and indexing.

from wish_knowledge import KnowledgeManager

manager = KnowledgeManager()

# Add knowledge bases
manager.add_knowledge_base(hacktricks_kb)
manager.add_knowledge_base(custom_kb)

# Update indices
await manager.update_indices()

# Get statistics
stats = manager.get_statistics()
print(f"Total articles: {stats.total_articles}")
print(f"Knowledge bases: {stats.knowledge_bases}")

Retriever

Intelligent retrieval system with RAG capabilities.

from wish_knowledge import Retriever

retriever = Retriever(manager)

# Basic search
results = await retriever.search("privilege escalation linux")

# Filtered search
results = await retriever.search(
    query="web vulnerabilities",
    filters={"category": "web", "tags": ["owasp"]}
)

# Similarity search with embeddings
similar = await retriever.find_similar(article_id="123")

Embedding System

Vector embeddings for semantic search.

from wish_knowledge.embeddings import EmbeddingGenerator

embedder = EmbeddingGenerator(model="text-embedding-ada-002")

# Generate embeddings
text = "SQL injection vulnerability in login form"
embedding = await embedder.generate(text)

# Batch processing
texts = ["XSS attack", "CSRF protection", "SQLi prevention"]
embeddings = await embedder.generate_batch(texts)

Knowledge Sources

Built-in Sources

HackTricks

  • Comprehensive pentesting methodology
  • Attack techniques and tools
  • Platform-specific guides
  • Cloud security techniques

OWASP Knowledge

  • Web application security
  • Top 10 vulnerabilities
  • Testing guides
  • Cheat sheets

Custom Sources

  • Internal playbooks
  • Team-specific techniques
  • Client-specific knowledge
  • Historical findings

Adding Custom Knowledge

from wish_knowledge import KnowledgeBase, Article

# Create knowledge base
kb = KnowledgeBase(name="Team Playbook")

# Add articles programmatically
article = Article(
    title="Internal Network Pentesting Guide",
    content="""
    ## Overview
    This guide covers our standard approach...
    
    ## Methodology
    1. Network discovery
    2. Service enumeration
    ...
    """,
    tags=["internal", "network", "methodology"],
    category="methodology"
)
kb.add_article(article)

# Load from files
kb.load_from_markdown("playbooks/")

# Load from URL
kb.load_from_url("https://example.com/techniques.json")

Search and Retrieval

Search Options

# Full-text search
results = await retriever.search(
    query="buffer overflow",
    search_type="full_text"
)

# Semantic search
results = await retriever.search(
    query="how to exploit weak authentication",
    search_type="semantic"
)

# Hybrid search (combines full-text and semantic)
results = await retriever.search(
    query="windows privilege escalation",
    search_type="hybrid",
    weights={"full_text": 0.3, "semantic": 0.7}
)

Filtering and Ranking

# Advanced filtering
results = await retriever.search(
    query="exploitation",
    filters={
        "category": ["web", "network"],
        "tags": {"include": ["owasp"], "exclude": ["outdated"]},
        "date_range": {"after": "2023-01-01"}
    },
    sort_by="relevance",
    top_k=10
)

# Custom ranking
def custom_ranker(results):
    # Boost results based on custom criteria
    for result in results:
        if "critical" in result.tags:
            result.score *= 1.5
    return sorted(results, key=lambda x: x.score, reverse=True)

results = await retriever.search(query, ranker=custom_ranker)

Integration with wish-ai

The knowledge system integrates seamlessly with wish-ai for enhanced AI responses:

from wish_ai import ContextBuilder
from wish_knowledge import Retriever

# Configure context builder with retriever
retriever = Retriever()
context_builder = ContextBuilder(retriever=retriever)

# Build context with relevant knowledge
context = await context_builder.build_context(
    user_input="How do I test for SQL injection?",
    engagement_state=engagement_state
)

# Context now includes relevant HackTricks articles

Development Guide

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=src --cov-report=html

# Run specific test file
uv run pytest tests/test_retriever.py

# Run integration tests
uv run pytest tests/integration/

Project Structure

packages/wish-knowledge/
├── src/wish_knowledge/
│   ├── __init__.py           # Package exports
│   ├── manager.py            # Knowledge manager
│   ├── retriever.py          # Retrieval system
│   ├── knowledge_base.py     # Base knowledge class
│   ├── loaders/              # Knowledge loaders
│   │   ├── __init__.py
│   │   ├── hacktricks.py
│   │   ├── markdown.py
│   │   └── json.py
│   ├── embeddings/           # Embedding system
│   │   ├── __init__.py
│   │   ├── generator.py
│   │   └── models.py
│   └── search/               # Search implementations
│       ├── __init__.py
│       ├── full_text.py
│       └── semantic.py
├── tests/
│   ├── test_manager.py
│   ├── test_retriever.py
│   └── fixtures/
└── README.md

Configuration

Knowledge Base Settings

Configure in ~/.wish/config.toml:

[knowledge]
# Default search settings
default_top_k = 5
default_search_type = "hybrid"

# Embedding settings
[knowledge.embeddings]
model = "text-embedding-ada-002"
cache_embeddings = true
batch_size = 100

# HackTricks settings
[knowledge.hacktricks]
auto_update = true
update_interval = 86400  # Daily updates
local_path = "~/.wish/knowledge/hacktricks"

# Custom knowledge bases
[knowledge.custom]
paths = [
    "~/pentest-playbooks",
    "/opt/wish/knowledge"
]

Performance Optimization

[knowledge.performance]
# Index settings
index_type = "faiss"  # or "annoy", "simple"
index_threads = 4

# Cache settings
cache_size_mb = 512
cache_ttl_seconds = 3600

# Search optimization
max_results_per_source = 100
parallel_search = true

API Reference

KnowledgeManager

  • add_knowledge_base(kb): Add a knowledge base
  • remove_knowledge_base(name): Remove a knowledge base
  • update_indices(): Update search indices
  • get_statistics(): Get knowledge base statistics
  • clear_cache(): Clear search cache

Retriever

  • search(query, **kwargs): Search across knowledge bases
  • get_contextual_suggestions(context): Get context-aware suggestions
  • find_similar(article_id): Find similar articles
  • get_article(article_id): Get specific article

KnowledgeBase

  • add_article(article): Add an article
  • remove_article(article_id): Remove an article
  • update_article(article_id, updates): Update an article
  • load_from_markdown(path): Load from markdown files
  • export_to_json(): Export knowledge base

License

This project is published under [appropriate license].

Related Packages

  • wish-models: Core data models and validation
  • wish-core: State management and event processing
  • wish-ai: AI-driven inference engine
  • wish-tools: Pentest tool integration
  • wish-c2: C2 server integration
  • wish-cli: Command line interface

Support

If you have issues or questions, you can get support through:

  • Issues: Bug reports and feature requests
  • Discussions: General questions and discussions
  • Documentation: Knowledge base integration guides

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

wish_knowledge-0.7.0.tar.gz (18.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

wish_knowledge-0.7.0-py3-none-any.whl (18.9 kB view details)

Uploaded Python 3

File details

Details for the file wish_knowledge-0.7.0.tar.gz.

File metadata

  • Download URL: wish_knowledge-0.7.0.tar.gz
  • Upload date:
  • Size: 18.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for wish_knowledge-0.7.0.tar.gz
Algorithm Hash digest
SHA256 abd3cfd9501db01b4f7f49e25966d37665ab16eee62f627bc1636b18a725b3c8
MD5 bd3a0126f00835d3569dfb405dcf5c4d
BLAKE2b-256 52d8c56a8f5ffcd6121c9abd5c774b799199f533cc420fcb2b2fb2052e751abc

See more details on using hashes here.

File details

Details for the file wish_knowledge-0.7.0-py3-none-any.whl.

File metadata

  • Download URL: wish_knowledge-0.7.0-py3-none-any.whl
  • Upload date:
  • Size: 18.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.11

File hashes

Hashes for wish_knowledge-0.7.0-py3-none-any.whl
Algorithm Hash digest
SHA256 29b61bbfdeda93a4ba5fc83b82afafe92ba0b521ba6ecb9ff22cfb1f51074768
MD5 d82ff9c91872c44f9fb6372fe0cf411e
BLAKE2b-256 da9e759f6bb9539701a903e574c4f4a394bba3b08fd7778cd9187f8232e4de23

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page