Local self-learning RAG system for Wireshark logs

🕵️ Wireshark Local RAG Analyst

A fully local, self-learning Retrieval-Augmented Generation (RAG) pipeline for analyzing Wireshark .pcap captures with local LLMs. It includes built-in support for Hugging Face models, learning from feedback, and REST API access.


🚀 Features

  • 📁 Drop-in Folder Watcher: Automatically processes .pcap logs when dropped into a folder.
  • 🔍 Protocol-Aware Preprocessing: Extracts HTTP, DNS, TCP, and other traffic.
  • 🧠 Self-Learning Engine: Remembers helpful answers and improves responses over time.
  • 📚 RAG Pipeline: Uses local vector database (FAISS) with sentence-transformer embeddings.
  • 🤖 Local LLM Integration: Supports both:
    • 🧱 Ollama-based LLMs (e.g., LLaMA, Mistral, GPT4All)
    • 🤗 Hugging Face models (e.g., Mistral-7B, Zephyr, Falcon)
  • 🌐 API Access (MCP-style): Query the system over HTTP from other clients.
  • 💻 Extensible CLI: Flexible CLI and script interface for querying and development.
  • 📦 PyPI-Ready: Fully packageable and publishable as a pip package.
  • CI/CD Support: GitHub Actions pipelines for testing and PyPI publishing.
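
To make the retrieval step of the RAG pipeline concrete, here is a toy sketch. It is not the project's code: it swaps FAISS and sentence-transformer embeddings for hand-written 3-dimensional vectors and a brute-force cosine-similarity search, so it runs with the standard library alone. The names `cosine`, `top_k`, and `INDEX`, and the sample packet summaries, are hypothetical.

```python
import math

def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0

def top_k(query_vec, index, k=2):
    """Return the k (score, text) pairs most similar to query_vec."""
    scored = [(cosine(query_vec, vec), text) for text, vec in index]
    return sorted(scored, key=lambda pair: pair[0], reverse=True)[:k]

# Hypothetical packet summaries with made-up embeddings (assumption, not real data)
INDEX = [
    ("HTTP GET /missing.html -> 404", [0.9, 0.1, 0.0]),
    ("DNS query example.com -> NXDOMAIN", [0.1, 0.9, 0.0]),
    ("TCP SYN retransmission to 10.0.0.5", [0.0, 0.2, 0.9]),
]

query = [0.8, 0.2, 0.1]  # stands in for the embedding of a user question
results = top_k(query, INDEX, k=2)
for score, text in results:
    print(f"{score:.2f}  {text}")
```

In a real pipeline the retrieved summaries would be passed to the LLM as context for the answer.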

📂 Project Structure

. 
   ├── app/                # Core logic
   ├── scripts/            # CLI scripts
   ├── config/             # Config files
   ├── data/               # Vector DB & learned responses
   ├── logs/               # Input PCAPs
   ├── processed/          # Archived PCAPs
   ├── .github/workflows/  # CI/CD workflows

⚙️ Installation

  1. Clone the repo
git clone https://github.com/pkbythebay29/wireshark-local-rag-analyst.git
cd wireshark-local-rag-analyst
  2. Install dependencies
pip install -r requirements.txt
  3. Configuration

Edit config/config.yaml:

protocol_filter: ["http", "dns"]
learning: true

llm_backend: "ollama"       # or "huggingface"
llm_model: "llama3"         # or "mistralai/Mistral-7B-Instruct-v0.1"

vector_db_path: "./data/faiss.index"
learned_store: "./data/learned_data.jsonl"
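
As an illustration of how `protocol_filter` might drive preprocessing, the sketch below turns the configured protocol list into a tshark command line. How the project actually invokes tshark is not documented here, so treat this as an assumption; `build_tshark_command` and the capture path are hypothetical. The flags used (`-r` to read a file, `-Y` for a display filter, `-T json` for JSON output) are standard tshark options.

```python
def build_tshark_command(pcap_path, protocols):
    """Build a tshark argument list that keeps only the given protocols."""
    if not protocols:
        raise ValueError("protocols must not be empty")
    # Wireshark display filters combine protocol names with "||" (logical OR)
    display_filter = " || ".join(protocols)
    return ["tshark", "-r", pcap_path, "-Y", display_filter, "-T", "json"]

# Mirrors the protocol_filter value from config/config.yaml
config = {"protocol_filter": ["http", "dns"]}
cmd = build_tshark_command("logs/capture.pcap", config["protocol_filter"])
print(cmd)
```

On a machine with tshark installed, the list can be passed to `subprocess.run(cmd, capture_output=True, text=True, check=True)`.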
  4. Run the pipeline
wireshark-watch

or

python scripts/run_pipeline.py

Drop .pcap files into the logs/ folder and they will be processed automatically.
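
The drop-in behaviour can be approximated with a single polling pass, sketched below. The project credits Watchdog, which is event-based; this example deliberately uses plain directory polling instead so it needs only the standard library. `process_new_pcaps` and the `handler` callback are hypothetical names, and moving handled files to processed/ is inferred from the project structure.

```python
import shutil
import tempfile
from pathlib import Path

def process_new_pcaps(logs_dir, processed_dir, handler):
    """Run handler on each capture in logs_dir, then archive it."""
    logs_dir, processed_dir = Path(logs_dir), Path(processed_dir)
    processed_dir.mkdir(parents=True, exist_ok=True)
    handled = []
    for pcap in sorted(logs_dir.glob("*.pcap")):
        handler(pcap)  # e.g. parse and index the capture
        shutil.move(str(pcap), str(processed_dir / pcap.name))
        handled.append(pcap.name)
    return handled

# Demo in a throwaway directory with an empty placeholder file
root = Path(tempfile.mkdtemp())
(root / "logs").mkdir()
(root / "logs" / "sample.pcap").write_bytes(b"")
handled = process_new_pcaps(root / "logs", root / "processed", handler=lambda p: None)
print(handled)
```

A long-running watcher would call this function in a loop with a short `time.sleep` between passes.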

  5. Ask questions

wireshark-query

or

python scripts/query_logs.py

Example questions:

  • What HTTP requests failed with 404?
  • Show DNS queries to suspicious domains.
  • Were there any TCP handshakes that failed?
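
When `learning` is enabled, helpful answers are kept in the file named by `learned_store`. The sketch below shows one way such a JSONL store could work. The record schema (`question` and `answer` fields), the function names `remember` and `recall`, and the sample answer are all assumptions; the project's actual format is not documented here.

```python
import json
import tempfile
from pathlib import Path

def remember(store_path, question, answer):
    """Append one helpful question/answer pair as a JSON line."""
    with open(store_path, "a", encoding="utf-8") as fh:
        fh.write(json.dumps({"question": question, "answer": answer}) + "\n")

def recall(store_path, question):
    """Return the latest stored answer for a case-insensitive match, or None."""
    path = Path(store_path)
    if not path.exists():
        return None
    wanted = question.strip().lower()
    found = None
    with open(path, encoding="utf-8") as fh:
        for line in fh:
            if not line.strip():
                continue  # tolerate blank lines
            record = json.loads(line)
            if record["question"].strip().lower() == wanted:
                found = record["answer"]  # later lines win
    return found

# Demo against a throwaway file rather than ./data/learned_data.jsonl
store = Path(tempfile.mkdtemp()) / "learned_data.jsonl"
remember(store, "What HTTP requests failed with 404?", "GET /missing.html returned 404")
print(recall(store, "what http requests failed with 404?"))
print(recall(store, "Unknown question"))
```

Exact matching keeps the example short; a real system would more plausibly match stored questions by embedding similarity.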
  6. Use as a REST API (MCP server)
python -m app.mcp_server
curl -X POST http://localhost:8080/query \
    -H "Content-Type: application/json" \
    -d '{"query": "Show me failed DNS requests"}'
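
The same request can be issued from Python. This sketch builds a POST equivalent to the curl command above using only the standard library; `build_query_request` is a hypothetical helper, and the shape of the server's response is not documented here, so the example stops short of parsing it.

```python
import json
import urllib.request

def build_query_request(question, base_url="http://localhost:8080"):
    """Build the same POST request as the curl command, without sending it."""
    body = json.dumps({"query": question}).encode("utf-8")
    return urllib.request.Request(
        f"{base_url}/query",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )

req = build_query_request("Show me failed DNS requests")
print(req.get_method(), req.full_url)
```

With the server running, send it with `urllib.request.urlopen(req, timeout=30)` and read the body from the returned response object.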
  7. 🤗 Hugging Face Model Support

You can use any Hugging Face model, including your own fork or fine-tune, by setting llm_backend to "huggingface" and llm_model to the model ID in config/config.yaml.

🙏 Acknowledgements

This project stands on the shoulders of open-source giants:

  • Wireshark/tshark – For deep packet inspection
  • FAISS – Vector search from Meta
  • Sentence Transformers – Fast embeddings from Hugging Face
  • Ollama – Local model hosting made easy
  • Transformers – Hugging Face's interface to modern LLMs
  • Watchdog, FastAPI, Uvicorn, and many more

Thank you to all the contributors making open-source incredible.

🗺️ Roadmap

See ROADMAP.md for planned features and timeline.

  8. 📜 License

MIT

Source Distribution

wireshark_rag_analyst-0.1.0.tar.gz (8.2 kB)

Uploaded Source

Built Distribution

wireshark_rag_analyst-0.1.0-py3-none-any.whl (8.6 kB)

Uploaded Python 3

File details

Details for the file wireshark_rag_analyst-0.1.0.tar.gz.

File metadata

  • Download URL: wireshark_rag_analyst-0.1.0.tar.gz
  • Upload date:
  • Size: 8.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.10.16

File hashes

Hashes for wireshark_rag_analyst-0.1.0.tar.gz
Algorithm Hash digest
SHA256 607dba8d902b94840d6c4494676c02ca3bd781c90a620f4376d6575adfd6e872
MD5 14adb5e318faf582c5f680b36a8170d1
BLAKE2b-256 806daca48f7caf9b228adb48207f798deae6ab7abf9151db8aaa6237bb2769f1
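
A downloaded file can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library; `sha256_of` and `matches` are hypothetical helper names, and the demo runs on a throwaway file rather than the real distribution.

```python
import hashlib
import tempfile
from pathlib import Path

def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def matches(path, expected_hex):
    """True if the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected_hex.strip().lower()

# Demo on a throwaway file containing the bytes b"abc"
demo = Path(tempfile.mkdtemp()) / "demo.bin"
demo.write_bytes(b"abc")
print(matches(demo, "ba7816bf8f01cfea414140de5dae2223b00361a396177a9cb410ff61f20015ad"))
```

To check the real archive, pass the path of the downloaded wireshark_rag_analyst-0.1.0.tar.gz together with the SHA256 value from the table.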

File details

Details for the file wireshark_rag_analyst-0.1.0-py3-none-any.whl.

File hashes

Hashes for wireshark_rag_analyst-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 dc20bcdf8b69ef4c4afaaa242542985c07d92a0cc7df797f7bb8b03f6c7b7342
MD5 31cfbe6b625e4eb18f2e562298f19d11
BLAKE2b-256 458d9359de0ad3806f50e1464f8fcd0615a852f991594f91188d13ef18deeb6d
