A comprehensive security scanner and RAG-based vulnerability analyzer

CyberSec Scanner

A comprehensive, modular security scanning toolkit for detecting secrets, vulnerabilities, and misconfigurations in Git repositories, web applications, and browser extensions. It features a multi-scanner architecture, RAG-powered analysis, and both SDK and CLI interfaces.

Use Responsibly: This tool is for authorized security testing only. Always obtain proper permission before scanning applications you don't own.

Features

Multi-Scanner Architecture

  • Git Scanner: Detect secrets in commit history using efficient pickaxe search
  • Web Crawler: Discover exposed endpoints, analyze JavaScript files and source maps
  • Browser Scanner: Inspect localStorage, sessionStorage, cookies via Playwright
  • Network Scanner: Real-time HTTPS traffic inspection with MITM proxy

RAG-Powered Analysis

  • Knowledge Graph: NetworkX-based relationship mapping between findings, files, and vulnerabilities
  • Semantic Search: Vector-based retrieval for similar security patterns
  • LLM Integration: Natural language queries powered by Ollama (Gemma, Llama, etc.)
  • CWE Enrichment: Automatic mapping to Common Weakness Enumeration

Detection Coverage

  • 58+ Built-in Patterns: AWS, OpenAI, Stripe, GitHub, Azure, Google Cloud, databases, and more
  • Entropy Analysis: High-entropy string detection for unknown secrets
  • Custom Patterns: Extensible regex-based pattern system via patterns.env
  • Contextual Severity: Smart severity assignment based on exposure context

Flexible Usage

  • CLI Application: Full-featured command-line interface with 10 commands
  • Python SDK: Use scanners independently or together in your own code
  • YAML Configuration: Simple config files replace long CLI arguments
  • Modular Design: Import only what you need, lazy loading for optional dependencies

Installation

From PyPI (Recommended)

pip install cybersec-scanner

From Source

git clone https://github.com/AnubhavChoudhery/cybersec-scanner.git
cd cybersec-scanner
pip install -e .

Optional Dependencies

# Quotes keep shells such as zsh from expanding the brackets

# For MITM proxy (HTTPS traffic inspection)
pip install "cybersec-scanner[mitm]"

# For browser runtime inspection (Playwright)
pip install "cybersec-scanner[browser]"

# For vector search (RAG features)
pip install "cybersec-scanner[vector]"

# Install everything
pip install "cybersec-scanner[all]"

# For development
pip install "cybersec-scanner[dev]"

Note: The base installation includes Git scanning, web crawling, and RAG analysis. MITM and browser features require additional dependencies.

System Requirements

  • Python 3.11 or higher
  • Git (for git history scanning)
  • mitmproxy 10.0+ (for HTTPS inspection - optional)
  • Playwright (for browser inspection - optional)

Quick Start

Prerequisites for RAG Queries

To use the query command with LLM-powered analysis, you need Ollama:

# Install Ollama (https://ollama.com)
# Linux/Mac:
curl -fsSL https://ollama.com/install.sh | sh
# Windows: Download from https://ollama.com

# Pull the default model
ollama pull gemma3:1b

# Start Ollama (keep running in background)
ollama serve

Complete Scan-to-Query Workflow

# 1. Install the scanner
pip install cybersec-scanner

# 2. Download patterns file (required for detection)
curl -o patterns.env https://raw.githubusercontent.com/AnubhavChoudhery/cybersec-scanner/main/patterns.env

# 3. Scan your project (Git history)
cybersec-scanner scan --git --root . --output audit_report.json --enable-rag

# 4. Query the findings
cybersec-scanner query "What secrets were found?" --audit audit_report.json

# 5. Save response to file
cybersec-scanner query "Summarize critical findings" --output summary.txt

CLI Usage

# Show all available commands
cybersec-scanner --help

# Initialize configuration file
cybersec-scanner init-config

# Scan a Git repository
cybersec-scanner scan-git /path/to/repo --max-commits 50

# Scan a web application
cybersec-scanner scan-web http://localhost:8000 --max-pages 100

# Scan MITM traffic logs
cybersec-scanner scan-mitm mitm_traffic.ndjson

# Full multi-scanner workflow
cybersec-scanner scan \
  --git \
  --web \
  --mitm \
  --runtime \
  --root . \
  --target http://localhost:8000 \
  --max-commits 50 \
  --mitm-traffic mitm_traffic.ndjson \
  --output audit_report.json \
  --enable-rag

# Query findings with RAG
cybersec-scanner query "What API keys were found?" --audit audit_report.json

# Build knowledge graph from existing report
cybersec-scanner build-graph audit_report.json

# Check version
cybersec-scanner version

MITM Proxy Workflow

The scanner provides interactive MITM inspection that captures real HTTP/HTTPS traffic. The traffic file is shared automatically between the scanner and your backend through a temp directory.

1. Add MITM injection to your backend (one-time setup):

# backend/app/main.py - MUST BE FIRST IMPORT
from cybersec_scanner.scanners.inject_mitm_proxy import inject_mitm_proxy_advanced

# No path needed - uses shared temp location automatically
inject_mitm_proxy_advanced()

# Now import your framework
from fastapi import FastAPI
# ... rest of your code

2. Run the scanner with MITM enabled:

# No --mitm-traffic flag needed - auto-discovers shared file
cybersec-scanner scan --mitm --output audit_report.json

3. Start your backend when prompted:

The scanner will start the MITM proxy and wait for you to start your backend and exercise the app.

# In another terminal
uvicorn backend.app.main:app --reload

4. Exercise your application: make requests and test your endpoints

5. Press Ctrl+C in the scanner terminal when done

The scanner will parse all captured traffic and generate the audit report.

Traffic File Location:

  • Windows: C:\Users\<user>\AppData\Local\Temp\cybersec_scanner\mitm_traffic.ndjson
  • Linux/Mac: /tmp/cybersec_scanner/mitm_traffic.ndjson

Advanced: You can override with --mitm-traffic /custom/path.ndjson if needed.
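The default locations above are consistent with Python's tempfile.gettempdir() plus a cybersec_scanner subfolder. The sketch below assumes that is how the path is resolved (the package docs do not confirm it) and shows how an explicit --mitm-traffic value would take precedence. The helper names are hypothetical. One caveat if the assumption holds: on macOS, gettempdir() usually returns a per-user folder under /var/folders/... rather than /tmp, so check the scanner's startup output if the file is not where you expect.

```python
import tempfile
from pathlib import Path
from typing import Optional


def default_traffic_path() -> Path:
    """<system temp dir>/cybersec_scanner/mitm_traffic.ndjson (assumed layout)."""
    return Path(tempfile.gettempdir()) / "cybersec_scanner" / "mitm_traffic.ndjson"


def resolve_traffic_path(override: Optional[str] = None) -> Path:
    """Prefer an explicit --mitm-traffic value; otherwise use the shared default."""
    if override:
        return Path(override).expanduser()
    return default_traffic_path()


if __name__ == "__main__":
    print(resolve_traffic_path())                       # shared default location
    print(resolve_traffic_path("/custom/path.ndjson"))  # explicit override
```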

Python SDK Usage

from cybersec_scanner import scan_git, scan_web, scan_all

# Scan a Git repository
findings = scan_git("/path/to/repo", max_commits=100)
print(f"Found {len(findings)} secrets in Git history")

# Scan a web application
web_findings = scan_web("http://localhost:8000", max_pages=300)

# Full scan with custom config
config = {
    "git": {
        "enabled": True,
        "repositories": ["/path/to/repo"],
        "max_commits": 100
    },
    "web": {
        "enabled": True,
        "target": "http://localhost:8000",
        "max_pages": 300
    },
    "output": {
        "file": "security_report.json"
    }
}

results = scan_all(config)

CLI Command Reference

Available Commands

| Command | Description |
|---|---|
| scan | Run comprehensive scan with multiple scanners |
| scan-git | Scan Git repository for committed secrets |
| scan-web | Scan web application endpoints |
| scan-mitm | Parse MITM traffic logs |
| query | Query findings using RAG/LLM |
| build-graph | Build knowledge graph from audit report |
| init-config | Create default YAML configuration |
| version | Show version information |
| install-cert | Install mitmproxy CA certificate |
| start-proxy | Start MITM proxy daemon |

Scan Command Options

cybersec-scanner scan [OPTIONS]

Scanner Flags:

  • --git - Enable Git history scanner
  • --web - Enable web application scanner
  • --mitm - Enable MITM traffic analysis
  • --runtime - Enable browser runtime inspector (Playwright)

Scanner Configuration:

  • --root PATH - Root directory for Git scan (default: .)
  • --target URL - Target URL for web scan
  • --max-commits N - Maximum Git commits to scan (default: 50)
  • --mitm-traffic PATH - Path to MITM traffic NDJSON file

Output Options:

  • --output PATH, -o PATH - Output audit report file (default: audit_report.json)
  • --config PATH, -c PATH - Load settings from YAML config file
  • --enable-rag - Build knowledge graph after scan for RAG queries

Example - Full Scan:

cybersec-scanner scan \
  --git \
  --web \
  --mitm \
  --root ~/myproject \
  --target http://localhost:8000 \
  --max-commits 100 \
  --mitm-traffic mitm_traffic.ndjson \
  --output security_audit.json \
  --enable-rag

Query Command

cybersec-scanner query "your question" [OPTIONS]

Options:

  • --audit PATH - Audit report to build graph from (if graph doesn't exist)
  • --graph PATH - Existing knowledge graph file (default: rag/graph.gpickle)
  • --model NAME - Ollama model to use (default: gemma3:1b)
  • --top-k N - Number of findings to retrieve (default: 5)
  • --output PATH, -o PATH - Save LLM response to file

Example:

# Query with existing graph
cybersec-scanner query "What AWS credentials were found?"

# Build graph and query
cybersec-scanner query "List all high severity findings" --audit audit_report.json

# Use different model
cybersec-scanner query "Explain the security risks" --model llama3:8b

# Save response to file
cybersec-scanner query "Summarize critical findings" --output security_summary.txt

Individual Scanner Commands

Git Scanner:

cybersec-scanner scan-git [REPO_PATH] [OPTIONS]

# Options:
#   --max-commits N     Max commits to scan (default: 50)
#   --output PATH       Output JSON file

# Example:
cybersec-scanner scan-git . --max-commits 100 --output git_findings.json

Web Scanner:

cybersec-scanner scan-web URL [OPTIONS]

# Options:
#   --max-pages N       Max pages to crawl (default: 50)
#   --output PATH       Output JSON file

# Example:
cybersec-scanner scan-web http://localhost:3000 --max-pages 200

MITM Scanner:

cybersec-scanner scan-mitm TRAFFIC_FILE [OPTIONS]

# Options:
#   --output PATH       Output JSON file

# Example:
cybersec-scanner scan-mitm mitm_traffic.ndjson --output mitm_findings.json

Configuration File

Create a YAML config to avoid long command lines:

cybersec-scanner init-config --output my-config.yaml

Example my-config.yaml:

scanner:
  git:
    enabled: true
    root: "."
    max_commits: 100
  
  web:
    enabled: true
    target: "http://localhost:8000"
    max_pages: 300
  
  mitm:
    enabled: true
    traffic_file: "mitm_traffic.ndjson"
  
  runtime:
    enabled: false

output:
  file: "audit_report.json"

rag:
  enabled: true
  model: "gemma3:1b"

Usage:

cybersec-scanner scan --config my-config.yaml

Utility Commands

Build Knowledge Graph:

cybersec-scanner build-graph AUDIT_FILE [OPTIONS]

# Options:
#   --output PATH, -o    Output graph file (default: rag/graph.gpickle)

# Example:
cybersec-scanner build-graph audit_report.json --output my_graph.gpickle

Initialize Config File:

cybersec-scanner init-config [OPTIONS]

# Options:
#   --output PATH, -o    Output config file path (default: cybersec-config.yaml)

# Example:
cybersec-scanner init-config --output my-config.yaml

Show Version:

cybersec-scanner version

Install MITM Certificate:

cybersec-scanner install-cert [OPTIONS]

# Options:
#   --port PORT          MITM proxy port (informational, default: 8082)
#   --no-download        Skip HTTP download, use local cert only

# Example:
cybersec-scanner install-cert --port 8082

Start MITM Proxy:

cybersec-scanner start-proxy [OPTIONS]

# Options:
#   --port PORT          Proxy listen port (default: 8082)
#   --traffic-file PATH  Traffic log file path (default: temp dir auto-shared)

# Example:
cybersec-scanner start-proxy --port 9000 --traffic-file ./my_traffic.ndjson

📋 Required Files

IMPORTANT: Before running scans, you need the following file in your working directory (the directory you run the scanner from):

1. patterns.env (REQUIRED)

This file contains regex patterns for detecting secrets. Copy it from the repository root:

# If you installed from source
cp patterns.env /your/project/directory/

# If you installed from PyPI, download from GitHub
curl -o patterns.env https://raw.githubusercontent.com/AnubhavChoudhery/cybersec-scanner/main/patterns.env

The file includes 58+ detection patterns for major providers:

AWS_ACCESS_KEY_ID=AKIA[0-9A-Z]{16}
OPENAI_API_KEY=sk-[a-zA-Z0-9]{20,}
STRIPE_SECRET_KEY=sk_live_[0-9a-zA-Z]{24,}
GITHUB_TOKEN=ghp_[0-9a-zA-Z]{36}
# ... and 54 more patterns

Security Note: This file is excluded from git by default to avoid triggering security scanners. Never commit actual secrets to this file!

Project Structure

Chrome_Ext/
├── local_check.py              # Main orchestrator
├── config.py                   # Configuration and patterns
├── utils.py                    # Utility functions
├── patterns.env                # Secret detection patterns (user-configured)
├── inject_mitm_proxy.py        # MITM proxy injection module
├── install_mitm_cert.py        # Certificate installation helper
├── scanners/
│   ├── git_scanner.py         # Git history analysis
│   ├── web_crawler.py         # HTTP endpoint scanning
│   ├── browser_scanner.py     # Playwright runtime inspection
│   └── network_scanner.py     # MITM proxy traffic analysis
└── audit_report.json          # Output report (generated)

Installation

System Requirements

  • Python 3.8 or higher (the PyPI package requires Python 3.11 or higher)
  • Git (for git history scanning)
  • mitmproxy 10.0+ (for HTTPS inspection)
  • Modern web browser (for Playwright scanner)

Required Dependencies

pip install -r requirements.txt

If requirements.txt is not available, install manually:

pip install requests colorama

Optional Dependencies

For HTTPS Traffic Inspection

# Install mitmproxy
pip install mitmproxy

# Verify installation
mitmdump --version

For Browser Runtime Inspection

pip install playwright
python -m playwright install

For Network Packet Capture (Advanced)

pip install scapy

# Windows: Install Npcap from https://npcap.com/
# Linux/Mac: May require libpcap

Quick Start

Initial Setup

  1. Clone or download the repository

  2. Set up pattern file (REQUIRED before first run)

# Copy the patterns file template
cp patterns.env.example patterns.env

# The file includes 58+ detection patterns for major providers
# Edit patterns.env to customize or add patterns (optional)
  3. Verify setup
python -c "from config import KNOWN_PATTERNS; print(f'Loaded {len(KNOWN_PATTERNS)} patterns')"

Expected output: Loaded 58 patterns (or similar)

Basic Usage

# Scan with default settings
python local_check.py --target http://localhost:8000 --root /path/to/project

# View the generated audit report
cat audit_report.json

MITM Proxy Setup

The MITM (Man-in-the-Middle) proxy feature allows inspection of HTTPS traffic in real-time, including request/response headers and bodies.

Prerequisites

  1. Install mitmproxy
pip install mitmproxy

# Verify installation
mitmdump --version
  2. Copy required files to your backend
# From the Chrome_Ext directory
cp inject_mitm_proxy.py /path/to/your/backend/app/
cp patterns.env /path/to/your/backend/app/

Backend Integration

CRITICAL: Add the import statement as the VERY FIRST LINE of your main application file. This is not optional - the import MUST come before any other imports (Flask, FastAPI, Django, etc.) for the MITM proxy to properly intercept HTTP libraries.

The inject_mitm_proxy module automatically:

  1. Starts a proxy server on port 8082 (configurable via MITM_PROXY_PORT)
  2. Patches HTTP libraries (requests, httpx, urllib, urllib3, aiohttp) to route through proxy
  3. Inspects all outbound HTTP/HTTPS traffic for security issues
  4. Logs traffic to mitm_traffic.ndjson in the same directory
  5. Bypasses specific domains (AWS, OAuth, AI providers) to prevent authentication issues

For FastAPI:

# backend/app/main.py
import inject_mitm_proxy  # MUST BE FIRST IMPORT (before FastAPI, before everything!)

from fastapi import FastAPI  # This comes AFTER inject_mitm_proxy
from fastapi.middleware.cors import CORSMiddleware
# ... rest of your imports

app = FastAPI()
# ... rest of your code

For Flask:

# backend/app.py
import inject_mitm_proxy  # MUST BE FIRST IMPORT (before Flask, before everything!)

from flask import Flask  # This comes AFTER inject_mitm_proxy
from flask_cors import CORS
# ... rest of your imports

app = Flask(__name__)
# ... rest of your code

For Django:

# backend/manage.py or wsgi.py
import inject_mitm_proxy  # MUST BE FIRST IMPORT (before Django, before everything!)

import os  # This comes AFTER inject_mitm_proxy
from django.core.wsgi import get_wsgi_application
# ... rest of Django setup

Why FIRST import matters: The module patches HTTP libraries at import time. If Flask/FastAPI/Django import first, their HTTP clients won't be patched, and traffic won't be intercepted.
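The effect can be reproduced with plain Python. In the sketch below, fake_http_lib is a hypothetical stand-in module, not one of the real HTTP libraries: a from-import taken before the patch keeps a reference to the original function, while one taken after the patch sees the replacement. This illustrates Python's name binding only; it is not the actual patching code in inject_mitm_proxy.

```python
import sys
import types

# Hypothetical stand-in for an HTTP library such as requests or httpx.
fake_http_lib = types.ModuleType("fake_http_lib")
fake_http_lib.send = lambda url: f"direct:{url}"
sys.modules["fake_http_lib"] = fake_http_lib

# 1. Framework code imported BEFORE the patch binds the original function object.
from fake_http_lib import send as early_send


# 2. The injector replaces the module attribute (an import-time patch).
def proxied_send(url):
    return f"proxied:{url}"


fake_http_lib.send = proxied_send

# 3. Code imported AFTER the patch picks up the proxied function.
from fake_http_lib import send as late_send

print(early_send("https://api.example.com"))  # still goes direct
print(late_send("https://api.example.com"))   # goes through the "proxy"
```

Patches applied to class attributes (for example a session class's request method) are looked up at call time and therefore also reach earlier imports; the strict ordering matters most for functions and objects bound by name at import time. Importing the injector first covers both cases.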

Running with MITM Proxy

  1. Start your backend application
# No environment variables needed - proxy is always enabled
# Just start your backend normally
uvicorn app.main:app --reload  # FastAPI example

You should see:

[MITM] Proxy active on http://127.0.0.1:8082
[MITM] Bypass mode: AWS, OAuth, AI providers, payments, CDNs
[MITM] Patched libraries: requests, httpx, urllib, urllib3, aiohttp
  2. Run the security scanner
# In a new terminal, run the scanner with MITM enabled
python local_check.py \
  --target http://localhost:8000 \
  --enable-mitm \
  --mitm-port 8082
  3. Interact with your application (make HTTP requests, use API endpoints, etc.)

  4. Stop the scanner (Ctrl+C) to generate the audit report

  5. Review results

# View audit report
cat audit_report.json

# View traffic log (raw NDJSON)
cat mitm_traffic.ndjson

MITM Proxy Detection Capabilities

The MITM proxy inspects both requests and responses for security issues:

Request-Side Detection:

  • Credentials embedded in URLs (user:pass@domain)
  • API keys in query parameters (?api_key=xxx)
  • Basic Authentication headers (base64 credentials)
  • API keys in Authorization headers (with context awareness)
  • Plaintext passwords in request bodies (excludes bcrypt/argon2 hashes)
  • Secrets matching any of the 58+ patterns

Response-Side Detection:

  • Secrets leaked in response headers
  • API keys in response bodies (JSON, HTML, JavaScript)
  • Credentials in error messages
  • Database connection strings in stack traces
  • Debug information containing sensitive data

Severity Levels:

  • CRITICAL: API keys in URLs, credentials over HTTP, plaintext passwords
  • HIGH: API keys in headers over HTTPS (with expected auth disclaimer)
  • INFO: Normal traffic logging (not a security issue)
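Several of the request-side checks listed above map directly onto the Python standard library. The following is a hypothetical sketch, not the code in inject_mitm_proxy.py, covering three of them: credentials in the URL, API keys in query parameters, and Basic Authentication headers. The SUSPECT_PARAMS names and the HTTP-versus-HTTPS severity split for Basic auth are assumptions modeled on the severity levels above.

```python
import base64
import binascii
from typing import Dict, List
from urllib.parse import parse_qsl, urlsplit

# Hypothetical query-parameter names treated as API keys.
SUSPECT_PARAMS = {"api_key", "apikey", "key", "token", "access_token"}


def inspect_request(url: str, headers: Dict[str, str]) -> List[dict]:
    """Return findings for one outbound request (sketch of request-side checks)."""
    findings: List[dict] = []
    parts = urlsplit(url)

    # Credentials embedded in the URL, e.g. https://user:pass@host/
    if parts.username or parts.password:
        findings.append({"type": "credentials_in_url", "severity": "CRITICAL"})

    # API keys in the query string, e.g. ?api_key=xxx
    for name, _value in parse_qsl(parts.query):
        if name.lower() in SUSPECT_PARAMS:
            findings.append({"type": "api_key_in_query", "severity": "CRITICAL", "param": name})

    # Basic auth carries base64("user:password") after the "Basic " prefix.
    # This plain dict lookup is case-sensitive; real header maps usually are not.
    auth = headers.get("Authorization", "")
    if auth[:6].lower() == "basic ":
        try:
            decoded = base64.b64decode(auth[6:], validate=True).decode("utf-8")
        except (binascii.Error, UnicodeDecodeError):
            decoded = ""
        if ":" in decoded:
            severity = "CRITICAL" if parts.scheme == "http" else "HIGH"
            findings.append({"type": "basic_auth_header", "severity": severity})
    return findings
```

For example, inspect_request("https://user:pw@example.com/data?api_key=abc", {}) reports both a credentials_in_url and an api_key_in_query finding.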

MITM Proxy Configuration

The inject_mitm_proxy.py module works automatically when imported. The only optional configuration is:

# Set custom MITM proxy port (default: 8082)
export MITM_PROXY_PORT=9000

No other environment variables needed - the proxy runs in full mode by default with intelligent domain bypass.
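The startup log shown earlier reports the proxy at http://127.0.0.1:8082, and the scanner's --mitm-port must match the backend's port (see Troubleshooting). A minimal sketch of reading MITM_PROXY_PORT, assuming unset, non-numeric, or out-of-range values fall back to 8082; that fallback behavior is an assumption, not documented behavior, and the function names are hypothetical.

```python
import os
from typing import Mapping, Optional

DEFAULT_MITM_PORT = 8082


def mitm_proxy_port(env: Optional[Mapping[str, str]] = None) -> int:
    """Port from MITM_PROXY_PORT, or 8082 when unset or invalid (assumed fallback)."""
    env = os.environ if env is None else env
    try:
        port = int(env.get("MITM_PROXY_PORT", DEFAULT_MITM_PORT))
    except ValueError:
        return DEFAULT_MITM_PORT
    return port if 0 < port < 65536 else DEFAULT_MITM_PORT


def mitm_proxy_url(env: Optional[Mapping[str, str]] = None) -> str:
    """Proxy URL in the same form as the startup log line."""
    return f"http://127.0.0.1:{mitm_proxy_port(env)}"
```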

Domain Bypass Configuration

By default, the following domains bypass the MITM proxy to prevent authentication and SSL issues:

OAuth Providers:

  • accounts.google.com, oauth2.googleapis.com, login.microsoftonline.com

AI Providers:

  • api.openai.com, openai.com
  • api.anthropic.com, anthropic.com
  • api.groq.com, groq.com
  • api.mistral.ai, mistral.ai
  • api-inference.huggingface.co, huggingface.co
  • api.cohere.ai, replicate.com, together.xyz, anyscale.com, perplexity.ai

AWS Services:

  • All *.amazonaws.com domains
  • API Gateway, Lambda, S3, CloudFront

Payment Providers:

  • stripe.com, paypal.com

CDNs:

  • cloudflare.com, cloudfront.net

Localhost:

  • 127.0.0.1, localhost

To modify bypass rules, edit the BYPASS_DOMAINS and AWS_SUFFIXES sets in inject_mitm_proxy.py.
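A bypass check along these lines can be sketched as an exact host match against BYPASS_DOMAINS plus a suffix match for AWS. The set names come from this README; their contents below are an illustrative subset, and the exact-match-plus-suffix logic is an assumption (listing both api.openai.com and openai.com suggests exact host matching). AWS_SUFFIXES is a tuple here because str.endswith accepts a tuple but not a set.

```python
from urllib.parse import urlsplit

# Illustrative subset of the defaults listed above (hypothetical contents).
BYPASS_DOMAINS = {
    "accounts.google.com", "oauth2.googleapis.com", "login.microsoftonline.com",
    "api.openai.com", "openai.com", "api.anthropic.com", "anthropic.com",
    "stripe.com", "paypal.com", "cloudflare.com", "cloudfront.net",
    "127.0.0.1", "localhost",
}
AWS_SUFFIXES = (".amazonaws.com",)  # covers every *.amazonaws.com host


def should_bypass(url: str) -> bool:
    """True if the request should go direct instead of through the proxy."""
    host = (urlsplit(url).hostname or "").rstrip(".")  # hostname is lowercased
    return host in BYPASS_DOMAINS or host.endswith(AWS_SUFFIXES)
```

Exact matching avoids a classic pitfall of naive substring checks: a host such as api.openai.com.attacker.test is not treated as trusted.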

Uninstalling MITM Proxy

To remove MITM proxy from your backend:

  1. Remove or comment out the import:
# import inject_mitm_proxy  # Disabled
  2. Restart your backend application

The proxy is only active when the module is imported.

Configuration

Pattern File (patterns.env)

The patterns.env file contains regular expressions for detecting secrets. This file is excluded from version control to prevent triggering GitHub security alerts.

Format:

PATTERN_NAME=regex_pattern

Adding custom patterns:

# Edit patterns.env
nano patterns.env

# Add your pattern
MY_CUSTOM_KEY=mykey_[0-9a-f]{32}

# Re-run the scanner to load the new pattern
python local_check.py --target http://localhost:8000
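The PATTERN_NAME=regex_pattern format is simple to load and apply. The sketch below is a hypothetical re-implementation for illustration, not the scanner's own loader; it assumes # comments and blank lines are skipped, that each value is a Python re pattern, and that only the first "=" separates name from regex (so patterns may themselves contain "=").

```python
import re
from typing import Dict, Iterable, List, Tuple


def load_patterns(lines: Iterable[str]) -> Dict[str, re.Pattern]:
    """Parse PATTERN_NAME=regex_pattern lines into compiled regexes."""
    patterns: Dict[str, re.Pattern] = {}
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        name, sep, regex = line.partition("=")  # split on the FIRST "=" only
        if not sep or not name.strip():
            continue  # not a NAME=regex line
        patterns[name.strip()] = re.compile(regex.strip())
    return patterns


def find_secrets(text: str, patterns: Dict[str, re.Pattern]) -> List[Tuple[str, str]]:
    """Return (pattern_name, matched_text) for every match in text."""
    return [(name, m.group(0))
            for name, rx in patterns.items()
            for m in rx.finditer(text)]


if __name__ == "__main__":
    sample = ["# sample patterns", "AWS_ACCESS_KEY_ID=AKIA[0-9A-Z]{16}"]
    print(find_secrets("aws_key = AKIAIOSFODNN7EXAMPLE", load_patterns(sample)))
```

To use the real file, pass an open handle instead of the sample list, e.g. load_patterns(open("patterns.env", encoding="utf-8")).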

Configuration File (config.py)

Entropy Threshold:

ENTROPY_THRESHOLD = 3.5  # Shannon entropy for randomness detection
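Shannon entropy measures how evenly the characters of a string are distributed, H = -sum(p * log2(p)) over each distinct character's frequency p. A minimal sketch of an entropy check against this threshold; whether the scanner compares with >= or > is an assumption, and the function names are hypothetical.

```python
import math
from collections import Counter

ENTROPY_THRESHOLD = 3.5  # same value as config.py above


def shannon_entropy(s: str) -> float:
    """Shannon entropy in bits per character: H = -sum(p * log2(p))."""
    if not s:
        return 0.0
    n = len(s)
    return sum(-(c / n) * math.log2(c / n) for c in Counter(s).values())


def looks_random(s: str, threshold: float = ENTROPY_THRESHOLD) -> bool:
    """Flag strings at or above the threshold (>= vs > is an assumption)."""
    return shannon_entropy(s) >= threshold


if __name__ == "__main__":
    for candidate in ("password", "abcdefghijklmnop"):
        print(candidate, round(shannon_entropy(candidate), 2), looks_random(candidate))
```

Because a string of length n can reach at most log2(n) bits per character, a threshold of 3.5 can only fire on strings of 12 or more characters (log2(11) is about 3.46, log2(12) about 3.58). Raising it to 4.0, as suggested under Troubleshooting, requires roughly 16 or more characters.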

File Exclusions:

EXCLUDE_SUFFIXES = {
    '.png', '.jpg', '.jpeg', '.gif', '.bmp', '.ico',
    '.zip', '.tar', '.gz', '.pdf', '.exe', '.dll'
}

Probe Paths (for web crawler):

PROBE_PATHS = [
    '/.env', '/.env.local', '/.env.production',
    '/.git/config', '/.git/HEAD',
    '/config.php.bak', '/backup.sql'
]

Usage

Command-Line Options

python local_check.py [OPTIONS]

Core Options:

| Option | Type | Default | Description |
|---|---|---|---|
| --target, -t | URL | http://localhost:8000 | Target application URL |
| --root, -r | Path | . | Repository root for static analysis |
| --out, -o | Path | audit_report.json | Output report filename |

Scanner Options:

| Option | Type | Default | Description |
|---|---|---|---|
| --depth | Integer | 300 | Maximum pages to crawl |
| --enable-playwright | Flag | False | Enable browser runtime inspection |
| --enable-pcap | Flag | False | Enable packet capture (requires root) |
| --pcap-timeout | Integer | 12 | Packet capture duration (seconds) |

MITM Proxy Options:

| Option | Type | Default | Description |
|---|---|---|---|
| --enable-mitm | Flag | False | Enable MITM proxy for HTTPS inspection |
| --mitm-port | Integer | 8082 | MITM proxy port |
| --mitm-duration | Integer | 0 | Auto-stop after N seconds (0 = manual) |
| --mitm-traffic | Path | Auto-detect | Custom path to traffic NDJSON file |

Usage Examples

Basic scan:

python local_check.py --target http://localhost:8000 --root /path/to/project

Full scan with all features:

python local_check.py \
  --target http://localhost:3000 \
  --root ~/myapp \
  --enable-playwright \
  --enable-mitm \
  --depth 500 \
  --out security_report.json

MITM-only scan (skip static/git):

python local_check.py \
  --target http://localhost:8000 \
  --enable-mitm \
  --mitm-duration 30

Custom traffic log location:

python local_check.py \
  --target http://localhost:8000 \
  --enable-mitm \
  --mitm-traffic /custom/path/to/traffic.ndjson

Scanner Modules

1. Git Scanner (scanners/git_scanner.py)

Analyzes git commit history for leaked secrets using efficient pickaxe search.

Features:

  • Searches git history for known secret patterns
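  • Uses git log -S<term> (pickaxe search), so Git itself returns only commits that change the number of occurrences of the term; this is far faster (the authors cite about 100x) than diffing every commit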
  • Examines up to 100 commits by default (configurable)
  • Scans added lines in diffs for pattern matches

Configuration:

scan_git_history(root, max_commits=100)
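The pickaxe approach can be sketched in a few lines: git log -S<term> lists only commits that change the number of occurrences of a literal string, and -p includes their patches, so only the added lines of those patches need regex matching. The helper names below are hypothetical, and using a literal prefix such as AKIA as the pickaxe term (then applying the full regex) is an assumption about how patterns become search terms.

```python
import subprocess
from typing import List


def pickaxe_command(term: str, max_commits: int = 100) -> List[str]:
    """Build a git log pickaxe query limited to max_commits matching commits."""
    return ["git", "log", f"-S{term}", f"--max-count={max_commits}",
            "-p", "--pretty=format:commit %H"]


def added_lines(patch: str) -> List[str]:
    """Lines added in a unified diff, excluding '+++ b/file' headers."""
    return [line[1:] for line in patch.splitlines()
            if line.startswith("+") and not line.startswith("+++ ")]


def pickaxe_hits(repo: str, term: str, max_commits: int = 100) -> List[str]:
    """Run the pickaxe search in repo and keep added lines containing term."""
    result = subprocess.run(pickaxe_command(term, max_commits), cwd=repo,
                            capture_output=True, text=True, check=True)
    return [line for line in added_lines(result.stdout) if term in line]
```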

2. Web Crawler (scanners/web_crawler.py)

Crawls web application endpoints to discover exposed sensitive paths and analyze client-side code.

Features:

  • Discovers exposed .env, .git/config, backup files
  • Analyzes JavaScript files for hardcoded secrets
  • Extracts and scans source maps
  • Checks HTTP headers and cookies for leaked secrets
  • Detects catch-all responses (false positives)
  • Multi-threaded crawling with process pool for regex scanning

Configuration:

from scanners import LocalCrawler

crawler = LocalCrawler(
    base="http://localhost:8000",
    timeout=6,
    max_pages=300,
    workers=8,
    max_js_size=500_000  # Skip large JS bundles
)
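Catch-all detection matters because many single-page-app dev servers answer every path, including /.env, with the same index.html and a 200 status. One common approach, sketched below as an assumption rather than a description of LocalCrawler's internals, is to request a random path that cannot exist and treat any probe whose status and body match that canary response as a false positive. Fetching is left out so the logic stays testable; any HTTP client (for example requests) could supply the (status, body) pairs. All names are hypothetical.

```python
import hashlib
import secrets
from typing import Dict, Iterable, List, Tuple
from urllib.parse import urljoin

Response = Tuple[int, bytes]  # (HTTP status, body)


def probe_urls(base: str, paths: Iterable[str]) -> List[str]:
    """Absolute probe URLs, e.g. for the PROBE_PATHS list in config.py."""
    return [urljoin(base, p) for p in paths]


def canary_path() -> str:
    """A random path that should not exist on the target."""
    return "/" + secrets.token_hex(16)


def _fingerprint(resp: Response) -> str:
    status, body = resp
    return f"{status}:{hashlib.sha256(body).hexdigest()}"


def real_exposures(responses: Dict[str, Response], canary: Response) -> List[str]:
    """Probe URLs that returned 200 with content different from the canary response."""
    canary_fp = _fingerprint(canary)
    return [url for url, resp in responses.items()
            if resp[0] == 200 and _fingerprint(resp) != canary_fp]
```

Exact hashing misses catch-all pages that embed per-request values such as CSRF tokens; a similarity ratio (difflib.SequenceMatcher) is a more tolerant alternative.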

3. Browser Scanner (scanners/browser_scanner.py)

Uses Playwright to inspect browser runtime state and client-side storage.

Features:

  • Extracts localStorage contents
  • Extracts sessionStorage contents
  • Retrieves all cookies
  • Checks global variables (window.__ENV, window.config, window.API_KEY)

Requirements:

pip install playwright
python -m playwright install

Usage:

from scanners import playwright_inspect

browser_data = playwright_inspect("http://localhost:8000")

4. Network Scanner (scanners/network_scanner.py)

Runs a mitmproxy addon that inspects proxied HTTP/HTTPS traffic (Layer 2 of the MITM inspection; inject_mitm_proxy.py is Layer 1).

Features:

  • Intercepts HTTP/HTTPS traffic at the proxy level
  • Pattern matching on request/response bodies
  • Security header validation
  • Works alongside inject_mitm_proxy.py (Layer 1)

Note: Most users will use inject_mitm_proxy.py for MITM inspection. This module provides additional addon-based analysis.

Output Format

Audit Report (audit_report.json)

{
  "timestamp": "2025-11-18T13:34:34.106644",
  "target": "http://localhost:8000",
  "stats": {
    "git_secrets": 0,
    "crawler_issues": 2,
    "browser_issues": 0,
    "mitm_proxied": 15,
    "mitm_bypassed": 3,
    "mitm_security_findings": 1
  },
  "severities": {
    "CRITICAL": 0,
    "HIGH": 1,
    "MEDIUM": 0,
    "LOW": 0,
    "INFO": 15
  },
  "findings": [
    {
      "type": "api_key_in_header",
      "severity": "HIGH",
      "timestamp": 1763494461,
      "timestamp_human": "2025-11-18 13:34:21",
      "description": "GROQ_API_KEY in Authorization header over HTTPS (expected for server-side API calls, review if unexpected)",
      "url": "https://api.groq.com/openai/v1/chat/completions",
      "client": "requests",
      "method": "post",
      "pattern": "GROQ_API_KEY",
      "header": "Authorization"
    }
  ]
}
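Reports in this shape are easy to post-process. A minimal sketch (helper names are hypothetical) that loads a report and keeps findings at or above a chosen severity, treating unknown severity values as INFO, which is an assumption:

```python
import json
from typing import Dict, List

# Ordered from least to most severe, matching the "severities" keys in the report.
SEVERITY_RANK: Dict[str, int] = {"INFO": 0, "LOW": 1, "MEDIUM": 2, "HIGH": 3, "CRITICAL": 4}


def load_report(path: str = "audit_report.json") -> dict:
    with open(path, encoding="utf-8") as fh:
        return json.load(fh)


def findings_at_or_above(report: dict, minimum: str = "HIGH") -> List[dict]:
    """Findings whose severity ranks at or above `minimum` (unknown counts as INFO)."""
    floor = SEVERITY_RANK[minimum]
    return [f for f in report.get("findings", [])
            if SEVERITY_RANK.get(f.get("severity", "INFO"), 0) >= floor]
```

A CI step could call sys.exit(1 if findings_at_or_above(load_report(), "CRITICAL") else 0). Note that the severities block and the findings list are not interchangeable: the sample above counts 15 INFO events under severities but lists a single finding, so decide which one your gate should read.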

Traffic Log (mitm_traffic.ndjson)

NDJSON (newline-delimited JSON) format for append-only logging:

{"ts": 1763494398, "timestamp": "2025-11-18 13:33:18", "stage": "mitm_outbound", "client": "requests", "method": "post", "url": "https://api.example.com/endpoint"}
{"ts": 1763494461, "timestamp": "2025-11-18 13:34:21", "stage": "security_finding", "severity": "HIGH", "type": "api_key_in_header", "pattern": "GROQ_API_KEY", "description": "...", "url": "...", "client": "requests", "method": "post", "header": "Authorization"}

Stages:

  • mitm_outbound: Request sent through proxy
  • mitm_bypass: Request bypassed proxy (OAuth, AWS, etc.)
  • security_finding: Security issue detected
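Because each line is a standalone JSON object, the log can be processed line by line without loading the whole file. A minimal sketch (helper names are hypothetical) that tallies records per stage and extracts security findings; skipping unparseable lines is an assumption that guards against reading a file the proxy is still appending to.

```python
import json
from collections import Counter
from typing import IO, Iterator, List


def read_ndjson(fh: IO[str]) -> Iterator[dict]:
    """Yield one dict per non-empty line, skipping lines that are not valid JSON."""
    for line in fh:
        line = line.strip()
        if not line:
            continue
        try:
            yield json.loads(line)
        except json.JSONDecodeError:
            continue  # e.g. a last line still being written by the proxy


def stage_counts(fh: IO[str]) -> Counter:
    """Number of records per stage (mitm_outbound, mitm_bypass, security_finding)."""
    return Counter(rec.get("stage", "unknown") for rec in read_ndjson(fh))


def security_findings(fh: IO[str]) -> List[dict]:
    return [rec for rec in read_ndjson(fh) if rec.get("stage") == "security_finding"]
```

Typical use: with open("mitm_traffic.ndjson", encoding="utf-8") as fh: print(stage_counts(fh)).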

Advanced Usage

Custom Pattern Detection

Create a custom pattern file:

# Create custom-patterns.env
cat > custom-patterns.env << EOF
CUSTOM_API_KEY=custom_[0-9a-f]{32}
INTERNAL_TOKEN=int_tok_[A-Za-z0-9]{24}
EOF

# Edit config.py to load from custom file
# (Modify PATTERNS_FILE path in config.py)

Integrating with CI/CD

# .github/workflows/security-scan.yml
name: Security Audit
on: [push, pull_request]
jobs:
  scan:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v3
      - uses: actions/setup-python@v4
        with:
          python-version: '3.10'
      - run: pip install -r requirements.txt
      - run: cp patterns.env.example patterns.env
      - run: python local_check.py --target http://localhost:8000 --root .
      - run: |
          if jq -e '.severities.CRITICAL > 0' audit_report.json; then
            echo "CRITICAL issues found!"
            exit 1
          fi

Programmatic Usage

from scanners import scan_git_history, LocalCrawler, playwright_inspect

# Git scanning
git_findings = scan_git_history("/path/to/repo", max_commits=100)

# Web crawling
crawler = LocalCrawler("http://localhost:8000", max_pages=200)
crawler.probe_common_paths()
crawler.crawl()
web_findings = crawler.findings

# Browser inspection
browser_data = playwright_inspect("http://localhost:8000")

# Combine results
all_findings = git_findings + web_findings

Troubleshooting

"No module named 'requests'"

pip install requests

"patterns.env not found"

cp patterns.env.example patterns.env

"playwright-not-installed"

pip install playwright
python -m playwright install

"MITM proxy not loading patterns"

Issue: Backend shows WARNING: patterns.env not found

Solution:

# Verify patterns.env is in the same directory as inject_mitm_proxy.py
ls -la /path/to/backend/app/patterns.env

# If missing, copy it
cp patterns.env /path/to/backend/app/

"MITM proxy not intercepting traffic"

Issue: No traffic logged in mitm_traffic.ndjson

Solutions:

  1. Verify the import is present and comes FIRST:
import inject_mitm_proxy  # MUST BE FIRST
# ... other imports

# Then start the backend and look for the startup message:
python app.py
# Should see: "[MITM] Proxy active on http://127.0.0.1:8082"
  2. Check that the proxy ports match:
# Scanner
python local_check.py --enable-mitm --mitm-port 8082

# Backend
export MITM_PROXY_PORT=8082

"Permission denied during packet capture"

# Linux/Mac
sudo python local_check.py --enable-pcap

# Windows
# Run terminal as Administrator

"Git scan is very slow"

This is normal for large repositories (100k+ commits). The tool limits to 100 commits by default. To adjust:

# Lower the limit where scan_git_history is called (e.g. in scanners/git_scanner.py)
scan_git_history(root, max_commits=50)  # Reduce commit limit

"Too many false positives"

  1. Adjust entropy threshold in config.py:
ENTROPY_THRESHOLD = 4.0  # Higher = fewer false positives
  2. Add exclusions for known patterns:
# In config.py
EXCLUDE_PATTERNS = [
    r'test_api_key_123',  # Test keys
    r'example\.com',      # Example domains
]
  3. Filter by severity in audit report:
# Only show CRITICAL issues
jq '.findings[] | select(.severity == "CRITICAL")' audit_report.json

Security Considerations

Testing Your Own Applications Only

This tool is designed for security testing of applications you own or have explicit permission to test. Unauthorized scanning may violate laws and terms of service.

MITM Proxy Security

The MITM proxy disables SSL verification for testing purposes. This should only be used in development/testing environments, never in production.

Do NOT:

  • Use MITM proxy in production environments
  • Commit inject_mitm_proxy.py import to production code
  • Share MITM proxy logs (may contain sensitive data)

Best Practices:

  • Use environment variables to control MITM activation
  • Keep mitm_traffic.ndjson and audit_report.json out of version control (add to .gitignore)
  • Review and sanitize audit reports before sharing
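One way to follow the first best practice is to gate the import on a flag of your own. In this sketch, ENABLE_MITM is a hypothetical variable name (the module itself reads only MITM_PROXY_PORT), and the module name defaults to the standalone inject_mitm_proxy; with the PyPI package you would instead import cybersec_scanner.scanners.inject_mitm_proxy and call inject_mitm_proxy_advanced(). The call must still run before any framework or HTTP-library import.

```python
import importlib
import os


def maybe_enable_mitm(flag: str = "ENABLE_MITM", module: str = "inject_mitm_proxy") -> bool:
    """Import the MITM injector only when the (hypothetical) flag is set to "1"."""
    if os.environ.get(flag) != "1":
        return False
    importlib.import_module(module)  # importing is what activates the patches
    return True


# Top of main.py, before FastAPI/Flask/Django or any HTTP client is imported.
MITM_ACTIVE = maybe_enable_mitm()

# ... framework imports follow here
```

Start the backend with ENABLE_MITM=1 uvicorn app.main:app --reload to opt in; without the variable, the HTTP libraries are left unpatched.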

Pattern File Security

The patterns.env file is excluded from version control by default (.gitignore) to avoid triggering GitHub security alerts on pattern signatures.

Do NOT:

  • Commit patterns.env to public repositories
  • Include actual secret values in pattern files
  • Share pattern files with untrusted parties

Version History

| Version | Changes |
|---|---|
| 1.0.5 | Enhanced scan output with phase headers, fixed query output to be human-readable (extracts text from response) |
| 1.0.4 | Colored CLI output with colorama, --output flag for query command, simplified MITM traffic auto-sharing |
| 1.0.3 | Unified MITM traffic file location via temp directory, added colorama dependency |
| 1.0.2 | Fixed MITM traffic file path resolution bug |
| 1.0.1 | Initial PyPI release with full scanner suite |

License

MIT License - See LICENSE file for details.

Contributing

Contributions are welcome! Please follow these guidelines:

  1. Test your changes with multiple target applications
  2. Update documentation for new features
  3. Follow existing code style and structure
  4. Add tests for new scanner modules
  5. Ensure no secrets are committed in test files

Disclaimer

This tool is provided for lawful security testing only. Users are responsible for ensuring they have proper authorization before scanning any application. The authors assume no liability for misuse or unauthorized access.

Testing

Quick Test Commands

# Run all tests (auto-detects Ollama)
python run_tests.py

# Run all tests including LLM (requires Ollama)
python run_tests.py --all

# Fast tests only (no LLM)
python run_tests.py --fast

# With coverage report
python run_tests.py --coverage

# Specific test file
python run_tests.py --file retriever

Test Prerequisites

Core tests (no additional setup):

pip install pytest pytest-cov
pytest tests/ -v -k "not llm_client"

LLM tests (requires Ollama):

# Install Ollama
curl -fsSL https://ollama.com/install.sh | sh  # Linux/Mac
# Or download from https://ollama.com for Windows

# Pull model
ollama pull gemma3:1b

# Run all tests
pytest tests/ -v

Test Coverage

| Component | Tests | Coverage |
|---|---|---|
| Knowledge Graph | PASS (1 test) | 100% |
| CWE Enrichment | PASS (1 test) | 100% |
| Database Normalizer | PASS (5 tests) | 95% |
| Graph Retriever | PASS (8 tests) | 100% |
| LLM Client | PASS (8 tests) | 85% |
| End-to-End Pipeline | PASS (2 tests) | Full flow |
| Total | 24 tests | ~90% |

See tests/README.md for detailed testing documentation.

Support

For issues, questions, or contributions:

  • Open an issue on GitHub
  • Review existing issues before creating new ones
  • Provide detailed information (OS, Python version, error messages, steps to reproduce)

Made by the JBAC EdTech Team (Jai Ansh Bindra and Anubhav Choudhery)

Download files

Source Distribution

cybersec_scanner-1.0.5.tar.gz (157.5 kB)

Uploaded Source

Built Distribution

cybersec_scanner-1.0.5-py3-none-any.whl (73.0 kB)

Uploaded Python 3

File details

Details for the file cybersec_scanner-1.0.5.tar.gz.

File metadata

  • Download URL: cybersec_scanner-1.0.5.tar.gz
  • Size: 157.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.9

File hashes

Hashes for cybersec_scanner-1.0.5.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | ec90686e96154a12fa95ffb83eb09c6c0b8a5d44b0a5a50597fe6280c34405c6 |
| MD5 | 685d074b724b9442c6084f3d6290a2c2 |
| BLAKE2b-256 | a3bc44b6e4186feecae0ea32011aa0a33e347e650cdf9cdc83b2c36060532b83 |

File details

Details for the file cybersec_scanner-1.0.5-py3-none-any.whl.

File hashes

Hashes for cybersec_scanner-1.0.5-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | b5798f160cae57eb76e6e39caaf8c585c76dabe8294ea9397dc806e5150dfe37 |
| MD5 | 978a1c78d889835ea63d4345b0a2db20 |
| BLAKE2b-256 | 0b2c82c4bebee92d1e887151e97a6b9b44f5e11adf8310d4bc14970ba8832632 |
