VigilML

Security scanner for the AI development lifecycle.

pip install vigilml && vigilml scan .

What it catches

VigilML runs 7 scanners over your project and reports findings with a file, a line number, a severity, and a remediation. The table below lists specific detectors from each scanner, not a general summary.

| Category | What triggers it | Severity |
| --- | --- | --- |
| Hardcoded API keys | openai-api-key, aws-secret-key, mongodb-connection-string patterns in .py, .ipynb, .env, Dockerfile, and 20+ other file types | CRITICAL |
| Private keys and tokens | RSA/EC/OpenSSH private key headers, Slack tokens, generic SECRET/KEY/TOKEN-named variables | CRITICAL/MEDIUM |
| Unsafe deserialisation | .pkl/.pickle/.joblib/.dill files on disk, pickle.load(), torch.load() without weights_only=True | HIGH |
| Arbitrary code execution | trust_remote_code=True, eval()/exec() with a non-literal argument, yaml.load() without SafeLoader | CRITICAL |
| Cloud misconfiguration | S3 ACL="public-read", S3 uploads without server-side encryption, IAM "Action": "*" wildcards | HIGH |
| Insecure serving/build | Docker containers running as root, Flask/FastAPI routes with no auth check, .run(debug=True) | HIGH |
| Known dependency CVEs | 140+ ML packages (torch, numpy, transformers, langchain, and more) checked against OSV.dev | CRITICAL-LOW |
| Supply chain risk | Typosquatting (pytorch instead of torch), deprecated packages, unpinned security-critical dependencies | HIGH/MEDIUM |
| Unvalidated LLM input | sys.argv/input()/web response content flowing into an LLM call | CRITICAL/HIGH |
| Exposed system prompts | API keys or internal URLs embedded in a system_prompt string literal | HIGH/MEDIUM |
| Risky data handling | HTTP (non-HTTPS) dataset downloads, downloads with no checksum verification, unverified load_dataset() sources | HIGH/MEDIUM |
| PII exposure | PII-indicator DataFrame columns (ssn, email_address), PII values passed to print()/logging calls | MEDIUM/HIGH |
| Leaked notebook outputs | Credentials, stack traces, or PII DataFrame previews committed inside a notebook's OUTPUT cells | CRITICAL/HIGH |
| Risky notebook cells | !pip install, !wget http://, %env TOKEN=... setting a real secret | HIGH/LOW |
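
The eval()/exec() entry above concerns calls whose argument is not a literal. As an illustration of one common remediation (a general Python idiom, not necessarily the fix VigilML's report suggests), the sketch below parses an untrusted string with ast.literal_eval, which accepts only Python literals and rejects everything else. The helper name parse_literal is hypothetical.

```python
import ast


def parse_literal(text):
    """Parse a string holding a Python literal without executing any code.

    Hypothetical helper for illustration; it is not part of VigilML.
    """
    try:
        # literal_eval accepts only literals: numbers, strings, lists, dicts, etc.
        return ast.literal_eval(text)
    except (ValueError, SyntaxError) as exc:
        raise ValueError(f"not a plain literal: {text!r}") from exc


if __name__ == "__main__":
    print(parse_literal("{'lr': 0.001, 'layers': [64, 64]}"))
    try:
        # A function call is not a literal, so this is rejected, not run.
        parse_literal("__import__('os').getcwd()")
    except ValueError as err:
        print("rejected:", err)
```

Where the input really must be an expression rather than data, this substitution does not apply and the call needs a different design.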

Quick start

# Scan the current directory
vigilml scan .
# Scan with JSON output for CI/CD pipelines
vigilml scan . --json
# Run only specific scanners
vigilml scan . --scanners credentials,model_files

All CLI options

| Flag | Description |
| --- | --- |
| --scanners | Comma-separated scanner names to run, or all (default all) |
| --json | Output findings as JSON to stdout |
| --no-colour | Disable ANSI colour codes |
| --quiet | Print only the one-line summary |
| --stats-only | Print only the summary panel, with no individual findings |
| --config | Path to a .vigilml.yml config file |
| --version | Print the installed version and exit |
| --help | Show usage and all available options |
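
For scripted use, the flags above can be assembled from Python. The following is a minimal sketch, assuming the vigilml executable is on PATH and that --json prints a single JSON document to stdout. The JSON schema is not described here, so the sketch decodes the output without inspecting it. The function names are hypothetical.

```python
import json
import subprocess


def build_scan_command(path=".", scanners=None, json_output=False, no_colour=False):
    """Assemble a vigilml argv list using only the flags documented above."""
    cmd = ["vigilml", "scan", path]
    if scanners:
        # --scanners takes a comma-separated list of scanner names.
        cmd += ["--scanners", ",".join(scanners)]
    if json_output:
        cmd.append("--json")
    if no_colour:
        cmd.append("--no-colour")
    return cmd


def run_scan_json(path="."):
    """Run a scan and decode stdout as JSON (assumption: stdout is one JSON document)."""
    result = subprocess.run(
        build_scan_command(path, json_output=True),
        capture_output=True,
        text=True,
        check=False,  # a scan with findings may exit non-zero; do not raise on that
    )
    return result.returncode, json.loads(result.stdout)


if __name__ == "__main__":
    print(build_scan_command(".", ["credentials", "model_files"], json_output=True))
```

Passing the arguments as a list avoids shell quoting issues when the scanned path contains spaces.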

Suppressing findings

VigilML supports three suppression comments, which it reads directly from your source files. Each one is an explicit comment, so a finding cannot be disabled without leaving a trace in the code.

Inline — suppresses a single line. Use this for one isolated false positive, such as a test fixture value that happens to match a credential pattern.

# Known false positive: fixture value used only in tests
TEST_API_KEY = "sk-test-51H8xJ2KL9mN3pQrStUvWxYz12345"  # vigilml: ignore

Block — suppresses every line between the two markers. Use this for several consecutive lines that are all false positives, such as a block of demo credentials in a tutorial notebook.

# vigilml: ignore-start
# Demo credentials for the onboarding notebook. Never real, rotated
# before every workshop.
DEMO_HF_TOKEN = "hf_demoTokenNotARealSecret1234567890"
DEMO_OPENAI_KEY = "sk-demo-not-a-real-openai-key-000000000000"
# vigilml: ignore-end

File-level — suppresses every finding in the file. Use this only when an entire file exists to contain example patterns, such as a scanner's own test fixtures or its pattern definitions.

# vigilml: ignore-file
"""Fixtures for the credential scanner's unit tests.

Every string below is a synthetic pattern the scanner is meant to
detect, not a real secret.
"""

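To make the scope of the three markers concrete, here is a rough sketch of how a scanner might resolve them into a set of suppressed line numbers. It illustrates the semantics described above and is not VigilML's actual implementation; the regular expression and the function name suppressed_lines are assumptions, and the sketch does not distinguish real comments from marker text inside string literals.

```python
import re

# Longer marker names come first so "ignore" does not shadow "ignore-start" etc.
MARKER = re.compile(r"#\s*vigilml:\s*(ignore-file|ignore-start|ignore-end|ignore)\b")


def suppressed_lines(source):
    """Return the 1-based line numbers a scanner would skip (illustrative only)."""
    lines = source.splitlines()
    suppressed = set()
    in_block = False
    for number, line in enumerate(lines, start=1):
        match = MARKER.search(line)
        kind = match.group(1) if match else None
        if kind == "ignore-file":
            # File-level marker: every line in the file is suppressed.
            return set(range(1, len(lines) + 1))
        if kind == "ignore-start":
            in_block = True
        if in_block or kind == "ignore":
            suppressed.add(number)
        if kind == "ignore-end":
            in_block = False
    return suppressed


if __name__ == "__main__":
    sample = (
        "a = 1\n"
        'KEY = "x"  # vigilml: ignore\n'
        "# vigilml: ignore-start\n"
        "b = 2\n"
        "# vigilml: ignore-end\n"
        "c = 3\n"
    )
    print(sorted(suppressed_lines(sample)))
```

In this sketch the marker lines themselves count as suppressed, which is harmless because they contain only comments.
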
CI/CD integration

Basic version — fails the build on any finding, of any severity:

name: Security scan
on: [push, pull_request]
jobs:
  vigilml:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install vigilml
      - run: vigilml scan . --no-colour

Strict version — narrows to the scanners whose findings are most often CRITICAL/HIGH (--scanners), and writes a config that raises every rule's min_severity to HIGH so the exit code reflects severity, not just presence:

name: Security scan (strict)
on: [push, pull_request]
jobs:
  vigilml-strict:
    runs-on: ubuntu-latest
    steps:
      - uses: actions/checkout@v4
      - uses: actions/setup-python@v5
        with:
          python-version: "3.11"
      - run: pip install vigilml
      - name: Write a CRITICAL/HIGH-only config
        run: |
          cat > .vigilml-strict.yml << 'EOF'
          version: 1
          rules:
            credentials:
              min_severity: HIGH
            model_files:
              min_severity: HIGH
            dependencies:
              min_severity: HIGH
            prompt_injection:
              min_severity: HIGH
          EOF
      - run: >
          vigilml scan . --config .vigilml-strict.yml
          --scanners credentials,model_files,dependencies,prompt_injection
          --no-colour
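
The effect of a min_severity threshold on the build result can be sketched as follows. This is an illustration under stated assumptions, not VigilML's code: it assumes the four severities used in this document are ordered LOW < MEDIUM < HIGH < CRITICAL, that each finding is a dict with a hypothetical "severity" key, and that a failing scan exits with status 1.

```python
# Assumed ordering, lowest to highest, of the severities named in this document.
SEVERITY_ORDER = ["LOW", "MEDIUM", "HIGH", "CRITICAL"]


def meets_threshold(severity, min_severity):
    """True when a finding's severity is at or above the configured minimum."""
    return SEVERITY_ORDER.index(severity) >= SEVERITY_ORDER.index(min_severity)


def exit_code(findings, min_severity="HIGH"):
    """Return 1 if any finding reaches the threshold, else 0 (assumed convention)."""
    reached = any(meets_threshold(f["severity"], min_severity) for f in findings)
    return 1 if reached else 0


if __name__ == "__main__":
    # The "severity" key is a hypothetical finding field used for illustration.
    print(exit_code([{"severity": "MEDIUM"}]))
    print(exit_code([{"severity": "LOW"}, {"severity": "CRITICAL"}]))
```

With a HIGH threshold, MEDIUM and LOW findings no longer fail the build, which is the behaviour the strict workflow aims for.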

Real findings on real repos

| Repo | Author | Stars | Total findings | Most notable finding type |
| --- | --- | --- | --- | --- |
| nanoGPT | Andrej Karpathy | 38K+ | 42 | pii-logging (14 occurrences) |
| Hands-On ML (handson-ml3) | Aurelien Geron | 28K+ | 104 | env-var-in-llm-prompt (23 occurrences) |
| PyTorch-GAN | - | 16K+ | 83 | torch-load-without-weights-only (37 occurrences) |
| Approaching (Almost) Any ML Problem | Abhishek Thakur | 11K+ | 443 | Every finding is a dependency CVE (443 of 443) |

All repos scanned with vigilml scan . on unmodified public code.

Available scanners

| Scanner name | Flag value | What it detects |
| --- | --- | --- |
| Credentials | credentials | Hardcoded API keys, tokens, and connection strings across 20+ file types |
| Model files | model_files | Unsafe deserialisation: pickle/joblib/dill files, unsafe torch.load()/yaml.load() calls |
| Cloud & infrastructure | cloud | S3/GCS/Azure misconfigurations, insecure Dockerfiles, unauthenticated model-serving endpoints |
| Dependencies | dependencies | Known CVEs in 140+ ML packages via OSV.dev, typosquatting, deprecated packages |
| Prompt injection | prompt_injection | User-controlled input flowing into LLM calls, exposed system prompts |
| Data pipeline | data_pipeline | Insecure dataset downloads, PII in DataFrame columns or logs, data leakage |
| Notebook risks | notebook_risks | Credentials, PII, and stack traces leaked in notebook cell OUTPUTS, risky notebook cells |
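
The data_pipeline scanner flags dataset downloads that lack checksum verification. As a sketch of what the remediated pattern could look like (a generic standard-library approach, not code taken from VigilML), the helpers below compute a file's SHA-256 digest and compare it with a published value. The function names are hypothetical.

```python
import hashlib
import hmac


def sha256_of(path, chunk_size=1 << 20):
    """Return the hex SHA-256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_checksum(path, expected_hex):
    """True when the file's digest matches the expected hex string."""
    # compare_digest performs a constant-time comparison of the two strings.
    return hmac.compare_digest(sha256_of(path), expected_hex.strip().lower())


if __name__ == "__main__":
    import sys

    # Usage: python verify.py <downloaded-file> <expected-sha256>
    ok = verify_checksum(sys.argv[1], sys.argv[2])
    print("checksum ok" if ok else "checksum MISMATCH")
```

The expected digest should come from a source other than the download itself, such as the dataset publisher's release notes.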

Contributing

Clone the repository and install the development dependencies with pip install -e ".[dev]". Run the test suite with pytest tests/unit/ before submitting a change. Open an issue at github.com/sharmaamanrajesh/vigilml/issues before starting any large change.

Licence

MIT. See the GitHub repository for the full licence text. Free forever for individual use.
