Security scanner for the AI development lifecycle
Project description
VigilML
Security scanner for the AI development lifecycle.
pip install vigilml && vigilml scan .
What it catches
VigilML runs 7 scanners over your project and reports findings with a file, a line number, a severity, and a remediation. The table below lists specific detectors from each scanner, not a general summary.
| Category | What triggers it | Severity |
|---|---|---|
| Hardcoded API keys | openai-api-key, aws-secret-key, mongodb-connection-string patterns in .py, .ipynb, .env, Dockerfile, and 20+ other file types |
CRITICAL |
| Private keys and tokens | RSA/EC/OpenSSH private key headers, Slack tokens, generic SECRET/KEY/TOKEN-named variables |
CRITICAL/MEDIUM |
| Unsafe deserialisation | .pkl/.pickle/.joblib/.dill files on disk, pickle.load(), torch.load() without weights_only=True |
HIGH |
| Arbitrary code execution | trust_remote_code=True, eval()/exec() with a non-literal argument, yaml.load() without SafeLoader |
CRITICAL |
| Cloud misconfiguration | S3 ACL="public-read", S3 uploads without server-side encryption, IAM "Action": "*" wildcards |
HIGH |
| Insecure serving/build | Docker containers running as root, Flask/FastAPI routes with no auth check, .run(debug=True) |
HIGH |
| Known dependency CVEs | 140+ ML packages (torch, numpy, transformers, langchain, and more) checked against OSV.dev | CRITICAL-LOW |
| Supply chain risk | Typosquatting (pytorch instead of torch), deprecated packages, unpinned security-critical dependencies |
HIGH/MEDIUM |
| Unvalidated LLM input | sys.argv/input()/web response content flowing into an LLM call |
CRITICAL/HIGH |
| Exposed system prompts | API keys or internal URLs embedded in a system_prompt string literal |
HIGH/MEDIUM |
| Risky data handling | HTTP (non-HTTPS) dataset downloads, downloads with no checksum verification, unverified load_dataset() sources |
HIGH/MEDIUM |
| PII exposure | PII-indicator DataFrame columns (ssn, email_address), PII values passed to print()/logging calls |
MEDIUM/HIGH |
| Leaked notebook outputs | Credentials, stack traces, or PII DataFrame previews committed inside a notebook's OUTPUT cells | CRITICAL/HIGH |
| Risky notebook cells | !pip install, !wget http://, %env TOKEN=... setting a real secret |
HIGH/LOW |
Quick start
# Scan the current directory
vigilml scan .
# Scan with JSON output for CI/CD pipelines
vigilml scan . --json
# Run only specific scanners
vigilml scan . --scanners credentials,model_files
All CLI options
| Flag | Description |
|---|---|
--scanners |
Comma-separated scanner names to run, or all (default all) |
--json |
Output findings as JSON to stdout |
--no-colour |
Disable ANSI colour codes |
--quiet |
Print only the one-line summary |
--stats-only |
Print only the summary panel, with no individual findings |
--config |
Path to a .vigilml.yml config file |
--version |
Print the installed version and exit |
--help |
Show usage and all available options |
Suppressing findings
VigilML supports three suppression comments, checked directly in your source files. All three require an explicit comment — there is no way to silently disable a finding without leaving a trace in the code.
Inline — suppresses a single line. Use this for one isolated false positive, such as a test fixture value that happens to match a credential pattern.
# Known false positive: fixture value used only in tests
TEST_API_KEY = "sk-test-51H8xJ2KL9mN3pQrStUvWxYz12345" # vigilml: ignore
Block — suppresses every line between the two markers. Use this for several consecutive lines that are all false positives, such as a block of demo credentials in a tutorial notebook.
# vigilml: ignore-start
# Demo credentials for the onboarding notebook. Never real, rotated
# before every workshop.
DEMO_HF_TOKEN = "hf_demoTokenNotARealSecret1234567890"
DEMO_OPENAI_KEY = "sk-demo-not-a-real-openai-key-000000000000"
# vigilml: ignore-end
File-level — suppresses every finding in the file. Use this only when an entire file exists to contain example patterns, such as a scanner's own test fixtures or its pattern definitions.
# vigilml: ignore-file
"""Fixtures for the credential scanner's unit tests.
Every string below is a synthetic pattern the scanner is meant to
detect, not a real secret.
"""
CI/CD integration
Basic version — fails the build on any finding, of any severity:
name: Security scan
on: [push, pull_request]
jobs:
vigilml:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install vigilml
- run: vigilml scan . --no-colour
Strict version — narrows to the scanners whose findings are most often
CRITICAL/HIGH (--scanners), and writes a config that raises every
rule's min_severity to HIGH so the exit code reflects severity, not
just presence:
name: Security scan (strict)
on: [push, pull_request]
jobs:
vigilml-strict:
runs-on: ubuntu-latest
steps:
- uses: actions/checkout@v4
- uses: actions/setup-python@v5
with:
python-version: "3.11"
- run: pip install vigilml
- name: Write a CRITICAL/HIGH-only config
run: |
cat > .vigilml-strict.yml << 'EOF'
version: 1
rules:
credentials:
min_severity: HIGH
model_files:
min_severity: HIGH
dependencies:
min_severity: HIGH
prompt_injection:
min_severity: HIGH
EOF
- run: >
vigilml scan . --config .vigilml-strict.yml
--scanners credentials,model_files,dependencies,prompt_injection
--no-colour
Real findings on real repos
| Repo | Author | Stars | Total findings | Most notable finding type |
|---|---|---|---|---|
| nanoGPT | Andrej Karpathy | 38K+ | 42 | pii-logging (14 occurrences) |
| Hands-On ML (handson-ml3) | Aurelien Geron | 28K+ | 104 | env-var-in-llm-prompt (23 occurrences) |
| PyTorch-GAN | - | 16K+ | 83 | torch-load-without-weights-only (37 occurrences) |
| Approaching (Almost) Any ML Problem | Abhishek Thakur | 11K+ | 443 | Every finding is a dependency CVE (443 of 443) |
All repos scanned with vigilml scan . on unmodified public code.
Available scanners
| Scanner name | Flag value | What it detects |
|---|---|---|
| Credentials | credentials |
Hardcoded API keys, tokens, and connection strings across 20+ file types |
| Model files | model_files |
Unsafe deserialisation: pickle/joblib/dill files, unsafe torch.load()/yaml.load() calls |
| Cloud & infrastructure | cloud |
S3/GCS/Azure misconfigurations, insecure Dockerfiles, unauthenticated model-serving endpoints |
| Dependencies | dependencies |
Known CVEs in 140+ ML packages via OSV.dev, typosquatting, deprecated packages |
| Prompt injection | prompt_injection |
User-controlled input flowing into LLM calls, exposed system prompts |
| Data pipeline | data_pipeline |
Insecure dataset downloads, PII in DataFrame columns or logs, data leakage |
| Notebook risks | notebook_risks |
Credentials, PII, and stack traces leaked in notebook cell OUTPUTS, risky notebook cells |
Contributing
Clone the repository and install the development dependencies with
pip install -e ".[dev]". Run the test suite with pytest tests/unit/ before submitting a change. Open an issue at
github.com/sharmaamanrajesh/vigilml/issues
before starting any large change.
Licence
MIT. See the GitHub repository for the full licence text. Free forever for individual use.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file vigilml-0.2.3.tar.gz.
File metadata
- Download URL: vigilml-0.2.3.tar.gz
- Upload date:
- Size: 53.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
5a25228e6b27c678e79725f40fdde407b34f8455557266fb0e40cca8b24dcd02
|
|
| MD5 |
953c24ef99242a8088a039127fddc4ce
|
|
| BLAKE2b-256 |
e7397a871bf9b7dbb0716437fbf5b892148eaeaf964642202bd9b78abe4e4023
|
File details
Details for the file vigilml-0.2.3-py3-none-any.whl.
File metadata
- Download URL: vigilml-0.2.3-py3-none-any.whl
- Upload date:
- Size: 59.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.11.15
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
16cc77122969260a0ed2ef93123d1ac8b47e6e2913deaa7e5f64644a338d4fde
|
|
| MD5 |
95080cc75e51fb02b84c38ea4920d68c
|
|
| BLAKE2b-256 |
382e8ef42967156cf97bb01f4be25bbe034e95dbc3e70e033f528489122320be
|