Discover and inventory autonomous AI agents across your infrastructure - static analysis, runtime detection, and Kubernetes monitoring
AgentDiscover Scanner
Open-Source AI Agent Discovery for the Enterprise
Part of the DefendAI platform for autonomous AI governance
The Finding That Matters
👻 GHOST AGENT DETECTED
Workload: trading-bot (Deployment/default)
Connected: api.openai.com - LIVE
SaaS: openai - confirmed active connection
Source code: None found in scanned repositories
Owner: Unknown - no deployment record, no code review

👻 GHOST AGENT DETECTED
Workload: shadow-agent (Pod/kube-system)
Connected: api.anthropic.com - LIVE
SaaS: anthropic - confirmed | gcp - active socket
Blast radius: HIGH (cloud provider access confirmed)
Source code: None found in scanned repositories
Owner: Unknown - no deployment record, no code review

An AI system is making real API calls (consuming tokens, potentially accessing sensitive data) and your engineering team has no record of it. No code, no deployment, no owner. AgentDiscover Scanner finds these in under 60 seconds.
That's the problem. Your engineering team thinks they know what AI systems are running. They don't.
What Makes This Different
Most security tools tell you what's in your code. AgentDiscover Scanner tells you what's actually running and, crucially, what's running that has no business being there.
The GHOST classification is unique: an AI system observed making real API calls with zero corresponding source code. No purely static analysis tool can find this. No SIEM will alert on it. It only appears when you watch the runtime and cross-reference it against your codebase at the same time.
As of v2.3.0, every detected agent also carries a SaaS blast radius: a live-observed map of which services it's actively connected to, derived from network traffic, not just configuration files.
crewai-agent (CONFIRMED)
  saas_connections:
    anthropic: confirmed - active_connection observed
    github: medium - open socket
  risk_flags: [cloud_credentials_present]
  blast_radius: 70/100
Agent Classifications
| Classification | What It Means | Risk |
|---|---|---|
| GHOST | Runtime AI activity, no source code found | Critical |
| CONFIRMED | Detected in code AND observed running | High |
| UNKNOWN | Found in code, not yet observed at runtime | Medium |
| SHADOW AI | Known app using AI without governance | Medium |
| ZOMBIE | Was active, no longer observed | Low |

GHOST agents are the most dangerous finding. An AI system is making real API calls (consuming tokens, potentially accessing sensitive data) and your engineering team has no record of it. No code, no deployment, no owner.
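The table above can be read as a small decision procedure over the evidence each layer contributes. The sketch below is one possible encoding, not the scanner's actual logic: the function name, the four boolean inputs, and the order in which the rules are checked are all assumptions made for illustration.

```python
from typing import Optional


def classify(in_code: bool, observed_now: bool,
             seen_before: bool, known_app: bool) -> Optional[str]:
    """Map evidence flags to a classification label (hypothetical rule order)."""
    if observed_now:
        if known_app:
            return "SHADOW AI"   # recognised application making AI calls
        # Running right now: the only question is whether code backs it up.
        return "CONFIRMED" if in_code else "GHOST"
    if seen_before:
        return "ZOMBIE"          # was active, no longer observed
    if in_code:
        return "UNKNOWN"         # code exists, nothing seen at runtime yet
    return None                  # no evidence from any layer


# Runtime traffic with no matching source code is the GHOST case.
print(classify(in_code=False, observed_now=True,
               seen_before=False, known_app=False))
```

Checking the runtime observation first reflects the table's emphasis: what is running right now decides the label, and source code only refines it.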
Quick Start
pip install agent-discover-scanner
agent-discover-scanner scan-all /path/to/your/code --duration 30
For Kubernetes environments:
curl -fsSL https://raw.githubusercontent.com/Defend-AI-Tech-Inc/agent-discover-scanner/main/install.sh | sudo bash
agent-discover-scanner scan-all /path/to/code --daemon --output /var/log/defendai
To upload results to the DefendAI platform:
agent-discover-scanner scan-all /path/to/code \
--platform \
--api-key YOUR_API_KEY
How It Works
AgentDiscover Scanner runs four detection layers simultaneously and correlates them into a single agent inventory. Each layer sees something the others can't.
Layer 1: Source Code Analysis
Static analysis of Python and JavaScript/TypeScript. Detects LangChain, LangGraph, CrewAI, AutoGen, direct OpenAI/Anthropic/Gemini API usage, and any HTTP client targeting LLM endpoints. Handles import aliasing and indirect usage patterns. Generates SARIF output for CI/CD integration.
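To make "handles import aliasing" concrete, here is a minimal sketch of alias-aware import detection using Python's standard `ast` module. It is not the scanner's implementation; the `AI_MODULES` set and the `find_ai_imports` helper are hypothetical names, and the module list is only an illustrative subset.

```python
import ast

# Illustrative subset of top-level module names to track (assumption).
AI_MODULES = {"openai", "anthropic", "langchain", "langgraph", "crewai", "autogen"}


def find_ai_imports(source: str) -> dict[str, str]:
    """Return {local_name: imported_path} for imports of tracked modules."""
    found: dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                root = alias.name.split(".")[0]
                if root in AI_MODULES:
                    # "import a.b" binds "a"; "import a.b as x" binds "x".
                    found[alias.asname or root] = alias.name
        elif isinstance(node, ast.ImportFrom) and node.module and node.level == 0:
            if node.module.split(".")[0] in AI_MODULES:
                for alias in node.names:
                    found[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    return found


sample = "import openai as oa\nfrom anthropic import Anthropic as Claude\nimport os\n"
print(find_ai_imports(sample))
```

Recording the local name matters because later calls in the file use `oa` or `Claude`, not the original module name.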
Layer 2: Live Network Monitoring
Passive observation of outbound connections to AI providers: OpenAI, Anthropic, Google Gemini, Mistral, Cohere, Azure OpenAI, AWS Bedrock, and vector stores. No packet capture. Identifies which process is making each connection, enabling per-agent SaaS attribution.
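Per-process attribution can be pictured as a lookup from remote hostname to provider, grouped by the process that owns the connection. The sketch below assumes connection records arrive as `(process_name, remote_host)` pairs. That input shape, the `PROVIDER_HOSTS` table, and both helper names are hypothetical, and only the two hostnames that appear in this README are listed.

```python
from typing import Iterable, Optional

# Hostnames taken from the sample findings; a real table would be longer.
PROVIDER_HOSTS = {
    "api.openai.com": "openai",
    "api.anthropic.com": "anthropic",
}


def provider_for(host: str) -> Optional[str]:
    """Return the provider for a hostname, matching exact names and subdomains."""
    host = host.lower().rstrip(".")
    for known, provider in PROVIDER_HOSTS.items():
        if host == known or host.endswith("." + known):
            return provider
    return None


def attribute(connections: Iterable[tuple[str, str]]) -> dict[str, set[str]]:
    """Group observed AI providers by the process that opened the connection."""
    result: dict[str, set[str]] = {}
    for process, host in connections:
        provider = provider_for(host)
        if provider is not None:
            result.setdefault(process, set()).add(provider)
    return result


observed = [
    ("trading-bot", "api.openai.com"),
    ("nginx", "example.com"),
    ("shadow-agent", "api.anthropic.com"),
]
print(attribute(observed))
```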
Layer 3: Kubernetes Runtime (eBPF)
Kernel-level visibility into pod behavior via Tetragon. Identifies which workloads are actively making AI calls, including workloads with no corresponding source code. Works with any CNI. Falls back to Kubernetes API discovery if Tetragon is unavailable.
Layer 4: Endpoint Discovery
Scans developer machines, CI/CD runners, and workstations via osquery. Finds installed AI packages, desktop AI applications (ChatGPT Desktop, Claude Desktop, Cursor, GitHub Copilot), active connections, browser-based AI usage, and VSCode extensions.
SaaS Blast Radius Detection (v2.3.0+)
After correlation, each agent receives a saas_connections profile built from all four layers:
{
  "detected": ["anthropic", "gcp", "github"],
  "confirmed": ["anthropic"],
  "evidence": {
    "anthropic": ["active_connection", "open_socket"],
    "gcp": ["open_socket"],
    "github": ["vscode_extension_detected"]
  },
  "confidence": {
    "anthropic": "confirmed",
    "gcp": "medium",
    "github": "medium"
  },
  "has_cloud_provider": true,
  "has_llm_provider": true
}
confirmed means the connection was live-observed during the scan, not inferred from config files. This is the difference between "this agent is configured to use Anthropic" and "this agent is calling Anthropic right now."
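A consumer of this profile will usually want to separate live-observed services from inferred ones. The following sketch parses the profile shown above and does that split. The `summarize` helper is a hypothetical name, and the field layout is assumed to match the example exactly.

```python
import json

profile = json.loads("""
{
  "detected": ["anthropic", "gcp", "github"],
  "confirmed": ["anthropic"],
  "evidence": {
    "anthropic": ["active_connection", "open_socket"],
    "gcp": ["open_socket"],
    "github": ["vscode_extension_detected"]
  },
  "confidence": {"anthropic": "confirmed", "gcp": "medium", "github": "medium"},
  "has_cloud_provider": true,
  "has_llm_provider": true
}
""")


def summarize(profile: dict) -> list[tuple[str, str, list[str]]]:
    """Return (service, confidence, evidence) rows in detection order."""
    return [
        (svc, profile["confidence"].get(svc, "unknown"), profile["evidence"].get(svc, []))
        for svc in profile["detected"]
    ]


# Services that were detected but never seen on a live connection.
inferred_only = [s for s in profile["detected"] if s not in profile["confirmed"]]

for service, confidence, evidence in summarize(profile):
    print(f"{service:10s} {confidence:10s} {', '.join(evidence)}")
print("inferred only:", inferred_only)
```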
Example Output
Scanning for autonomous AI agents...
Analyzing source code at ./my-repo
Monitoring live network connections...
Monitoring Kubernetes workloads...
Scanning endpoints...
Correlating findings...
Correlation complete

Autonomous Agent Inventory
Classification   | Count | Description
-----------------|-------|--------------------------------------------------
CONFIRMED        | 2     | Active: detected in code and observed at runtime
UNKNOWN          | 3     | Code found: not yet observed at runtime
SHADOW AI        | 0     | Known app using AI: review for governance
ZOMBIE           | 0     | Inactive: code exists but no recent activity
GHOST            | 1     | Critical: runtime activity with no source code

Risk Breakdown:
  Critical: 1
  High: 2
  Medium: 3
  Low: 0

Scan complete - results saved to ./defendai-results
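The Risk Breakdown in this output follows from the inventory counts once each classification is mapped to the risk level given in the Agent Classifications table. A short sketch of that arithmetic is below; the variable and function names are illustrative only.

```python
# Risk level per classification, as listed in the Agent Classifications table.
RISK_BY_CLASS = {
    "GHOST": "Critical",
    "CONFIRMED": "High",
    "UNKNOWN": "Medium",
    "SHADOW AI": "Medium",
    "ZOMBIE": "Low",
}

# Counts copied from the example inventory above.
counts = {"CONFIRMED": 2, "UNKNOWN": 3, "SHADOW AI": 0, "ZOMBIE": 0, "GHOST": 1}


def risk_breakdown(counts: dict[str, int]) -> dict[str, int]:
    """Sum agent counts per risk level."""
    totals = {"Critical": 0, "High": 0, "Medium": 0, "Low": 0}
    for classification, n in counts.items():
        totals[RISK_BY_CLASS[classification]] += n
    return totals


print(risk_breakdown(counts))
```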
Daemon Mode
Run continuously as a background service, updating the agent inventory every 30 seconds:
agent-discover-scanner scan-all /path/to/code \
--daemon \
--output /var/log/defendai \
--platform \
--platform-interval 5 # upload to platform every ~2.5 minutes
With --platform, the daemon syncs to the DefendAI platform every N correlation
cycles (default: every 5 cycles, about 2.5 minutes) and always uploads a final snapshot
on shutdown.
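The 2.5 minute figure is the product of the 30-second cycle and the default interval of 5. The sketch below works through that arithmetic. It assumes, without confirmation from the source, that an upload fires on every cycle whose 1-based index is a multiple of the interval; both function names are hypothetical.

```python
CYCLE_SECONDS = 30  # inventory refresh period stated for daemon mode


def upload_period_minutes(platform_interval: int,
                          cycle_seconds: int = CYCLE_SECONDS) -> float:
    """Minutes between platform uploads for a given --platform-interval."""
    return platform_interval * cycle_seconds / 60


def upload_cycles(platform_interval: int, total_cycles: int) -> list[int]:
    """Cycle numbers on which an upload would fire (assumed scheduling rule)."""
    return [c for c in range(1, total_cycles + 1) if c % platform_interval == 0]


print(upload_period_minutes(5))   # default interval
print(upload_cycles(5, 12))       # first 12 cycles, i.e. 6 minutes of runtime
```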
Install as a systemd service:
sudo bash deployment/systemd/install-service.sh /path/to/code
systemctl status defendai-scanner
Customizing Known Applications
By default the scanner classifies common desktop applications (browsers, Office 365, Cursor, Slack, Claude Desktop, etc.) as Shadow AI rather than GHOST when they make AI API calls. To add your own internal tools:
mkdir -p ~/.defendai
echo "my-internal-ai-tool" >> ~/.defendai/known_apps.txt
echo "company-llm-client" >> ~/.defendai/known_apps.txt
See docs/known-apps-example.txt for the full format.
When connected to the DefendAI platform (--platform flag),
the tenant-managed list is downloaded automatically on startup
and merged with your local overrides.
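The merge of tenant-managed and local entries can be thought of as a set union. The sketch below is an assumption-heavy illustration: it treats `known_apps.txt` as one application name per line and skips blank lines and `#` comments, which may not match the format documented in docs/known-apps-example.txt. The helper names and the stand-in tenant list are hypothetical.

```python
import tempfile
from pathlib import Path


def load_known_apps(path: Path) -> set[str]:
    """Read one app name per line; blank lines and '#' comments are skipped (assumed format)."""
    if not path.exists():
        return set()
    names = set()
    for line in path.read_text(encoding="utf-8").splitlines():
        name = line.strip()
        if name and not name.startswith("#"):
            names.add(name)
    return names


def merge_known_apps(tenant_apps: set[str], local_apps: set[str]) -> set[str]:
    """Local overrides extend the tenant list; duplicates collapse."""
    return tenant_apps | local_apps


# Demo against a temporary file rather than the real ~/.defendai/known_apps.txt.
with tempfile.TemporaryDirectory() as tmp:
    local_file = Path(tmp) / "known_apps.txt"
    local_file.write_text("my-internal-ai-tool\ncompany-llm-client\n", encoding="utf-8")
    local_apps = load_known_apps(local_file)

tenant_apps = {"slack", "cursor"}  # stand-in for the downloaded tenant list
merged = merge_known_apps(tenant_apps, local_apps)
print(sorted(merged))
```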
DefendAI Platform Integration
The scanner is the discovery layer. The platform is where discovered agents become governed agents.
agent-discover-scanner scan-all /path/to/code \
--platform \
--api-key YOUR_KEY \
--duration 30
When connected to the platform, each scan triggers the correlation engine which builds a living identity map across every machine, every environment, and every scan:
- Agent Identity Resolution: the same CrewAI agent on a laptop, in staging k8s, and in prod k8s is recognized as one agent at different lifecycle stages
- Behavioral Drift Detection: an agent gained has_code_execution=true since last week? That's a signal, and the platform tracks it.
- Cross-Machine Intelligence: an agent seen on 3 machines that crossed from dev into prod? Automatic risk escalation
- SaaS Blast Radius: the platform aggregates confirmed SaaS connections across all scans and computes a blast radius score
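Behavioral drift of the kind described in this list amounts to diffing an agent's capability flags between two scans. The sketch below shows that comparison on plain dictionaries. It is not the platform's algorithm; the snapshot shape and the `capability_drift` name are hypothetical, and only `has_code_execution` and `has_llm_provider` are flag names that appear in this README.

```python
def capability_drift(previous: dict, current: dict) -> dict:
    """Return {flag: (old, new)} for every flag whose value changed between scans."""
    changed = {}
    for flag in sorted(previous.keys() | current.keys()):
        old, new = previous.get(flag), current.get(flag)
        if old != new:
            changed[flag] = (old, new)
    return changed


last_week = {"has_code_execution": False, "has_llm_provider": True}
today = {"has_code_execution": True, "has_llm_provider": True}

drift = capability_drift(last_week, today)
print(drift)
```

A flag that is missing from one snapshot is reported with `None` on that side, so newly appearing capabilities are surfaced as well as changed ones.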
After a few scans, the DefendAI platform report shows:
Agent Inventory Report - acme-corp
======================================================================
shadow-agent     GHOST   CRITICAL  anthropic, github  blast: 85  machines: 3
  -> GHOST seen in production - action required
crewai-agent     SHADOW  MEDIUM    openai             blast: 25  machines: 1
  -> Unreviewed - no governance record
langchain-agent  KNOWN   LOW       openai             blast: 15  machines: 1
  -> Approved - monitoring active
======================================================================
3 agents total · 1 critical · 1 unreviewed · 1 governed
CI/CD Integration
# .github/workflows/agent-scan.yml
- name: Scan for AI Agents
  run: |
    pip install agent-discover-scanner
    agent-discover-scanner …
    …
    sarif_file: results.sarif
Commands
# Full scan (recommended): all 4 layers + correlation
agent-discover-scanner scan-all PATH [OPTIONS]
  --duration/-d SECONDS    Network and K8s monitor observation window [default: 60]
  --output/-o PATH         Output directory for scan results [default: defendai-results]
  --format/-f TEXT         Output format: text|json|sarif [default: text]
  --layer3-file PATH       Use existing Tetragon JSONL output (skip live Layer 3)
  --skip-layers TEXT       Comma-separated layers to skip, e.g. '3' or '2,3'
  --daemon                 Run continuously, re-scanning every 30 seconds
  --platform               Upload results to DefendAI platform after scan
  --api-key TEXT           DefendAI platform API key
  --tenant-token TEXT      DefendAI platform tenant token
  --wawsdb-url TEXT        DefendAI platform base URL [default: https://wauzeway.defendai.ai]
  --platform-interval INT  Upload every N correlation cycles in daemon mode [default: 5]
  --max-log-size INT       Rotate output files at this size in MB [default: 50]
  --max-log-backups INT    Rotated backup files to keep [default: 5]
# Individual layers
agent-discover-scanner scan PATH # Layer 1: source code only
agent-discover-scanner deps PATH # Dependency scanning
agent-discover-scanner monitor # Layer 2: network monitor only
agent-discover-scanner monitor-k8s # Layer 3: Kubernetes runtime only
agent-discover-scanner endpoint # Layer 4: endpoint scan only
agent-discover-scanner correlate # Correlate existing scan outputs
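When the scanner is driven from a script, it helps to assemble the argument list programmatically rather than by string concatenation. The sketch below builds a `scan-all` invocation from the options listed above. The `build_scan_all` helper is hypothetical, and it assumes the skip option is spelled `--skip-layers` like the other long options.

```python
import shlex
from typing import Optional, Sequence


def build_scan_all(path: str, duration: int = 60, fmt: str = "text",
                   output: str = "defendai-results",
                   skip_layers: Optional[Sequence[int]] = None) -> list[str]:
    """Assemble an argv list for `agent-discover-scanner scan-all`."""
    argv = [
        "agent-discover-scanner", "scan-all", path,
        "--duration", str(duration),
        "--format", fmt,
        "--output", output,
    ]
    if skip_layers:
        # Layers are passed as a comma-separated list, e.g. "2,3".
        argv += ["--skip-layers", ",".join(str(n) for n in skip_layers)]
    return argv


argv = build_scan_all("./my-repo", duration=30, fmt="json", skip_layers=[3])
print(shlex.join(argv))
```

The resulting list can be handed to `subprocess.run(argv, check=True)`, which avoids shell quoting problems with unusual paths.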
Detected Frameworks & Providers
AI Frameworks: LangChain, LangGraph, CrewAI, AutoGen, direct HTTP LLM clients
LLM Providers: OpenAI, Anthropic, Google Gemini / Google AI, Mistral, Cohere, Azure OpenAI, AWS Bedrock, Groq, DeepSeek
Vector Stores: Pinecone, Weaviate, Qdrant, Chroma
SaaS Blast Radius Detection (v2.3.0+): Salesforce, Slack, GitHub, GitLab, Jira, HubSpot, Notion, Airtable, Stripe, Twilio, Snowflake, Databricks, AWS, GCP, Azure, PostgreSQL, Redis, MongoDB
Try the Demo
git clone https://github.com/Defend-AI-Tech-Inc/agent-discover-scanner
cd agent-discover-scanner/demo
./setup.sh # deploys LangChain, CrewAI, and a shadow agent to local Kubernetes
agent-discover-scanner scan-all ./sample-repo --duration 60
Expected output: 2 CONFIRMED agents (crewai-agent, langchain-agent), 1 GHOST agent (shadow-agent: runtime activity, no source code).
Requirements
| Capability | Requirement |
|---|---|
| Code scanning | Python 3.10+, no additional dependencies |
| Network monitoring | Python 3.10+, root/sudo |
| Kubernetes runtime | kubectl, Helm 3+, root/sudo |
| Endpoint discovery | Python 3.10+, osquery (optional; degrades gracefully without it) |
| Platform upload | DefendAI API key ([defendai.ai](https://defendai.ai)) |
Full Kubernetes setup: install.sh handles Helm, runtime monitoring setup, and permissions automatically.
DefendAI Platform
AgentDiscover Scanner is the discovery layer of the DefendAI platform.
| Component | Status | Description |
|---|---|---|
| AgentDiscover Scanner | Open Source | Discover and classify AI agents across your environment |
| defendai-agent | Beta | MITM proxy for real-time AI traffic inspection and policy enforcement |
| Correlation Engine | Available | Cross-machine identity resolution and behavioral drift detection |
| Policy Engine | Coming Soon | Define and enforce agent behavior rules |
| DefendAI Platform | Enterprise | Full lifecycle governance for autonomous AI |

defendai.ai · playground.defendai.ai · [support@defendai.ai](mailto:support@defendai.ai)
Contributing
git clone https://github.com/Defend-AI-Tech-Inc/agent-discover-scanner.git
cd agent-discover-scanner
uv sync
uv run pytest tests/ -v
See CONTRIBUTING.md for guidelines. Issues and PRs welcome.
License
MIT: free to use, deploy, and modify.
Built by DefendAI · Securing the future of autonomous AI