
Discover and inventory autonomous AI agents across your infrastructure - static analysis, runtime detection, and Kubernetes monitoring

Project description

AgentDiscover Scanner

Open-Source AI Agent Discovery for the Enterprise

License: MIT · Python 3.10+ · PyPI · PRs Welcome

Part of the DefendAI platform for autonomous AI governance


The Finding That Matters

👻 GHOST AGENT DETECTED
   Workload:   trading-bot (Deployment/default)
   Connected:  api.openai.com - LIVE
   SaaS:       openai - confirmed active connection
   Source code: None found in scanned repositories
   Owner:      Unknown - no deployment record, no code review

👻 GHOST AGENT DETECTED
   Workload:   shadow-agent (Pod/kube-system)
   Connected:  api.anthropic.com - LIVE
   SaaS:       anthropic - confirmed  |  gcp - active socket
   Blast radius: HIGH (cloud provider access confirmed)
   Source code: None found in scanned repositories
   Owner:      Unknown - no deployment record, no code review

An AI system is making real API calls (consuming tokens, potentially accessing sensitive data) and your engineering team has no record of it. No code, no deployment, no owner. AgentDiscover Scanner finds these in under 60 seconds.

That's the problem. Your engineering team thinks they know what AI systems are running. They don't.


What Makes This Different

Most security tools tell you what's in your code. AgentDiscover Scanner tells you what's actually running and, crucially, what's running that has no business being there.

The GHOST classification is unique: an AI system observed making real API calls with zero corresponding source code. Static analysis alone cannot find this, and no SIEM will alert on it. It only appears when you watch the runtime and cross-reference it against your codebase at the same time.

As of v2.3.0, every detected agent also carries a SaaS blast radius: a live-observed map of which services it's actively connected to, derived from network traffic, not just configuration files.

crewai-agent (CONFIRMED)
  saas_connections:
    anthropic: confirmed  ← active_connection observed
    github:    medium     ← open socket
  risk_flags: [cloud_credentials_present]
  blast_radius: 70/100

Agent Classifications

Classification | What It Means                              | Risk
---------------|--------------------------------------------|---------
👻 GHOST       | Runtime AI activity, no source code found  | Critical
✅ CONFIRMED   | Detected in code AND observed running      | High
⚠️ UNKNOWN     | Found in code, not yet observed at runtime | Medium
🖥️ SHADOW AI   | Known app using AI without governance      | Medium
☠️ ZOMBIE      | Was active, no longer observed             | Low

GHOST agents are the most dangerous finding: real API calls, real token consumption, and possible access to sensitive data, with no code, no deployment record, and no owner behind them.


Quick Start

pip install agent-discover-scanner
agent-discover-scanner scan-all /path/to/your/code --duration 30

For Kubernetes environments:

curl -fsSL https://raw.githubusercontent.com/Defend-AI-Tech-Inc/agent-discover-scanner/main/install.sh | sudo bash
agent-discover-scanner scan-all /path/to/code --daemon --output /var/log/defendai

To upload results to the DefendAI platform:

agent-discover-scanner scan-all /path/to/code \
  --platform \
  --api-key YOUR_API_KEY

How It Works

AgentDiscover Scanner runs four detection layers simultaneously and correlates them into a single agent inventory. Each layer sees something the others can't.

Layer 1: Source Code Analysis

Static analysis of Python and JavaScript/TypeScript. Detects LangChain, LangGraph, CrewAI, AutoGen, direct OpenAI/Anthropic/Gemini API usage, and any HTTP client targeting LLM endpoints. Handles import aliasing and indirect usage patterns. Generates SARIF output for CI/CD integration.
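To illustrate why alias handling matters, here is a minimal sketch using Python's standard ast module that maps each locally bound name from an AI-framework import back to its full module path, so that `import openai as oa` still counts as OpenAI usage. The AI_MODULES prefix list is a hypothetical subset chosen for the example; the scanner's own rule set and its JavaScript/TypeScript handling are not shown.

```python
import ast

# Hypothetical subset of module prefixes treated as AI frameworks/SDKs.
AI_MODULES = ("langchain", "langgraph", "crewai", "autogen",
              "openai", "anthropic", "google.generativeai")


def _is_ai_module(name: str) -> bool:
    # Match whole module names only, so "openai_helpers" is not flagged.
    return any(name == m or name.startswith(m + ".") for m in AI_MODULES)


def find_ai_imports(source: str) -> dict[str, str]:
    """Map each local name bound by an AI-related import to the path it refers to."""
    found: dict[str, str] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.Import):
            for alias in node.names:
                if _is_ai_module(alias.name):
                    # `import openai as oa` binds "oa"; `import langchain.llms` binds "langchain"
                    found[alias.asname or alias.name.split(".")[0]] = alias.name
        elif isinstance(node, ast.ImportFrom) and node.level == 0 and node.module:
            if _is_ai_module(node.module):
                for alias in node.names:
                    # `from crewai import Agent as A` binds "A" to crewai.Agent
                    found[alias.asname or alias.name] = f"{node.module}.{alias.name}"
    return found
```

With the local names resolved, a later pass could attribute call sites such as `oa.chat.completions.create(...)` to OpenAI even though the word "openai" never appears at the call.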

Layer 2: Live Network Monitoring

Passive observation of outbound connections to AI providers (OpenAI, Anthropic, Google Gemini, Mistral, Cohere, Azure OpenAI, AWS Bedrock) and vector stores. No packet capture. Identifies which process is making each connection, enabling per-agent SaaS attribution.
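Per-agent SaaS attribution comes down to joining each observed connection with the process that owns it. A minimal sketch, assuming connections have already been collected from the OS connection table as (pid, process name, remote hostname) tuples; the PROVIDER_HOSTS table, provider_for(), and attribute() are hypothetical, and resolving remote IPs to hostnames is out of scope here.

```python
from collections import defaultdict
from typing import Iterable, Optional

# Hypothetical hostname table; a real deployment would need a much longer list.
PROVIDER_HOSTS = {
    "api.openai.com": "openai",
    "api.anthropic.com": "anthropic",
    "generativelanguage.googleapis.com": "gemini",
    "api.mistral.ai": "mistral",
    "openai.azure.com": "azure_openai",  # Azure OpenAI resources live under <name>.openai.azure.com
}


def provider_for(host: str) -> Optional[str]:
    """Return the provider whose hostname equals, or is a parent domain of, host."""
    host = host.lower().rstrip(".")
    for suffix, provider in PROVIDER_HOSTS.items():
        if host == suffix or host.endswith("." + suffix):
            return provider
    return None


def attribute(connections: Iterable[tuple[int, str, str]]) -> dict[str, set[str]]:
    """Group AI providers by owning process, keyed as 'name[pid]'."""
    by_process: dict[str, set[str]] = defaultdict(set)
    for pid, name, host in connections:
        provider = provider_for(host)
        if provider is not None:  # non-AI destinations are ignored
            by_process[f"{name}[{pid}]"].add(provider)
    return dict(by_process)
```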

Layer 3: Kubernetes Runtime (eBPF)

Kernel-level visibility into pod behavior via Tetragon. Identifies which workloads are actively making AI calls โ€” including workloads with no corresponding source code. Works with any CNI. Falls back to Kubernetes API discovery if Tetragon is unavailable.

Layer 4: Endpoint Discovery

Scans developer machines, CI/CD runners, and workstations via osquery. Finds installed AI packages, desktop AI applications (ChatGPT Desktop, Claude Desktop, Cursor, GitHub Copilot), active connections, browser-based AI usage, and VSCode extensions.

SaaS Blast Radius Detection (v2.3.0+)

After correlation, each agent receives a saas_connections profile built from all four layers:

{
  "detected":  ["anthropic", "gcp", "github"],
  "confirmed": ["anthropic"],
  "evidence": {
    "anthropic": ["active_connection", "open_socket"],
    "gcp":       ["open_socket"],
    "github":    ["vscode_extension_detected"]
  },
  "confidence": {
    "anthropic": "confirmed",
    "gcp":       "medium",
    "github":    "medium"
  },
  "has_cloud_provider": true,
  "has_llm_provider":   true
}

A confidence of confirmed means the connection was live-observed during the scan, not inferred from config files. This is the difference between "this agent is configured to use Anthropic" and "this agent is calling Anthropic right now."
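The example profile above is consistent with a simple rule: a service is confirmed when its evidence includes active_connection, and medium otherwise. The sketch below rebuilds that JSON from the evidence map alone under exactly that assumption; the CLOUD_SERVICES and LLM_SERVICES sets and build_profile() are hypothetical names, and the real scanner may use more confidence levels and evidence types.

```python
# Hypothetical service groupings used for the two summary flags.
CLOUD_SERVICES = {"aws", "gcp", "azure"}
LLM_SERVICES = {"openai", "anthropic", "gemini", "mistral", "cohere", "azure_openai", "bedrock"}


def build_profile(evidence: dict[str, list[str]]) -> dict:
    """Derive a saas_connections-style profile from per-service evidence lists."""
    # Assumption: only a live-observed connection upgrades a service to "confirmed".
    confidence = {
        service: "confirmed" if "active_connection" in kinds else "medium"
        for service, kinds in evidence.items()
    }
    detected = sorted(evidence)
    return {
        "detected": detected,
        "confirmed": [s for s in detected if confidence[s] == "confirmed"],
        "evidence": evidence,
        "confidence": confidence,
        "has_cloud_provider": any(s in CLOUD_SERVICES for s in detected),
        "has_llm_provider": any(s in LLM_SERVICES for s in detected),
    }
```

Fed the evidence map from the example (anthropic with active_connection and open_socket, gcp with open_socket, github with vscode_extension_detected), this reproduces the detected, confirmed, confidence, and both has_* fields shown.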


Example Output

๐Ÿ” Scanning for autonomous AI agents...
๐Ÿ“‚ Analyzing source code at ./my-repo
๐ŸŒ Monitoring live network connections...
โ˜ธ๏ธ  Monitoring Kubernetes workloads...
๐Ÿ’ป Scanning endpoints...
๐Ÿ”— Correlating findings...
โœ“ Correlation complete

๐Ÿค– Autonomous Agent Inventory

 Classification  | Count | Description
-----------------|-------|--------------------------------------------------
 CONFIRMED       |   2   | Active: detected in code and observed at runtime
 UNKNOWN         |   3   | Code found, not yet observed at runtime
 SHADOW AI       |   0   | Known app using AI; review for governance
 ZOMBIE          |   0   | Inactive: code exists but no recent activity
 GHOST           |   1   | ⚠ Critical: runtime activity with no source code

Risk Breakdown:
  โ— Critical: 1
  โ— High:     2
  โ— Medium:   3
  โ— Low:      0

โœ… Scan complete โ€” results saved to ./defendai-results
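The Risk Breakdown follows directly from the classification counts and the Risk column of the classification table (GHOST is Critical, CONFIRMED is High, UNKNOWN and SHADOW AI are Medium, ZOMBIE is Low). A minimal Python sketch of that tally; risk_breakdown() is a hypothetical helper, not part of the scanner's CLI.

```python
# Risk levels taken from the Risk column of the classification table.
RISK_BY_CLASS = {
    "GHOST": "Critical",
    "CONFIRMED": "High",
    "UNKNOWN": "Medium",
    "SHADOW AI": "Medium",
    "ZOMBIE": "Low",
}


def risk_breakdown(counts: dict[str, int]) -> dict[str, int]:
    """Sum classification counts into risk buckets, keeping empty buckets at 0."""
    tally = dict.fromkeys(("Critical", "High", "Medium", "Low"), 0)
    for classification, n in counts.items():
        tally[RISK_BY_CLASS[classification]] += n
    return tally


# Counts from the example inventory above.
print(risk_breakdown({"CONFIRMED": 2, "UNKNOWN": 3, "SHADOW AI": 0, "ZOMBIE": 0, "GHOST": 1}))
# {'Critical': 1, 'High': 2, 'Medium': 3, 'Low': 0}
```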

Daemon Mode

Run continuously as a background service, updating the agent inventory every 30 seconds:

agent-discover-scanner scan-all /path/to/code \
  --daemon \
  --output /var/log/defendai \
  --platform \
  --platform-interval 5    # upload to platform every ~2.5 minutes

With --platform, the daemon syncs to the DefendAI platform every N correlation cycles (default: every 5 cycles, ≈ 2.5 minutes) and always uploads a final snapshot on shutdown.
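The upload cadence is simple arithmetic: 5 cycles × 30 seconds = 150 seconds, about 2.5 minutes. A small sketch of the resulting schedule, assuming shutdown happens at a cycle boundary and that the final snapshot is not repeated when the last cycle already uploaded; both assumptions and the upload_schedule() name are illustrative only.

```python
def upload_schedule(cycle_seconds: int, platform_interval: int, total_cycles: int) -> list[int]:
    """Elapsed seconds at which the daemon would upload, under the stated assumptions."""
    # Periodic uploads after every `platform_interval`-th correlation cycle.
    times = [c * cycle_seconds
             for c in range(platform_interval, total_cycles + 1, platform_interval)]
    # Final snapshot at shutdown, unless that cycle already uploaded.
    shutdown = total_cycles * cycle_seconds
    if not times or times[-1] != shutdown:
        times.append(shutdown)
    return times


print(upload_schedule(30, 5, 12))  # [150, 300, 360]
```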

Install as a systemd service:

sudo bash deployment/systemd/install-service.sh /path/to/code
systemctl status defendai-scanner

Customizing Known Applications

By default the scanner classifies common desktop applications (browsers, Office 365, Cursor, Slack, Claude Desktop, etc.) as Shadow AI rather than GHOST when they make AI API calls. To add your own internal tools:

mkdir -p ~/.defendai
echo "my-internal-ai-tool" >> ~/.defendai/known_apps.txt
echo "company-llm-client" >> ~/.defendai/known_apps.txt

See docs/known-apps-example.txt for the full format.

When connected to the DefendAI platform (--platform flag), the tenant-managed list is downloaded automatically on startup and merged with your local overrides.
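A minimal sketch of how a local known_apps.txt could be merged with a tenant-managed list and used to separate Shadow AI from GHOST findings. The one-name-per-line format matches the echo commands above; treating lines that start with # as comments, matching names case-insensitively, and the helper names are assumptions, since the authoritative format lives in docs/known-apps-example.txt.

```python
from pathlib import Path
from typing import Iterable, Optional


def load_known_apps(local_file: Path, tenant_list: Optional[Iterable[str]] = None) -> set[str]:
    """Merge a tenant-managed list with local one-name-per-line overrides."""
    names = {n.strip().lower() for n in (tenant_list or []) if n.strip()}
    if local_file.exists():  # a missing local file just means no overrides
        for line in local_file.read_text(encoding="utf-8").splitlines():
            line = line.strip()
            if line and not line.startswith("#"):  # assumption: '#' starts a comment
                names.add(line.lower())            # assumption: case-insensitive matching
    return names


def classify_runtime_only(process_name: str, known: set[str]) -> str:
    """Runtime AI activity with no source code: known apps are Shadow AI, the rest GHOST."""
    return "SHADOW AI" if process_name.lower() in known else "GHOST"
```

A caller would typically pass Path.home() / ".defendai" / "known_apps.txt" as local_file and the downloaded tenant list as tenant_list.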


DefendAI Platform Integration

The scanner is the discovery layer. The platform is where discovered agents become governed agents.

agent-discover-scanner scan-all /path/to/code \
  --platform \
  --api-key YOUR_KEY \
  --duration 30

When connected to the platform, each scan triggers the correlation engine, which builds a living identity map across every machine, every environment, and every scan:

  • Agent Identity Resolution: the same CrewAI agent on a laptop, in staging k8s, and in prod k8s is recognized as one agent at different lifecycle stages
  • Behavioral Drift Detection: an agent that gained has_code_execution=true since last week is a signal, and the platform tracks it
  • Cross-Machine Intelligence: an agent seen on 3 machines that has crossed from dev into prod triggers automatic risk escalation
  • SaaS Blast Radius: the platform aggregates confirmed SaaS connections across all scans and computes a blast radius score
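Behavioral drift detection can be thought of as a diff between two capability snapshots of the same agent. The sketch below reports flags that became true since the previous scan; apart from has_code_execution, which appears above, the flag names and the capability_drift() helper are hypothetical.

```python
def capability_drift(before: dict[str, bool], after: dict[str, bool]) -> list[str]:
    """Capability flags that are true now but were false or absent in the previous snapshot."""
    return sorted(flag for flag, value in after.items()
                  if value and not before.get(flag, False))


last_week = {"has_code_execution": False, "has_web_access": True}
this_week = {"has_code_execution": True, "has_web_access": True, "has_file_write": True}
print(capability_drift(last_week, this_week))  # ['has_code_execution', 'has_file_write']
```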

After a few scans, the DefendAI platform report shows:

Agent Inventory Report โ€” acme-corp
─────────────────────────────
 shadow-agent    GHOST     CRITICAL   anthropic, github   blast: 85   machines: 3
                           ↑ GHOST seen in prod, action required

 crewai-agent    SHADOW    MEDIUM     openai              blast: 25   machines: 1
                           ↑ Unreviewed: no governance record

 langchain-agent KNOWN     LOW        openai              blast: 15   machines: 1
                           ↑ Approved: monitoring active
───────────────────────────────────────────────────────────────────────
 3 agents total · 1 critical · 1 unreviewed · 1 governed

CI/CD Integration

# .github/workflows/agent-scan.yml
- name: Scan for AI Agents
  run: |
    pip install agent-discover-scanner
    agent-discover-f_file: results.sarif

Commands

# Full scan (recommended): all 4 layers + correlation
agent-discover-scanner scan-all PATH [OPTIONS]
  --duration/-d SECONDS      Network and K8s monitor observation window [default: 60]
  --output/-o PATH           Output directory for scan results [default: defendai-results]
  --format/-f TEXT           Output format: text|json|sarif [default: text]
  --layer3-file PATH         Use existing Tetragon JSONL output (skip live Layer 3)
  --skip-layers TEXT         Comma-separated layers to skip, e.g. '3' or '2,3'
  --daemon                   Run continuously, re-scanning every 30 seconds
  --platform                 Upload results to DefendAI platform after scan
  --api-key TEXT             DefendAI platform API key
  --tenant-token TEXT        DefendAI platform tenant token
  --wawsdb-url TEXT          DefendAI platform base URL [default: https://wauzeway.defendai.ai]
  --platform-interval INT    Upload every N correlation cycles in daemon mode [default: 5]
  --max-log-size INT         Rotate output files at this size in MB [default: 50]
  --max-log-backups INT      Rotated backup files to keep [default: 5]

# Individual layers
agent-discover-scanner scan PATH              # Layer 1: source code only
agent-discover-scanner deps PATH              # Dependency scanning
agent-discover-scanner monitor                # Layer 2: network monitor only
agent-discover-scanner monitor-k8s            # Layer 3: Kubernetes runtime only
agent-discover-scanner endpoint               # Layer 4: endpoint scan only
agent-discover-scanner correlate              # Correlate existing scan outputs
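The --max-log-size and --max-log-backups options describe size-based rotation. Python's standard library offers the same policy through logging.handlers.RotatingFileHandler, and the sketch below uses it to show what a 50 MB / 5 backup setting means in practice. This is an analogy, not the scanner's implementation; make_rotating_writer() is a hypothetical helper, and treating 1 MB as 1024 × 1024 bytes is an assumption.

```python
import logging
from logging.handlers import RotatingFileHandler


def make_rotating_writer(path: str, max_size_mb: float = 50, backups: int = 5) -> logging.Logger:
    """Return a logger whose file rolls over once it would exceed max_size_mb,
    keeping at most `backups` older files (path.1 ... path.N)."""
    handler = RotatingFileHandler(
        path,
        maxBytes=int(max_size_mb * 1024 * 1024),  # assumption: 1 MB = 1024 * 1024 bytes
        backupCount=backups,
        encoding="utf-8",
    )
    handler.setFormatter(logging.Formatter("%(message)s"))  # write records verbatim
    logger = logging.getLogger(f"scan-output:{path}")
    logger.setLevel(logging.INFO)
    logger.propagate = False  # keep scan output out of the root logger
    logger.addHandler(handler)
    return logger
```

With the defaults, at most six files exist at once (the active file plus five backups), so output on disk stays under roughly 300 MB.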

Detected Frameworks & Providers

AI Frameworks: LangChain, LangGraph, CrewAI, AutoGen, direct HTTP LLM clients

LLM Providers: OpenAI, Anthropic, Google Gemini / Google AI, Mistral, Cohere, Azure OpenAI, AWS Bedrock, Groq, DeepSeek

Vector Stores: Pinecone, Weaviate, Qdrant, Chroma

SaaS Blast Radius Detection (v2.3.0+): Salesforce, Slack, GitHub, GitLab, Jira, HubSpot, Notion, Airtable, Stripe, Twilio, Snowflake, Databricks, AWS, GCP, Azure, PostgreSQL, Redis, MongoDB


Try the Demo

git clone https://github.com/Defend-AI-Tech-Inc/agent-discover-scanner
cd agent-discover-scanner/demo
./setup.sh    # deploys LangChain, CrewAI, and a shadow agent to local Kubernetes
agent-discover-scanner scan-all ./sample-repo --duration 60

Expected output: 2 CONFIRMED agents (crewai-agent, langchain-agent) and 1 GHOST agent (shadow-agent: runtime activity, no source code).


Requirements

Capability         | Requirement
-------------------|--------------------------------------------------------
Code scanning      | Python 3.10+, no additional dependencies
Network monitoring | Python 3.10+, root/sudo
Kubernetes runtime | kubectl, Helm 3+, root/sudo
Endpoint discovery | Python 3.10+, osquery (optional; graceful degradation)
Platform upload    | DefendAI API key (https://defendai.ai)

Full Kubernetes setup: install.sh handles Helm, runtime monitoring setup, and permissions automatically.


DefendAI Platform

AgentDiscover Scanner is the discovery layer of the DefendAI platform.

Component             | Status         | Description
----------------------|----------------|---------------------------------------------------------------------
AgentDiscover Scanner | ✅ Open Source | Discover and classify AI agents across your environment
defendai-agent        | 🧪 Beta        | MITM proxy for real-time AI traffic inspection and policy enforcement
Correlation Engine    | ✅ Available   | Cross-machine identity resolution and behavioral drift detection
Policy Engine         | 🚧 Coming Soon | Define and enforce agent behavior rules
DefendAI Platform     | 💼 Enterprise  | Full lifecycle governance for autonomous AI

defendai.ai · playground.defendai.ai · support@defendai.ai

Contributing

git clone https://github.com/Defend-AI-Tech-Inc/agent-discover-scanner.git
cd agent-discover-scanner
uv sync
uv run pytest tests/ -v

See CONTRIBUTING.md for guidelines. Issues and PRs welcome.


License

MIT: free to use, deploy, and modify.


Built by DefendAI · Securing the future of autonomous AI



Download files


Source Distribution

agent_discover_scanner-2.4.0.tar.gz (168.8 kB)

Uploaded Source

Built Distribution


agent_discover_scanner-2.4.0-py3-none-any.whl (95.7 kB)

Uploaded Python 3

File details

Details for the file agent_discover_scanner-2.4.0.tar.gz.

File metadata

  • Download URL: agent_discover_scanner-2.4.0.tar.gz
  • Size: 168.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.18

File hashes

Hashes for agent_discover_scanner-2.4.0.tar.gz
Algorithm Hash digest
SHA256 70a2d02e19ca636c02a8657a74067db57891d7332cfe9e72df628fabe2b31d14
MD5 af3dca0b23924c7ba6b3cb3850730202
BLAKE2b-256 882c21e01974433470f86ff64ba514d2e8e4ce9290872b2d3fe1d932e454c5ce


File details

Details for the file agent_discover_scanner-2.4.0-py3-none-any.whl.

File metadata

  • Download URL: agent_discover_scanner-2.4.0-py3-none-any.whl
  • Size: 95.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.9.18

File hashes

Hashes for agent_discover_scanner-2.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b00939ef47645c8a2e16f3fec7fc4d44d0e90e4d2b686703e3ee458081f9956f
MD5 6448be0737a000113c95b5abb6b474c3
BLAKE2b-256 cf22a716113054082eea3fc4e24b4d929d0a2f246991e7876e1d5f3ee0a98e1f

