Comprehensive analysis system for VS Code/Copilot Chat sessions with behavioral signal extraction and heat scoring

VS Code Ark

A complete analysis system for VS Code + Copilot Chat sessions that turns raw editor activity into behavioral signals, semantic intelligence, and a local web dashboard.

✨ Key Benefits

  • Behavioral signal intelligence for Copilot Chat sessions.
  • Heat scoring to surface friction, recovery points, and session quality.
  • Semantic search across session transcripts, code symbols, and tool calls.
  • Background web UI with structured panels, alerts, and session drilldown.
  • Live watcher daemon to keep session analytics current.
  • Exportable data for JSON, JSONL, and text workflows.

🚀 Installation

Prerequisites

  • Python 3.8+
  • VS Code with the Copilot Chat extension installed

Install from PyPI

pip install vscode-ark

Install with pipx

pipx install vscode-ark

Install from source

git clone https://github.com/goCosmix/vscode-ark.git
cd vscode-ark
pip install -e .

Install development dependencies

pip install -e ".[dev]"
# or
make install-dev

The cda console command is installed into your active Python environment's bin directory. Activate your virtual environment before running cda.

⚡ Quick Start

  1. Initialize the database
cda sync
  2. Start the watcher daemon
cda watch start
  3. Inspect the PMF runtime services
cda pmf services
  4. Build semantic intelligence
cda embed build
  5. Start the web UI
cda ui start
  6. Open your browser

Visit http://127.0.0.1:10001

๐ŸŒ Web UI

  • Background service: cda ui start
  • Stop service: cda ui stop
  • Service status: cda ui status
  • Foreground mode: cda serve

The web UI includes:

  • Session drilldown panels and charts
  • Behavioral signal summaries
  • Alert and recommendation views
  • Searchable transcript and tool-call detail
  • File/VFS browsing and raw session inspection

🧠 Core Features

  • Behavioral signals with 200+ keyword patterns across six categories
  • Frustration heat scoring and recovery analytics
  • Full-text search and semantic search with embeddings
  • Code symbol indexing for Python/JS/TS
  • Incremental ingestion with crash-resilient queue replay
  • Export workflows for JSON, JSONL, and text

📦 Package and Release

  • Published on PyPI as vscode-ark
  • Current release version: 2.0.0
  • CLI entry point: cda
  • License: MIT

🛣 Roadmap

See docs/ROADMAP.md for product direction, milestone planning, and release priorities.

๐Ÿค Contributing

See CONTRIBUTING.md for development setup, test guidance, and PR workflow.

📜 License

This project is licensed under the MIT License.

🧠 SQLite limits and mitigation

  • Single writer in WAL mode: the system uses one writer process for ingest/reconstruct/extract/embed and allows many concurrent readers via SQLite WAL.
  • Large VFS blob handling: for very large raw artifacts, the clean approach is chunked storage or external file references instead of a single enormous BLOB.
  • Default 4 KB page size (SQLite's default) / cache: the code sets PRAGMA cache_size=-2000 (about 2 MB of page cache), PRAGMA mmap_size=268435456 (256 MiB), and PRAGMA temp_store=MEMORY to improve read/cache performance on larger databases.
  • Further tuning: rebuild the DB with a larger page size (e.g. PRAGMA page_size=32768) if you need more efficient storage for very large session history.
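
The tuning steps above can be sketched with Python's standard sqlite3 module. This is a minimal illustration, not the project's actual code: the function names open_tuned and rebuild_with_page_size are hypothetical. It relies on documented SQLite behavior: a new page_size only takes effect at the next VACUUM, and only while the database is not in WAL mode, so the rebuild switches the journal mode temporarily.

```python
import sqlite3


def open_tuned(path: str) -> sqlite3.Connection:
    """Open a database in WAL mode with the PRAGMAs listed above (hypothetical helper)."""
    # isolation_level=None puts the connection in autocommit mode,
    # so PRAGMA and VACUUM statements never run inside an implicit transaction.
    conn = sqlite3.connect(path, isolation_level=None)
    conn.execute("PRAGMA journal_mode=WAL")       # one writer, many readers
    conn.execute("PRAGMA cache_size=-2000")       # negative value = size in KiB
    conn.execute("PRAGMA mmap_size=268435456")    # 256 MiB memory-mapped I/O
    conn.execute("PRAGMA temp_store=MEMORY")      # keep temp tables in RAM
    return conn


def rebuild_with_page_size(path: str, page_size: int = 32768) -> int:
    """Rebuild the database file with a new page size and return the resulting size."""
    conn = sqlite3.connect(path, isolation_level=None)
    try:
        # page_size cannot change while in WAL mode, so leave WAL first.
        conn.execute("PRAGMA journal_mode=DELETE")
        conn.execute(f"PRAGMA page_size={int(page_size)}")
        conn.execute("VACUUM")                    # rewrites the file with the new page size
        conn.execute("PRAGMA journal_mode=WAL")   # restore WAL for normal operation
        return conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()
```

Run the rebuild only while the watcher daemon and any other writers are stopped, since leaving WAL mode requires exclusive access to the database file.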

🔧 Configuration

  • VS Code Data Directory: By default, assumes macOS paths (~/Library/Application Support/Code/User). Override with export VSCODE_DATA_DIR=/path/to/vscode/data (e.g., on Linux: ~/.config/Code/User).
  • No other config needed: Everything is CLI-driven with local SQLite.

๐Ÿ—๏ธ Architecture

VS Code Storage โ†’ ingest.py โ†’ vfs + sessions + transcripts
                      โ†“
               reconstruct.py โ†’ exchanges (structured conversations)
                      โ†“
               extract.py โ†’ signals + tokens + heat scores + analysis
                      โ†“
               embed.py โ†’ semantic embeddings + summaries + alerts
                      โ†“
               watcher.py โ†’ live sync + FTS indexing + queue resilience
                      โ†“
               cda โ†’ query interface + policy enforcement

Core Components

| Component      | Purpose                 | Key Features                                                    |
|----------------|-------------------------|-----------------------------------------------------------------|
| ingest.py      | Data ingestion          | VFS storage, gzip compression, session metadata                 |
| reconstruct.py | Conversation processing | Exchange threading, tool call linking, FTS indexing             |
| extract.py     | Signal analysis         | Behavioral pattern recognition, heat scoring, token accounting  |
| watcher.py     | Live monitoring         | File watching, incremental updates, crash recovery              |
| cda            | Query interface         | 25+ commands, policy filtering, rich formatting                 |

Database Schema

  • workspaces - VS Code workspace metadata
  • sessions - Chat session information and metadata
  • vfs - Gzip-compressed file storage with SHA256 hashes
  • exchanges - Structured conversation turns with tool calls
  • exchange_signals - Behavioral signal annotations
  • symbols - Code symbol index (functions, classes, etc.)
  • token_usage - Per-request token consumption tracking
  • compactions - Context window summarization events
  • session_analysis - Aggregated session metrics and heat scores
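
As a rough illustration of what "gzip-compressed file storage with SHA256 hashes" can look like, the sketch below stores and retrieves a blob with integrity checking. The table name vfs_demo, its columns, and both function names are assumptions made for this example; the real vfs schema is not documented here. Hashing the uncompressed bytes is also an assumption.

```python
import gzip
import hashlib
import sqlite3

# Hypothetical demo schema; the project's real vfs table may differ.
SCHEMA = """
CREATE TABLE IF NOT EXISTS vfs_demo (
    path    TEXT PRIMARY KEY,
    sha256  TEXT NOT NULL,
    content BLOB NOT NULL
)
"""


def store_blob(conn: sqlite3.Connection, path: str, data: bytes) -> str:
    """Compress data, store it under path, and return the SHA256 of the raw bytes."""
    digest = hashlib.sha256(data).hexdigest()
    conn.execute(
        "INSERT OR REPLACE INTO vfs_demo (path, sha256, content) VALUES (?, ?, ?)",
        (path, digest, gzip.compress(data)),
    )
    conn.commit()
    return digest


def load_blob(conn: sqlite3.Connection, path: str) -> bytes:
    """Load and decompress a blob, verifying it against the stored hash."""
    row = conn.execute(
        "SELECT sha256, content FROM vfs_demo WHERE path = ?", (path,)
    ).fetchone()
    if row is None:
        raise KeyError(path)
    data = gzip.decompress(row[1])
    if hashlib.sha256(data).hexdigest() != row[0]:
        raise ValueError(f"hash mismatch for {path}")
    return data
```

Storing the hash alongside the content makes it cheap to skip re-ingesting files that have not changed between syncs.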

๐Ÿ–ฅ๏ธ CLI Reference

Core Commands

# System Management
cda status              # Show daemon status and queue information
cda stats               # System-wide statistics and coverage
cda sync                # Full data ingestion and rebuild
cda reconstruct         # Rebuild conversations and search index
cda pmf services        # List embedded PMF runtime services
cda pmf status [service] # Show runtime status for PMF services
cda pmf start <service>  # Start a PMF-managed Ark service
cda pmf stop <service>   # Stop a PMF-managed Ark service
cda pmf restart <service> # Restart a PMF-managed Ark service
cda pmf logs <service>   # Tail runtime logs for a PMF service

# Session Analysis
cda sessions            # List all sessions (newest first)
cda session <id>        # Show detailed session information
cda workspace <id>      # Show sessions for a workspace
cda workspaces          # List all workspaces

# Search & Query
cda search <query>      # Full-text search across conversations
cda code-search <pattern> [--symbol] [--regex]  # Search code symbols or code content
cda semantic-search <query> # Semantic search using embeddings
cda similar <session>     # Find sessions similar to a session
cda related <session>     # Alias for finding semantically related sessions
cda summarize <session>   # Show session summary, topics, and recommendations
cda topics                # List semantic topic tags
cda alerts <session>      # Show semantic anomaly alerts
cda recommend <session>   # Show session recommendations
cda tools <query>       # Search tool call arguments
cda memory              # Show memory files and global state

# Behavioral Analysis
cda signals [session]   # Show behavioral signals
cda heat [session]      # Frustration and heat analysis
cda behavior            # Aggregate behavioral intelligence
cda saved               # Sessions that recovered from high heat

# Data Export
cda export <session>    # Export session as JSON/JSONL/text
cda replay <session>    # Print conversation as readable text

# Advanced
cda query <sql>         # Execute raw SQL queries
cda tokens [session]    # Token usage analysis
cda compactions [session] # Context compaction events
cda edits               # Edit session analytics

# Policy Management
cda policy allow <pattern>   # Add allow pattern
cda policy deny <pattern>    # Add deny pattern
cda policy list              # Show current policies

# Live Monitoring
cda watch start             # Start watcher daemon
cda watch stop              # Stop watcher daemon
cda watch restart           # Restart watcher daemon
cda ui start                # Start web UI background service
cda ui stop                 # Stop web UI background service
cda ui status               # Show web UI background service status

Command Examples

# Search for error handling discussions
cda search "error handling" --limit 20

# Find sessions with high frustration
cda heat --limit 10

# Search for specific functions in code
cda code-search "def process_data" --symbol

# Search code content with regex or plain text
cda code-search "timeout" --regex

# Find semantically related sessions
cda related abc123

# Summarize a session with semantic topics and recommendations
cda summarize abc123

# Export a session for external analysis
cda export abc123 --format jsonl --output session.jsonl

# Monitor live sessions
cda watch start
cda status  # Check queue status

📊 Data Analysis

Behavioral Signals

The system recognizes 6 signal types with 200+ keyword patterns:

| Signal Type    | Weight | Description                    | Example Keywords                            |
|----------------|--------|--------------------------------|---------------------------------------------|
| correction     | 3      | User correcting agent behavior | "stop", "wrong", "nope", "wait"             |
| pre_correction | 2      | Early frustration signs        | "actually", "hold on", "slow down"          |
| redirect       | 1      | User changing direction        | "pivot", "change direction", "instead"      |
| affirmation    | 0      | Positive feedback              | "good", "right", "perfect", "thanks"        |
| approval       | 0      | Task completion approval       | "that works", "looks good", "approved"      |
| frustration    | 5      | Strong negative signals        | "this is broken", "not working", "terrible" |
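
To make the keyword approach concrete, here is a small sketch of how a message might be tagged with signal types. It uses only the example keywords from the table, not the full 200+ pattern set, and the matching strategy (case-insensitive, whole-word) and the name detect_signals are assumptions, not the project's actual extractor.

```python
import re
from typing import Dict, List

# Example keywords copied from the table above; the real pattern set is larger.
KEYWORDS: Dict[str, List[str]] = {
    "correction": ["stop", "wrong", "nope", "wait"],
    "pre_correction": ["actually", "hold on", "slow down"],
    "redirect": ["pivot", "change direction", "instead"],
    "affirmation": ["good", "right", "perfect", "thanks"],
    "approval": ["that works", "looks good", "approved"],
    "frustration": ["this is broken", "not working", "terrible"],
}


def detect_signals(message: str) -> List[str]:
    """Return the signal types whose keywords appear in the message."""
    text = message.lower()
    found = []
    for signal, phrases in KEYWORDS.items():
        for phrase in phrases:
            # \b anchors keep "stop" from matching inside "stopwatch".
            if re.search(r"\b" + re.escape(phrase) + r"\b", text):
                found.append(signal)
                break  # one hit per signal type is enough
    return found
```

A message can carry several signals at once: "Looks good, thanks!" matches both affirmation and approval under these rules.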

Heat Score Algorithm

Heat Score = min(100, Σ(signal_weights))
  • Peak Heat: Maximum heat reached in session
  • Final Heat: Heat at session end
  • Recovery: Sessions that return to low heat after high peaks
  • Saved Sessions: High-heat sessions that recover with affirmations
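
The formula above can be sketched as follows, using the weights from the signal table. The function names are hypothetical. The source does not say how heat cools down over a session, so this sketch only accumulates weights and caps them at 100; it does not model recovery.

```python
from typing import Iterable, List

# Weights taken from the behavioral signals table.
SIGNAL_WEIGHTS = {
    "correction": 3,
    "pre_correction": 2,
    "redirect": 1,
    "affirmation": 0,
    "approval": 0,
    "frustration": 5,
}


def heat_score(signals: Iterable[str]) -> int:
    """Heat Score = min(100, sum of signal weights)."""
    return min(100, sum(SIGNAL_WEIGHTS[s] for s in signals))


def heat_timeline(signals: Iterable[str]) -> List[int]:
    """Capped running heat after each signal (assumes no cooling between signals)."""
    total = 0
    timeline = []
    for signal in signals:
        total += SIGNAL_WEIGHTS[signal]
        timeline.append(min(100, total))
    return timeline
```

For a session with the signals correction, frustration, affirmation, the running heat is 3, then 8, then 8, since affirmations carry a weight of 0.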

Token Usage Tracking

  • Per-request token consumption (prompt + completion)
  • Model identification and version tracking
  • Context compaction event logging
  • Cost estimation capabilities

โš™๏ธ Configuration

Automatic Detection

VS Code Ark automatically detects paths using standard locations:

  • macOS: ~/Library/Application Support/Code/User/
  • Windows: %APPDATA%\Code\User\
  • Linux: ~/.config/Code/User/
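
A minimal sketch of this detection logic is shown below. The function name and its parameters are hypothetical, and the real implementation may resolve paths differently; the VSCODE_DATA_DIR override is the variable described in the configuration notes earlier in this document.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def default_vscode_user_dir(
    platform: str = sys.platform,
    env: Optional[Mapping[str, str]] = None,
    home: Optional[str] = None,
) -> Path:
    """Return the VS Code User directory for the given platform (illustrative only)."""
    env = os.environ if env is None else env
    home_dir = Path.home() if home is None else Path(home)

    # An explicit override always wins over platform defaults.
    override = env.get("VSCODE_DATA_DIR")
    if override:
        return Path(override).expanduser()

    if platform == "darwin":
        return home_dir / "Library" / "Application Support" / "Code" / "User"
    if platform.startswith("win"):
        appdata = env.get("APPDATA")
        base = Path(appdata) if appdata else home_dir / "AppData" / "Roaming"
        return base / "Code" / "User"
    # Treat everything else as Linux-like.
    return home_dir / ".config" / "Code" / "User"
```

Passing platform, env, and home explicitly keeps the function easy to test without touching the real environment.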

Environment Variables

export VSCODE_ARK_DB=/path/to/custom.db    # Custom database location
export VSCODE_ARK_CONFIG=/path/to/config   # Custom config directory

Policy Configuration

Data access policies are stored in policy.txt:

ALLOW important-project
DENY sensitive-data
ALLOW *.py
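
The sketch below shows one way a file in this format could be parsed and evaluated. The precedence rules are an assumption made for illustration (any matching DENY wins, and a name must match at least one ALLOW), as are the function names and the use of shell-style glob matching; the actual semantics of cda policy may differ.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple

Rule = Tuple[str, str]  # (action, pattern)


def parse_policy(text: str) -> List[Rule]:
    """Parse 'ALLOW pattern' / 'DENY pattern' lines, skipping blanks and unknown actions."""
    rules = []
    for line in text.splitlines():
        action, _, pattern = line.strip().partition(" ")
        action = action.upper()
        pattern = pattern.strip()
        if action in ("ALLOW", "DENY") and pattern:
            rules.append((action, pattern))
    return rules


def is_allowed(name: str, rules: List[Rule]) -> bool:
    """Assumed semantics: a matching DENY always wins; otherwise an ALLOW must match."""
    if any(a == "DENY" and fnmatchcase(name, p) for a, p in rules):
        return False
    return any(a == "ALLOW" and fnmatchcase(name, p) for a, p in rules)
```

With the example file above, main.py is allowed by the *.py rule, sensitive-data is denied, and a name such as notes.txt is rejected because no ALLOW pattern matches it.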

🔧 Development

Setup Development Environment

make install-dev

Running Tests

make test              # Run test suite
make test-cov          # Run with coverage report

Code Quality

make lint              # Run flake8 and mypy
make format            # Format with black and isort

Building

make build             # Build distribution packages
make publish           # Publish to PyPI (requires credentials)

Project Structure

vscode-ark/
├── vscode_ark/          # Main package
│   ├── __init__.py
│   └── cli.py           # Command-line interface
├── scripts/             # Utility scripts
│   ├── ingest.py        # Data ingestion
│   ├── reconstruct.py   # Conversation processing
│   ├── extract.py       # Signal analysis
│   └── watcher.py       # Live monitoring
├── tests/               # Test suite
├── docs/                # Documentation
├── pyproject.toml       # Package configuration
├── setup.py             # Legacy setup
├── Makefile             # Development tasks
└── README.md            # This file

๐Ÿค Contributing

  1. Fork the repository
  2. Create a feature branch: git checkout -b feature/amazing-feature
  3. Make your changes and add tests
  4. Run the test suite: make test
  5. Format code: make format
  6. Commit your changes: git commit -m 'Add amazing feature'
  7. Push to the branch: git push origin feature/amazing-feature
  8. Open a Pull Request

Development Guidelines

  • Type Hints: All functions should have type annotations
  • Docstrings: Comprehensive docstrings for public APIs
  • Tests: Unit tests for all new functionality
  • Linting: Code must pass flake8 and mypy checks
  • Formatting: Code must be formatted with black and isort

๐Ÿ“ License

This project is licensed under the MIT License - see the LICENSE file for details.

๐Ÿ™ Acknowledgments

  • Built for analyzing VS Code/Copilot Chat interaction patterns
  • Inspired by the need for better human-AI interaction insights
  • Uses SQLite FTS5 for high-performance full-text search
  • Implements behavioral signal processing for conversation analysis

VS Code Ark - Understanding the human side of AI conversations.
