Comprehensive analysis system for VS Code/Copilot Chat sessions with behavioral signal extraction and heat scoring
VS Code Ark
A comprehensive data pipeline and analysis system for VS Code/Copilot Chat sessions. Extract behavioral signals, compute heat scores, and gain deep insights into human-AI interaction patterns.
Features
- Behavioral Signal Analysis - Detect 6 signal types (corrections, frustrations, affirmations, etc.) using 200+ keywords
- Heat Score Computation - Quantify user frustration and agent performance (0-100 scale)
- Real-time Monitoring - Live sync daemon with crash-resistant queue system
- Full-text Search - FTS5-powered search across all conversations
- Semantic Intelligence - miniLM embeddings, session summaries, related sessions, anomaly alerts, and recommendations
- Code Symbol Indexing - AST-backed symbol extraction for Python/JS/TS and content search across VFS blobs
- Incremental Sync - Watcher-driven session refreshes keep embeddings and session insights current as chat and tool outputs change
- Package-centric Layout - All runtime code lives under vscode_ark/ for a clean root.
- Policy-based Access Control - Allow/deny patterns for data filtering
- Rich Analytics - Token usage, context compaction, session recovery analysis
- Export Capabilities - JSON, JSONL, and text export formats
- Professional CLI - Comprehensive command-line interface with 25+ commands
Table of Contents
- Installation
- Quick Start
- Architecture
- CLI Reference
- Data Analysis
- Configuration
- Development
- Contributing
- License
Installation
Prerequisites
- Python 3.8+
- VS Code with Copilot Chat extension
From Source
git clone https://github.com/yourusername/vscode-ark.git
cd vscode-ark
pip install -e .
With Development Dependencies
pip install -e ".[dev]"
From PyPI (Future)
pip install vscode-ark
Quick Start
- Initialize the database:
cda sync
- Start live monitoring:
cda watch start
The watcher keeps VS Code updates, code symbols, and embeddings in sync.
- Build semantic intelligence:
cda embed build
- Explore your data:
cda stats # System overview
cda sessions # Recent sessions
cda serve # Start the local web UI on port 10001 (dashboard, heat analytics, keywords, alerts, and session drilldown)
cda search "error" # Search conversations
cda code-search "todo" --regex # Search code content
cda code-search "def process" --symbol # Search code symbols
cda semantic-search "confused" # Semantic search
cda related <session> # Find related sessions
cda summarize <session> # Session summary and recommendations
cda heat # Frustration analysis
SQLite limits and mitigation
- Single writer in WAL mode: the system uses one writer process for ingest/reconstruct/extract/embed and allows many concurrent readers via SQLite WAL.
- Large VFS blob handling: for very large raw artifacts, the clean approach is chunked storage or external file references instead of a single enormous BLOB.
- Default 8KB page size / cache: this code now sets PRAGMA cache_size=-2000, PRAGMA mmap_size=268435456, and PRAGMA temp_store=MEMORY to improve read/cache performance on larger databases.
- Further tuning: rebuild the DB with a larger page size (e.g. PRAGMA page_size=32768) if you need more efficient storage for very large session history.
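As a rough illustration of the settings listed above, the sketch below opens a database with Python's standard sqlite3 module and applies the same PRAGMAs. The function name open_tuned is hypothetical, and VS Code Ark's actual connection code may differ.

```python
import sqlite3


def open_tuned(path: str) -> sqlite3.Connection:
    """Open a SQLite database with the PRAGMAs described above (illustrative only)."""
    conn = sqlite3.connect(path)
    # WAL lets many readers run concurrently alongside a single writer.
    conn.execute("PRAGMA journal_mode=WAL")
    # A negative cache_size is a size in KiB, so -2000 is roughly 2 MB of page cache.
    conn.execute("PRAGMA cache_size=-2000")
    # Request up to 256 MiB of memory-mapped I/O; the SQLite build may cap this.
    conn.execute("PRAGMA mmap_size=268435456")
    # Keep temporary tables and indexes in memory instead of on disk.
    conn.execute("PRAGMA temp_store=MEMORY")
    return conn
```

Note that cache_size, mmap_size, and temp_store apply per connection, so every process that opens the database would need to set them again, while the WAL journal mode is stored in the database file.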
Configuration
- VS Code Data Directory: By default, assumes macOS paths (~/Library/Application Support/Code/User). Override with export VSCODE_DATA_DIR=/path/to/vscode/data (e.g., on Linux: ~/.config/Code/User).
- No other config needed: Everything is CLI-driven with local SQLite.
Architecture
VS Code Storage → ingest.py → vfs + sessions + transcripts
↓
reconstruct.py → exchanges (structured conversations)
↓
extract.py → signals + tokens + heat scores + analysis
↓
embed.py → semantic embeddings + summaries + alerts
↓
watcher.py → live sync + FTS indexing + queue resilience
↓
cda → query interface + policy enforcement
Core Components
| Component | Purpose | Key Features |
|---|---|---|
| ingest.py | Data ingestion | VFS storage, gzip compression, session metadata |
| reconstruct.py | Conversation processing | Exchange threading, tool call linking, FTS indexing |
| extract.py | Signal analysis | Behavioral pattern recognition, heat scoring, token accounting |
| watcher.py | Live monitoring | File watching, incremental updates, crash recovery |
| cda | Query interface | 25+ commands, policy filtering, rich formatting |
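To make the FTS indexing step more concrete, here is a minimal sketch of full-text search with SQLite FTS5 through Python's sqlite3 module. The table name exchanges_fts, its columns, and both function names are assumptions made for illustration; they are not taken from the VS Code Ark schema. It also assumes the local SQLite build includes the FTS5 extension.

```python
import sqlite3
from typing import Iterable, List, Tuple


def build_index(conn: sqlite3.Connection, rows: Iterable[Tuple[str, str]]) -> None:
    """Create a hypothetical FTS5 table and load (session_id, text) rows into it."""
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS exchanges_fts "
        "USING fts5(session_id UNINDEXED, content)"
    )
    conn.executemany(
        "INSERT INTO exchanges_fts (session_id, content) VALUES (?, ?)", rows
    )


def search(conn: sqlite3.Connection, query: str, limit: int = 20) -> List[str]:
    """Return session ids whose text matches the FTS5 query, best match first."""
    cursor = conn.execute(
        "SELECT session_id FROM exchanges_fts "
        "WHERE exchanges_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    )
    return [row[0] for row in cursor]
```

Marking session_id as UNINDEXED keeps it out of the token index, so only the conversation text is searchable while the id is still returned with each hit.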
Database Schema
- workspaces - VS Code workspace metadata
- sessions - Chat session information and metadata
- vfs - Gzip-compressed file storage with SHA256 hashes
- exchanges - Structured conversation turns with tool calls
- exchange_signals - Behavioral signal annotations
- symbols - Code symbol index (functions, classes, etc.)
- token_usage - Per-request token consumption tracking
- compactions - Context window summarization events
- session_analysis - Aggregated session metrics and heat scores
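The vfs entry above describes gzip-compressed storage keyed by SHA256 hashes. A minimal sketch of that idea follows; the column layout (sha256, path, data) and the function names are hypothetical and only illustrate content-addressed storage, not the project's real table definition.

```python
import gzip
import hashlib
import sqlite3


def init_vfs(conn: sqlite3.Connection) -> None:
    """Create a hypothetical vfs table keyed by the content hash."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vfs ("
        " sha256 TEXT PRIMARY KEY,"
        " path TEXT NOT NULL,"
        " data BLOB NOT NULL)"
    )


def put_file(conn: sqlite3.Connection, path: str, content: bytes) -> str:
    """Store gzip-compressed content and return its SHA256 hex digest."""
    digest = hashlib.sha256(content).hexdigest()
    # INSERT OR IGNORE skips the write when identical content is already stored.
    conn.execute(
        "INSERT OR IGNORE INTO vfs (sha256, path, data) VALUES (?, ?, ?)",
        (digest, path, gzip.compress(content)),
    )
    return digest


def get_file(conn: sqlite3.Connection, digest: str) -> bytes:
    """Load and decompress the content stored under the given digest."""
    row = conn.execute(
        "SELECT data FROM vfs WHERE sha256 = ?", (digest,)
    ).fetchone()
    if row is None:
        raise KeyError(digest)
    return gzip.decompress(row[0])
```

Hashing the uncompressed bytes means the key depends only on the file content, so re-ingesting an unchanged file does not create a second row.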
CLI Reference
Core Commands
# System Management
cda status # Show daemon status and queue information
cda stats # System-wide statistics and coverage
cda sync # Full data ingestion and rebuild
cda reconstruct # Rebuild conversations and search index
# Session Analysis
cda sessions # List all sessions (newest first)
cda session <id> # Show detailed session information
cda workspace <id> # Show sessions for a workspace
cda workspaces # List all workspaces
# Search & Query
cda search <query> # Full-text search across conversations
cda code-search <pattern> [--symbol] [--regex] # Search code symbols or code content
cda semantic-search <query> # Semantic search using embeddings
cda similar <session> # Find sessions similar to a session
cda related <session> # Alias for finding semantically related sessions
cda summarize <session> # Show session summary, topics, and recommendations
cda topics # List semantic topic tags
cda alerts <session> # Show semantic anomaly alerts
cda recommend <session> # Show session recommendations
cda tools <query> # Search tool call arguments
cda memory # Show memory files and global state
# Behavioral Analysis
cda signals [session] # Show behavioral signals
cda heat [session] # Frustration and heat analysis
cda behavior # Aggregate behavioral intelligence
cda saved # Sessions that recovered from high heat
# Data Export
cda export <session> # Export session as JSON/JSONL/text
cda replay <session> # Print conversation as readable text
# Advanced
cda query <sql> # Execute raw SQL queries
cda tokens [session] # Token usage analysis
cda compactions [session] # Context compaction events
cda edits # Edit session analytics
# Policy Management
cda policy allow <pattern> # Add allow pattern
cda policy deny <pattern> # Add deny pattern
cda policy list # Show current policies
# Live Monitoring
cda watch start # Start watcher daemon
cda watch stop # Stop watcher daemon
cda watch restart # Restart watcher daemon
Command Examples
# Search for error handling discussions
cda search "error handling" --limit 20
# Find sessions with high frustration
cda heat --limit 10
# Search for specific functions in code
cda code-search "def process_data" --symbol
# Search code content with regex or plain text
cda code-search "timeout" --regex
# Find semantically related sessions
cda related abc123
# Summarize a session with semantic topics and recommendations
cda summarize abc123
# Export a session for external analysis
cda export abc123 --format jsonl --output session.jsonl
# Monitor live sessions
cda watch start
cda status # Check queue status
Data Analysis
Behavioral Signals
The system recognizes 6 signal types with 200+ keyword patterns:
| Signal Type | Weight | Description | Example Keywords |
|---|---|---|---|
| correction | 3 | User correcting agent behavior | "stop", "wrong", "nope", "wait" |
| pre_correction | 2 | Early frustration signs | "actually", "hold on", "slow down" |
| redirect | 1 | User changing direction | "pivot", "change direction", "instead" |
| affirmation | 0 | Positive feedback | "good", "right", "perfect", "thanks" |
| approval | 0 | Task completion approval | "that works", "looks good", "approved" |
| frustration | 5 | Strong negative signals | "this is broken", "not working", "terrible" |
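The table above can be read as a keyword lookup. The sketch below shows one way such matching might work, using only the example keywords from the table. The matching rule (case-insensitive, whole word or whole phrase) and the names SIGNAL_KEYWORDS and detect_signals are assumptions; the real extractor uses a much longer keyword list and may match differently.

```python
import re
from typing import Dict, List

# Example keywords copied from the table above; the real lists are longer (200+).
SIGNAL_KEYWORDS: Dict[str, List[str]] = {
    "correction": ["stop", "wrong", "nope", "wait"],
    "pre_correction": ["actually", "hold on", "slow down"],
    "redirect": ["pivot", "change direction", "instead"],
    "affirmation": ["good", "right", "perfect", "thanks"],
    "approval": ["that works", "looks good", "approved"],
    "frustration": ["this is broken", "not working", "terrible"],
}


def detect_signals(message: str) -> List[str]:
    """Return the signal types whose keywords appear in the message.

    Assumption: case-insensitive matching on whole words or whole phrases.
    """
    text = message.lower()
    found = []
    for signal_type, keywords in SIGNAL_KEYWORDS.items():
        if any(re.search(r"\b" + re.escape(kw) + r"\b", text) for kw in keywords):
            found.append(signal_type)
    return found
```

The word-boundary check keeps a keyword such as "stop" from firing inside a longer word like "nonstop".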
Heat Score Algorithm
Heat Score = min(100, Σ(signal_weights))
- Peak Heat: Maximum heat reached in session
- Final Heat: Heat at session end
- Recovery: Sessions that return to low heat after high peaks
- Saved Sessions: High-heat sessions that recover with affirmations
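A minimal sketch of the formula and the peak/final definitions above, using the weights from the signal table. The names SIGNAL_WEIGHTS, heat_score, and summarize_heat are hypothetical. How heat values are sampled over the course of a session, and how heat falls again during recovery, is not specified here, so summarize_heat simply takes a sequence of already-computed heat values.

```python
from typing import Dict, Iterable, Sequence

# Weights copied from the signal table in this README.
SIGNAL_WEIGHTS: Dict[str, int] = {
    "correction": 3,
    "pre_correction": 2,
    "redirect": 1,
    "affirmation": 0,
    "approval": 0,
    "frustration": 5,
}


def heat_score(signals: Iterable[str]) -> int:
    """Sum the weights of the given signal types, capped at 100."""
    return min(100, sum(SIGNAL_WEIGHTS.get(s, 0) for s in signals))


def summarize_heat(heat_values: Sequence[int]) -> Dict[str, int]:
    """Return peak (maximum) and final (last) heat for a series of heat values."""
    if not heat_values:
        return {"peak": 0, "final": 0}
    return {"peak": max(heat_values), "final": heat_values[-1]}
```

For example, one correction, one frustration, and one affirmation give 3 + 5 + 0 = 8, while forty frustration signals would sum to 200 and be capped at 100.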
Token Usage Tracking
- Per-request token consumption (prompt + completion)
- Model identification and version tracking
- Context compaction event logging
- Cost estimation capabilities
Configuration
Automatic Detection
VS Code Ark automatically detects paths using standard locations:
- macOS: ~/Library/Application Support/Code/User/
- Windows: %APPDATA%\Code\User\
- Linux: ~/.config/Code/User/
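The lookup above could be sketched roughly as follows, combined with the VSCODE_DATA_DIR override mentioned earlier in this README. The function name default_vscode_user_dir is hypothetical, and the fallback to AppData/Roaming when APPDATA is unset is an assumption; the project's own detection logic may differ.

```python
import os
import sys
from pathlib import Path
from typing import Mapping


def default_vscode_user_dir(platform: str, env: Mapping[str, str], home: Path) -> Path:
    """Pick the VS Code user data directory for a platform (illustrative only)."""
    # An explicit override always wins over the platform default.
    override = env.get("VSCODE_DATA_DIR")
    if override:
        return Path(override)
    if platform == "darwin":
        return home / "Library" / "Application Support" / "Code" / "User"
    if platform.startswith("win"):
        appdata = env.get("APPDATA")
        # Assumption: fall back to the usual roaming profile if APPDATA is unset.
        base = Path(appdata) if appdata else home / "AppData" / "Roaming"
        return base / "Code" / "User"
    # Treat everything else as Linux-style.
    return home / ".config" / "Code" / "User"
```

In normal use it would be called as default_vscode_user_dir(sys.platform, os.environ, Path.home()); passing the platform, environment, and home directory as arguments keeps the function easy to test.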
Environment Variables
export VSCODE_ARK_DB=/path/to/custom.db # Custom database location
export VSCODE_ARK_CONFIG=/path/to/config # Custom config directory
Policy Configuration
Data access policies are stored in policy.txt:
ALLOW important-project
DENY sensitive-data
ALLOW *.py
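The README does not spell out how these rules are evaluated, so the following is only a sketch under stated assumptions: a pattern matches either as a glob against the whole name or as a plain substring, any matching DENY rule wins, and when ALLOW rules exist a name must match at least one of them. The names parse_policy and is_allowed are hypothetical.

```python
import fnmatch
from typing import List, Tuple

Rule = Tuple[str, str]  # (action, pattern)


def parse_policy(text: str) -> List[Rule]:
    """Parse lines of the form 'ALLOW <pattern>' or 'DENY <pattern>'."""
    rules: List[Rule] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        action, _, pattern = line.partition(" ")
        action = action.upper()
        pattern = pattern.strip()
        if action not in ("ALLOW", "DENY") or not pattern:
            raise ValueError(f"invalid policy line: {line!r}")
        rules.append((action, pattern))
    return rules


def _matches(name: str, pattern: str) -> bool:
    # Assumption: glob match on the whole name, or plain substring match.
    return fnmatch.fnmatchcase(name, pattern) or pattern in name


def is_allowed(name: str, rules: List[Rule]) -> bool:
    """Apply the assumed semantics: DENY wins, then ALLOW must match if present."""
    if any(action == "DENY" and _matches(name, pat) for action, pat in rules):
        return False
    allow = [pat for action, pat in rules if action == "ALLOW"]
    return not allow or any(_matches(name, pat) for pat in allow)
```

With the three example rules above, main.py would pass, sensitive-data/main.py would be rejected by the DENY rule, and notes.txt would be rejected because it matches no ALLOW pattern.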
Development
Setup Development Environment
make install-dev
Running Tests
make test # Run test suite
make test-cov # Run with coverage report
Code Quality
make lint # Run flake8 and mypy
make format # Format with black and isort
Building
make build # Build distribution packages
make publish # Publish to PyPI (requires credentials)
Project Structure
vscode-ark/
├── vscode_ark/ # Main package
│   ├── __init__.py
│   └── cli.py # Command-line interface
├── scripts/ # Utility scripts
│   ├── ingest.py # Data ingestion
│   ├── reconstruct.py # Conversation processing
│   ├── extract.py # Signal analysis
│   └── watcher.py # Live monitoring
├── tests/ # Test suite
├── docs/ # Documentation
├── pyproject.toml # Package configuration
├── setup.py # Legacy setup
├── Makefile # Development tasks
└── README.md # This file
Contributing
- Fork the repository
- Create a feature branch: git checkout -b feature/amazing-feature
- Make your changes and add tests
- Run the test suite: make test
- Format code: make format
- Commit your changes: git commit -m 'Add amazing feature'
- Push to the branch: git push origin feature/amazing-feature
- Open a Pull Request
Development Guidelines
- Type Hints: All functions should have type annotations
- Docstrings: Comprehensive docstrings for public APIs
- Tests: Unit tests for all new functionality
- Linting: Code must pass flake8 and mypy checks
- Formatting: Code must be formatted with black and isort
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Built for analyzing VS Code/Copilot Chat interaction patterns
- Inspired by the need for better human-AI interaction insights
- Uses SQLite FTS5 for high-performance full-text search
- Implements behavioral signal processing for conversation analysis
VS Code Ark - Understanding the human side of AI conversations.