Comprehensive analysis system for VS Code/Copilot Chat sessions with behavioral signal extraction and heat scoring


VS Code Ark

A comprehensive data pipeline and analysis system for VS Code/Copilot Chat sessions. Extract behavioral signals, compute heat scores, and gain deep insights into human-AI interaction patterns.

✨ Features

Behavioral Signal Analysis - Detects 200+ keyword patterns across 6 signal types (corrections, frustration, affirmations, etc.)
Heat Score Computation - Quantify user frustration and agent performance (0-100 scale)
Real-time Monitoring - Live sync daemon with crash-resistant queue system
Full-text Search - FTS5-powered search across all conversations
Semantic Intelligence - MiniLM embeddings, session summaries, related sessions, anomaly alerts, and recommendations
Code Symbol Indexing - AST-backed symbol extraction for Python/JS/TS and content search across VFS blobs
Incremental Sync - Watcher-driven session refreshes keep embeddings and session insights current as chat and tool outputs change
Package-centric Layout - All runtime code lives under vscode_ark/ for a clean root.
Policy-based Access Control - Allow/deny patterns for data filtering
Rich Analytics - Token usage, context compaction, session recovery analysis
Export Capabilities - JSON, JSONL, and text export formats
Professional CLI - Comprehensive command-line interface with 25+ commands

🚀 Installation

Prerequisites

Python 3.8+
VS Code with Copilot Chat extension

From Source

```
git clone https://github.com/yourusername/vscode-ark.git
cd vscode-ark
pip install -e .
```

With Development Dependencies

```
pip install -e ".[dev]"
```

From PyPI

```
pip install vscode-ark
```

⚡ Quick Start

Ingest your VS Code chat data and build the database:
```
cda sync
```
Start live monitoring:
```
cda watch start
```
The watcher keeps VS Code updates, code symbols, and embeddings in sync.

Build semantic embeddings, summaries, and alerts:
```
cda embed build
```

Explore your data:

```
cda stats                                 # System overview
cda sessions                              # Recent sessions
cda search "error"                        # Search conversations
cda code-search "todo" --regex            # Search code content
cda code-search "def process" --symbol    # Search code symbols
cda semantic-search "confused"            # Semantic search
cda related <session>                     # Find related sessions
cda summarize <session>                   # Session summary and recommendations
cda heat                                  # Frustration analysis
```
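`cda semantic-search` and `cda related` rank sessions by embedding similarity. The exact scoring is not documented here; the sketch below assumes cosine similarity over one embedding vector per session. MiniLM models typically produce 384-dimensional vectors, shortened to 3 dimensions in the toy data. The `cosine` and `rank_related` helpers are hypothetical.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine similarity of two equal-length vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def rank_related(target: str, embeddings: Dict[str, List[float]], k: int = 5) -> List[Tuple[str, float]]:
    """Return up to k (session_id, similarity) pairs most similar to `target`, best first."""
    query = embeddings[target]
    scored = [(sid, cosine(query, vec)) for sid, vec in embeddings.items() if sid != target]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:k]


# Toy vectors standing in for real MiniLM session embeddings.
sessions = {
    "abc123": [0.9, 0.1, 0.0],
    "def456": [0.8, 0.2, 0.1],
    "ghi789": [0.0, 0.1, 0.9],
}
print(rank_related("abc123", sessions, k=2))
```

A linear scan like this is fine for a few thousand sessions; larger histories would usually move to a vector index.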

🧠 SQLite limits and mitigation

Single writer in WAL mode: the system uses one writer process for ingest/reconstruct/extract/embed and allows many concurrent readers via SQLite WAL.
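Large VFS blob handling: for very large raw artifacts, store the data in chunks or as external file references rather than as a single enormous BLOB (SQLite's default maximum string/BLOB length is 1,000,000,000 bytes).
Page size and cache: SQLite's default page size is 4096 bytes. The code sets PRAGMA mmap_size=268435456 (up to 256 MiB of memory-mapped I/O) and PRAGMA temp_store=MEMORY to improve read performance on larger databases; it also sets PRAGMA cache_size=-2000 (a negative value is in KiB, so about 2 MB, which matches SQLite's default).
Further tuning: rebuild the DB with a larger page size (e.g. PRAGMA page_size=32768) if you need more efficient storage for very large session history. The page size cannot be changed while the database is in WAL mode, so switch to journal_mode=DELETE, set page_size, run VACUUM, and then re-enable WAL.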
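A minimal sketch of applying these settings with Python's built-in `sqlite3` module. The helper names `open_db` and `rebuild_with_page_size` are hypothetical; this shows the PRAGMAs named above, not the project's actual connection code.

```python
import sqlite3


def open_db(path: str) -> sqlite3.Connection:
    """Open a connection using WAL plus the read/cache PRAGMAs listed above."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")      # one writer, many concurrent readers
    conn.execute("PRAGMA cache_size=-2000")      # negative value = KiB, so about 2 MB
    conn.execute("PRAGMA mmap_size=268435456")   # allow up to 256 MiB of memory-mapped reads
    conn.execute("PRAGMA temp_store=MEMORY")     # keep temporary tables and indices in RAM
    return conn


def rebuild_with_page_size(path: str, page_size: int = 32768) -> int:
    """Rewrite an existing database with a new page size and return the resulting size.

    Run this with no other connections open: leaving WAL mode needs exclusive access.
    """
    conn = sqlite3.connect(path)
    try:
        conn.execute("PRAGMA journal_mode=DELETE")          # page_size cannot change in WAL mode
        conn.execute(f"PRAGMA page_size={int(page_size)}")  # PRAGMA values can't be bound with ?
        conn.execute("VACUUM")                              # rebuilds the file with the new size
        conn.execute("PRAGMA journal_mode=WAL")
        return conn.execute("PRAGMA page_size").fetchone()[0]
    finally:
        conn.close()
```

The rebuild is a one-off maintenance step; run it while the watcher daemon is stopped so it can take the exclusive lock.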

🔧 Configuration

VS Code Data Directory: By default, VS Code Ark assumes macOS paths (~/Library/Application Support/Code/User). Override with export VSCODE_DATA_DIR=/path/to/vscode/data (e.g. on Linux: ~/.config/Code/User).
No other config needed: everything is CLI-driven and stored in a local SQLite database.

🏗️ Architecture

```
VS Code Storage → ingest.py → vfs + sessions + transcripts
                      ↓
               reconstruct.py → exchanges (structured conversations)
                      ↓
               extract.py → signals + tokens + heat scores + analysis
                      ↓
               embed.py → semantic embeddings + summaries + alerts
                      ↓
               watcher.py → live sync + FTS indexing + queue resilience
                      ↓
               cda → query interface + policy enforcement
```
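The watcher's queue implementation is not described here. One common way to make a work queue crash-resistant is to persist it in SQLite: the watcher enqueues changed file paths, a worker claims a row, and the row is deleted only after processing succeeds, so anything claimed but never acknowledged is retried after a restart. The `sync_queue` table and the `init_queue`/`enqueue`/`claim`/`ack` functions are hypothetical.

```python
import sqlite3
import time
from typing import Optional, Tuple


def init_queue(conn: sqlite3.Connection) -> None:
    # Hypothetical table: one row per file-change event awaiting processing.
    conn.execute(
        """CREATE TABLE IF NOT EXISTS sync_queue (
               id INTEGER PRIMARY KEY AUTOINCREMENT,
               path TEXT NOT NULL,
               claimed_at REAL            -- NULL until a worker picks the row up
           )"""
    )
    conn.commit()


def enqueue(conn: sqlite3.Connection, path: str) -> None:
    # Duplicate events are allowed; re-syncing the same file twice is harmless.
    conn.execute("INSERT INTO sync_queue (path) VALUES (?)", (path,))
    conn.commit()


def claim(conn: sqlite3.Connection, stale_after: float = 300.0) -> Optional[Tuple[int, str]]:
    """Claim the oldest unclaimed row, or one whose claim is older than `stale_after` seconds.

    Select-then-update is safe here only because a single writer process owns the queue.
    """
    now = time.time()
    row = conn.execute(
        "SELECT id, path FROM sync_queue "
        "WHERE claimed_at IS NULL OR claimed_at < ? ORDER BY id LIMIT 1",
        (now - stale_after,),
    ).fetchone()
    if row is not None:
        conn.execute("UPDATE sync_queue SET claimed_at = ? WHERE id = ?", (now, row[0]))
        conn.commit()
    return row


def ack(conn: sqlite3.Connection, item_id: int) -> None:
    """Delete a row only after it was processed successfully."""
    conn.execute("DELETE FROM sync_queue WHERE id = ?", (item_id,))
    conn.commit()
```

If the process dies between `claim` and `ack`, the row keeps its old `claimed_at` and becomes claimable again once it is older than `stale_after`.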

Core Components

| Component | Purpose | Key Features |
|---|---|---|
| ingest.py | Data ingestion | VFS storage, gzip compression, session metadata |
| reconstruct.py | Conversation processing | Exchange threading, tool call linking, FTS indexing |
| extract.py | Signal analysis | Behavioral pattern recognition, heat scoring, token accounting |
| watcher.py | Live monitoring | File watching, incremental updates, crash recovery |
| cda | Query interface | 25+ commands, policy filtering, rich formatting |
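Full-text search relies on SQLite's FTS5 extension, with reconstruct.py building the index. A minimal sketch of the idea, assuming a hypothetical `exchanges_fts` virtual table with `session_id` and `text` columns; the real schema may differ, and FTS5 must be compiled into your SQLite build (it is in most Python distributions).

```python
import sqlite3
from typing import List, Tuple


def build_index(conn: sqlite3.Connection, rows: List[Tuple[str, str]]) -> None:
    # session_id is stored but not tokenized; only `text` is searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS exchanges_fts "
        "USING fts5(session_id UNINDEXED, text)"
    )
    conn.executemany("INSERT INTO exchanges_fts (session_id, text) VALUES (?, ?)", rows)
    conn.commit()


def search(conn: sqlite3.Connection, query: str, limit: int = 20) -> List[Tuple[str, str]]:
    # `rank` is FTS5's built-in BM25 score; ORDER BY rank puts the best matches first.
    return conn.execute(
        "SELECT session_id, snippet(exchanges_fts, 1, '[', ']', '...', 8) "
        "FROM exchanges_fts WHERE exchanges_fts MATCH ? ORDER BY rank LIMIT ?",
        (query, limit),
    ).fetchall()


conn = sqlite3.connect(":memory:")
build_index(conn, [
    ("abc123", "The build fails with a timeout error in the test runner"),
    ("def456", "Thanks, that works now"),
])
print(search(conn, "timeout error"))
```

FTS5 treats space-separated terms as an implicit AND, so "timeout error" only matches exchanges containing both words.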

Database Schema

workspaces - VS Code workspace metadata
sessions - Chat session information and metadata
vfs - Gzip-compressed file storage with SHA256 hashes
exchanges - Structured conversation turns with tool calls
exchange_signals - Behavioral signal annotations
symbols - Code symbol index (functions, classes, etc.)
token_usage - Per-request token consumption tracking
compactions - Context window summarization events
session_analysis - Aggregated session metrics and heat scores
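The `vfs` table is described as gzip-compressed storage with SHA256 hashes. A minimal content-addressed sketch, assuming hypothetical column names (`sha256`, `size`, `data`) and that the hash is taken over the uncompressed bytes, so identical content is stored only once.

```python
import gzip
import hashlib
import sqlite3


def init_vfs(conn: sqlite3.Connection) -> None:
    conn.execute(
        "CREATE TABLE IF NOT EXISTS vfs ("
        "sha256 TEXT PRIMARY KEY, size INTEGER NOT NULL, data BLOB NOT NULL)"
    )


def put_blob(conn: sqlite3.Connection, content: bytes) -> str:
    """Store `content` once and return its SHA-256 hex digest (the lookup key).

    The caller is responsible for calling conn.commit().
    """
    digest = hashlib.sha256(content).hexdigest()
    conn.execute(
        "INSERT OR IGNORE INTO vfs (sha256, size, data) VALUES (?, ?, ?)",
        (digest, len(content), gzip.compress(content)),
    )
    return digest


def get_blob(conn: sqlite3.Connection, digest: str) -> bytes:
    row = conn.execute("SELECT data FROM vfs WHERE sha256 = ?", (digest,)).fetchone()
    if row is None:
        raise KeyError(digest)
    return gzip.decompress(row[0])
```

Hashing before compression means the key depends only on the content, not on the gzip settings used to store it.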

🖥️ CLI Reference

Core Commands

```
# System Management
cda status                    # Show daemon status and queue information
cda stats                     # System-wide statistics and coverage
cda sync                      # Full data ingestion and rebuild
cda reconstruct               # Rebuild conversations and search index

# Session Analysis
cda sessions                  # List all sessions (newest first)
cda session <id>              # Show detailed session information
cda workspace <id>            # Show sessions for a workspace
cda workspaces                # List all workspaces

# Search & Query
cda search <query>            # Full-text search across conversations
cda code-search <pattern> [--symbol] [--regex]  # Search code symbols or code content
cda semantic-search <query>   # Semantic search using embeddings
cda similar <session>         # Find sessions similar to a session
cda related <session>         # Alias for semantic related sessions
cda summarize <session>       # Show session summary, topics, and recommendations
cda topics                    # List semantic topic tags
cda alerts <session>          # Show semantic anomaly alerts
cda recommend <session>       # Show session recommendations
cda tools <query>             # Search tool call arguments
cda memory                    # Show memory files and global state

# Behavioral Analysis
cda signals [session]         # Show behavioral signals
cda heat [session]            # Frustration and heat analysis
cda behavior                  # Aggregate behavioral intelligence
cda saved                     # Sessions that recovered from high heat

# Data Export
cda export <session>          # Export session as JSON/JSONL/text
cda replay <session>          # Print conversation as readable text

# Advanced
cda query <sql>               # Execute raw SQL queries
cda tokens [session]          # Token usage analysis
cda compactions [session]     # Context compaction events
cda edits                     # Edit session analytics

# Policy Management
cda policy allow <pattern>    # Add allow pattern
cda policy deny <pattern>     # Add deny pattern
cda policy list               # Show current policies

# Live Monitoring
cda watch start               # Start watcher daemon
cda watch stop                # Stop watcher daemon
cda watch restart             # Restart watcher daemon
```

Command Examples

```
# Search for error handling discussions
cda search "error handling" --limit 20

# Find sessions with high frustration
cda heat --limit 10

# Search for specific functions in code
cda code-search "def process_data" --symbol

# Search code content with regex or plain text
cda code-search "timeout" --regex

# Find semantically related sessions
cda related abc123

# Summarize a session with semantic topics and recommendations
cda summarize abc123

# Export a session for external analysis
cda export abc123 --format jsonl --output session.jsonl

# Monitor live sessions
cda watch start
cda status  # Check queue status
```
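`cda code-search --symbol` queries the symbol index, which is described as AST-backed. For Python sources, the standard library `ast` module is enough to sketch the extraction step. This is an illustration, not the project's extractor: `python_symbols` and its `(kind, name, line)` tuple shape are hypothetical, and JS/TS would need a separate parser.

```python
import ast
from typing import List, Tuple


def python_symbols(source: str) -> List[Tuple[str, str, int]]:
    """Return (kind, qualified_name, line) for every function and class in `source`."""
    symbols: List[Tuple[str, str, int]] = []

    def visit(node: ast.AST, prefix: str) -> None:
        for child in ast.iter_child_nodes(node):
            if isinstance(child, (ast.FunctionDef, ast.AsyncFunctionDef)):
                kind = "function"
            elif isinstance(child, ast.ClassDef):
                kind = "class"
            else:
                visit(child, prefix)  # descend into if/try/with blocks and other statements
                continue
            name = f"{prefix}{child.name}"
            symbols.append((kind, name, child.lineno))
            visit(child, name + ".")  # nested definitions become e.g. Class.method

    visit(ast.parse(source), "")
    return symbols


code = '''
class Loader:
    def process_data(self, rows):
        return rows

def main():
    pass
'''
print(python_symbols(code))
```

Rows like these could then be stored in the `symbols` table and matched by name.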

📊 Data Analysis

Behavioral Signals

The system recognizes 6 signal types with 200+ keyword patterns:

| Signal Type | Weight | Description | Example Keywords |
|---|---|---|---|
| correction | 3 | User correcting agent behavior | "stop", "wrong", "nope", "wait" |
| pre_correction | 2 | Early frustration signs | "actually", "hold on", "slow down" |
| redirect | 1 | User changing direction | "pivot", "change direction", "instead" |
| affirmation | 0 | Positive feedback | "good", "right", "perfect", "thanks" |
| approval | 0 | Task completion approval | "that works", "looks good", "approved" |
| frustration | 5 | Strong negative signals | "this is broken", "not working", "terrible" |
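A simplified sketch of keyword-based signal detection, using only the example keywords and weights from the table above (the full 200+ pattern list is not reproduced). It assumes case-insensitive whole-word/phrase matching and reports each signal type at most once per message; the project's real matching rules may differ, and `SIGNAL_KEYWORDS` and `detect_signals` are hypothetical names.

```python
import re
from typing import Dict, List, Tuple

# Example keywords and weights from the table; the real pattern list is much longer.
SIGNAL_KEYWORDS: Dict[str, Tuple[int, List[str]]] = {
    "correction": (3, ["stop", "wrong", "nope", "wait"]),
    "pre_correction": (2, ["actually", "hold on", "slow down"]),
    "redirect": (1, ["pivot", "change direction", "instead"]),
    "affirmation": (0, ["good", "right", "perfect", "thanks"]),
    "approval": (0, ["that works", "looks good", "approved"]),
    "frustration": (5, ["this is broken", "not working", "terrible"]),
}

# One case-insensitive pattern per signal type; \b stops "stop" from matching "stopwatch".
_PATTERNS = {
    signal: re.compile(r"\b(?:" + "|".join(re.escape(k) for k in keywords) + r")\b", re.IGNORECASE)
    for signal, (_, keywords) in SIGNAL_KEYWORDS.items()
}


def detect_signals(message: str) -> List[Tuple[str, int]]:
    """Return (signal_type, weight) for each signal type whose keywords appear in `message`."""
    return [
        (signal, SIGNAL_KEYWORDS[signal][0])
        for signal, pattern in _PATTERNS.items()
        if pattern.search(message)
    ]


print(detect_signals("Nope, that's wrong. It's still not working."))
```

Plain keyword matching also fires on negations such as "not right", which is one reason a production matcher would need more context than this sketch.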

Heat Score Algorithm

```
Heat Score = min(100, Σ(signal_weights))
```

Peak Heat: Maximum heat reached in session
Final Heat: Heat at session end
Recovery: Sessions that return to low heat after high peaks
Saved Sessions: High-heat sessions that recover with affirmations
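The formula above caps the summed signal weights at 100. Because every weight is non-negative, a single running sum over a whole session could never come back down, so this sketch assumes heat is computed per exchange from that exchange's signals, with Peak Heat as the maximum and Final Heat as the last value. The `HIGH_HEAT` and `LOW_HEAT` thresholds and the "saved" rule are hypothetical illustrations of the definitions above, not the project's actual cut-offs.

```python
from dataclasses import dataclass
from typing import List, Tuple

# Hypothetical thresholds, chosen only for illustration.
HIGH_HEAT = 5
LOW_HEAT = 1


@dataclass
class HeatSummary:
    per_exchange: List[int]
    peak: int
    final: int
    saved: bool


def exchange_heat(signals: List[Tuple[str, int]]) -> int:
    """Heat Score = min(100, sum of signal weights) for one exchange."""
    return min(100, sum(weight for _, weight in signals))


def summarize_heat(exchanges: List[List[Tuple[str, int]]]) -> HeatSummary:
    heats = [exchange_heat(signals) for signals in exchanges]
    peak = max(heats, default=0)
    final = heats[-1] if heats else 0
    # "Saved": heat got high, ended low, and an affirmation/approval came after the peak.
    peak_idx = heats.index(peak) if heats else 0
    positive_after_peak = any(
        name in ("affirmation", "approval")
        for signals in exchanges[peak_idx + 1:]
        for name, _ in signals
    )
    saved = peak >= HIGH_HEAT and final <= LOW_HEAT and positive_after_peak
    return HeatSummary(heats, peak, final, saved)


session = [
    [("correction", 3)],
    [("correction", 3), ("frustration", 5)],
    [("affirmation", 0), ("approval", 0)],
]
print(summarize_heat(session))
```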

Token Usage Tracking

Per-request token consumption (prompt + completion)
Model identification and version tracking
Context compaction event logging
Cost estimation capabilities
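Cost estimation from token counts is simple arithmetic: tokens times a per-token price, with prompt and completion tokens usually priced separately. The prices below are placeholders, not real model rates, and `PRICES_PER_MILLION`, `estimate_cost`, and `session_cost` are hypothetical names.

```python
from typing import Dict, List, Tuple

# Placeholder USD prices per 1M tokens as (prompt, completion). Not real rates.
PRICES_PER_MILLION: Dict[str, Tuple[float, float]] = {
    "example-model": (2.50, 10.00),
}


def estimate_cost(model: str, prompt_tokens: int, completion_tokens: int) -> float:
    """Estimated USD cost of one request."""
    prompt_price, completion_price = PRICES_PER_MILLION[model]
    return (prompt_tokens * prompt_price + completion_tokens * completion_price) / 1_000_000


def session_cost(requests: List[Tuple[str, int, int]]) -> float:
    """Sum the estimated cost of (model, prompt_tokens, completion_tokens) records."""
    return sum(estimate_cost(m, p, c) for m, p, c in requests)


print(session_cost([("example-model", 12_000, 800), ("example-model", 15_500, 1_200)]))
```

Keeping the model name on each record matters because a single session can span several models with different prices.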

⚙️ Configuration

Automatic Detection

VS Code Ark automatically detects paths using standard locations:

macOS: ~/Library/Application Support/Code/User/
Windows: %APPDATA%\Code\User\
Linux: ~/.config/Code/User/
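A sketch of how the standard locations above could be resolved with the Python standard library, honoring the VSCODE_DATA_DIR override mentioned in the first Configuration section. The function name `vscode_user_dir` is hypothetical, and the project's real detection logic may differ.

```python
import os
import sys
from pathlib import Path
from typing import Mapping, Optional


def vscode_user_dir(env: Optional[Mapping[str, str]] = None, platform: str = sys.platform) -> Path:
    """Return the VS Code 'User' directory for this platform, unless VSCODE_DATA_DIR overrides it."""
    env = os.environ if env is None else env
    override = env.get("VSCODE_DATA_DIR")
    if override:
        return Path(override).expanduser()
    home = Path.home()
    if platform == "darwin":
        return home / "Library" / "Application Support" / "Code" / "User"
    if platform.startswith("win"):
        # %APPDATA% normally points at C:\Users\<name>\AppData\Roaming
        appdata = env.get("APPDATA", str(home / "AppData" / "Roaming"))
        return Path(appdata) / "Code" / "User"
    return home / ".config" / "Code" / "User"  # Linux and other Unix-likes
```

Passing `env` and `platform` explicitly keeps the function testable without touching the real environment.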

Environment Variables

```
export VSCODE_ARK_DB=/path/to/custom.db    # Custom database location
export VSCODE_ARK_CONFIG=/path/to/config   # Custom config directory
```

Policy Configuration

Data access policies are stored in policy.txt:

```
ALLOW important-project
DENY sensitive-data
ALLOW *.py
```
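The precedence rules for policy.txt are not spelled out here. The sketch below assumes glob-style patterns matched case-sensitively with `fnmatch`, treats a pattern with no wildcard (such as `important-project`) as matching anywhere in the name, lets DENY win over ALLOW, and requires at least one ALLOW match whenever ALLOW rules exist. All of these rules, and the `parse_policy`/`is_allowed` names, are assumptions.

```python
from fnmatch import fnmatchcase
from typing import List, Tuple


def parse_policy(text: str) -> List[Tuple[str, str]]:
    """Parse 'ALLOW <pattern>' / 'DENY <pattern>' lines into (action, glob) pairs."""
    rules: List[Tuple[str, str]] = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        action, _, pattern = line.partition(" ")
        action, pattern = action.upper(), pattern.strip()
        if action not in ("ALLOW", "DENY") or not pattern:
            raise ValueError(f"bad policy line: {line!r}")
        # Assumption: a pattern with no glob characters matches anywhere in the name.
        if not any(ch in pattern for ch in "*?["):
            pattern = f"*{pattern}*"
        rules.append((action, pattern))
    return rules


def is_allowed(name: str, rules: List[Tuple[str, str]]) -> bool:
    """DENY always wins; if any ALLOW rules exist, the name must match at least one."""
    if any(action == "DENY" and fnmatchcase(name, pat) for action, pat in rules):
        return False
    allows = [pat for action, pat in rules if action == "ALLOW"]
    return not allows or any(fnmatchcase(name, pat) for pat in allows)


rules = parse_policy("ALLOW important-project\nDENY sensitive-data\nALLOW *.py\n")
print(is_allowed("scripts/ingest.py", rules))
```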

🔧 Development

Setup Development Environment

```
make install-dev
```

Running Tests

```
make test              # Run test suite
make test-cov          # Run with coverage report
```

Code Quality

```
make lint              # Run flake8 and mypy
make format            # Format with black and isort
```

Building

```
make build             # Build distribution packages
make publish           # Publish to PyPI (requires credentials)
```

Project Structure

```
vscode-ark/
├── vscode_ark/           # Main package
│   ├── __init__.py
│   └── cli.py            # Command-line interface
├── scripts/              # Utility scripts
│   ├── ingest.py         # Data ingestion
│   ├── reconstruct.py    # Conversation processing
│   ├── extract.py        # Signal analysis
│   └── watcher.py        # Live monitoring
├── tests/                # Test suite
├── docs/                 # Documentation
├── pyproject.toml        # Package configuration
├── setup.py              # Legacy setup
├── Makefile              # Development tasks
└── README.md             # This file
```

🤝 Contributing

1. Fork the repository
2. Create a feature branch: git checkout -b feature/amazing-feature
3. Make your changes and add tests
4. Run the test suite: make test
5. Format code: make format
6. Commit your changes: git commit -m 'Add amazing feature'
7. Push to the branch: git push origin feature/amazing-feature
8. Open a Pull Request

Development Guidelines

Type Hints: All functions should have type annotations
Docstrings: Comprehensive docstrings for public APIs
Tests: Unit tests for all new functionality
Linting: Code must pass flake8 and mypy checks
Formatting: Code must be formatted with black and isort

📝 License

This project is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built for analyzing VS Code/Copilot Chat interaction patterns
Inspired by the need for better human-AI interaction insights
Uses SQLite FTS5 for high-performance full-text search
Implements behavioral signal processing for conversation analysis

VS Code Ark - Understanding the human side of AI conversations.

Release history

2.0.2 (May 11, 2026)
2.0.1 (May 11, 2026)
2.0.0 (May 10, 2026)
0.1.2 (May 10, 2026)
0.1.1 (May 10, 2026)
0.1.0 (May 10, 2026, this release)

Download files

Source distribution: vscode_ark-0.1.0.tar.gz (66.9 kB), uploaded May 10, 2026

Built distribution: vscode_ark-0.1.0-py3-none-any.whl (64.7 kB), uploaded May 10, 2026 (Python 3)

File details

Details for the file vscode_ark-0.1.0.tar.gz.

File metadata

Download URL: vscode_ark-0.1.0.tar.gz
Upload date: May 10, 2026
Size: 66.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for vscode_ark-0.1.0.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | `24301677588618563a678f9e833cf61d11959e4a251e7a03ce933d071fc8f556` |
| MD5 | `1eade6fd52a5e7ec283f8a9ad103e19f` |
| BLAKE2b-256 | `aa628dad58f8936f66691057dddae55445c8e30bf319585a6f05eff340861edd` |


File details

Details for the file vscode_ark-0.1.0-py3-none-any.whl.

File metadata

Download URL: vscode_ark-0.1.0-py3-none-any.whl
Upload date: May 10, 2026
Size: 64.7 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.9.6

File hashes

Hashes for vscode_ark-0.1.0-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | `b01bd2e3fe8299026ce949e5bb53f02f672a463f3d54c14237d8057bdd8de8cf` |
| MD5 | `391383e7f5c151e7a27b5ea99a79097f` |
| BLAKE2b-256 | `ceadbb97e371504573a3faafa9c47b3637b61e066a9aed1195db7526b68b3d2c` |

