Python SDK for Claude Code with Rust core
Project description
Claude SDK for Python
A high-performance Python library for parsing and analyzing Claude Code session data. Built with Rust for speed, designed with Python developers in mind.
Table of Contents
- Installation
- Quick Start
- Core Concepts
- API Reference
- Examples
- Performance
- Troubleshooting
- Development
Installation
Prerequisites
- Python 3.8 or higher
- pip or uv package manager
Install from PyPI (when published)
pip install claude-code-analytics
Install from source
# Clone the repository
git clone https://github.com/darinkishore/claude-code-analytics.git
cd claude-code-analytics
# Or using uv (recommended)
uv pip install ./python
Development installation
cd python
uv build
Quick Start
import claude_code_analytics
# Load a session from a JSONL file
session = claude_code_analytics.load("~/.claude/projects/myproject/session_20240101_120000.jsonl")
# Basic session info
print(f"Session ID: {session.session_id}")
print(f"Total cost: ${session.total_cost:.4f}")
print(f"Message count: {len(session.messages)}")
print(f"Tools used: {', '.join(session.tools_used)}")
# Iterate through messages
for message in session:
print(f"{message.role}: {message.text[:100]}...")
# Find all your sessions
sessions = claude_code_analytics.find_sessions()
for session_path in sessions:
print(f"Found session: {session_path}")
Core Concepts
Sessions
A Session represents a complete conversation with Claude, loaded from a JSONL file. Each session contains:
- Messages exchanged between user and assistant
- Tool executions and their results
- Token usage and cost information
- Conversation structure (including branches and sidechains)
- Metadata and statistics
Messages
Messages are the individual exchanges in a conversation. Each message has:
role
: Either "user" or "assistant"text
: The complete text contenttools
: List of tools used (if any)cost
: Cost in USD for this specific messagetimestamp
: When the message was created- Threading information (
uuid
,parent_uuid
)
Projects
A Project is a collection of related sessions, typically stored in the same directory. Projects provide aggregate statistics across all sessions.
Conversation Trees
The SDK automatically reconstructs the conversation structure, handling:
- Linear conversations
- Branching (when you retry or edit messages)
- Sidechains (alternate conversation paths)
- Orphaned messages (missing parents)
API Reference
Functions
load(file_path: str | Path) -> Session
Load a Claude Code session from a JSONL file.
session = claude_code_analytics.load("path/to/session.jsonl")
Parameters:
file_path
: Path to the JSONL session file
Returns: Session
object
Raises:
FileNotFoundError
: If the file doesn't existParseError
: If the JSONL is malformedValidationError
: If the session data is invalid
find_sessions(base_path: Optional[str] = None, project: Optional[str] = None) -> List[Path]
Discover Claude Code session files.
# Find all sessions
all_sessions = claude_code_analytics.find_sessions()
# Find sessions in a specific project
project_sessions = claude_code_analytics.find_sessions(project="myproject")
# Search in a custom location
custom_sessions = claude_code_analytics.find_sessions(base_path="/custom/path")
Parameters:
base_path
: Root directory to search (default:~/.claude/projects/
)project
: Filter by specific project name
Returns: List of Path
objects to session files
find_projects(base_path: Optional[str] = None) -> List[Path]
Find all Claude Code projects.
projects = claude_code_analytics.find_projects()
for project_path in projects:
print(f"Project: {project_path.name}")
Parameters:
base_path
: Root directory to search (default:~/.claude/projects/
)
Returns: List of Path
objects to project directories
load_project(project_identifier: str | Path, base_path: Optional[str] = None) -> Project
Load an entire project with all its sessions.
# Load by project name
project = claude_code_analytics.load_project("myproject")
# Load by path
project = claude_code_analytics.load_project("/path/to/project")
print(f"Total sessions: {len(project.sessions)}")
print(f"Total cost: ${project.total_cost:.2f}")
Parameters:
project_identifier
: Project name or pathbase_path
: Base path for project lookup (if using name)
Returns: Project
object
Raises:
FileNotFoundError
: If project doesn't existSessionError
: If no valid sessions found
Classes
Session
Primary container for Claude Code session data.
Properties:
Property | Type | Description |
---|---|---|
session_id |
str |
Unique session identifier |
messages |
List[Message] |
All messages in conversation order |
total_cost |
float |
Total cost in USD |
tools_used |
List[str] |
Unique tool names used |
duration |
Optional[float] |
Session duration in seconds |
conversation_tree |
ConversationTree |
Message threading structure |
metadata |
SessionMetadata |
Detailed statistics |
tool_executions |
List[ToolExecution] |
All tool runs |
tool_costs |
Dict[str, float] |
Cost breakdown by tool |
cost_by_turn |
List[float] |
Cost per message |
project_path |
Optional[Path] |
Project directory |
project_name |
Optional[str] |
Project name |
Methods:
# Get main conversation (excluding sidechains)
main_messages = session.get_main_chain()
# Filter by role
user_messages = session.get_messages_by_role("user")
assistant_messages = session.get_messages_by_role("assistant")
# Find messages using specific tools
bash_messages = session.get_messages_by_tool("bash")
# Get a specific message
message = session.get_message_by_uuid("msg-uuid-123")
# Custom filtering
long_messages = session.filter_messages(lambda m: len(m.text) > 1000)
# Get conversation thread
thread = session.get_thread("msg-uuid-789") # Returns path from root
# Iteration and length
for msg in session:
print(msg.text)
print(f"Total messages: {len(session)}")
Message
Represents a single message in the conversation.
Properties:
Property | Type | Description |
---|---|---|
role |
str |
"user" or "assistant" |
text |
str |
Complete text content |
model |
Optional[str] |
Model used (e.g., "claude-3-sonnet-20240229") |
cost |
Optional[float] |
Cost in USD |
tools |
List[str] |
Tool names used |
stop_reason |
Optional[str] |
Why generation stopped |
usage |
Optional[TokenUsage] |
Token usage details |
timestamp |
str |
RFC3339 timestamp |
uuid |
str |
Unique identifier |
parent_uuid |
Optional[str] |
Parent message UUID |
is_sidechain |
bool |
Whether part of a sidechain |
cwd |
Optional[Path] |
Working directory |
total_tokens |
int |
Total token count |
input_tokens |
int |
Input token count |
output_tokens |
int |
Output token count |
Methods:
# Check for tool usage
if message.has_tool_use():
tools = message.get_tool_blocks()
for tool in tools:
print(f"Tool: {tool.name}, Input: {tool.input}")
# Get text content blocks
text_blocks = message.get_text_blocks()
# Get all content blocks with proper typing
for block in message.get_content_blocks():
if isinstance(block, claude_code_analytics.TextBlock):
print(f"Text: {block.text}")
elif isinstance(block, claude_code_analytics.ToolUseBlock):
print(f"Tool: {block.name}")
Project
Container for multiple sessions in a project.
Properties:
Property | Type | Description |
---|---|---|
name |
str |
Project name |
sessions |
List[Session] |
All sessions in project |
total_cost |
float |
Aggregate cost |
total_messages |
int |
Total message count |
tool_usage_count |
Dict[str, int] |
Tool usage frequency |
total_duration |
Optional[float] |
Total time in seconds |
project = claude_code_analytics.load_project("myproject")
# Analyze tool usage patterns
for tool, count in project.tool_usage_count.items():
avg_per_session = count / len(project.sessions)
print(f"{tool}: {count} uses ({avg_per_session:.1f} per session)")
# Find expensive sessions
expensive = [s for s in project.sessions if s.total_cost > 1.0]
ToolExecution
Complete record of a tool invocation.
Properties:
Property | Type | Description |
---|---|---|
tool_name |
str |
Name of the tool |
input |
Dict[str, Any] |
Input parameters |
output |
ToolResult |
Execution result |
duration_ms |
Optional[int] |
Execution time |
timestamp |
str |
When executed |
Methods:
# Check success
if execution.is_success():
print(f"{execution.tool_name} completed in {execution.duration_ms}ms")
else:
print(f"Failed: {execution.output.stderr}")
ConversationTree
Tree structure representing conversation flow.
Properties:
Property | Type | Description |
---|---|---|
root_messages |
List[ConversationNode] |
Root nodes |
orphaned_messages |
List[str] |
Messages with missing parents |
circular_references |
List[str] |
Circular reference UUIDs |
stats |
ConversationStats |
Tree statistics |
Methods:
tree = session.conversation_tree
# Get tree metrics
print(f"Max depth: {tree.max_depth()}")
print(f"Branch points: {tree.count_branches()}")
# Traverse tree
def walk_tree(node, depth=0):
print(" " * depth + node.message.text[:50])
for child in node.children:
walk_tree(child, depth + 1)
for root in tree.root_messages:
walk_tree(root)
Exceptions
# Exception hierarchy
claude_code_analytics.ClaudeSDKError # Base exception
├── claude_code_analytics.ParseError # JSONL parsing failed
├── claude_code_analytics.ValidationError # Invalid data
└── claude_code_analytics.SessionError # Session-specific issues
# Example handling
try:
session = claude_code_analytics.load("session.jsonl")
except claude_code_analytics.ParseError as e:
print(f"Failed to parse: {e}")
except claude_code_analytics.ClaudeSDKError as e:
print(f"SDK error: {e}")
Examples
Basic Session Analysis
import claude_code_analytics
# Load session
session = claude_code_analytics.load("session.jsonl")
# Print summary
print(f"Session: {session.session_id}")
print(f"Duration: {session.duration / 60:.1f} minutes" if session.duration else "Duration unknown")
print(f"Messages: {len(session)} ({len(session.get_messages_by_role('user'))} from user)")
print(f"Cost: ${session.total_cost:.4f}")
print(f"Tools: {', '.join(session.tools_used) or 'None'}")
# Analyze token usage
total_tokens = sum(msg.total_tokens for msg in session.messages)
print(f"Total tokens: {total_tokens:,}")
Tool Usage Patterns
import claude_code_analytics
from collections import defaultdict
session = claude_code_analytics.load("session.jsonl")
# Count tool usage by message
tool_messages = defaultdict(list)
for msg in session.messages:
if msg.has_tool_use():
for tool in msg.tools:
tool_messages[tool].append(msg)
# Print tool usage summary
for tool, messages in sorted(tool_messages.items()):
print(f"\n{tool}: {len(messages)} uses")
# Show first few uses
for msg in messages[:3]:
preview = msg.text[:100].replace('\n', ' ')
print(f" - {preview}...")
Cost Analysis Across Projects
import claude_code_analytics
# Find all projects
projects = claude_code_analytics.find_projects()
# Analyze costs
project_costs = []
for project_path in projects:
try:
project = claude_code_analytics.load_project(project_path)
project_costs.append((project.name, project.total_cost, len(project.sessions)))
except Exception as e:
print(f"Failed to load {project_path}: {e}")
# Sort by cost
project_costs.sort(key=lambda x: x[1], reverse=True)
# Print report
print("Project Cost Analysis")
print("-" * 50)
for name, cost, session_count in project_costs:
avg_cost = cost / session_count if session_count > 0 else 0
print(f"{name:20} ${cost:8.2f} ({session_count:3} sessions, avg ${avg_cost:.2f})")
Conversation Flow Analysis
import claude_code_analytics
session = claude_code_analytics.load("session.jsonl")
tree = session.conversation_tree
# Find branching points
for root in tree.root_messages:
def find_branches(node, path=[]):
current_path = path + [node.message.uuid]
if len(node.children) > 1:
print(f"\nBranch point at message {len(current_path)}:")
print(f" {node.message.text[:100]}...")
print(f" Branches into {len(node.children)} paths")
for child in node.children:
find_branches(child, current_path)
find_branches(root)
# Analyze sidechains
sidechain_messages = [m for m in session.messages if m.is_sidechain]
if sidechain_messages:
print(f"\nFound {len(sidechain_messages)} sidechain messages")
Exporting Session Data
import claude_code_analytics
import json
import csv
session = claude_code_analytics.load("session.jsonl")
# Export to JSON
export_data = {
"session_id": session.session_id,
"total_cost": session.total_cost,
"messages": [
{
"role": msg.role,
"text": msg.text,
"cost": msg.cost,
"timestamp": msg.timestamp,
"tools": msg.tools
}
for msg in session.messages
]
}
with open("session_export.json", "w") as f:
json.dump(export_data, f, indent=2)
# Export tool usage to CSV
with open("tool_usage.csv", "w", newline="") as f:
writer = csv.writer(f)
writer.writerow(["Timestamp", "Tool", "Duration (ms)", "Success"])
for exec in session.tool_executions:
writer.writerow([
exec.timestamp,
exec.tool_name,
exec.duration_ms or "N/A",
exec.is_success()
])
Performance
The Claude SDK is built with Rust for exceptional performance:
- Parsing speed: 1000+ messages per second
- Memory efficient: Streaming parser for large files
- Zero-copy strings: Minimal memory allocation
- Thread safe: Can be used in multi-threaded applications
Benchmarks
File Size | Messages | Parse Time | Memory Usage |
---|---|---|---|
100 KB | 50 | <10ms | 2 MB |
1 MB | 500 | <50ms | 8 MB |
10 MB | 5000 | <300ms | 35 MB |
100 MB | 50000 | <3s | 350 MB |
Troubleshooting
Common Issues
ImportError: No module named 'claude_code_analytics'
Solution: Ensure you've installed the package:
pip install claude-code-analytics
# or for development
uv build
FileNotFoundError when loading sessions
Solution: Check the file path and ensure you have read permissions:
import os
path = os.path.expanduser("~/.claude/projects/myproject/session.jsonl")
if os.path.exists(path):
session = claude_code_analytics.load(path)
ParseError: Invalid JSONL format
Solution: Ensure the file is a valid Claude Code session file:
# Check first few lines
head -n 5 session.jsonl
# Validate JSON
python -m json.tool session.jsonl
High memory usage with large files
Solution: Process sessions in batches:
# Instead of loading all sessions at once
sessions = []
for path in claude_code_analytics.find_sessions(project="large_project"):
session = claude_code_analytics.load(path)
# Process session
del session # Free memory
Debug Mode
Enable detailed logging for troubleshooting:
import logging
logging.basicConfig(level=logging.DEBUG)
# Now SDK operations will print debug info
session = claude_code_analytics.load("session.jsonl")
Development
Building from source
# Clone repository
git clone https://github.com/yourusername/claude-code-analytics.git
cd claude-code-analytics
# Build Rust library
cargo build --release
# Build Python package
uv build
Running tests
# Rust tests
cargo test
# Python tests
uv build
uv run -m pytest tests/
The Python test suite includes fixtures for malformed JSONL and a multi-megabyte
session to ensure ParseError
is raised correctly and large files load
successfully.
Contributing
- Fork the repository
- Create a feature branch (
git checkout -b feature/amazing-feature
) - Commit your changes (
git commit -m 'Add amazing feature'
) - Push to the branch (
git push origin feature/amazing-feature
) - Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
Built with:
- PyO3 - Rust bindings for Python
- Maturin - Build and publish Rust Python extensions
- Serde - Serialization framework for Rust
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Documentation: Full API Docs
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file claude_code_analytics-0.1.1.tar.gz
.
File metadata
- Download URL: claude_code_analytics-0.1.1.tar.gz
- Upload date:
- Size: 938.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 |
d67837156e955e6d67193465c0e824168d0b35b09ff685263c9407bcabf6e5c2
|
|
MD5 |
8027ffeec65a2e971185191585e5b71a
|
|
BLAKE2b-256 |
c81a9f75f27017c43b2c644187f479525748aeab67489843fc8b82041b771ac0
|
File details
Details for the file claude_code_analytics-0.1.1-cp311-abi3-macosx_11_0_arm64.whl
.
File metadata
- Download URL: claude_code_analytics-0.1.1-cp311-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 469.4 kB
- Tags: CPython 3.11+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.7.9
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 |
dc28ceedbc4df872468ded97b6afcce052c01209ba29b89a17644192a8dc75fa
|
|
MD5 |
a01e233a47f73cbef0a94a91964e8f17
|
|
BLAKE2b-256 |
8e49294890d887d30c4e1018a67bc46180b3ef30803d2ff0ccdfe33e1aacd771
|