Export your Claude Code and Codex conversations to Hugging Face as structured training data

CodeClaw

CodeClaw exports Claude Code and Codex sessions into privacy-safe training datasets, with gated publish controls, automated sync workflows, and optional MCP memory tooling.

Why CodeClaw

  • Turn day-to-day coding sessions into structured, reusable training data.
  • Keep privacy controls first-class with redaction and manual review gates.
  • Preserve historical problem-solving context through MCP-accessible session memory.

Core Capabilities

  • Multi-source ingestion:
    • Claude Code and Codex session discovery and parsing.
  • Privacy-aware export:
    • Secret and PII redaction, username anonymization, and project-level exclusions.
  • Controlled publishing workflow:
    • Local export, user review attestations, confirm gate, then push.
  • Continuous mode:
    • Background watch daemon for incremental sync.
  • Memory tooling:
    • MCP server with search, project patterns, trajectory stats, session lookup, graph similarity retrieval, and index refresh.

Install

pip install codeclaw

From source:

git clone https://github.com/ychampion/codeclaw.git
cd codeclaw
pip install -e ".[dev]"

Quick Start

# Authenticate once
huggingface-cli login --token YOUR_HF_TOKEN

# Discover and configure
codeclaw prep
codeclaw config --source both
codeclaw list --source both
codeclaw config --repo username/cc-logs

# Export locally first
codeclaw export --no-push

# Review and confirm
codeclaw confirm \
  --full-name "YOUR FULL NAME" \
  --attest-full-name "Asked for full name and scanned export." \
  --attest-sensitive "Reviewed for company/client/private identifiers." \
  --attest-manual-scan "Manually reviewed representative sessions."

# Publish only after explicit approval
codeclaw export --publish-attestation "User explicitly approved publishing to Hugging Face."
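
The ordering in the steps above (local export, then attested confirmation, then publish) can be pictured as a small state machine. The following is a minimal sketch of that idea only, not CodeClaw's actual implementation: the `PublishGate` class, the `Stage` names, and the rule that a fresh export resets earlier attestations are all assumptions made for illustration. Only the three attestation kinds and the separate publish attestation come from the commands shown above.

```python
from dataclasses import dataclass, field
from enum import Enum


class Stage(Enum):
    CONFIGURED = "configured"
    EXPORTED = "exported"
    CONFIRMED = "confirmed"
    PUBLISHED = "published"


# Keys mirror the --attest-* flags of `codeclaw confirm` (names are hypothetical).
REQUIRED_ATTESTATIONS = ("full_name", "sensitive", "manual_scan")


@dataclass
class PublishGate:
    """Hypothetical model of the export -> confirm -> publish ordering."""

    stage: Stage = Stage.CONFIGURED
    attestations: dict = field(default_factory=dict)

    def export_locally(self):
        # Assumption: a new local export invalidates any earlier review.
        self.attestations = {}
        self.stage = Stage.EXPORTED

    def confirm(self, attestations):
        if self.stage is not Stage.EXPORTED:
            raise RuntimeError("run a local export before confirming")
        missing = [
            key for key in REQUIRED_ATTESTATIONS
            if not attestations.get(key, "").strip()
        ]
        if missing:
            raise ValueError("missing attestations: " + ", ".join(missing))
        self.attestations = dict(attestations)
        self.stage = Stage.CONFIRMED

    def publish(self, publish_attestation):
        if self.stage is not Stage.CONFIRMED:
            raise RuntimeError("confirm the export before publishing")
        if not publish_attestation.strip():
            raise ValueError("a publish attestation is required")
        self.stage = Stage.PUBLISHED


gate = PublishGate()
gate.export_locally()
gate.confirm({
    "full_name": "Asked for full name and scanned export.",
    "sensitive": "Reviewed for company/client/private identifiers.",
    "manual_scan": "Manually reviewed representative sessions.",
})
gate.publish("User explicitly approved publishing to Hugging Face.")
print(gate.stage)
```

In this sketch, calling `publish` before `confirm`, or `confirm` before a local export, raises an error instead of silently proceeding, which is the property the gated workflow is meant to provide.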

Commands

Command                                      Description
codeclaw status                              Show current stage and next steps (JSON)
codeclaw prep                                Discover projects and auth state
codeclaw list                                List projects with source, size, and exclusion state
codeclaw config ...                          Configure repo, sources, exclusions, and redactions
codeclaw export --no-push                    Export locally for review
codeclaw confirm ...                         Run checks and unlock push gate
codeclaw export --publish-attestation "..."  Push dataset after approval
codeclaw watch --start | --stop              Start or stop the background watch daemon
codeclaw serve                               Start MCP server over stdio
codeclaw install-mcp                         Register MCP server in Claude config
codeclaw synthesize --project <name>         Generate CODECLAW.md from synced sessions
codeclaw update-skill claude                 Install/update local CodeClaw skill

MCP Memory Server

Install the optional MCP dependency:

pip install "codeclaw[mcp]"
codeclaw install-mcp

Available MCP tools:

  • search_past_solutions(query, max_results=5)
  • get_project_patterns(project=None)
  • get_trajectory_stats()
  • get_session(session_id)
  • find_similar_sessions(context, max_results=5)
  • refresh_index()
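
MCP clients invoke tools like these through JSON-RPC 2.0 `tools/call` requests. The sketch below shows roughly what such a request could look like for the tools listed above. The argument names are taken from the signatures in the list; the `build_tool_call` helper and the example query string are hypothetical, and a real client (such as Claude Code) would construct and send these messages itself after the MCP initialization handshake.

```python
import json
from itertools import count

# Simple monotonically increasing JSON-RPC request ids.
_request_ids = count(1)


def build_tool_call(name, **arguments):
    """Build a JSON-RPC 2.0 `tools/call` request for an MCP tool (illustrative helper)."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


request = build_tool_call(
    "search_past_solutions",
    query="flaky pytest fixture",  # example query, not from the CodeClaw docs
    max_results=5,
)
print(json.dumps(request, indent=2))
```

Tools without parameters, such as `get_trajectory_stats` or `refresh_index`, would be called the same way with an empty `arguments` object.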

Privacy and Safety

CodeClaw is designed for private-by-default workflows:

  • path and username anonymization
  • secret and high-entropy token detection
  • custom redaction lists
  • manual confirmation and attestation gates before publish

Automated redaction is not perfect. Always review local exports before publishing.
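
To make the controls listed above more concrete, here is a minimal sketch of how username anonymization, a custom redaction list, and high-entropy token detection might be combined. It is not CodeClaw's actual redaction code: the `redact` function, the placeholder strings, the 20-character minimum, and the 4.0 bits-per-character threshold are all assumptions chosen for illustration.

```python
import math
import re
from collections import Counter

# Candidate secrets: long runs of token-like characters (assumed heuristic).
TOKEN_RE = re.compile(r"[A-Za-z0-9_\-]{20,}")


def shannon_entropy(token):
    """Return the Shannon entropy of a string in bits per character."""
    counts = Counter(token)
    total = len(token)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redact(text, username, custom_terms=(), entropy_threshold=4.0):
    """Apply three illustrative redaction passes to `text`."""
    # 1. Username anonymization (also covers home-directory paths).
    text = text.replace(username, "user_anon")

    # 2. Custom redaction list, e.g. company or client names.
    for term in custom_terms:
        text = text.replace(term, "[REDACTED]")

    # 3. Mask long tokens whose character distribution looks random.
    def mask(match):
        token = match.group(0)
        if shannon_entropy(token) >= entropy_threshold:
            return "[REDACTED_TOKEN]"
        return token

    return TOKEN_RE.sub(mask, text)


sample = "export KEY=aB3dE5gH7jK9mN1pQ2rS4tU6vW8xY0zC  # /home/alice/AcmeCorp/app"
print(redact(sample, username="alice", custom_terms=["AcmeCorp"]))
```

A heuristic like this produces both false positives (long random-looking identifiers) and false negatives (short or low-entropy secrets), which is why the manual review step remains necessary.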

License

MIT - see LICENSE.

Download files

Source Distribution

codeclaw-0.4.0.tar.gz (74.9 kB)

  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

Hashes for codeclaw-0.4.0.tar.gz

Algorithm    Hash digest
SHA256       c6dd19dd82d427ca6043e5152e428a75746bc10f24e13eb741c47194e2c0931a
MD5          48fda11209b3b61db7e44abaa9df7513
BLAKE2b-256  1ecc43022c02902386bc236ca8ec2569f2a65df5055fd34e4106c11ac8872d02

Built Distribution

codeclaw-0.4.0-py3-none-any.whl (58.9 kB)

  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.7

Hashes for codeclaw-0.4.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       bed79408af5abbc10f51845c53e51babb9084c687184663a1b46041367eeb475
MD5          03f3dca8cc134b6e89f85d9269f8ad5d
BLAKE2b-256  39103f444be4d61fab457cde589fba198db50bb93e94e21e29d795232654906b