Export your Claude Code and Codex conversations to Hugging Face as structured training data

CodeClaw

CodeClaw exports Claude Code and Codex sessions into privacy-safe training datasets, with gated publish controls, automated sync workflows, and optional MCP memory tooling.

Why CodeClaw

  • Turn day-to-day coding sessions into structured, reusable training data.
  • Keep privacy controls first-class with redaction and manual review gates.
  • Preserve historical problem-solving context through MCP-accessible session memory.

Core Capabilities

  • Multi-source ingestion:
    • Claude Code and Codex session discovery and parsing.
    • Experimental adapter routing for Cursor, Windsurf, Aider, Continue.dev, Antigravity, VS Code, Zed, and Xcode beta logs.
  • Privacy-aware export:
    • Secret and PII redaction, username anonymization, and project-level exclusions.
    • Layered privacy engine: regex baseline + optional ML NER (codeclaw[pii-ml]).
  • Controlled publishing workflow:
    • Local export, user review attestations, confirm gate, then push.
    • Immutable dataset version snapshots + dedupe index on publish.
  • Continuous mode:
    • Background watch daemon for incremental sync.
  • Memory tooling:
    • MCP server with search, project patterns, trajectory stats, session lookup, graph similarity retrieval, and index refresh.
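The dedupe index mentioned under the publishing workflow can be pictured as a set of content hashes that is consulted before each publish. The following is a minimal sketch under that assumption, not CodeClaw's actual implementation: `session_fingerprint`, `filter_new_sessions`, and the shape of the session records are hypothetical and exist only for illustration.

```python
import hashlib
import json


def session_fingerprint(session: dict) -> str:
    """Return a stable SHA-256 hex digest for a session record."""
    # sort_keys makes the digest independent of dict key order.
    canonical = json.dumps(session, sort_keys=True, separators=(",", ":"))
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


def filter_new_sessions(sessions: list[dict], seen: set[str]) -> list[dict]:
    """Keep only sessions whose fingerprint is not in the index yet."""
    fresh = []
    for session in sessions:
        digest = session_fingerprint(session)
        if digest not in seen:
            seen.add(digest)  # record it so later duplicates are skipped
            fresh.append(session)
    return fresh
```

Hashing a canonical JSON form means two records with the same content but different key order count as one session, which is usually what a dedupe step wants.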

Install

pip install codeclaw

Optional extras:

pip install "codeclaw[pii-ml]"    # Presidio + spaCy detection layer
pip install "codeclaw[mcp]"       # MCP server runtime
pip install "codeclaw[finetune]"  # Experimental local fine-tune scaffolding
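To check which optional layers are usable in the current environment, a short probe like the one below may help. It is a sketch, not part of CodeClaw: the mapping from extras to import names (`presidio_analyzer`, `spacy`, `mcp`) is an assumption based on the comments above, and the `finetune` extra is left out because its dependencies are not listed here.

```python
from importlib.util import find_spec

# Assumed mapping from extras to the import names they are expected to provide.
EXTRA_MODULES = {
    "pii-ml": ["presidio_analyzer", "spacy"],
    "mcp": ["mcp"],
}


def installed_extras() -> dict[str, bool]:
    """Report, per extra, whether all of its assumed modules are importable."""
    return {
        extra: all(find_spec(name) is not None for name in modules)
        for extra, modules in EXTRA_MODULES.items()
    }


if __name__ == "__main__":
    for extra, ok in installed_extras().items():
        print(f"{extra}: {'available' if ok else 'missing'}")
```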

From source:

git clone https://github.com/ychampion/codeclaw.git
cd codeclaw
pip install -e ".[dev]"

Quick Start

# Guided onboarding (HF auth help, repo setup, project scope, MCP, watcher)
codeclaw setup

# Verify environment and connected scope
codeclaw doctor
codeclaw projects --source both
codeclaw stats
codeclaw diff --format json
codeclaw config --encryption status

# Export locally first
codeclaw export --no-push

# Review and confirm
codeclaw confirm \
  --full-name "YOUR FULL NAME" \
  --attest-full-name "Asked for full name and scanned export." \
  --attest-sensitive "Reviewed for company/client/private identifiers." \
  --attest-manual-scan "Manually reviewed representative sessions."

# Publish only after explicit approval
codeclaw export --publish-attestation "User explicitly approved publishing to Hugging Face."

# Optional one-command sharing flow
codeclaw share --publish --publish-attestation "User explicitly approved publishing to Hugging Face."
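When scripting the gated flow, it can help to build the three steps as explicit argument lists before running anything. The sketch below only assembles the commands from the flags shown above; `confirm_command` and `gated_publish_plan` are hypothetical helper names, and nothing here calls CodeClaw itself.

```python
import shlex


def confirm_command(full_name: str, attestations: dict[str, str]) -> list[str]:
    """Build the argv for `codeclaw confirm` from the flags shown above."""
    argv = ["codeclaw", "confirm", "--full-name", full_name]
    for flag in ("attest-full-name", "attest-sensitive", "attest-manual-scan"):
        argv += [f"--{flag}", attestations[flag]]  # KeyError if one is missing
    return argv


def gated_publish_plan(
    full_name: str, attestations: dict[str, str], publish_attestation: str
) -> list[list[str]]:
    """Return the three steps in order: local export, confirm, publish."""
    return [
        ["codeclaw", "export", "--no-push"],
        confirm_command(full_name, attestations),
        ["codeclaw", "export", "--publish-attestation", publish_attestation],
    ]


if __name__ == "__main__":
    plan = gated_publish_plan(
        "YOUR FULL NAME",
        {
            "attest-full-name": "Asked for full name and scanned export.",
            "attest-sensitive": "Reviewed for company/client/private identifiers.",
            "attest-manual-scan": "Manually reviewed representative sessions.",
        },
        "User explicitly approved publishing to Hugging Face.",
    )
    for step in plan:
        print(shlex.join(step))  # print only; run each step after manual review
```

Each step could be passed to `subprocess.run(step, check=True)`, but the review between export and confirm is meant to be done by a person, so running the plan unattended would defeat the gate.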

Commands

Command                                      Description
codeclaw status                              Show current stage and next steps (JSON)
codeclaw prep                                Discover projects and auth state
codeclaw setup                               Guided onboarding (HF, dataset repo, projects, MCP, watcher)
codeclaw doctor                              Verify logs, HF auth, and MCP registration
codeclaw stats                               Show session, token, redaction, and export metrics
codeclaw stats --skill                       Include trajectory-based growth metrics
codeclaw diff                                Preview exactly what would be redacted before confirm
codeclaw projects                            Manage connected project scope
codeclaw list                                List projects with source, size, and exclusion state
codeclaw config ...                          Configure repo, sources, exclusions, and redactions
codeclaw config --encryption on|off          Enable or disable encryption at rest for local artifacts
codeclaw export --no-push                    Export locally for review
codeclaw confirm ...                         Run checks and unlock push gate
codeclaw export --publish-attestation "..."  Push dataset after approval
codeclaw share [--publish]                   Fast export flow with optional publish + dataset card update
codeclaw watch --start|--stop                Start or stop the background watch daemon
codeclaw serve                               Start MCP server over stdio
codeclaw install-mcp                         Register MCP server in Claude config
codeclaw synthesize --project <name>         Generate CODECLAW.md from synced sessions
codeclaw update-skill claude                 Install/update local CodeClaw skill

Experimental preview command:

  • codeclaw finetune --experimental

Additional source filters are available for adapter-backed ingestion:

  • cursor, windsurf, aider, continue, antigravity, vscode, zed, xcode-beta

MCP Memory Server

Install optional MCP dependency:

pip install "codeclaw[mcp]"
codeclaw install-mcp

Available MCP tools:

  • search_past_solutions(query, max_results=5)
  • get_project_patterns(project=None)
  • get_trajectory_stats()
  • get_session(session_id)
  • find_similar_sessions(context, max_results=5)
  • refresh_index()
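MCP clients normally handle the wire format for you, but it can be useful to see what a call to one of these tools looks like. MCP uses JSON-RPC 2.0, and tool invocations go through the `tools/call` method. The sketch below builds such a request using the tool and argument names listed above; `tool_call_request` is a hypothetical helper, and a real session also needs the MCP `initialize` handshake before any tool call.

```python
import json
from itertools import count

_request_ids = count(1)  # JSON-RPC ids only need to be unique per session


def tool_call_request(name: str, arguments: dict) -> dict:
    """Build an MCP `tools/call` request as a JSON-RPC 2.0 message."""
    return {
        "jsonrpc": "2.0",
        "id": next(_request_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }


if __name__ == "__main__":
    request = tool_call_request(
        "search_past_solutions",
        {"query": "flaky pytest fixture", "max_results": 5},
    )
    print(json.dumps(request, indent=2))
```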

Privacy and Safety

CodeClaw is designed for private-by-default workflows:

  • path and username anonymization
  • secret and high-entropy token detection
  • custom redaction lists
  • manual confirmation and attestation gates before publish
  • encryption-at-rest support for local artifacts with keyring-backed key management

Automated redaction is not perfect. Always review local exports before publishing.
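To make the first few bullets concrete, here is a minimal sketch of a regex pass followed by a high-entropy token check. It is not CodeClaw's privacy engine: the patterns, the placeholder strings, the 4.0-bit threshold, and the names `shannon_entropy` and `redact` are all illustrative assumptions.

```python
import math
import re
from collections import Counter

# Illustrative patterns only; a real rule set would be much larger.
EMAIL = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9.-]+\.[A-Za-z]{2,}")
HOME_PATH = re.compile(r"(/home/|/Users/)([^/\s]+)")
TOKEN = re.compile(r"[A-Za-z0-9_\-]{20,}")


def shannon_entropy(text: str) -> float:
    """Bits per character of the empirical character distribution."""
    counts = Counter(text)
    total = len(text)
    return -sum(n / total * math.log2(n / total) for n in counts.values())


def redact(text: str, entropy_threshold: float = 4.0) -> str:
    """Apply the regex layer, then mask long high-entropy tokens."""
    text = EMAIL.sub("[REDACTED_EMAIL]", text)
    text = HOME_PATH.sub(r"\1[USER]", text)  # keep the prefix, drop the username

    def mask(match: re.Match) -> str:
        token = match.group(0)
        if shannon_entropy(token) >= entropy_threshold:
            return "[REDACTED_TOKEN]"
        return token  # long but low-entropy, e.g. a repeated character run

    return TOKEN.sub(mask, text)
```

The entropy threshold trades false positives (long random-looking identifiers) against false negatives (short or patterned secrets), which is one reason a manual review step is still needed.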

Package Distribution

  • Primary: PyPI (pip install codeclaw)
  • Additional: a GitHub Packages publish workflow is included for organization-internal registry use.

Install from GitHub Packages:

pip install codeclaw \
  --index-url https://pypi.pkg.github.com/ychampion/simple/ \
  --extra-index-url https://pypi.org/simple

License

MIT - see LICENSE.
