Export your Claude Code and Codex conversations to Hugging Face as structured training data
CodeClaw
CodeClaw exports Claude Code and Codex sessions into privacy-safe training datasets, with gated publish controls, automated sync workflows, and optional MCP memory tooling.
Why CodeClaw
- Turn day-to-day coding sessions into structured, reusable training data.
- Keep privacy controls first-class with redaction and manual review gates.
- Preserve historical problem-solving context through MCP-accessible session memory.
Core Capabilities
- Multi-source ingestion:
  - Claude Code and Codex session discovery and parsing.
- Privacy-aware export:
  - Secret and PII redaction, username anonymization, and project-level exclusions.
- Controlled publishing workflow:
  - Local export, user review attestations, confirm gate, then push.
- Continuous mode:
  - Background watch daemon for incremental sync.
- Memory tooling:
  - MCP server with search, project patterns, trajectory stats, session lookup, graph similarity retrieval, and index refresh.
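As a rough illustration of what incremental sync involves, the sketch below rescans a directory and reports only the files whose modification time changed since the previous pass. It is not CodeClaw's watcher: the function name `changed_since`, the `seen` cache, and the `*.jsonl` pattern are all assumptions made for this example.

```python
from pathlib import Path


def changed_since(root, seen, pattern="*.jsonl"):
    """Return files under root that are new or modified since the last scan.

    `seen` maps path -> last observed mtime (ns) and is updated in place.
    The "*.jsonl" default is an assumption, not a documented CodeClaw setting.
    """
    changed = []
    for path in sorted(Path(root).rglob(pattern)):
        mtime = path.stat().st_mtime_ns
        if seen.get(path) != mtime:
            seen[path] = mtime
            changed.append(path)
    return changed
```

Calling it repeatedly with the same `seen` dictionary yields each file once, then again only after the file is touched.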
Install
pip install codeclaw
From source:
git clone https://github.com/ychampion/codeclaw.git
cd codeclaw
pip install -e ".[dev]"
Quick Start
# Authenticate once
huggingface-cli login --token YOUR_HF_TOKEN
# Discover and configure
codeclaw prep
codeclaw config --source both
codeclaw list --source both
codeclaw config --repo username/cc-logs
# Export locally first
codeclaw export --no-push
# Review and confirm
codeclaw confirm \
  --full-name "YOUR FULL NAME" \
  --attest-full-name "Asked for full name and scanned export." \
  --attest-sensitive "Reviewed for company/client/private identifiers." \
  --attest-manual-scan "Manually reviewed representative sessions."
# Publish only after explicit approval
codeclaw export --publish-attestation "User explicitly approved publishing to Hugging Face."
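The review steps above can also be scripted. The following is a minimal sketch, assuming `codeclaw` is on PATH. It uses only the flags shown in the Quick Start; the helper names `build_confirm_command` and `run_review_steps` are hypothetical. Publishing is left out on purpose so that it stays a manual, explicitly approved step.

```python
import shlex
import subprocess


def build_confirm_command(full_name, attest_full_name, attest_sensitive, attest_manual_scan):
    """Assemble the `codeclaw confirm` argv from the flags shown in the Quick Start."""
    return [
        "codeclaw", "confirm",
        "--full-name", full_name,
        "--attest-full-name", attest_full_name,
        "--attest-sensitive", attest_sensitive,
        "--attest-manual-scan", attest_manual_scan,
    ]


def run_review_steps(confirm_cmd, dry_run=True):
    """Print (and optionally run) the local export followed by the confirm step."""
    steps = [["codeclaw", "export", "--no-push"], confirm_cmd]
    for step in steps:
        print(shlex.join(step))
        if not dry_run:
            subprocess.run(step, check=True)  # raises on the first failing step
    return steps


if __name__ == "__main__":
    cmd = build_confirm_command(
        "YOUR FULL NAME",
        "Asked for full name and scanned export.",
        "Reviewed for company/client/private identifiers.",
        "Manually reviewed representative sessions.",
    )
    run_review_steps(cmd, dry_run=True)
```

With `dry_run=True` the script only prints the shell-quoted commands, which is a convenient way to check the attestation strings before running anything.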
Commands
| Command | Description |
|---|---|
| `codeclaw status` | Show current stage and next steps (JSON) |
| `codeclaw prep` | Discover projects and auth state |
| `codeclaw list` | List projects with source, size, and exclusion state |
| `codeclaw config ...` | Configure repo, sources, exclusions, and redactions |
| `codeclaw export --no-push` | Export locally for review |
| `codeclaw confirm ...` | Run checks and unlock push gate |
| `codeclaw export --publish-attestation "..."` | Push dataset after approval |
| `codeclaw watch --start` / `--stop` | Start or stop the background watch daemon |
| `codeclaw serve` | Start MCP server over stdio |
| `codeclaw install-mcp` | Register MCP server in Claude config |
| `codeclaw synthesize --project <name>` | Generate CODECLAW.md from synced sessions |
| `codeclaw update-skill claude` | Install/update local CodeClaw skill |
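Because `codeclaw status` reports JSON, it can be read from a script. The sketch below assumes the command prints a single JSON object to stdout; the key names are not documented here, so the code does not rely on any. `parse_status` and `read_status` are hypothetical helper names.

```python
import json
import subprocess


def parse_status(raw):
    """Decode status output without assuming any particular keys."""
    data = json.loads(raw)
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object from `codeclaw status`")
    return data


def read_status():
    """Run `codeclaw status` and decode its stdout (requires codeclaw on PATH)."""
    result = subprocess.run(
        ["codeclaw", "status"], capture_output=True, text=True, check=True
    )
    return parse_status(result.stdout)


if __name__ == "__main__":
    # Pretty-print whatever the command returned.
    print(json.dumps(read_status(), indent=2))
```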
MCP Memory Server
Install optional MCP dependency:
pip install "codeclaw[mcp]"
codeclaw install-mcp
Available MCP tools:
- search_past_solutions(query, max_results=5)
- get_project_patterns(project=None)
- get_trajectory_stats()
- get_session(session_id)
- find_similar_sessions(context, max_results=5)
- refresh_index()
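MCP clients invoke tools like these with a JSON-RPC 2.0 `tools/call` request, sent over stdio as one JSON message per line. The sketch below only builds that request; a real client (Claude Code, for example) also performs the MCP initialize handshake first. The helper name `tool_call_request` and the sample query are illustrative.

```python
import json
from itertools import count

_ids = count(1)  # simple increasing request ids


def tool_call_request(name, arguments):
    """Build a JSON-RPC 2.0 `tools/call` request as one newline-terminated line."""
    message = {
        "jsonrpc": "2.0",
        "id": next(_ids),
        "method": "tools/call",
        "params": {"name": name, "arguments": arguments},
    }
    return json.dumps(message) + "\n"


# Example: ask for up to five past solutions matching a query.
line = tool_call_request(
    "search_past_solutions",
    {"query": "flaky pytest fixture", "max_results": 5},
)
```

The argument names mirror the tool signatures listed above; the shape of the response depends on the server and is not shown here.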
Privacy and Safety
CodeClaw is designed for private-by-default workflows:
- path and username anonymization
- secret and high-entropy token detection
- custom redaction lists
- manual confirmation and attestation gates before publish
Automated redaction is not perfect. Always review local exports before publishing.
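To make the idea of high-entropy token detection and username anonymization concrete, here is a small sketch. It is not CodeClaw's redaction code: the regular expression, the 4.0 bits-per-character threshold, the `[REDACTED]` placeholder, and the `user_<hash>` alias format are all assumptions chosen for illustration.

```python
import hashlib
import math
import re
from collections import Counter

# Long unbroken runs of identifier-like characters (assumed pattern).
TOKEN_RE = re.compile(r"[A-Za-z0-9_\-]{20,}")


def shannon_entropy(text):
    """Bits per character of the empirical character distribution."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def redact_high_entropy(text, threshold=4.0):
    """Replace long, random-looking tokens with a placeholder."""
    def replace(match):
        token = match.group(0)
        return "[REDACTED]" if shannon_entropy(token) >= threshold else token
    return TOKEN_RE.sub(replace, text)


def anonymize_username(text, username):
    """Swap a username for a stable alias derived from its SHA-256 hash."""
    alias = "user_" + hashlib.sha256(username.encode()).hexdigest()[:8]
    return text.replace(username, alias)
```

A heuristic like this misses short or structured secrets and can flag harmless long identifiers, which is one reason the manual review step matters.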
Community
- Contribution guide: CONTRIBUTING.md
- Security policy: SECURITY.md
- Support channels: SUPPORT.md
- Code of conduct: CODE_OF_CONDUCT.md
- Release process: RELEASE.md
License
MIT - see LICENSE.
Download files
File details
Details for the file codeclaw-0.4.0.tar.gz.
File metadata
- Download URL: codeclaw-0.4.0.tar.gz
- Upload date:
- Size: 74.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c6dd19dd82d427ca6043e5152e428a75746bc10f24e13eb741c47194e2c0931a |
| MD5 | 48fda11209b3b61db7e44abaa9df7513 |
| BLAKE2b-256 | 1ecc43022c02902386bc236ca8ec2569f2a65df5055fd34e4106c11ac8872d02 |
File details
Details for the file codeclaw-0.4.0-py3-none-any.whl.
File metadata
- Download URL: codeclaw-0.4.0-py3-none-any.whl
- Upload date:
- Size: 58.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.7
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bed79408af5abbc10f51845c53e51babb9084c687184663a1b46041367eeb475 |
| MD5 | 03f3dca8cc134b6e89f85d9269f8ad5d |
| BLAKE2b-256 | 39103f444be4d61fab457cde589fba198db50bb93e94e21e29d795232654906b |
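A downloaded file can be checked against the SHA256 digests listed above. The sketch below uses only the standard library; the helper names `sha256_of` and `verify_download` are illustrative, and the example assumes the wheel sits in the current directory.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads do not need to fit in memory."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_download(path, expected_hex):
    """Return True when the file's SHA256 matches the published digest."""
    return sha256_of(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    ok = verify_download(
        "codeclaw-0.4.0-py3-none-any.whl",
        "bed79408af5abbc10f51845c53e51babb9084c687184663a1b46041367eeb475",
    )
    print("digest matches" if ok else "digest mismatch")
```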