Python-first repository subtree export tool.
Copybarista :coffee:
Publish and sync clean standalone repositories from private or monorepo source trees.
Config reference · Tutorial · Examples · GitHub setup · Changelog · Blog post
Why should I use this?
Copybarista is for Python teams that publish OSS packages from private or monorepo source trees, where:
- Primary source code is embedded inside another repository, often private.
- The standalone OSS repository should be automatically assembled, not hand-maintained.
- Public fixes should sync and flow back through normal pull requests.
- The team wants Python-native tooling that fits Python packaging and CI.
Copybarista turns a selected source subtree into a clean repository tree. It copies only the files you choose, rewrites text deterministically, and writes either a local folder or one squash commit on a Git branch.
Use it when the private or monorepo checkout should stay canonical, but a package, tool, or library needs to live in a separate repository. Syncs are reviewed as pull requests, and public changes can be imported back only when Copybarista can map and verify them safely.
What Copybarista Does
- Publish a package from a larger repository without moving files by hand or maintaining custom sync scripts.
- Keep the source checkout canonical while exporting an exact public tree.
- Rewrite imports, docs, or generated blocks as part of the export.
- Use the example GitHub workflows to review syncs as pull requests instead of pushing to main.
- Bring public fixes back into source only when the reverse mapping verifies.
- Reject unsupported config instead of guessing and producing a surprising export.
Generated export PRs are workflow-owned and can auto-merge after required checks. Public changes flow back through separate source PRs so maintainers can review what enters the private or monorepo source of truth.
Why not just use an Alternative Tool?
Two adjacent tools solve related parts of this problem:
- Git subtree and git-filter-repo with custom tooling for code transformations.
- Copybara, a mature general-purpose migration tool with a Java-based runtime.
Start with those tools when their tradeoffs match your requirements; the table below lists the specific differences.
Copybarista is intentionally narrowly scoped: it's a Python program which publishes clean OSS packages from private or monorepo sources while rewriting the exported tree and syncing through GitHub pull requests. Built and maintained by rekursiv.ai, it manages syncs between repositories while fitting a Python and GitHub toolchain.
Short version:
| Need | Copybarista | Copybara | Git subtree / git-filter-repo |
|---|---|---|---|
| Python package workflow | :white_check_mark: Built for Python repos, TOML config, Python CI | :warning: Powerful, but Java/Starlark-based | :warning: Git-first; project behavior becomes scripts |
| Rewrite imports/docs/private blocks | :white_check_mark: Built in | :white_check_mark: Broad transform model | :x: Requires custom scripts |
| GitHub PR sync in both directions | :white_check_mark: Example export/import workflows | :warning: Requires workflow glue | :x: No built-in PR workflow |
| Preserve full history | :x: Squash-style export focus | :white_check_mark: Yes | :white_check_mark: Yes |
| General migration engine | :x: Intentionally scoped | :white_check_mark: Yes | :x: No |
- Use Copybarista when the hard part is not splitting history, but producing a clean OSS package repository with deterministic rewrites, private-name checks, and GitHub PR syncs.
- Use Copybara when you need a broader, general-purpose migration engine.
- Use Git subtree or git-filter-repo when history preservation is the main goal and the subtree is already self-contained.
The detailed comparison below lists the specific workflow capabilities.
Install
uv tool install copybarista
copybarista --help
Other options:
uv add copybarista
pipx install copybarista
pip install copybarista
Copybarista requires Python 3.12. Git exports also require the system
git executable.
Quick Start
Create copy.barista.toml at the root of the source checkout:
[workflow]
name = "widget"
mode = "squash"
source_root = "packages/widget"
[destination.folder]
path = "/tmp/widget-oss"
[files]
include = ["**"]
exclude = [
".pytest_cache/**",
"**/.pytest_cache/**",
".ruff_cache/**",
"**/.ruff_cache/**",
".venv/**",
"**/__pycache__/**",
"*.pyc",
"**/*.pyc",
"dist/**",
]
[[transform]]
type = "replace"
path = "tests/test_widget.py"
before = "from monorepo.packages.widget import"
after = "from widget import"
Validate the config and export the standalone tree:
copybarista validate copy.barista.toml
copybarista export copy.barista.toml /path/to/source \
--folder-dir /tmp/widget-oss
If /tmp/widget-oss already exists, pass --force to replace it after
Copybarista's destination safety checks:
copybarista export copy.barista.toml /path/to/source \
--folder-dir /tmp/widget-oss \
--force \
--json
The source_ref argument is the checkout root. workflow.source_root is
resolved relative to that root, and exported files land at the destination
root. Use [[files.copy]] when a public package should also include selected
shared files from elsewhere in the same checkout.
Common Workflows
Set Up Package Sync
Use init-sync to create the package-local sync files that every exported
package can keep public:
copybarista init-sync . \
--package-name configgle \
--sync-label Configgle \
--source-root packages/configgle \
--public-repo example/configgle \
--source-repo example/source \
--copybarista-project-path tools/copybarista \
--smoke-import configgle \
--type-check-target configgle \
--type-check-target tests
This writes copy.barista.toml, copybarista.sync.toml, the public import
workflow .github/workflows/sync-to-source.yml, and the public package
validation workflow .github/workflows/package-validation.yml. The package name
lives in copybarista.sync.toml; script and workflow names stay stable so new
packages do not need sync_<package>.py files or package-specific environment
names. Use init-sync --overwrite only when intentionally regenerating existing
sync files.
The validation workflow runs package-owned commands from copybarista.sync.toml.
By default it syncs dependencies, runs Ruff, basedpyright, pytest, a smoke import,
and uv build. Override it at setup time when a package needs different public
correctness gates:
copybarista init-sync . \
... \
--validation-python-version 3.12 \
--validation-command 'uv sync --all-groups' \
--validation-command 'uv run pytest'
Validate the scaffolding before wiring GitHub Actions:
copybarista check-sync-config .
copybarista write-export-workflow copybarista.sync.toml \
--output configgle-export.yml
Export To A Folder
Use folder export for local inspection, release checks, and tests:
copybarista export copy.barista.toml /path/to/source \
--folder-dir /tmp/widget-oss \
--force
Folder export replaces the destination contents after safety checks. Existing
destinations require --force; the flag never disables those safety checks.
Export To Git
Use Git export when the standalone repository should receive one clean sync
commit. GitHub PR sync usually uses folder export into a checked-out public
repository and then opens a PR branch; publish-git is for local mirrors,
unprotected destinations, or deliberate non-PR workflows.
Add a Git destination:
[destination.git]
url = "file:///tmp/widget-oss.git"
branch = "main"
committer_name = "Widget Export"
committer_email = "opensource@example.com"
Then run:
copybarista publish-git copy.barista.toml /path/to/source
Git export updates a cached bare mirror, copies the transformed tree into a
temporary checkout, creates one commit when the destination changes, and pushes
it to the configured branch. Generated commits include Copybarista-Source-Rev when
the source checkout has a Git HEAD. Local file:// remotes must already be
Git repositories or existing empty directories that Copybarista can initialize
as bare repositories.
Import Public Changes
Use import-change when a public repository change needs to move back into the
source checkout:
copybarista import-change copy.barista.toml \
--public-base /tmp/public-base \
--public-head /tmp/public-head \
--source-base /tmp/source-base \
--destination /tmp/source-worktree
Verification is enabled by default. The source base must reproduce the public base, and the imported destination must re-export to the public head. Failed imports roll back touched destination paths.
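The two verification conditions can be modeled as a round-trip check. The sketch below is a simplified assumption, not Copybarista's code: trees are plain dicts of path to text, the only transform is one literal replacement, and the names export_tree and import_change are hypothetical.

```python
def export_tree(source: dict[str, str], before: str, after: str) -> dict[str, str]:
    """Apply one literal replacement to every file (stand-in for the export step)."""
    return {path: text.replace(before, after) for path, text in source.items()}


def import_change(
    source_base: dict[str, str],
    public_base: dict[str, str],
    public_head: dict[str, str],
    before: str,
    after: str,
) -> dict[str, str]:
    """Return the new source tree, or raise if either verification step fails."""
    # Check 1: the source base must reproduce the public base.
    if export_tree(source_base, before, after) != public_base:
        raise ValueError("source base does not reproduce public base")
    candidate = dict(source_base)
    for path, text in public_head.items():
        if public_base.get(path) != text:
            # Reverse the literal replacement for changed or added files.
            candidate[path] = text.replace(after, before)
    for path in public_base.keys() - public_head.keys():
        candidate.pop(path, None)  # file deleted in the public change
    # Check 2: the imported tree must re-export to the public head.
    if export_tree(candidate, before, after) != public_head:
        raise ValueError("imported tree does not re-export to public head")
    return candidate


BEFORE = "from monorepo.packages.widget import"
AFTER = "from widget import"
result = import_change(
    {"a.py": f"{BEFORE} x\n"},
    {"a.py": f"{AFTER} x\n"},
    {"a.py": f"{AFTER} x, y\n"},
    BEFORE,
    AFTER,
)
print(result["a.py"])
```

Because the candidate is built on a copy, a failed check leaves the original source tree untouched, which loosely mirrors the rollback behavior described above.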
Use A Supported copy.bara.sky
Copybarista can directly run the supported subset of copy.bara.sky
workflows:
copybarista export copy.bara.sky /path/to/source \
--folder-dir /tmp/out \
--force
Or translate the workflow to TOML first:
copybarista translate copy.bara.sky --workflow export \
--output copy.barista.toml
copybarista validate copy.barista.toml
What It Supports
- Local source checkouts, passed as a CLI path.
- Local folder exports with explicit --force for existing destinations.
- Single-commit Git exports.
- Include/exclude globs using *, **, ?, braces, character classes, and escaped literals.
- Source-root selection that moves the selected subtree to destination root.
- Multi-source assembly with [[files.copy]] for selected shared files.
- Literal whole-file text replacements.
- Staged file or directory moves for public tree layout adjustments.
- Marker-delimited block stripping for exact file paths.
- Native TOML configs for Copybarista workflows.
- Direct export from supported copy.bara.sky workflows via internal translation.
- Manual translation from supported copy.bara.sky workflows to TOML.
- Local change-request import for public repository edits.
- JSON export manifests with file hashes and transform reports.
- Strict config validation so unsupported fields fail loudly.
Copybarista supports a documented copy.bara.sky subset for common repository
export workflows. It is not a full migration-engine clone. Unsupported
constructs fail with explicit errors instead of being ignored.
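Marker-delimited block stripping, listed above, can be pictured with the following sketch. The marker strings and the strip_block function are assumptions for illustration; the actual marker syntax and config fields are defined in the config reference.

```python
def strip_block(text: str, start_marker: str, end_marker: str) -> str:
    """Drop every line from a start-marker line through its end-marker line."""
    kept: list[str] = []
    inside = False
    for line in text.splitlines(keepends=True):
        if not inside and start_marker in line:
            inside = True
            continue
        if inside:
            if end_marker in line:
                inside = False
            continue
        kept.append(line)
    if inside:
        # Fail loudly rather than silently dropping the rest of the file.
        raise ValueError("unterminated block: end marker not found")
    return "".join(kept)


readme = (
    "Public intro\n"
    "<!-- private:start -->\n"
    "Internal notes\n"
    "<!-- private:end -->\n"
    "Public outro\n"
)
print(strip_block(readme, "<!-- private:start -->", "<!-- private:end -->"))
```

Raising on an unterminated block follows the same spirit as the strict validation described above: an ambiguous input is rejected instead of producing a surprising export.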
CLI
copybarista validate CONFIG [--workflow NAME]
copybarista translate COPY_BARA_SKY [--workflow NAME] [--output CONFIG]
copybarista export CONFIG SOURCE_REF [--workflow NAME] \
[--folder-dir DIR] [--force] [--json]
copybarista publish-git CONFIG SOURCE_REF [--workflow NAME] [--json]
copybarista check-leaks CONFIG ROOT [--workflow NAME]
copybarista import-change CONFIG --public-base DIR --public-head DIR \
--source-base DIR --destination DIR [--workflow NAME] [--no-verify] [--json]
copybarista init-sync ROOT --package-name NAME --source-root PATH \
--public-repo OWNER/REPO --source-repo OWNER/REPO \
--copybarista-project-path PATH --smoke-import MODULE \
[--sync-label LABEL] [--type-check-target PATH] \
[--forbidden-pr-text TEXT] [--validation-python-version VERSION] \
[--validation-command COMMAND] [--sync-user-name NAME] \
[--sync-user-email EMAIL] [--overwrite]
copybarista check-sync-config ROOT
copybarista write-export-workflow copybarista.sync.toml [--output PATH]
CONFIG can be a Copybarista TOML file or a supported copy.bara.sky file.
export uses --folder-dir when supplied, otherwise
destination.folder.path from the config. Folder export replaces destination
contents after safety checks. Existing destinations require --force; the flag
never disables safety checks.
import-change imports a public change-request tree into a source-of-truth
checkout using the supported reversible transform subset. It requires
local public base, public head, source base, and destination checkouts so tests
and workflows can run without network access. Verification is enabled by
default: the source base must reproduce the public base, and the imported
destination must re-export to the public head. Failed imports roll back touched
destination paths.
--workflow selects a named workflow from copy.bara.sky; publish-git
defaults to export_git, and other commands default to export.
check-leaks runs the config's [leak_check] policy against an existing
exported tree.
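A leak check of this kind amounts to scanning the exported tree for forbidden paths and forbidden text. The sketch below is a hedged approximation: find_leaks is a hypothetical name, the tree is a dict of path to text, and fnmatch is used as a stand-in that does not reproduce Copybarista's glob semantics or its [leak_check] field names.

```python
from fnmatch import fnmatchcase


def find_leaks(
    tree: dict[str, str],
    forbidden_paths: list[str],
    forbidden_text: list[str],
) -> list[str]:
    """Return one finding per forbidden path match or forbidden text occurrence."""
    findings: list[str] = []
    for path in sorted(tree):  # sorted for deterministic output
        if any(fnmatchcase(path, pattern) for pattern in forbidden_paths):
            findings.append(f"{path}: forbidden path")
        for needle in forbidden_text:
            if needle in tree[path]:
                findings.append(f"{path}: forbidden text {needle!r}")
    return findings


exported = {
    "README.md": "Widget docs\n",
    "internal/notes.md": "see corp-wiki\n",
}
for finding in find_leaks(exported, ["internal/*"], ["corp-wiki"]):
    print(finding)
```

An empty result means the tree passes; any finding would be a reason to stop before a folder or Git destination is modified.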
Bidirectional Sync Model
Copybarista sync is PR-based in both directions:
- Source to public: the source workflow exports the configured source root into a temporary tree, validates the exported checkout, and opens a PR in the public repository from a generated package export branch, e.g. configgle/export/main. Reruns can update the same branch so there is one active export PR per project branch. The generated branch is replaced with git push --force-with-lease.
- Public to source: the public workflow checks out a public base and public head, runs copybarista import-change, validates the target checkout, and opens a source PR from a generated package import branch, e.g. configgle/import/sha-<public-sha>. Regenerating that import branch also uses git push --force-with-lease.
The public repository CI checks release-tree policy, lint, formatting, types,
unit tests, package build, and installed-wheel import. The reverse-sync
workflow also runs on trusted public PRs as an import validation check, except
for generated export PRs whose source of truth is the export workflow. Public
main pushes from merged generated export PRs are skipped for the same reason
when the push commit is authored by copybarista or the commit message
identifies a generated export branch. Reverse sync only opens a
source PR for direct public changes or manual-dispatch runs.
No workflow should push directly to a protected default branch. Generated sync
branches are workflow-owned artifacts; do not push manual commits to them.
--force-with-lease lets reruns replace generated commits while refusing to
overwrite unexpected remote updates. Conflicts are handled in two layers:
GitHub blocks textual PR conflicts, and
copybarista import-change blocks semantic conflicts such as unmapped files,
excluded paths, metadata writes, non-reversible transforms, public-base
mismatches, and re-export mismatches. VCS and .copybarista metadata are
ignored while diffing and refused as write targets. If both repositories
changed the same public file, import the public PR to source first or
re-export source and resolve the PR diff explicitly.
Source-to-public auto-merge is safe only as PR auto-merge after required checks pass, not as direct default-branch pushes. Keep public-to-source imports manual unless your project has a separate review policy for accepting public changes. For protected branches, required checks, bot-authored PRs, and token permissions, see GitHub setup.
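The generated branch names and the lease-protected push from this section can be sketched as follows. The naming patterns are inferred from the two examples above (configgle/export/main and configgle/import/sha-<public-sha>); the function names and the origin remote are assumptions, and the command is only built, not executed.

```python
def export_branch(package: str, target_branch: str) -> str:
    """Branch that carries a generated source-to-public export."""
    return f"{package}/export/{target_branch}"


def import_branch(package: str, public_sha: str) -> str:
    """Branch that carries a generated public-to-source import."""
    return f"{package}/import/sha-{public_sha}"


def push_command(branch: str, remote: str = "origin") -> list[str]:
    """Build a push that replaces the generated branch but refuses to
    overwrite remote updates the local clone has not seen."""
    return ["git", "push", "--force-with-lease", remote, f"HEAD:refs/heads/{branch}"]


print(export_branch("configgle", "main"))
print(" ".join(push_command(import_branch("configgle", "abc123"))))
```

Keeping the branch name a pure function of the package and target makes reruns land on the same branch, which is what allows one active export PR per project branch.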
Security And Privacy Model
Copybarista is designed for the case where the source repository may contain private code, names, paths, or workflows that must not appear in the public repository. The main protections are:
- Explicit file selection: only configured paths under source_root are exported, and source-only paths such as private docs, local config, caches, build outputs, and release tooling can be excluded.
- Deterministic transforms: private blocks, imports, package names, and generated Python formatting are handled from checked-in config rather than ad hoc scripts.
- Leak checks: optional config rules reject forbidden paths and forbidden text in the transformed export tree before folder or Git destinations are mutated.
- Release-tree checks: public CI can reject private directories, source-only config files, generated caches, bytecode, build artifacts, nested VCS metadata, and unstripped private README markers before release.
- PR-only sync: generated branches are reviewed through GitHub pull requests; workflows should not push directly to protected default branches.
- Token isolation: public-to-source validation runs without write tokens, and token-bearing PR creation should run trusted workflow code captured before imported public changes can modify source files.
- Reverse verification: imports must reproduce the public base before applying changes and re-export to the public head afterward. Ambiguous or unsafe reverse mappings are rejected.
Copybarista does not replace normal review. Treat copy.barista.toml, GitHub
workflow files, and release-tree policy as part of the security boundary, and
review them whenever the source/public sync scope changes.
Compatibility
Copybarista's native config format is TOML. The preferred filename is
copy.barista.toml. It can also accept a supported copy.bara.sky workflow:
the CLI translates the supported subset internally, validates the generated
Copybarista config, and then runs the same export engine.
| Feature | Native TOML | copy.bara.sky import | Limits |
|---|---|---|---|
| Workflow mode | mode = "squash" | mode = "SQUASH" | Change/history modes fail |
| Source checkout | CLI SOURCE_REF | folder.origin() | Remote origins fail |
| Root move | source_root | core.move(ROOT, "") | Non-root destinations fail |
| Extra source files | [[files.copy]] | Not imported | Native TOML only |
| File globs | include / exclude | glob(..., exclude=...) | Unsupported glob constructs fail |
| Folder export | [destination.folder] | folder.destination() | Needs --force for existing folders |
| Git export | [destination.git] | git.destination(...) | Single-commit export |
| Literal replace | type = "replace" | core.replace(...) | Regex/options fail |
| Staged move | type = "move" | Not imported | Native TOML only |
| Ruff formatting | type = "ruff_format" | Not imported | Runs ruff check --fix --no-cache and ruff format --no-cache |
| Strip block | type = "strip_block" | Empty multiline replace | Marker form only |
| Arbitrary logic | Not supported | Rejected | No Starlark execution |
| Copybara review flows | Not supported | Rejected | Use local import-change |
The import path is static translation, not full Starlark interpretation. When Copybarista sees an unsupported origin, destination, mode, transform, option, or expression, it reports a config error instead of silently changing behavior.
Supported copy.bara.sky forms include authoring.pass_thru(default=...),
positional git.destination("url", push="main"), omitted origin_files and
destination_files, core.transform([...]), explicit replace reversal, and
core.reverse([...]) for literal replacements.
Detailed Comparison
| Need | Copybarista | Copybara | Git subtree / git-filter-repo |
|---|---|---|---|
| Python packaging ecosystem | :white_check_mark: Yes: Python package, installable with uv, pipx, or pip | :x: No: Java-based toolchain | :warning: Partial: Git-first workflow with optional Python git-filter-repo install |
| Python project ergonomics | :white_check_mark: Yes: TOML config, pytest-friendly helper scripts, Python CI fit | :warning: Partial: powerful, but configured through Starlark and a separate runtime | :warning: Partial: easy to shell out, but project-specific behavior becomes custom scripts |
| GitHub ecosystem fit | :white_check_mark: Yes: example workflows open export/import PRs for review | :warning: Possible, but requires workflow glue | :x: No built-in PR workflow |
| Select a subtree and publish it as a standalone repository | :white_check_mark: Yes: selected files land at repository root | :white_check_mark: Yes: broader repository migration model | :white_check_mark: Yes: this is the core Git subtree / filtering use case |
| Assemble a public tree from selected files and directories | :white_check_mark: Yes: include/exclude globs select the exported tree | :white_check_mark: Yes: broader file-selection model | :warning: Partial: possible with path filters, but awkward for multiple locations |
| Keep full source history | :x: No: squash-style export is the focus | :white_check_mark: Yes | :white_check_mark: Yes: this is where Git subtree and git-filter-repo fit best |
| Rewrite absolute Python imports for the public package | :white_check_mark: Yes: supported literal replacements | :white_check_mark: Yes: broader transform model | :x: No: Git leaves file contents unchanged |
| Normalize generated Python after rewrites | :white_check_mark: Yes: optional Ruff transform | :warning: Partial: use custom workflow commands | :x: No: Git leaves file contents unchanged |
| Strip private README sections, generated blocks, or internal names | :white_check_mark: Yes: block stripping and release-tree checks | :white_check_mark: Yes: broader transform model | :x: No: requires custom scripts and leak checks |
| Leave private files out of the public repo | :white_check_mark: Yes: explicit include/exclude globs plus export validation | :white_check_mark: Yes: broader file-selection model | :warning: Partial: path filtering helps, but deeper cleanup is custom |
| Import public fixes back into the source checkout | :white_check_mark: Yes: reverse import verifies by re-exporting | :white_check_mark: Yes: supports bidirectional repository movement | :warning: Partial: subtree can move history, but not verify semantic rewrites |
| Fail loudly on unsupported rewrites | :white_check_mark: Yes: unsupported config is rejected | :white_check_mark: Yes: full config parser and migration engine | :x: No transform model to validate |
| Full migration engine | :x: No: intentionally scoped to package sync | :white_check_mark: Yes | :x: No |
Intentional Scope
Copybarista is designed for GitHub-oriented package sync. The supported model is:
- A private or internal checkout is the source of truth.
- A selected subtree is exported as an exact standalone repository tree.
- GitHub Actions can open or update PRs in either direction.
- Public PR imports are accepted only when paths and transforms can be mapped back safely and verified by re-exporting.
The following are intentional non-goals until a real workflow needs them:
- Running arbitrary Starlark.
- Supporting a full origin and destination plugin model.
- Preserving per-commit history or iterative migration modes.
- Destination-file scoped partial cleanup instead of exact-tree replacement.
- Regex-template transforms beyond the literal replacement subset.
- A general transform plugin API.
- External implementation details such as cache directory layout.
Documentation
- Tutorial: build and run a minimal folder export.
- Examples: set up a full source-to-public and public-to-source GitHub PR workflow.
- Config reference: TOML fields, transforms, CLI behavior, and manifest shape.
- Architecture: implementation boundaries and internal APIs.
- GitHub setup: recommended repository rules, branch protection, and release publishing defaults.
Development
python -B scripts/check_release_tree.py . --allow-root-git
uv sync --all-groups
uv run --all-groups pre-commit install
uv run --all-groups pre-commit run --all-files
uv run --all-groups ruff check --no-fix --no-cache .
uv run --all-groups ruff format --check --no-cache .
uv run --all-groups codespell .
uv run --all-groups ty check
uv run --all-groups basedpyright copybarista tests scripts
uv run --all-groups pytest
uv build --out-dir /tmp/copybarista-dist-check
Clean generated local artifacts:
uv run python scripts/clean.py
uv run python scripts/clean.py --venv
Run the local benchmark helper when changing file selection, glob matching, or copy logic:
uv run python scripts/bench.py \
examples/python-package/source-repo/copy.barista.toml \
examples/python-package/source-repo \
--runs 5 \
--json
Unit tests live next to the modules they cover as *_test.py. The top-level
tests/ directory is reserved for integration tests plus fixtures.
Contributing
Copybarista is intentionally conservative. When adding behavior, document the config surface, add focused tests, keep exports deterministic, and reject unsupported config instead of guessing.
Acknowledgements
Copybarista's repository-sync model is inspired by Copybara, Google's open-source tool for transforming and moving code between repositories. Copybara is licensed under the Apache License 2.0.
Copybarista is an independent Python implementation focused on package-oriented GitHub PR workflows. It does not vendor or copy Copybara source code, documentation, logos, or test data, and it is not affiliated with or endorsed by Google or the Copybara project.