Backup and restore individual Confluence Cloud spaces via REST API
Confluence Space Backup & Restore
Backup and restore individual Confluence Cloud spaces via REST API — pages, hierarchy, attachments, comments, labels, properties, restrictions, and blog posts. Fully resumable, with an interactive menu and CLI mode. The Confluence sibling of jira-project-backup-restore.
⚠️ A REST restore is content-faithful, not forensic. It rebuilds pages, hierarchy, attachments, comments and labels — but cannot restore original authors, timestamps, or version history (Confluence Cloud has no API to set them). Read Known Limitations before relying on this for disaster recovery.
🤔 Why?
Confluence Cloud has no supported public API for native space export or import (CONFCLOUD-40457, open for years):
- Native import is UI-only (Settings → Data management → Import spaces, site-admin) and cannot overwrite an existing space key — there's no way to automate or verify it end-to-end.
- Native export is reachable only through undocumented `.action` endpoints that Atlassian can change at any time.
So an automated, verifiable backup and restore must be built on the REST API. That's the backbone here. A native XML export is offered as an optional, best-effort, off-by-default high-fidelity artifact for manual import — never the primary guarantee.
✨ Features
| Feature | Description |
|---|---|
| Full space backup | Pages (storage format) + hierarchy, blog posts, attachments, footer/inline comments, labels, content & space properties, restrictions, permissions |
| 9-phase restore | Space → pages (parent-first) → blog posts → macro/ID remap → attachments → comments → labels → properties → restrictions |
| Two-pass ID remap | Rewrites ri:content-id references after new page IDs are minted, so include/excerpt/pagetree macros don't break |
| New-space default | Restore creates a new space; never clobbers a live space without --overwrite + typed confirmation |
| Homepage adoption | Reuses the space's auto-created homepage instead of leaving a duplicate |
| Multi-space | Back up several spaces in a single run |
| Resumable | Re-run after interruption — completed phases and items are skipped; a phase completes only when fail count is 0 |
| Memory efficient | Pages/comments/attachments stream to disk — large spaces won't OOM a small host |
| Dry-run mode | Preview every restore action without making changes |
| Rate-limit aware | Exponential backoff with 429 / Retry-After detection |
| CSV export | Export space content to CSV for reporting and sharing |
| Backup inspection | Content-type breakdown, page-status counts, disk size |
| Integrity validation | sha256 manifest verification of every backed-up file |
| Connection test | Pre-flight: authentication + space listing |
| Interactive menu + CLI | Guided workflow, or --backup / --restore / --export-csv flags for scripts and cron |
| Native XML export | Optional best-effort high-fidelity ZIP for manual UI import (--native-export) |
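As an illustration of the "Rate-limit aware" row, here is a minimal sketch of exponential backoff that prefers a numeric `Retry-After` header when the server sends one. The names `backoff_delay`, `call_with_retry` and the `send` callable are hypothetical and are not taken from the tool's `api_client.py`; the base delay, the cap and the set of retryable status codes are assumptions.

```python
import time

# Status codes treated as transient in this sketch (an assumption).
RETRYABLE = {429, 502, 503, 504}


def backoff_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Seconds to wait before retry number `attempt` (0-based)."""
    if retry_after is not None:
        try:
            # Honor the server's hint when it is a plain number of seconds.
            return max(0.0, float(retry_after))
        except ValueError:
            pass  # HTTP-date form: fall back to exponential backoff
    return min(cap, base * 2 ** attempt)


def call_with_retry(send, max_retries=5, sleep=time.sleep):
    """Call send() until it returns a non-retryable status.

    `send` is a hypothetical callable returning (status, headers, body).
    """
    for attempt in range(max_retries + 1):
        status, headers, body = send()
        if status not in RETRYABLE:
            return status, body
        if attempt == max_retries:
            break
        sleep(backoff_delay(attempt, headers.get("Retry-After")))
    raise RuntimeError(f"gave up after {max_retries} retries (last status {status})")
```

Passing `sleep` as a parameter keeps the loop testable without real waiting; a production client would typically also add jitter.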
🚀 Quick Start
1. Install
Via PyPI (recommended) — provides the confluence-backup command:
```shell
pip install confluence-space-backup-restore
confluence-backup
```
Or clone for development:
```shell
git clone https://github.com/davidmalko87/confluence-space-backup-restore.git
cd confluence-space-backup-restore
pip install -r requirements.txt
python main.py
```
2. Configure
```shell
cp .env.example .env
```
Edit .env with your Confluence Cloud credentials:
```
CONFLUENCE_URL=https://your-domain.atlassian.net/wiki   # must include /wiki
CONFLUENCE_EMAIL=you@example.com
CONFLUENCE_API_TOKEN=your-api-token
```
Generate an API token at id.atlassian.com/manage-api-tokens. Auth is Basic (email + token) — no session-cookie refresh toil.
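For readers who want to check credentials outside the tool, the sketch below shows what Basic auth with an email and API token amounts to: a base64-encoded `email:token` pair in the `Authorization` header. The helper names are hypothetical, and the `/api/v2/spaces` path (relative to the `/wiki` base) is an assumption about a convenient endpoint to probe, not necessarily what the built-in connection test calls.

```python
import base64


def basic_auth_header(email, api_token):
    """Build the Authorization header for email + API token Basic auth."""
    raw = f"{email}:{api_token}".encode("utf-8")
    return {"Authorization": "Basic " + base64.b64encode(raw).decode("ascii")}


def api_url(base_url, path):
    """Join the configured base URL (which already ends in /wiki) and a path."""
    return base_url.rstrip("/") + "/" + path.lstrip("/")


headers = basic_auth_header("you@example.com", "your-api-token")
url = api_url("https://your-domain.atlassian.net/wiki", "/api/v2/spaces")
```

The resulting `headers` and `url` can be handed to any HTTP client, for example `requests.get(url, headers=headers)`.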
3. Run
Interactive menu:
```
python main.py
==============================================================
 Confluence Space Backup & Restore v1.0.0
==============================================================
 Site: https://your-domain.atlassian.net   Auth: API token   Backups: ./backups
--------------------------------------------------------------
 --- Backup & Restore ---
 1) Backup space(s)
 2) Restore space from backup
 --- Browse & Analyze ---
 3) List existing backups
 4) Validate backup integrity
 5) Export backup to CSV
 6) Inspect backup details
 --- Settings & Tools ---
 7) Test Confluence connection
 8) Show current configuration
 9) Cleanup incomplete backups
 0) Exit
```
CLI — backup:
```shell
python main.py --backup DOCS
python main.py --backup DOCS,TEAM
python main.py --backup DOCS --native-export
```
CLI — restore:
```shell
python main.py --restore backups/DOCS_20260602_091819 --target-key DOCSR --dry-run
python main.py --restore backups/DOCS_20260602_091819 --target-key DOCSR
```
CLI — inspect & export:
```shell
python main.py --list
python main.py --validate backups/DOCS_20260602_091819
python main.py --export-csv backups/DOCS_20260602_091819
```
Exit codes: 0 success · 1 failure · 2 bad/insufficient arguments.
📦 What Gets Backed Up
| File | Contents |
|---|---|
| `space.json` | Space metadata + description |
| `pages.json` | Pages with storage-format body (streamed) |
| `blogposts.json` | Blog posts (streamed) |
| `attachments.json` | Attachment metadata index (streamed) |
| `attachments/<id>/` | Attachment binary files, streamed to disk |
| `comments/footer.json` | Footer comments |
| `comments/inline.json` | Inline comments (metadata; see limitations) |
| `labels.json` | Page, blog, and space labels |
| `properties/*.json` | Content properties + space properties |
| `restrictions.json` | Per-page restrictions (v1) |
| `permissions.json` | Space permissions |
| `versions/<pageId>.json` | Optional page version-metadata sidecar |
| `native/<KEY>_native.xml.zip` | Optional native XML export |
| `manifest.json` | File index + sha256 + `"complete": true`; its presence marks the backup complete |
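The manifest check can be reproduced independently of the tool. The sketch below assumes a manifest shaped like `{"complete": true, "files": {"pages.json": "<sha256 hex>"}}`; that layout, the `files` key, and the function names are assumptions for illustration, so adjust them to whatever your `manifest.json` actually contains.

```python
import hashlib
import json
from pathlib import Path


def sha256_file(path, chunk_size=1024 * 1024):
    """Hash a file in chunks so large attachments never load fully into memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(backup_dir):
    """Return a list of problems; an empty list means every file matched."""
    backup_dir = Path(backup_dir)
    manifest_path = backup_dir / "manifest.json"
    if not manifest_path.is_file():
        return ["manifest.json missing (backup incomplete)"]
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    problems = []
    if manifest.get("complete") is not True:
        problems.append("manifest is not marked complete")
    # ASSUMPTION: "files" maps relative path -> expected sha256 hex digest.
    for rel_path, expected in manifest.get("files", {}).items():
        target = backup_dir / rel_path
        if not target.is_file():
            problems.append(f"missing: {rel_path}")
        elif sha256_file(target) != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

For example, `verify_backup("backups/DOCS_20260602_091819")` would return an empty list for an intact backup under these assumptions.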
🔄 Restore Phases
Each phase is resumable via restore_progress.json, and is marked complete only when it finishes with zero failures:
| # | Phase | What happens | Endpoint |
|---|---|---|---|
| 1 | Space | Create the target space (new key by default) | POST /rest/api/space |
| 2 | Pages | Create parent-before-child; record old→new ID map | POST /wiki/api/v2/pages |
| 3 | Blog posts | Create flat blog posts | POST /wiki/api/v2/blogposts |
| 4 | Remap | Rewrite ri:content-id + source-space ri:space-key references | PUT /wiki/api/v2/pages/{id} |
| 5 | Attachments | Upload binaries (idempotent PUT) | PUT /rest/api/content/{id}/child/attachment |
| 6 | Comments | Footer comments; author/date prepended as text | POST /wiki/api/v2/footer-comments |
| 7 | Labels | Re-apply page/blog labels | POST /rest/api/content/{id}/label |
| 8 | Properties | Recreate content & space properties | POST /wiki/api/v2/{type}/{id}/properties |
| 9 | Restrictions | Re-apply page restrictions (best-effort) | PUT /rest/api/content/{id}/restriction |
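The resumability rule described above the table (skip finished items, mark a phase complete only at zero failures) can be sketched as follows. The JSON layout written here and the `Progress` class are hypothetical; the real `restore_progress.json` may be structured differently.

```python
import json
from pathlib import Path


class Progress:
    """Tiny resumability tracker (hypothetical layout, for illustration)."""

    def __init__(self, path):
        self.path = Path(path)
        self.state = {"phases": {}, "items": {}}
        if self.path.is_file():
            self.state = json.loads(self.path.read_text(encoding="utf-8"))

    def save(self):
        # Write to a temp file, then rename, so a crash never leaves half a file.
        tmp = self.path.with_suffix(".tmp")
        tmp.write_text(json.dumps(self.state), encoding="utf-8")
        tmp.replace(self.path)

    def run_phase(self, name, items, action):
        """Run action(item_id) for each pending item; return the failure count."""
        if self.state["phases"].get(name) == "complete":
            return 0
        done = set(self.state["items"].setdefault(name, []))
        failures = 0
        for item_id in items:
            if item_id in done:
                continue  # finished in an earlier run
            try:
                action(item_id)
            except Exception:
                failures += 1
                continue
            done.add(item_id)
            self.state["items"][name].append(item_id)
            self.save()
        if failures == 0:
            # A phase is complete only when nothing failed.
            self.state["phases"][name] = "complete"
            self.save()
        return failures
```

Saving after every item costs extra writes but means an interrupted run repeats at most the item that was in flight.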
Old→new content-ID mapping is saved in id_maps.json inside the backup directory.
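How such a map could be built during the Pages phase is sketched below: pages are visited breadth-first from the roots, so each parent's new ID is already known when its children are created. `restore_pages` and the `create_page` callable are hypothetical stand-ins (in the real tool the creation step would be the `POST /wiki/api/v2/pages` call), and the `id` / `parentId` field names are assumed from the v2 page shape.

```python
from collections import defaultdict, deque


def restore_pages(pages, create_page):
    """Create pages parent-before-child and return the old -> new ID map.

    `create_page(page, new_parent_id)` is a hypothetical callable that
    creates one page and returns its newly minted ID.
    """
    known_ids = {page["id"] for page in pages}
    children = defaultdict(list)
    queue = deque()
    for page in pages:
        parent = page.get("parentId")
        if parent in known_ids:
            children[parent].append(page)
        else:
            queue.append(page)  # top-level page, or parent not in the backup

    id_map = {}
    while queue:
        page = queue.popleft()
        new_parent = id_map.get(page.get("parentId"))  # None for roots
        id_map[page["id"]] = create_page(page, new_parent)
        queue.extend(children[page["id"]])
    return id_map
```

Pages caught in a parent cycle are never reached by this traversal, so a real implementation would want to report any page missing from the returned map.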
⚠️ Known Limitations
These are Confluence Cloud REST API constraints — not tool bugs. The tool preserves everything it can and records the rest.
| Data | Status | Notes / degrades to |
|---|---|---|
| Page bodies (storage format) | ✅ Restored | round-trippable |
| Page hierarchy (parent/child) | ✅ Restored | rebuilt via parentId, parent-before-child |
| Page links (same space) | ✅ Restored | Cloud stores links by title, which is preserved — they resolve natively in the new space, no remap needed |
| Cross-space links to the source space | ✅ Restored | ri:space-key rewritten source→target in the remap pass; links to other spaces untouched |
| Blog posts | ✅ Restored | flat |
| Attachments (latest version) | ✅ Restored | v1 content download/upload; original filename kept |
| Footer comments | ✅ Restored | original author + date added as a footer note |
| Labels (page/blog) | ✅ Restored | v1 |
| Page restrictions | ⚠️ Best-effort | identities must resolve in the target tenant |
| Content / space properties | ⚠️ Best-effort | system-managed properties may reject writes |
| Inline comments | ❌ Backup only | text re-anchoring is unreliable via API; kept in backup |
| Space labels | ❌ Not restored | no API to set space-level labels |
| Space permissions | ❌ Manual | cross-tenant identity remap; saved for review |
| Original author / creator | ❌ Not settable | becomes the API user; original → footer note + original_provenance property |
| Original created / updated dates | ❌ Not settable | become the restore run time |
| Version history | ❌ Not replayed | optional metadata sidecar only |
| Page / content IDs | ♻️ Reassigned | new IDs minted; old→new map kept |
| ID-referencing macros (include, excerpt-include, ID-rooted children/pagetree) | ⚠️ Remapped (defensive) | ri:content-id rewritten in a 2nd pass; unmapped refs break. Most Cloud links use titles (above), so this mainly covers macros/migrated content that embed a numeric content ID |
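The second-pass rewrite in the last row can be pictured as a regex substitution over each page's storage-format body, run only after every page has been created and the old-to-new map is complete. This is a sketch under that assumption; `remap_content_ids` is a hypothetical name and the tool's `macros.py` may work differently (for instance with an XML parser).

```python
import re

# Matches ri:content-id="12345" and captures the numeric ID separately.
CONTENT_ID = re.compile(r'(ri:content-id=")(\d+)(")')


def remap_content_ids(body, id_map):
    """Rewrite ri:content-id values using id_map; report IDs with no mapping."""
    unmapped = []

    def swap(match):
        old_id = match.group(2)
        new_id = id_map.get(old_id)
        if new_id is None:
            unmapped.append(old_id)
            return match.group(0)  # leave the reference untouched
        return f"{match.group(1)}{new_id}{match.group(3)}"

    return CONTENT_ID.sub(swap, body), unmapped
```

Returning the unmapped IDs alongside the new body makes it easy to log which references will break, which is the failure mode the table warns about.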
🛡️ Restore Safety
- Default: a NEW space is created. The tool refuses to modify an existing space key.
- Touching an existing space requires `--overwrite` and typing the space key to confirm (the menu always prompts; non-interactive CLI honors the flag). Even then, restore is additive: it never deletes content.
- Dry-run (`--dry-run`) prints the full plan and writes nothing.
- A trashed (not-yet-purged) space key is detected: restore stops and tells you to purge it (Settings → Data Management → Trashed Spaces) or pick another key.
🗜️ Native XML Export (optional, off by default)
With NATIVE_EXPORT=true / --native-export, each backup also attempts a native XML space export and stores the ZIP under native/. This is a high-fidelity DR artifact (preserves history/authors/timestamps) that you import manually via the Confluence UI ("Import a space").
⚠️ It drives undocumented endpoints that Atlassian can change without notice. It is best-effort (failure is logged, never fails the REST backup) and unverified in this build — confirm it works on a non-prod site before relying on it.
🔒 Data Handling & Security
- Backups are stored UNENCRYPTED — plain JSON plus attachment binaries (and, if enabled, a native XML ZIP). They contain real space content; securing/encrypting the backup directory is your responsibility.
- Gitignored by default, never commit: `backups/`, `*.log`, `csv_export/`, native `*.zip`/`*.xml`, and `.env`.
- Logs can leak content: the DEBUG file log records truncated API response bodies (page text). Treat log files as sensitive.
- Credentials live only in `.env` (gitignored). No site, space, email, or token is ever hardcoded.
✅ Round-trip Verified
The REST backup→restore round-trip has been proven end-to-end against a live Confluence Cloud site: a space was backed up, restored into a fresh space, and diffed via the API — page count, hierarchy, and attachment bytes all matched. A backup is only proven once it has been restored end-to-end and verified; structural checks alone are necessary but not sufficient.
To prove it yourself on a non-prod site: --backup SOURCE → --restore <dir> --target-key SCRATCH --dry-run → --restore <dir> --target-key SCRATCH, then compare page count + hierarchy, bodies, attachment count + sizes, comments, and labels.
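Because restored pages get new IDs, a hierarchy comparison has to key on something stable. One option, sketched here, is to compare each page's chain of ancestor titles. The function names and the `id` / `title` / `parentId` fields are assumptions; the page lists would come from your own API queries against the source and scratch spaces.

```python
def title_paths(pages):
    """Return the set of root-to-page title tuples for a list of pages."""
    by_id = {page["id"]: page for page in pages}
    paths = set()
    for page in pages:
        parts, current, seen = [], page, set()
        # Walk up through parents; `seen` guards against accidental cycles.
        while current is not None and current["id"] not in seen:
            seen.add(current["id"])
            parts.append(current["title"])
            current = by_id.get(current.get("parentId"))
        paths.add(tuple(reversed(parts)))
    return paths


def diff_hierarchy(source_pages, restored_pages):
    """Report title paths present on one side only."""
    src, dst = title_paths(source_pages), title_paths(restored_pages)
    return {"missing": sorted(src - dst), "unexpected": sorted(dst - src)}
```

An empty `missing` and `unexpected` list means page count and parent/child structure agree by title; bodies, attachments, comments, and labels still need their own comparison.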
🗂️ Project Structure
```
confluence-space-backup-restore/
├── main.py                  # Entry point: interactive menu + CLI flags
├── .env.example             # Configuration template
├── requirements.txt         # Python dependencies
│
├── confluence_tool/
│   ├── config.py            # .env loader and validation
│   ├── auth.py              # Session builder (API token / cookie auth)
│   ├── api_client.py        # HTTP client with retry + rate-limit handling
│   ├── backup.py            # BackupManager: per-space backup
│   ├── restore.py           # RestoreManager: 9-phase restore
│   ├── macros.py            # Storage-format content-ID remapper
│   ├── native_export.py     # Optional native XML export (best-effort)
│   ├── manifest.py          # Manifest build/validate (sha256 + complete flag)
│   ├── progress.py          # Resumability tracker (old→new ID maps, phases)
│   ├── export.py            # CSV export and backup statistics
│   ├── menu.py              # Interactive CLI menu
│   ├── cli.py               # Console-script entry point
│   └── utils.py             # Logging, JSON streaming I/O, utilities
│
└── backups/                 # Backup output directory (gitignored)
    └── DOCS_20260602_091819/
        ├── manifest.json    # Completion marker + file index
        ├── pages.json
        ├── attachments/
        └── ...
```
⚙️ Configuration Reference
All settings live in .env (copy from .env.example):
| Variable | Required | Default | Description |
|---|---|---|---|
| `CONFLUENCE_URL` | Yes | - | Cloud base URL; must include `/wiki` |
| `CONFLUENCE_EMAIL` | Yes* | - | Account email for API token auth |
| `CONFLUENCE_API_TOKEN` | Yes* | - | API token (generate at id.atlassian.com/manage-api-tokens) |
| `CONFLUENCE_COOKIE_HEADER` | Alt* | - | Full `Cookie:` header value for SSO auth |
| `CONFLUENCE_VERIFY_SSL` | No | `true` | Set `false` to skip SSL verification |
| `BACKUP_ROOT` | No | `./backups` | Directory where backups are written |
| `PAGE_SIZE` | No | `250` | Items per API page (Cloud v2 max 250) |
| `MAX_RETRIES` | No | `5` | Retry count on transient failures |
| `READ_TIMEOUT` | No | `30` | HTTP read timeout in seconds |
| `API_DELAY` | No | `0.2` | Seconds to wait between API calls |
| `CHUNK_SIZE` | No | `8388608` | Bytes per chunk for streaming downloads |
| `BODY_FORMAT` | No | `storage` | `storage` (recommended) or `atlas_doc_format` |
| `INCLUDE_ATTACHMENTS` | No | `true` | Download attachment binary files |
| `INCLUDE_COMMENTS` | No | `true` | Back up footer + inline comments |
| `INCLUDE_BLOGPOSTS` | No | `true` | Back up blog posts |
| `INCLUDE_RESTRICTIONS` | No | `true` | Back up per-page restrictions |
| `INCLUDE_VERSIONS` | No | `false` | Save version-metadata sidecar (reference only) |
| `NATIVE_EXPORT` | No | `false` | Also attempt a native XML export (best-effort) |
| `NATIVE_EXPORT_TIMEOUT` | No | `1800` | Max seconds to wait for a native export |

\* Either `CONFLUENCE_EMAIL` + `CONFLUENCE_API_TOKEN` or `CONFLUENCE_COOKIE_HEADER` is required.
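The rules in this table (required URL ending in `/wiki`, one of two auth methods, `PAGE_SIZE` capped at 250) lend themselves to a small pre-flight check. The sketch below is not the tool's `config.py`; `validate_config` is a hypothetical helper that takes any mapping of settings, such as `os.environ` after loading `.env`.

```python
def validate_config(env):
    """Return a list of configuration errors; empty means the settings look usable."""
    errors = []

    url = env.get("CONFLUENCE_URL", "").rstrip("/")
    if not url:
        errors.append("CONFLUENCE_URL is required")
    elif not url.endswith("/wiki"):
        errors.append("CONFLUENCE_URL must include /wiki")

    # Either email + token, or a cookie header, must be present.
    has_token = bool(env.get("CONFLUENCE_EMAIL")) and bool(env.get("CONFLUENCE_API_TOKEN"))
    has_cookie = bool(env.get("CONFLUENCE_COOKIE_HEADER"))
    if not (has_token or has_cookie):
        errors.append("set CONFLUENCE_EMAIL + CONFLUENCE_API_TOKEN, or CONFLUENCE_COOKIE_HEADER")

    try:
        page_size = int(env.get("PAGE_SIZE", "250"))
    except ValueError:
        page_size = 0  # non-numeric input is reported by the range check below
    if not 1 <= page_size <= 250:
        errors.append("PAGE_SIZE must be an integer between 1 and 250")

    return errors
```

Collecting all errors instead of stopping at the first lets a user fix a broken `.env` in one pass.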
🐍 Requirements
- Python 3.10+ (tested on 3.10–3.13)
- `requests` >= 2.28
- `python-dotenv` >= 1.0
- Optional: `rich` for colored output (`pip install .[ui]`)
📝 Changelog
See CHANGELOG.md for the full version history.
📄 License