Backup and restore individual Confluence Cloud spaces via REST API

Confluence Space Backup & Restore

Backup and restore individual Confluence Cloud spaces via REST API — pages, hierarchy, attachments, comments, labels, properties, restrictions, and blog posts. Fully resumable, with an interactive menu and CLI mode. The Confluence sibling of jira-project-backup-restore.

⚠️ A REST restore is content-faithful, not forensic. It rebuilds pages, hierarchy, attachments, comments and labels — but cannot restore original authors, timestamps, or version history (Confluence Cloud has no API to set them). Read Known Limitations before relying on this for disaster recovery.

🤔 Why?

Confluence Cloud has no supported public API for native space export or import (CONFCLOUD-40457, open for years):

- Native import is UI-only (Settings → Data management → Import spaces, site-admin) and cannot overwrite an existing space key, so there is no way to automate or verify it end-to-end.
- Native export is reachable only through undocumented .action endpoints that Atlassian can change at any time.

So an automated, verifiable backup and restore must be built on the REST API. That's the backbone here. A native XML export is offered as an optional, best-effort, off-by-default high-fidelity artifact for manual import — never the primary guarantee.

✨ Features

| Feature | Description |
| --- | --- |
| Full space backup | Pages (storage format) + hierarchy, blog posts, attachments, footer/inline comments, labels, content & space properties, restrictions, permissions |
| 9-phase restore | Space → pages (parent-first) → blog posts → macro/ID remap → attachments → comments → labels → properties → restrictions |
| Two-pass ID remap | Rewrites `ri:content-id` references after new page IDs are minted, so include/excerpt/pagetree macros don't break |
| New-space default | Restore creates a new space; never clobbers a live space without `--overwrite` + typed confirmation |
| Homepage adoption | Reuses the space's auto-created homepage instead of leaving a duplicate |
| Multi-space | Back up several spaces in a single run |
| Resumable | Re-run after interruption: completed phases and items are skipped; a phase completes only when fail count is 0 |
| Memory efficient | Pages/comments/attachments stream to disk, so large spaces won't OOM a small host |
| Dry-run mode | Preview every restore action without making changes |
| Rate-limit aware | Exponential backoff with `429 / Retry-After` detection |
| CSV export | Export space content to CSV for reporting and sharing |
| Backup inspection | Content-type breakdown, page-status counts, disk size |
| Integrity validation | sha256 manifest verification of every backed-up file |
| Connection test | Pre-flight: authentication + space listing |
| Interactive menu + CLI | Guided workflow, or `--backup` / `--restore` / `--export-csv` flags for scripts and cron |
| Native XML export | Optional best-effort high-fidelity ZIP for manual UI import (`--native-export`) |
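The "Rate-limit aware" row can be illustrated with a minimal sketch of how a retry delay might be chosen. This is not the tool's actual retry code: `retry_delay`, `base`, and `cap` are hypothetical names, and only the delta-seconds form of `Retry-After` is handled (an HTTP-date value falls back to exponential backoff).

```python
def retry_delay(attempt, retry_after=None, base=1.0, cap=60.0):
    """Return seconds to wait before retry number `attempt` (0-based).

    A numeric Retry-After header wins; otherwise the delay doubles on
    each attempt, starting at `base` and never exceeding `cap`.
    """
    if retry_after is not None:
        try:
            return min(cap, max(0.0, float(retry_after)))
        except ValueError:
            pass  # not delta-seconds (e.g. an HTTP-date); use backoff instead
    return min(cap, base * (2 ** attempt))
```

A caller would sleep for `retry_delay(n, response.headers.get("Retry-After"))` after the n-th failed attempt, and give up once its retry budget (for example `MAX_RETRIES`) is spent.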

🚀 Quick Start

1. Install

Via PyPI (recommended) — provides the confluence-backup command:

pip install confluence-space-backup-restore
confluence-backup

Or clone for development:

git clone https://github.com/davidmalko87/confluence-space-backup-restore.git
cd confluence-space-backup-restore
pip install -r requirements.txt
python main.py

2. Configure

cp .env.example .env

Edit .env with your Confluence Cloud credentials:

CONFLUENCE_URL=https://your-domain.atlassian.net/wiki   # must include /wiki
CONFLUENCE_EMAIL=you@example.com
CONFLUENCE_API_TOKEN=your-api-token

Generate an API token at id.atlassian.com/manage-api-tokens. Authentication is HTTP Basic (email + API token), so there is no session cookie to refresh.
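For readers who want to see what these three settings amount to, here is a small sketch under stated assumptions. It is not taken from the tool's `config.py` or `auth.py`; `check_base_url` and `basic_auth_header` are hypothetical helpers that show the `/wiki` requirement and how a Basic credential is formed from email and token.

```python
import base64


def check_base_url(url):
    """Return the URL without a trailing slash, or raise if /wiki is missing."""
    cleaned = url.rstrip("/")
    if not cleaned.endswith("/wiki"):
        raise ValueError("CONFLUENCE_URL must include /wiki")
    return cleaned


def basic_auth_header(email, api_token):
    """Build the value of an HTTP Basic Authorization header."""
    raw = f"{email}:{api_token}".encode("utf-8")
    return "Basic " + base64.b64encode(raw).decode("ascii")
```

With `requests`, passing `auth=(email, api_token)` produces the same header, so the second helper is only needed when headers are built by hand.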

3. Run

Interactive menu:

python main.py

==============================================================
  Confluence Space Backup & Restore  v1.0.0
==============================================================
  Site: https://your-domain.atlassian.net   Auth: API token   Backups: ./backups
--------------------------------------------------------------
  --- Backup & Restore ---
   1) Backup space(s)
   2) Restore space from backup
  --- Browse & Analyze ---
   3) List existing backups
   4) Validate backup integrity
   5) Export backup to CSV
   6) Inspect backup details
  --- Settings & Tools ---
   7) Test Confluence connection
   8) Show current configuration
   9) Cleanup incomplete backups
   0) Exit

CLI — backup:

python main.py --backup DOCS
python main.py --backup DOCS,TEAM
python main.py --backup DOCS --native-export

CLI — restore:

python main.py --restore backups/DOCS_20260602_091819 --target-key DOCSR --dry-run
python main.py --restore backups/DOCS_20260602_091819 --target-key DOCSR

CLI — inspect & export:

python main.py --list
python main.py --validate backups/DOCS_20260602_091819
python main.py --export-csv backups/DOCS_20260602_091819

Exit codes: 0 success · 1 failure · 2 bad/insufficient arguments.
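For cron or CI use, the exit codes above can be mapped to a message by a thin wrapper. The following is only a sketch: `backup_argv`, `run_backup`, and `EXIT_MEANINGS` are hypothetical names, and the wrapper assumes it runs from the cloned repository so that `main.py` is in the working directory.

```python
import subprocess
import sys

# Meanings as documented in the exit-code line above.
EXIT_MEANINGS = {0: "success", 1: "failure", 2: "bad or insufficient arguments"}


def backup_argv(space_keys, native_export=False):
    """Build the argument list for a backup of one or more space keys."""
    argv = [sys.executable, "main.py", "--backup", ",".join(space_keys)]
    if native_export:
        argv.append("--native-export")
    return argv


def run_backup(space_keys, native_export=False):
    """Run the backup and return (exit_code, human-readable meaning)."""
    result = subprocess.run(backup_argv(space_keys, native_export))
    return result.returncode, EXIT_MEANINGS.get(result.returncode, "unknown")
```

A scheduler can then alert on any non-zero code, treating 2 as a configuration mistake rather than a transient failure.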

📦 What Gets Backed Up

| File | Contents |
| --- | --- |
| `space.json` | Space metadata + description |
| `pages.json` | Pages with storage-format body (streamed) |
| `blogposts.json` | Blog posts (streamed) |
| `attachments.json` | Attachment metadata index (streamed) |
| `attachments/<id>/` | Attachment binary files, streamed to disk |
| `comments/footer.json` | Footer comments |
| `comments/inline.json` | Inline comments (metadata; see limitations) |
| `labels.json` | Page, blog, and space labels |
| `properties/*.json` | Content properties + space properties |
| `restrictions.json` | Per-page restrictions (v1) |
| `permissions.json` | Space permissions |
| `versions/<pageId>.json` | Optional page version-metadata sidecar |
| `native/<KEY>_native.xml.zip` | Optional native XML export |
| `manifest.json` | File index + sha256 + `"complete": true`; the presence of this file marks the backup complete |
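The manifest row suggests how an integrity check could work. The sketch below assumes a manifest shaped like `{"complete": true, "files": {"pages.json": "<sha256 hex>"}}`; that layout, and the names `sha256_of` and `verify_backup`, are assumptions made for illustration, since the real schema is not documented here.

```python
import hashlib
import json
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large attachments are never fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify_backup(backup_dir):
    """Return a list of problems; an empty list means every check passed."""
    root = Path(backup_dir)
    manifest_path = root / "manifest.json"
    if not manifest_path.exists():
        return ["manifest.json missing: backup is incomplete"]
    manifest = json.loads(manifest_path.read_text(encoding="utf-8"))
    problems = []
    if manifest.get("complete") is not True:
        problems.append("manifest is not marked complete")
    # Assumed layout: "files" maps a relative path to its sha256 hex digest.
    for rel_path, expected in manifest.get("files", {}).items():
        target = root / rel_path
        if not target.is_file():
            problems.append(f"missing file: {rel_path}")
        elif sha256_of(target) != expected:
            problems.append(f"checksum mismatch: {rel_path}")
    return problems
```

For day-to-day use, the built-in `--validate` flag is the supported way to run this kind of check.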

🔄 Restore Phases

Each phase is resumable via restore_progress.json, and is marked complete only when it finishes with zero failures:

| # | Phase | What happens | Endpoint |
| --- | --- | --- | --- |
| 1 | Space | Create the target space (new key by default) | `POST /rest/api/space` |
| 2 | Pages | Create parent-before-child; record old→new ID map | `POST /wiki/api/v2/pages` |
| 3 | Blog posts | Create flat blog posts | `POST /wiki/api/v2/blogposts` |
| 4 | Remap | Rewrite `ri:content-id` + source-space `ri:space-key` references | `PUT /wiki/api/v2/pages` |
| 5 | Attachments | Upload binaries (idempotent PUT) | `PUT /rest/api/content/{id}/child/attachment` |
| 6 | Comments | Footer comments; author/date prepended as text | `POST /wiki/api/v2/footer-comments` |
| 7 | Labels | Re-apply page/blog labels | `POST /rest/api/content/{id}/label` |
| 8 | Properties | Recreate content & space properties | `POST /wiki/api/v2/{type}/{id}/properties` |
| 9 | Restrictions | Re-apply page restrictions (best-effort) | `PUT /rest/api/content/{id}/restriction` |
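The resumability rule stated above (skip what is done, mark a phase complete only at zero failures) can be sketched as follows. The JSON layout with `completed_phases` and `done_items` is an assumption for illustration and is not the documented format of restore_progress.json; `load_progress`, `save_progress`, and `run_phase` are hypothetical names.

```python
import json
from pathlib import Path


def load_progress(path):
    """Load saved progress, or start fresh if no file exists yet."""
    p = Path(path)
    if p.exists():
        return json.loads(p.read_text(encoding="utf-8"))
    return {"completed_phases": [], "done_items": {}}


def save_progress(path, progress):
    """Persist progress so an interrupted run can pick up where it stopped."""
    Path(path).write_text(json.dumps(progress, indent=2), encoding="utf-8")


def run_phase(progress, phase, items, action):
    """Apply `action` to items not yet done; return the number of failures."""
    if phase in progress["completed_phases"]:
        return 0  # whole phase finished in an earlier run
    done = progress["done_items"].setdefault(phase, [])
    failures = 0
    for item in items:
        if item in done:
            continue  # finished in an earlier run
        try:
            action(item)
            done.append(item)
        except Exception:
            failures += 1  # left out of `done`, so it is retried next run
    if failures == 0:
        progress["completed_phases"].append(phase)
    return failures
```

Calling `save_progress` after each phase (or after each item, at some I/O cost) is what makes a re-run cheap: items already recorded are never sent to the API again.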

Old→new content-ID mapping is saved in id_maps.json inside the backup directory.
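To show what the remap pass does with that mapping, here is a minimal sketch. It assumes the map is a flat `{old_id: new_id}` dictionary of strings and uses a regular expression instead of an XML parser; both are simplifications chosen for illustration, and `remap_content_ids` is a hypothetical name, not the function in `macros.py`.

```python
import re

# Matches ri:content-id="12345" in storage-format XHTML.
CONTENT_ID = re.compile(r'(ri:content-id=")(\d+)(")')


def remap_content_ids(storage_body, id_map):
    """Rewrite old content IDs to new ones.

    Returns (new_body, unmapped_ids). IDs missing from `id_map` are left
    untouched and reported, since those references will break.
    """
    unmapped = []

    def swap(match):
        old_id = match.group(2)
        new_id = id_map.get(old_id)
        if new_id is None:
            unmapped.append(old_id)
            return match.group(0)
        return f"{match.group(1)}{new_id}{match.group(3)}"

    return CONTENT_ID.sub(swap, storage_body), unmapped
```

The second pass is needed because a page can reference another page that is created later in phase 2, so the full map only exists once every page has a new ID.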

⚠️ Known Limitations

These are Confluence Cloud REST API constraints — not tool bugs. The tool preserves everything it can and records the rest.

| Data | Status | Notes / degrades to |
| --- | --- | --- |
| Page bodies (storage format) | ✅ Restored | round-trippable |
| Page hierarchy (parent/child) | ✅ Restored | rebuilt via `parentId`, parent-before-child |
| Page links (same space) | ✅ Restored | Cloud stores links by title, which is preserved; they resolve natively in the new space, no remap needed |
| Cross-space links to the source space | ✅ Restored | `ri:space-key` rewritten source→target in the remap pass; links to other spaces untouched |
| Blog posts | ✅ Restored | flat |
| Attachments (latest version) | ✅ Restored | v1 content download/upload; original filename kept |
| Footer comments | ✅ Restored | original author + date added as a footer note |
| Labels (page/blog) | ✅ Restored | v1 |
| Page restrictions | ⚠️ Best-effort | identities must resolve in the target tenant |
| Content / space properties | ⚠️ Best-effort | system-managed properties may reject writes |
| Inline comments | ❌ Backup only | text re-anchoring is unreliable via API; kept in backup |
| Space labels | ❌ Not restored | no API to set space-level labels |
| Space permissions | ❌ Manual | cross-tenant identity remap; saved for review |
| Original author / creator | ❌ Not settable | becomes the API user; original → footer note + `original_provenance` property |
| Original created / updated dates | ❌ Not settable | become the restore run time |
| Version history | ❌ Not replayed | optional metadata sidecar only |
| Page / content IDs | ♻️ Reassigned | new IDs minted; old→new map kept |
| ID-referencing macros (`include`, `excerpt-include`, ID-rooted `children`/`pagetree`) | ⚠️ Remapped (defensive) | `ri:content-id` rewritten in a 2nd pass; unmapped refs break. Most Cloud links use titles (above), so this mainly covers macros/migrated content that embed a numeric content ID |
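The hierarchy row depends on creating every parent before its children, because a child's new `parentId` is only known once the parent exists. One way to produce such an order is a breadth-first walk from the roots. This is a sketch, not the tool's code: it assumes each page record is a dict with `id` and `parentId` keys, and `parent_first` is a hypothetical name.

```python
from collections import deque


def parent_first(pages):
    """Order pages so that each parent appears before its children."""
    known = {page["id"] for page in pages}
    children = {}
    roots = []
    for page in pages:
        parent = page.get("parentId")
        if parent in known:
            children.setdefault(parent, []).append(page)
        else:
            roots.append(page)  # top-level page, or parent is not in the backup
    ordered = []
    queue = deque(roots)
    while queue:
        page = queue.popleft()
        ordered.append(page)
        queue.extend(children.get(page["id"], []))
    return ordered
```

Pages caught in a parent cycle would never be reached from a root and so would be missing from the result; comparing `len(ordered)` with `len(pages)` is a cheap way to detect that.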

🛡️ Restore Safety

- Default: a NEW space is created. The tool refuses to modify an existing space key.
- Touching an existing space requires --overwrite and typing the space key to confirm (the menu always prompts; non-interactive CLI honors the flag). Even then, restore is additive: it never deletes content.
- Dry-run (--dry-run) prints the full plan and writes nothing.
- A trashed (not-yet-purged) space key is detected: restore stops and tells you to purge it (Settings → Data Management → Trashed Spaces) or pick another key.
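The rules above form a small decision table, sketched here as a pure function. `restore_decision` and its parameters are hypothetical and only restate the documented behavior; how the tool actually discovers live and trashed keys is not shown.

```python
def restore_decision(target_key, live_keys, trashed_keys,
                     overwrite=False, typed_confirmation=None):
    """Decide what a restore into `target_key` is allowed to do."""
    if target_key in trashed_keys:
        return "stop: purge the trashed space or pick another key"
    if target_key not in live_keys:
        return "create new space"
    if overwrite and typed_confirmation == target_key:
        return "additive restore into existing space"
    return "refuse: existing space needs --overwrite and the typed key"
```

Keeping the decision separate from the API calls makes it easy to exercise every branch in a unit test before any write is attempted.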

🗜️ Native XML Export (optional, off by default)

With NATIVE_EXPORT=true / --native-export, each backup also attempts a native XML space export and stores the ZIP under native/. This is a high-fidelity DR artifact (preserves history/authors/timestamps) that you import manually via the Confluence UI ("Import a space").

⚠️ It drives undocumented endpoints that Atlassian can change without notice. It is best-effort (failure is logged, never fails the REST backup) and unverified in this build — confirm it works on a non-prod site before relying on it.

🔒 Data Handling & Security

- Backups are stored UNENCRYPTED: plain JSON plus attachment binaries (and, if enabled, a native XML ZIP). They contain real space content; securing/encrypting the backup directory is your responsibility.
- Gitignored by default. Never commit: backups/, *.log, csv_export/, native *.zip/*.xml, and .env.
- Logs can leak content: the DEBUG file log records truncated API response bodies (page text). Treat log files as sensitive.
- Credentials live only in .env (gitignored). No site, space, email, or token is ever hardcoded.

✅ Round-trip Verified

The REST backup→restore round-trip has been proven end-to-end against a live Confluence Cloud site: a space was backed up, restored into a fresh space, and diffed via the API — page count, hierarchy, and attachment bytes all matched. A backup is only proven once it has been restored end-to-end and verified; structural checks alone are necessary but not sufficient.

To prove it yourself on a non-prod site: --backup SOURCE → --restore <dir> --target-key SCRATCH --dry-run → --restore <dir> --target-key SCRATCH, then compare page count + hierarchy, bodies, attachment count + sizes, comments, and labels.
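The page-count and hierarchy part of that comparison can be sketched as below. Because content IDs are reassigned on restore, the sketch compares pages by title and parent title instead of by ID. It assumes each record is a dict with `id`, `title`, and `parentId`; `hierarchy_edges` and `compare_spaces` are hypothetical names, and fetching the two page lists from the API is left out.

```python
def hierarchy_edges(pages):
    """Return the set of (title, parent_title) pairs for a list of pages."""
    titles = {page["id"]: page["title"] for page in pages}
    return {(page["title"], titles.get(page.get("parentId"))) for page in pages}


def compare_spaces(source_pages, restored_pages):
    """Summarize differences in page count and hierarchy between two spaces."""
    src = hierarchy_edges(source_pages)
    dst = hierarchy_edges(restored_pages)
    return {
        "page_count_matches": len(source_pages) == len(restored_pages),
        "missing_in_restore": sorted(src - dst, key=str),
        "unexpected_in_restore": sorted(dst - src, key=str),
    }
```

An empty `missing_in_restore` and `unexpected_in_restore` together with a matching count covers structure only; bodies, attachments, comments, and labels still need their own comparison.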

🗂️ Project Structure

confluence-space-backup-restore/
├── main.py                   # Entry point — interactive menu + CLI flags
├── .env.example              # Configuration template
├── requirements.txt          # Python dependencies
│
├── confluence_tool/
│   ├── config.py             # .env loader and validation
│   ├── auth.py               # Session builder (API token / cookie auth)
│   ├── api_client.py         # HTTP client with retry + rate-limit handling
│   ├── backup.py             # BackupManager — per-space backup
│   ├── restore.py            # RestoreManager — 9-phase restore
│   ├── macros.py             # Storage-format content-ID remapper
│   ├── native_export.py      # Optional native XML export (best-effort)
│   ├── manifest.py           # Manifest build/validate (sha256 + complete flag)
│   ├── progress.py           # Resumability tracker (old→new ID maps, phases)
│   ├── export.py             # CSV export and backup statistics
│   ├── menu.py               # Interactive CLI menu
│   ├── cli.py                # Console-script entry point
│   └── utils.py              # Logging, JSON streaming I/O, utilities
│
└── backups/                  # Backup output directory (gitignored)
    └── DOCS_20260602_091819/
        ├── manifest.json     # Completion marker + file index
        ├── pages.json
        ├── attachments/
        └── ...

⚙️ Configuration Reference

All settings live in .env (copy from .env.example):

| Variable | Required | Default | Description |
| --- | --- | --- | --- |
| `CONFLUENCE_URL` | Yes | (none) | Cloud base URL; must include `/wiki` |
| `CONFLUENCE_EMAIL` | Yes* | (none) | Account email for API token auth |
| `CONFLUENCE_API_TOKEN` | Yes* | (none) | API token; generate one at id.atlassian.com/manage-api-tokens |
| `CONFLUENCE_COOKIE_HEADER` | Alt* | (none) | Full `Cookie:` header value for SSO auth |
| `CONFLUENCE_VERIFY_SSL` | No | `true` | Set `false` to skip SSL verification |
| `BACKUP_ROOT` | No | `./backups` | Directory where backups are written |
| `PAGE_SIZE` | No | `250` | Items per API page (Cloud v2 max 250) |
| `MAX_RETRIES` | No | `5` | Retry count on transient failures |
| `READ_TIMEOUT` | No | `30` | HTTP read timeout in seconds |
| `API_DELAY` | No | `0.2` | Seconds to wait between API calls |
| `CHUNK_SIZE` | No | `8388608` | Bytes per chunk for streaming downloads |
| `BODY_FORMAT` | No | `storage` | `storage` (recommended) or `atlas_doc_format` |
| `INCLUDE_ATTACHMENTS` | No | `true` | Download attachment binary files |
| `INCLUDE_COMMENTS` | No | `true` | Back up footer + inline comments |
| `INCLUDE_BLOGPOSTS` | No | `true` | Back up blog posts |
| `INCLUDE_RESTRICTIONS` | No | `true` | Back up per-page restrictions |
| `INCLUDE_VERSIONS` | No | `false` | Save version-metadata sidecar (reference only) |
| `NATIVE_EXPORT` | No | `false` | Also attempt a native XML export (best-effort) |
| `NATIVE_EXPORT_TIMEOUT` | No | `1800` | Max seconds to wait for a native export |

* Either CONFLUENCE_EMAIL + CONFLUENCE_API_TOKEN or CONFLUENCE_COOKIE_HEADER is required.
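The either/or credential rule and a few of the defaults can be expressed as a short validation sketch. It is an illustration, not the tool's `config.py`: `load_settings` and `as_bool` are hypothetical names, the accepted boolean spellings are an assumption, and clamping `PAGE_SIZE` to 250 is one possible way to respect the documented maximum.

```python
import os


def as_bool(value):
    """Interpret common truthy spellings (assumed; the tool may differ)."""
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None):
    """Validate required settings and apply documented defaults."""
    env = os.environ if env is None else env
    if not env.get("CONFLUENCE_URL"):
        raise ValueError("CONFLUENCE_URL is required")
    has_token = bool(env.get("CONFLUENCE_EMAIL")) and bool(env.get("CONFLUENCE_API_TOKEN"))
    has_cookie = bool(env.get("CONFLUENCE_COOKIE_HEADER"))
    if not (has_token or has_cookie):
        raise ValueError(
            "set CONFLUENCE_EMAIL + CONFLUENCE_API_TOKEN, or CONFLUENCE_COOKIE_HEADER"
        )
    return {
        "url": env["CONFLUENCE_URL"].rstrip("/"),
        "auth": "token" if has_token else "cookie",
        "page_size": min(int(env.get("PAGE_SIZE", "250")), 250),
        "max_retries": int(env.get("MAX_RETRIES", "5")),
        "read_timeout": float(env.get("READ_TIMEOUT", "30")),
        "api_delay": float(env.get("API_DELAY", "0.2")),
        "include_attachments": as_bool(env.get("INCLUDE_ATTACHMENTS", "true")),
        "native_export": as_bool(env.get("NATIVE_EXPORT", "false")),
    }
```

Passing a plain dictionary as `env` keeps the function testable without touching the real process environment.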

🐍 Requirements

- Python 3.10+ (tested on 3.10–3.13)
- requests >= 2.28
- python-dotenv >= 1.0
- Optional: rich for colored output (pip install .[ui])

📝 Changelog

See CHANGELOG.md for the full version history.

📄 License

MIT

Project details

Maintainers

davidmalko87

Release history

- 1.0.1 (this version): Jun 2, 2026
- 1.0.0: Jun 2, 2026

Download files

Source Distribution

confluence_space_backup_restore-1.0.1.tar.gz (48.9 kB), uploaded Jun 2, 2026, Source

Built Distribution

confluence_space_backup_restore-1.0.1-py3-none-any.whl (51.3 kB), uploaded Jun 2, 2026, Python 3

File details

Details for the file confluence_space_backup_restore-1.0.1.tar.gz.

File metadata

Download URL: confluence_space_backup_restore-1.0.1.tar.gz
Upload date: Jun 2, 2026
Size: 48.9 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for confluence_space_backup_restore-1.0.1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `58b78aa28d2071df45168579003d4afaf110415536cad516f5d50d117a18d131` |
| MD5 | `810212a3b6378ce8cc672983492e5ac3` |
| BLAKE2b-256 | `ad37a2266a720fcd18f8ad1a8ff2aba8387a1b6803ce30fb6d3d48567037d0b2` |

Provenance

The following attestation bundles were made for confluence_space_backup_restore-1.0.1.tar.gz:

Publisher: publish.yml on davidmalko87/confluence-space-backup-restore

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: confluence_space_backup_restore-1.0.1.tar.gz
- Subject digest: 58b78aa28d2071df45168579003d4afaf110415536cad516f5d50d117a18d131
- Sigstore transparency entry: 1703031988
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: davidmalko87/confluence-space-backup-restore@d6b96e47df68a6a30af66955d8e5492c205e3b61
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/davidmalko87
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d6b96e47df68a6a30af66955d8e5492c205e3b61
- Trigger Event: push

File details

Details for the file confluence_space_backup_restore-1.0.1-py3-none-any.whl.

File metadata

Download URL: confluence_space_backup_restore-1.0.1-py3-none-any.whl
Upload date: Jun 2, 2026
Size: 51.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for confluence_space_backup_restore-1.0.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `3f3f858eccf5245b9bbc3d11bd8b981a856c1b8682ebcff9940e464eb525f9f9` |
| MD5 | `528e66222f3f4516c5afdeea948d9b98` |
| BLAKE2b-256 | `a8c23f8192c2e574b8f8be0ad7064555f901e83015368e7a9a96f716a3fe46ff` |

Provenance

The following attestation bundles were made for confluence_space_backup_restore-1.0.1-py3-none-any.whl:

Publisher: publish.yml on davidmalko87/confluence-space-backup-restore

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: confluence_space_backup_restore-1.0.1-py3-none-any.whl
- Subject digest: 3f3f858eccf5245b9bbc3d11bd8b981a856c1b8682ebcff9940e464eb525f9f9
- Sigstore transparency entry: 1703032138
- Sigstore integration time: Jun 2, 2026
Source repository:
- Permalink: davidmalko87/confluence-space-backup-restore@d6b96e47df68a6a30af66955d8e5492c205e3b61
- Branch / Tag: refs/tags/v1.0.1
- Owner: https://github.com/davidmalko87
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@d6b96e47df68a6a30af66955d8e5492c205e3b61
- Trigger Event: push
