wparc: a command-line tool to back up public data from WordPress websites using the WordPress REST API
wparc is a command-line tool for backing up data from WordPress-based websites.
It uses the /wp-json/ REST API exposed by a default WordPress installation and extracts all publicly available data and media files.
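To make the /wp-json/ entry point concrete, here is a minimal sketch of how a bare domain could be turned into the API index URL. The helper name build_api_url and the plain-HTTP default are assumptions made for illustration; they are not taken from wparc's source.

```python
def build_api_url(domain: str, https: bool = False) -> str:
    """Return the REST API index URL for a domain (hypothetical helper)."""
    # Assumption: plain HTTP unless HTTPS is requested, mirroring the --https flag.
    scheme = "https" if https else "http"
    # Drop an accidental scheme or trailing slash so the result is well-formed.
    host = domain.split("://", 1)[-1].strip("/")
    return f"{scheme}://{host}/wp-json/"


print(build_api_url("example.com", https=True))
```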
Main features
- Data extraction: Dump all WordPress REST API routes and data
- Media download: Download all media files referenced in the API
- Smart pagination: Automatically detects and uses WordPress pagination headers (X-WP-TotalPages, X-WP-Total) for accurate progress tracking
- Progress tracking: Shows "page X of Y" progress when pagination headers are available
- SSL verification: Secure by default with configurable SSL verification
- Configurable: Customize timeout, page size, retry count, and more
- Type-safe: Full type hints for better IDE support and code quality
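As an illustration of the data extraction feature, the sketch below lists the collection routes advertised by a WordPress API index. The index document served at /wp-json/ carries a "routes" object keyed by route path; the sample data and the collection_routes helper are made up for this example, and wparc's own route selection may differ.

```python
import re

# Trimmed, made-up sample of the JSON document served at /wp-json/.
sample_index = {
    "namespaces": ["wp/v2"],
    "routes": {
        "/": {"methods": ["GET"]},
        "/wp/v2/posts": {"methods": ["GET", "POST"]},
        "/wp/v2/posts/(?P<id>[\\d]+)": {"methods": ["GET", "POST"]},
        "/wp/v2/media": {"methods": ["GET", "POST"]},
    },
}


def collection_routes(index: dict) -> list:
    """Return GET-able routes without regex placeholders (hypothetical helper)."""
    routes = []
    for path, spec in index.get("routes", {}).items():
        if "GET" not in spec.get("methods", []):
            continue  # nothing to read from this route
        if re.search(r"\(\?P<", path):
            continue  # parameterised route such as /wp/v2/posts/<id>
        routes.append(path)
    return sorted(routes)


print(collection_routes(sample_index))
```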
Installation
Production Installation
pip install --upgrade pip setuptools
pip install wparc
Development Installation
git clone https://github.com/ruarxive/wparc.git
cd wparc
pip install -e ".[dev]"
Python version
Python version 3.6 or greater is required.
Usage
Basic Commands
# Get help
wparc --help
# Ping a WordPress site
wparc ping example.com
# Dump all data from a WordPress site
wparc dump example.com
# Download media files (requires dump to be run first)
wparc getfiles example.com
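Because getfiles depends on the output of dump, a scripted backup needs to run the two commands in order and stop if the first one fails. The following is one possible wrapper, using only the commands and flags documented here; the function names and the injectable runner parameter are assumptions for illustration.

```python
import subprocess


def backup_commands(domain: str, https: bool = True) -> list:
    """Build the argv lists for a dump followed by getfiles (hypothetical helper)."""
    dump = ["wparc", "dump", domain]
    if https:
        dump.append("--https")
    # getfiles documents no --https option, so none is passed to it.
    return [dump, ["wparc", "getfiles", domain]]


def run_backup(domain: str, runner=subprocess.run) -> None:
    """Run dump, then getfiles; `runner` is injectable so the order can be tested."""
    for argv in backup_commands(domain):
        # check=True raises CalledProcessError, so getfiles is skipped if dump fails.
        runner(argv, check=True)
```

Calling run_backup("example.com") would then invoke the installed wparc executable twice, in that order.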
Command Options
Ping Command
wparc ping <domain> [OPTIONS]
Options:
-v, --verbose Verbose output
--https Force HTTPS protocol
--no-verify-ssl Disable SSL certificate verification (not recommended)
--timeout INTEGER Request timeout in seconds (default: 360)
Example:
wparc ping example.com --https --verbose
Dump Command
wparc dump <domain> [OPTIONS]
Options:
-v, --verbose Verbose output
-a, --all Include unknown API routes (default: True)
--https Force HTTPS protocol
--no-verify-ssl Disable SSL certificate verification (not recommended)
--timeout INTEGER Request timeout in seconds (default: 360)
--page-size INTEGER Number of items per page (default: 100)
--retry-count INTEGER Number of retry attempts (default: 5)
Example:
wparc dump example.com --https --timeout 600 --page-size 50
Note: The dump command automatically uses WordPress pagination headers (X-WP-TotalPages and X-WP-Total) when available to show accurate progress like "Processing page 1 of 5". This provides better visibility into the extraction progress for large sites.
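The header handling described in the note can be sketched roughly as follows. This is an illustration of the technique, not wparc's actual code: the function names are hypothetical, and the fallback message for a missing header is an assumption.

```python
from typing import Mapping, Optional, Tuple


def read_pagination(headers: Mapping[str, str]) -> Tuple[Optional[int], Optional[int]]:
    """Return (total_items, total_pages); None where a header is missing or malformed."""
    # HTTP header names are case-insensitive, so normalise before lookup.
    lowered = {key.lower(): value for key, value in headers.items()}

    def as_int(name: str) -> Optional[int]:
        value = lowered.get(name)
        return int(value) if value is not None and value.isdigit() else None

    return as_int("x-wp-total"), as_int("x-wp-totalpages")


def progress_line(page: int, total_pages: Optional[int]) -> str:
    """Format a progress message, with or without a known page count."""
    if total_pages is None:
        return f"Processing page {page}"
    return f"Processing page {page} of {total_pages}"


total, pages = read_pagination({"X-WP-Total": "42", "X-WP-TotalPages": "5"})
print(progress_line(1, pages))  # prints "Processing page 1 of 5"
```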
Getfiles Command
wparc getfiles <domain> [OPTIONS]
Options:
-v, --verbose Verbose output
--no-verify-ssl Disable SSL certificate verification (not recommended)
Example:
wparc getfiles example.com --verbose
Output Structure
After running wparc dump <domain>, the following structure is created:
<domain>/
├── data/
│   ├── wp-json.json          # Main API index
│   ├── wp_v2_posts.jsonl     # Posts data
│   ├── wp_v2_pages.jsonl     # Pages data
│   ├── wp_v2_media.jsonl     # Media metadata
│   └── ...                   # Other routes
└── files/                    # Media files (after getfiles)
    └── wp-content/
        └── uploads/
            └── ...
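The .jsonl files hold one JSON object per line, so they can be processed without loading a whole file into memory. The sketch below parses such lines and maps a media item's source_url (a standard field of WordPress media objects) to a path under files/. The helper names are hypothetical, and the URL-to-path mapping is an assumption inferred from the tree above.

```python
import json
from pathlib import Path
from urllib.parse import urlparse


def parse_jsonl(lines):
    """Yield one decoded object per non-empty line (works on any iterable of str)."""
    for line in lines:
        if line.strip():
            yield json.loads(line)


def local_media_path(root, source_url: str) -> Path:
    """Guess where a media URL lands under <root>/files/ (assumed layout)."""
    # Drop the leading slash so the URL path nests under the files/ directory.
    return Path(root) / "files" / urlparse(source_url).path.lstrip("/")


# Usage sketch (an open file object is an iterable of lines):
# with open("example.com/data/wp_v2_media.jsonl", encoding="utf-8") as handle:
#     for item in parse_jsonl(handle):
#         print(local_media_path("example.com", item["source_url"]))
```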
Development
Running Tests
pytest
Code Quality
# Format code
black wparc/
# Type checking
mypy wparc/
# Linting
flake8 wparc/
Troubleshooting
SSL Certificate Errors
If you encounter SSL certificate errors, you can temporarily disable verification:
wparc dump example.com --no-verify-ssl
Warning: This is not recommended for production use as it makes you vulnerable to man-in-the-middle attacks.
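To show what is given up when verification is disabled, the sketch below compares a default TLS context from Python's standard library with one that skips certificate checks. It is a general illustration and makes no claim about which HTTP library or settings wparc uses internally.

```python
import ssl

# Default context: certificates are validated and the hostname must match.
verified = ssl.create_default_context()

# Roughly what "no verification" means: any certificate is accepted.
unverified = ssl.create_default_context()
unverified.check_hostname = False  # must be cleared before relaxing verify_mode
unverified.verify_mode = ssl.CERT_NONE

print(verified.verify_mode, verified.check_hostname)
print(unverified.verify_mode, unverified.check_hostname)
```

With the second context a client cannot tell the real server from an impostor, which is why the flag should stay a temporary workaround.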
Timeout Errors
If requests are timing out, increase the timeout:
wparc dump example.com --timeout 600
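Timeouts on slow servers are usually handled by combining a longer timeout with repeated attempts, which is what the --timeout and --retry-count options suggest. The following is a generic retry loop with exponential backoff; the function name, the backoff schedule, and the use of TimeoutError are assumptions for illustration and may not match wparc's behaviour.

```python
import time


def fetch_with_retries(fetch, retry_count=5, base_delay=1.0, sleep=time.sleep):
    """Call fetch() up to retry_count times, backing off between timeouts."""
    if retry_count < 1:
        raise ValueError("retry_count must be at least 1")
    last_error = None
    for attempt in range(retry_count):
        try:
            return fetch()
        except TimeoutError as error:
            last_error = error
            if attempt < retry_count - 1:
                # Wait 1x, 2x, 4x ... base_delay before the next attempt.
                sleep(base_delay * 2 ** attempt)
    raise last_error
```

Here fetch stands for any zero-argument callable that performs one request and raises TimeoutError when the server is too slow.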
Large Sites
For large WordPress sites, you may want to adjust the page size:
wparc dump example.com --page-size 50
The dump command automatically detects pagination information from WordPress API headers, so you'll see progress like "Processing page 1 of 10" when available. This helps you estimate completion time for large extractions.
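The trade-off behind --page-size is simple arithmetic: a smaller page means lighter responses but more requests. A small sketch, with a hypothetical helper name and made-up item counts:

```python
import math


def pages_needed(total_items: int, page_size: int) -> int:
    """Number of requests needed to fetch total_items at page_size items each."""
    return math.ceil(total_items / page_size)


# A route with 1000 items: halving the page size doubles the request count.
print(pages_needed(1000, 100))  # prints 10
print(pages_needed(1000, 50))   # prints 20
```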
Security
- SSL verification is enabled by default
- All file operations use secure context managers
- Command injection vulnerabilities have been fixed
- Proper error handling prevents information leakage
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
License
See LICENSE file for details.
Documentation
For detailed information about WordPress REST API endpoints, see WP_API_ENDPOINTS.md.
Changelog
See CHANGELOG.md for a list of changes.