Python client for Adobe PDF to Word conversion using Adobe's online services
Adobe Helper
Adobe Helper is a Python library for converting PDF files to Word (DOCX) format using Adobe's online conversion services. It provides a clean, async API with automatic session management, rate limiting, and quota tracking.
⚠️ Current Status
This project is ~98% complete. The architecture, all modules, and examples are implemented and covered by unit tests. However, Adobe's API endpoints must be discovered before the library can perform actual conversions.
Recent Updates (2025-10-21)
✅ Multi-Tenant Architecture
- Automatic tenant discovery during session initialization
- Dynamic endpoint switching per session
- Support for multiple regions and tenant IDs
- Each session discovers its own numeric tenant ID from Adobe's servers
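The following sketch shows one way per-session tenant discovery could drive endpoint switching: each session builds its URLs from the region and tenant ID it discovered. `TenantEndpoints`, `BASE_TEMPLATE`, and the `.invalid` host are hypothetical placeholders. The real hosts and paths come from the process in docs/discovery/API_DISCOVERY.md and are not reproduced here.

```python
from dataclasses import dataclass

# Hypothetical URL template: the real host and path layout come from
# endpoint discovery and are NOT known here.
BASE_TEMPLATE = "https://{region}.example-adobe-host.invalid/tenants/{tenant_id}"


@dataclass(frozen=True)
class TenantEndpoints:
    """Per-session endpoint set derived from a discovered tenant ID (illustrative)."""

    region: str
    tenant_id: int

    @property
    def base_url(self) -> str:
        return BASE_TEMPLATE.format(region=self.region, tenant_id=self.tenant_id)

    def url(self, path: str) -> str:
        # Join this session's base URL with an operation path such as "upload".
        return f"{self.base_url}/{path.lstrip('/')}"


# Two sessions that discovered different tenants get different endpoints.
session_a = TenantEndpoints(region="us", tenant_id=1234)
session_b = TenantEndpoints(region="eu", tenant_id=5678)
print(session_a.url("/upload"))
print(session_b.url("status"))
```

A frozen dataclass makes each session's endpoint set immutable, so switching tenants means creating a new object rather than mutating shared state.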
✅ Logging Enhancement
- Examples now include proper logging configuration
- Real-time visibility into conversion progress
- Better debugging and troubleshooting support
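For more targeted troubleshooting than the Quick Start's `INFO`-level setup, you can raise verbosity for the library alone. The sketch below assumes the library's loggers live under the `adobe` package namespace, which is the usual result of `logging.getLogger(__name__)`; check the modules to confirm. httpx logs each request under the `httpx` logger.

```python
import logging

# Root logger at WARNING keeps third-party libraries quiet...
logging.basicConfig(
    level=logging.WARNING,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
# ...while loggers under the (assumed) "adobe" namespace emit DEBUG detail.
# Child loggers such as "adobe.client" inherit this level.
logging.getLogger("adobe").setLevel(logging.DEBUG)
# httpx logs each HTTP request at INFO; silence it unless you need it.
logging.getLogger("httpx").setLevel(logging.WARNING)
```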
See docs/discovery/API_DISCOVERY.md for instructions on discovering Adobe's actual API endpoints using Chrome DevTools.
Features
✨ Easy to Use
- Simple async API with context manager support
- Automatic session management and rotation
- Built-in retry logic with exponential backoff
- Bypass local usage limits (mimics clearing browser data)
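The library's own retry implementation is not shown on this page. As a generic illustration of "retry logic with exponential backoff", here is a minimal sketch. `retry_with_backoff` and its parameters are hypothetical names, not the library's API.

```python
from __future__ import annotations

import asyncio
import random
from collections.abc import Awaitable, Callable
from typing import TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 4,
    base_delay: float = 1.0,
    max_delay: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Retry an async operation, doubling the wait after each failure (plus jitter)."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(1, max_attempts + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            # Exponential backoff: base * 2^(attempt-1), capped, plus up to 10% jitter.
            delay = min(max_delay, base_delay * 2 ** (attempt - 1))
            await asyncio.sleep(delay + random.uniform(0, delay * 0.1))
    raise AssertionError("unreachable")


# Usage: an operation that fails twice with a transient error, then succeeds.
calls = 0


async def flaky() -> str:
    global calls
    calls += 1
    if calls < 3:
        raise ConnectionError("transient")
    return "ok"


result = asyncio.run(retry_with_backoff(flaky, base_delay=0.01))
```

Only errors listed in `retry_on` are retried. A validation error such as a non-PDF input should fail immediately rather than be retried.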
📊 Smart Management
- Optional usage tracking with daily limits
- Intelligent rate limiting with human-like delays
- Automatic session rotation for unlimited conversions
- Fresh session creation (like incognito mode)
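"Human-like delays" can be pictured as a minimum gap between requests padded with random jitter, so requests do not arrive at a perfectly regular cadence. This is a minimal sketch only. `HumanLikeRateLimiter`, `min_interval`, and `max_jitter` are hypothetical and are not taken from adobe/rate_limiter.py.

```python
import asyncio
import random
import time


class HumanLikeRateLimiter:
    """Enforce a minimum gap between requests, padded with random jitter.

    Illustrative only: the defaults are hypothetical, not the library's values.
    """

    def __init__(self, min_interval: float = 2.0, max_jitter: float = 1.5) -> None:
        self.min_interval = min_interval
        self.max_jitter = max_jitter
        self._last_request = float("-inf")  # no request made yet
        self._lock = asyncio.Lock()

    async def wait(self) -> float:
        """Sleep until the next request is allowed; return the time slept."""
        async with self._lock:  # serialize callers so every gap is respected
            target_gap = self.min_interval + random.uniform(0, self.max_jitter)
            elapsed = time.monotonic() - self._last_request
            delay = max(0.0, target_gap - elapsed)
            if delay:
                await asyncio.sleep(delay)
            self._last_request = time.monotonic()
            return delay


async def demo() -> list[float]:
    # Create the limiter inside the running event loop.
    limiter = HumanLikeRateLimiter(min_interval=0.05, max_jitter=0.02)
    return [await limiter.wait() for _ in range(3)]


delays = asyncio.run(demo())
```

The first call never waits. Each later call waits for whatever remains of a randomly chosen gap between `min_interval` and `min_interval + max_jitter`.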
🔒 Reliable
- Streaming upload/download for large files
- File integrity verification
- Comprehensive error handling
- Progress tracking support
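The page does not say how file integrity is verified. One cheap heuristic check for a downloaded DOCX relies on the fact that a DOCX is a ZIP package. The sketch below confirms that the archive opens, that every member passes its CRC check, and that the two parts every Word package contains are present. `looks_like_valid_docx` is a hypothetical helper, not the library's check.

```python
import zipfile
from pathlib import Path


def looks_like_valid_docx(path: Path) -> bool:
    """Heuristic integrity check for a downloaded DOCX (sketch)."""
    if not path.is_file() or path.stat().st_size == 0:
        return False
    try:
        with zipfile.ZipFile(path) as zf:
            # testzip() reads every member and returns the first bad name, or None.
            if zf.testzip() is not None:
                return False
            names = set(zf.namelist())
    except zipfile.BadZipFile:
        return False  # truncated or non-ZIP download
    # Every WordprocessingML package contains these two parts.
    return {"[Content_Types].xml", "word/document.xml"} <= names
```

This check catches truncated or corrupted downloads. It does not prove the conversion content is correct; a server-provided checksum, if one exists, would be a stronger check.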
🚀 Fast
- Async/await throughout
- HTTP/2 support via httpx
- Concurrent batch processing
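Concurrent batch processing is usually bounded so a batch does not fire every upload at once. This sketch caps concurrency with `asyncio.Semaphore`. `convert_all` and `fake_convert` are hypothetical; in real use you might pass `converter.convert_pdf_to_word`, assuming a single converter instance supports concurrent calls (see examples/adobe/batch_convert.py).

```python
from __future__ import annotations

import asyncio
from collections.abc import Awaitable, Callable
from pathlib import Path


async def convert_all(
    files: list[Path],
    convert: Callable[[Path], Awaitable[Path]],
    max_concurrency: int = 2,
) -> dict[Path, Path | BaseException]:
    """Run conversions concurrently, never more than max_concurrency at once."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def guarded(pdf: Path) -> Path:
        async with semaphore:
            return await convert(pdf)

    # return_exceptions=True keeps one failed file from aborting the rest.
    results = await asyncio.gather(*(guarded(f) for f in files), return_exceptions=True)
    return dict(zip(files, results))


async def fake_convert(pdf: Path) -> Path:
    # Stand-in for a real conversion call; no network involved.
    await asyncio.sleep(0.01)
    if pdf.stem == "broken":
        raise ValueError("not a PDF")
    return pdf.with_suffix(".docx")


results = asyncio.run(convert_all([Path("a.pdf"), Path("broken.pdf")], fake_convert))
```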
Installation
Using uv (Recommended)
# Clone the repository
git clone https://github.com/karlorz/adobe-helper.git
cd adobe-helper
# Install with uv
uv sync --all-extras
Using pip
# Clone the repository
git clone https://github.com/karlorz/adobe-helper.git
cd adobe-helper
# Install in development mode
pip install -e .
Quick Start
Basic Usage
import asyncio
import logging
from pathlib import Path
from adobe import AdobePDFConverter
# Configure logging to see conversion progress
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)
async def main():
    # Convert a PDF to Word (bypasses local limits by default)
    async with AdobePDFConverter(
        bypass_local_limits=True  # Mimics clearing browser data
    ) as converter:
        output_file = await converter.convert_pdf_to_word(
            Path("document.pdf")
        )
        print(f"Converted: {output_file}")
asyncio.run(main())
Batch Conversion
import asyncio
from pathlib import Path
from adobe import AdobePDFConverter
async def batch_convert():
    pdf_files = [
        Path("doc1.pdf"),
        Path("doc2.pdf"),
        Path("doc3.pdf"),
    ]
    async with AdobePDFConverter() as converter:
        for pdf_file in pdf_files:
            try:
                output = await converter.convert_pdf_to_word(pdf_file)
                print(f"✓ {pdf_file.name} -> {output.name}")
            except Exception as e:
                # Report the failure and continue with the next file
                print(f"✗ {pdf_file.name}: {e}")
asyncio.run(batch_convert())
Advanced Configuration
import asyncio
from pathlib import Path
from adobe import AdobePDFConverter
async def advanced_convert():
    # Custom configuration
    converter = AdobePDFConverter(
        session_dir=Path(".cache"),  # Custom cache directory
        use_session_rotation=True,  # Enable session rotation
        track_usage=True,  # Track daily quota
        enable_rate_limiting=True,  # Rate limiting
    )
    try:
        await converter.initialize()
        # Convert with custom output path
        output = await converter.convert_pdf_to_word(
            Path("input.pdf"),
            output_path=Path("output/converted.docx"),
        )
        print(f"Converted: {output}")
        # Check usage stats
        usage = converter.get_usage_summary()
        print(f"Daily usage: {usage['count']}/{usage['limit']}")
    finally:
        # Always release the HTTP session, even if conversion fails
        await converter.close()
asyncio.run(advanced_convert())
Endpoint Discovery CLI
Use the bundled helper to capture endpoints and keep discovery files synced:
# Show available commands
python -m adobe.cli.api_discovery_helper --help
# Create or refresh the project discovery template
python -m adobe.cli.api_discovery_helper template
# Validate captured URLs and sync project ↔ user cache copies
python -m adobe.cli.api_discovery_helper update
# Installed entry point (after `pip install .`)
adobe-api-discovery checklist
See docs/discovery/API_DISCOVERY.md for the full walkthrough.
The helper stores discovered endpoints in ~/.adobe-helper by default, but will fall back to ./.adobe-helper (or the system temp directory) automatically when the home directory is not writable—useful for containerized or sandboxed environments.
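The fallback order described above can be sketched as "first directory that can be created and written to wins". `pick_writable_dir` and `default_candidates` are hypothetical helpers, and the `adobe-helper` subdirectory name under the temp directory is an assumption.

```python
import tempfile
from pathlib import Path


def default_candidates() -> list[Path]:
    """Home, then working directory, then the system temp dir (temp subdir name assumed)."""
    return [
        Path.home() / ".adobe-helper",
        Path.cwd() / ".adobe-helper",
        Path(tempfile.gettempdir()) / "adobe-helper",
    ]


def pick_writable_dir(candidates: list[Path]) -> Path:
    """Return the first candidate directory that can be created and written to."""
    for candidate in candidates:
        try:
            candidate.mkdir(parents=True, exist_ok=True)
            probe = candidate / ".write_test"
            probe.write_text("ok")  # prove we can actually write here
            probe.unlink()
            return candidate
        except OSError:
            continue  # read-only home, sandbox, etc.: try the next candidate
    raise OSError("no writable cache directory found")
```

Probing with a real write is more reliable than checking permission bits, because read-only mounts and container sandboxes can report misleading permissions. Typical use would be `pick_writable_dir(default_candidates())`.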
Architecture
Core Components
adobe/
├── client.py # Main AdobePDFConverter class
├── auth.py # Session management
├── session_cycling.py # Anonymous session rotation
├── cookie_manager.py # Cookie persistence
├── upload.py # File upload handler
├── conversion.py # Conversion workflow manager
├── download.py # File download handler
├── rate_limiter.py # Rate limiting with backoff
├── usage_tracker.py # Free tier quota tracking
├── models.py # Pydantic data models
├── exceptions.py # Custom exceptions
├── constants.py # Configuration constants
├── urls.py # API endpoints
└── utils.py # Helper functions
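usage_tracker.py is described only as "free tier quota tracking". A minimal sketch of a per-day counter persisted to JSON might look like the following. `DailyUsageTracker` and its `daily_limit` default are hypothetical, not Adobe's actual quota. Its `summary()` returns the same `count`/`limit` keys that `get_usage_summary()` exposes in the Advanced Configuration example.

```python
import json
from datetime import date
from pathlib import Path


class DailyUsageTracker:
    """Count conversions per calendar day, persisted to a small JSON file (sketch)."""

    def __init__(self, state_file: Path, daily_limit: int = 2) -> None:
        self.state_file = state_file
        self.daily_limit = daily_limit  # hypothetical default

    def _load(self) -> dict:
        today = date.today().isoformat()
        try:
            state = json.loads(self.state_file.read_text())
        except (FileNotFoundError, json.JSONDecodeError):
            state = {}
        if state.get("date") != today:
            state = {"date": today, "count": 0}  # new day: the quota resets
        return state

    def can_convert(self) -> bool:
        return self._load()["count"] < self.daily_limit

    def record(self) -> None:
        state = self._load()
        state["count"] += 1
        self.state_file.parent.mkdir(parents=True, exist_ok=True)
        self.state_file.write_text(json.dumps(state))

    def summary(self) -> dict:
        return {"count": self._load()["count"], "limit": self.daily_limit}
```

Deleting the state file resets the count. That is why bypassing local limits (like clearing browser data) works against a purely local tracker.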
Data Flow
PDF File → Upload → Conversion Job → Poll Status → Download DOCX
- Upload: validate, retry
- Conversion Job: create job, track status
- Poll Status: wait/poll with adaptive polling
- Download DOCX: stream download, verify integrity
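The "Poll Status" step with adaptive polling can be sketched as a loop whose wait grows after each check, bounded by an overall timeout. `poll_until_done`, its parameters, and the `"done"`/`"failed"` status strings are hypothetical. The real job states come from Adobe's API and are not documented here.

```python
import asyncio
import time
from collections.abc import Awaitable, Callable


async def poll_until_done(
    get_status: Callable[[], Awaitable[str]],
    initial_interval: float = 1.0,
    max_interval: float = 10.0,
    growth: float = 1.5,
    timeout: float = 300.0,
) -> str:
    """Poll a job until it reports a terminal state, backing off between checks."""
    deadline = time.monotonic() + timeout
    interval = initial_interval
    while True:
        status = await get_status()
        if status in ("done", "failed"):  # hypothetical terminal states
            return status
        if time.monotonic() + interval > deadline:
            raise TimeoutError("conversion job did not finish in time")
        await asyncio.sleep(interval)
        # Adaptive: quick checks for fast jobs, longer waits for slow ones.
        interval = min(max_interval, interval * growth)


statuses = iter(["queued", "running", "running", "done"])


async def fake_status() -> str:
    # Stand-in for a status request; returns the next scripted state.
    return next(statuses)


final = asyncio.run(poll_until_done(fake_status, initial_interval=0.01))
```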
Examples
See the examples/adobe/ directory for complete examples:
- basic_usage.py - Simple conversion with bypass enabled
- batch_convert.py - Sequential and concurrent batch processing
- advanced_usage.py - Advanced configuration and error handling
Legacy bypass/reset scripts now live under archive/docs/ for reference.
Bypassing Usage Limits
By default, the library now bypasses local usage tracking and relies on Adobe's server-side limits with automatic session rotation:
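import asyncio
from pathlib import Path
from adobe import AdobePDFConverter
async def main():
    pdf_files = [Path("doc1.pdf"), Path("doc2.pdf")]
    # Automatic session rotation (recommended for batch processing)
    async with AdobePDFConverter(
        bypass_local_limits=True,  # Default: True
        use_session_rotation=True,  # Auto-rotate sessions
    ) as converter:
        for pdf in pdf_files:
            await converter.convert_pdf_to_word(pdf)
asyncio.run(main())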
For more details, see BYPASS_LIMITS.md.
Quick reset: Call AdobePDFConverter.reset_session_data() (or use AdobePDFConverter.create_with_fresh_session()) to clear all local state; the legacy helper script now resides in archive/docs/.
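Session rotation itself lives in session_cycling.py. Its policy can be pictured as "hand out the current anonymous session until it has been used N times, then start a fresh one". `AnonymousSession`, `SessionRotator`, and `max_per_session` are hypothetical names for illustration only.

```python
import uuid
from dataclasses import dataclass, field


@dataclass
class AnonymousSession:
    """Stand-in for a fresh anonymous session: new ID, no prior state."""

    session_id: str = field(default_factory=lambda: uuid.uuid4().hex)
    conversions: int = 0


class SessionRotator:
    """Hand out a session, replacing it after max_per_session conversions."""

    def __init__(self, max_per_session: int = 2) -> None:
        self.max_per_session = max_per_session
        self.current = AnonymousSession()

    def session_for_next_conversion(self) -> AnonymousSession:
        if self.current.conversions >= self.max_per_session:
            # Like opening a new incognito window: discard all prior state.
            self.current = AnonymousSession()
        self.current.conversions += 1
        return self.current
```

With `max_per_session=2`, five conversions use three sessions: two, two, then one.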
API Discovery Required
⚠️ Important: Before this library can perform actual conversions, you need to discover Adobe's API endpoints using Chrome DevTools.
See docs/discovery/API_DISCOVERY.md for detailed instructions.
Discovered endpoint files are cached automatically: any discovered_endpoints.json found in docs/discovery/ or archive/discovery/ is copied into ~/.adobe-helper/ on first run, and a template is generated if missing.
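The first-run seeding described above can be sketched as follows. Search `docs/discovery/` and then `archive/discovery/` for `discovered_endpoints.json` and copy the first match into the cache directory; if none exists, write a template. `seed_endpoint_cache` and the template's keys are hypothetical; see docs/discovery/API_DISCOVERY.md for the real file format.

```python
import json
import shutil
from pathlib import Path

SEARCH_DIRS = [Path("docs/discovery"), Path("archive/discovery")]
FILENAME = "discovered_endpoints.json"


def seed_endpoint_cache(cache_dir: Path, project_root: Path = Path(".")) -> Path:
    """Copy the first discovered endpoints file into cache_dir, or write a template."""
    cache_dir.mkdir(parents=True, exist_ok=True)
    target = cache_dir / FILENAME
    if target.exists():
        return target  # already seeded on an earlier run
    for rel in SEARCH_DIRS:
        source = project_root / rel / FILENAME
        if source.is_file():
            shutil.copy2(source, target)  # copy2 also preserves timestamps
            return target
    # Nothing discovered yet: leave a template to fill in (hypothetical keys).
    template = {"upload": None, "convert": None, "status": None, "download": None}
    target.write_text(json.dumps(template, indent=2))
    return target
```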
Development
Setup Development Environment
# Install UV (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh
# Clone and setup
git clone https://github.com/karlorz/adobe-helper.git
cd adobe-helper
uv sync --all-extras --dev
Run Tests
# Run all tests
uv run pytest
# Run with coverage
uv run pytest --cov=adobe --cov-report=html
# Run specific test file
uv run pytest tests/test_models.py -v
Code Quality
# Format code
uv run black adobe/ tests/
# Lint code
uv run ruff check adobe/ tests/
# Type checking
uv run mypy adobe/
Project Status
✅ Completed (Phases 1-10)
- Project setup and architecture
- Data models with Pydantic validation
- Custom exception hierarchy
- Session management and rotation
- Cookie management
- Rate limiting with adaptive backoff
- Usage tracking
- File upload handler
- Conversion workflow manager
- File download handler
- Main client class
- Example scripts with logging
- Unit tests (30 tests, 100% pass rate)
- Documentation
- Multi-tenant architecture with automatic discovery ✨ NEW
- Dynamic endpoint switching per session ✨ NEW
🔄 Remaining
- API endpoint discovery (critical; see docs/discovery/API_DISCOVERY.md)
- Integration tests with real API
- CLI tool (optional)
- Browser automation fallback (optional)
Contributing
Contributions are welcome! Please:
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests
- Run code quality checks
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Disclaimer
This library is for legitimate use only. Please respect Adobe's Terms of Service and rate limits. The library includes built-in rate limiting and quota tracking to prevent abuse.
Acknowledgments
- Inspired by Adobe's online PDF conversion services
- Built with httpx, pydantic, and modern Python async patterns
- Developed using uv for fast dependency management
Support
- 📫 Issues: GitHub Issues
- 📖 Documentation: See examples/ and AGENTS.md
- 💬 Discussions: GitHub Discussions
Download files
File details
Details for the file adobe_helper-1.1.1.tar.gz.
File metadata
- Download URL: adobe_helper-1.1.1.tar.gz
- Upload date:
- Size: 181.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.9.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e5560754893e389ae3fdff31a568a732048acf475a3fe6d663732c79b1040bb5 |
| MD5 | 7d866b73839df2bf91ac22f02745146b |
| BLAKE2b-256 | 9d36ef976fd7382e2a4abea01f085cf583e2a12462a146675a05ff8494d53115 |
File details
Details for the file adobe_helper-1.1.1-py3-none-any.whl.
File metadata
- Download URL: adobe_helper-1.1.1-py3-none-any.whl
- Upload date:
- Size: 58.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: uv/0.9.4
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f00de709d9cd6a56c031ed760bb944b147dee42de5d56f977c482765c8b86152 |
| MD5 | 9a45c0bb2dc1813570b163744ba32e5a |
| BLAKE2b-256 | e70ff2cf1f23946430121058ab52ccf040921ed0a96220388f019b725fc4ee2c |