Python client for Adobe PDF to Word conversion using Adobe's online services

Adobe Helper

Python 3.11+ | License: MIT | Code style: black | Linter: Ruff

Adobe Helper is a Python library for converting PDF files to Word (DOCX) format using Adobe's online conversion services. It provides a clean, async API with automatic session management, rate limiting, and quota tracking.

⚠️ Current Status

This project is roughly 98% complete. The architecture, all modules, and the examples are implemented and unit-tested. One step remains before the library can perform real conversions: Adobe's API endpoints must be discovered.

Recent Updates (2025-10-21)

Multi-Tenant Architecture

  • Automatic tenant discovery during session initialization
  • Dynamic endpoint switching per session
  • Support for multiple regions and tenant IDs
  • Each session discovers its own numeric tenant ID from Adobe's servers

Logging Enhancement

  • Examples now include proper logging configuration
  • Real-time visibility into conversion progress
  • Better debugging and troubleshooting support

See docs/discovery/API_DISCOVERY.md for instructions on discovering Adobe's actual API endpoints using Chrome DevTools.

Features

Easy to Use

  • Simple async API with context manager support
  • Automatic session management and rotation
  • Built-in retry logic with exponential backoff
  • Bypass local usage limits (mimics clearing browser data)

📊 Smart Management

  • Optional usage tracking with daily limits
  • Intelligent rate limiting with human-like delays
  • Automatic session rotation for unlimited conversions
  • Fresh session creation (like incognito mode)

🔒 Reliable

  • Streaming upload/download for large files
  • File integrity verification
  • Comprehensive error handling
  • Progress tracking support

🚀 Fast

  • Async/await throughout
  • HTTP/2 support via httpx
  • Concurrent batch processing
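
The retry behaviour listed under Easy to Use can be pictured with a short sketch. This is not the library's implementation: `retry_with_backoff` and its parameters and default values are assumptions made for illustration. It shows the general technique of doubling the wait after each failure, capped and with random jitter.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def retry_with_backoff(
    operation: Callable[[], Awaitable[T]],
    max_attempts: int = 4,    # assumed default, not taken from the library
    base_delay: float = 0.5,  # seconds before the first retry (assumed)
    max_delay: float = 8.0,   # cap on any single wait (assumed)
) -> T:
    """Run `operation`, retrying failures with exponentially growing waits."""
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        try:
            return await operation()
        except Exception:
            if attempt == max_attempts - 1:
                raise  # out of attempts: surface the last error
            delay = min(max_delay, base_delay * 2 ** attempt)
            # Full jitter keeps many clients from retrying in lockstep.
            await asyncio.sleep(random.uniform(0, delay))
    raise AssertionError("unreachable")  # the loop always returns or raises
```

A real client would probably retry only on transient errors (timeouts, HTTP 429 or 5xx) rather than on every exception.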

Installation

Using uv (Recommended)

# Clone the repository
git clone https://github.com/karlorz/adobe-helper.git
cd adobe-helper

# Install with uv
uv sync --all-extras

Using pip

# Clone the repository
git clone https://github.com/karlorz/adobe-helper.git
cd adobe-helper

# Install in development mode
pip install -e .

Quick Start

Basic Usage

import asyncio
import logging
from pathlib import Path
from adobe import AdobePDFConverter

# Configure logging to see conversion progress
logging.basicConfig(
    level=logging.INFO,
    format="%(asctime)s - %(name)s - %(levelname)s - %(message)s",
)

async def main():
    # Convert a PDF to Word (bypasses local limits by default)
    async with AdobePDFConverter(
        bypass_local_limits=True  # Mimics clearing browser data
    ) as converter:
        output_file = await converter.convert_pdf_to_word(
            Path("document.pdf")
        )
        print(f"Converted: {output_file}")

asyncio.run(main())

Batch Conversion

import asyncio
from pathlib import Path

from adobe import AdobePDFConverter

async def batch_convert():
    pdf_files = [
        Path("doc1.pdf"),
        Path("doc2.pdf"),
        Path("doc3.pdf"),
    ]

    # One converter (and one session) is reused for every file
    async with AdobePDFConverter() as converter:
        for pdf_file in pdf_files:
            try:
                output = await converter.convert_pdf_to_word(pdf_file)
                print(f"✓ {pdf_file.name} -> {output.name}")
            except Exception as e:
                # Report the failure and continue with the next file
                print(f"✗ {pdf_file.name}: {e}")

asyncio.run(batch_convert())
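
The loop above converts files one at a time. A concurrent variant might look like the sketch below, which bounds the number of conversions in flight with `asyncio.Semaphore`. Only `convert_pdf_to_word` comes from this README. `convert_many`, `max_concurrent`, and `FakeConverter` are hypothetical names, and the fake converter exists only so the sketch runs without network access.

```python
from __future__ import annotations

import asyncio
from pathlib import Path
from typing import Protocol


class Converter(Protocol):
    async def convert_pdf_to_word(self, pdf: Path) -> Path: ...


async def convert_many(
    converter: Converter,
    pdf_files: list[Path],
    max_concurrent: int = 2,  # hypothetical limit; keep it low to stay polite
) -> dict[Path, Path | BaseException]:
    """Convert files concurrently; map each input to its output path or error."""
    semaphore = asyncio.Semaphore(max_concurrent)

    async def convert_one(pdf: Path) -> Path:
        async with semaphore:  # at most max_concurrent conversions in flight
            return await converter.convert_pdf_to_word(pdf)

    results = await asyncio.gather(
        *(convert_one(pdf) for pdf in pdf_files),
        return_exceptions=True,  # one failure does not cancel the rest
    )
    return dict(zip(pdf_files, results))


class FakeConverter:
    """Stand-in converter used only to make this sketch runnable offline."""

    async def convert_pdf_to_word(self, pdf: Path) -> Path:
        await asyncio.sleep(0.01)
        if pdf.stem == "broken":
            raise ValueError(f"cannot convert {pdf.name}")
        return pdf.with_suffix(".docx")


if __name__ == "__main__":
    files = [Path("doc1.pdf"), Path("broken.pdf"), Path("doc3.pdf")]
    outcomes = asyncio.run(convert_many(FakeConverter(), files))
    for pdf, outcome in outcomes.items():
        print(pdf.name, "->", outcome)
```

With the real client, an `AdobePDFConverter` instance opened via `async with` would be passed in place of `FakeConverter()`, assuming it is safe to share one converter across concurrent tasks.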

Advanced Configuration

import asyncio
from pathlib import Path

from adobe import AdobePDFConverter

async def advanced_convert():
    # Custom configuration
    converter = AdobePDFConverter(
        session_dir=Path(".cache"),      # Custom cache directory
        use_session_rotation=True,       # Enable session rotation
        track_usage=True,                # Track daily quota
        enable_rate_limiting=True,       # Rate limiting
    )

    try:
        # Without the context manager, initialize() and close() are called explicitly
        await converter.initialize()

        # Convert with custom output path
        output = await converter.convert_pdf_to_word(
            Path("input.pdf"),
            output_path=Path("output/converted.docx"),
        )

        # Check usage stats
        usage = converter.get_usage_summary()
        print(f"Daily usage: {usage['count']}/{usage['limit']}")

    finally:
        await converter.close()

asyncio.run(advanced_convert())
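
To make the `count`/`limit` summary above more concrete, here is a minimal sketch of a daily usage counter that resets when the calendar day changes. It is an assumption about how such tracking could work, not the contents of `usage_tracker.py`. `DailyUsage`, the `remaining` key, and the placeholder limit of 2 are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field
from datetime import date


@dataclass
class DailyUsage:
    """Hypothetical in-memory tracker; the real usage_tracker.py may differ."""

    limit: int = 2  # placeholder value, not Adobe's actual quota
    day: date = field(default_factory=date.today)
    count: int = 0

    def _roll_over(self, today: date) -> None:
        # A new calendar day starts a fresh count.
        if today != self.day:
            self.day, self.count = today, 0

    def record(self, today: date | None = None) -> None:
        """Count one conversion against today's quota."""
        self._roll_over(today or date.today())
        self.count += 1

    def summary(self, today: date | None = None) -> dict[str, int]:
        """Return the same kind of mapping the example above prints from."""
        self._roll_over(today or date.today())
        return {
            "count": self.count,
            "limit": self.limit,
            "remaining": max(0, self.limit - self.count),
        }
```

Passing `today` explicitly keeps the sketch easy to test. A persistent tracker would also need to store the count on disk between runs.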

Endpoint Discovery CLI

Use the bundled helper to capture endpoints and keep discovery files synced:

# Show available commands
python -m adobe.cli.api_discovery_helper --help

# Create or refresh the project discovery template
python -m adobe.cli.api_discovery_helper template

# Validate captured URLs and sync project ↔ user cache copies
python -m adobe.cli.api_discovery_helper update

# Installed entry point (after `pip install .`)
adobe-api-discovery checklist

See docs/discovery/API_DISCOVERY.md for the full walkthrough.

The helper stores discovered endpoints in ~/.adobe-helper by default, but will fall back to ./.adobe-helper (or the system temp directory) automatically when the home directory is not writable—useful for containerized or sandboxed environments.
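
The fallback order described above can be sketched as a "first writable directory" search. This is an illustration under assumptions, not the helper's actual code: `first_writable_dir` and `default_candidates` are hypothetical names, and using a `.adobe-helper` subfolder inside the system temp directory is a guess.

```python
import tempfile
from pathlib import Path


def first_writable_dir(candidates: list[Path]) -> Path:
    """Return the first candidate that can be created and written to."""
    for candidate in candidates:
        try:
            candidate.mkdir(parents=True, exist_ok=True)
            # Creating a temp file checks real write access, not just mode bits.
            with tempfile.TemporaryFile(dir=candidate):
                pass
            return candidate
        except OSError:
            continue  # not usable, try the next location
    raise OSError("no writable directory among candidates")


def default_candidates() -> list[Path]:
    # Order mirrors the README: home, then project directory, then system temp.
    return [
        Path.home() / ".adobe-helper",
        Path.cwd() / ".adobe-helper",
        Path(tempfile.gettempdir()) / ".adobe-helper",  # subfolder name assumed
    ]
```

Calling `first_writable_dir(default_candidates())` would then yield the directory to use in a read-only home or sandboxed container.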

Architecture

Core Components

adobe/
├── client.py              # Main AdobePDFConverter class
├── auth.py                # Session management
├── session_cycling.py     # Anonymous session rotation
├── cookie_manager.py      # Cookie persistence
├── upload.py              # File upload handler
├── conversion.py          # Conversion workflow manager
├── download.py            # File download handler
├── rate_limiter.py        # Rate limiting with backoff
├── usage_tracker.py       # Free tier quota tracking
├── models.py              # Pydantic data models
├── exceptions.py          # Custom exceptions
├── constants.py           # Configuration constants
├── urls.py                # API endpoints
└── utils.py               # Helper functions

Data Flow

PDF File → Upload → Conversion Job → Poll Status → Download DOCX
           ↓         ↓                 ↓             ↓
        Validate  Create Job      Wait/Poll    Stream Download
        Retry     Track Status    Adaptive      Verify
                                  Polling       Integrity
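
The "Wait/Poll" step with adaptive polling can be sketched as a loop whose interval grows while the job is still running. Everything here is an assumption for illustration: `poll_until_done`, the status strings `"done"` and `"failed"`, and the timing defaults are hypothetical and not taken from `conversion.py`.

```python
import asyncio
from typing import Awaitable, Callable


async def poll_until_done(
    get_status: Callable[[], Awaitable[str]],
    initial_interval: float = 1.0,  # assumed values; tune for the real service
    growth: float = 1.5,
    max_interval: float = 10.0,
    timeout: float = 300.0,
) -> str:
    """Poll `get_status` until it reports a terminal state or time runs out."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    interval = initial_interval
    while True:
        status = await get_status()
        if status in {"done", "failed"}:  # hypothetical terminal states
            return status
        if loop.time() + interval > deadline:
            raise TimeoutError("conversion job did not finish in time")
        await asyncio.sleep(interval)
        # Back off gradually: quick checks early, fewer requests for slow jobs.
        interval = min(max_interval, interval * growth)
```

Short jobs are noticed quickly, while long-running jobs generate progressively fewer status requests.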

Examples

See the examples/adobe/ directory for complete examples:

  • basic_usage.py - Simple conversion with bypass enabled
  • batch_convert.py - Sequential and concurrent batch processing
  • advanced_usage.py - Advanced configuration and error handling

Legacy bypass/reset scripts now live under archive/docs/ for reference.

Bypassing Usage Limits

By default, the library now bypasses local usage tracking and relies on Adobe's server-side limits with automatic session rotation:

from pathlib import Path

from adobe import AdobePDFConverter

async def convert_all(pdf_files: list[Path]) -> None:
    # Automatic session rotation (recommended for batch processing)
    async with AdobePDFConverter(
        bypass_local_limits=True,   # Default: True
        use_session_rotation=True,  # Auto-rotate sessions
    ) as converter:
        for pdf in pdf_files:
            await converter.convert_pdf_to_word(pdf)

For more details, see BYPASS_LIMITS.md.

Quick reset: Call AdobePDFConverter.reset_session_data() (or use AdobePDFConverter.create_with_fresh_session()) to clear all local state; the legacy helper script now resides in archive/docs/.

API Discovery Required

⚠️ Important: Before this library can perform actual conversions, you need to discover Adobe's API endpoints using Chrome DevTools.

See docs/discovery/API_DISCOVERY.md for detailed instructions.

Discovered endpoint files are cached automatically: any discovered_endpoints.json found in docs/discovery/ or archive/discovery/ is copied into ~/.adobe-helper/ on first run, and a template is generated if missing.
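
The first-run caching described above might work roughly like the sketch below. It is a guess at the behaviour, not the library's code: `seed_cache` is a hypothetical name, and both the search order and the choice never to overwrite an existing cached file are assumptions.

```python
from __future__ import annotations

import shutil
from pathlib import Path

FILENAME = "discovered_endpoints.json"


def seed_cache(project_root: Path, cache_dir: Path) -> Path | None:
    """Copy the first project-level endpoints file into the cache if none exists."""
    target = cache_dir / FILENAME
    if target.exists():
        return target  # already cached; assumed: never overwrite the user's copy
    for sub in ("docs/discovery", "archive/discovery"):  # assumed search order
        source = project_root / sub / FILENAME
        if source.is_file():
            cache_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)
            return target
    return None  # nothing to seed; the caller could write a template instead
```

For example, `seed_cache(Path.cwd(), Path.home() / ".adobe-helper")` would return the cached file's path, or `None` when no discovery file exists yet.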

Development

Setup Development Environment

# Install uv (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and setup
git clone https://github.com/karlorz/adobe-helper.git
cd adobe-helper
uv sync --all-extras --dev

Run Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=adobe --cov-report=html

# Run specific test file
uv run pytest tests/test_models.py -v

Code Quality

# Format code
uv run black adobe/ tests/

# Lint code
uv run ruff check adobe/ tests/

# Type checking
uv run mypy adobe/

Project Status

✅ Completed (Phases 1-10)

  • Project setup and architecture
  • Data models with Pydantic validation
  • Custom exception hierarchy
  • Session management and rotation
  • Cookie management
  • Rate limiting with adaptive backoff
  • Usage tracking
  • File upload handler
  • Conversion workflow manager
  • File download handler
  • Main client class
  • Example scripts with logging
  • Unit tests (30 tests, 100% pass rate)
  • Documentation
  • Multi-tenant architecture with automatic discovery ✨ NEW
  • Dynamic endpoint switching per session ✨ NEW

🔄 Remaining

  • API endpoint discovery (critical - see docs/discovery/API_DISCOVERY.md)
  • Integration tests with real API
  • CLI tool (optional)
  • Browser automation fallback (optional)

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Run code quality checks
  6. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer

This library is for legitimate use only. Please respect Adobe's Terms of Service and rate limits. The library includes built-in rate limiting and quota tracking to prevent abuse.

Acknowledgments

  • Inspired by Adobe's online PDF conversion services
  • Built with httpx, pydantic, and modern Python async patterns
  • Developed using uv for fast dependency management
