Python client for Adobe PDF to Word conversion using Adobe's online services

Adobe Helper

Python 3.11+ · License: MIT · Code style: black · Ruff

Adobe Helper is a Python library for converting PDF files to Word (DOCX) format using Adobe's online conversion services. It provides a clean, async API with automatic session management, rate limiting, and quota tracking.

⚠️ Current Status

This project is ~98% complete. The architecture, all modules, and examples are fully implemented and tested. However, API endpoint discovery is required before the library can perform actual conversions.

See docs/discovery/API_DISCOVERY.md for instructions on discovering Adobe's actual API endpoints using Chrome DevTools.

Features

Easy to Use

  • Simple async API with context manager support
  • Automatic session management and rotation
  • Built-in retry logic with exponential backoff
  • Bypass local usage limits (mimics clearing browser data)
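
To illustrate the "retry logic with exponential backoff" item, here is a minimal, generic sketch. It is not the library's implementation: `backoff_delays` and `retry_async` are hypothetical names, and the defaults (base delay, cap, which exceptions are retried) are assumptions chosen for the example.

```python
import asyncio
import random
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


def backoff_delays(retries: int, base: float = 0.5, cap: float = 30.0) -> list[float]:
    """Return the un-jittered delay before each retry: base * 2**attempt, capped."""
    return [min(cap, base * (2 ** attempt)) for attempt in range(retries)]


async def retry_async(
    operation: Callable[[], Awaitable[T]],
    retries: int = 3,
    base: float = 0.5,
    cap: float = 30.0,
    retry_on: tuple[type[BaseException], ...] = (ConnectionError, TimeoutError),
) -> T:
    """Run `operation`, retrying on the listed exceptions with exponential backoff."""
    delays = backoff_delays(retries, base, cap)
    for attempt in range(retries + 1):
        try:
            return await operation()
        except retry_on:
            if attempt == retries:
                raise  # out of retries: surface the last error
            # Full jitter: sleep a random amount up to the computed delay.
            await asyncio.sleep(random.uniform(0, delays[attempt]))
    raise AssertionError("unreachable")
```

The jitter spreads retries out so that several clients failing at the same moment do not all retry in lockstep.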

📊 Smart Management

  • Optional usage tracking with daily limits
  • Intelligent rate limiting with human-like delays
  • Automatic session rotation for unlimited conversions
  • Fresh session creation (like incognito mode)

🔒 Reliable

  • Streaming upload/download for large files
  • File integrity verification
  • Comprehensive error handling
  • Progress tracking support
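
The README does not say how file integrity is verified. One plausible check for a converted file, shown here only as a sketch, relies on the fact that a DOCX file is a ZIP package containing `[Content_Types].xml` and `word/document.xml`. The function name `looks_like_docx` is hypothetical.

```python
import zipfile
from pathlib import Path

# Parts every WordprocessingML package is expected to contain.
REQUIRED_PARTS = ("[Content_Types].xml", "word/document.xml")


def looks_like_docx(path: Path) -> bool:
    """Return True if `path` is a readable ZIP archive with the core DOCX parts."""
    if not path.is_file() or not zipfile.is_zipfile(path):
        return False
    with zipfile.ZipFile(path) as archive:
        # testzip() reads every member and returns the first one with a bad CRC.
        if archive.testzip() is not None:
            return False
        names = set(archive.namelist())
    return all(part in names for part in REQUIRED_PARTS)
```

A check like this catches truncated downloads and HTML error pages saved with a `.docx` extension, but it does not validate the document's contents.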

🚀 Fast

  • Async/await throughout
  • HTTP/2 support via httpx
  • Concurrent batch processing

Installation

Using uv (Recommended)

# Clone the repository
git clone https://github.com/karlorz/adobe-helper.git
cd adobe-helper

# Install with uv
uv sync --all-extras

Using pip

# Clone the repository
git clone https://github.com/karlorz/adobe-helper.git
cd adobe-helper

# Install in development mode
pip install -e .

Quick Start

Basic Usage

import asyncio
from pathlib import Path
from adobe import AdobePDFConverter

async def main():
    # Convert a PDF to Word (bypasses local limits by default)
    async with AdobePDFConverter(
        bypass_local_limits=True  # Mimics clearing browser data
    ) as converter:
        output_file = await converter.convert_pdf_to_word(
            Path("document.pdf")
        )
        print(f"Converted: {output_file}")

asyncio.run(main())

Batch Conversion

import asyncio
from pathlib import Path
from adobe import AdobePDFConverter

async def batch_convert():
    pdf_files = [
        Path("doc1.pdf"),
        Path("doc2.pdf"),
        Path("doc3.pdf"),
    ]

    # Convert the files one at a time; a failure does not stop the batch
    async with AdobePDFConverter() as converter:
        for pdf_file in pdf_files:
            try:
                output = await converter.convert_pdf_to_word(pdf_file)
                print(f"✓ {pdf_file.name} -> {output.name}")
            except Exception as e:
                print(f"✗ {pdf_file.name}: {e}")

asyncio.run(batch_convert())
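
The loop above converts files sequentially. A concurrent variant can be sketched with `asyncio.Semaphore` and `asyncio.gather`. This is a generic pattern, not code from the library: `convert_many` and `max_concurrency` are hypothetical names, and the conversion coroutine is passed in so the sketch does not depend on the converter's internals.

```python
import asyncio
from pathlib import Path
from typing import Awaitable, Callable, Union

ConvertFn = Callable[[Path], Awaitable[Path]]


async def convert_many(
    convert: ConvertFn,
    pdf_files: list[Path],
    max_concurrency: int = 2,
) -> dict[Path, Union[Path, BaseException]]:
    """Convert files concurrently, keeping at most `max_concurrency` in flight."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def worker(pdf: Path) -> Path:
        async with semaphore:
            return await convert(pdf)

    # return_exceptions=True collects failures as results instead of
    # raising the first one, so every file gets an entry in the mapping.
    results = await asyncio.gather(
        *(worker(pdf) for pdf in pdf_files), return_exceptions=True
    )
    return dict(zip(pdf_files, results))
```

Inside an `async with AdobePDFConverter() as converter:` block, a call might look like `await convert_many(converter.convert_pdf_to_word, pdf_files)`. A small `max_concurrency` keeps the request rate modest.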

Advanced Configuration

import asyncio
from pathlib import Path
from adobe import AdobePDFConverter

async def advanced_convert():
    # Custom configuration
    converter = AdobePDFConverter(
        session_dir=Path(".cache"),      # Custom cache directory
        use_session_rotation=True,       # Enable session rotation
        track_usage=True,                # Track daily quota
        enable_rate_limiting=True,       # Rate limiting
    )

    try:
        # Without the context manager, initialize and close explicitly
        await converter.initialize()

        # Convert with custom output path
        output = await converter.convert_pdf_to_word(
            Path("input.pdf"),
            output_path=Path("output/converted.docx"),
        )

        # Check usage stats
        usage = converter.get_usage_summary()
        print(f"Daily usage: {usage['count']}/{usage['limit']}")

    finally:
        await converter.close()

asyncio.run(advanced_convert())
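
The example above reads `count` and `limit` from the usage summary. To show how a daily counter of that shape could work, here is a small in-memory sketch. `DailyUsage` is a hypothetical class, the default limit is an illustrative value rather than Adobe's actual quota, and the `remaining` key is an addition that the library may not provide.

```python
from dataclasses import dataclass, field
from datetime import date
from typing import Optional


@dataclass
class DailyUsage:
    """Hypothetical in-memory counter that resets when the calendar day changes."""

    limit: int = 2  # illustrative value only
    count: int = 0
    day: date = field(default_factory=date.today)

    def _roll_over(self, today: date) -> None:
        # A new day starts a fresh count.
        if today != self.day:
            self.day, self.count = today, 0

    def record(self, today: Optional[date] = None) -> None:
        """Count one conversion, refusing once the daily limit is reached."""
        self._roll_over(today or date.today())
        if self.count >= self.limit:
            raise RuntimeError("daily limit reached")
        self.count += 1

    def summary(self, today: Optional[date] = None) -> dict[str, int]:
        """Return the current counters, mirroring the 'count'/'limit' keys."""
        self._roll_over(today or date.today())
        return {
            "count": self.count,
            "limit": self.limit,
            "remaining": self.limit - self.count,
        }
```

Passing `today` explicitly makes the roll-over behavior easy to test without waiting for midnight.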

Endpoint Discovery CLI

Use the bundled helper to capture endpoints and keep discovery files synced:

# Show available commands
python -m adobe.cli.api_discovery_helper --help

# Create or refresh the project discovery template
python -m adobe.cli.api_discovery_helper template

# Validate captured URLs and sync project ↔ user cache copies
python -m adobe.cli.api_discovery_helper update

# Installed entry point (after `pip install .`)
adobe-api-discovery checklist

See docs/discovery/API_DISCOVERY.md for the full walkthrough.

The helper stores discovered endpoints in ~/.adobe-helper by default. When the home directory is not writable, it falls back automatically to ./.adobe-helper (or the system temp directory), which is useful in containerized or sandboxed environments.
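
The fallback order described above can be sketched as follows. This is an illustration of the idea, not the helper's actual code: `first_writable_dir` and `default_candidates` are hypothetical names, and the sub-directory name used under the system temp directory is an assumption.

```python
import tempfile
from pathlib import Path
from typing import Iterable, Optional


def first_writable_dir(candidates: Iterable[Path]) -> Optional[Path]:
    """Return the first candidate that can be created and written to."""
    for directory in candidates:
        try:
            directory.mkdir(parents=True, exist_ok=True)
            # Probe by actually creating a file; it is removed on close.
            with tempfile.TemporaryFile(dir=directory):
                pass
        except OSError:
            continue  # not creatable or not writable: try the next one
        return directory
    return None


def default_candidates() -> list[Path]:
    """Home directory first, then the working directory, then the temp dir."""
    return [
        Path.home() / ".adobe-helper",
        Path.cwd() / ".adobe-helper",
        Path(tempfile.gettempdir()) / "adobe-helper",  # assumed name
    ]
```

Writing a probe file tests the permissions that matter in practice, which a read-only mount or a restrictive sandbox can deny even when the directory already exists.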

Architecture

Core Components

adobe/
├── client.py              # Main AdobePDFConverter class
├── auth.py                # Session management
├── session_cycling.py     # Anonymous session rotation
├── cookie_manager.py      # Cookie persistence
├── upload.py              # File upload handler
├── conversion.py          # Conversion workflow manager
├── download.py            # File download handler
├── rate_limiter.py        # Rate limiting with backoff
├── usage_tracker.py       # Free tier quota tracking
├── models.py              # Pydantic data models
├── exceptions.py          # Custom exceptions
├── constants.py           # Configuration constants
├── urls.py                # API endpoints
└── utils.py               # Helper functions

Data Flow

PDF File → Upload → Conversion Job → Poll Status → Download DOCX
           ↓         ↓                 ↓             ↓
        Validate  Create Job      Wait/Poll    Stream Download
        Retry     Track Status    Adaptive      Verify
                                  Polling       Integrity
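
The "Wait/Poll" and "Adaptive Polling" stage in the diagram can be pictured with a short sketch in which the polling interval grows after each check. The status strings (`"done"`, `"failed"`), the function name `poll_until_done`, and all defaults are assumptions for illustration; the real job states depend on the endpoints you discover.

```python
import asyncio
from typing import Awaitable, Callable


async def poll_until_done(
    get_status: Callable[[], Awaitable[str]],
    initial_interval: float = 1.0,
    max_interval: float = 10.0,
    growth: float = 1.5,
    timeout: float = 300.0,
) -> str:
    """Poll `get_status` with a growing interval until a terminal state or timeout."""
    loop = asyncio.get_running_loop()
    deadline = loop.time() + timeout
    interval = initial_interval
    while True:
        status = await get_status()
        if status in ("done", "failed"):  # hypothetical terminal states
            return status
        if loop.time() + interval > deadline:
            raise TimeoutError("conversion did not finish in time")
        await asyncio.sleep(interval)
        # Back off gradually so long-running jobs are polled less often.
        interval = min(max_interval, interval * growth)
```

Short jobs are picked up quickly because the first checks are close together, while long jobs generate fewer status requests overall.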

Examples

See the examples/adobe/ directory for complete examples:

  • basic_usage.py - Simple conversion with bypass enabled
  • batch_convert.py - Sequential and concurrent batch processing
  • advanced_usage.py - Advanced configuration and error handling

Legacy bypass/reset scripts now live under archive/docs/ for reference.

Bypassing Usage Limits

By default, the library now bypasses local usage tracking and relies on Adobe's server-side limits with automatic session rotation:

# Automatic session rotation (recommended for batch processing)
async with AdobePDFConverter(
    bypass_local_limits=True,  # Default: True
    use_session_rotation=True,  # Auto-rotate sessions
) as converter:
    for pdf in pdf_files:
        await converter.convert_pdf_to_word(pdf)

For more details, see BYPASS_LIMITS.md.

Quick reset: Call AdobePDFConverter.reset_session_data() (or use AdobePDFConverter.create_with_fresh_session()) to clear all local state; the legacy helper script now resides in archive/docs/.

API Discovery Required

⚠️ Important: Before this library can perform actual conversions, you need to discover Adobe's API endpoints using Chrome DevTools.

See docs/discovery/API_DISCOVERY.md for detailed instructions.

Discovered endpoint files are cached automatically: any discovered_endpoints.json found in docs/discovery/ or archive/discovery/ is copied into ~/.adobe-helper/ on first run, and a template is generated if missing.
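
The first-run caching described above might look roughly like the sketch below. `seed_endpoint_cache` is a hypothetical function, and leaving an existing cached copy untouched is an assumption about the behavior; the file name and directories come from the paragraph above.

```python
import shutil
from pathlib import Path
from typing import Iterable, Optional

ENDPOINTS_FILE = "discovered_endpoints.json"


def seed_endpoint_cache(search_dirs: Iterable[Path], cache_dir: Path) -> Optional[Path]:
    """Copy the first discovered-endpoints file found into `cache_dir`, once."""
    target = cache_dir / ENDPOINTS_FILE
    if target.exists():
        return target  # already cached: leave the existing copy untouched
    for directory in search_dirs:
        source = directory / ENDPOINTS_FILE
        if source.is_file():
            cache_dir.mkdir(parents=True, exist_ok=True)
            shutil.copy2(source, target)  # copy2 also preserves timestamps
            return target
    return None  # nothing found; a caller would generate a template here
```

A call such as `seed_endpoint_cache([Path("docs/discovery"), Path("archive/discovery")], Path.home() / ".adobe-helper")` would mirror the search order given in the text.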

Development

Setup Development Environment

# Install UV (if not already installed)
curl -LsSf https://astral.sh/uv/install.sh | sh

# Clone and setup
git clone https://github.com/karlorz/adobe-helper.git
cd adobe-helper
uv sync --all-extras --dev

Run Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=adobe --cov-report=html

# Run specific test file
uv run pytest tests/test_models.py -v

Code Quality

# Format code
uv run black adobe/ tests/

# Lint code
uv run ruff check adobe/ tests/

# Type checking
uv run mypy adobe/

Project Status

✅ Completed (Phases 1-10)

  • Project setup and architecture
  • Data models with Pydantic validation
  • Custom exception hierarchy
  • Session management and rotation
  • Cookie management
  • Rate limiting with adaptive backoff
  • Usage tracking
  • File upload handler
  • Conversion workflow manager
  • File download handler
  • Main client class
  • Example scripts
  • Unit tests (30 tests, 100% pass rate)
  • Documentation

🔄 Remaining

  • API endpoint discovery (critical - see docs/discovery/API_DISCOVERY.md)
  • Integration tests with real API
  • CLI tool (optional)
  • Browser automation fallback (optional)

Contributing

Contributions are welcome! Please:

  1. Fork the repository
  2. Create a feature branch
  3. Make your changes
  4. Add tests
  5. Run code quality checks
  6. Submit a pull request

License

This project is licensed under the MIT License - see the LICENSE file for details.

Disclaimer

This library is for legitimate use only. Please respect Adobe's Terms of Service and rate limits. The library includes built-in rate limiting and quota tracking to prevent abuse.

Acknowledgments

  • Inspired by Adobe's online PDF conversion services
  • Built with httpx, pydantic, and modern Python async patterns
  • Developed using uv for fast dependency management
