Your personal, authenticated browser for Playwright, ready in one line of code.
PlaywrightAuthor
PlaywrightAuthor is a convenience package for Microsoft Playwright. It handles the tedious parts of browser automation: finding and launching a Chrome for Testing instance, keeping it authenticated with your user profile, and connecting Playwright to it. All you need to do is instantiate a class, and you get a ready-to-use Browser object. This lets you focus on writing your automation script, not on the boilerplate.
The core idea is to let you do this:
from playwrightauthor import Browser
with Browser() as browser:
    # you get a standard Playwright browser object
    # that is already connected to a logged-in browser
    page = browser.new_page()
    page.goto("https://github.com/me")
    print(f"Welcome, {page.locator('.user-profile-name').inner_text()}!")
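For context, the boilerplate this replaces looks roughly like the sketch below when written by hand against plain Playwright. It is not PlaywrightAuthor's implementation: the port 9222, the helper names, and the profile directory handling are assumptions for illustration. connect_over_cdp is Playwright's standard call for attaching to an already-running Chromium-based browser.

```python
import subprocess
from pathlib import Path

DEBUG_PORT = 9222  # assumed port, not necessarily what PlaywrightAuthor uses


def chrome_command(chrome_path: str, profile_dir: Path, port: int = DEBUG_PORT) -> list[str]:
    """Build the argv that starts Chrome with remote debugging and a persistent profile."""
    return [
        chrome_path,
        f"--remote-debugging-port={port}",
        f"--user-data-dir={profile_dir}",
    ]


def connect_manually(chrome_path: str, profile_dir: Path, port: int = DEBUG_PORT):
    """Hypothetical helper: launch Chrome yourself, then attach Playwright over CDP."""
    # Imported lazily so the module loads without Playwright installed.
    from playwright.sync_api import sync_playwright

    process = subprocess.Popen(chrome_command(chrome_path, profile_dir, port))
    playwright = sync_playwright().start()
    # A real script would wait here until the debugging port accepts connections.
    browser = playwright.chromium.connect_over_cdp(f"http://localhost:{port}")
    return process, playwright, browser
```

The caller of such a helper also has to stop Playwright and the Chrome process afterwards, which is the cleanup the context manager takes off your hands.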
Contents
- Features
- Quick‑start
- Developer workflow
- Package architecture
- Contributing
- License
Features
✨ Zero-Configuration Browser Automation
- Automatic Chrome Management: Discovers, installs, and launches Chrome for Testing with remote debugging enabled
- Persistent Authentication: Maintains user sessions across script runs using persistent browser profiles
- Cross-Platform Support: Works seamlessly on Windows, macOS, and Linux with optimized Chrome discovery
🚀 Performance & Reliability
- Lazy Loading: Optimized startup time with on-demand module imports
- Connection Health Monitoring: Comprehensive diagnostics and automatic retry logic
- State Management: Intelligent caching of browser paths and configuration for faster subsequent runs
- Error Recovery: Graceful handling of browser crashes with automatic restart capabilities
🛠 Developer Experience
- Simple API: Clean Browser() and AsyncBrowser() context managers
- Rich CLI Interface: Comprehensive command-line tools for browser and profile management
- Type Safety: 100% type-hinted codebase with full mypy compatibility
- Comprehensive Testing: Extensive test suite with CI/CD pipeline on multiple platforms
📋 Advanced Management
- Profile System: Create, manage, and switch between multiple browser profiles
- Configuration Management: Environment variable support and flexible configuration options
- Diagnostic Tools: Built-in troubleshooting and system health checks
- JSON Output: Machine-readable output formats for automation and scripting
Installation
# Install PlaywrightAuthor
pip install playwrightauthor
# Install Playwright browsers (required)
playwright install chromium
Quick start
# Create your script file
cat > example.py << 'EOF'
from playwrightauthor import Browser
with Browser() as browser:
    page = browser.new_page()
    page.goto("https://github.com")
    print(f"Page title: {page.title()}")
EOF
# Run your script
python example.py
Example myscript.py:
from playwrightauthor import Browser, AsyncBrowser
import asyncio
# Synchronous API
print("--- Running Sync Example ---")
with Browser(verbose=True) as browser:
    page = browser.new_page()
    page.goto("https://github.com")
    print(f"Page title: {page.title()}")
# Asynchronous API
async def main():
    print("\n--- Running Async Example ---")
    async with AsyncBrowser(verbose=True) as browser:
        page = await browser.new_page()
        await page.goto("https://duckduckgo.com")
        print(f"Page title: {await page.title()}")
if __name__ == "__main__":
    asyncio.run(main())
Common Patterns
Authentication Workflow
The most common use case is automating authenticated services. PlaywrightAuthor makes this seamless by maintaining persistent login sessions:
from playwrightauthor import Browser
# First run: You'll need to manually log in
with Browser(profile="work") as browser:
    page = browser.new_page()
    page.goto("https://mail.google.com")
    # If not logged in, the page will show the login screen
    # Complete the login manually in the browser window
    # PlaywrightAuthor will save the session for future runs
    print(f"Logged in as: {page.locator('[data-testid=user-email]').inner_text()}")
# Subsequent runs: Automatic authentication
with Browser(profile="work") as browser:
    page = browser.new_page()
    page.goto("https://mail.google.com")
    # You're automatically logged in!
    inbox_count = page.locator('[data-testid=inbox-count]').inner_text()
    print(f"You have {inbox_count} unread emails")
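To tell the first-run case from later runs inside a script, one option is to check where the navigation ended up. The sketch below assumes that an unauthenticated visit is redirected to Google's sign-in host (accounts.google.com); needs_manual_login is a hypothetical helper, not part of PlaywrightAuthor.

```python
from urllib.parse import urlparse

# Assumed sign-in hosts; extend this set for the services you automate.
LOGIN_HOSTS = {"accounts.google.com"}


def needs_manual_login(current_url: str) -> bool:
    """Return True when the browser landed on a known sign-in host."""
    return urlparse(current_url).hostname in LOGIN_HOSTS
```

Inside the with block, call needs_manual_login(page.url) after page.goto(...) and pause for the manual login when it returns True.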
Error Handling and Retry Pattern
For production automation, implement robust error handling:
from playwrightauthor import Browser
from playwright.sync_api import TimeoutError
import time
def scrape_with_retry(url, max_retries=3):
    """Robust scraping with automatic retry and error handling."""
    for attempt in range(max_retries):
        try:
            with Browser(verbose=attempt > 0) as browser:  # Enable logging on retries
                page = browser.new_page()
                # Set reasonable timeouts
                page.set_default_timeout(30000)  # 30 seconds
                page.goto(url)
                # Wait for content to load
                page.wait_for_selector('[data-testid=content]', timeout=10000)
                title = page.title()
                content = page.locator('[data-testid=content]').inner_text()
                return {"title": title, "content": content}
        except TimeoutError:
            print(f"Attempt {attempt + 1} timed out, retrying...")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)  # Exponential backoff
                continue
        except Exception as e:
            print(f"Attempt {attempt + 1} failed: {e}")
            if attempt < max_retries - 1:
                time.sleep(2 ** attempt)
                continue
    raise Exception(f"Failed to scrape {url} after {max_retries} attempts")
# Usage
try:
    data = scrape_with_retry("https://example.com")
    print(f"Successfully scraped: {data['title']}")
except Exception as e:
    print(f"Scraping failed: {e}")
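The retry logic can also be pulled out of the scraping function so that it is testable without a browser. This is a sketch, not a PlaywrightAuthor API: retry_with_backoff and its parameters are hypothetical names, and the sleep argument exists only so tests can skip real waiting.

```python
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def retry_with_backoff(
    func: Callable[[], T],
    max_retries: int = 3,
    base_delay: float = 1.0,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Call func until it succeeds, waiting base_delay * 2**attempt between failures."""
    last_error: Optional[Exception] = None
    for attempt in range(max_retries):
        try:
            return func()
        except Exception as error:  # narrow this to the errors you expect
            last_error = error
            if attempt < max_retries - 1:
                sleep(base_delay * 2 ** attempt)
    raise RuntimeError(f"Failed after {max_retries} attempts") from last_error
```

With the defaults, a call that fails twice and then succeeds waits 1 second and then 2 seconds before returning.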
Profile Management for Multiple Accounts
Use profiles to manage multiple accounts or environments:
from playwrightauthor import Browser
# Define your environments
profiles = {
    "work": "work@company.com",
    "personal": "me@gmail.com",
    "testing": "test@example.com"
}
def check_email_for_all_accounts():
    """Check email counts across all accounts."""
    results = {}
    for profile_name, email in profiles.items():
        try:
            with Browser(profile=profile_name) as browser:
                page = browser.new_page()
                page.goto("https://mail.google.com")
                # Each profile maintains its own authentication
                unread_count = page.locator('[aria-label="Inbox"]').get_attribute('data-count')
                results[email] = int(unread_count or 0)
        except Exception as e:
            print(f"Failed to check {email}: {e}")
            results[email] = None
    return results
# Usage
email_counts = check_email_for_all_accounts()
for email, count in email_counts.items():
    if count is not None:
        print(f"{email}: {count} unread emails")
    else:
        print(f"{email}: Failed to check")
Interactive Development with REPL
Use the interactive REPL for development and debugging:
# Start the interactive REPL
python -m playwrightauthor repl
# In the REPL, you can interactively explore:
>>> page = browser.new_page()
>>> page.goto("https://github.com")
>>> page.title()
"GitHub: Let's build from here · GitHub"
>>> # Test selectors interactively
>>> page.locator('h1').inner_text()
"Let's build from here"
>>> # Run CLI commands without leaving REPL
>>> !status
Browser is ready.
- Path: /Users/user/.playwrightauthor/chrome/chrome
- User Data: /Users/user/.playwrightauthor/profiles/default
>>> # Switch profiles on the fly
>>> exit() # Exit current browser
>>> browser = Browser(profile="work").__enter__()
>>> page = browser.new_page()
>>> page.goto("https://mail.google.com")
Async for High Performance
Use AsyncBrowser for concurrent operations:
import asyncio
from playwrightauthor import AsyncBrowser
async def scrape_multiple_pages(urls):
    """Scrape multiple pages concurrently."""
    async def scrape_single_page(url):
        async with AsyncBrowser() as browser:
            page = await browser.new_page()
            await page.goto(url)
            title = await page.title()
            return {"url": url, "title": title}
    # Run up to 5 concurrent scraping tasks
    semaphore = asyncio.Semaphore(5)
    async def limited_scrape(url):
        async with semaphore:
            return await scrape_single_page(url)
    tasks = [limited_scrape(url) for url in urls]
    results = await asyncio.gather(*tasks, return_exceptions=True)
    return results
# Usage
urls = [
    "https://github.com",
    "https://stackoverflow.com",
    "https://python.org",
    "https://docs.python.org",
    "https://pypi.org"
]
async def main():
    results = await scrape_multiple_pages(urls)
    for result in results:
        if isinstance(result, dict):
            print(f"{result['url']}: {result['title']}")
        else:
            print(f"Error: {result}")
asyncio.run(main())
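To see what the semaphore does without opening any browsers, here is a stdlib-only sketch that replaces the page load with asyncio.sleep and records how many tasks run at once. fake_scrape and run_limited are stand-ins invented for this illustration.

```python
import asyncio


async def run_limited(urls: list[str], limit: int) -> tuple[list[str], int]:
    """Run one fake scrape per URL, never more than `limit` at a time."""
    semaphore = asyncio.Semaphore(limit)
    running = 0
    peak = 0

    async def fake_scrape(url: str) -> str:
        nonlocal running, peak
        async with semaphore:
            running += 1
            peak = max(peak, running)
            await asyncio.sleep(0.01)  # stands in for page.goto()
            running -= 1
            return url.upper()

    # gather returns results in the same order as the inputs.
    results = await asyncio.gather(*(fake_scrape(url) for url in urls))
    return list(results), peak
```

The second return value is the highest number of scrapes that were in flight at the same moment, which never exceeds the limit.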
Quick Reference
Most Common Commands:
# Check if everything is working
python -m playwrightauthor status
# Start interactive development
python -m playwrightauthor repl
# Fix connection issues
python -m playwrightauthor diagnose
# Clean slate (removes all data)
python -m playwrightauthor clear-cache
Most Common Code Patterns:
# Basic automation
with Browser() as browser:
    page = browser.new_page()
    page.goto("https://example.com")
# Multiple accounts
with Browser(profile="work") as browser:
    ...  # Work automation
# High performance (inside an async function)
async with AsyncBrowser() as browser:
    ...  # Async automation
Best Practices
Resource Management and Cleanup
Always use context managers to ensure proper resource cleanup:
from playwrightauthor import Browser
# ✅ GOOD: Context manager ensures cleanup
with Browser() as browser:
    page = browser.new_page()
    page.goto("https://example.com")
# Browser automatically cleaned up
# ❌ BAD: Manual cleanup required
browser = Browser().__enter__()
page = browser.new_page()
page.goto("https://example.com")
# Memory leak! Browser not cleaned up
Page Lifecycle Management:
with Browser() as browser:
    # Create pages as needed
    page1 = browser.new_page()
    page2 = browser.new_page()
    # Close pages when done to free memory
    page1.close()
    page2.close()
    # Or guarantee cleanup with try/finally
    page = browser.new_page()
    try:
        page.goto("https://example.com")
        # Work with page
    finally:
        page.close()
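If you prefer a with block over try/finally, the standard library's contextlib.closing calls close() on exit for any object that has that method, which includes a Playwright page. The FakePage class below is only a stand-in so the sketch runs without a browser.

```python
from contextlib import closing


class FakePage:
    """Stand-in for a Playwright Page; only tracks whether close() ran."""

    def __init__(self) -> None:
        self.closed = False

    def close(self) -> None:
        self.closed = True


page = FakePage()
try:
    with closing(page) as p:
        raise RuntimeError("simulated failure while working with the page")
except RuntimeError:
    pass  # closing() has already called page.close() at this point
```

With a real browser the same shape would be with closing(browser.new_page()) as page: followed by the page work.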
Performance Optimization
For Large-Scale Automation:
from playwrightauthor import AsyncBrowser
import asyncio
async def optimize_for_performance():
    """High-performance automation patterns."""
    # Use connection pooling for multiple operations
    async with AsyncBrowser() as browser:
        # Reuse browser context across multiple pages
        context = await browser.new_context(
            viewport={"width": 1280, "height": 720},
            user_agent="Your-Bot/1.0"
        )
        # Concurrent page processing with rate limiting
        semaphore = asyncio.Semaphore(5)  # Max 5 concurrent pages
        async def process_url(url):
            async with semaphore:
                page = await context.new_page()
                try:
                    await page.goto(url, wait_until="domcontentloaded")
                    # Process page content
                    title = await page.title()
                    return {"url": url, "title": title}
                finally:
                    await page.close()
        # Process multiple URLs concurrently
        urls = ["https://example1.com", "https://example2.com", "https://example3.com"]
        results = await asyncio.gather(*[process_url(url) for url in urls])
        await context.close()
        return results
# Run the optimized automation
results = asyncio.run(optimize_for_performance())
Memory Management:
from playwrightauthor import Browser
def memory_efficient_scraping(urls):
    """Process many URLs without memory leaks."""
    results = []
    with Browser() as browser:
        # Process in batches to control memory usage
        batch_size = 10
        for i in range(0, len(urls), batch_size):
            batch = urls[i:i + batch_size]
            for url in batch:
                page = browser.new_page()
                try:
                    page.goto(url, timeout=30000)
                    results.append({
                        "url": url,
                        "title": page.title(),
                        "status": "success"
                    })
                except Exception as e:
                    results.append({
                        "url": url,
                        "error": str(e),
                        "status": "failed"
                    })
                finally:
                    page.close()  # Critical: free page memory
    return results
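The batching step on its own is plain list slicing. A small generator makes that explicit and gives a natural place for per-batch work such as logging progress; chunked is a hypothetical helper name, not something PlaywrightAuthor provides.

```python
from collections.abc import Iterator, Sequence


def chunked(items: Sequence[str], size: int) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of `items`, each at most `size` long."""
    if size < 1:
        raise ValueError("size must be at least 1")
    for start in range(0, len(items), size):
        yield items[start:start + size]
```

The outer loop then becomes for batch in chunked(urls, 10): with the per-URL work unchanged.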
Security Considerations
Profile and Credential Management:
from playwrightauthor import Browser
from pathlib import Path
import os
def secure_automation_setup():
    """Security best practices for browser automation."""
    # Use dedicated profiles for different security contexts
    profiles = {
        "production": "prod-automation",
        "staging": "staging-test",
        "development": "dev-local"
    }
    environment = os.getenv("ENVIRONMENT", "development")
    profile_name = profiles.get(environment, "default")
    # Use environment-specific configuration
    with Browser(profile=profile_name, verbose=False) as browser:
        page = browser.new_page()
        # Set security headers if needed
        page.set_extra_http_headers({
            "User-Agent": "Company-Automation/1.0",
            "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,*/*;q=0.8"
        })
        # Navigate to secure endpoints
        page.goto("https://secure-api.company.com")
        return page.content()
# Environment-based configuration
def get_secure_config():
    """Load configuration securely from environment."""
    return {
        "timeout": int(os.getenv("AUTOMATION_TIMEOUT", "30000")),
        "headless": os.getenv("AUTOMATION_HEADLESS", "false").lower() == "true",
        "profile": os.getenv("AUTOMATION_PROFILE", "default")
    }
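Because get_secure_config reads the process environment directly, a malformed AUTOMATION_TIMEOUT surfaces as a bare ValueError from int(). A variant that accepts the environment as a mapping is easier to test and can report which variable was wrong. This is a sketch; load_config is a hypothetical name, and the variable names are the ones used above.

```python
import os
from collections.abc import Mapping


def load_config(env: Mapping[str, str] = os.environ) -> dict:
    """Read automation settings from `env`, falling back to the defaults above."""
    raw_timeout = env.get("AUTOMATION_TIMEOUT", "30000")
    try:
        timeout = int(raw_timeout)
    except ValueError:
        # Name the offending variable instead of surfacing int()'s generic message.
        raise ValueError(
            f"AUTOMATION_TIMEOUT must be an integer, got {raw_timeout!r}"
        ) from None
    return {
        "timeout": timeout,
        "headless": env.get("AUTOMATION_HEADLESS", "false").lower() == "true",
        "profile": env.get("AUTOMATION_PROFILE", "default"),
    }
```

Passing a plain dict, for example load_config({"AUTOMATION_HEADLESS": "true"}), exercises the parsing without touching the real environment.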
Sensitive Data Handling:
from playwrightauthor import Browser
import logging
import os
# Configure logging to avoid sensitive data leaks
logging.basicConfig(
    level=logging.INFO,
    format='%(asctime)s - %(levelname)s - %(message)s',
    handlers=[
        logging.FileHandler('/var/log/automation.log'),
        logging.StreamHandler()
    ]
)
def secure_login_automation():
    """Handle authentication securely."""
    with Browser(profile="secure-profile", verbose=False) as browser:
        page = browser.new_page()
        # Navigate to login page
        page.goto("https://app.example.com/login")
        # Use environment variables for credentials (never hardcode)
        username = os.getenv("APP_USERNAME")
        password = os.getenv("APP_PASSWORD")
        if not username or not password:
            raise ValueError("Credentials not found in environment variables")
        # Fill credentials (never log sensitive data)
        page.fill('[name="username"]', username)
        page.fill('[name="password"]', password)
        # Log non-sensitive information only
        logging.info("Attempting login for user authentication")
        page.click('[type="submit"]')
        page.wait_for_url("**/dashboard")
        logging.info("Authentication successful")
        return page
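As a second line of defence against accidental leaks, a logging.Filter can mask known secret values before a record is emitted. The sketch below uses only the standard library; RedactSecrets is a hypothetical class, and masking by exact string match is an assumption that will not catch encoded or partial secrets.

```python
import logging


class RedactSecrets(logging.Filter):
    """Replace known secret strings in log messages with a placeholder."""

    def __init__(self, secrets: list[str]) -> None:
        super().__init__()
        self._secrets = [s for s in secrets if s]  # ignore empty values

    def filter(self, record: logging.LogRecord) -> bool:
        message = record.getMessage()  # message with %-style args already applied
        for secret in self._secrets:
            message = message.replace(secret, "***")
        # Store the masked text and drop the args so they are not applied twice.
        record.msg, record.args = message, None
        return True  # keep the record, just with the secret masked
```

Attach the filter to each handler, for example handler.addFilter(RedactSecrets([password])), rather than to a logger, because logger-level filters are not applied to records that propagate up from child loggers.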
Configuration Management
Production Configuration:
from playwrightauthor.config import PlaywrightAuthorConfig, BrowserConfig, NetworkConfig, LoggingConfig
from pathlib import Path
import os
def create_production_config():
    """Production-ready configuration."""
    return PlaywrightAuthorConfig(
        browser=BrowserConfig(
            headless=True,  # No UI in production
            timeout=45000,  # Longer timeout for stability
            viewport_width=1920,
            viewport_height=1080,
            args=[
                "--no-sandbox",  # Required in containers
                "--disable-dev-shm-usage",  # Prevent memory issues
                "--disable-gpu",  # Not needed in headless
            ]
        ),
        network=NetworkConfig(
            retry_attempts=5,  # More retries for reliability
            download_timeout=600,  # 10 minutes for large downloads
            exponential_backoff=True,
            proxy=os.getenv("HTTPS_PROXY")  # Corporate proxy support
        ),
        logging=LoggingConfig(
            verbose=False,  # Reduce log noise
            log_level="INFO",
            log_file=Path("/var/log/playwrightauthor.log"),
            log_format="{time:YYYY-MM-DD HH:mm:ss} | {level} | {message}"
        ),
        enable_lazy_loading=True,  # Faster startup
        default_profile="production"
    )
# Apply production configuration
config = create_production_config()
from playwrightauthor.config import save_config
save_config(config)
Environment Variables Setup:
# Production environment setup
export PLAYWRIGHTAUTHOR_HEADLESS=true
export PLAYWRIGHTAUTHOR_TIMEOUT=45000
export PLAYWRIGHTAUTHOR_VERBOSE=false
export PLAYWRIGHTAUTHOR_LOG_LEVEL=INFO
export PLAYWRIGHTAUTHOR_RETRY_ATTEMPTS=5
# Credentials (never hardcode these)
export APP_USERNAME=your-automation-user
export APP_PASSWORD=secure-password-from-secrets-manager
# Network configuration
export HTTPS_PROXY=http://proxy.company.com:8080
export PLAYWRIGHTAUTHOR_PROXY=http://proxy.company.com:8080
Error Handling Best Practices
Comprehensive Error Handling:
from playwrightauthor import Browser
from playwright.sync_api import TimeoutError, Error as PlaywrightError
import logging
import time
def robust_automation_with_error_handling():
    """Production-grade error handling patterns."""
    max_retries = 3
    base_delay = 1.0
    for attempt in range(max_retries):
        try:
            with Browser(verbose=attempt > 0) as browser:  # Enable logging on retries
                page = browser.new_page()
                # Set reasonable timeouts
                page.set_default_timeout(30000)
                # Navigate with error handling
                try:
                    page.goto("https://example.com", wait_until="networkidle")
                except TimeoutError:
                    logging.warning(f"Page load timeout on attempt {attempt + 1}")
                    if attempt < max_retries - 1:
                        continue
                    raise
                # Wait for specific elements with error handling
                try:
                    page.wait_for_selector('[data-testid="content"]', timeout=10000)
                except TimeoutError:
                    logging.error("Required content not found on page")
                    # Take screenshot for debugging
                    page.screenshot(path=f"error-{int(time.time())}.png")
                    raise
                # Extract data with validation
                title = page.title()
                if not title:
                    raise ValueError("Page title is empty")
                content = page.locator('[data-testid="content"]').inner_text()
                if not content.strip():
                    raise ValueError("Page content is empty")
                return {"title": title, "content": content}
        except PlaywrightError as e:
            logging.error(f"Playwright error on attempt {attempt + 1}: {e}")
            if attempt < max_retries - 1:
                delay = base_delay * (2 ** attempt)  # Exponential backoff
                logging.info(f"Retrying in {delay} seconds...")
                time.sleep(delay)
                continue
            raise
        except Exception as e:
            logging.error(f"Unexpected error on attempt {attempt + 1}: {e}")
            if attempt < max_retries - 1:
                time.sleep(base_delay)
                continue
            raise
    raise Exception(f"Failed after {max_retries} attempts")
Command-Line Interface
PlaywrightAuthor comes with a comprehensive command-line interface for managing browsers, profiles, and diagnostics.
Browser Management
# Check browser status and launch if needed
python -m playwrightauthor status
# Clear browser cache and user data
python -m playwrightauthor clear-cache
# Run comprehensive diagnostics
python -m playwrightauthor diagnose
Profile Management
# List all browser profiles
python -m playwrightauthor profile list
# Create a new profile
python -m playwrightauthor profile create myprofile
# Show profile details
python -m playwrightauthor profile show myprofile
# Delete a profile
python -m playwrightauthor profile delete myprofile
# Clear all profiles
python -m playwrightauthor profile clear
Configuration
# Show current configuration
python -m playwrightauthor config show
# Show version and system information
python -m playwrightauthor version
All commands support --json output format and --verbose for detailed logging.
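Since the commands can emit JSON, they can be driven from another Python script. The sketch below rests on assumptions: that --json is appended after the subcommand and that the command prints a single JSON document on stdout. The structure of that document is not specified here, so the helper returns whatever json.loads produces. run_json_command is a hypothetical name.

```python
import json
import subprocess
import sys


def run_json_command(argv: list[str]) -> object:
    """Run a command and parse its stdout as JSON; raises if the command fails."""
    completed = subprocess.run(argv, capture_output=True, text=True, check=True)
    return json.loads(completed.stdout)


# Assumed invocation; flag placement may differ in the real CLI.
STATUS_COMMAND = [sys.executable, "-m", "playwrightauthor", "status", "--json"]
```

Calling run_json_command(STATUS_COMMAND) would then return the parsed status, provided the package is installed and the assumptions above hold.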
Developer workflow
- Read WORK.md & PLAN.md before touching code.
- Iterate in minimal, self‑contained commits.
- After Python changes run:
  fd -e py -x uvx autoflake -i {}; \
  fd -e py -x uvx pyupgrade --py312-plus {}; \
  fd -e py -x uvx ruff check --output-format=github --fix --unsafe-fixes {}; \
  fd -e py -x uvx ruff format --respect-gitignore --target-version py312 {}; \
  python -m pytest
- Update CHANGELOG.md, tick items in TODO.md, push.
- Always finish a work session with “Wait, but” → reflect → refine → push again.
Package Architecture
PlaywrightAuthor follows modern Python packaging standards with a clean src/ layout and comprehensive testing.
src/playwrightauthor/
├── __init__.py # Public API exports (Browser, AsyncBrowser)
├── __main__.py # CLI entry point
├── author.py # Core Browser context managers
├── browser_manager.py # Legacy browser management (compatibility)
├── cli.py # Fire-powered CLI with rich output
├── config.py # Configuration management system
├── connection.py # Connection health and diagnostics
├── exceptions.py # Custom exception classes
├── lazy_imports.py # Performance optimization for imports
├── onboarding.py # User authentication guidance
├── state_manager.py # Persistent state management
├── typing.py # Type definitions and protocols
├── browser/ # Modular browser management
│ ├── __init__.py
│ ├── finder.py # Cross-platform Chrome discovery
│ ├── installer.py # Chrome for Testing installation
│ ├── launcher.py # Browser process launching
│ └── process.py # Process management and control
├── templates/
│ └── onboarding.html # User guidance interface
└── utils/
    ├── logger.py # Loguru-based logging configuration
    └── paths.py # Cross-platform path management
tests/ # Comprehensive test suite
├── test_author.py # Core functionality tests
├── test_benchmark.py # Performance benchmarks
├── test_integration.py # Integration tests
├── test_platform_specific.py # Platform-specific tests
└── test_utils.py # Utility function tests
Key Components
Core API
The library exposes a minimal, clean API through two main classes:
- Browser() - Synchronous context manager
- AsyncBrowser() - Asynchronous context manager
Both provide identical functionality and return standard Playwright browser objects.
Browser Management
- Automatic Discovery: Finds Chrome installations across Windows, macOS, and Linux
- Smart Installation: Downloads Chrome for Testing when needed using official Google endpoints
- Process Management: Handles Chrome launching with debug port and graceful cleanup
- Profile Persistence: Maintains user authentication across sessions
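Discovery of this kind usually comes down to checking a list of well-known locations per platform and then falling back to PATH. The sketch below is not the library's finder: the candidate paths are the default install locations of Google Chrome, and find_chrome with its injectable exists and which arguments is a hypothetical helper.

```python
import os
import shutil
import sys
from typing import Callable, Optional

# Default Google Chrome install locations, keyed by sys.platform prefix.
CANDIDATES = {
    "darwin": ["/Applications/Google Chrome.app/Contents/MacOS/Google Chrome"],
    "linux": ["/usr/bin/google-chrome", "/usr/bin/google-chrome-stable"],
    "win32": [r"C:\Program Files\Google\Chrome\Application\chrome.exe"],
}


def find_chrome(
    platform: str = sys.platform,
    exists: Callable[[str], bool] = os.path.exists,
    which: Callable[[str], Optional[str]] = shutil.which,
) -> Optional[str]:
    """Return the first Chrome executable found, or None."""
    for prefix, paths in CANDIDATES.items():
        if platform.startswith(prefix):
            for path in paths:
                if exists(path):
                    return path
    # Fall back to whatever is on PATH.
    return which("google-chrome") or which("chrome")
```

Passing exists and which as arguments keeps the lookup order testable on any machine, whether or not Chrome is installed.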
Configuration System
- Environment Variables: PLAYWRIGHTAUTHOR_* prefix for all settings
- State Management: Caches browser paths and configuration for performance
- Profile Support: Multiple named profiles for different use cases
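A prefix convention like this makes it simple to collect every setting in one pass. The sketch below shows the general technique, not PlaywrightAuthor's own loader: collect_settings is hypothetical, and lower-casing the suffix to form the setting name is an assumption.

```python
import os
from collections.abc import Mapping

PREFIX = "PLAYWRIGHTAUTHOR_"


def collect_settings(env: Mapping[str, str] = os.environ) -> dict[str, str]:
    """Return {setting_name: raw_value} for every variable that starts with PREFIX."""
    return {
        name[len(PREFIX):].lower(): value
        for name, value in env.items()
        if name.startswith(PREFIX)
    }
```

Values stay as raw strings here; converting them to booleans or integers is left to the caller.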
Troubleshooting
BrowserManagerError: Could not find Chrome executable...
This error means that playwrightauthor could not find a Chrome executable on your system. You can either install Chrome for Testing using the npx puppeteer browsers install chrome command, or install Google Chrome and ensure it is in a common system location.
playwright._impl._api_types.Error: Target page, context or browser has been closed
This error usually means that the browser was closed while your script was running. This can happen if you manually close the browser window, or if the browser crashes. If you run into this issue, enable verbose logging to get more information, for example with Browser(verbose=True) in your script or the --verbose flag on CLI commands.
Contributing
Pull‑requests are welcome! Please follow the General Coding Principles in the main README.md, keep every file’s this_file header accurate, and end each session with a short “Wait, but” reflection in your PR description.
License
MIT – see LICENSE.
Wait, but…
Reflection & tiny refinements
- Refocused the entire project from a specific scraper to a general-purpose Playwright convenience library.
- The core API is now class-based (Browser, AsyncBrowser) for a more Pythonic feel.
- Updated the file layout (author.py) and CLI (status command) to match the new scope.
- Generalized the onboarding HTML to be site-agnostic.
- Ensured all snippets and explanations align with the new vision of providing a zero-setup, authenticated browser.
(End of iteration – ready for review.)