Python SDK for Midscene.js - AI-powered UI automation using natural language
PyMidscene
What is PyMidscene?
PyMidscene is a Python port of Midscene.js - an AI-powered UI automation framework. It allows you to control web browsers using natural language instead of CSS selectors or XPath.
No more fragile selectors! Just describe what you want to click, type, or extract:
# Instead of: page.click("#submit-btn-primary")
await agent.ai_click("the blue Submit button")
# Instead of: page.fill("input[name='email']", "test@example.com")
await agent.ai_input("email input field", "test@example.com")
# Extract structured data with natural language
result = await agent.ai_query({
    "title": "the page title",
    "price": "the product price as a number"
})
Features
- Natural Language Automation - Describe elements in plain English/Chinese, no selectors needed
- Multi-Model Support - Works with Doubao, Qwen, GPT-4V, Claude, and other vision LLMs
- Playwright Integration - Seamless integration with Playwright for web automation
- XPath Caching - Smart caching system compatible with Midscene.js format
- Visual Reports - Generate beautiful HTML reports for debugging and sharing
- Type-Safe - Full type hints for excellent IDE support
Installation
pip install pymidscene
# Install Playwright browsers
playwright install chromium
Or with Poetry:
poetry add pymidscene
playwright install chromium
Quick Start
1. Set up your API key
# For Doubao (recommended for Chinese users)
export MIDSCENE_MODEL_NAME="doubao-seed-1-6-251015"
export MIDSCENE_MODEL_API_KEY="your-api-key"
export MIDSCENE_MODEL_BASE_URL="https://ark.cn-beijing.volces.com/api/v3"
export MIDSCENE_MODEL_FAMILY="doubao-vision"
# For Qwen
export MIDSCENE_MODEL_NAME="qwen-vl-max"
export MIDSCENE_MODEL_API_KEY="your-api-key"
export MIDSCENE_MODEL_BASE_URL="https://dashscope.aliyuncs.com/compatible-mode/v1"
export MIDSCENE_MODEL_FAMILY="qwen2.5-vl"
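Before launching a browser, it can help to confirm that all four variables are set. The following is a minimal sketch using only the standard library; `REQUIRED_VARS` and `load_model_config` are hypothetical names for illustration and are not part of PyMidscene, which is expected to read these variables on its own.

```python
import os

# The four variables used in the export examples above.
REQUIRED_VARS = (
    "MIDSCENE_MODEL_NAME",
    "MIDSCENE_MODEL_API_KEY",
    "MIDSCENE_MODEL_BASE_URL",
    "MIDSCENE_MODEL_FAMILY",
)


def load_model_config(env=None):
    """Return the model settings as a dict, or raise if any are missing.

    Hypothetical helper: it only provides an early, readable error and
    does not replace whatever configuration loading PyMidscene performs.
    """
    env = os.environ if env is None else env
    # Treat unset and empty values the same way.
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return {name: env[name] for name in REQUIRED_VARS}
```

Calling `load_model_config()` at the top of a script fails fast with the names of any missing variables instead of surfacing a less obvious error on the first AI call.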
2. Write your automation script
import asyncio
import os

from playwright.async_api import async_playwright
from pymidscene import PlaywrightAgent


async def main():
    # Configure model (or use environment variables)
    os.environ["MIDSCENE_MODEL_NAME"] = "doubao-seed-1-6-251015"
    os.environ["MIDSCENE_MODEL_API_KEY"] = "your-api-key"
    os.environ["MIDSCENE_MODEL_BASE_URL"] = "https://ark.cn-beijing.volces.com/api/v3"
    os.environ["MIDSCENE_MODEL_FAMILY"] = "doubao-vision"

    async with async_playwright() as p:
        browser = await p.chromium.launch(headless=False)
        page = await browser.new_page()

        # Create agent with optional caching
        agent = PlaywrightAgent(page, cache_id="my_task")

        # Navigate to page
        await page.goto("https://www.example.com")

        # Use natural language to interact
        await agent.ai_click("the search box")
        await agent.ai_input("search input", "Python automation")
        await agent.ai_click("search button")

        # Extract data
        result = await agent.ai_query({
            "results_count": "number of search results",
            "first_title": "title of the first result"
        })
        print(f"Found: {result}")

        # Assert page state
        await agent.ai_assert("search results are displayed")

        # Generate visual report
        report_path = agent.finish()
        print(f"Report saved to: {report_path}")

        await browser.close()


if __name__ == "__main__":
    asyncio.run(main())
Documentation
Core API
| Method | Description |
|---|---|
| `ai_click(description)` | Click an element described in natural language |
| `ai_input(description, text)` | Type text into an input field |
| `ai_locate(description)` | Locate an element and return its coordinates |
| `ai_query(schema)` | Extract structured data from the page |
| `ai_assert(assertion)` | Assert that a condition is true |
| `ai_action(task)` | Execute a complex task with AI planning loop (plan-execute-replan) |
| `ai_wait_for(assertion, timeout)` | Wait until a page condition is met (polling) |
| `ai_scroll(direction, distance)` | Scroll the page with AI assistance |
| `finish()` | Generate HTML report and return the path |
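The polling behind a wait-style method can be sketched in plain `asyncio`. This is an illustration of the general technique only, not how PyMidscene implements `ai_wait_for`; `wait_until`, its `interval` parameter, and the `demo` stand-in check are hypothetical.

```python
import asyncio
import time


async def wait_until(check, timeout=10.0, interval=0.5):
    """Poll the async callable `check` until it returns True or time runs out.

    Hypothetical helper for illustration; `timeout` and `interval` are in seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        if await check():
            return True
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout} seconds")
        await asyncio.sleep(interval)


async def demo():
    attempts = 0

    async def ready():
        # Stand-in for a page check; succeeds on the third poll.
        nonlocal attempts
        attempts += 1
        return attempts >= 3

    await wait_until(ready, timeout=2.0, interval=0.01)
    return attempts


if __name__ == "__main__":
    print(asyncio.run(demo()))
```

In a real script the `check` callable would wrap a page inspection. Because each AI-backed check is likely to involve a model call, a longer interval keeps the number of requests down.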
Supported Models
| Model | Family | Provider |
|---|---|---|
| doubao-seed-1-6-251015 | doubao-vision | Bytedance/Volcano |
| qwen-vl-max | qwen2.5-vl | Alibaba |
| gpt-4-vision-preview | openai | OpenAI |
| claude-3-opus | claude | Anthropic |
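The table can be turned into a small lookup so that the model family never has to be typed by hand. This is a sketch under stated assumptions: `MODEL_FAMILIES` simply copies the rows above, and `model_env` is a hypothetical helper, not a PyMidscene function.

```python
# Model name -> family, copied from the table above.
MODEL_FAMILIES = {
    "doubao-seed-1-6-251015": "doubao-vision",
    "qwen-vl-max": "qwen2.5-vl",
    "gpt-4-vision-preview": "openai",
    "claude-3-opus": "claude",
}


def model_env(model_name, api_key, base_url):
    """Build the MIDSCENE_MODEL_* settings for a model listed in the table.

    Hypothetical helper; raises ValueError for models not in the table.
    """
    try:
        family = MODEL_FAMILIES[model_name]
    except KeyError:
        raise ValueError(f"unknown model: {model_name!r}") from None
    return {
        "MIDSCENE_MODEL_NAME": model_name,
        "MIDSCENE_MODEL_API_KEY": api_key,
        "MIDSCENE_MODEL_BASE_URL": base_url,
        "MIDSCENE_MODEL_FAMILY": family,
    }
```

The returned dict can be applied with `os.environ.update(model_env(...))` before the agent is created.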
Cache System
PyMidscene uses XPath-based caching compatible with Midscene.js:
# midscene_run/cache/my_task.cache.yaml
midsceneVersion: 1.0.0
cacheId: my_task
caches:
  - type: locate
    prompt: the login button
    cache:
      xpaths:
        - /html/body/div[1]/button[1]
This means:
- Cache files are interchangeable between JS and Python versions
- XPath-based caching works across different window sizes
- Cache invalidation happens automatically when elements move
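To make the cache format concrete, the sketch below looks up the stored XPaths for a prompt. It is an illustration only and not PyMidscene's own cache code: `CACHE` mirrors the example file as roughly the structure a YAML loader would produce, and `cached_xpaths` is a hypothetical helper.

```python
# The example cache file, written as a Python structure.
CACHE = {
    "midsceneVersion": "1.0.0",
    "cacheId": "my_task",
    "caches": [
        {
            "type": "locate",
            "prompt": "the login button",
            "cache": {"xpaths": ["/html/body/div[1]/button[1]"]},
        }
    ],
}


def cached_xpaths(cache, prompt, entry_type="locate"):
    """Return the XPaths stored for `prompt`, or an empty list on a miss."""
    for entry in cache.get("caches", []):
        if entry.get("type") == entry_type and entry.get("prompt") == prompt:
            # Copy the list so callers cannot mutate the cache by accident.
            return list(entry.get("cache", {}).get("xpaths", []))
    return []
```

A hit returns candidate XPaths that can be tried without a model call; a miss returns an empty list, at which point an AI-based locate would be needed.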
Examples
Check out the examples/ directory:
- basic_usage.py - Getting started
- login_demo.py - Login automation with visual report
- login_demo.html - Test page for login demo
Project Structure
pymidscene/
├── pymidscene/ # Main package
│ ├── core/ # Core automation logic
│ │ ├── agent/ # Agent implementation
│ │ ├── ai_model/ # AI model integration
│ │ └── dump.py # Report generation
│ ├── web_integration/ # Browser integrations
│ │ └── playwright/ # Playwright adapter
│ └── shared/ # Shared utilities
├── examples/ # Usage examples
├── tests/ # Test suite
└── docs/ # Documentation
Related Projects
This is the Python implementation of Midscene.js.
- Midscene.js - Original JavaScript version
- Official Documentation
- Awesome Midscene - Community projects
Contributing
Contributions are welcome! Please see CONTRIBUTING.md for guidelines.
# Development setup
git clone https://github.com/AIPythoner/pymidscene.git
cd pymidscene
pip install -e ".[dev]"
# Run tests
pytest
# Format code
black pymidscene tests
License
MIT License - see LICENSE file for details.
Acknowledgments
- Midscene.js - The original JavaScript framework by Bytedance
- Playwright - Browser automation library
Made with love by the PyMidscene community
File details
Details for the file pymidscene-0.1.3.tar.gz.
File metadata
- Download URL: pymidscene-0.1.3.tar.gz
- Upload date:
- Size: 95.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 005299b0ad7312d28d322c8286aca8720685403b20c4c1024cbafeb42cd2ed48 |
| MD5 | e4250227ef4e27c41e931afd497ae02c |
| BLAKE2b-256 | 023c9ea5497a7fe0f9a858db6782de688847a151f5e4c5afeef6082271a7a9b8 |
File details
Details for the file pymidscene-0.1.3-py3-none-any.whl.
File metadata
- Download URL: pymidscene-0.1.3-py3-none-any.whl
- Upload date:
- Size: 118.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c8f261211a30370eccf2e4ffa470bcc7be84a3f17052215a5a3b999989cb04d0 |
| MD5 | 3dd60d6526d986cc98b725cca59f8711 |
| BLAKE2b-256 | 8b3e2cab0a463f077afc5a3d74066701069f670b2d36e5f193c6b572771244bf |