coreason-navigator
coreason-navigator is the "Eyes and Hands" of the CoReason platform for the World Wide Web. It bridges the gap between modern AI agents and legacy web interfaces using State-of-the-Art (SOTA) "Computer Use" techniques.
Features
- Visual Navigation: Agents use a headless browser (Playwright) to "see" and interact with webpages just like a human, removing the need for fragile API integrations.
- Vision-Language Model (VLM) Integration: Uses screenshots and Accessibility Trees (AX Tree) to ground LLM intent into precise screen coordinates.
- Robust Orchestration: Handles dynamic content, session persistence, and stealth techniques (e.g., User-Agent rotation) to avoid detection.
- Set-of-Marks (SoM): Overlays numeric tags on interactive elements to improve VLM accuracy.
- Content Extraction: Converts noisy webpages into clean Markdown for LLM consumption.
- Safety First: Includes rate limiting, domain allowlisting, and PII input protection.
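Set-of-Marks prompting and coordinate grounding can be illustrated with a small, library-independent sketch. It assumes the interactive elements and their bounding boxes have already been collected (for example from the AX Tree). `Element`, `assign_marks`, `describe_marks`, and `resolve_mark` are hypothetical names, not part of coreason-navigator's API, and drawing the tags onto the screenshot is omitted. Each element gets a numeric tag, the tag legend goes into the prompt, and when the VLM answers with a tag number that tag is turned back into a click point at the centre of the element's box.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Element:
    """Hypothetical interactive element with its bounding box in viewport pixels."""
    role: str
    name: str
    x: float
    y: float
    width: float
    height: float


def assign_marks(elements: list[Element]) -> dict[int, Element]:
    """Give each element a numeric tag, starting at 1, in the order supplied."""
    return {i: el for i, el in enumerate(elements, start=1)}


def describe_marks(marks: dict[int, Element]) -> str:
    """Render the tag legend that would accompany the annotated screenshot."""
    return "\n".join(f"[{tag}] {el.role} '{el.name}'" for tag, el in marks.items())


def resolve_mark(marks: dict[int, Element], tag: int) -> tuple[float, float]:
    """Turn a tag chosen by the VLM into the centre point of that element."""
    el = marks[tag]  # raises KeyError if the model names a tag that does not exist
    return (el.x + el.width / 2, el.y + el.height / 2)


if __name__ == "__main__":
    marks = assign_marks([
        Element("link", "More information...", 100, 200, 180, 20),
        Element("textbox", "Search", 40, 60, 300, 32),
    ])
    print(describe_marks(marks))
    # Suppose the VLM replied "click [2]":
    print(resolve_mark(marks, 2))
```

Asking the model for a tag rather than raw pixel coordinates is the point of Set-of-Marks: choosing among a few labelled options is easier for a VLM than regressing exact positions, and the coordinates come from the page itself.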
Installation
```bash
pip install coreason-navigator
```
Or install from source:
```bash
git clone https://github.com/CoReason-AI/coreason_navigator.git
cd coreason_navigator
pip install .
```
You will also need to install Playwright browsers:
```bash
playwright install chromium
```
Usage
Here is a simple example of how to use the PlaywrightNavigator:
```python
import asyncio

from coreason_navigator.driver import PlaywrightNavigator
# Action types (not used in this minimal example)
from coreason_navigator.types import GotoAction, ClickAction


async def main():
    # Initialize the navigator (headless by default)
    navigator = PlaywrightNavigator(headless=True)
    try:
        # Launch the browser
        await navigator.launch()

        # Navigate to a URL; the returned state describes the loaded page
        print("Navigating to example.com...")
        state = await navigator.navigate("https://example.com")
        print(f"Title: {state.title}")

        # The state also carries a base64-encoded screenshot
        # print(state.screenshot_base64[:50] + "...")

        # Extract the page content as Markdown
        content = await navigator.extract_content(format="markdown")
        print("Page Content Summary:")
        print(content[:200])
    finally:
        # Always close resources, even if a step above fails
        await navigator.close()


if __name__ == "__main__":
    asyncio.run(main())
```
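The feature list mentions domain allowlisting and rate limiting under Safety First. The sketch below shows one way an application could apply comparable checks of its own before each `navigator.navigate(url)` call. `SafetyGuard` and `guarded_navigate` are hypothetical names, not part of coreason-navigator, and the navigator's built-in safeguards may work differently. It assumes a single agent issuing requests one at a time, so the throttle keeps no lock.

```python
import asyncio
import time
from urllib.parse import urlsplit


class SafetyGuard:
    """Hypothetical pre-flight checks to run before navigator.navigate(url).

    Allows only http(s) URLs whose host is an allowlisted domain or one of its
    subdomains, and enforces a minimum delay between consecutive requests.
    """

    def __init__(self, allowed_domains: set[str], min_interval_s: float = 1.0) -> None:
        self.allowed_domains = {d.lower().rstrip(".") for d in allowed_domains}
        self.min_interval_s = min_interval_s
        self._last_request = float("-inf")

    def is_allowed(self, url: str) -> bool:
        parts = urlsplit(url)
        # .hostname is already lowercased by urllib; strip a trailing FQDN dot
        host = (parts.hostname or "").rstrip(".")
        if parts.scheme not in ("http", "https") or not host:
            return False
        # "docs.example.com" matches "example.com"; "badexample.com" does not.
        return any(host == d or host.endswith("." + d) for d in self.allowed_domains)

    async def throttle(self) -> None:
        """Sleep just long enough to keep requests min_interval_s apart."""
        wait = self._last_request + self.min_interval_s - time.monotonic()
        if wait > 0:
            await asyncio.sleep(wait)
        self._last_request = time.monotonic()


async def guarded_navigate(navigator, guard: SafetyGuard, url: str):
    """Check the URL, wait for the rate limit, then delegate to navigator.navigate."""
    if not guard.is_allowed(url):
        raise PermissionError(f"Blocked navigation to {url}")
    await guard.throttle()
    return await navigator.navigate(url)
```

In the usage example above, `state = await guarded_navigate(navigator, guard, "https://example.com")` would replace the direct `navigate` call, with `guard = SafetyGuard({"example.com"})`.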
Architecture
Each step of a browsing task follows an Observe-Orient-Decide-Act (OODA) loop:
- Observe: Captures screenshot and Accessibility Tree.
- Orient: Maps user intent to screen coordinates using VLM.
- Decide: Formulates browser actions (Click, Type, Scroll).
- Act: Executes actions via Playwright.
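The four stages can be sketched as a plain control loop. This is a hypothetical, synchronous illustration: `Observation`, the action classes, and `run_ooda` are invented for the example, and each stage is passed in as a callable so the loop runs without a browser or model. In a real agent, observe would capture the screenshot and AX Tree, orient would call the VLM, and act would drive Playwright, each awaited asynchronously.

```python
from dataclasses import dataclass
from typing import Callable, Optional, Union


@dataclass
class Observation:
    screenshot: bytes  # raw image bytes captured from the page
    ax_tree: str       # serialized Accessibility Tree


@dataclass
class Click:
    x: float
    y: float


@dataclass
class TypeText:
    text: str


@dataclass
class Scroll:
    dy: int


@dataclass
class Done:
    answer: str


Action = Union[Click, TypeText, Scroll, Done]
Point = Optional[tuple[float, float]]


def run_ooda(
    intent: str,
    observe: Callable[[], Observation],
    orient: Callable[[str, Observation], Point],
    decide: Callable[[str, Observation, Point], Action],
    act: Callable[[Action], None],
    max_steps: int = 10,
) -> list[Action]:
    """Run observe -> orient -> decide -> act until Done or max_steps."""
    history: list[Action] = []
    for _ in range(max_steps):
        obs = observe()                       # Observe: screenshot + AX Tree
        target = orient(intent, obs)          # Orient: ground intent to (x, y), or None
        action = decide(intent, obs, target)  # Decide: choose the next browser action
        history.append(action)
        if isinstance(action, Done):
            break
        act(action)                           # Act: execute the action in the browser
    return history


if __name__ == "__main__":
    page_state = {"clicked": False, "typed": False}  # stub page instead of a browser

    def observe() -> Observation:
        return Observation(b"", "textbox 'Search'")

    def orient(intent: str, obs: Observation) -> Point:
        return (190.0, 76.0) if "Search" in obs.ax_tree else None

    def decide(intent: str, obs: Observation, target: Point) -> Action:
        if target is None:
            return Scroll(500)  # nothing grounded yet: scroll and look again
        if not page_state["clicked"]:
            return Click(*target)
        if not page_state["typed"]:
            return TypeText("playwright")
        return Done("searched")

    def act(action: Action) -> None:
        if isinstance(action, Click):
            page_state["clicked"] = True
        elif isinstance(action, TypeText):
            page_state["typed"] = True
        print("executing", action)

    print(run_ooda("search for playwright", observe, orient, decide, act))
```

Capping the loop with `max_steps` keeps a confused model from acting indefinitely, while a `Done` action ends the task early without executing anything further.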
License
This software is proprietary and dual-licensed. It is available under the Prosperity Public License 3.0; commercial use beyond a 30-day trial requires a separate license.