crawlix
One API. Any backend. Full browser automation to lightweight scraping.
crawlix is a Python browser automation and web scraping library with a unified API across multiple backends. Write your code once, then switch between lightweight HTTP scraping and full browser automation by changing only the backend argument.
from crawlix import Browser

# Zero-setup scraping
with Browser() as b:
    page = b.open("https://example.com")
    print(page.find("h1").text)

# Full browser automation, same API
with Browser(backend="playwright") as b:
    page = b.open("https://example.com")
    page.type("#email", "user@example.com")
    page.click("[type=submit]")
    page.wait_for(".dashboard")
    page.screenshot("result.png")
Install
pip install crawlix # core — requests + BeautifulSoup
pip install crawlix[playwright] # full browser via Playwright
pip install crawlix[selenium] # full browser via Selenium
pip install crawlix[async] # async support via httpx
pip install crawlix[full] # everything above
pip install crawlix[termux] # for Termux/Android (no Playwright)
[!TIP] After installing a browser backend, run crawlix setup all to automatically download browsers and drivers.
CLI Tools
crawlix setup playwright # Install Playwright + Chromium browser
crawlix setup selenium # Install Selenium (driver auto-managed)
crawlix setup all # Install everything
crawlix doctor # Check system & diagnose issues
Why crawlix?
| Problem | Solution |
|---|---|
| Rewriting code when switching from HTTP to browser scraping | Same API — change backend= not your code |
| Heavy dependencies for small tasks | Minimal core: only requests + bs4 are required; browser backends are optional extras |
| Bot detection blocking your scrapers | Stealth by default — realistic headers, UA rotation |
| Remembering which backend does what | Auto-detect — picks the best available backend |
| Confusing error messages | Helpful errors — BackendError tells you exactly what to install |
# Auto-detect picks the best backend installed on your system
# Priority: playwright > selenium > requests+bs4
with Browser() as b:
    print(b.backend_name)  # "requests", or "playwright" if installed
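The priority order in the comment above can be pictured as a simple importability check. This is a minimal sketch of the idea and not crawlix's actual detection logic; `PRIORITY`, `is_installed`, and `pick_backend` are hypothetical names introduced here.

```python
from importlib.util import find_spec
from typing import Callable

# Assumed order, copied from the comment above: playwright > selenium > requests.
PRIORITY = ("playwright", "selenium", "requests")


def is_installed(module: str) -> bool:
    """Return True if the top-level module can be found in this environment."""
    return find_spec(module) is not None


def pick_backend(available: Callable[[str], bool] = is_installed) -> str:
    """Return the first name in PRIORITY for which available(name) is true."""
    for name in PRIORITY:
        if available(name):
            return name
    raise RuntimeError("no backend installed; try: pip install crawlix")
```

Passing the check as a parameter keeps the selection logic testable without installing or uninstalling any package.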
Quick Start
Scrape a page
from crawlix import Browser

with Browser() as b:
    page = b.open("https://news.ycombinator.com")
    for item in page.find_all(".titleline > a"):
        print(item.text, item.attr("href"))
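Scraped href values are often relative to the page they came from. The sketch below resolves them against the page URL with the standard library; `absolutize` is a hypothetical helper, and it assumes the href strings have already been collected, for example via `item.attr("href")`.

```python
from urllib.parse import urljoin


def absolutize(base_url: str, hrefs: list[str]) -> list[str]:
    """Resolve each href against base_url, skipping empty values."""
    return [urljoin(base_url, href) for href in hrefs if href]


links = absolutize(
    "https://news.ycombinator.com/",
    ["item?id=1", "https://example.com/post", ""],
)
print(links)
```

Absolute URLs pass through unchanged, so the helper is safe to apply to a mixed list.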
Extract data from APIs
from crawlix import get, fetch
data = get("https://api.github.com/users/keyreyla").json()
html = fetch("https://example.com")
Automate a login flow
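from crawlix import Browser

with Browser(backend="playwright") as b:
    # open() returns the Page; interaction methods live on the Page
    page = b.open("https://github.com/login")
    page.type("#login_field", "username")
    page.type("#password", "password")
    page.click("[type=submit]")
    page.wait_for(".dashboard-sidebar")
    print("Logged in:", page.url)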
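In a real script the literal "username" and "password" strings would usually come from outside the source file. A minimal sketch of reading them from the environment follows; the variable names `SITE_USERNAME` and `SITE_PASSWORD` and the helper `load_credentials` are assumptions made for illustration, not part of crawlix.

```python
import os
from typing import Mapping


def load_credentials(env: Mapping[str, str] = os.environ) -> tuple[str, str]:
    """Read the login name and password from environment variables.

    SITE_USERNAME and SITE_PASSWORD are hypothetical names chosen for this sketch.
    """
    try:
        return env["SITE_USERNAME"], env["SITE_PASSWORD"]
    except KeyError as missing:
        # Report which variable is absent without echoing any secret value.
        raise RuntimeError(f"missing environment variable: {missing.args[0]}") from None
```

The returned pair can then be passed to the two type(...) calls in the login flow instead of hard-coded strings.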
Async usage
import asyncio
from crawlix.async_api import AsyncBrowser, aget

async def main():
    async with AsyncBrowser() as b:
        page = await b.open("https://example.com")
        print(page.title)

    page = await aget("https://api.github.com/users/keyreyla")
    print(page.url)

asyncio.run(main())
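Async code pays off mostly when several pages are fetched at once. The sketch below shows one common way to bound concurrency with a semaphore; `fetch_all` is a hypothetical helper, and `fetch_one` stands for any coroutine function that takes a URL (possibly `aget`, assuming it is safe to call concurrently, which the text above does not state).

```python
import asyncio
from typing import Awaitable, Callable, TypeVar

T = TypeVar("T")


async def fetch_all(
    urls: list[str],
    fetch_one: Callable[[str], Awaitable[T]],
    limit: int = 5,
) -> list[T]:
    """Run fetch_one over urls with at most `limit` calls in flight."""
    gate = asyncio.Semaphore(limit)

    async def guarded(url: str) -> T:
        async with gate:
            return await fetch_one(url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(guarded(u) for u in urls))
```

A low limit is the polite default when all URLs point at the same host.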
API at a Glance
Browser
Browser(
    backend="auto",      # or "playwright", "selenium", "requests", "httpx"
    headless=True,
    stealth=True,
    timeout=30,
    proxy=None,          # "http://user:pass@host:port"
    locale="en-US",
    user_agent=None,
)

b.open(url)        # -> Page
b.new_page()       # -> Page
b.close()
b.backend_name     # -> str
b.supports_js      # -> bool
Page
All interaction methods return self for chaining:
page.find(selector) # -> Element | None
page.find_all(selector) # -> list[Element]
page.click(selector) # -> Page
page.type(selector, text) # -> Page
page.wait_for(selector) # -> Page
page.screenshot(path=None) # -> bytes
page.html # -> str
page.text # -> str
page.json() # -> dict
page.links() # -> list[str]
page.tables() # -> list[list[list[str]]]
page.evaluate("document.title") # -> any
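Since links() is listed as returning a list of strings, ordinary list processing applies to its result. The sketch below keeps only links on the same host as the page; `same_host` is a hypothetical helper, and it assumes the links are absolute URLs, which the listing above does not confirm.

```python
from urllib.parse import urlsplit


def same_host(links: list[str], base_url: str) -> list[str]:
    """Keep only the links whose hostname matches that of base_url."""
    host = urlsplit(base_url).hostname
    # Relative links have no hostname and are dropped by this comparison.
    return [link for link in links if urlsplit(link).hostname == host]
```

This kind of filter is a typical first step before following links in a crawl limited to one site.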
Element
el.text # -> str
el.attr(name) # -> str
el.attrs # -> dict
el.find(selector) # -> Element | None
el.click() # -> Element
el.is_visible() # -> bool
el.bounding_box() # -> dict
if el:  # an Element is always truthy and find() returns None on a miss, so this is a presence check
    ...
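Because find() is documented as returning Element | None, code that reads .text directly fails on a miss. A small guard like the one below avoids that; `text_or` is a hypothetical helper that relies only on the find and text members listed above.

```python
from typing import Any, Optional


def text_or(node: Any, selector: str, default: Optional[str] = None) -> Optional[str]:
    """Return the text of the first match for selector, or default if none."""
    el = node.find(selector)
    return el.text if el is not None else default
```

It works for both a Page and an Element, since both expose find(selector) in the listings above.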
Backend Feature Matrix
| Feature | requests | playwright | selenium | httpx |
|---|---|---|---|---|
| JS execution | | ✅ | ✅ | |
| Click/type/hover | | ✅ | ✅ | |
| Screenshot/PDF | | ✅ | ✅ | |
| Network intercept | | ✅ | | |
| Async | | ✅ | | ✅ |
| Wait/retry | | ✅ | ✅ | |
| File upload | | ✅ | ✅ | |
| Proxy | ✅ | ✅ | ✅ | ✅ |
Examples
Proxy
with Browser(proxy="http://user:pass@proxy:8080") as b:
    page = b.open("https://ipinfo.io/json")
    print(page.json()["ip"])
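Proxy URLs in this form carry credentials, so it helps to mask them before they reach logs or error reports. The following is a minimal sketch using the standard library; `mask_proxy` is a hypothetical helper, not a crawlix function, and it does not handle IPv6 literal hosts.

```python
from urllib.parse import urlsplit, urlunsplit


def mask_proxy(proxy_url: str) -> str:
    """Return proxy_url with any user:pass part replaced by '***'."""
    parts = urlsplit(proxy_url)
    if parts.username is None:
        return proxy_url  # nothing to hide
    host = parts.hostname or ""
    if parts.port is not None:
        host = f"{host}:{parts.port}"
    return urlunsplit(parts._replace(netloc=f"***@{host}"))


print(mask_proxy("http://user:pass@proxy:8080"))
```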
Table extraction
with Browser() as b:
    page = b.open("https://en.wikipedia.org/wiki/Python_(programming_language)")
    for row in page.tables()[0]:
        print(row)
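Each table arrives as a list of rows, and each row as a list of cell strings. If the first row happens to be a header (an assumption that will not hold for every table), the rows can be turned into dictionaries as sketched below; `rows_to_dicts` is a hypothetical helper.

```python
def rows_to_dicts(table: list[list[str]]) -> list[dict[str, str]]:
    """Turn a table (list of rows) into dicts keyed by the first row."""
    if not table:
        return []
    header, *body = table
    # zip stops at the shorter of header and row, so ragged rows do not raise
    return [dict(zip(header, row)) for row in body]
```

With crawlix this would be called as rows_to_dicts(page.tables()[0]).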
File upload
with Browser(backend="playwright") as b:
    page = b.open("https://example.com/upload")
    page.upload("#file-input", "/path/to/file.pdf")
    page.click("#submit")
    page.wait_for(".success")
Network intercept
with Browser(backend="playwright") as b:
    page = b.open("https://example.com")
    page.intercept("**/api/**", lambda req: print(req.url))
Exceptions
from crawlix.exceptions import (
    CrawlixError,      # base class, catches everything below
    BackendError,      # backend unavailable or operation not supported
    TimeoutError,      # wait exceeded timeout (shadows the built-in TimeoutError when imported)
    NavigationError,   # page failed to load
    SelectorError,     # invalid selector or element not found
    NetworkError,      # connection error
    JavaScriptError,   # JS evaluation failed
)
[!NOTE] BackendError always includes an install hint. For example, calling screenshot() on the requests backend raises:
BackendError: screenshot() requires a browser backend. Install: pip install crawlix[playwright]
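A distinct exception per failure mode makes selective retries straightforward: transient errors are retried, permanent ones are not. The helper below is a generic sketch of that pattern and is not part of crawlix; `retry` and its parameters are hypothetical names.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


def retry(
    action: Callable[[], T],
    retry_on: tuple[type[BaseException], ...],
    attempts: int = 3,
    delay: float = 1.0,
) -> T:
    """Call action until it succeeds, retrying only on the given exception types."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retry_on:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            time.sleep(delay * attempt)  # linear backoff between tries
    raise ValueError("attempts must be at least 1")
```

With crawlix, retry_on would plausibly be (NetworkError, TimeoutError). BackendError signals a missing install or unsupported operation, so retrying it is unlikely to help.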
Examples
Ready-to-run scripts in examples/:
| Example | Platform | Run command |
|---|---|---|
| scrape_news.py | PC + Android | python examples/scrape_news.py |
| android_scraper.py | Android (Termux) | pip install crawlix[termux] && python examples/android_scraper.py |
| browser_login.py | PC | pip install crawlix[playwright] && playwright install chromium && python examples/browser_login.py |
| async_scraper.py | PC + Android | pip install crawlix[async] && python examples/async_scraper.py |
Development
git clone https://github.com/keyreyla/crawlix.git
cd crawlix
python -m venv .venv && source .venv/bin/activate
pip install -e ".[full]"
pip install pytest ruff mypy
pytest