One API. Any backend. Full browser automation to lightweight scraping.
crawlix
crawlix is a Python browser automation and web scraping library with a unified API across multiple backends. Write your code once, then switch between lightweight HTTP scraping and full browser automation by changing only the backend argument.
from crawlix import Browser

# Zero-setup scraping: auto-detects the best backend
with Browser() as b:
    page = b.open("https://example.com")
    print(page.find("h1").text)

# Full browser automation with the same API
with Browser(backend="playwright") as b:
    page = b.open("https://example.com")
    page.type("#email", "user@example.com")
    page.click("[type=submit]")
    page.wait_for(".dashboard")
    page.screenshot("result.png")
Install
pip install crawlix # core (requests + BeautifulSoup)
pip install crawlix[playwright] # full browser via Playwright
pip install crawlix[selenium] # full browser via Selenium
pip install crawlix[async] # async support via httpx
pip install crawlix[full] # everything above
pip install crawlix[termux] # Termux/Android (no Playwright)
[!TIP] After installing a browser backend, run `crawlix setup all` to automatically download browsers and drivers.
crawlix setup playwright # install Playwright + Chromium
crawlix setup selenium # install Selenium (drivers auto-managed)
crawlix setup all # install everything
crawlix doctor # check system and diagnose issues
Features
| Feature | Details |
|---|---|
| Unified API | Same code for HTTP scraping and browser automation. Change `backend=`, not your code. |
| Auto-detect | Picks the best available backend: playwright → selenium → httpx → requests. No config needed. |
| Zero hard deps | Core depends only on requests + beautifulsoup4. Backends are optional extras. |
| Stealth by default | Realistic headers, user-agent rotation, no bot fingerprinting out of the box. |
| Context manager | Resources are cleaned up automatically. Works with both `with` and `async with`. |
| Helpful errors | `BackendError` tells you exactly what to install. No silent failures, no traceback soup. |
| CLI tools | `crawlix setup` installs backends. `crawlix doctor` diagnoses your environment. |
| Termux ready | Works on Android via Termux. Use `pip install crawlix[termux]`. |
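
The auto-detect row describes a priority order over optional packages. A minimal sketch of how such a lookup could work is shown below; it is not crawlix's actual implementation. The names `BACKEND_PRIORITY`, `_importable`, and `detect_backend` are hypothetical, and the sketch assumes that a backend counts as available when its top-level package can be imported.

```python
from importlib.util import find_spec

# Priority order as listed in the feature table.
BACKEND_PRIORITY = ("playwright", "selenium", "httpx", "requests")


def _importable(name):
    """True when the top-level package `name` can be found on this system."""
    return find_spec(name) is not None


def detect_backend(priority=BACKEND_PRIORITY, is_installed=_importable):
    """Return the first backend in `priority` that is installed (hypothetical helper)."""
    for name in priority:
        if is_installed(name):
            return name
    raise RuntimeError(
        "No backend found. Install one, e.g. pip install crawlix[playwright]"
    )
```

Passing `is_installed` as a parameter keeps the priority logic testable without installing or uninstalling any packages.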
Quick start
from crawlix import Browser

# Detect backends lazily
with Browser() as b:
    page = b.open("https://news.ycombinator.com")
    for item in page.find_all(".titleline > a"):
        print(item.text, item.attr("href"))

from crawlix import get, fetch

data = get("https://api.github.com/users/keyreyla").json()
html = fetch("https://example.com")

import asyncio
from crawlix.async_api import AsyncBrowser

async def main():
    async with AsyncBrowser() as b:
        page = await b.open("https://example.com")
        print(page.title)

asyncio.run(main())
[!NOTE] See more examples in the examples/ directory, including browser login flows, table extraction, file uploads, and Android scraping scripts.
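
An async API is most useful when several pages are loaded concurrently. The sketch below shows one common pattern for that, using only the standard library. `fake_open` is a stand-in coroutine, not a crawlix function, and `open_many` and its `limit` parameter are hypothetical; in real code the `open_one` argument could be any coroutine function that takes a URL.

```python
import asyncio


async def fake_open(url):
    """Stand-in for an async page load; returns a made-up title string."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return f"title of {url}"


async def open_many(urls, open_one=fake_open, limit=5):
    """Run `open_one` over `urls` concurrently, at most `limit` at a time."""
    gate = asyncio.Semaphore(limit)

    async def guarded(url):
        async with gate:
            return await open_one(url)

    # gather returns results in the same order as its inputs.
    return await asyncio.gather(*(guarded(u) for u in urls))


titles = asyncio.run(
    open_many(["https://example.com/a", "https://example.com/b"])
)
```

The semaphore caps the number of in-flight requests, which is usually kinder to the target site than an unbounded `gather`.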
API at a glance
Browser
Browser(
    backend="auto",   # "playwright" | "selenium" | "requests" | "httpx"
    headless=True,
    stealth=True,
    timeout=30,
    proxy=None,       # "http://user:pass@host:port"
    locale="en-US",
)
b.open(url) # -> Page
b.new_page() # -> Page
b.close()
b.backend_name # -> str
b.supports_js # -> bool
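
The `with Browser() as b:` form relies on Python's context manager protocol to guarantee cleanup. The sketch below illustrates that protocol with a hypothetical `ManagedSession` class; it does not reflect how crawlix implements `Browser` internally, only the general mechanism that makes `close()` run even when the block raises.

```python
class ManagedSession:
    """Hypothetical stand-in for an object that owns resources."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Idempotent: calling close() more than once is harmless.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        self.close()
        return False  # do not swallow exceptions raised inside the block


session = ManagedSession()
try:
    with session:
        raise ValueError("simulated scraping failure")
except ValueError:
    pass  # the error still propagates, but cleanup has already run
```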
Page
All interaction methods return self for chaining:
page.find(selector) # -> Element | None
page.find_all(selector) # -> list[Element]
page.click(selector) # -> Page (chainable)
page.type(selector, text) # -> Page (chainable)
page.wait_for(selector) # -> Page (chainable)
page.screenshot(path=None) # -> bytes
page.html # -> str
page.text # -> str
page.json() # -> dict
page.links() # -> list[str]
page.tables() # -> list[list[list[str]]]
page.evaluate("document.title") # -> any (browser backends)
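
Chaining works because each interaction method returns the object it was called on. The sketch below uses a hypothetical `RecordingPage` stand-in that only records calls instead of driving a browser; the method names mirror the listing above, but the class itself is an illustration and not part of crawlix.

```python
class RecordingPage:
    """Hypothetical stand-in that records actions instead of performing them."""

    def __init__(self):
        self.actions = []

    def type(self, selector, text):
        self.actions.append(("type", selector, text))
        return self  # returning self is what makes the call chainable

    def click(self, selector):
        self.actions.append(("click", selector))
        return self

    def wait_for(self, selector):
        self.actions.append(("wait_for", selector))
        return self


page = RecordingPage()
result = (
    page.type("#email", "user@example.com")
    .click("[type=submit]")
    .wait_for(".dashboard")
)
```

Methods that return data, such as `find` or `screenshot` in the listing above, end a chain because they return something other than the page.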
Element
el.text # -> str
el.attr(name) # -> str
el.attrs # -> dict
el.find(selector) # -> Element | None
el.click() # -> Element (chainable)
el.is_visible() # -> bool
el.bounding_box() # -> dict
if el:  # an Element is always truthy and a missing match is None, so this is a natural presence check
    ...
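
Since `find` is documented as returning `Element | None`, a plain truthiness test is enough to guard against missing matches. The sketch below demonstrates the pattern with stand-ins: `StubElement`, `find`, and `text_or` are hypothetical names, and a dictionary plays the role of the page.

```python
class StubElement:
    """Hypothetical stand-in; objects are truthy unless they define __bool__ or __len__."""

    def __init__(self, text):
        self.text = text


def find(elements, selector):
    """Return the matching element or None, mirroring an `Element | None` return type."""
    return elements.get(selector)


def text_or(elements, selector, default=""):
    """Return the element's text, or `default` when nothing matches."""
    el = find(elements, selector)
    if el:  # None is falsy, an element object is truthy
        return el.text
    return default


dom = {"h1": StubElement("Example Domain")}
```

This avoids the `AttributeError` that a chained call like `find(...).text` raises when nothing matches.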
Exceptions
from crawlix.exceptions import CrawlixError, BackendError, TimeoutError
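
A shared base exception lets callers catch every library error in one `except` clause while still reacting to specific ones. The sketch below shows that layout with stand-in classes. `SketchError`, `SketchBackendError`, and `load_backend` are hypothetical, and the assumption that `BackendError` derives from `CrawlixError` is not confirmed by the import line above.

```python
class SketchError(Exception):
    """Stand-in for a library-wide base exception."""


class SketchBackendError(SketchError):
    """Stand-in for an error raised when a backend is not installed."""

    def __init__(self, backend):
        self.backend = backend
        super().__init__(
            f"Backend {backend!r} is not installed. "
            f"Try: pip install crawlix[{backend}]"
        )


def load_backend(name, installed):
    """Return `name` if it is in `installed`, else raise with an install hint."""
    if name not in installed:
        raise SketchBackendError(name)
    return name


try:
    load_backend("playwright", installed={"requests"})
except SketchError as err:  # catching the base class also catches subclasses
    message = str(err)
```

One practical note: importing a name called `TimeoutError` into a module shadows Python's built-in `TimeoutError` there. An alias such as `from crawlix.exceptions import TimeoutError as CrawlixTimeoutError` keeps both accessible.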
Backend feature matrix
| Feature | requests | httpx | selenium | playwright |
|---|---|---|---|---|
| JS execution | | | yes | yes |
| Click / type / hover | | | yes | yes |
| Screenshot / PDF | | | yes | yes |
| Network intercept | | | | yes |
| Async support | | yes | | yes |
| Wait / retry | | | yes | yes |
| File upload | | | yes | yes |
| Proxy support | yes | yes | yes | yes |
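
Because HTTP-only backends cannot run page scripts, code that depends on JavaScript may want to fail early with a clear message. The sketch below checks the `supports_js` and `backend_name` attributes listed in the Browser section. `require_js` is a hypothetical helper, and the `SimpleNamespace` objects are stand-ins so the example runs without any backend installed.

```python
from types import SimpleNamespace


def require_js(browser):
    """Raise early if `browser` cannot run JavaScript (hypothetical helper)."""
    if not browser.supports_js:
        raise RuntimeError(
            f"Backend {browser.backend_name!r} cannot execute JavaScript; "
            "pick a browser backend such as playwright or selenium."
        )
    return browser


# Stand-ins exposing only the two attributes used above.
http_like = SimpleNamespace(backend_name="requests", supports_js=False)
browser_like = SimpleNamespace(backend_name="playwright", supports_js=True)
```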
Get help
- Docs: keyreyla.github.io/crawlix
- Issues: github.com/keyreyla/crawlix/issues
- PyPI: pypi.org/project/crawlix