One API. Any backend. Full browser automation to lightweight scraping.

crawlix

crawlix is a Python browser automation and web scraping library with a unified API across multiple backends. Write your code once, then switch between lightweight HTTP scraping and full browser automation by changing only the backend= argument.

from crawlix import Browser

# Zero-setup scraping
with Browser() as b:
    page = b.open("https://example.com")
    print(page.find("h1").text)

# Full browser automation — same API
with Browser(backend="playwright") as b:
    page = b.open("https://example.com")
    page.type("#email", "user@example.com")
    page.click("[type=submit]")
    page.wait_for(".dashboard")
    page.screenshot("result.png")

Install

pip install crawlix                    # core — requests + BeautifulSoup
pip install crawlix[playwright]        # full browser via Playwright
pip install crawlix[selenium]          # full browser via Selenium
pip install crawlix[async]             # async support via httpx
pip install crawlix[full]              # everything above
pip install crawlix[termux]           # for Termux/Android (no Playwright)

[!TIP] pip install crawlix with no extras always succeeds — optional backends are imported on demand with clear install hints.
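The on-demand import described in the tip can be sketched as follows. This is a minimal illustration, not crawlix's actual code: `load_optional` and the local `BackendError` class are hypothetical stand-ins, and only the wording of the install hint follows the README.

```python
import importlib


class BackendError(RuntimeError):
    """Stand-in for crawlix's BackendError (hypothetical local definition)."""


def load_optional(module_name: str, extra: str):
    """Import an optional dependency, or fail with an install hint."""
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise BackendError(
            f"{module_name} is not installed. "
            f"Install: pip install crawlix[{extra}]"
        ) from exc


# A stdlib module imports fine; a missing one produces the hint.
json_mod = load_optional("json", "full")
try:
    load_optional("surely_not_a_real_module_xyz", "playwright")
except BackendError as err:
    hint = str(err)
    print(hint)
```

Deferring the import to the point of use is what lets the core package install and load even when no browser backend is present.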


Why crawlix?

| Problem | Solution |
| --- | --- |
| Rewriting code when switching from HTTP to browser scraping | Same API: change backend=, not your code |
| Heavy dependencies for small tasks | Lightweight core: the only required dependencies are requests + bs4 |
| Bot detection blocking your scrapers | Stealth by default: realistic headers, UA rotation |
| Remembering which backend does what | Auto-detect: picks the best available backend |
| Confusing error messages | Helpful errors: BackendError tells you exactly what to install |

# Auto-detect picks the best backend installed on your system
# Priority: playwright > selenium > requests+bs4
with Browser() as b:
    print(b.backend_name)  # "requests" — or "playwright" if installed
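One way to picture the priority order above is a first-match scan over importable modules. The sketch below is an assumption about how such detection could work, not crawlix's implementation; `pick_backend`, `PRIORITY`, and the module names are illustrative.

```python
from importlib.util import find_spec

# Highest priority first, mirroring the order stated above.
# The module names are assumptions about what each backend imports.
PRIORITY = [
    ("playwright", "playwright"),
    ("selenium", "selenium"),
    ("requests", "requests"),
]


def pick_backend(is_installed=lambda mod: find_spec(mod) is not None) -> str:
    """Return the name of the first backend whose module can be found."""
    for name, module in PRIORITY:
        if is_installed(module):
            return name
    raise RuntimeError("no backend available")


# Simulate a machine with only selenium and requests installed.
print(pick_backend(lambda mod: mod in {"selenium", "requests"}))  # selenium
```

Passing the availability check as a parameter keeps the function easy to test without installing or uninstalling anything.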

Quick Start

Scrape a page

from crawlix import Browser

with Browser() as b:
    page = b.open("https://news.ycombinator.com")
    for item in page.find_all(".titleline > a"):
        print(item.text, item.attr("href"))

Extract data from APIs

from crawlix import get, fetch

data = get("https://api.github.com/users/keyreyla").json()
html = fetch("https://example.com")

Automate a login flow

from crawlix import Browser

with Browser(backend="playwright") as b:
    page = b.open("https://github.com/login")
    # Interaction methods live on the Page returned by open()
    page.type("#login_field", "username")
    page.type("#password", "password")
    page.click("[type=submit]")
    page.wait_for(".dashboard-sidebar")
    print("Logged in:", page.url)
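The wait step in the login flow blocks until an element shows up or the timeout passes. A generic polling loop captures the idea; this is a sketch under that assumption, with `wait_until` and `element_ready` as hypothetical names, and it says nothing about how crawlix or its backends actually wait.

```python
import time


def wait_until(predicate, timeout: float = 30.0, interval: float = 0.1):
    """Poll predicate() until it returns a truthy value or timeout elapses."""
    deadline = time.monotonic() + timeout
    while True:
        result = predicate()
        if result:
            return result
        if time.monotonic() >= deadline:
            raise TimeoutError(f"condition not met within {timeout}s")
        time.sleep(interval)


# Stand-in for "is the element present yet?": true on the third check.
calls = {"n": 0}


def element_ready():
    calls["n"] += 1
    return calls["n"] >= 3


wait_until(element_ready, timeout=2.0, interval=0.01)
print(calls["n"])  # 3
```

Using `time.monotonic()` rather than wall-clock time keeps the deadline stable if the system clock is adjusted mid-wait.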

Async usage

import asyncio
from crawlix.async_api import AsyncBrowser, aget

async def main():
    async with AsyncBrowser() as b:
        page = await b.open("https://example.com")
        print(page.title)

    page = await aget("https://api.github.com/users/keyreyla")
    print(page.url)

asyncio.run(main())
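The usual payoff of an async API is fetching several pages at once. The pattern below uses only asyncio, with `fake_fetch` as a hypothetical stand-in for a real request so it runs without network access; swapping in a call such as aget is an untested assumption.

```python
import asyncio


async def fake_fetch(url: str) -> str:
    """Stand-in for a real request (hypothetical); sleeps instead of doing I/O."""
    await asyncio.sleep(0.01)
    return f"<title>{url}</title>"


async def fetch_all(urls: list[str], limit: int = 2) -> list[str]:
    """Fetch every URL concurrently, with at most `limit` in flight."""
    sem = asyncio.Semaphore(limit)

    async def bounded(url: str) -> str:
        async with sem:
            return await fake_fetch(url)

    # gather preserves the order of its arguments in the result list.
    return await asyncio.gather(*(bounded(u) for u in urls))


pages = asyncio.run(fetch_all(["a", "b", "c"]))
print(pages)
```

The semaphore caps concurrency, which is a common courtesy toward the target site when scraping many URLs.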

API at a Glance

Browser

Browser(
    backend="auto",   # "playwright", "selenium", "requests", "httpx"
    headless=True,
    stealth=True,
    timeout=30,
    proxy=None,       # "http://user:pass@host:port"
    locale="en-US",
    user_agent=None,
)

b.open(url)          # -> Page
b.new_page()         # -> Page
b.close()
b.backend_name       # -> str
b.supports_js        # -> bool

Page

Interaction methods (click, type, wait_for) return the Page itself, so calls can be chained; lookups and extractors return the values shown:

page.find(selector)           # -> Element | None
page.find_all(selector)       # -> list[Element]
page.click(selector)          # -> Page
page.type(selector, text)     # -> Page
page.wait_for(selector)       # -> Page
page.screenshot(path=None)    # -> bytes
page.html                     # -> str
page.text                     # -> str
page.json()                   # -> dict
page.links()                  # -> list[str]
page.tables()                 # -> list[list[list[str]]]
page.evaluate("document.title")  # -> any
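Chaining works because each interaction method hands back the object it was called on. The toy class below illustrates the pattern only; `FakePage` is a hypothetical stand-in that records actions instead of driving a browser.

```python
class FakePage:
    """Hypothetical stand-in that records actions instead of driving a browser."""

    def __init__(self):
        self.actions = []

    def type(self, selector: str, text: str) -> "FakePage":
        self.actions.append(("type", selector, text))
        return self  # returning self is what makes chaining possible

    def click(self, selector: str) -> "FakePage":
        self.actions.append(("click", selector))
        return self

    def wait_for(self, selector: str) -> "FakePage":
        self.actions.append(("wait_for", selector))
        return self


page = (
    FakePage()
    .type("#email", "user@example.com")
    .click("[type=submit]")
    .wait_for(".dashboard")
)
print(len(page.actions))  # 3
```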

Element

el.text               # -> str
el.attr(name)         # -> str
el.attrs              # -> dict
el.find(selector)     # -> Element | None
el.click()            # -> Element
el.is_visible()       # -> bool
el.bounding_box()     # -> dict
if el:                # an Element is always truthy, so this is a presence check
    ...
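Because find is documented as returning Element | None, a plain truthiness test separates "found" from "not found". The sketch below shows the idiom with a toy lookup; `FakeElement` and the dict-based `find` are hypothetical stand-ins, not crawlix classes.

```python
from typing import Optional


class FakeElement:
    """Hypothetical stand-in for an element with a text attribute."""

    def __init__(self, text: str):
        self.text = text


def find(dom: dict, selector: str) -> Optional[FakeElement]:
    """Look up selector in a toy dict 'DOM'; None when nothing matches."""
    value = dom.get(selector)
    return FakeElement(value) if value is not None else None


dom = {"h1": "Hello", "p.empty": ""}

title = find(dom, "h1")
missing = find(dom, ".does-not-exist")
empty = find(dom, "p.empty")

# Objects without __bool__ or __len__ are truthy, so `if el:` only
# distinguishes "found" from None, even when the element's text is empty.
print(title.text if title else "n/a")      # Hello
print(missing.text if missing else "n/a")  # n/a
print(bool(empty))                         # True
```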

Backend Feature Matrix

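| Feature | requests | playwright | selenium | httpx |
| --- | --- | --- | --- | --- |
| JS execution | | | | |
| Click/type/hover | | | | |
| Screenshot/PDF | | | | |
| Network intercept | | | | |
| Async | | | | |
| Wait/retry | | | | |
| File upload | | | | |
| Proxy | | | | |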

Examples

Proxy

with Browser(proxy="http://user:pass@proxy:8080") as b:
    page = b.open("https://ipinfo.io/json")
    print(page.json()["ip"])
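The proxy string follows the standard URL shape scheme://user:pass@host:port. The standard library can split it into its parts, which helps when checking a proxy setting before passing it in; how crawlix itself parses the value is not shown in this README.

```python
from urllib.parse import urlsplit

parts = urlsplit("http://user:pass@proxy:8080")

print(parts.scheme)    # http
print(parts.username)  # user
print(parts.password)  # pass
print(parts.hostname)  # proxy
print(parts.port)      # 8080
```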

Table extraction

with Browser() as b:
    page = b.open("https://en.wikipedia.org/wiki/Python_(programming_language)")
    for row in page.tables()[0]:
        print(row)
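The list[list[list[str]]] shape returned by tables() is a list of tables, each a list of rows, each a list of cell strings. The stdlib-only parser below produces the same shape from raw HTML. It is an illustration of the data structure, not crawlix's extraction code; `TableParser` is a hypothetical name and nested tables are not handled.

```python
from html.parser import HTMLParser


class TableParser(HTMLParser):
    """Collect every <table> as a list of rows, each row a list of cell strings."""

    def __init__(self):
        super().__init__()
        self.tables = []
        self._row = None
        self._cell = None

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self._row = []
        elif tag in ("td", "th") and self._row is not None:
            self._cell = []

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            self._row.append("".join(self._cell).strip())
            self._cell = None
        elif tag == "tr" and self._row is not None:
            self.tables[-1].append(self._row)
            self._row = None


html = (
    "<table><tr><th>Name</th><th>Year</th></tr>"
    "<tr><td>Python</td><td>1991</td></tr></table>"
)
parser = TableParser()
parser.feed(html)
print(parser.tables)  # [[['Name', 'Year'], ['Python', '1991']]]
```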

File upload

with Browser(backend="playwright") as b:
    page = b.open("https://example.com/upload")
    page.upload("#file-input", "/path/to/file.pdf")
    page.click("#submit")
    page.wait_for(".success")

Network intercept

with Browser(backend="playwright") as b:
    page = b.open("https://example.com")
    page.intercept("**/api/**", lambda req: print(req.url))
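The "**/api/**" argument reads as a glob over request URLs. The exact matching rules belong to the backend and are not documented here; as a rough approximation, the standard library's fnmatch (where * also spans slashes) gives a feel for which URLs such a pattern would select.

```python
from fnmatch import fnmatchcase

PATTERN = "**/api/**"

urls = [
    "https://example.com/api/users",
    "https://example.com/v2/api/items/7",
    "https://example.com/about",
]

# Keep only the URLs that contain an "/api/" path segment.
matched = [u for u in urls if fnmatchcase(u, PATTERN)]
print(matched)
```

Real browser-side globs may treat * and ** differently, so this approximation is only suitable for quick sanity checks.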

Exceptions

from crawlix.exceptions import (
    CrawlixError,       # base — catch-all
    BackendError,       # backend unavailable or op not supported
    TimeoutError,       # wait exceeded timeout
    NavigationError,    # page failed to load
    SelectorError,      # invalid selector or element not found
    NetworkError,       # connection error
    JavaScriptError,    # JS evaluation failed
)

[!NOTE] BackendError always includes an install hint. For example, calling screenshot() on the requests backend raises: BackendError: screenshot() requires a browser backend. Install: pip install crawlix[playwright]
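With a single base class, callers can handle the specific case they care about and let everything else fall through to one catch-all. The classes below are local stand-ins that mirror the names listed above (the real ones live in crawlix.exceptions); `risky` and `run` are hypothetical helpers for the demonstration.

```python
class CrawlixError(Exception):
    """Stand-in base class (hypothetical local definition)."""


class BackendError(CrawlixError):
    """Stand-in: backend unavailable or operation not supported."""


class SelectorError(CrawlixError):
    """Stand-in: invalid selector or element not found."""


def risky(kind: str) -> str:
    if kind == "backend":
        raise BackendError(
            "screenshot() requires a browser backend. "
            "Install: pip install crawlix[playwright]"
        )
    if kind == "selector":
        raise SelectorError("no element matches '#missing'")
    return "ok"


def run(kind: str) -> str:
    try:
        return risky(kind)
    except BackendError as err:   # most specific handler first
        return f"backend: {err}"
    except CrawlixError as err:   # catch-all for the rest of the family
        return f"other: {err}"


print(run("backend"))
print(run("selector"))
print(run("fine"))
```

One practical caution: importing a name like TimeoutError from a library module rebinds the built-in TimeoutError in that file, so an alias such as `import ... TimeoutError as CrawlixTimeout` can avoid surprises.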



Examples

Ready-to-run scripts in examples/:

| Example | Platform | Run command |
| --- | --- | --- |
| scrape_news.py | PC + Android | python examples/scrape_news.py |
| android_scraper.py | Android (Termux) | pip install crawlix[termux] && python examples/android_scraper.py |
| browser_login.py | PC | pip install crawlix[playwright] && playwright install chromium && python examples/browser_login.py |
| async_scraper.py | PC + Android | pip install crawlix[async] && python examples/async_scraper.py |

Development

git clone https://github.com/keyreyla/crawlix.git
cd crawlix
python -m venv .venv && source .venv/bin/activate
pip install -e ".[full]"
pip install pytest ruff mypy
pytest
