crawlix

One API. Any backend. Full browser automation to lightweight scraping.


crawlix is a Python browser automation and web scraping library with a unified API across multiple backends. Write your code once, then switch between lightweight HTTP scraping and full browser automation by changing only the backend argument.

from crawlix import Browser

# Zero-setup scraping
with Browser() as b:
    page = b.open("https://example.com")
    print(page.find("h1").text)

# Full browser automation — same API
with Browser(backend="playwright") as b:
    page = b.open("https://example.com")
    page.type("#email", "user@example.com")
    page.click("[type=submit]")
    page.wait_for(".dashboard")
    page.screenshot("result.png")

Install

pip install crawlix                      # core: requests + BeautifulSoup
pip install "crawlix[playwright]"        # full browser via Playwright
pip install "crawlix[selenium]"          # full browser via Selenium
pip install "crawlix[async]"             # async support via httpx
pip install "crawlix[full]"              # everything above
pip install "crawlix[termux]"            # for Termux/Android (no Playwright)

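Tip: pip install crawlix with no extras installs only the core dependencies (requests and BeautifulSoup), so it always succeeds. Optional backends are imported on demand, and a missing one produces an error with a clear install hint.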
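Importing backends on demand is a standard way to keep optional dependencies out of the core install. A minimal sketch of the pattern, assuming nothing about crawlix's internals: require_backend and MissingBackend are hypothetical names, and crawlix itself raises its own BackendError.

```python
import importlib
from types import ModuleType


class MissingBackend(ImportError):
    """Hypothetical error that carries a pip install hint."""


def require_backend(module_name: str, extra: str) -> ModuleType:
    """Import an optional dependency only when it is first needed.

    `extra` is the pip extra that provides the module; it is used in the hint.
    """
    try:
        return importlib.import_module(module_name)
    except ImportError as exc:
        raise MissingBackend(
            f"{module_name} is not installed. "
            f'Install: pip install "crawlix[{extra}]"'
        ) from exc


# The import cost is paid only when a browser backend is actually requested.
try:
    require_backend("playwright.sync_api", "playwright")
except MissingBackend as err:
    print(err)
```

Because the import happens inside the function, merely importing the core package never fails on a missing extra.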


Why crawlix?

Problem | Solution
Rewriting code when switching from HTTP to browser scraping | Same API: change backend=, not your code
Heavy dependencies for small tasks | Minimal dependencies: the core uses only requests + bs4
Bot detection blocking your scrapers | Stealth by default: realistic headers, UA rotation
Remembering which backend does what | Auto-detect: picks the best available backend
Confusing error messages | Helpful errors: BackendError tells you exactly what to install

# Auto-detect picks the best backend installed on your system
# Priority: playwright > selenium > requests+bs4
with Browser() as b:
    print(b.backend_name)  # "requests" — or "playwright" if installed
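The priority rule in the comment amounts to a fallback chain. A sketch under that assumption, with hypothetical function names (crawlix may check more than whether a module is installed); importlib.util.find_spec reports whether a module can be found without importing it.

```python
import importlib.util
from typing import Callable

# Priority from the comment above; the core "requests" backend is the fallback.
PRIORITY = ("playwright", "selenium")


def is_installed(module_name: str) -> bool:
    """Return True if the module can be found, without importing it."""
    return importlib.util.find_spec(module_name) is not None


def detect_backend(available: Callable[[str], bool] = is_installed) -> str:
    """Return the first backend in PRIORITY whose module is available."""
    for name in PRIORITY:
        if available(name):
            return name
    return "requests"


print(detect_backend())
```

Passing a custom available function makes the choice testable without installing Playwright or Selenium.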

Quick Start

Scrape a page

from crawlix import Browser

with Browser() as b:
    page = b.open("https://news.ycombinator.com")
    for item in page.find_all(".titleline > a"):
        print(item.text, item.attr("href"))

Quick one-off requests

from crawlix import fetch, get

# One-shot helpers: no Browser context needed
data = get("https://api.github.com/users/keyreyla").json()  # parsed JSON body
html = fetch("https://example.com")                         # page HTML

Automate a login flow

from crawlix import Browser

with Browser(backend="playwright") as b:
    # Interaction methods live on the Page returned by open()
    page = b.open("https://github.com/login")
    page.type("#login_field", "username")
    page.type("#password", "password")
    page.click("[type=submit]")
    page.wait_for(".dashboard-sidebar")
    print("Logged in:", page.url)

Async usage

import asyncio
from crawlix.async_api import AsyncBrowser, aget

async def main():
    async with AsyncBrowser() as b:
        page = await b.open("https://example.com")
        print(page.title)

    page = await aget("https://api.github.com/users/keyreyla")
    print(page.url)

asyncio.run(main())
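Because aget is a coroutine, several requests can run concurrently with asyncio.gather. The sketch below is hypothetical: a stub named fake_aget stands in for crawlix's aget so it runs without network access, and an asyncio.Semaphore bounds how many requests are in flight; whether aget limits connections on its own is not documented here.

```python
from __future__ import annotations

import asyncio


async def fake_aget(url: str) -> str:
    """Stand-in for crawlix.async_api.aget: sleeps instead of fetching."""
    await asyncio.sleep(0.01)
    return url


async def fetch_all(urls: list[str], limit: int = 5) -> list[str]:
    sem = asyncio.Semaphore(limit)  # at most `limit` requests in flight

    async def one(url: str) -> str:
        async with sem:
            return await fake_aget(url)

    # gather returns results in the same order as the input URLs
    return await asyncio.gather(*(one(u) for u in urls))


urls = [f"https://example.com/page/{i}" for i in range(10)]
results = asyncio.run(fetch_all(urls, limit=3))
print(len(results))
```

With the real library, aget would replace fake_aget and the results would be Page objects rather than strings.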

API at a Glance

Browser

Browser(
    backend="auto",   # "playwright", "selenium", "requests", "httpx"
    headless=True,
    stealth=True,
    timeout=30,
    proxy=None,       # "http://user:pass@host:port"
    locale="en-US",
    user_agent=None,
)

b.open(url)          # -> Page
b.new_page()         # -> Page
b.close()
b.backend_name       # -> str
b.supports_js        # -> bool

Page

Interaction methods (click, type, wait_for) return the Page itself, so calls can be chained:

page.find(selector)           # -> Element | None
page.find_all(selector)       # -> list[Element]
page.click(selector)          # -> Page
page.type(selector, text)     # -> Page
page.wait_for(selector)       # -> Page
page.screenshot(path=None)    # -> bytes
page.html                     # -> str
page.text                     # -> str
page.json()                   # -> dict
page.links()                  # -> list[str]
page.tables()                 # -> list[list[list[str]]]
page.evaluate("document.title")  # -> any
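Returning self is what makes a chain like page.type(...).click(...).wait_for(...) possible. A toy sketch of the pattern, using a hypothetical RecordingPage class that logs actions instead of driving a browser; it shows the chaining mechanics only, not crawlix's implementation.

```python
from __future__ import annotations


class RecordingPage:
    """Toy page that records actions; every interaction returns self."""

    def __init__(self) -> None:
        self.actions: list[tuple[str, ...]] = []

    def type(self, selector: str, text: str) -> RecordingPage:
        self.actions.append(("type", selector, text))
        return self

    def click(self, selector: str) -> RecordingPage:
        self.actions.append(("click", selector))
        return self

    def wait_for(self, selector: str) -> RecordingPage:
        self.actions.append(("wait_for", selector))
        return self


page = (
    RecordingPage()
    .type("#email", "user@example.com")
    .click("[type=submit]")
    .wait_for(".dashboard")
)
print(page.actions)
```

Methods that produce data (html, tables(), screenshot()) cannot return self, which is why only the interaction methods chain.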

Element

el.text               # -> str
el.attr(name)         # -> str
el.attrs              # -> dict
el.find(selector)     # -> Element | None
el.click()            # -> Element
el.is_visible()       # -> bool
el.bounding_box()     # -> dict
if el:                # Elements are always truthy; find() returns None on no match
    ...
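Since find() returns Element | None and an Element is always truthy, "if el:" separates "found" from "not found" even when the element's text is empty. A self-contained sketch of that contract, with hypothetical Element and find stand-ins backed by a dict instead of a real DOM; the walrus operator keeps the lookup and the check on one line.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class Element:
    text: str

    def __bool__(self) -> bool:
        # Always truthy, even when text is empty, so `if el:` means "found".
        return True


FAKE_DOM = {"h1": Element("Example Domain"), "p.empty": Element("")}


def find(selector: str) -> Element | None:
    return FAKE_DOM.get(selector)


for sel in ("h1", "p.empty", "nav"):
    if (el := find(sel)):
        print(sel, "->", repr(el.text))
    else:
        print(sel, "-> not found")
```

Defining __bool__ explicitly matters if a class also defines __len__: without it, an element with zero children would be falsy and break the presence check.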

Backend Feature Matrix

Feature requests playwright selenium httpx
JS execution
Click/type/hover
Screenshot/PDF
Network intercept
Async
Wait/retry
File upload
Proxy

Examples

Proxy

with Browser(proxy="http://user:pass@proxy:8080") as b:
    page = b.open("https://ipinfo.io/json")
    print(page.json()["ip"])
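The proxy string uses the usual scheme://user:pass@host:port form. Browser backends often want these parts separately (Playwright's launch proxy option, for example, takes server, username and password), so a stdlib sketch of the split may help; split_proxy is a hypothetical helper, not part of crawlix.

```python
from __future__ import annotations

from urllib.parse import unquote, urlsplit


def split_proxy(proxy_url: str) -> dict[str, str | None]:
    """Split scheme://[user:pass@]host:port into server and credentials.

    IPv6 literal hosts are not handled in this sketch.
    """
    parts = urlsplit(proxy_url)
    if not parts.hostname or parts.port is None:
        raise ValueError(f"expected scheme://[user:pass@]host:port, got {proxy_url!r}")
    return {
        "server": f"{parts.scheme}://{parts.hostname}:{parts.port}",
        # Credentials may be percent-encoded, e.g. "@" in a password as %40
        "username": unquote(parts.username) if parts.username else None,
        "password": unquote(parts.password) if parts.password else None,
    }


print(split_proxy("http://user:pass@proxy:8080"))
```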

Table extraction

with Browser() as b:
    page = b.open("https://en.wikipedia.org/wiki/Python_(programming_language)")
    for row in page.tables()[0]:
        print(row)
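tables() returns list[list[list[str]]]: a list of tables, each a list of rows, each a list of cell strings. To make that shape concrete, here is a stdlib-only sketch that builds the same nesting with html.parser. It illustrates the data structure under simple assumptions (no colspan/rowspan handling, no nested tables); TableCollector is a hypothetical name, and this is not how crawlix parses pages.

```python
from __future__ import annotations

from html.parser import HTMLParser


class TableCollector(HTMLParser):
    """Collect <table> contents as tables -> rows -> cell text."""

    def __init__(self) -> None:
        super().__init__()
        self.tables: list[list[list[str]]] = []
        self._cell: list[str] | None = None  # text chunks of the open cell

    def handle_starttag(self, tag, attrs):
        if tag == "table":
            self.tables.append([])
        elif tag == "tr" and self.tables:
            self.tables[-1].append([])
        elif tag in ("td", "th") and self.tables and self.tables[-1]:
            self._cell = []

    def handle_endtag(self, tag):
        if tag in ("td", "th") and self._cell is not None:
            # Collapse runs of whitespace inside the cell to single spaces
            self.tables[-1][-1].append(" ".join("".join(self._cell).split()))
            self._cell = None

    def handle_data(self, data):
        if self._cell is not None:
            self._cell.append(data)


html = """
<table>
  <tr><th>Language</th><th>Typing</th></tr>
  <tr><td>Python</td><td>dynamic</td></tr>
</table>
"""
parser = TableCollector()
parser.feed(html)
print(parser.tables)
```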

File upload

with Browser(backend="playwright") as b:
    page = b.open("https://example.com/upload")
    page.upload("#file-input", "/path/to/file.pdf")
    page.click("#submit")
    page.wait_for(".success")

Network intercept

with Browser(backend="playwright") as b:
    page = b.open("https://example.com")
    page.intercept("**/api/**", lambda req: print(req.url))
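Patterns like "**/api/**" are URL globs. Assuming crawlix follows Playwright's convention, where ** matches any characters including "/" and * matches anything except "/", a minimal stdlib sketch of the matching looks like this; glob_to_regex is a hypothetical helper, and Playwright's own matcher supports more syntax than these two wildcards.

```python
from __future__ import annotations

import re


def glob_to_regex(pattern: str) -> re.Pattern[str]:
    """Translate a URL glob: '**' -> any chars, '*' -> any chars except '/'."""
    out = []
    i = 0
    while i < len(pattern):
        if pattern.startswith("**", i):
            out.append(".*")
            i += 2
        elif pattern[i] == "*":
            out.append("[^/]*")
            i += 1
        else:
            out.append(re.escape(pattern[i]))  # everything else is literal
            i += 1
    return re.compile("^" + "".join(out) + "$")


api = glob_to_regex("**/api/**")
for url in ("https://example.com/api/users", "https://example.com/static/app.js"):
    print(url, bool(api.match(url)))
```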

Exceptions

from crawlix.exceptions import (
    CrawlixError,       # base — catch-all
    BackendError,       # backend unavailable or op not supported
    TimeoutError,       # wait exceeded timeout (shadows the built-in TimeoutError)
    NavigationError,    # page failed to load
    SelectorError,      # invalid selector or element not found
    NetworkError,       # connection error
    JavaScriptError,    # JS evaluation failed
)

Note: BackendError always includes an install hint. For example, calling screenshot() on the requests backend raises:

BackendError: screenshot() requires a browser backend. Install: pip install crawlix[playwright]
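Because every exception in the list derives from CrawlixError, one except clause can act as a catch-all while more specific handlers come first. A self-contained sketch of that arrangement: the classes are local stand-ins named after the list above (not imported from crawlix), and require_browser is a hypothetical guard that builds a message in the format shown in the note.

```python
class CrawlixError(Exception):
    """Stand-in base class: catching it catches every subclass."""


class BackendError(CrawlixError):
    """Stand-in: backend unavailable or operation not supported."""


# Assumption: HTTP-only backends lack screenshots (the note confirms requests).
NON_BROWSER_BACKENDS = {"requests", "httpx"}


def require_browser(operation: str, backend: str) -> None:
    """Hypothetical guard mirroring the message format in the note."""
    if backend in NON_BROWSER_BACKENDS:
        raise BackendError(
            f"{operation}() requires a browser backend. "
            "Install: pip install crawlix[playwright]"
        )


def try_screenshot(backend: str) -> str:
    try:
        require_browser("screenshot", backend)
        return "ok"
    except BackendError as err:  # specific handler first
        return f"skipped: {err}"
    except CrawlixError as err:  # catch-all for any other crawlix failure
        return f"failed: {err}"


print(try_screenshot("requests"))
print(try_screenshot("playwright"))
```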


Development

git clone https://github.com/keyreyla/crawlix.git
cd crawlix
python -m venv .venv && source .venv/bin/activate
pip install -e ".[full]"
pip install pytest ruff mypy
pytest



Download files


Source Distribution

crawlix-0.1.1.tar.gz (44.1 kB)

Uploaded Source

Built Distribution


crawlix-0.1.1-py3-none-any.whl (19.7 kB)

Uploaded Python 3

File details

Details for the file crawlix-0.1.1.tar.gz.

File metadata

  • Download URL: crawlix-0.1.1.tar.gz
  • Upload date:
  • Size: 44.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for crawlix-0.1.1.tar.gz
Algorithm Hash digest
SHA256 24cbe8543ab29087df7483bb5380ae1705c2212529aa7495c8e155ebbeaf3563
MD5 b9a1915305b50a1e9c7798407e6d4d28
BLAKE2b-256 df295201480f8ed75d3f41f875954e26f78d56e46fceeeff7d019678f0bd8955


File details

Details for the file crawlix-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: crawlix-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 19.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for crawlix-0.1.1-py3-none-any.whl
Algorithm Hash digest
SHA256 780abd499d9bfc2a59f6c6008d0cc522029247e4e4529728f9338cf50cc96979
MD5 0fa014da2e604e15838b7ba21f5b2d95
BLAKE2b-256 2d952cf9c71cadb077ae7d50a402fb4ce8d004ff9b2528ef3cc454f1acd8dd85

