One API. Any backend. Full browser automation to lightweight scraping.

crawlix

crawlix is a Python browser automation and web scraping library with a unified API across multiple backends. Write your code once, then switch between lightweight HTTP scraping and full browser automation by changing only the backend argument.

from crawlix import Browser

# Zero-setup scraping — auto-detects best backend
with Browser() as b:
    page = b.open("https://example.com")
    print(page.find("h1").text)

# Full browser automation — same exact API
with Browser(backend="playwright") as b:
    page = b.open("https://example.com")
    page.type("#email", "user@example.com")
    page.click("[type=submit]")
    page.wait_for(".dashboard")
    page.screenshot("result.png")

Install

pip install crawlix                    # core (requests + BeautifulSoup)
pip install crawlix[playwright]        # full browser via Playwright
pip install crawlix[selenium]          # full browser via Selenium
pip install crawlix[async]             # async support via httpx
pip install crawlix[full]              # everything above
pip install crawlix[termux]            # Termux/Android (no Playwright)

Tip: After installing a browser backend, run crawlix setup all to download the required browsers and drivers automatically.

crawlix setup playwright   # install Playwright + Chromium
crawlix setup selenium     # install Selenium (drivers auto-managed)
crawlix setup all          # install everything
crawlix doctor             # check system and diagnose issues

Features

| Feature | Details |
| --- | --- |
| Unified API | Same code for HTTP scraping and browser automation. Change backend=, not your code. |
| Auto-detect | Picks the best available backend: playwright → selenium → httpx → requests. No config needed. |
| Zero hard deps | Core depends only on requests + beautifulsoup4. Backends are optional extras. |
| Stealth by default | Realistic headers, user-agent rotation, no bot fingerprinting out of the box. |
| Context manager | Resources are cleaned up automatically. Works with both with and async with blocks. |
| Helpful errors | BackendError tells you exactly what to install. No silent failures, no traceback soup. |
| CLI tools | crawlix setup installs backends. crawlix doctor diagnoses your environment. |
| Termux ready | Works on Android via Termux. Use pip install crawlix[termux]. |
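The auto-detect order and the "tells you what to install" behaviour from the table can be illustrated with a small standard-library sketch. This is not crawlix's implementation: the `BackendError` class here is a local stand-in, `resolve_backend` and its `installed` hook are hypothetical names, and the message wording is invented. Only the preference order and the pip commands come from this page.

```python
from importlib.util import find_spec

# Preference order quoted from the feature table.
PREFERENCE = ("playwright", "selenium", "httpx", "requests")

# Install commands taken from the Install section.
INSTALL_HINT = {
    "playwright": "pip install crawlix[playwright]",
    "selenium": "pip install crawlix[selenium]",
    "httpx": "pip install crawlix[async]",
    "requests": "pip install crawlix",
}


class BackendError(RuntimeError):
    """Local stand-in for crawlix's BackendError (not the real class)."""


def is_installed(module_name):
    """Return True if a top-level module is importable in this environment."""
    return find_spec(module_name) is not None


def resolve_backend(requested="auto", installed=is_installed):
    """Return a backend name, or raise BackendError with an install hint."""
    if requested == "auto":
        for name in PREFERENCE:
            if installed(name):
                return name
        raise BackendError("No backend found. Try: " + INSTALL_HINT["requests"])
    if requested not in PREFERENCE:
        raise BackendError(f"Unknown backend {requested!r}; expected one of {PREFERENCE}")
    if not installed(requested):
        raise BackendError(
            f"Backend {requested!r} is not installed. Try: {INSTALL_HINT[requested]}"
        )
    return requested
```

The `installed` parameter exists only so the lookup can be exercised without installing anything, for example `resolve_backend("auto", installed=lambda name: name == "httpx")`.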

Quick start

from crawlix import Browser

# Detect backends lazily
with Browser() as b:
    page = b.open("https://news.ycombinator.com")
    for item in page.find_all(".titleline > a"):
        print(item.text, item.attr("href"))

# One-shot helpers
from crawlix import get, fetch

data = get("https://api.github.com/users/keyreyla").json()
html = fetch("https://example.com")

# Async API
import asyncio
from crawlix.async_api import AsyncBrowser

async def main():
    async with AsyncBrowser() as b:
        page = await b.open("https://example.com")
        print(page.title)

asyncio.run(main())
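When several pages are needed, the async form is usually combined with a concurrency limit. The sketch below uses only asyncio and takes the page-opening coroutine as an argument, so it makes no claim about crawlix internals. `gather_titles` and `open_page` are hypothetical names; with crawlix, `open_page` would presumably be `b.open` inside an `async with AsyncBrowser() as b:` block, but whether one AsyncBrowser supports concurrent `open` calls is not stated here.

```python
import asyncio


async def gather_titles(open_page, urls, limit=5):
    """Open each URL with at most `limit` calls in flight; return titles in input order."""
    gate = asyncio.Semaphore(limit)

    async def one(url):
        # The semaphore caps how many open_page calls run at the same time.
        async with gate:
            page = await open_page(url)
            return page.title

    # gather preserves the order of the awaitables it is given.
    return await asyncio.gather(*(one(url) for url in urls))
```

Passing the opener in keeps the helper independent of any particular backend and makes it easy to try with a fake coroutine first.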

Note: See more examples in the examples/ directory, including browser login flows, table extraction, file uploads, and Android scraping scripts.

API at a glance

Browser

Browser(
    backend="auto",   # "playwright" | "selenium" | "requests" | "httpx"
    headless=True,
    stealth=True,
    timeout=30,
    proxy=None,       # "http://user:pass@host:port"
    locale="en-US",
)

b.open(url)          # -> Page
b.new_page()         # -> Page
b.close()
b.backend_name       # -> str
b.supports_js        # -> bool
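The `proxy` option is shown as a single URL string with inline credentials. Credentials that contain characters such as `@` or `:` need percent-encoding before they are placed in a URL; a minimal standard-library sketch follows. `build_proxy_url` is a hypothetical helper, not part of crawlix, and how each backend decodes the credentials is an assumption that should be checked.

```python
from urllib.parse import quote, urlsplit


def build_proxy_url(host, port, user=None, password=None, scheme="http"):
    """Assemble scheme://user:pass@host:port, percent-encoding the credentials."""
    auth = ""
    if user is not None:
        auth = quote(user, safe="")
        if password is not None:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"{scheme}://{auth}{host}:{port}"


# Example values are made up for illustration.
proxy = build_proxy_url("proxy.example.net", 8080, "alice", "p@ss:word")
parts = urlsplit(proxy)  # parses cleanly because the password is encoded
```

The resulting string could then be passed as `Browser(proxy=proxy)`, following the signature listed above.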

Page

All interaction methods return self for chaining:

page.find(selector)           # -> Element | None
page.find_all(selector)       # -> list[Element]
page.click(selector)          # -> Page (chainable)
page.type(selector, text)     # -> Page (chainable)
page.wait_for(selector)       # -> Page (chainable)
page.screenshot(path=None)    # -> bytes
page.html                     # -> str
page.text                     # -> str
page.json()                   # -> dict
page.links()                  # -> list[str]
page.tables()                 # -> list[list[list[str]]]
page.evaluate("document.title")  # -> any (browser backends)
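`page.tables()` is listed as returning `list[list[list[str]]]`, that is, a list of tables, each a list of rows of cell strings. A common next step is turning one table into records keyed by column name. The sketch below assumes the first row of a table holds the headers, which this page does not confirm; `table_to_records` is a hypothetical helper and the sample data is hand-written.

```python
def table_to_records(table):
    """Treat the first row as headers and map each later row to a dict."""
    if not table:
        return []
    header, *rows = table
    return [dict(zip(header, row)) for row in rows]


# Hand-written sample in the shape listed for page.tables().
tables = [
    [["Name", "Stars"], ["alpha", "120"], ["beta", "45"]],
]
records = table_to_records(tables[0])
```

Cell values stay strings; converting `"120"` to an integer is left to the caller.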

Element

el.text               # -> str
el.attr(name)         # -> str
el.attrs              # -> dict
el.find(selector)     # -> Element | None
el.click()            # -> Element (chainable)
el.is_visible()       # -> bool
el.bounding_box()     # -> dict
if el:                # an Element is always truthy, so this only guards against None from find()
    ...
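Because `find` is listed as returning `Element | None`, a plain truthiness check is enough to handle a missing match. The sketch below tries several selectors in order and falls back to a default. `FakeElement` and `FakePage` are stand-ins written only so the example runs without crawlix, and `first_text` is a hypothetical helper.

```python
class FakeElement:
    """Stand-in exposing only the .text attribute used below."""

    def __init__(self, text):
        self.text = text


class FakePage:
    """Stand-in whose find() returns an element or None, as listed for Page.find."""

    def __init__(self, mapping):
        self._mapping = mapping

    def find(self, selector):
        text = self._mapping.get(selector)
        return FakeElement(text) if text is not None else None


def first_text(node, selectors, default=""):
    """Return the stripped text of the first selector that matches, else default."""
    for selector in selectors:
        el = node.find(selector)
        if el:  # None when nothing matched
            return el.text.strip()
    return default


page = FakePage({"h2": "  Welcome  "})
title = first_text(page, ["h1", "h2"], default="(untitled)")
```

The same helper should work on anything with a compatible `find` method, which by the listings above includes both Page and Element.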

Exceptions

from crawlix.exceptions import CrawlixError, BackendError, TimeoutError
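Importing `TimeoutError` under that name shadows Python's built-in `TimeoutError` in the importing module, so an alias such as `from crawlix.exceptions import TimeoutError as CrawlixTimeout` may be safer. The sketch below shows one way to retry on timeouts only. The three classes are local stand-ins, and the hierarchy (both errors deriving from `CrawlixError`) is an assumption, not something this page states. `with_retries` is a hypothetical helper.

```python
import time


class CrawlixError(Exception):
    """Stand-in base class (assumed hierarchy)."""


class BackendError(CrawlixError):
    """Stand-in for a missing or misconfigured backend."""


class CrawlixTimeout(CrawlixError):
    """Stand-in for crawlix.exceptions.TimeoutError under an alias."""


def with_retries(action, attempts=3, delay=0.0):
    """Call action(); retry on timeout, let every other error propagate at once."""
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except CrawlixTimeout:
            if attempt == attempts:
                raise
            # Linear backoff: wait a little longer after each failed attempt.
            time.sleep(delay * attempt)
```

A backend problem is not retried, since waiting will not install a missing package.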

Backend feature matrix

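| Feature | requests | httpx | selenium | playwright |
| --- | --- | --- | --- | --- |
| JS execution | | | yes | yes |
| Click / type / hover | | | yes | yes |
| Screenshot / PDF | | | yes | yes |
| Network intercept | | | | yes |
| Async support | | yes | | yes |
| Wait / retry | | | yes | yes |
| File upload | | | yes | yes |
| Proxy support | yes | yes | yes | yes |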
