One API. Any backend. Full browser automation to lightweight scraping.

crawlix

crawlix is a Python browser automation and web scraping library with a unified API across multiple backends. Write your code once, then switch between lightweight HTTP scraping and full browser automation by changing only the backend argument.

from crawlix import Browser

# Zero-setup scraping — auto-detects best backend
with Browser() as b:
    page = b.open("https://example.com")
    print(page.find("h1").text)

# Full browser automation — same exact API
with Browser(backend="playwright") as b:
    page = b.open("https://example.com")
    page.type("#email", "user@example.com")
    page.click("[type=submit]")
    page.wait_for(".dashboard")
    page.screenshot("result.png")

Install

pip install crawlix                    # core (requests + BeautifulSoup)
pip install crawlix[playwright]        # full browser via Playwright
pip install crawlix[selenium]          # full browser via Selenium
pip install crawlix[async]             # async support via httpx
pip install crawlix[full]              # everything above
pip install crawlix[termux]            # Termux/Android (no Playwright)

[!TIP] After installing a browser backend, run crawlix setup all to automatically download browsers and drivers.

crawlix setup playwright   # install Playwright + Chromium
crawlix setup selenium     # install Selenium (drivers auto-managed)
crawlix setup all          # install everything
crawlix doctor             # check system and diagnose issues

Features

| Feature | Details |
|---|---|
| Unified API | Same code for HTTP scraping and browser automation. Change backend=, not your code. |
| Auto-detect | Picks the best available backend, tried in the order playwright → selenium → httpx → requests. No config needed. |
| Minimal hard deps | Core depends only on requests + beautifulsoup4. Backends are optional extras. |
| Stealth by default | Realistic headers, user-agent rotation, no bot fingerprinting out of the box. |
| Context manager | Resources are cleaned up automatically. Works with both with and async with. |
| Helpful errors | BackendError tells you exactly what to install. No silent failures, no traceback soup. |
| CLI tools | crawlix setup installs backends. crawlix doctor diagnoses your environment. |
| Termux ready | Works on Android via Termux. Use pip install crawlix[termux]. |
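
The preference order in the Auto-detect entry above can be pictured as a first-match search over installed modules. The following is a minimal sketch of that idea, not crawlix's actual implementation: the function name pick_backend and the injectable is_installed check are hypothetical, and only the order itself comes from the feature list.

```python
from importlib.util import find_spec
from typing import Callable, Optional, Sequence

# Preference order as documented: most capable backend first.
PREFERENCE = ("playwright", "selenium", "httpx", "requests")


def _module_installed(name: str) -> bool:
    """Return True if a top-level module can be found, without importing it."""
    return find_spec(name) is not None


def pick_backend(
    order: Sequence[str] = PREFERENCE,
    is_installed: Callable[[str], bool] = _module_installed,
) -> Optional[str]:
    """Return the first backend in `order` that is installed, else None."""
    for name in order:
        if is_installed(name):
            return name
    return None


if __name__ == "__main__":
    # Result depends on which packages are present in the environment.
    print(pick_backend())
```

Passing is_installed as a parameter keeps the selection logic testable without installing or uninstalling any backend.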

Quick start

from crawlix import Browser

# Scrape with the auto-detected backend (backends are detected lazily)
with Browser() as b:
    page = b.open("https://news.ycombinator.com")
    for item in page.find_all(".titleline > a"):
        print(item.text, item.attr("href"))

# One-off HTTP requests without a Browser object
from crawlix import get, fetch

data = get("https://api.github.com/users/keyreyla").json()
html = fetch("https://example.com")

# Async usage
import asyncio
from crawlix.async_api import AsyncBrowser

async def main():
    async with AsyncBrowser() as b:
        page = await b.open("https://example.com")
        print(page.title)

asyncio.run(main())

[!NOTE] See more examples in the examples/ directory, including browser login flows, table extraction, file uploads, and Android scraping scripts.


API at a glance

Browser

Browser(
    backend="auto",   # "playwright" | "selenium" | "requests" | "httpx"
    headless=True,
    stealth=True,
    timeout=30,
    proxy=None,       # "http://user:pass@host:port"
    locale="en-US",
)

b.open(url)          # -> Page
b.new_page()         # -> Page
b.close()
b.backend_name       # -> str
b.supports_js        # -> bool
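
The proxy argument is a single URL string of the form shown in the comment above. As a sketch of what that string encodes, the standard library can build and split it. Percent-encoding the credentials is a general URL rule rather than something these docs state, so whether crawlix expects encoded credentials is an assumption; the helper name build_proxy_url and the host proxy.example.com are made up for illustration.

```python
from urllib.parse import quote, unquote, urlsplit


def build_proxy_url(host: str, port: int, user: str = "", password: str = "",
                    scheme: str = "http") -> str:
    """Assemble scheme://user:pass@host:port, percent-encoding the credentials."""
    auth = ""
    if user:
        auth = quote(user, safe="")
        if password:
            auth += ":" + quote(password, safe="")
        auth += "@"
    return f"{scheme}://{auth}{host}:{port}"


# A password containing "@" and ":" would break the URL if left unencoded.
proxy = build_proxy_url("proxy.example.com", 8080, "user", "p@ss:word")
parts = urlsplit(proxy)

print(proxy)
print(parts.hostname, parts.port, parts.username, unquote(parts.password))
```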

Page

All interaction methods return self for chaining:

page.find(selector)           # -> Element | None
page.find_all(selector)       # -> list[Element]
page.click(selector)          # -> Page (chainable)
page.type(selector, text)     # -> Page (chainable)
page.wait_for(selector)       # -> Page (chainable)
page.screenshot(path=None)    # -> bytes
page.html                     # -> str
page.text                     # -> str
page.json()                   # -> dict
page.links()                  # -> list[str]
page.tables()                 # -> list[list[list[str]]]
page.evaluate("document.title")  # -> any (browser backends)
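
page.tables() is documented above as returning list[list[list[str]]]: a list of tables, each a list of rows, each a list of cell strings. Below is a minimal sketch of turning one such table into dictionaries. It assumes the first row holds the column headers, which the listing above does not confirm; the helper name rows_to_dicts and the sample data are invented for illustration.

```python
from typing import Dict, List

Table = List[List[str]]


def rows_to_dicts(table: Table) -> List[Dict[str, str]]:
    """Map each data row to {header: cell}, treating row 0 as the header row."""
    if not table:
        return []
    header, *rows = table
    # zip() stops at the shorter sequence: extra cells in a long row are
    # dropped, and a short row simply omits the missing keys.
    return [dict(zip(header, row)) for row in rows]


# Stand-in for one entry of page.tables(); not real scraped data.
sample: Table = [
    ["Name", "Stars"],
    ["alpha", "12"],
    ["beta", "7"],
]

records = rows_to_dicts(sample)
print(records)
```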

Element

el.text               # -> str
el.attr(name)         # -> str
el.attrs              # -> dict
el.find(selector)     # -> Element | None
el.click()            # -> Element (chainable)
el.is_visible()       # -> bool
el.bounding_box()     # -> dict
if el:                # an Element is always truthy, so this is a safe presence check when find() may return None
    ...

Exceptions

from crawlix.exceptions import CrawlixError, BackendError, TimeoutError
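
Note that TimeoutError is also the name of a Python builtin, so importing it unqualified shadows the builtin within that module. The sketch below uses stand-in classes to illustrate two habits: catching the specific errors before the base class, and importing the library's timeout under an alias so it stays distinguishable from the builtin. The inheritance shown here (everything deriving from CrawlixError) and the alias CrawlTimeout are assumptions, not confirmed details of crawlix.

```python
import builtins


# Stand-ins mirroring the documented names; the hierarchy is an assumption.
class CrawlixError(Exception):
    """Assumed base class for library errors."""


class BackendError(CrawlixError):
    """Assumed to signal a missing or unusable backend."""


class CrawlTimeout(CrawlixError):
    """Plays the role of crawlix.exceptions.TimeoutError under an alias, as in
    `from crawlix.exceptions import TimeoutError as CrawlTimeout`."""


def classify(exc: Exception) -> str:
    """Handle the most specific error types first and the base class last."""
    try:
        raise exc
    except BackendError:
        return "install or switch backend"
    except CrawlTimeout:
        return "retry with a longer timeout"
    except CrawlixError:
        return "other crawlix failure"
    except builtins.TimeoutError:
        return "non-crawlix timeout"


print(classify(BackendError("playwright missing")))
print(classify(CrawlTimeout("30s elapsed")))
print(classify(builtins.TimeoutError()))
```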

Backend feature matrix

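| Feature | requests | httpx | selenium | playwright |
|---|---|---|---|---|
| JS execution | | | yes | yes |
| Click / type / hover | | | yes | yes |
| Screenshot / PDF | | | yes | yes |
| Network intercept | | | | yes |
| Async support | | yes | | yes |
| Wait / retry | | | yes | yes |
| File upload | | | yes | yes |
| Proxy support | yes | yes | yes | yes |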

Download files

Source Distribution

crawlix-0.3.0.tar.gz (62.6 kB)

Uploaded Source

Built Distribution

crawlix-0.3.0-py3-none-any.whl (28.7 kB)

Uploaded Python 3

File details

Details for the file crawlix-0.3.0.tar.gz.

File metadata

  • Download URL: crawlix-0.3.0.tar.gz
  • Upload date:
  • Size: 62.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for crawlix-0.3.0.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 3cc059d935742435149b5b9acdd26df987aad6af5e7d0d252ecd95c561424318 |
| MD5 | 42e8afe5caf508cec04f17ccea56d8bb |
| BLAKE2b-256 | af4d0a17d4dc5f675b627f5ad64442361a86cf91006adaff80b4caa1f7e4d956 |

File details

Details for the file crawlix-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: crawlix-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 28.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.11.15

File hashes

Hashes for crawlix-0.3.0-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | 0602354ec0f8f454cc9d0116383ececd5611f493dff54fd2594fe808f726b045 |
| MD5 | 9a76757317d7aedcfd39689f339fd65b |
| BLAKE2b-256 | eb858612818652123a59d0540a3ae77a8409b9f55f8e53f96a6d24d48090e7f8 |
