One API. Any backend. Full browser automation to lightweight scraping.

crawlix

PyPI: pip install crawlix | Author: keylordelrey | License: MIT | Python: 3.10+


What crawlix is

crawlix is a Python browser automation and web scraping library with a unified API across multiple backends. The same code works whether you are doing simple HTTP scraping or full Playwright-powered browser automation — you switch backends, not code.

from crawlix import Browser

with Browser() as b:
    page = b.open("https://example.com")
    print(page.find("h1").text)

with Browser(backend="playwright") as b:
    page = b.open("https://example.com")
    page.click("#login")
    page.type("#email", "user@example.com")
    page.submit("form")
    page.wait_for(".dashboard")
    page.screenshot("result.png")

Install

pip install crawlix
pip install "crawlix[playwright]"
pip install "crawlix[selenium]"
pip install "crawlix[async]"
pip install "crawlix[full]"

Core Design Rules

  1. Same API across all backends: switching backends never requires rewriting user code
  2. Auto-detect the best available backend: no config needed, crawlix figures it out
  3. Zero hard dependencies: pip install crawlix always succeeds
  4. Fail with helpful errors: BackendError tells you exactly what to install
  5. Context manager always: resources are always cleaned up properly
  6. Stealth on by default: realistic headers, UA rotation, no bot fingerprint
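
Rule 5 relies on Python's context manager protocol. The sketch below shows that mechanism only. ManagedSession is a hypothetical stand-in written for this example, not crawlix's Browser class, and it assumes nothing about crawlix internals beyond the documented close() method.

```python
class ManagedSession:
    """Hypothetical stand-in for a browser-like resource (not crawlix code)."""

    def __init__(self):
        self.closed = False

    def close(self):
        # Release whatever the session holds; here we only record the call.
        self.closed = True

    def __enter__(self):
        return self

    def __exit__(self, exc_type, exc, tb):
        # Runs on normal exit and when the body raises.
        self.close()
        return False  # do not swallow the exception


session = None
try:
    with ManagedSession() as session:
        raise ValueError("simulated failure inside the block")
except ValueError:
    pass

print(session.closed)
```

Because __exit__ runs even when the body raises, cleanup does not depend on the caller remembering to call close().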

Backend Priority

playwright > selenium > requests+bs4 (core)

Override anytime:

Browser(backend="playwright")
Browser(backend="requests")
Browser(backend="selenium")
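
One way such priority-based detection could work is sketched below. This is not crawlix's actual implementation: resolve_backend, CANDIDATES, the module names, and the install hints are all assumptions made for illustration, based only on the priority order and the extras named in the Install section.

```python
import importlib.util

# Highest priority first, following the ordering above. Each entry is
# (backend name, modules it is assumed to need, assumed pip target).
CANDIDATES = (
    ("playwright", ("playwright",), "crawlix[playwright]"),
    ("selenium", ("selenium",), "crawlix[selenium]"),
    ("requests", ("requests", "bs4"), "requests beautifulsoup4"),
)


def _installed(module):
    # True when the top-level module can be imported in this environment.
    return importlib.util.find_spec(module) is not None


def resolve_backend(requested="auto", installed=_installed):
    """Pick a backend name, or raise LookupError with an install hint."""
    for name, modules, pip_target in CANDIDATES:
        usable = all(installed(m) for m in modules)
        if requested == "auto":
            if usable:
                return name  # first usable entry wins
        elif requested == name:
            if usable:
                return name
            raise LookupError(
                f"backend {name!r} is not installed; try: pip install '{pip_target}'"
            )
    if requested == "auto":
        raise LookupError("no backend is installed")
    raise LookupError(f"unknown backend {requested!r}")
```

The installed parameter exists so the lookup can be replaced in tests. By default the function inspects the current environment.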

Quick Examples

from crawlix import Browser, get, fetch

with Browser() as b:
    page = b.open("https://news.ycombinator.com")
    for item in page.find_all(".titleline > a"):
        print(item.text, item.attr("href"))

data = get("https://api.github.com/users/keyreyla").json()
html = fetch("https://example.com")

For async:

import asyncio
from crawlix.async_api import AsyncBrowser

async def main():
    async with AsyncBrowser() as b:
        page = await b.open("https://example.com")
        print(page.html)

asyncio.run(main())
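
When fetching many URLs with the async API, it usually helps to cap how many requests are in flight. The helper below is a generic asyncio sketch, not part of crawlix. fetch_all and its fetch_one parameter are hypothetical names, and fetch_one can be any coroutine function that takes a URL.

```python
import asyncio


async def fetch_all(urls, fetch_one, limit=5):
    """Run fetch_one(url) for every URL with at most `limit` in flight."""
    gate = asyncio.Semaphore(limit)

    async def guarded(url):
        async with gate:
            return await fetch_one(url)

    # gather returns results in the same order as the input URLs.
    return await asyncio.gather(*(guarded(u) for u in urls))
```

With crawlix, fetch_one could be a coroutine that awaits b.open(url) and returns page.html. Whether a single AsyncBrowser can safely open several pages concurrently is not stated here, so treat that as something to verify.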

API Overview

Browser

Browser(backend="auto", headless=True, stealth=True, timeout=30, proxy=None, locale="en-US", user_agent=None)
b.open(url) -> Page
b.new_page() -> Page
b.close()
b.backend_name -> str
b.supports_js -> bool

Page (click and type return the Page for chaining; lookups and getters return the types shown)

page.find(selector) -> Element | None
page.find_all(selector) -> list[Element]
page.click(selector) -> Page
page.type(selector, text) -> Page
page.screenshot(path=None) -> bytes
page.html -> str
page.text -> str
page.json() -> dict
page.links() -> list[str]
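
Since page.links() returns plain strings, post-processing can be done with the standard library alone. The sketch below keeps only same-host http(s) links. same_site_links is a hypothetical helper, and it assumes links may be relative, which the overview does not specify.

```python
from urllib.parse import urljoin, urlsplit


def same_site_links(links, base_url):
    """Resolve links against base_url and keep unique same-host http(s) URLs."""
    host = urlsplit(base_url).netloc
    seen = set()
    result = []
    for href in links:
        absolute = urljoin(base_url, href)  # leaves absolute URLs unchanged
        parts = urlsplit(absolute)
        if parts.scheme not in ("http", "https") or parts.netloc != host:
            continue  # skip mailto:, javascript:, and other hosts
        if absolute not in seen:
            seen.add(absolute)
            result.append(absolute)
    return result
```

A typical call would pass page.links() as the first argument and the URL given to b.open() as the second.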

Element

el.text -> str
el.attr(name) -> str
el.attrs -> dict
el.find(selector) -> Element | None
el.click() -> Element
bool(el)  # always True
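
Because find returns None when nothing matches, a chained call such as page.find("h1").text raises AttributeError on pages without that element. A small guard avoids this. text_or_default is a hypothetical helper that relies only on the find and text members listed above.

```python
def text_or_default(page, selector, default=""):
    """Return the stripped text of the first match, or `default` if none."""
    element = page.find(selector)
    # Compare against None explicitly: per the overview, an Element is
    # always truthy, so None is the only signal for "no match".
    if element is None:
        return default
    return element.text.strip()
```

The helper works with any object exposing find(selector), so it can be tested without a real backend.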

Exceptions

from crawlix.exceptions import CrawlixError, BackendError, TimeoutError, NavigationError, SelectorError, NetworkError, JavaScriptError
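
Note that importing TimeoutError by name, as in the line above, shadows Python's built-in TimeoutError in the importing module. Importing the module under an alias (import crawlix.exceptions as cx) avoids that. For transient failures, a small retry wrapper is a common pattern. The one below is a generic sketch, not part of crawlix, and which crawlix exceptions count as retriable (for example NetworkError or TimeoutError) is an assumption left to the caller.

```python
import time


def retry(action, retriable, attempts=3, delay=0.5, sleep=time.sleep):
    """Call action() until it succeeds or `attempts` calls have failed.

    `retriable` is an exception class or a tuple of classes.
    """
    for attempt in range(1, attempts + 1):
        try:
            return action()
        except retriable:
            if attempt == attempts:
                raise  # out of attempts: surface the last error
            sleep(delay * attempt)  # linear backoff: delay, 2 * delay, ...
```

Exceptions outside retriable propagate immediately, so selector or programming errors are not retried.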

Development

git clone https://github.com/keyreyla/crawlix.git
cd crawlix
pip install -e ".[full]"
pytest
