Browserless, native-speed web crawler + extractor with a V8 JS-render tier (PyO3 binding over the turbo-surf Rust engine).
turbo-surf (Python)
Browserless, native-speed web crawler and extractor with a real V8 JS-render tier, built as a PyO3 binding over the turbo-surf Rust engine. It is fetch-free: you pass in a page's HTML and get a view back (Markdown, visible text, links, a typed extraction, or an accessibility tree). For JS-gated pages, the engine runs the page's own scripts in a true V8 isolate over a native DOM, with no headless Chromium.
import turbo_surf as ts
html = open("page.html").read()
ts.markdown(html, base_url="https://example.com/") # -> Markdown str
ts.text(html) # -> visible text
ts.links(html, base_url="https://example.com/") # -> list[str]
# Typed extraction: a JSON schema maps field names to selector specs.
schema = '{"title": {"selector": "h1"}, "prices": {"selector": ".price", "list": true}}'
ts.extract(html, schema, base_url="https://example.com/") # -> JSON str
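# JS-gated page: run its own scripts, read the hydrated DOM.
# `script` is a str supplied by the caller; it is not defined in this snippet.
hydrated = ts.render(html, script, base_url="https://example.com/") # -> hydrated HTML str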
Fatal faults (malformed schema JSON, a render-tier failure) raise
turbo_surf.TurboSurfError; the non-JS views never raise.
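The schema string in the example above can be built with `json.dumps` instead of being written by hand, and the fatal-fault case handled in one place. This is a minimal sketch that assumes only what this page states: `extract` takes a JSON schema string and returns a JSON string, and fatal faults raise `turbo_surf.TurboSurfError`. The helper names `build_schema` and `safe_extract` are hypothetical, and the import is deferred so that the schema helper works even where the package is not installed.

```python
import json


def build_schema(fields):
    """Serialize {field_name: (selector, is_list)} into a schema JSON string.

    The {"selector": ..., "list": true} shape is copied from the example
    above; any other spec keys the engine may accept are not assumed here.
    """
    schema = {}
    for name, (selector, is_list) in fields.items():
        spec = {"selector": selector}
        if is_list:
            spec["list"] = True
        schema[name] = spec
    return json.dumps(schema)


def safe_extract(html, fields, base_url=""):
    """Hypothetical wrapper: decoded extraction, or None on a fatal fault."""
    import turbo_surf as ts  # deferred so build_schema has no hard dependency

    try:
        raw = ts.extract(html, build_schema(fields), base_url=base_url)
    except ts.TurboSurfError:
        return None
    return json.loads(raw)  # extract is documented as returning a JSON str


# Same schema as the hand-written string in the example above.
schema_json = build_schema({"title": ("h1", False), "prices": (".price", True)})
```

Returning `None` on failure is only one possible policy; re-raising or logging the exception may suit a crawler better.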
Install
pip install turbo-surf
Prebuilt abi3 wheels (CPython 3.8+) are published for Linux (x86_64/aarch64), macOS (arm64), and Windows (x64).
API
| function | returns | notes |
|---|---|---|
| `markdown(html, base_url="")` | `str` | Markdown render |
| `text(html)` | `str` | visible text |
| `title(html)` | `str` | document `<title>` |
| `html(html)` | `str` | re-serialized HTML |
| `links(html, base_url="")` | `list[str]` | resolved hyperlink targets |
| `interactive_elements(html, base_url="")` | JSON `str` | links/buttons/inputs |
| `accessibility_tree(html)` | JSON `str` | a11y tree |
| `hydration_state(html)` | JSON `str` | hydration probe |
| `detect(html)` | JSON `str` | is the page JS-gated? |
| `query(html, selector, kind=None)` | JSON `str` | `kind` = `"css"`/`"xpath"`/auto |
| `extract(html, schema_json, base_url="")` | JSON `str` | typed extraction |
| `evaluate(html, script)` | `str` | run script over the DOM (sync) |
| `render(html, script, base_url="")` | `str` | hydrated HTML after page scripts run |
| `transform(src, ts=False, jsx=False)` | `str` | TS/JSX → classic JS (swc) |
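Several functions in the table return a JSON string rather than a Python object, so callers usually decode the result themselves. The sketch below shows one way to centralize that step. `call_view` and `JSON_VIEWS` are hypothetical names, the set of JSON-returning functions is taken from the table above, and the stand-in module exists only so the snippet runs without `turbo_surf` installed; its return values are made up and say nothing about the real output shapes.

```python
import json
from types import SimpleNamespace

# Functions the API table lists as returning a JSON string.
JSON_VIEWS = frozenset({
    "interactive_elements",
    "accessibility_tree",
    "hydration_state",
    "detect",
    "query",
    "extract",
})


def call_view(module, name, *args, **kwargs):
    """Call module.<name>(...) and decode the result when the table marks it as JSON."""
    result = getattr(module, name)(*args, **kwargs)
    return json.loads(result) if name in JSON_VIEWS else result


# Stand-in for the real module (assumption: illustrative return values only).
fake_ts = SimpleNamespace(
    title=lambda html: "Example",
    query=lambda html, selector, kind=None: '["Example"]',
)

print(call_view(fake_ts, "title", "<title>Example</title>"))
print(call_view(fake_ts, "query", "<h1>Example</h1>", "h1", kind="css"))
```

With the real package, the first argument would be the imported module, for example `call_view(ts, "detect", html)`.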
MIT licensed.
File details
Details for the file turbo_surf-0.2.7.tar.gz.
File metadata
- Download URL: turbo_surf-0.2.7.tar.gz
- Upload date:
- Size: 503.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `6f89b4d23ae5f2a25a8fe9bb5e177af7ff916645c00cf574247cd2021725fe00` |
| MD5 | `787f6aabc71233a2b0c8b30ee9325895` |
| BLAKE2b-256 | `3bf315ba9c13b7107390ce326e9a78d2401fd22067f8eee1f013d77f0f6e3342` |
File details
Details for the file turbo_surf-0.2.7-cp38-abi3-win_amd64.whl.
File metadata
- Download URL: turbo_surf-0.2.7-cp38-abi3-win_amd64.whl
- Upload date:
- Size: 23.0 MB
- Tags: CPython 3.8+, Windows x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `c7c2448fa091128e8a02391351f35cd731134b5843026def2553a75f6db446ed` |
| MD5 | `262353740266fc7e317e4942d2613aa0` |
| BLAKE2b-256 | `0f4dc3c449b9e648c14496b5104d988784009b2cfdf8c29552e68e3824b96b15` |
File details
Details for the file turbo_surf-0.2.7-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.
File metadata
- Download URL: turbo_surf-0.2.7-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
- Upload date:
- Size: 26.1 MB
- Tags: CPython 3.8+, manylinux: glibc 2.17+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `fad80b8c163b33c69f87a34007a4476bee71c755fef56a2d861fd0fcf2bdfdf2` |
| MD5 | `69102280ad1b016a80c0686c5d97815b` |
| BLAKE2b-256 | `a7ce95f8f13b0f96a4888a053b5d2a3aa0afa53950eb0f407d7aa35fd45ebc62` |
File details
Details for the file turbo_surf-0.2.7-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl.
File metadata
- Download URL: turbo_surf-0.2.7-cp38-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
- Upload date:
- Size: 27.5 MB
- Tags: CPython 3.8+, manylinux: glibc 2.17+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `eac095f12397495f74e475bca2b386b93ba3ecca38fb8525e10eff08074d7b60` |
| MD5 | `678e9bbdd74361540ad494f16c016307` |
| BLAKE2b-256 | `fc835fc576dcf44154e0c3191404c796a44b6eafd8f152c04eff8e732aa30e23` |
File details
Details for the file turbo_surf-0.2.7-cp38-abi3-macosx_11_0_arm64.whl.
File metadata
- Download URL: turbo_surf-0.2.7-cp38-abi3-macosx_11_0_arm64.whl
- Upload date:
- Size: 23.7 MB
- Tags: CPython 3.8+, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.14.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | `16601ffe9178534cd40f80469dce380390275a10773c3ee2563f71575a376ce5` |
| MD5 | `6a845c7892b9d3ae33c7799473b4aa7e` |
| BLAKE2b-256 | `8b2cfbf10b58b6dffc349fb252d458f16919b97f9868adfe7b04a25fbd4d2ab2` |
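The SHA256 digests listed under File hashes can be checked against a downloaded file before installing it. A minimal sketch using only the standard library follows; `sha256_of_file` and `matches` are hypothetical helper names, and the local file path in the commented usage line is an assumption about where the download was saved.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 equals the published digest (case-insensitive)."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Published SHA256 for turbo_surf-0.2.7.tar.gz, copied from the listing above.
SDIST_SHA256 = "6f89b4d23ae5f2a25a8fe9bb5e177af7ff916645c00cf574247cd2021725fe00"

# Usage, assuming the sdist was saved in the current directory:
# matches("turbo_surf-0.2.7.tar.gz", SDIST_SHA256)
```

Reading in chunks keeps memory use flat for the wheels, which are in the 23 to 28 MB range.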