The fastest web crawler and indexer.
spider-py
The spider project ported to Python.
Getting Started
pip install spider_rs
import asyncio

from spider_rs import Website

async def main():
    website = Website("https://choosealicense.com")
    website.crawl()
    print(website.get_links())

asyncio.run(main())
View the examples to learn more.
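Once a crawl finishes, the collected links can be post-processed with the standard library alone. The following is a minimal sketch, assuming that get_links() returns a list of absolute URL strings (the snippet above does not state the return type). The helper group_links_by_section and the sample list are hypothetical and only stand in for real crawl output.

```python
from collections import defaultdict
from urllib.parse import urlparse


def group_links_by_section(links):
    """Group URLs by their first path segment ("" for the site root)."""
    sections = defaultdict(list)
    for link in links:
        segments = [part for part in urlparse(link).path.split("/") if part]
        first = segments[0] if segments else ""
        sections[first].append(link)
    return dict(sections)


# Hypothetical sample standing in for website.get_links().
sample_links = [
    "https://choosealicense.com/",
    "https://choosealicense.com/licenses/",
    "https://choosealicense.com/licenses/mit/",
    "https://choosealicense.com/about/",
]

# Print each section with the number of links found under it.
for section, urls in sorted(group_links_by_section(sample_links).items()):
    print(section or "(root)", len(urls))
```

In a real run, the sample list would be replaced by the value returned from the crawl, provided the assumption about its type holds.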
Development
Install maturin and Python:
pipx install maturin
Then build the package and install it into the current environment:
maturin develop
Benchmarks
View the benchmarks to see a breakdown between libs and platforms.
Test url: https://espn.com
libraries | pages | speed |
---|---|---|
spider(rust): crawl | 150,387 | 1m |
spider(nodejs): crawl | 150,387 | 153s |
spider(python): crawl | 150,387 | 186s |
scrapy(python): crawl | 49,598 | 1h |
crawlee(nodejs): crawl | 18,779 | 30m |
The benchmarks above were run on a Mac M1; spider on Linux ARM machines performs about 2-10x faster.
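Because the table mixes seconds, minutes, and hours, the results are easier to compare as pages per second. The sketch below copies the figures from the benchmark table and normalizes them. The helper names to_seconds and pages_per_second are illustrative, and since the reported times appear rounded (for example "1m" and "1h"), the derived rates are approximate.

```python
# Rows copied from the benchmark table: (library, pages crawled, elapsed time).
BENCHMARKS = [
    ("spider(rust): crawl", 150_387, "1m"),
    ("spider(nodejs): crawl", 150_387, "153s"),
    ("spider(python): crawl", 150_387, "186s"),
    ("scrapy(python): crawl", 49_598, "1h"),
    ("crawlee(nodejs): crawl", 18_779, "30m"),
]

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(duration):
    """Convert a duration such as "153s", "30m", or "1h" to seconds."""
    value, unit = duration[:-1], duration[-1]
    return int(value) * UNIT_SECONDS[unit]


def pages_per_second(pages, duration):
    """Return the average crawl rate for a benchmark row."""
    return pages / to_seconds(duration)


for library, pages, duration in BENCHMARKS:
    rate = pages_per_second(pages, duration)
    print(f"{library:<24} {rate:>10.1f} pages/s")
```

Note that the libraries did not all crawl the same number of pages, so the rate is a rough comparison rather than a like-for-like measurement.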
Issues
Please submit a GitHub issue for any issues found.
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
spider_rs-0.0.53.tar.gz (43.2 kB)
Built Distribution
spider_rs-0.0.53-cp311-cp311-macosx_11_0_arm64.whl (10.1 MB)
File details
Details for the file spider_rs-0.0.53.tar.gz.
File metadata
- Download URL: spider_rs-0.0.53.tar.gz
- Upload date:
- Size: 43.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.7.1
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 91aec813b61168f6fb61158c603fde2bd757cf764a86f6b7b0826724f565ef22 |
MD5 | cc76e080a708eac375f8d20ac6b3460d |
BLAKE2b-256 | c7827bc377c6c21d0c5c99b73718a60c9e93008c1ed69c11bfb0e13d945a5f94 |
File details
Details for the file spider_rs-0.0.53-cp311-cp311-macosx_11_0_arm64.whl.
File metadata
- Download URL: spider_rs-0.0.53-cp311-cp311-macosx_11_0_arm64.whl
- Upload date:
- Size: 10.1 MB
- Tags: CPython 3.11, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.7.1
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 046e698346f3b17c91704840b4e55dc8fb2ad7417ecad89a8e9aa03af93d0299 |
MD5 | d7bef5b5a060e82f37d13e631693fe73 |
BLAKE2b-256 | 9eead40dd8ad55e1b7cfaccedc653c047d791e267ce6bfde5243d2b28c9301aa |
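The digests listed above can be used to check a downloaded file before installing it. Here is a minimal sketch using the standard hashlib module. The SHA256 value is the one given for spider_rs-0.0.53.tar.gz, while the local file path and the helper names sha256_of_file and matches_expected are assumptions made for illustration.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for spider_rs-0.0.53.tar.gz.
EXPECTED_SHA256 = "91aec813b61168f6fb61158c603fde2bd757cf764a86f6b7b0826724f565ef22"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare the file's digest with the published one, ignoring case."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    # Assumed location: the archive saved in the current working directory.
    archive = Path("spider_rs-0.0.53.tar.gz")
    if archive.exists():
        print("OK" if matches_expected(archive) else "MISMATCH")
    else:
        print(f"{archive} not found; download it first")
```

Reading in chunks keeps memory use flat, which matters more for the 10.1 MB wheel than for the source archive. The same approach works for the wheel by swapping in its filename and SHA256 digest.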