The fastest web crawler, written in Rust, ported to Python.
spider-py
The spider project ported to Python.
Getting Started
    pip install spider_rs
    import asyncio
    from spider_rs import Website

    async def main():
        website = Website("https://choosealicense.com")
        website.crawl()
        print(website.get_links())

    asyncio.run(main())
View the examples to learn more.
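`get_links()` returns a plain list of URL strings, so standard Python tooling applies to the result. A small sketch (the `same_host_links` helper and the stand-in URLs below are hypothetical; a real run would pass `website.get_links()`):

```python
from urllib.parse import urlparse

def same_host_links(links, host):
    """Filter a crawled link list down to URLs on a single host."""
    return [u for u in links if urlparse(u).hostname == host]

# Stand-in data; in practice this would be website.get_links().
links = [
    "https://choosealicense.com/licenses/",
    "https://choosealicense.com/about/",
    "https://github.com/github/choosealicense.com",
]
print(same_host_links(links, "choosealicense.com"))
# → ['https://choosealicense.com/licenses/', 'https://choosealicense.com/about/']
```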
Development
Install Python and maturin:

    pipx install maturin

Then build the Rust extension and install it into the active environment:

    maturin develop
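After `maturin develop` succeeds, the extension should be importable from the same virtualenv. A minimal smoke check (prints `False` if the build failed or you are in a different environment):

```python
# Check whether the spider_rs extension module is importable in the
# current environment (True after a successful `maturin develop`).
import importlib.util

installed = importlib.util.find_spec("spider_rs") is not None
print("spider_rs installed:", installed)
```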
Benchmarks
View the benchmarks to see a breakdown between libs and platforms.
Test url: https://espn.com
libraries | pages | speed
---|---|---
spider(rust): crawl | 150,387 | 1m
spider(nodejs): crawl | 150,387 | 153s
spider(python): crawl | 150,387 | 186s
scrapy(python): crawl | 49,598 | 1h
crawlee(nodejs): crawl | 18,779 | 30m
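For a rough sense of throughput, the table's totals can be converted to pages per second (durations converted to seconds: 1m = 60s, 1h = 3600s, 30m = 1800s; page counts taken from the rows above):

```python
# Pages-per-second derived from the benchmark table above.
results = {
    "spider (rust)":    (150_387, 60),
    "spider (nodejs)":  (150_387, 153),
    "spider (python)":  (150_387, 186),
    "scrapy (python)":  (49_598, 3600),
    "crawlee (nodejs)": (18_779, 1800),
}
for name, (pages, seconds) in results.items():
    print(f"{name}: {pages / seconds:,.0f} pages/s")
```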
The benches above were run on a Mac M1; on Linux ARM machines, spider performs about 2-10x faster.
Issues
Please submit a GitHub issue for any problems found.