The fastest web crawler and indexer.

Project description

spider-py

The spider project ported to Python.

Getting Started

  1. pip install spider_rs

import asyncio

from spider_rs import Website

async def main():
    # Crawl the site starting from the given URL.
    website = Website("https://choosealicense.com")
    website.crawl()
    # Print the links collected during the crawl.
    print(website.get_links())

asyncio.run(main())

View the examples to learn more.
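
As a rough sketch of what can be done with the crawl result, the snippet below counts the discovered links per host. It assumes that get_links() returns an iterable of URL strings; summarize_links and crawl_and_summarize are hypothetical helper names, not part of spider_rs.

```python
from collections import Counter
from urllib.parse import urlsplit


def summarize_links(links):
    """Count unique HTTP(S) links per host (hypothetical helper)."""
    # Drop duplicates and anything that is not an http/https URL.
    unique = {link for link in links if urlsplit(link).scheme in ("http", "https")}
    return Counter(urlsplit(link).netloc for link in unique)


def crawl_and_summarize(url):
    """Crawl `url` and summarize the links found (hypothetical helper)."""
    # Imported lazily so summarize_links() works without spider_rs installed.
    from spider_rs import Website

    website = Website(url)
    website.crawl()
    # Assumption: get_links() yields plain URL strings.
    return summarize_links(website.get_links())
```

Calling crawl_and_summarize("https://choosealicense.com") performs a real crawl, so it needs network access and an installed spider_rs; summarize_links can be tried on any list of URLs.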

Development

Install maturin (pipx install maturin) and Python.

  1. maturin develop
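
One way to confirm that the development build landed in the active environment is to ask Python's packaging metadata for the installed version. This is a minimal sketch using only the standard library; installed_version is a hypothetical helper name, and it assumes the distribution is registered under the name spider_rs.

```python
from importlib import metadata


def installed_version(dist_name):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    # Assumption: the build registers itself under the name "spider_rs".
    print(installed_version("spider_rs"))
```

A result of None suggests the build was installed into a different virtual environment than the one running the check.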

Benchmarks

View the benchmarks to see a breakdown between libs and platforms.

Test url: https://espn.com

libraries                 pages      speed
spider(rust): crawl       150,387    1m
spider(nodejs): crawl     150,387    153s
spider(python): crawl     150,387    186s
scrapy(python): crawl     49,598     1h
crawlee(nodejs): crawl    18,779     30m

The benchmarks above were run on an M1 Mac; spider performs about 2-10x faster on Linux ARM machines.
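
Because the libraries crawled different numbers of pages in different amounts of time, a pages-per-second figure makes the rows easier to compare. The sketch below derives it from the numbers listed above; the durations are treated as exact (for example "1m" as 60 seconds), which is an assumption since the listed times look rounded, and the helper names are illustrative.

```python
import re

# (pages, duration) pairs copied from the benchmark listing above.
BENCHMARKS = {
    "spider(rust)": (150_387, "1m"),
    "spider(nodejs)": (150_387, "153s"),
    "spider(python)": (150_387, "186s"),
    "scrapy(python)": (49_598, "1h"),
    "crawlee(nodejs)": (18_779, "30m"),
}

UNIT_SECONDS = {"s": 1, "m": 60, "h": 3600}


def to_seconds(duration):
    """Convert a duration such as '153s', '30m' or '1h' to seconds."""
    match = re.fullmatch(r"(\d+)([smh])", duration)
    if match is None:
        raise ValueError(f"unrecognized duration: {duration!r}")
    return int(match.group(1)) * UNIT_SECONDS[match.group(2)]


def pages_per_second(pages, duration):
    """Average crawl throughput, assuming the duration is exact."""
    return pages / to_seconds(duration)


if __name__ == "__main__":
    for name, (pages, duration) in BENCHMARKS.items():
        print(f"{name}: {pages_per_second(pages, duration):.1f} pages/s")
```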

Issues

Please submit a GitHub issue for any problems you find.

Download files

Source Distribution

spider_rs-0.0.53.tar.gz (43.2 kB view details)

Uploaded Source

Built Distribution

spider_rs-0.0.53-cp311-cp311-macosx_11_0_arm64.whl (10.1 MB view details)

Uploaded CPython 3.11 macOS 11.0+ ARM64

File details

Details for the file spider_rs-0.0.53.tar.gz.

File metadata

  • Download URL: spider_rs-0.0.53.tar.gz
  • Upload date:
  • Size: 43.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.7.1

File hashes

Hashes for spider_rs-0.0.53.tar.gz
Algorithm      Hash digest
SHA256         91aec813b61168f6fb61158c603fde2bd757cf764a86f6b7b0826724f565ef22
MD5            cc76e080a708eac375f8d20ac6b3460d
BLAKE2b-256    c7827bc377c6c21d0c5c99b73718a60c9e93008c1ed69c11bfb0e13d945a5f94
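
A downloaded file can be checked against a listed digest with Python's hashlib. The following is a minimal sketch, not an official verification procedure; digest_of_file and verify_file are hypothetical helper names, and the local file path is an assumption about where the download was saved.

```python
import hashlib


def digest_of_file(path, algorithm="sha256", chunk_size=65536):
    """Hash a file in chunks so large files need not fit in memory."""
    hasher = hashlib.new(algorithm)
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def verify_file(path, expected_hex, algorithm="sha256"):
    """Return True when the file's digest matches the expected hex string."""
    return digest_of_file(path, algorithm) == expected_hex.strip().lower()


if __name__ == "__main__":
    # SHA256 value copied from the listing above; the path is an assumption.
    expected = "91aec813b61168f6fb61158c603fde2bd757cf764a86f6b7b0826724f565ef22"
    print(verify_file("spider_rs-0.0.53.tar.gz", expected))
```

Passing algorithm="md5" checks the MD5 value in the same way; the BLAKE2b-256 digest would need hashlib.blake2b with digest_size=32 instead of hashlib.new.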

File details

Details for the file spider_rs-0.0.53-cp311-cp311-macosx_11_0_arm64.whl.

File hashes

Hashes for spider_rs-0.0.53-cp311-cp311-macosx_11_0_arm64.whl
Algorithm      Hash digest
SHA256         046e698346f3b17c91704840b4e55dc8fb2ad7417ecad89a8e9aa03af93d0299
MD5            d7bef5b5a060e82f37d13e631693fe73
BLAKE2b-256    9eead40dd8ad55e1b7cfaccedc653c047d791e267ce6bfde5243d2b28c9301aa
