The fastest web crawler and indexer.

spider-py

The spider project ported to Python.

Getting Started

  1. pip install spider_rs

import asyncio

from spider_rs import Website

async def main():
    website = Website("https://choosealicense.com")
    website.crawl()
    print(website.get_links())

asyncio.run(main())

View the examples to learn more.
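Once a crawl finishes, the collected links usually need some filtering. The sketch below is a minimal illustration that assumes website.get_links() returns a list of URL strings; the helper same_host_links and the sample list are hypothetical and are not part of spider_rs.

```python
from urllib.parse import urlparse


def same_host_links(links, root_url):
    """Return the sorted, de-duplicated links that share root_url's host."""
    root_host = urlparse(root_url).netloc
    return sorted({link for link in links if urlparse(link).netloc == root_host})


# Hypothetical sample standing in for the result of website.get_links().
sample_links = [
    "https://choosealicense.com/licenses/",
    "https://choosealicense.com/about/",
    "https://github.com/github/choosealicense.com",
    "https://choosealicense.com/about/",
]

internal = same_host_links(sample_links, "https://choosealicense.com")
print(internal)
```

In a real script, the sample list would be replaced by the value returned from the crawl in the snippet above.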

Development

Install Python and maturin (for example, with pipx install maturin).

  1. maturin develop

Benchmarks

View the benchmarks to see a breakdown between libs and platforms.

Test url: https://espn.com

libraries                 pages     speed
spider(rust): crawl       150,387   1m
spider(nodejs): crawl     150,387   153s
spider(python): crawl     150,387   186s
scrapy(python): crawl     49,598    1h
crawlee(nodejs): crawl    18,779    30m

The benchmarks above were run on a Mac M1; spider on Linux ARM machines performs about 2-10x faster.
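Because the libraries crawled different numbers of pages in different amounts of time, the rows are easier to compare as pages per second. The following sketch does that arithmetic using only the figures in the benchmark table. The durations "1m", "30m", and "1h" are taken as exactly 60, 1800, and 3600 seconds, which is an assumption since they are likely rounded, so the rates are approximate.

```python
# Figures copied from the benchmark table; durations converted to seconds.
# "1m", "30m" and "1h" are assumed to be exact, which they may not be.
results = [
    ("spider(rust)", 150_387, 60),
    ("spider(nodejs)", 150_387, 153),
    ("spider(python)", 150_387, 186),
    ("scrapy(python)", 49_598, 3600),
    ("crawlee(nodejs)", 18_779, 1800),
]


def pages_per_second(pages, seconds):
    """Average crawl throughput over the whole run."""
    return pages / seconds


throughput = {name: pages_per_second(pages, secs) for name, pages, secs in results}

for name, rate in throughput.items():
    print(f"{name:<16} {rate:>8.1f} pages/s")
```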

Issues

Please submit a GitHub issue for any problems you find.

Download files

Source Distribution

spider_rs-0.0.57.tar.gz (50.2 kB)

Uploaded Source

Built Distribution

spider_rs-0.0.57-cp313-cp313-macosx_11_0_arm64.whl (12.2 MB)

Uploaded CPython 3.13, macOS 11.0+ ARM64

File details

Details for the file spider_rs-0.0.57.tar.gz.

File metadata

  • Download URL: spider_rs-0.0.57.tar.gz
  • Upload date:
  • Size: 50.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: maturin/1.8.1

File hashes

Hashes for spider_rs-0.0.57.tar.gz
Algorithm Hash digest
SHA256 34b95a194c74182f65974f89ba1a156c342f2eacedc1ab339aacd913677cba5e
MD5 7ec521eb3902023cd1f336933340624e
BLAKE2b-256 126b0da3c8484c7247f7c6d1f8131336cb26ec879be5264d5c9351aa286fb108
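A downloaded archive can be checked against the published SHA256 digest before installing it. This is a minimal sketch using Python's standard hashlib module; the helper names sha256_of_file and matches_expected are illustrative, and the usage line assumes the archive has already been saved to the working directory.

```python
import hashlib


def sha256_of_file(path, chunk_size=1 << 16):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected_hex):
    """Return True when the file's SHA256 digest equals the published value."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 listed above for spider_rs-0.0.57.tar.gz.
EXPECTED_SDIST_SHA256 = "34b95a194c74182f65974f89ba1a156c342f2eacedc1ab339aacd913677cba5e"

# Usage (assumes the archive is in the current directory):
# matches_expected("spider_rs-0.0.57.tar.gz", EXPECTED_SDIST_SHA256)
```

Reading in chunks keeps memory use flat, which matters more for the 12.2 MB wheel than for the small source archive.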

File details

Details for the file spider_rs-0.0.57-cp313-cp313-macosx_11_0_arm64.whl.

File hashes

Hashes for spider_rs-0.0.57-cp313-cp313-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 0c67c9766badae64d82acc4e06104af80de27ff53c71345a0de1acf6b8aa63d9
MD5 670635bd5a96063388478f92c5cd5bb7
BLAKE2b-256 e379bd2fd2d585f33dba91893f997637fb84e2e183d43a25a98912c2d5c9f88f
