The fastest web crawler written in Rust, ported to Python.
spider-py
The spider project ported to Python.
Getting Started
pip install spider_rs
import asyncio

from spider_rs import crawl

async def main():
    website = await crawl("https://choosealicense.com")
    print(website.links)
    # print(website.pages)

asyncio.run(main())
Use the Website class to build the crawler you need.
import asyncio

from spider_rs import Website

async def main():
    website = Website("https://choosealicense.com", False).with_headers({"authorization": "myjwttoken"})
    website.crawl()
    print(website.get_links())

asyncio.run(main())
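Once a crawl has finished, the collected links usually need some post-processing. The following is a minimal sketch using only the standard library, under the assumption that get_links() returns a list of URL strings. The helper name same_host_links and the sample list are hypothetical and stand in for real crawl output so the example runs without a network connection.

```python
from urllib.parse import urldefrag, urlsplit


def same_host_links(links, root):
    """Return sorted, de-duplicated links that share the host of `root`."""
    host = urlsplit(root).netloc
    seen = set()
    for link in links:
        # Drop the fragment so "/page" and "/page#top" count as one URL.
        url, _fragment = urldefrag(link)
        if urlsplit(url).netloc == host:
            seen.add(url)
    return sorted(seen)


# Stand-in for website.get_links() (assumed to be a list of strings).
links = [
    "https://choosealicense.com/licenses/",
    "https://choosealicense.com/licenses/#top",
    "https://github.com/github/choosealicense.com",
]
internal = same_host_links(links, "https://choosealicense.com")
print(internal)
```

In a real script, the sample list would be replaced by the value returned from website.get_links().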
You can also set up real-time subscriptions that receive each page as it is crawled.
import asyncio

from spider_rs import Website

class Subscription:
    def __init__(self):
        print("Subscription Created...")

    def __call__(self, page):
        print(page.url + " - status: " + str(page.status_code))

async def main():
    website = Website("https://choosealicense.com", False)
    website.crawl(Subscription())

asyncio.run(main())
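Because the subscription is a plain callable object, it can accumulate results instead of printing them. The following is a minimal sketch that assumes only the url and status_code attributes used in the example above. The class name StatusCollector and the stand-in pages built with SimpleNamespace are hypothetical and exist only so the sketch runs offline.

```python
from collections import defaultdict
from types import SimpleNamespace


class StatusCollector:
    """Hypothetical subscription that groups crawled URLs by status code."""

    def __init__(self):
        self.by_status = defaultdict(list)

    def __call__(self, page):
        # Assumes the page exposes `url` and `status_code`, as in the example above.
        self.by_status[page.status_code].append(page.url)

    def failed(self):
        # URLs whose status code is 400 or higher, ordered by status code.
        return [url
                for status, urls in sorted(self.by_status.items())
                if status >= 400
                for url in urls]


# Stand-in pages so the sketch runs without a crawl.
collector = StatusCollector()
collector(SimpleNamespace(url="https://choosealicense.com/", status_code=200))
collector(SimpleNamespace(url="https://choosealicense.com/missing", status_code=404))
print(collector.failed())
```

In a real crawl, the instance would be passed in the same way as Subscription() above, and collector.failed() could be inspected once the crawl completes.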
View the examples for more.
Development
Install Python and maturin:
pipx install maturin
Then build the crate and install it into the current environment:
maturin develop
Benchmarks
View bench to see the results.
Issues
Please submit a GitHub issue for any problems you find.