Simple website crawler built with Python's asyncio

crawlio

Simple web crawler built with Python's asyncio

Warning: this project is under active development and not yet production-ready!

Features

  • Crawling: download an entire website in seconds
  • Scraping: customizable XPath selectors
  • Zero-configuration: get up and running with ~5 LoC

Built with asyncio, aiohttp and Parsel (by the Scrapy authors).
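
To give a rough idea of why an asyncio-based crawler can be fast, here is a minimal sketch of a concurrent crawl loop. It is not crawlio's actual implementation: the `SITE` dictionary, `fetch_links` and `crawl` are hypothetical names made up for illustration, and the network request is simulated with `asyncio.sleep` where a real crawler would use aiohttp.

```python
import asyncio

# Hypothetical in-memory "site": each URL maps to the links found on that page.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}


async def fetch_links(url):
    """Stand-in for an HTTP request (assumption: a real crawler would call aiohttp)."""
    await asyncio.sleep(0.01)  # simulate network latency
    return SITE.get(url, [])


async def crawl(start, workers=3):
    """Visit every page reachable from `start` using a pool of worker tasks."""
    seen = {start}
    queue = asyncio.Queue()
    queue.put_nowait(start)

    async def worker():
        while True:
            url = await queue.get()
            try:
                for link in await fetch_links(url):
                    if link not in seen:  # enqueue each URL only once
                        seen.add(link)
                        queue.put_nowait(link)
            finally:
                queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await queue.join()  # wait until every queued URL has been processed
    for task in tasks:
        task.cancel()  # workers loop forever, so stop them explicitly
    await asyncio.gather(*tasks, return_exceptions=True)
    return seen


visited = asyncio.run(crawl("/"))
print(sorted(visited))
```

While one worker waits on a response, the others keep fetching, so total time is driven by the slowest chain of pages rather than the sum of all requests.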

Setup

pip install crawlio

Usage

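import asyncio
from crawlio import Crawler

# Map each output field name to the XPath expression that extracts it.
fields = {
    'title': '//title/text()',
    'text': '//p//text()'
}
# Start crawling from this URL, applying the selectors above.
crawler = Crawler('https://quotes.toscrape.com/', selectors=fields)
# Run the crawl to completion; debug=True enables asyncio's debug mode.
output = asyncio.run(crawler.run(), debug=True)
# The scraped items are available under the "results" key.
for item in output["results"]:
    print(item)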

License

Copyright (C) 2021 Maximilian Wolf

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.
