
Simple and customizable web crawler built with Python's asyncio


crawlio

Asynchronous web crawling and scraping with Python for minimalists

Warning: this project is under active development and not yet production-ready!

Features

  • Crawling: download an entire website in just a few seconds
  • Scraping: customizable XPath and CSS data selectors (via parsel)
  • Zero-configuration: get up and running with ~5 lines of code
  • Interfaces: web UI and JSON API powered by FastAPI and Vue.js

Built with asyncio, aiohttp and 🍺
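This page does not show crawlio's internal crawl loop. As a rough sketch of how an asyncio-based crawler can work through a whole site concurrently, the breadth-first crawl below uses a fixed pool of worker tasks that share an `asyncio.Queue`. The `fetch` callable is a hypothetical stand-in for an aiohttp request plus link extraction, so the sketch runs without network access. `crawl`, `Fetch` and `workers` are illustrative names, not part of crawlio's API.

```python
import asyncio
from typing import Awaitable, Callable
from urllib.parse import urldefrag, urljoin, urlparse

# Hypothetical fetcher type: takes a URL, returns the raw href values on that page.
# In a real crawler this would wrap an aiohttp request plus link extraction.
Fetch = Callable[[str], Awaitable[list[str]]]


async def crawl(start_url: str, fetch: Fetch, workers: int = 5) -> set[str]:
    """Breadth-first crawl restricted to start_url's host.

    At most `workers` fetches are in flight at once, because each worker
    task awaits one fetch at a time.
    """
    host = urlparse(start_url).netloc
    seen = {start_url}
    queue: asyncio.Queue = asyncio.Queue()
    queue.put_nowait(start_url)

    async def worker() -> None:
        while True:
            url = await queue.get()
            try:
                hrefs = await fetch(url)
                for href in hrefs:
                    # Resolve relative links against the current page, drop #fragments.
                    link = urldefrag(urljoin(url, href)).url
                    if urlparse(link).netloc == host and link not in seen:
                        seen.add(link)
                        queue.put_nowait(link)
            except Exception:
                pass  # skip pages that fail to download or parse
            finally:
                queue.task_done()

    tasks = [asyncio.create_task(worker()) for _ in range(workers)]
    await queue.join()  # returns once every queued URL has been processed
    for task in tasks:
        task.cancel()  # remaining workers are idle, blocked on queue.get()
    await asyncio.gather(*tasks, return_exceptions=True)
    return seen
```

A fixed worker pool bounds concurrency without creating one task per discovered link; because the `seen` check and update contain no `await`, no two workers can enqueue the same URL.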

Setup

pip install crawlio

Usage

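import asyncio
from crawlio import Crawler, Selector

bot = Crawler(
    url='https://quotes.toscrape.com/',  # start URL of the crawl
    selectors=[
        # collect every link target (href attribute)
        Selector('links', '//a/@href'),
        # collect all text nodes inside <h3> elements, then join them into one string
        Selector(
            'heading',
            type='xpath',
            query='//h3//text()',
            process=lambda items: ' '.join(items),
        ),
    ],
)

# bot.run() returns a coroutine; asyncio.run() drives it to completion
output = asyncio.run(bot.run())
for item in output["data"]:
    print(item)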
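In the example above, `process` appears to receive the list of strings matched by the query (the lambda joins them with spaces). Assuming that holds, any plain function can act as a post-processor. The hypothetical `absolute_links` helper below resolves link targets against the start URL, strips `#fragments`, and removes duplicates while keeping first-seen order. Because the callback (as far as this page shows) never sees the URL of the page being scraped, page-relative links such as `page/2/` found on deeper pages would resolve against the wrong base; root-relative and absolute links are handled correctly.

```python
from urllib.parse import urldefrag, urljoin

# Assumption: matches the Crawler's start URL in the example above.
BASE_URL = 'https://quotes.toscrape.com/'


def absolute_links(items: list[str]) -> list[str]:
    """Hypothetical process callback: absolutize hrefs, drop fragments, dedupe."""
    resolved = (urldefrag(urljoin(BASE_URL, href)).url for href in items)
    # dict.fromkeys keeps insertion order, so duplicates are dropped in place.
    return list(dict.fromkeys(resolved))
```

It could then be passed in the same keyword form used for `heading`, e.g. `Selector('links', type='xpath', query='//a/@href', process=absolute_links)`.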

License

Copyright (C) 2021 Maximilian Wolf

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.


Download files

Source Distribution

crawlio-2.3.3.tar.gz (17.1 kB)

Built Distribution

crawlio-2.3.3-py3-none-any.whl (19.1 kB)

File details

Details for the file crawlio-2.3.3.tar.gz.

File metadata

  • Download URL: crawlio-2.3.3.tar.gz
  • Upload date:
  • Size: 17.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.2

File hashes

Hashes for crawlio-2.3.3.tar.gz:

  • SHA256: 5345120aeab4372f13b43bfe23f9988c5c00424ecc9b12be1e9e22f9c8d96491
  • MD5: ec59659419d72eae482ad642b6c8f60a
  • BLAKE2b-256: e44e72e9e77b63aae23f7b9a31437ecb7a43559b49456f1afad94f63a8c538fd
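The published digests let you check that a downloaded file is intact. A minimal sketch using only `hashlib`, assuming the source distribution was saved locally; the helper names `sha256_of` and `verify` are illustrative:

```python
import hashlib

# SHA256 digest published for crawlio-2.3.3.tar.gz.
EXPECTED_SHA256 = "5345120aeab4372f13b43bfe23f9988c5c00424ecc9b12be1e9e22f9c8d96491"


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 of a file, reading it in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file at `path` matches the expected digest (case-insensitive)."""
    return sha256_of(path) == expected.lower()
```

For example, `verify("crawlio-2.3.3.tar.gz")` (hypothetical local path). pip's hash-checking mode (a `--hash=sha256:...` option on a requirements-file line) performs the same check at install time; since pip may pick either the wheel or the sdist, listing both SHA256 digests avoids spurious failures.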


File details

Details for the file crawlio-2.3.3-py3-none-any.whl.

File metadata

  • Download URL: crawlio-2.3.3-py3-none-any.whl
  • Upload date:
  • Size: 19.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.2

File hashes

Hashes for crawlio-2.3.3-py3-none-any.whl:

  • SHA256: 3999047d47f5374c6323643fbbe30df26171fd51c6d216dfee00dc899fa4b3b4
  • MD5: 7507f119b14a1d7d56fd4ba5170c523c
  • BLAKE2b-256: deaec5a08ef29001180ccc3508c4109d401677b3cfe5ff028f605f23de3df993

