crawlio

Simple and customizable web crawler built with Python's asyncio

Warning: this project is under active development and not yet production-ready!

Features

  • Crawling: download an entire website in seconds
  • Scraping: customizable CSS and XPath selectors
  • Zero-configuration: get up and running with ~5 LoC
  • Interfaces: Web UI + JSON API powered by FastAPI & VueJS (coming soon)

Built with asyncio, aiohttp and Parsel (by Scrapy authors)
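
To illustrate why an asyncio-based design can crawl quickly, here is a minimal sketch of the general pattern: fetch every link at one depth concurrently, then move on to the newly discovered links. This is not crawlio's actual implementation. The names FAKE_SITE, fetch_links, and crawl are hypothetical, and the network call is replaced by an in-memory stub so the example runs without aiohttp.

```python
import asyncio

# Hypothetical in-memory "site": page URL -> links found on that page.
FAKE_SITE = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}


async def fetch_links(url):
    """Stand-in for an HTTP request; a real crawler would use aiohttp here."""
    await asyncio.sleep(0.01)  # simulate network latency
    return FAKE_SITE.get(url, [])


async def crawl(start, max_concurrency=5):
    """Breadth-first crawl that fetches each level of links concurrently."""
    seen = {start}
    frontier = [start]
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url):
        # The semaphore caps how many fetches are in flight at once.
        async with semaphore:
            return await fetch_links(url)

    while frontier:
        # Fetch all pages of the current level concurrently.
        results = await asyncio.gather(*(bounded_fetch(u) for u in frontier))
        next_frontier = []
        for links in results:
            for link in links:
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return seen


visited = asyncio.run(crawl("/"))
print(sorted(visited))
```

Because the simulated requests within a level overlap instead of running one after another, total time grows with crawl depth rather than with the number of pages.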

Setup

pip install crawlio

Usage

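import asyncio
from crawlio import Crawler, Selector

# As used here, each Selector takes a field name, a selector type
# ('css' or 'xpath'), the query, and a callable that post-processes
# the list of matches.
crawler = Crawler(
    url='https://innovinati.com/',
    selectors=[
        # Page title; fall back to None when nothing matches.
        Selector('title', 'css', 'title::text', lambda items: items[0] if items else None),
        # All paragraph text, joined into a single string.
        Selector('text', 'xpath', '//p//text()', lambda items: ' '.join(items))
    ]
)

# crawler.run() is a coroutine; asyncio.run() drives it to completion.
output = asyncio.run(crawler.run())
for item in output["data"]:
    print(item)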
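
The last argument of each Selector above is a plain callable. As a sketch, the same post-processing can be written as named functions and tried on sample data without running a crawl. The assumption here, inferred from the lambdas above, is that the callable receives the list of strings matched by its selector. The helper names first_or_none and join_text and the sample values are hypothetical.

```python
def first_or_none(items):
    """Return the first match, or None when the selector matched nothing."""
    return items[0] if items else None


def join_text(items):
    """Join text fragments into one string with normalized whitespace."""
    return " ".join(" ".join(items).split())


# Hypothetical selector results for a single page.
title_matches = ["Example Domain"]
paragraph_matches = ["  This domain is for use\n", "in illustrative examples. "]

title = first_or_none(title_matches)
text = join_text(paragraph_matches)
missing = first_or_none([])  # a page without a matching element

print(title)
print(text)
print(missing)
```

If Selector accepts any callable in that position, these functions could be passed in place of the lambdas, for example Selector('title', 'css', 'title::text', first_or_none).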

License

Copyright (C) 2021 Maximilian Wolf

This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.

This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.

You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.

Download files

Download the file for your platform.

Source Distribution

crawlio-2.0.0.tar.gz (16.8 kB view details)

Uploaded Source

Built Distribution

crawlio-2.0.0-py3-none-any.whl (17.5 kB view details)

Uploaded Python 3

File details

Details for the file crawlio-2.0.0.tar.gz.

File metadata

  • Download URL: crawlio-2.0.0.tar.gz
  • Upload date:
  • Size: 16.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.2

File hashes

Hashes for crawlio-2.0.0.tar.gz

Algorithm     Hash digest
SHA256        474743749b3de6fca274bf79a0e8c7fa5fb8bf1468c76a94a983dfe9d63af40b
MD5           e3124bd45ebddb6cadf85046c602e8a6
BLAKE2b-256   fca09c53586b57dbe890dbb88c2d4d0f0363065d824ea423b71349415331d641

File details

Details for the file crawlio-2.0.0-py3-none-any.whl.

File metadata

  • Download URL: crawlio-2.0.0-py3-none-any.whl
  • Upload date:
  • Size: 17.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.2

File hashes

Hashes for crawlio-2.0.0-py3-none-any.whl

Algorithm     Hash digest
SHA256        8c265792a143d93f30dbbc0be05dac929884916d938810095d46a33a2bf5d38e
MD5           1382e73569ee31fb6bcf683840ed6fd0
BLAKE2b-256   7b5d0e0214655588056b9e3a6fb7090932c1ae0bd677fe0118f810c0b2e792d6
