crawlio

Simple website crawler built with Python's asyncio

Features

  • Asynchronous "deep" crawling using asyncio, aiohttp and Parsel (by the Scrapy authors)
  • Zero-configuration
  • Customizable XPath selectors
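To illustrate what asynchronous "deep" crawling means in general, here is a minimal sketch using only the standard library. It is not crawlio's implementation: the in-memory `SITE` mapping, `fetch_links` and `deep_crawl` are hypothetical names invented for this example, and the network request is replaced by a stub so the snippet runs offline.

```python
import asyncio

# Hypothetical in-memory "site": each URL maps to the links found on that page.
SITE = {
    "/": ["/page/2", "/about"],
    "/page/2": ["/", "/page/3"],
    "/page/3": [],
    "/about": ["/"],
}


async def fetch_links(url):
    """Stand-in for an HTTP request followed by link extraction."""
    await asyncio.sleep(0)  # yield to the event loop, as real I/O would
    return SITE.get(url, [])


async def deep_crawl(start, max_concurrency=2):
    """Visit every page reachable from `start`, fetching each URL once."""
    seen = {start}
    limit = asyncio.Semaphore(max_concurrency)  # cap simultaneous fetches

    async def visit(url):
        async with limit:
            links = await fetch_links(url)
        # Record new links before scheduling them so no URL is fetched twice.
        new = []
        for link in links:
            if link not in seen:
                seen.add(link)
                new.append(link)
        # Follow all newly discovered links concurrently.
        await asyncio.gather(*(visit(link) for link in new))

    await visit(start)
    return seen


visited = asyncio.run(deep_crawl("/"))
print(sorted(visited))
```

The semaphore bounds how many fetches are in flight at once, while the `seen` set keeps the crawl from looping on pages that link back to each other.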

Setup

pip install crawlio

Usage

Synchronous

import asyncio
from crawlio import Crawler

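# Map each output field name to the XPath expression that extracts it.
fields = {
    'title': '/html/head/title/text()',
    # ...
}
crawler = Crawler('https://quotes.toscrape.com/', selectors=fields)
# asyncio.run() blocks until the crawl completes; debug=True turns on asyncio's debug mode.
results = asyncio.run(crawler.run(), debug=True)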
for item in results:
    print(item)

Asynchronous

import asyncio
from crawlio import Crawler

async def some_coroutine():
    fields = {
        'title': '/html/head/title/text()',
        # ...
    }
    crawler = Crawler('https://quotes.toscrape.com/', selectors=fields)
    results = await crawler.run()
    return results
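Because `run()` is awaited like any other coroutine, several crawls could in principle be scheduled together. The sketch below shows that pattern with `asyncio.gather` and a per-crawl timeout. `StubCrawler` and `crawl_many` are hypothetical names, the stub only mimics the awaitable `run()` shown above, and the shape of its result items is an assumption. Whether real `Crawler` instances are safe to run concurrently is not confirmed here.

```python
import asyncio


class StubCrawler:
    """Hypothetical stand-in with an awaitable run(), used so the example runs offline."""

    def __init__(self, url):
        self.url = url

    async def run(self):
        await asyncio.sleep(0)  # placeholder for real network work
        return [{"url": self.url}]  # assumed item shape, for illustration only


async def crawl_many(crawlers, timeout=30.0):
    """Run every crawler concurrently; return one result list per crawler, in order."""
    tasks = [asyncio.wait_for(c.run(), timeout) for c in crawlers]
    return await asyncio.gather(*tasks)


crawlers = [StubCrawler("https://a.example/"), StubCrawler("https://b.example/")]
all_results = asyncio.run(crawl_many(crawlers))
print(all_results)
```

`asyncio.gather` preserves the order of its arguments, so `all_results[i]` corresponds to `crawlers[i]`. A crawl that exceeds the timeout raises `asyncio.TimeoutError` out of `crawl_many`.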

Contribute

...

License

...

Download files

Source Distribution

crawlio-1.0.0.tar.gz (16.4 kB)

Uploaded Source

Built Distribution

crawlio-1.0.0-py3-none-any.whl (16.2 kB)

Uploaded Python 3

File details

Details for the file crawlio-1.0.0.tar.gz.

File metadata

  • Download URL: crawlio-1.0.0.tar.gz
  • Upload date:
  • Size: 16.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.2

File hashes

Hashes for crawlio-1.0.0.tar.gz

Algorithm    Hash digest
SHA256       6cb1ce0c20224f6ac6ec8bd5637a3093be50a8ebe042531ccc9a7904587869c5
MD5          d5d2fe4f917ca6460423cfbfce4ff6b0
BLAKE2b-256  f494c0764c6b854bc04c7fbebd951ff76ef6dbdd2c114a85fbe4bfbe080aa1f0

File details

Details for the file crawlio-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: crawlio-1.0.0-py3-none-any.whl
  • Upload date:
  • Size: 16.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.8.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.3 CPython/3.9.2

File hashes

Hashes for crawlio-1.0.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       d2f7eff07b91bbf95775d5d41f1eb3f96acec8f83dae67023f89df7c5ed313da
MD5          1563f06a97b2e1e56682b889c0e534a4
BLAKE2b-256  f8f60f129bf6749adbf78b55ddd7e58a393c0caf8398cad45933175774f80a17
