crawlio
Simple and customizable web crawler built with Python's asyncio
Warning: this project is under active development and not yet production-ready!
Features
- Crawling: download an entire website in seconds
- Scraping: customizable CSS and XPath selectors
- Zero-configuration: get up and running with ~5 LoC
- Interfaces: Web UI + JSON API powered by FastAPI & VueJS (coming soon)
Built with asyncio, aiohttp and Parsel (by the Scrapy authors)
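To illustrate why an asyncio-based crawler can fetch many pages quickly, here is a minimal sketch of bounded concurrent fetching using only the standard library. It is not crawlio's implementation: `fake_fetch`, `crawl_all` and the `max_concurrency` parameter are hypothetical names, and the network request is simulated with `asyncio.sleep` where a real crawler would use an HTTP client such as aiohttp.

```python
import asyncio
import time


async def fake_fetch(url: str, delay: float = 0.05) -> str:
    """Stand-in for an HTTP request (hypothetical; no network access)."""
    await asyncio.sleep(delay)  # simulates waiting on the network
    return f"<html><title>{url}</title></html>"


async def crawl_all(urls: list[str], max_concurrency: int = 10) -> dict[str, str]:
    """Fetch all URLs concurrently, at most `max_concurrency` at a time."""
    semaphore = asyncio.Semaphore(max_concurrency)

    async def bounded_fetch(url: str) -> tuple[str, str]:
        # The semaphore caps how many fetches are in flight at once.
        async with semaphore:
            return url, await fake_fetch(url)

    pairs = await asyncio.gather(*(bounded_fetch(u) for u in urls))
    return dict(pairs)


urls = [f"https://example.com/page/{i}" for i in range(20)]
start = time.perf_counter()
pages = asyncio.run(crawl_all(urls))
elapsed = time.perf_counter() - start
print(f"fetched {len(pages)} pages in {elapsed:.2f}s")
```

With 20 simulated pages at 50 ms each and ten in flight at a time, the run should take roughly two delay periods rather than twenty, because the waits overlap.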
Setup
pip install crawlio
Usage
import asyncio
from crawlio import Crawler, Selector
crawler = Crawler(
    url='https://innovinati.com/',
    selectors=[
        Selector('title', 'css', 'title::text', lambda items: items[0]),
        Selector('text', 'xpath', '//p//text()', lambda items: ' '.join(items))
    ]
)
output = asyncio.run(crawler.run())
for item in output["data"]:
    print(item)
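The last argument of each `Selector` in the usage example is a callback that reduces the matched items to a single value. Note that `lambda items: items[0]` raises `IndexError` when nothing matches. The sketch below shows more defensive callbacks in plain Python. It assumes the callback receives a list of strings, as the lambdas in the example suggest; `first_or_none` and `join_text` are hypothetical helper names, not part of crawlio.

```python
from typing import Optional


def first_or_none(items: list[str]) -> Optional[str]:
    """Return the first match stripped of whitespace, or None if there is none."""
    return items[0].strip() if items else None


def join_text(items: list[str]) -> str:
    """Join text fragments with single spaces, skipping whitespace-only ones."""
    return " ".join(part.strip() for part in items if part.strip())


print(first_or_none(["  Example Domain "]))      # Example Domain
print(first_or_none([]))                         # None
print(join_text(["Hello", "\n  ", " world "]))   # Hello world
```

If crawlio accepts any callable in that position, these could replace the lambdas, for example `Selector('title', 'css', 'title::text', first_or_none)`.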
License
Copyright (C) 2021 Maximilian Wolf
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.