A high-level web scraping framework
Project description
Okami
Okami is a high-level web scraping framework built entirely for Python 3.6+. It uses the asynchronous model provided by the standard library asyncio module, with aiohttp as the networking layer and lxml for parsing data.
The architecture is fully modular: the main components can be swapped out and replaced with custom implementations.
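To illustrate the asyncio model described above, here is a minimal sketch of fetching several pages concurrently with a cap on in-flight requests. It is not Okami's code: `fake_fetch` and `fetch_all` are hypothetical names, and `fake_fetch` stands in for an aiohttp request so the example runs without network access. Note that `asyncio.run` requires Python 3.7+.

```python
import asyncio


# Hypothetical stand-in for an aiohttp request; returns fake HTML
# so the sketch runs without network access.
async def fake_fetch(url: str) -> str:
    await asyncio.sleep(0.01)  # simulate network latency
    return "<html><title>{}</title></html>".format(url)


async def fetch_all(urls, limit=2):
    # A semaphore caps how many requests are in flight at once.
    sem = asyncio.Semaphore(limit)

    async def bounded(url):
        async with sem:
            return url, await fake_fetch(url)

    # gather runs the coroutines concurrently and preserves input order.
    return dict(await asyncio.gather(*(bounded(u) for u in urls)))


if __name__ == "__main__":
    pages = asyncio.run(fetch_all(["/a", "/b", "/c"]))
    for url, html in pages.items():
        print(url, len(html))
```

The semaphore is the usual way to stay polite to a target site while still overlapping network waits.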
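A minimal sketch of what "swappable components" can look like in Python, assuming a component is defined by an abstract interface that the engine depends on. The names `Storage`, `MemoryStorage`, `DedupStorage` and `Engine` are hypothetical and are not Okami's classes.

```python
from abc import ABC, abstractmethod
from typing import List


class Storage(ABC):
    """Hypothetical component interface; any backend implementing it can be plugged in."""

    @abstractmethod
    def save(self, item: dict) -> None: ...

    @abstractmethod
    def items(self) -> List[dict]: ...


class MemoryStorage(Storage):
    """Default implementation: keeps every item in a list."""

    def __init__(self) -> None:
        self._items: List[dict] = []

    def save(self, item: dict) -> None:
        self._items.append(item)

    def items(self) -> List[dict]:
        return list(self._items)


class DedupStorage(MemoryStorage):
    """Custom replacement that drops items whose 'url' was already saved."""

    def save(self, item: dict) -> None:
        if all(existing["url"] != item["url"] for existing in self._items):
            super().save(item)


class Engine:
    # The engine depends only on the Storage interface, not on a concrete class.
    def __init__(self, storage: Storage) -> None:
        self.storage = storage

    def process(self, item: dict) -> None:
        self.storage.save(item)


if __name__ == "__main__":
    engine = Engine(DedupStorage())  # swap in the custom component
    engine.process({"url": "/a"})
    engine.process({"url": "/a"})
    print(engine.storage.items())
```

Because `Engine` never names a concrete storage class, replacing the component is a one-line change at construction time.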
Features
- complete website-wide page processing
- full scraping mode, or delta mode that scrapes only unvisited pages
- immediate, on-demand or real-time page processing over an HTTP API
- single-page processing via the command line
- many pipelines, middlewares and signals
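The difference between full and delta mode can be sketched as a breadth-first, website-wide crawl that skips already-visited URLs when scraping. This is an assumption about the general idea, not Okami's implementation: the in-memory `SITE` graph and the `crawl` function are hypothetical, and here previously visited pages are still followed for links so that new pages behind them can be discovered.

```python
from collections import deque
from typing import Dict, Iterable, List

# Hypothetical in-memory site: each URL maps to the links found on that page.
SITE: Dict[str, List[str]] = {
    "/": ["/a", "/b"],
    "/a": ["/b", "/c"],
    "/b": ["/"],
    "/c": [],
}


def crawl(start: str, visited_before: Iterable[str] = ()) -> List[str]:
    """Breadth-first, website-wide crawl; returns the URLs that were scraped.

    Full mode: visited_before is empty, so every reachable page is scraped.
    Delta mode: URLs in visited_before (e.g. stored by an earlier run) are
    still followed for links but are not scraped again.
    """
    skip = set(visited_before)
    queue = deque([start])
    queued = {start}
    scraped = []
    while queue:
        url = queue.popleft()
        if url not in skip:
            scraped.append(url)  # stand-in for extracting items from the page
        for link in SITE.get(url, []):
            if link not in queued:  # never enqueue the same URL twice
                queued.add(link)
                queue.append(link)
    return scraped


if __name__ == "__main__":
    print(crawl("/"))                                   # full mode
    print(crawl("/", visited_before={"/", "/a", "/b"})) # delta mode
```

In a real crawler the `visited_before` set would be persisted between runs; how Okami stores it is not described here.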
Spiders are simple to implement. Take a look at an example here.
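For a rough idea of what a spider does, here is a generic sketch that turns one page into an item plus a list of links to follow. It is not Okami's spider API: `ExampleSpider` and `LinkAndTitleParser` are hypothetical, and the standard library `html.parser` is swapped in for lxml so the example has no third-party dependencies.

```python
from html.parser import HTMLParser
from typing import List, Optional, Tuple


class LinkAndTitleParser(HTMLParser):
    """Collects href values of <a> tags and the text of <title>."""

    def __init__(self) -> None:
        super().__init__()
        self.links: List[str] = []
        self.title: Optional[str] = None
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:  # ignore anchors without an href
                self.links.append(href)
        elif tag == "title":
            self._in_title = True

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title = (self.title or "") + data


class ExampleSpider:
    """Hypothetical spider shape: extract one item per page and return links to follow."""

    def parse(self, url: str, html: str) -> Tuple[dict, List[str]]:
        parser = LinkAndTitleParser()
        parser.feed(html)
        parser.close()
        item = {"url": url, "title": (parser.title or "").strip()}
        return item, parser.links


if __name__ == "__main__":
    page = '<title>Home</title><a href="/a">A</a><a href="/b">B</a>'
    print(ExampleSpider().parse("/", page))
```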
Quick start
- Install okami
pip install okami
- Run the example web server
OKAMI_SETTINGS=okami.cfg.example okami example server
Open localhost:8000 and browse around a little. Quite a remarkable website. Shortly, we will run our example spider against this website and process a few items.
- Run the example spider
OKAMI_SETTINGS=okami.cfg.example okami example spider
The example spider starts, and you can watch it process pages. Take a look at the example spider implementation here.
Documentation
Read the rest of the documentation here.
License
Okami is licensed under the three-clause BSD License. The full license text can be found here.
Project details
Download files
Source Distribution
Built Distribution
File details
Details for the file okami-0.2.0.tar.gz.
File metadata
- Download URL: okami-0.2.0.tar.gz
- Upload date:
- Size: 20.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.1.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.6.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 887339537339a04c33700cd37ad6490c5e43d8e081865c28b47b7c1e0b71b2f2
MD5 | d154ea382a17ccab8c84be5df874be2c
BLAKE2b-256 | 0abde75a8a747f7e829c3c9d7e1b43d626118b62a6c1e9c52c330e0f449a9d65
File details
Details for the file okami-0.2.0-py2.py3-none-any.whl.
File metadata
- Download URL: okami-0.2.0-py2.py3-none-any.whl
- Upload date:
- Size: 25.1 kB
- Tags: Python 2, Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/1.11.0 pkginfo/1.4.2 requests/2.19.1 setuptools/40.1.0 requests-toolbelt/0.8.0 tqdm/4.24.0 CPython/3.6.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | a9424280a8d0208e4d95eebd0b32f892cec9cd83733cb3e929e1cb39140c8331
MD5 | f12a6de30e84e10ccafae796341a666d
BLAKE2b-256 | c09ba7106c4c24ff27029bbfb87769120b18711834114773ae7cc478dc932d36