A high-level web scraping framework
Okami is a high-level web scraping framework built entirely for Python 3.6+. It uses the asynchronous model provided by the standard library's asyncio module, with aiohttp as the networking layer and lxml for parsing data.
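As a rough illustration of the asyncio model (not Okami's own code), the sketch below fetches several pages concurrently on one event loop while capping the number of requests in flight with `asyncio.Semaphore`. The in-memory `PAGES` dict and the `fetch` stub are hypothetical stand-ins: in a real crawler `fetch` would issue an HTTP request, for example through an aiohttp `ClientSession`. `asyncio.run` requires Python 3.7+.

```python
import asyncio

# Hypothetical in-memory site; stands in for real HTTP responses.
PAGES = {
    "/": "<title>Home</title>",
    "/a": "<title>Page A</title>",
    "/b": "<title>Page B</title>",
    "/c": "<title>Page C</title>",
}


async def fetch(path: str) -> str:
    """Stub fetcher: sleeps to mimic network latency instead of doing real I/O."""
    await asyncio.sleep(0.01)
    return PAGES[path]


async def fetch_all(paths, limit: int = 2) -> dict:
    """Fetch all paths concurrently with at most `limit` requests in flight."""
    semaphore = asyncio.Semaphore(limit)

    async def bounded(path):
        # Only `limit` coroutines can hold the semaphore at the same time.
        async with semaphore:
            return path, await fetch(path)

    # gather() runs the coroutines on one event loop and keeps input order.
    results = await asyncio.gather(*(bounded(p) for p in paths))
    return dict(results)


if __name__ == "__main__":
    for path, html in asyncio.run(fetch_all(PAGES)).items():
        print(path, html)
```

The semaphore keeps a large crawl from opening an unbounded number of connections at once, while `gather` still lets slow responses overlap instead of being awaited one by one.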
The architecture is entirely modular: the main components can be swapped out and replaced with custom implementations.
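One common pattern for making components swappable is to name each implementation by its dotted import path in a settings module and resolve it at startup. Whether Okami resolves its components exactly this way is an assumption; the `load_component` helper and the setting names below are hypothetical, and standard-library classes serve as placeholder components so the sketch runs on its own.

```python
import importlib


def load_component(dotted_path: str):
    """Import and return an object given a path such as 'package.module.Name'."""
    module_path, _, attr = dotted_path.rpartition(".")
    if not module_path:
        raise ValueError(f"not a dotted path: {dotted_path!r}")
    module = importlib.import_module(module_path)
    return getattr(module, attr)


# Hypothetical settings: each key names the class used for one component.
# Changing a value swaps the implementation without touching framework code.
SETTINGS = {
    "QUEUE_CLASS": "collections.deque",
    "STORAGE_CLASS": "collections.OrderedDict",
}

# Instantiate every configured component.
components = {name: load_component(path)() for name, path in SETTINGS.items()}

if __name__ == "__main__":
    for name, component in components.items():
        print(name, type(component).__name__)
```

Pointing `QUEUE_CLASS` at a custom class in your own package would replace the default without any change to the code that uses `components`.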
- complete website-wide page processing
- full scraping mode, or delta mode that scrapes only unvisited pages
- immediate, on-demand, or real-time page processing over an HTTP API
- single page processing via command line
- lots of pipelines, middlewares and signals
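The difference between full mode and delta mode can be sketched as a breadth-first, website-wide crawl that remembers visited URLs between runs. This is a hypothetical illustration, not Okami's implementation: the link graph is an in-memory dict, the `crawl` function is invented for the example, and visited URLs are persisted to a JSON file.

```python
import json
import os
import tempfile
from collections import deque

# Hypothetical site map: each page lists the pages it links to.
SITE = {
    "/": ["/a", "/b"],
    "/a": ["/c"],
    "/b": ["/a"],
    "/c": [],
}


def crawl(site, start, state_file, delta=True):
    """Breadth-first crawl of `site` from `start`; returns pages processed this run.

    In delta mode, URLs recorded in `state_file` by earlier runs are skipped;
    in full mode every reachable page is processed again.
    """
    seen_before = set()
    if delta and os.path.exists(state_file):
        with open(state_file) as fh:
            seen_before = set(json.load(fh))

    processed, queued = [], {start}
    queue = deque([start])
    while queue:
        url = queue.popleft()
        if url not in seen_before:
            processed.append(url)  # a real spider would fetch and parse here
        # Links are still followed so new pages behind old ones are found.
        for link in site[url]:
            if link not in queued:
                queued.add(link)
                queue.append(link)

    # Persist every URL visited so far for the next delta run.
    with open(state_file, "w") as fh:
        json.dump(sorted(seen_before | set(processed)), fh)
    return processed


if __name__ == "__main__":
    state = os.path.join(tempfile.mkdtemp(), "visited.json")
    print(crawl(SITE, "/", state))  # ['/', '/a', '/b', '/c']
    grown = {**SITE, "/c": ["/d"], "/d": []}
    print(crawl(grown, "/", state))  # ['/d']
```

Following links through already-visited pages is a design choice: it costs extra traversal but lets a delta run discover pages that are reachable only through pages it has seen before.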
Spiders are very simple to implement. Take a look at an example here.
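To give a feel for what a spider's job usually involves, the hypothetical sketch below turns one HTML page into an item plus a list of absolute follow-up URLs. The `ExampleSpider` class and its `parse` method are illustrative only and are not Okami's spider API; the stdlib `html.parser` module stands in for lxml so the snippet needs no third-party packages.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class PageParser(HTMLParser):
    """Collects the <title> text and every <a href> on a page."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


class ExampleSpider:
    """Hypothetical spider: turns one page into an item and follow-up URLs."""

    def parse(self, url, html):
        parser = PageParser()
        parser.feed(html)
        parser.close()
        item = {"url": url, "title": parser.title.strip()}
        # Resolve relative links against the page URL.
        follow = [urljoin(url, href) for href in parser.links]
        return item, follow


if __name__ == "__main__":
    page = '<title>Home</title><a href="/a">A</a><a href="b">B</a>'
    print(ExampleSpider().parse("http://localhost:8000/", page))
```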
pip install okami
Run the example web server
OKAMI_SETTINGS=okami.cfg.example okami example server
Open localhost:8000 and browse around a little. Quite a remarkable website. Shortly we will run our example spider against this website and process a few items.
Run the example spider
OKAMI_SETTINGS=okami.cfg.example okami example spider
The example spider starts, and you can see it processing pages. Take a look at the example spider implementation here.
Read the rest of the documentation here.
Okami is licensed under a three-clause BSD License. The full license text can be found here.