
A high-level web scraping framework

Project description

Okami

Okami is a high-level web scraping framework built entirely for Python 3.6+. It uses the asynchronous model provided by the standard library asyncio module, with aiohttp as the networking layer and lxml for parsing data.
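
To illustrate the asynchronous model in general terms, here is a minimal sketch that uses only the standard library. It is not Okami's code: `fetch` is a hypothetical stand-in for an aiohttp request, `LinkParser` uses `html.parser` in place of lxml, and the `PAGES` dictionary simulates a tiny website so the example runs without a network. It also assumes Python 3.7+ for `asyncio.run`.

```python
import asyncio
from html.parser import HTMLParser

# Hypothetical in-memory "website" so the sketch runs without a network.
PAGES = {
    "/": '<a href="/a">A</a><a href="/b">B</a>',
    "/a": '<a href="/b">B</a>',
    "/b": "<p>leaf page</p>",
}


class LinkParser(HTMLParser):
    """Collects href values from anchor tags (stand-in for lxml parsing)."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.links.extend(
                value for name, value in attrs if name == "href" and value
            )


async def fetch(url):
    """Stand-in for a network request; a real spider would await aiohttp here."""
    await asyncio.sleep(0)  # yield control to the event loop, as real I/O would
    return PAGES[url]


async def crawl(start):
    """Breadth-first crawl that fetches each level of links concurrently."""
    seen, frontier = {start}, [start]
    while frontier:
        # All pages in the current frontier are requested at once.
        bodies = await asyncio.gather(*(fetch(url) for url in frontier))
        next_frontier = []
        for body in bodies:
            parser = LinkParser()
            parser.feed(body)
            for link in parser.links:
                if link not in seen:
                    seen.add(link)
                    next_frontier.append(link)
        frontier = next_frontier
    return sorted(seen)


def run_crawl(start="/"):
    """Synchronous entry point that drives the event loop."""
    return asyncio.run(crawl(start))
```

Calling `run_crawl()` visits every page reachable from `/` exactly once. Swapping the stand-ins for real network and parsing calls would keep the same overall shape.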

The architecture is entirely modular, and the main components can be swapped out and replaced with custom implementations.
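
The idea of swappable components can be sketched as follows. All names here (`Downloader`, `StaticDownloader`, `UppercaseDownloader`, `Engine`) are hypothetical and chosen for illustration; they are not Okami's actual classes, and how Okami selects its components is described in its own documentation.

```python
from abc import ABC, abstractmethod


class Downloader(ABC):
    """Hypothetical component interface: turns a URL into page text."""

    @abstractmethod
    def download(self, url):
        ...


class StaticDownloader(Downloader):
    """Default implementation that returns canned content."""

    def download(self, url):
        return "<html>{}</html>".format(url)


class UppercaseDownloader(Downloader):
    """Custom replacement that honours the same interface."""

    def download(self, url):
        return "<HTML>{}</HTML>".format(url.upper())


class Engine:
    """Depends only on the Downloader interface, so implementations can be swapped."""

    def __init__(self, downloader=None):
        self.downloader = downloader or StaticDownloader()

    def process(self, url):
        return self.downloader.download(url)
```

Because `Engine` only calls `download`, passing `UppercaseDownloader()` to it changes behaviour without touching the engine itself.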

Features

  • complete website-wide page processing
  • full scraping mode, or delta mode that scrapes only unvisited pages
  • immediate, on-demand or real-time page processing over HTTP API
  • single page processing via command line
  • lots of pipelines, middlewares and signals
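
The difference between full and delta mode can be sketched in a few lines. This is an illustration only, under the assumption that visited pages are tracked as a set of URLs; `select_pages` is a hypothetical helper, and how Okami actually records visited pages is not stated here.

```python
def select_pages(discovered, visited, delta=False):
    """Return the URLs to process.

    Full mode returns every discovered URL; delta mode returns only
    those not already present in the visited set.
    """
    if not delta:
        return list(discovered)
    return [url for url in discovered if url not in visited]


# Hypothetical data for illustration.
discovered = ["/", "/products/1", "/products/2"]
visited = {"/", "/products/1"}

full_run = select_pages(discovered, visited)
delta_run = select_pages(discovered, visited, delta=True)
```

In this example `full_run` contains all three URLs, while `delta_run` contains only `/products/2`.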

Spiders are very simple implementations. Take a look at an example here.

Quick start

  • Install okami

    • pip install okami
  • Run example web server

    • OKAMI_SETTINGS=okami.cfg.example okami example server

Open localhost:8000 and browse around a little. Quite a remarkable website. We will run our example spider against this website shortly and process a few items.

  • Run example spider

    • OKAMI_SETTINGS=okami.cfg.example okami example spider

Our example spider has started, and you can see it processing pages. Take a look at the example spider implementation here.

Documentation

Read the rest of the documentation here.

License

Okami is licensed under the three-clause BSD License. The full license text can be found here.

Download files


Source Distribution

okami-0.2.0.tar.gz (20.5 kB)

Built Distribution

okami-0.2.0-py2.py3-none-any.whl (25.1 kB)
