A high-level web scraping framework

Okami

Okami is a high-level web scraping framework built for Python 3.6+. It uses the asynchronous model provided by the standard library asyncio module, with aiohttp as the networking layer and lxml for parsing data.

The architecture is entirely modular, and the main components can be swapped out and replaced with custom implementations.
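To illustrate the idea of swappable components on top of asyncio, here is a minimal sketch. It does not use Okami's actual API: the `Downloader` interface, `FakeDownloader`, `UppercaseDownloader`, and `crawl` are hypothetical names invented for this example, and the fake downloader returns canned HTML instead of making real requests through aiohttp.

```python
import asyncio
from typing import Dict, List, Protocol


class Downloader(Protocol):
    """Hypothetical component interface; Okami's real one may differ."""

    async def fetch(self, url: str) -> str: ...


class FakeDownloader:
    """Stand-in for a network layer such as aiohttp; serves canned HTML."""

    def __init__(self, pages: Dict[str, str]) -> None:
        self.pages = pages

    async def fetch(self, url: str) -> str:
        await asyncio.sleep(0)  # yield to the event loop, as real I/O would
        return self.pages.get(url, "")


class UppercaseDownloader(FakeDownloader):
    """A custom replacement that post-processes every response."""

    async def fetch(self, url: str) -> str:
        return (await super().fetch(url)).upper()


async def crawl(downloader: Downloader, urls: List[str]) -> List[str]:
    # Fetch all pages concurrently; gather() preserves the input order.
    return list(await asyncio.gather(*(downloader.fetch(u) for u in urls)))


PAGES = {"/a": "<p>a</p>", "/b": "<p>b</p>"}
default_result = asyncio.run(crawl(FakeDownloader(PAGES), ["/a", "/b"]))
custom_result = asyncio.run(crawl(UppercaseDownloader(PAGES), ["/a", "/b"]))
```

Because `crawl` depends only on the `fetch` coroutine, either downloader can be passed in without changing the crawling code, which is the kind of substitution a modular design allows.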

Features

  • complete website-wide page processing
  • full scraping mode, or delta mode that scrapes only unvisited pages
  • immediate, on-demand or real-time page processing over HTTP API
  • single page processing via command line
  • lots of pipelines, middlewares and signals
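The difference between full and delta scraping in the list above can be sketched as a simple filter over a set of already visited URLs. This is an illustration only, not Okami's implementation: `select_pages` and the in-memory `visited` set are hypothetical, and a real framework would presumably persist visited pages between runs.

```python
from typing import Iterable, List, Set


def select_pages(urls: Iterable[str], visited: Set[str], delta: bool) -> List[str]:
    """Return the URLs to process and record each of them as visited."""
    selected = []
    for url in urls:
        # In delta mode, skip anything seen in an earlier run.
        if delta and url in visited:
            continue
        selected.append(url)
        visited.add(url)
    return selected


visited: Set[str] = set()
first_run = select_pages(["/a", "/b"], visited, delta=True)         # ["/a", "/b"]
second_run = select_pages(["/a", "/b", "/c"], visited, delta=True)  # ["/c"]
full_run = select_pages(["/a", "/b", "/c"], visited, delta=False)   # all three
```

In this sketch, delta mode processes only the page that appeared since the first run, while full mode reprocesses everything regardless of history.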

Spiders are very simple to implement. Take a look at an example here.

Quick start

  • Install okami

    • pip install okami
  • Run example web server

    • OKAMI_SETTINGS=okami.cfg.example okami example server

Open localhost:8000 and browse around a little. Quite a remarkable website. We will run our example spider against this website shortly and process a few items.

  • Run example spider

    • OKAMI_SETTINGS=okami.cfg.example okami example spider

The example spider has now started, and you can see it processing pages. Take a look at the example spider implementation here.

Documentation

Read the rest of the documentation here.

License

Okami is licensed under a three-clause BSD License. The full license text can be found here.

