broccoli-server

A Python framework to build web scraping applications.

Problem Statement

  • I want to
    • Scrape content (images, text, etc.) from sources on the web (RSS feeds, Twitter, HTML pages, etc.)
    • Moderate and modify it internally through a web UI
    • Expose it through a public API
  • Without having to re-implement, for each scraping application
    • Reliable "cron" jobs that scrape the content
    • A pipeline that processes the content stage by stage
    • Internal web UIs that let humans moderate the content as a pipeline stage
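
As a rough illustration of the stage-by-stage pipeline described above, here is a minimal plain-Python sketch. It does not use broccoli-server's API: `Item`, `run_pipeline`, and the stage names are hypothetical and exist only for this example. Human moderation is modeled as a stage that holds an item until a moderator's decision has been recorded.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Optional

# Hypothetical names throughout: this is NOT broccoli-server's API.


@dataclass
class Item:
    url: str
    text: str
    stage: str = "scraped"
    approved: Optional[bool] = None  # set by a human moderator, e.g. via a web UI


# A stage inspects an item and returns the name of the next stage,
# or None when the item has to wait where it is.
Stage = Callable[[Item], Optional[str]]


def clean(item: Item) -> Optional[str]:
    # Automatic stage: collapse runs of whitespace in the scraped text.
    item.text = " ".join(item.text.split())
    return "moderation"


def moderation(item: Item) -> Optional[str]:
    # Human stage: hold the item until a decision has been recorded.
    if item.approved is None:
        return None
    return "published" if item.approved else "rejected"


STAGES: Dict[str, Stage] = {"scraped": clean, "moderation": moderation}


def run_pipeline(items: List[Item]) -> None:
    # Advance every item as far as it can currently go.
    for item in items:
        while item.stage in STAGES:
            next_stage = STAGES[item.stage](item)
            if next_stage is None:
                break
            item.stage = next_stage
```

In this sketch a periodic job could call `run_pipeline` repeatedly: items stop at `moderation` until `approved` is set, then move on to `published` or `rejected` on the next run.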

Solution

This is a Python framework that factors out the components common to scraping, processing, and moderating web content, and provides interfaces through which programmers implement their "business logic", so that they can build reliable and easy-to-use web scraping applications in less time.
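
To make the split between shared components and application "business logic" more concrete, here is a small sketch of the general idea in plain Python. It is an assumption-laden illustration, not broccoli-server's actual interface: `Scraper`, `run_once`, and `StaticScraper` are hypothetical names. The application implements a single method, while the shared wrapper handles failures so that one bad run does not stop later ones.

```python
import logging
from abc import ABC, abstractmethod
from typing import Dict, List

logger = logging.getLogger(__name__)

# Hypothetical names throughout: this is NOT broccoli-server's API.


class Scraper(ABC):
    """What an application would implement: only the scraping logic."""

    @abstractmethod
    def scrape(self) -> List[Dict[str, str]]:
        """Return newly scraped documents."""


def run_once(scraper: Scraper, store: List[Dict[str, str]]) -> bool:
    """Shared wrapper: run one scrape and contain any failure."""
    try:
        store.extend(scraper.scrape())
        return True
    except Exception:
        # Log and carry on so the next scheduled run still happens.
        logger.exception("scrape failed")
        return False


class StaticScraper(Scraper):
    """Example business logic; a real scraper would fetch from the web."""

    def scrape(self) -> List[Dict[str, str]]:
        return [{"url": "https://example.com/post/1", "title": "Example"}]
```

A scheduler would then call `run_once(StaticScraper(), store)` at a fixed interval; the return value indicates whether that run succeeded.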

Getting started

WIP (https://github.com/k-t-corp/broccoli-server/issues/124)

Development

Prerequisites

  • Python 3.9
  • Make

Prepare virtualenv

make venv

Develop

make deps

Test

make test

Project details


This version

7.2.4

Download files

Source Distribution

broccoli_server-7.2.4.tar.gz (1.2 MB)
