ArchiveTeam seesaw kit

Seesaw toolkit
==============

An attempt at a toolkit for writing seesaw scripts in Python, with support for concurrent downloads, uploads, and similar steps.

How to try it out
-----------------

To run the example pipeline:

    sudo pip install -r requirements.txt
    ./run-pipeline --help
    ./run-pipeline example-pipeline.py someone

Then point your browser to `http://127.0.0.1:8001/`.


Description
-----------

The toolkit needs the Tornado library for event-driven I/O.

General idea: a set of `Task`s that can be combined into a `Pipeline` that processes `Item`s:

* An `Item` is a thing that needs to be downloaded (a user, for example). It has properties that are filled by the `Task`s.
* A `Task` is a step in the download process: it takes an item, does something with it and passes it on. Example Tasks: getting an item name from the tracker, running a download script, rsyncing the result, notifying the tracker that it's done.
* A `Pipeline` represents a sequence of `Task`s. To make a seesaw script for a new project you'd specify a new `Pipeline`.
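The relationship between the three concepts can be sketched in a few lines of plain Python. This is a toy model for illustration only, not the seesaw implementation: the classes below merely borrow the names `Item`, `Task` and `Pipeline`, and `SetItemName`, `FakeDownload`, the `process` and `run` methods and the `log` property are all hypothetical.

```python
class Item(dict):
    """A thing to be downloaded; tasks fill in its properties."""


class Task:
    """One step of the process: take an item, do something, pass it on."""

    def process(self, item):
        raise NotImplementedError


class SetItemName(Task):
    # Hypothetical stand-in for "get an item name from the tracker".
    def __init__(self, item_name):
        self.item_name = item_name

    def process(self, item):
        item["item_name"] = self.item_name


class FakeDownload(Task):
    # Hypothetical stand-in for "run a download script".
    def process(self, item):
        item["data_path"] = "data/%s" % item["item_name"]


class Pipeline:
    """A fixed sequence of tasks applied to each item in order."""

    def __init__(self, *tasks):
        self.tasks = tasks

    def run(self, item):
        for task in self.tasks:
            task.process(item)
            # Record which steps the item has passed through.
            item.setdefault("log", []).append(type(task).__name__)
        return item


pipeline = Pipeline(SetItemName("someone"), FakeDownload())
result = pipeline.run(Item())  # the pipeline is fed an empty item
print(result["data_path"])
```

In this sketch a new project would correspond to a new `Pipeline(...)` call with a different list of tasks; the item itself starts empty and accumulates properties as it goes.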

A `Task` can work on multiple `Item`s at a time (e.g., multiple Wget downloads). The concurrency can be limited by wrapping the task in a `LimitConcurrency` `Task`: this will queue the items and run them one-by-one (e.g., a single Rsync upload).
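As a rough illustration of the wrapping idea, the sketch below limits a task with a semaphore. It uses `asyncio` from the standard library rather than Tornado, and the class here only shares its name with seesaw's `LimitConcurrency`; the constructor arguments, the `process` coroutine and the `FakeUpload` task are assumptions made for the example.

```python
import asyncio


class FakeUpload:
    """Hypothetical task that records how many items it handles at once."""

    def __init__(self):
        self.active = 0
        self.peak = 0

    async def process(self, item):
        self.active += 1
        self.peak = max(self.peak, self.active)
        await asyncio.sleep(0.01)  # pretend to upload
        self.active -= 1
        item["uploaded"] = True


class LimitConcurrency:
    """Toy wrapper: items queue up until one of `limit` slots is free."""

    def __init__(self, limit, inner_task):
        self.inner_task = inner_task
        self._slots = asyncio.Semaphore(limit)

    async def process(self, item):
        async with self._slots:
            await self.inner_task.process(item)


async def run_demo(limit, n_items):
    upload = FakeUpload()
    task = LimitConcurrency(limit, upload)
    items = [{"item_name": "item-%d" % i} for i in range(n_items)]
    # All items arrive at once; the wrapper decides how many may run.
    await asyncio.gather(*(task.process(item) for item in items))
    return upload.peak, items


peak_one, items_one = asyncio.run(run_demo(limit=1, n_items=5))
peak_three, items_three = asyncio.run(run_demo(limit=3, n_items=5))
print(peak_one, peak_three)
```

With a limit of 1 the wrapped task sees items strictly one-by-one, as in the single Rsync upload case; the unwrapped task would otherwise work on all five at the same time.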

The `Pipeline` needs to be fed empty `Item` objects; by controlling how many `Item`s you feed in, you limit the number of items that are being processed at the same time. (For example, add a new item each time an item leaves the pipeline.)
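The "add a new item each time an item leaves" strategy might look roughly like this. The sketch is a synchronous toy, not seesaw code: `ToyPipeline`, `Runner`, `concurrent_items`, `total_items` and the `on_finish` callback are hypothetical names chosen for the example.

```python
class ToyPipeline:
    """Holds the active items and reports each one that leaves."""

    def __init__(self, on_finish):
        self.active = []
        self.on_finish = on_finish

    def enqueue(self, item):
        self.active.append(item)

    def step(self):
        # Pretend the oldest active item has completed every task.
        item = self.active.pop(0)
        self.on_finish(item)


class Runner:
    """Feeds empty items so that at most `concurrent_items` are active."""

    def __init__(self, concurrent_items, total_items):
        self.concurrent_items = concurrent_items
        self.total_items = total_items
        self.started = 0
        self.finished = 0
        self.peak = 0
        self.pipeline = ToyPipeline(on_finish=self._item_left)

    def _top_up(self):
        while (len(self.pipeline.active) < self.concurrent_items
               and self.started < self.total_items):
            self.pipeline.enqueue({})  # an empty item
            self.started += 1

    def _item_left(self, item):
        self.finished += 1
        self._top_up()  # replace the item that just left

    def run(self):
        self._top_up()
        while self.pipeline.active:
            self.peak = max(self.peak, len(self.pipeline.active))
            self.pipeline.step()


runner = Runner(concurrent_items=2, total_items=5)
runner.run()
print(runner.finished, runner.peak)
```

The pipeline itself knows nothing about the limit in this sketch; the limit exists only because the runner refuses to feed more than two empty items at a time.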

With the `ItemValue`, `ItemInterpolation` and `ConfigValue` classes it is possible to pass item-specific arguments to the `Task` objects. The value of these objects will be re-evaluated for each item. Examples: a path name that depends on the item name, a configurable bandwidth limit, the number of concurrent downloads.
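The idea of arguments that are re-evaluated for each item can be sketched with small placeholder objects that are resolved only when an item is at hand. The classes below reuse the names from the text but are simplified stand-ins written for this example; the `realize` method and helper, the `%(key)s` template syntax and the `rate_limit` setting are assumptions, not a statement of the library's API.

```python
class ItemValue:
    """Placeholder for a single property of the current item."""

    def __init__(self, key):
        self.key = key

    def realize(self, item):
        return item[self.key]


class ItemInterpolation:
    """Placeholder for a string template filled from the item."""

    def __init__(self, template):
        self.template = template

    def realize(self, item):
        return self.template % item


class ConfigValue:
    """Placeholder for a setting that may change while running."""

    def __init__(self, name, default):
        self.name = name
        self.value = default

    def realize(self, item):
        return self.value


def realize(value, item):
    # Plain values pass through; placeholders are resolved per item.
    if hasattr(value, "realize"):
        return value.realize(item)
    return value


rate_limit = ConfigValue("rate_limit", "100k")  # hypothetical setting
args = [
    "fetch",  # hypothetical command name
    "--limit-rate", rate_limit,
    "--user", ItemValue("item_name"),
    "--output", ItemInterpolation("data/%(item_name)s.warc.gz"),
]

argv_a = [realize(a, {"item_name": "alice"}) for a in args]
rate_limit.value = "50k"  # reconfigured between items
argv_b = [realize(a, {"item_name": "bob"}) for a in args]
print(argv_a)
print(argv_b)
```

The argument list is written once, when the task is defined, yet each item gets its own path and picks up the bandwidth limit that is current at that moment.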
