
ArchiveTeam seesaw kit


Seesaw toolkit
==============

An asynchronous toolkit for distributed web processing. Written in Python and named after its behavior, it supports concurrent downloads, uploads, etc.

This toolkit is used for [Archive Team projects](http://archiveteam.org). It also powers the [Archive Team warrior](http://archiveteam.org/index.php?title=Warrior).

[![Build Status](https://secure.travis-ci.org/ArchiveTeam/seesaw-kit.png)](http://travis-ci.org/ArchiveTeam/seesaw-kit)
[![Coverage Status](https://coveralls.io/repos/ArchiveTeam/seesaw-kit/badge.svg)](https://coveralls.io/r/ArchiveTeam/seesaw-kit)

Installation
------------

Requires Python 2 or 3.

Needs the Tornado library for event-driven I/O. The complete list of required Python modules is in `requirements.txt`.


How to try it out
-----------------

To run the example pipeline:

```
sudo pip install -r requirements.txt
./run-pipeline --help
./run-pipeline examples/example-pipeline.py someone
```

Point your browser to `http://127.0.0.1:8001/`.

You can also use `run-pipeline2` or `run-pipeline3` to select the Python version explicitly.


Overview
--------

General idea: a set of `Task`s that can be combined into a `Pipeline` that processes `Item`s:

* An `Item` is a thing that needs to be downloaded (a user, for example). It has properties that are filled by the `Task`s.
* A `Task` is a step in the download process: it takes an item, does something with it and passes it on. Example Tasks: getting an item name from the tracker, running a download script, rsyncing the result, notifying the tracker that it's done.
* A `Pipeline` represents a sequence of `Task`s. To make a seesaw script for a new project you'd specify a new `Pipeline`.
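The relationship between the three concepts can be sketched as a toy, synchronous model. The classes and step functions below are hypothetical illustrations written for this explanation, not seesaw's actual classes (the real ones are asynchronous and built on Tornado): an item is a dict whose properties are filled in by each task, and a pipeline applies its tasks in order.

```python
# A toy, synchronous model of the Item/Task/Pipeline idea.
# These classes are hypothetical illustrations, not seesaw's real API.


class Item(dict):
    """A unit of work; tasks fill in its properties (here: dict keys)."""


class Task:
    """One step of the process: take an item, do something, pass it on."""

    def __init__(self, name, func):
        self.name = name
        self.func = func

    def process(self, item):
        self.func(item)
        return item


class Pipeline:
    """A fixed sequence of tasks applied to each item in order."""

    def __init__(self, *tasks):
        self.tasks = tasks

    def run(self, item):
        for task in self.tasks:
            item = task.process(item)
        return item


def get_item_name(item):
    # Hypothetical stand-in for asking the tracker for an item name.
    item["item_name"] = "someuser"


def download(item):
    # Hypothetical stand-in for running a download script.
    item["warc_file"] = item["item_name"] + ".warc.gz"


def mark_done(item):
    # Hypothetical stand-in for notifying the tracker that the item is done.
    item["done"] = True


pipeline = Pipeline(
    Task("get-item-name", get_item_name),
    Task("download", download),
    Task("mark-done", mark_done),
)

result = pipeline.run(Item())
print(result)
```

A new project would, in this model, only need a different list of tasks passed to `Pipeline`.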

A `Task` can work on multiple `Item`s at a time (e.g., multiple Wget downloads). The concurrency can be limited by wrapping the task in a `LimitConcurrency` `Task`: this queues the items and lets only a limited number run at once (e.g., one at a time for a single Rsync upload).
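The queuing behaviour of such a wrapper can be illustrated with a semaphore. This is a minimal sketch using `asyncio` rather than Tornado, and the `LimitConcurrency` and `FakeUpload` classes here are hypothetical stand-ins, not seesaw's implementation: items beyond the limit wait on the semaphore until a running item finishes.

```python
import asyncio


class LimitConcurrency:
    """Hypothetical wrapper: at most `limit` items run the inner task at once;
    the rest wait in the semaphore's queue."""

    def __init__(self, inner, limit=1):
        self.inner = inner
        self.semaphore = asyncio.Semaphore(limit)

    async def process(self, item):
        async with self.semaphore:  # waits while `limit` items are in progress
            return await self.inner(item)


class FakeUpload:
    """Stand-in for an rsync upload that records peak concurrency."""

    def __init__(self):
        self.running = 0
        self.peak = 0

    async def __call__(self, item):
        self.running += 1
        self.peak = max(self.peak, self.running)
        await asyncio.sleep(0.01)  # pretend to transfer data
        self.running -= 1
        return item


async def main():
    upload = FakeUpload()
    # Create the semaphore inside the running event loop.
    limited = LimitConcurrency(upload, limit=1)
    done = await asyncio.gather(*(limited.process(n) for n in range(5)))
    return done, upload.peak


done, peak = asyncio.run(main())
print(done, peak)  # [0, 1, 2, 3, 4] 1
```

All five items are submitted at once, but with `limit=1` the recorded peak stays at one running upload.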

The `Pipeline` needs to be fed empty `Item` objects; by controlling how many `Item`s are active at once, you limit how many items are processed concurrently. (For example, add a new item each time an item leaves the pipeline.)
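The "add a new item each time one leaves" strategy can be sketched with `asyncio.wait`. The `run_pipeline` and `feed` functions are hypothetical and stand in for seesaw's own scheduling, which this sketch does not reproduce: the feeder tops the pipeline up to `max_active` items, then waits for at least one to finish before adding more.

```python
import asyncio


async def run_pipeline(item_id):
    # Hypothetical pipeline body: each item just takes a little time.
    await asyncio.sleep(0.01)
    return item_id


async def feed(total_items, max_active):
    """Keep at most `max_active` items in the pipeline, starting a new
    (empty) item each time one leaves, until `total_items` have run."""
    active = set()
    finished = []
    peak = 0
    started = 0
    while started < total_items or active:
        # Top up the pipeline with fresh items.
        while started < total_items and len(active) < max_active:
            active.add(asyncio.create_task(run_pipeline(started)))
            started += 1
        peak = max(peak, len(active))
        # Wait until at least one item has left the pipeline.
        done, active = await asyncio.wait(active, return_when=asyncio.FIRST_COMPLETED)
        finished.extend(task.result() for task in done)
    return sorted(finished), peak


finished, peak = asyncio.run(feed(total_items=7, max_active=3))
print(finished, peak)  # [0, 1, 2, 3, 4, 5, 6] 3
```

Because new items are only created when old ones leave, the number of items in flight never exceeds `max_active`.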

With the `ItemValue`, `ItemInterpolation` and `ConfigValue` classes it is possible to pass item-specific arguments to the `Task` objects. The value of these objects will be re-evaluated for each item. Examples: a path name that depends on the item name, a configurable bandwidth limit, the number of concurrent downloads.
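The idea of values that are re-evaluated per item can be modelled as lazy objects that are resolved against the current item and configuration. The classes below borrow the names from the text but are hypothetical toy versions; the `resolve` method, the `%`-style template, and the plain-dict config are assumptions made for this sketch, not seesaw's API.

```python
class ItemValue:
    """Hypothetical: look up a property of the current item."""

    def __init__(self, key):
        self.key = key

    def resolve(self, item, config):
        return item[self.key]


class ItemInterpolation:
    """Hypothetical: %-format a template with the item's properties."""

    def __init__(self, template):
        self.template = template

    def resolve(self, item, config):
        return self.template % item


class ConfigValue:
    """Hypothetical: read a setting at use time, so changes affect later items."""

    def __init__(self, name, default=None):
        self.name = name
        self.default = default

    def resolve(self, item, config):
        return config.get(self.name, self.default)


def resolve(value, item, config):
    """Evaluate lazy values; pass plain values through unchanged."""
    if hasattr(value, "resolve"):
        return value.resolve(item, config)
    return value


# Arguments for a hypothetical download task, declared once...
args = {
    "name": ItemValue("item_name"),
    "output": ItemInterpolation("data/%(item_name)s/%(item_name)s.warc.gz"),
    "bandwidth_limit": ConfigValue("bandwidth_limit", default="0"),
    "retries": 3,
}

config = {"bandwidth_limit": "500k"}
items = [{"item_name": "alice"}, {"item_name": "bob"}]

# ...and re-evaluated for each item.
resolved = [{key: resolve(v, item, config) for key, v in args.items()} for item in items]
for entry in resolved:
    print(entry)
```

The task's arguments are declared once, yet each item gets its own path, while the bandwidth limit is read from the configuration at the moment the item is processed.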

Consult [the wiki](https://github.com/ArchiveTeam/seesaw-kit/wiki) for more information.


Download files
--------------


* Source distribution: `seesaw-0.9.2.tar.gz` (140.9 kB)
* Built distribution: `seesaw-0.9.2-py2.7.egg` (205.2 kB)

Details for the file `seesaw-0.9.2.tar.gz`:

* Size: 140.9 kB
* Tags: Source
* Uploaded using Trusted Publishing? No

| Algorithm   | Hash digest |
|-------------|-------------|
| SHA256      | `65cb4a9ee5b1cc90338c49c86ed8387b1b2fb71fd697bd914c4d5aed4435a9d1` |
| MD5         | `c73fc28119db7f7a111835e1d145e1d5` |
| BLAKE2b-256 | `89544261f5a4313c1636d18543a022dfb2a1f6569311fed8ae6bf3676628ff13` |


Details for the file `seesaw-0.9.2-py2.7.egg`:

* Size: 205.2 kB
* Tags: Egg
* Uploaded using Trusted Publishing? No

| Algorithm   | Hash digest |
|-------------|-------------|
| SHA256      | `d00f4f02700a64d9a2ceccdbc73a77b484743cba9b1d7c70713f292954ff9f2c` |
| MD5         | `c57b594dc122b6e2e01317ab54f692c1` |
| BLAKE2b-256 | `089e29f621efba725cd13643f05c1d5e20e93e5684ec3c5d1cc580781c3f1604` |

