
Cadasta Worker Toolbox

A collection of helpers to assist in quickly building asynchronous workers for the Cadasta system.

Architecture

Async System Architecture Diagram

The Cadasta asynchronous system is designed so that both the scheduled tasks and the task results can be tracked by the central Cadasta Platform. To ensure that this takes place, all Celery workers must be correctly configured to support these features.

Tracking Scheduled Tasks

To keep our system aware of all tasks being scheduled, the Cadasta Platform runs a process that consumes task messages off of a task-monitor queue and inserts those messages into our database. To support this design, all task producers (including worker nodes) must publish their task messages to both the normal destination queues and the task-monitor queue. This is achieved by registering all queues with a Topic Exchange, setting the task-monitor queue to subscribe to all messages sent to the exchange, and setting standard work queues to subscribe to messages with a matching routing_key. Because the Cadasta Platform is designed to work with Amazon SQS, and the SQS backend only keeps exchange/queue declarations in memory, each message producer must have this set up within its configuration.
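As a rough illustration of the routing described above, the following pure-Python sketch imitates how a topic exchange decides which queues receive a message. It does not use Celery or kombu; the `topic_matches` and `destinations` helpers and the `'export'` queue name are hypothetical and exist only for this example.

```python
def topic_matches(pattern, routing_key):
    """Return True if an AMQP-style topic pattern matches a routing key.

    '*' stands for exactly one dot-separated word, '#' for zero or more.
    """
    def match(p, k):
        if not p:
            return not k
        head, rest = p[0], p[1:]
        if head == '#':
            # '#' may absorb any number of leading words, including none.
            return any(match(rest, k[i:]) for i in range(len(k) + 1))
        if not k:
            return False
        return (head == '*' or head == k[0]) and match(rest, k[1:])

    return match(pattern.split('.'), routing_key.split('.'))


# Hypothetical bindings: the monitor queue subscribes to everything,
# while each work queue subscribes only to its own routing key.
BINDINGS = {
    'task-monitor': '#',
    'celery': 'celery',
    'export': 'export',
}


def destinations(routing_key, bindings=BINDINGS):
    """List the queues that would receive a message with this routing key."""
    return sorted(
        queue for queue, pattern in bindings.items()
        if topic_matches(pattern, routing_key)
    )


print(destinations('export'))
```

Under these assumed bindings, a message published with the routing key `'export'` reaches both the `export` work queue and the `task-monitor` queue, which is the behaviour the architecture relies on.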

Tracking Task Results

Task results are inserted by each worker into the Platform DB. For this reason, each worker must have network access to the Platform DB (via AWS Security Groups). Additionally, each worker should be provided with a username and password that authorize it to write to the Platform DB’s Result Table. For security, it is advised that these credentials be permitted to access only this single table. The Result Table has a one-to-one relation via the task_id column to the Task Table. This relation should not be enforced via a constraint, as it is possible for a task’s result to be entered into the DB before the sync-tasks service enters the task into the Task Table.
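The ordering issue behind the "no constraint" advice can be illustrated with a small sketch. It uses an in-memory SQLite database as a stand-in for the Platform DB, and the table layouts (everything other than the task_id column) are assumptions made for the example, not the Platform's actual schema.

```python
import sqlite3


def result_before_task():
    conn = sqlite3.connect(':memory:')
    # Turn on foreign-key enforcement so a constraint, if declared, would fire.
    conn.execute('PRAGMA foreign_keys = ON')
    conn.execute('CREATE TABLE task (task_id TEXT PRIMARY KEY, name TEXT)')
    # Deliberately no "REFERENCES task(task_id)" on the result table.
    conn.execute('CREATE TABLE result (task_id TEXT PRIMARY KEY, status TEXT)')

    # The worker finishes first and writes its result...
    conn.execute("INSERT INTO result VALUES ('abc123', 'SUCCESS')")
    # ...and only afterwards is the task row inserted.
    conn.execute("INSERT INTO task VALUES ('abc123', 'export.run')")

    # The one-to-one relation still holds when joining on task_id.
    row = conn.execute(
        'SELECT task.name, result.status '
        'FROM task JOIN result USING (task_id)'
    ).fetchone()
    conn.close()
    return row


print(result_before_task())
```

If the result table declared a foreign key to the task table, the first insert in this sketch would be rejected because the matching task row does not exist yet.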

Library

cadasta.workertoolbox.conf.Config

The Config class was built to simplify configuring Celery settings, helping to ensure that all workers adhere to the architecture requirements of the Cadasta asynchronous system. An instance of the Config should come configured with all Celery settings that are required by our system. It is the aim of the class to not require much customization on the part of the developer. However, some customization may be needed when altering configuration between environments (e.g. if dev settings vary greatly from prod settings).

Required Arguments

queues

The only required argument is the queues array. This should contain an array of names for queues that are to be used by the given worker. This includes queues from which the node processes tasks and queues into which the node will schedule tasks. It is not necessary to include the 'celery' or 'platform.fifo' queues, as these will be added automatically. The input of the queues variable will be stored as QUEUES on the Config instance.

Optional Arguments

Any Celery setting may be submitted. By internal convention, we use the lowercase Celery settings rather than their older uppercase counterparts. This ensures that they are displayed when calling repr on the Config instance.

result_backend

Defaults to 'db+postgresql://{0.RESULT_DB_USER}:{0.RESULT_DB_PASS}@{0.RESULT_DB_HOST}/{0.RESULT_DB_NAME}' rendered with self.
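To make "rendered with self" concrete, here is a minimal sketch of how such a template resolves attribute lookups through str.format. A SimpleNamespace stands in for the Config instance, and the credential values are placeholders chosen for the example.

```python
from types import SimpleNamespace

TEMPLATE = (
    'db+postgresql://{0.RESULT_DB_USER}:{0.RESULT_DB_PASS}'
    '@{0.RESULT_DB_HOST}/{0.RESULT_DB_NAME}'
)

# Stand-in for a Config instance; all four values are hypothetical.
settings = SimpleNamespace(
    RESULT_DB_USER='worker',
    RESULT_DB_PASS='s3cret',
    RESULT_DB_HOST='db.example.com',
    RESULT_DB_NAME='platform',
)

# '{0.NAME}' reads attribute NAME from the first positional argument.
result_backend = TEMPLATE.format(settings)
print(result_backend)
```

Because the template reads attributes at render time, changing any of the RESULT_DB_* values on the object changes the resulting URL without touching the template itself.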

broker_transport

Defaults to 'sqs'.

broker_transport_options

Defaults to:

{
    'region': 'us-west-2',
    'queue_name_prefix': '{}-'.format(QUEUE_NAME_PREFIX)
}

task_queues

Defaults to the following set of kombu.Queue objects, where queues is the configuration’s required queues argument, platform_queue is the value of PLATFORM_QUEUE_NAME, and exchange is a kombu.Exchange object constructed from the task_default_exchange and task_default_exchange_type settings:

set([
    Queue('celery', exchange, routing_key='celery'),
    Queue(platform_queue, exchange, routing_key='#'),
] + [
    Queue(q_name, exchange, routing_key=q_name)
    for q_name in queues
])

Note: It is recommended that developers not alter this setting.

task_routes

Defaults to the following dict, where queues is the configuration’s required queues argument and exchange is a kombu.Exchange object constructed from the task_default_exchange and task_default_exchange_type settings:

routes = {
    'celery.*': {
        'exchange': exchange,
        'routing_key': 'celery',
    },
}
# Route each queue's tasks (e.g. 'export.*') to the queue of the same name.
for q in queues:
    routes.setdefault('{}.*'.format(q), {
        'exchange': exchange,
        'routing_key': q,
    })

Note: It is recommended that developers not alter this setting.
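The effect of these default routes can be sketched without Celery. In the example below, the exchange is replaced by a plain string, glob matching is imitated with the standard library's fnmatchcase, and the `build_routes` and `routing_key_for` helpers and the `'export'` queue are hypothetical; Celery's own router performs the real matching.

```python
from fnmatch import fnmatchcase


def build_routes(queues, exchange='task_exchange'):
    """Build a routes dict shaped like the default described above."""
    routes = {
        'celery.*': {'exchange': exchange, 'routing_key': 'celery'},
    }
    for q in queues:
        routes.setdefault('{}.*'.format(q), {
            'exchange': exchange,
            'routing_key': q,
        })
    return routes


def routing_key_for(task_name, routes):
    """Return the routing key of the first pattern matching the task name."""
    for pattern, route in routes.items():
        if fnmatchcase(task_name, pattern):
            return route['routing_key']
    return None  # no route matched


routes = build_routes(['export'])
print(routing_key_for('export.project', routes))
```

The practical consequence is a naming convention: a task whose name starts with a queue name followed by a dot is routed to that queue.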

task_default_exchange

Defaults to 'task_exchange'.

task_default_exchange_type

Defaults to 'topic'.

task_track_started

Defaults to True.

Internal Variables

By convention, all variables pertinent to only the Config class (i.e. not used by Celery) should be written entirely uppercase.

PLATFORM_QUEUE_NAME

Defaults to 'platform.fifo'.

Note: It is recommended that developers not alter this setting.

QUEUE_NAME_PREFIX

Used to populate the queue_name_prefix value of the connection’s broker_transport_options. Defaults to the value of the QUEUE_PREFIX environment variable if populated, 'dev' if not.
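A minimal sketch of this fallback, assuming that "populated" simply means the variable is present in the environment (an empty value is not treated specially here). The `broker_transport_options` helper is hypothetical and is not part of the library.

```python
import os


def broker_transport_options(environ=None):
    """Derive the transport options from the environment.

    `environ` defaults to os.environ; passing a dict makes this easy to test.
    """
    if environ is None:
        environ = os.environ
    # Fall back to 'dev' when QUEUE_PREFIX is not set.
    prefix = environ.get('QUEUE_PREFIX', 'dev')
    return {
        'region': 'us-west-2',
        'queue_name_prefix': '{}-'.format(prefix),
    }


print(broker_transport_options({'QUEUE_PREFIX': 'prod'}))
```

With a prefix of 'prod', a queue named 'export' would be addressed as 'prod-export', which keeps environments that share an AWS account from reading each other's queues.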

RESULT_DB_USER

Used to populate the default result_backend template. Defaults to RESULT_DB_USER environment variable if populated, 'cadasta' if not.

RESULT_DB_PASS

Used to populate the default result_backend template. Defaults to RESULT_DB_PASS environment variable if populated, 'cadasta' if not.

RESULT_DB_HOST

Used to populate the default result_backend template. Defaults to RESULT_DB_HOST environment variable if populated, 'localhost' if not.

RESULT_DB_PORT

Used to populate the default result_backend template. Defaults to RESULT_DB_PORT environment variable if populated, '5432' if not.

RESULT_DB_NAME

Used to populate the default result_backend template. Defaults to RESULT_DB_NAME environment variable if populated, 'cadasta' if not.

cadasta.workertoolbox.tests.build_functional_tests

When provided with a Celery app instance, this function generates a suite of functional tests to ensure that the provided application’s configuration and functionality conforms with the architecture of the Cadasta asynchronous system.

An example, where an instantiated and configured Celery() app instance exists in a sibling celery module:

from cadasta.workertoolbox.tests import build_functional_tests

from .celery import app

FunctionalTests = build_functional_tests(app)

To run these tests, use your standard test runner (e.g. pytest) or call manually from the command-line:

python -m unittest path/to/tests.py

Development

Testing

pip install -r requirements-test.txt
./runtests

Deploying

pip install -r requirements-deploy.txt
python setup.py test clean build publish tag
