# django-eb-sqs-worker

Django Background Tasks for Amazon Elastic Beanstalk.

Created by Alexey "DataGreed" Strelkov.

## Overview

django-eb-sqs-worker lets you handle background jobs sent via SQS on an Elastic Beanstalk Worker Environment and provides methods for sending tasks to the worker.

You can use the same Django codebase for both your Web Tier and Worker Tier environments and send tasks from the Web environment to the Worker environment. Amazon fully manages autoscaling for you.

Tasks are sent via Amazon Simple Queue Service and are delivered to your worker with Elastic Beanstalk's SQS daemon. Periodic tasks are also supported.

Here's a diagram of how tasks move through the system; arrows represent task movement:

[Diagram: tasks flow from the Web environment through the SQS queue to the Worker environment]

## Installation

Install using pip (only Python 3 is supported):

    pip install django-eb-sqs-worker


Add eb_sqs_worker to settings.INSTALLED_APPS:

    INSTALLED_APPS = [
        # ...
        "eb_sqs_worker",
    ]


Add localhost to settings.ALLOWED_HOSTS so that the SQS daemon can post tasks from the queue to your worker:

    ALLOWED_HOSTS = [
        # ...
        "localhost",
    ]


Update your settings.py for both Worker and Web EB environments:

    # region where your Elastic Beanstalk environments are deployed, e.g. "us-west-1"
    AWS_EB_DEFAULT_REGION = "your default region"
    # your AWS access key id
    AWS_ACCESS_KEY_ID = "insert your key id here"
    # your AWS secret access key
    AWS_SECRET_ACCESS_KEY = "insert your key here"
    # queue name to use - queues that don't exist will be created automatically
    AWS_EB_DEFAULT_QUEUE_NAME = "any_queue_name_to_use"
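
Hard-coding credentials in settings.py makes them easy to leak through version control. A minimal sketch of one alternative, assuming you expose the values as environment properties with the same names; the `env_setting` helper and the fallback values are hypothetical and not part of django-eb-sqs-worker:

```python
import os


def env_setting(name, default=None):
    """Return the environment variable `name`, or `default` if it is unset or empty."""
    value = os.environ.get(name)
    return value if value else default


# Assumption: these variables are defined as Elastic Beanstalk environment properties.
AWS_EB_DEFAULT_REGION = env_setting("AWS_EB_DEFAULT_REGION", "us-west-1")
AWS_ACCESS_KEY_ID = env_setting("AWS_ACCESS_KEY_ID")
AWS_SECRET_ACCESS_KEY = env_setting("AWS_SECRET_ACCESS_KEY")
AWS_EB_DEFAULT_QUEUE_NAME = env_setting("AWS_EB_DEFAULT_QUEUE_NAME", "any_queue_name_to_use")
```

The setting names are the ones listed above; only the way their values are obtained differs.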


In the settings file for your Web tier environment, add the following setting (this is important: leaving it out or setting it to True can cause security problems):

    # never set to True on Web environment. Use True only on Worker env and local development env
    AWS_EB_HANDLE_SQS_TASKS = False


In the settings files used by your Worker environments, add the following setting:

    # never set to True on Web environment. Use True only on Worker env and local development env
    AWS_EB_HANDLE_SQS_TASKS = True


Add eb-sqs-worker urls to your project's main urls.py module:

    # urls.py

    urlpatterns = [
        # your url patterns
        # ...
    ]

    from eb_sqs_worker.urls import urlpatterns as eb_sqs_urlpatterns
    urlpatterns += eb_sqs_urlpatterns


Navigate to your Worker environment in Elastic Beanstalk Web console, then go to Configuration > Worker and set HTTP path to /sqs/.

On the same page, select the queue that corresponds to your AWS_EB_DEFAULT_QUEUE_NAME. Alternatively, if you prefer to use the autogenerated queue, copy its name and set it as your AWS_EB_DEFAULT_QUEUE_NAME. If the queue named in AWS_EB_DEFAULT_QUEUE_NAME is not listed, try sending a first task to it (see the Usage section) and it will be created automatically; you may need to reload the page for it to appear.

Apply changes.

## Usage

### Simple way

#### Defining Background Tasks

To define a job, create a function decorated with the task decorator:

    from eb_sqs_worker.decorators import task

    @task
    def some_task(**kwargs):
        # define your task here; the function name is up to you
        print(f"The decorated test task is being run with kwargs {kwargs} and will echo them back")
        return kwargs


Make sure the module with your tasks is imported so they will register correctly.

The best practice is to do it as soon as Django loads, e.g. in your app's models.py or in the corresponding AppConfig.
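
The import matters because a decorator only runs when Python executes the module that contains it. The following is a simplified stand-in for a task registry, not the package's actual implementation; `REGISTRY`, `register` and `send_welcome_email` are hypothetical names used only to illustrate the mechanism:

```python
# Simplified illustration of decorator-based registration (not eb_sqs_worker's real code).
REGISTRY = {}


def register(func):
    """Record the function under its name at the moment the decorator is executed."""
    REGISTRY[func.__name__] = func
    return func


# Nothing is registered until code that uses the decorator has been executed.
assert "send_welcome_email" not in REGISTRY


@register
def send_welcome_email(**kwargs):
    return f"welcome {kwargs.get('user')}"


# Executing the definition (which is what importing the module does) fills the registry.
assert REGISTRY["send_welcome_email"](user="alice") == "welcome alice"
```

In a Django project, importing your tasks module from the `ready()` method of your AppConfig is one common place to trigger this.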

If the task was defined using @task decorator, you can send it to background queue like this:

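    # some_task is a function decorated with @task; calling it with keyword arguments
    # sends the task to the SQS queue, where it will be automatically picked up and executed by the worker
    some_task(foo="bar")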


You can set settings.AWS_EB_RUN_TASKS_LOCALLY to True in development – this will force all tasks to execute locally in sync mode without sending them to the queue. This is useful for testing.

If you need to execute the function synchronously just one time somewhere in your code without changing this setting, you can do it like this:

    # some_task is a function decorated with @task;
    # this runs the task function synchronously without sending it to the queue and returns the result
    result = some_task.execute(foo="bar")


Note: don't supply positional arguments to the task, always use keyword arguments.

Periodic tasks are defined the same way as regular tasks, but it's better to supply a custom name for them:

    from eb_sqs_worker.decorators import task

    @task(task_name="some_periodic_task")
    def periodic_task():
        print("Periodic test task is being run")
        return True


Add cron.yaml to the root of the project:

    version: 1
    cron:
     - name: "some_periodic_task"  # must match the name of your periodic task
       url: "/sqs/"
       schedule: "0 23 * * *"


Deploy your project to Elastic Beanstalk and your task will run every day at 23:00.
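
The schedule string uses the standard five-field cron layout (minute, hour, day of month, month, day of week). A small sketch that labels the fields of the expression above; `describe_schedule` is a hypothetical helper for illustration and only splits the fields, it does not implement full cron semantics:

```python
CRON_FIELDS = ("minute", "hour", "day_of_month", "month", "day_of_week")


def describe_schedule(expression):
    """Split a five-field cron expression into a {field_name: value} dict."""
    parts = expression.split()
    if len(parts) != len(CRON_FIELDS):
        raise ValueError(f"expected {len(CRON_FIELDS)} fields, got {len(parts)}")
    return dict(zip(CRON_FIELDS, parts))


fields = describe_schedule("0 23 * * *")
print(fields["hour"], fields["minute"])  # prints: 23 0
```

Reading the fields this way makes it easier to check that a schedule such as `0 23 * * *` means minute 0 of hour 23 on every day.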

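Refer to the Elastic Beanstalk documentation for more info on periodic tasks: https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html#worker-periodictasks

Note: periodic tasks don't support passing arguments.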

#TODO describe (add link to https://docs.aws.amazon.com/elasticbeanstalk/latest/dg/using-features-managing-env-tiers.html#worker-periodictasks), explain configuration

#### Defining Background Tasks

#TODO describe

#TODO describe

#TODO describe

### Interface reference

#TODO: add info on arguments

#TODO: add info on arguments

## Settings

### AWS_EB_HANDLE_SQS_TASKS

If set to True, tasks will be accepted and handled on this instance. If set to False, the URL for handling tasks will return 404. Defaults to False.

Important: set this to True only on your Worker environment.

### AWS_EB_ENABLED_TASKS

Dictionary of enabled tasks. Routes task names to actual task methods.

If you register your tasks using the task decorator, you don't need to worry about this setting at all; it will be set automatically by the decorator.

E.g.:

    AWS_EB_ENABLED_TASKS = {
        # name used in serialization   # path to actual method that does the job
        "accounts_confirmation_email": "accounts.tasks.send_confirmation_email",  # example entry (hypothetical names)
    }
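
To illustrate what routing a task name to a dotted path involves, here is a minimal sketch of resolving such a path with the standard library. It is not the package's own lookup code; `resolve_dotted_path` and the sample mapping are hypothetical, and the mapping points at a standard-library function only so that the snippet runs anywhere:

```python
from importlib import import_module


def resolve_dotted_path(path):
    """Import 'package.module.attr' and return the attribute it names."""
    module_path, _, attr_name = path.rpartition(".")
    if not module_path:
        raise ValueError(f"{path!r} is not a dotted path")
    return getattr(import_module(module_path), attr_name)


# Stand-in mapping with the same shape as AWS_EB_ENABLED_TASKS.
enabled_tasks = {"dump_payload": "json.dumps"}

handler = resolve_dotted_path(enabled_tasks["dump_payload"])
print(handler({"foo": "bar"}))  # prints: {"foo": "bar"}
```

A path that cannot be imported fails at lookup time, which is one reason to prefer registering tasks with the decorator.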


### AWS_EB_DEFAULT_REGION

Default Elastic Beanstalk region. Use the one that your app is deployed in.

### AWS_EB_DEFAULT_QUEUE_NAME

Name of the queue used by default. If the queue with specified name does not exist, it will be created automatically when the first task is queued.

### AWS_ACCESS_KEY_ID

Amazon Access Key Id; refer to the AWS documentation.

### AWS_SECRET_ACCESS_KEY

Amazon Secret Access Key; refer to the AWS documentation.

### AWS_EB_RUN_TASKS_LOCALLY

If set to True, all tasks will be run locally and synchronously instead of being sent to the SQS queue. Defaults to False.

### AWS_EB_ALERT_WHEN_TASK_EXECUTES_LONGER_THAN_SECONDS

Set this to the maximum number of seconds a regular (non-periodic) job is supposed to run; use AWS_EB_ALERT_WHEN_PERIODIC_TASK_EXECUTES_LONGER_THAN_SECONDS for periodic jobs. If the job requires more time to finish, ADMINS will be notified by email.

### AWS_EB_ALERT_WHEN_PERIODIC_TASK_EXECUTES_LONGER_THAN_SECONDS

Same as AWS_EB_ALERT_WHEN_TASK_EXECUTES_LONGER_THAN_SECONDS but for periodic jobs, since sometimes they need a separate threshold.
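
Conceptually, such a check amounts to timing the job and comparing the elapsed time with the threshold. A rough sketch of that idea, not the package's implementation; `run_with_time_alert` and the `notify` callback are hypothetical stand-ins for the email sent to ADMINS:

```python
import time


def run_with_time_alert(job, threshold_seconds, notify):
    """Run `job`, then call `notify(elapsed)` if it took longer than the threshold."""
    started = time.monotonic()
    result = job()
    elapsed = time.monotonic() - started
    if threshold_seconds is not None and elapsed > threshold_seconds:
        notify(elapsed)
    return result


alerts = []
run_with_time_alert(lambda: time.sleep(0.05), threshold_seconds=0.01, notify=alerts.append)
print(len(alerts))  # prints: 1
```

Note that in this sketch the alert fires after the job has finished; it reports slow jobs rather than interrupting them.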

## Security

Always set AWS_EB_HANDLE_SQS_TASKS=False on the Web Tier environment so that tasks cannot be spoofed! Web Tier environments typically host public websites and can be accessed by anyone on the Internet, meaning that anyone could send arbitrary jobs to your site if you leave this option on in the Web environment.

Worker environments can only be accessed internally, e.g. by the SQS daemon that POSTs tasks to them, so AWS_EB_HANDLE_SQS_TASKS=True should be set only on Worker environments.

Use Elastic Beanstalk environment properties to supply different settings files for the Web and Worker environments. See also the Django documentation on designating the settings (DJANGO_SETTINGS_MODULE).
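
One way to keep the flag safe by default is to derive it from an environment property, so that only environments that explicitly opt in accept tasks. A minimal sketch, assuming a hypothetical `EB_SQS_WORKER` property that you set to `true` only on the Worker environment; the property name and the `env_flag` helper are illustrative and not part of the package:

```python
import os


def env_flag(name, default=False):
    """Interpret an environment variable as a boolean; unset means `default`."""
    raw = os.environ.get(name)
    if raw is None:
        return default
    return raw.strip().lower() in {"1", "true", "yes", "on"}


# Hypothetical property name: set EB_SQS_WORKER=true on the Worker environment only.
# The Web environment never defines it, so the flag falls back to False there.
AWS_EB_HANDLE_SQS_TASKS = env_flag("EB_SQS_WORKER", default=False)
```

With this approach a misconfigured or newly created environment refuses tasks rather than accepting them.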

## Tips

#TODO

### Accessing Web Tier Database from Worker

You will probably want your worker environment to have access to the same database as your web tier environment.

This assumes that you have a web tier environment and a worker environment with the same Django apps deployed, and that the web tier environment has an attached database set up via Elastic Beanstalk with connection settings populated from environment variables. If you don't have a worker environment yet, you can create one using eb create -t worker <environment name>. Then do the following:

1. Open Elastic Beanstalk Web Console
2. Navigate to your Web Tier environment > Configuration > Database
3. Copy the database connection settings. Note that the database password will not be shown here. If you don't remember it, you can connect to the Web environment using eb ssh and read it by running cat /opt/python/current/env
4. Navigate to your Worker environment > Configuration > Software > Edit
5. Add the environment variables for the DB connection that you've copied (RDS_PORT, RDS_PASSWORD, RDS_USERNAME, RDS_DB_NAME, RDS_HOSTNAME) and hit "Apply"
6. Navigate to your Worker environment > Configuration > Instances > Edit
7. Add security group corresponding to your Web Tier environment and hit "Apply", confirm changes.
8. Re-deploy the application using eb deploy to make sure that everything works as expected.
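
With those variables in place, both environments can build the same database configuration from them. A minimal sketch of a corresponding settings.py fragment, assuming a PostgreSQL database (swap the `ENGINE` value for your database); `database_from_env` is a hypothetical helper:

```python
import os


def database_from_env(environ=os.environ):
    """Build a Django DATABASES entry from the RDS_* environment variables."""
    return {
        "ENGINE": "django.db.backends.postgresql",  # assumption: PostgreSQL
        "NAME": environ["RDS_DB_NAME"],
        "USER": environ["RDS_USERNAME"],
        "PASSWORD": environ["RDS_PASSWORD"],
        "HOST": environ["RDS_HOSTNAME"],
        "PORT": environ["RDS_PORT"],
    }


# Only configure the database when the variables are present (e.g. not on a dev machine).
if "RDS_HOSTNAME" in os.environ:
    DATABASES = {"default": database_from_env()}
```

Because the Web and Worker environments read the same five variables, they end up pointing at the same database.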

### Delay abstraction

#TODO

### Using different cron files for different environments

#TODO

## Testing

### Synchronous mode

When developing on a local machine it might be a good idea to set AWS_EB_RUN_TASKS_LOCALLY=True, so that all tasks that would normally be sent to the queue are executed locally on the same machine in sync mode. This lets you test your actual task methods in integration tests.

### Testing django-eb-sqs-worker itself

Clone the repository:

    git clone https://github.com/DataGreed/django-eb-sqs-worker.git


Install the requirements (use a Python virtual environment):

    cd django-eb-sqs-worker
    pip install -r requirements.txt


Run the tests:

    sh test.sh


## Contributing

If you would like to contribute, please make a Pull Request with the description of changes and add tests to cover these changes.

Feel free to open issues if you have any problems or questions with this package.

# TODOs

• take advantage of the new environment link feature
• add pickle serialization
• parse GET-parameters for periodic tasks?

Search tags

Django Elastic Beanstalk Worker Web Tier Asynchronous celery async django-q Jobs Background Tasks SQS
