A queue/jobs system based on redis-limpyd, a Redis ORM (sort of) in Python.
Where to find it:
Python 2.6, 2.7, 3.3 and 3.4 are supported.
pip install redis-limpyd-jobs
redis-limpyd-jobs provides three limpyd models (Queue, Job, Error), and a Worker class.
These models implement the minimum stuff you need to run jobs asynchronously:
    from limpyd_jobs import STATUSES, Queue, Job, Worker

    # The function to run when a job is called by the worker
    def do_stuff(job, queue):
        # here do stuff with your job
        pass

    # Create a first job, name 'job:1', in a queue named 'myqueue', with a
    # priority of 1. The higher the priority, the sooner the job will run
    job1 = Job.add_job(identifier='job:1', queue_name='myqueue', priority=1)

    # Add another job in the same queue, with a higher priority, and a different
    # identifier (if the same was used, no new job would be added, but the
    # existing job's priority would have been updated)
    job2 = Job.add_job(identifier='job:2', queue_name='myqueue', priority=2)

    # Create a worker for the queue used previously, asking to call the
    # "do_stuff" function for each job, and to stop after 2 jobs
    worker = Worker(queues='myqueue', callback=do_stuff, max_loops=2)

    # Now really run the jobs
    worker.run()

    # Here our jobs are done, our queue is empty
    queue1 = Queue.get_queue('myqueue', priority=1)
    queue2 = Queue.get_queue('myqueue', priority=2)

    # nothing waiting
    print queue1.waiting.lmembers(), queue2.waiting.lmembers()
    >> [] []

    # two jobs in success (show PKs of jobs)
    print queue1.success.lmembers(), queue2.success.lmembers()
    >> ['limpyd_jobs.models.Job:1', 'limpyd_jobs.models.Job:2']

    # Check our jobs statuses
    print job1.status.hget() == STATUSES.SUCCESS
    >> True
    print job2.status.hget() == STATUSES.SUCCESS
    >> True
You notice how it works:
Notice that you can run as many workers as you want, even on the same queue name. Internally, we use the blpop redis command to get jobs atomically.
But you can also run only one worker, with only one queue, doing different things in the callback depending on the identifier attribute of the job.
Workers are able to catch SIGINT/SIGTERM signals, finishing executing the current job before exiting. Useful if used, for example, with supervisord.
If you want to store more information in a job, queue or error, or want to have a different behavior in a worker, it’s easy because you can create subclasses of everything in limpyd-jobs, the limpyd models or the Worker class.
A Job stores all the information needed about a task to run.
Note: If you want to subclass the Job model to add your own fields, run method, or whatever, note that the class must be at the first level of a python module (ie not in a parent class or function) to work.
A string (InstanceHashField, indexed) to identify the job.
When using the (recommended) add_job class method, you can’t have many jobs with the same identifier in a waiting queue. If you create a new job with an identifier while another one with the same identifier is still in the same waiting queue, what is done depends on the priority of the two jobs:

- if the new job has a lower (or equal) priority, it’s discarded
- if the new job has a higher priority, the priority of the existing job is updated to the higher one

In both cases the add_job class method returns the existing job, discarding the new one.
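The rule above can be illustrated without redis. The following is a minimal pure-Python sketch, not the library's implementation: the `waiting` dict and the `add_job_sketch` function are hypothetical names standing in for the waiting queues and for `Job.add_job`.

```python
# Hypothetical in-memory stand-in for the waiting queues:
# maps (queue_name, identifier) -> priority of the job already waiting.
waiting = {}


def add_job_sketch(identifier, queue_name, priority=0):
    """Mimic the documented add_job rule for duplicate identifiers.

    Returns a (priority, created) tuple describing the job that is
    waiting after the call.
    """
    key = (queue_name, identifier)
    if key not in waiting:
        # No waiting job with this identifier: a new one is created.
        waiting[key] = priority
        return priority, True
    if priority > waiting[key]:
        # Higher priority: the existing job is moved up.
        waiting[key] = priority
    # Lower or equal priority: the new job is simply discarded.
    return waiting[key], False


first = add_job_sketch('job:1', 'myqueue', priority=1)
lower = add_job_sketch('job:1', 'myqueue', priority=0)
higher = add_job_sketch('job:1', 'myqueue', priority=5)
print(first, lower, higher)  # (1, True) (1, False) (5, False)
```

Only one job exists at the end, whatever the number of calls: the second and third calls return the existing job, with its priority raised in the last case.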
A common way of using the identifier is to, at least, store a way to identify the object on which we want the task to apply:

- you can have one or more queues for a unique task, and store only the id of an object in the identifier field
- you can have one or more queues each doing many tasks, then you may want to store the task too in the identifier field: “task:id”

Note that by subclassing the Job model, you are able to add new fields to a Job to store the task and other needed parameters, as arguments (size for a photo to resize, a message to send…)
A string (InstanceHashField, indexed) to store the actual status of the job.
It’s a single letter but we provide a class to help using it verbosely: STATUSES
    from limpyd_jobs import STATUSES
    print STATUSES.SUCCESS
    >> "s"
When a job is created via the add_job class method, its status is set to STATUSES.WAITING, or to STATUSES.DELAYED if it is delayed by setting delayed_until. When the worker selects it for execution, the status changes to STATUSES.RUNNING. When finished, it’s one of STATUSES.SUCCESS or STATUSES.ERROR. Another available status is STATUSES.CANCELED, useful if you want to cancel a job without removing it from its queue.
You can also display the full string of a status:
    print STATUSES.by_value(my_job.status.hget())
    >> "SUCCESS"
A string (InstanceHashField, indexed, default = 0) to store the priority of the job.
The priority of a job determines in which Queue object it will be stored. A worker listens to all the queues with the given names, whatever their priorities, but respects the (reverse) priority order: the higher the priority, the sooner the job will be executed.
We chose the “higher priority is better” way of doing things so that it is always possible to add a job with a higher priority than any other one.
Directly updating the priority of a job will not change the queue in which it’s stored. But when you add a job via the (recommended) add_job class method, if a job with the same identifier exists, its priority will be updated (only if the new one is higher) and the job will be moved to the higher priority queue.
A string (InstanceHashField) to store the date and time (a string representation of datetime.utcnow()) of the time the job was added to its queue.
It’s useful in combination with the end field to calculate the job duration.
A string (InstanceHashField) to store the date and time (a string representation of datetime.utcnow()) of the time the job was fetched from the queue, just before the callback is called.
It’s useful in combination with the end field to calculate the job duration.
A string (InstanceHashField) to store the date and time (a string representation of datetime.utcnow()) of the moment the job was set as finished or in error, just after the callback has finished.
It’s useful in combination with the start field to calculate the job duration.
An integer saved as a string (InstanceHashField) to store the number of times the job was executed. It can be more than one if it was requeued after an error.
The string representation (InstanceHashField) of a datetime object until when the job may be in the delayed list (a redis sorted-set) of the queue.
It can be set when calling add_job by passing either a delayed_until argument, which must be a datetime, or a delayed_for argument, which must be a number of seconds (int or float) or a timedelta object. The delayed_for argument will be added to the current time (datetime.utcnow()) to compute delayed_until.
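The computation described above can be sketched in a few lines. This is an illustration, not the code of the library: `compute_delayed_until` is a hypothetical helper, and letting an explicit `delayed_until` win over `delayed_for` is an assumption of this sketch.

```python
from datetime import datetime, timedelta


def compute_delayed_until(delayed_until=None, delayed_for=None, now=None):
    """Return the datetime until which a job would be delayed, or None.

    Assumption of this sketch: an explicit delayed_until wins over
    delayed_for when both are given.
    """
    if delayed_until is not None:
        return delayed_until
    if delayed_for is None:
        return None
    if not isinstance(delayed_for, timedelta):
        # A number of seconds (int or float) is converted to a timedelta.
        delayed_for = timedelta(seconds=delayed_for)
    if now is None:
        now = datetime.utcnow()
    return now + delayed_for


now = datetime(2020, 1, 1, 12, 0, 0)
from_seconds = compute_delayed_until(delayed_for=90, now=now)
from_timedelta = compute_delayed_until(delayed_for=timedelta(minutes=2), now=now)
print(from_seconds)    # 2020-01-01 12:01:30
print(from_timedelta)  # 2020-01-01 12:02:00
```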
If a job is in error after its execution and if the worker has a positive requeue_delay_delta attribute, the delayed_until field will be set accordingly, which is useful to retry a failed job after a certain delay.
The queued field is set to '1' when the job is currently managed by a queue: waiting, delayed, running. This flag is set when calling enqueue_or_delay, and removed by the worker when the job is canceled, is finished with success, or finished with error and not requeued. It’s this field that is checked to test if the same job already exists when add_job is called.
The cancel_on_error field must be set to a True value if you don’t want the job to be requeued in case of error. Don’t forget that Redis stores strings: 0 will be saved as "0", which is a True value, so don’t set the field to False or 0 if you want a False value; leave it empty instead.
Note that if you want to do this for all the jobs of a class, you may want to set the always_cancel_on_error attribute of this class to True.
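To see why 0 or False cannot be used as a false value, here is a small sketch. It assumes that values are converted with `str()` before being saved, which is a simplification of what really happens between limpyd and redis.

```python
# Assumption: each value is turned into a string when it is saved.
candidates = [True, 1, 0, False, '']
stored = [str(value) for value in candidates]

# What Python truthiness says about the strings read back later.
truthy = [bool(value) for value in stored]

print(stored)  # ['True', '1', '0', 'False', '']
print(truthy)  # [True, True, True, True, False]
```

Only the empty value is considered false once stored, which is why the field should be left empty when no cancellation is wanted.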
When adding jobs via the add_job method, the model defined in this attribute will be used to get or create a queue. It’s set by default to Queue but if you want to update it to your own model, you must subclass the Job model too, and update this attribute.
None by default, can be set when overriding the Job class to avoid passing the queue_name argument to the job’s methods (especially add_job)
Note that if you don’t subclass the Job model, you can pass the queue_model argument to the add_job method.
Set the always_cancel_on_error attribute to True if you want all the jobs of this class not to be requeued in case of error. If you leave it at its default value of False, you can still do it job by job by setting their cancel_on_error field to a True value.
The ident property is a string representation of the model + the primary key of the job, saved in queues, allowing the retrieval of the Job.
The must_be_cancelled_on_error property returns a Boolean indicating if, in case of error during its execution, the job must NOT be requeued.
By default it will be False, but there are two ways to change this behavior: set the cancel_on_error field of the job to a True value, or set the always_cancel_on_error attribute of its class to True.
The duration property simply returns the time used to compute the job. The return value is a datetime.timedelta object if the start and end fields are set, or None otherwise.
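As an illustration of how such a duration can be derived from the stored strings, here is a minimal sketch. It is not the library's code: `parse_stored_datetime` and `duration` are hypothetical helpers, and the format is the one produced by `str()` on a naive datetime.

```python
from datetime import datetime


def parse_stored_datetime(value):
    """Parse the output of str(datetime), with or without microseconds."""
    if '.' not in value:
        # str() omits the microseconds when they are equal to zero.
        value += '.0'
    return datetime.strptime(value, '%Y-%m-%d %H:%M:%S.%f')


def duration(start, end):
    """Return end - start as a timedelta, or None if a field is empty."""
    if not start or not end:
        return None
    return parse_stored_datetime(end) - parse_stored_datetime(start)


elapsed = duration('2020-01-01 12:00:00.250000', '2020-01-01 12:00:02.750000')
unfinished = duration('2020-01-01 12:00:00', None)
print(elapsed)     # 0:00:02.500000
print(unfinished)  # None
```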
It’s the main method of the job, the only one you must override, to do some stuff when the job is executed by the worker.
The return value of this method will be passed to the job_success of the worker, then, if defined, to the on_success method of the job.
By default a NotImplementedError is raised.
The requeue method allows a job to be put back in the waiting (or delayed) queue when its execution failed.
It’s the method, called in add_job and requeue, that will either put the job in the waiting or delayed queue, depending on delayed_until. If this argument is defined and in the future, the job is delayed, else it’s simply queued.
This method also sets the queued flag of the job to '1'.
This method, if defined on your job model (it’s not there by default, ie “ghost”) is called when the job is fetched by the worker and about to be executed (“waiting” status)
This method, if defined on your job model (it’s not there by default, ie “ghost”) is called by the worker when the job’s execution was a success (it did not raise any exception).
This method, if defined on your job model (it’s not there by default, ie “ghost”) is called by the worker when the job’s execution failed (an exception was raised)
This method, if defined on your job model (it’s not there by default, ie “ghost”) is called when the job, just fetched by the worker, could not be executed because of its status, which was not “waiting”. Another possible reason is that the job was canceled during its execution (by setting its status to STATUSES.CANCELED).
This method, if defined on your job model (it’s not there by default, ie “ghost”) is called by the worker when the job failed and has been requeued by the worker.
This method, if defined on your job model (it’s not there by default, ie “ghost”) is called by the worker when the job was delayed (by setting its status to STATUSES.DELAYED) during its execution. Note that you may also want to set the delayed_until value of the job to a correct one (a string representation of a UTC datetime), or the worker will delay it for 60 seconds.
It can also be called if the job’s status was set to STATUSES.DELAYED while still in the waiting list of the queue.
The add_job class method is the main (and recommended) way to create a job. It will check if a job with the same identifier already exists in a queue (not finished) and if one is found, update its priority (and move it to the correct queue). If no existing job is found, a new one will be created and added to a queue.
If you use a subclass of the Job model, you can pass additional arguments to the add_job method simply by passing them as named arguments; they will be saved if a new job is created (but not if an existing job is found in a waiting queue).
Returns the string representation of the model, used to compute the ident property of a job.
Returns a job from a string previously got via the ident property of a job.
A Queue stores a list of waiting jobs with a given priority, and keeps a list of successful jobs and a list of jobs in error.
A string (InstanceHashField, indexed), used by the add_job method to find the queue in which to store it. Many queues can have the same names, but different priorities.
This name is also used by a worker to find which queues it needs to wait for.
A string (InstanceHashField, indexed, default = 0), to store the priority of a queue’s jobs. All jobs in a queue are considered as having this priority. That’s why, as said for the priority field of the Job model, changing the priority field of a job doesn’t change its real priority. But adding (via the add_job class method of the Job model) a new job with the same identifier for the same queue name can update the job’s priority by moving it to another queue with the correct priority.
As already said, the higher the priority, the sooner the jobs in a queue will be executed. If a queue has a priority of 2, and another queue of the same name has a priority of 0, or 1, all jobs in the one with the priority of 2 will be executed (at least fetched) before the others, regardless of the number of workers.
A list (ListField) to store the primary keys of jobs in the waiting status. It’s a FIFO list: jobs are appended to the right (via rpush), and fetched from the left (via blpop).
When fetched, a job from this list is executed, then pushed in the success or error list, depending if the callback raised an exception or not. If a job in this waiting list is not in the waiting status, it will be skipped by the worker.
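The FIFO behavior and the skipping of jobs that are not in the waiting status can be sketched with a deque. This is only an in-memory illustration: the status letters `'w'` and `'c'` are assumptions, and `push` and `fetch_next_runnable` are hypothetical helpers, not methods of the library.

```python
from collections import deque

WAITING = 'w'    # assumed status letter, for illustration only
CANCELED = 'c'   # assumed status letter, for illustration only

waiting_list = deque()   # stand-in for the redis list of the waiting field
statuses = {}            # job pk -> current status


def push(pk, status=WAITING):
    """Append a job to the right of the list, as rpush would."""
    waiting_list.append(pk)
    statuses[pk] = status


def fetch_next_runnable():
    """Pop from the left, skipping jobs that are no longer waiting."""
    while waiting_list:
        pk = waiting_list.popleft()
        if statuses[pk] == WAITING:
            return pk
    return None   # a real blpop would block here until a job arrives


push('1')
push('2', status=CANCELED)
push('3')
fetched = [fetch_next_runnable() for _ in range(3)]
print(fetched)  # ['1', '3', None]
```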
A list (ListField) to store the primary keys of jobs fetched from the waiting list and successfully executed.
A list (ListField) to store the primary keys of jobs fetched from the waiting list for which the execution failed.
A sorted set (SortedSetField) to store delayed jobs, ones having a delayed_until datetime in the future. The timestamp representation of the delayed_until field is used as the score for this sorted-set, to ease the retrieval of jobs that are now ready.
The Queue model has no specific attributes.
Returns a tuple representing the first job to be ready in the delayed queue. It’s a tuple with the job’s pk and the timestamp representation of its delayed_until value (the score in the sorted set).
Returns None if the delayed queue is empty.
Return the timestamp representation of the first delayed job to be ready, or None if the delayed queue is empty.
Put a job in the delayed queue.
Put a job in the waiting list.
This method will check for all jobs in the delayed queue that are now ready to be executed and put them back in the waiting list.
This method will return the list of failures, each failure being a tuple with the value returned by the ident property of a job, and the message of the raised exception causing the failure.
Note that the status of a job is changed only if it was STATUSES.DELAYED. This makes it possible to cancel a delayed job before it is requeued.
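The delayed queue logic described above (timestamps used as scores, ready jobs moved back to the waiting list) can be sketched with a heap instead of a redis sorted set. The names `delay_job`, `first_delayed` and `requeue_ready_jobs` are hypothetical, and unlike the real method this sketch returns the moved jobs rather than a list of failures.

```python
import heapq

delayed = []   # heap of (timestamp, job_pk): stand-in for the sorted set
waiting = []   # stand-in for the waiting list


def delay_job(pk, ready_at):
    """Store a job with the timestamp at which it becomes ready."""
    heapq.heappush(delayed, (ready_at, pk))


def first_delayed():
    """Return (pk, timestamp) of the first job to be ready, or None."""
    if not delayed:
        return None
    ready_at, pk = delayed[0]
    return pk, ready_at


def requeue_ready_jobs(now):
    """Move every job whose timestamp has passed to the waiting list."""
    moved = []
    while delayed and delayed[0][0] <= now:
        ready_at, pk = heapq.heappop(delayed)
        waiting.append(pk)
        moved.append(pk)
    return moved


delay_job('a', 100.0)
delay_job('b', 50.0)
delay_job('c', 200.0)
first = first_delayed()
moved = requeue_ready_jobs(now=150.0)
print(first)            # ('b', 50.0)
print(moved)            # ['b', 'a']
print(first_delayed())  # ('c', 200.0)
```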
The get_queue class method is the recommended way to get a Queue object. Given a name and a priority, it will return the found queue or create a queue if no matching one exists.
If you use a subclass of the Queue model, you can pass additional arguments to the get_queue method simply by passing them as named arguments, they will be saved if a new queue is created (but not if an existing queue is found)
The get_waiting_keys class method returns all the existing (waiting) queues with the given names, sorted by priority (reverse order: the highest priorities come first), then by names. The returned value is a list of redis keys, one for the waiting list of each matching queue. It’s used internally by the workers as the argument to the blpop redis command.
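The ordering matters because blpop, when given several keys, pops from the first non-empty list in the order of its arguments. The sketch below shows this with plain Python lists; `order_queues` and `pop_first_available` are hypothetical helpers, and queues are represented as simple (name, priority) tuples rather than real redis keys.

```python
def order_queues(queues):
    """Sort (name, priority) pairs: highest priority first, then by name."""
    return sorted(queues, key=lambda queue: (-queue[1], queue[0]))


def pop_first_available(ordered_queues, waiting_lists):
    """Pop from the first non-empty list, checking them in the given order.

    This mirrors how blpop treats several keys, minus the blocking part.
    """
    for queue in ordered_queues:
        jobs = waiting_lists.get(queue)
        if jobs:
            return queue, jobs.pop(0)
    return None


waiting_lists = {
    ('myqueue', 0): ['job-a'],
    ('myqueue', 2): ['job-b'],
    ('other', 2): [],
    ('myqueue', 1): ['job-c'],
}
ordered = order_queues(waiting_lists)
print(ordered)
# [('myqueue', 2), ('other', 2), ('myqueue', 1), ('myqueue', 0)]

fetched = [pop_first_available(ordered, waiting_lists) for _ in range(4)]
print([item and item[1] for item in fetched])
# ['job-b', 'job-c', 'job-a', None]
```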
The count_waiting_jobs class method returns the number of jobs still waiting for the given queue names, combining all priorities.
The count_delayed_jobs class method returns the number of jobs still delayed for the given queue names, combining all priorities.
The get_all class method returns a list of queues for the given names.
The get_all_by_priority class method returns a list of queues for the given names, ordered by priorities (the highest priority first), then names.
The Error model is used to store errors from the jobs that are not successfully executed by a worker.
Its main purpose is to be able to filter errors, by queue name, job model, job identifier, date, exception class name or code. You can use your own subclass of the Error model and then store additional fields, and filter on them.
A string (InstanceHashField, indexed) to store the string representation of the job’s model.
A string (InstanceHashField, indexed) to store the primary key of the job which generated the error.
A string (InstanceHashField, indexed) to store the identifier of the job that failed.
A string (InstanceHashField, indexed) to store the name of the queue the job was in when it failed.
A string (InstanceHashField, indexed) to store the date (only the date, not the time) of the error (a string representation of datetime.utcnow().date()). This field is indexed so you can filter errors by date, useful to graph errors.
A string (InstanceHashField) to store the time (only the time, not the date) of the error (a string representation of datetime.utcnow().time()).
A string (InstanceHashField, indexed) to store the type of error. It’s the class’ name of the originally raised exception.
A string (InstanceHashField, indexed) to store the value of the code attribute of the originally raised exception. Nothing is stored here if there is no such attribute.
A string (InstanceHashField) to store the string representation of the originally raised exception.
A string (InstanceHashField) to store the string representation of the traceback of the originally raised exception (the worker may not have filled it)
This property returns a datetime object based on the content of the date and time fields of an Error object.
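Because the date and the time are stored in two separate string fields, they have to be joined before being parsed. Here is a minimal sketch of that round trip; `split_when` and `combine` are hypothetical helpers, not part of the Error model.

```python
from datetime import datetime


def split_when(when):
    """Split a datetime into the two strings stored in date and time."""
    return str(when.date()), str(when.time())


def combine(date_str, time_str):
    """Rebuild the datetime from the stored date and time strings."""
    if '.' not in time_str:
        # str() omits the microseconds when they are equal to zero.
        time_str += '.0'
    return datetime.strptime(date_str + ' ' + time_str,
                             '%Y-%m-%d %H:%M:%S.%f')


when = datetime(2020, 5, 17, 8, 30, 15, 123456)
date_str, time_str = split_when(when)
rebuilt = combine(date_str, time_str)
print(date_str, time_str)  # 2020-05-17 08:30:15.123456
print(rebuilt == when)     # True
```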
The add_error class method is the main (and recommended) way to add an entry on the Error model, by accepting simple arguments that will be broken down (job becomes identifier and job_pk, when becomes date and time, error becomes code and message).
queue_name The name of the queue the job came from.
job The job which generated the error, from which we’ll extract job_pk and identifier
error An exception from which we’ll extract the code and the message.
when=None A datetime object from which we’ll extract the date and time.
If not filled, datetime.utcnow() will be used.
trace=None The traceback, stringified, to store.
If you use a subclass of the Error model, you can pass additional arguments to the add_error method simply by passing them as named arguments; they will be saved in the object to be created.
The collection_for_job method is a helper to retrieve the errors associated with a given job, more precisely with all the instances of this job having the same identifier.
The result is a limpyd collection, so you can use filter, instances… on it.
The Worker class does all the logic, working with Queue and Job models.
The main behavior is:

- reading queue keys for the given names
- waiting for a job available in the queues
- executing the job
- managing success or error
- exiting after a defined number of jobs or a maximum duration (if defined), or when a SIGINT/SIGTERM signal is caught

The class is split in many short methods so that you can subclass it to change/add/remove whatever you want.
Each of the following worker’s attributes can be set by an argument in the constructor, using the exact same name. It’s why the two are described here together.
Names of the queues to work with. It can be a list/tuple of strings, or a string with names separated by a comma (no spaces), or without comma for a single queue.
Note that all queues must be from the same queue_model.
Defaults to None, but if it is not set and not defined in a subclass, a LimpydJobsException is raised.
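The accepted forms of the queues argument can be normalized as in the following sketch. It is an illustration only: `parse_queue_names` is a hypothetical helper, and it raises a plain ValueError where the library raises a LimpydJobsException.

```python
def parse_queue_names(queues):
    """Return a list of queue names from any of the accepted forms."""
    if queues is None:
        # Stand-in for the LimpydJobsException raised by the library.
        raise ValueError('no queue names given')
    if isinstance(queues, str):
        # 'a,b' -> ['a', 'b'] and 'a' -> ['a'] (no spaces expected)
        return queues.split(',')
    return list(queues)


print(parse_queue_names('myqueue'))          # ['myqueue']
print(parse_queue_names('emails,photos'))    # ['emails', 'photos']
print(parse_queue_names(('emails', 'photos')))  # ['emails', 'photos']
```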
The model to use for queues. By default it’s the Queue model included in limpyd_jobs, but you can use a subclass of the default model to add fields, methods…
The model to use for saving errors. By default it’s the Error model included in limpyd_jobs, but you can use a subclass of the default model to add fields, methods…
limpyd_jobs uses the python logging module, so this is the name to use for the logger created for the worker. The default value is LOGGER_NAME, with LOGGER_NAME defined in limpyd_jobs.workers with a value of “limpyd-jobs”.
It’s the level set for the logger created with the name defined in logger_name, default to logging.INFO.
A boolean, default to True, to indicate if we have to save errors in the Error model (or the one defined in error_model) when the execution of the job is not successful.
A boolean, default to True, to indicate if we have to save the tracebacks of exceptions in the Error model (or the one defined in error_model) when the execution of the job is not successful (and only if save_errors is True)
The maximum number of loops (fetching + executing a job) to do in the worker’s lifetime, 1000 by default. Note that after this number of loops, the worker ends (the run method cannot be executed again).
The aim is to keep memory leaks from growing too large.
If defined, the worker will end once its run method has been running for at least this number of seconds. By default it’s set to None, meaning there is no maximum duration.
To avoid interrupting the execution of a job, if terminate_gracefully is set to True (the default), the SIGINT and SIGTERM signals are caught, asking the worker to exit when the current job is done.
The callback is the function to run when a job is fetched. By default it’s the execute method of the worker (which calls the run method of jobs, which, if not overridden, raises a NotImplementedError), but you can pass any function that accepts a job and a queue as arguments.
Using the queue’s name, and the job’s identifier+model (via job.ident), you can manage many actions depending on the queue if needed.
If this callback (or the execute method) raises an exception, the job is considered in error. In the other case, it’s considered successful and the return value is passed to the job_success method, to let you do what you want with it.
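As an example of a single callback handling several kinds of tasks, here is a sketch that dispatches on a “task:id” identifier. `FakeField` and `FakeJob` are hypothetical stand-ins so that the example runs without redis, and the task functions are invented for illustration; only the `job.identifier.hget()` call mirrors the way fields are read on a real job.

```python
class FakeField(object):
    """Stand-in for a limpyd field: only hget() is provided."""

    def __init__(self, value):
        self.value = value

    def hget(self):
        return self.value


class FakeJob(object):
    """Stand-in for a Job, with just an identifier field."""

    def __init__(self, identifier):
        self.identifier = FakeField(identifier)


def resize_photo(object_id):
    return 'resized photo %s' % object_id


def send_message(object_id):
    return 'sent message %s' % object_id


# Hypothetical mapping between task names and functions.
TASKS = {'resize': resize_photo, 'send': send_message}


def do_stuff(job, queue):
    """Callback: split the 'task:id' identifier and run the right task."""
    task, object_id = job.identifier.hget().split(':', 1)
    # An unknown task raises KeyError, so the job would end in error.
    return TASKS[task](object_id)


result = do_stuff(FakeJob('resize:42'), queue=None)
print(result)  # resized photo 42
```

The returned value is what would be handed to the job_success method of the worker.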
The timeout is used as a parameter to the blpop redis command we use to fetch jobs from waiting lists. It’s 30 seconds by default but you can change it to any positive number (in seconds). You can set it to 0 if you don’t want any timeout to be applied to the blpop command.
It’s better to always set a timeout, to reenter the main loop and call the must_stop method to see if the worker must exit. Note that the number of loops is not updated when the timeout occurs, so a short timeout won’t alter the number of loops defined by max_loops.
The fetch_priorities_delay is the delay between two fetches of the list of priorities for the current worker.
If a job was added with a priority that did not exist when the worker run was started, it will not be taken into account until this delay expires.
Note that if this delay is, say, 5 seconds (it’s 25 by default), and the timeout parameter is 30, you may wait 30 seconds before the new priority fetch: if there are no jobs in the priority queues currently managed by the worker, the worker stays blocked in the blpop command until the timeout expires.
The fetch_delayed_delay is the delay between two fetches of the delayed jobs that are now ready in the queues managed by the worker.
Note that if this delay is, say, 5 seconds (it’s 25 by default), and the timeout parameter is 30, you may wait 30 seconds before the new delayed fetch: if there are no jobs in the priority queues currently managed by the worker, the worker stays blocked in the blpop command until the timeout expires.
It’s the number of times a job will be requeued when its execution results in a failure. It will then be put back in the same queue.
This attribute is 0 by default so by default a job won’t be requeued.
This number will be added to the current priority of the job that will be requeued. By default it’s set to -1 to decrease the priority at each requeue.
It’s a number of seconds to wait before adding back an erroneous job in the waiting queue, set by default to 30: when a job failed to execute, it’s put in the delayed queue for 30 seconds then it’ll be put back in the waiting queue (depending on the fetch_delayed_delay attribute)
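The three requeue attributes work together, which the following sketch tries to make concrete. It is not the library's code: `plan_requeue` is a hypothetical helper, and comparing the number of tries of the job with requeue_times is an assumption about how the limit is applied.

```python
from datetime import datetime, timedelta


def plan_requeue(tries, priority, now, requeue_times=0,
                 requeue_priority_delta=-1, requeue_delay_delta=30):
    """Decide what happens to a job that just failed.

    Returns None if the job must not be requeued, else a
    (new_priority, delayed_until) tuple; delayed_until is None when the
    job goes straight back to the waiting list.
    """
    if tries > requeue_times:
        # Assumption: the job was already executed too many times.
        return None
    new_priority = priority + requeue_priority_delta
    delayed_until = None
    if requeue_delay_delta > 0:
        delayed_until = now + timedelta(seconds=requeue_delay_delta)
    return new_priority, delayed_until


now = datetime(2020, 1, 1, 12, 0, 0)
default_plan = plan_requeue(tries=1, priority=2, now=now)
retry_plan = plan_requeue(tries=1, priority=2, now=now, requeue_times=3)
last_plan = plan_requeue(tries=4, priority=2, now=now, requeue_times=3)
print(default_plan)  # None
print(retry_plan)    # (1, datetime.datetime(2020, 1, 1, 12, 0, 30))
print(last_plan)     # None
```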
When subclassing, you may need these attributes, created and defined during the use of the worker:
A list of the keys of the queues’ waiting lists, which the worker listens to for new jobs. Filled by the update_keys method.
The current status of the worker. None by default until the run method is called, after which it’s set to "starting" while looking for an available queue. Then it’s set to "waiting" while the worker waits for new jobs. When a job is fetched, the status is set to "running". And finally, when the loop is over, it’s set to "terminated".
If the status is not None, the run method cannot be called.
The logger (from the logging python module) defined by the set_logger method.
The number of loops done by the worker, incremented each time a job is fetched from a waiting list, even if the job is skipped (bad status…), or in error. When this number equals the max_loops attribute, the worker ends.
When True, ask for the worker to terminate itself after executing the current job. It can be set to True manually, or when a SIGINT/SIGTERM signal is caught.
This boolean is set to True when a SIGINT/SIGTERM is caught (only if the terminate_gracefully is True)
None by default, set to datetime.utcnow() when the run method starts.
None by default, set to datetime.utcnow() when the run method ends.
None by default, it’s computed to know when the worker must stop based on the start_date and max_duration. It will always be None if no max_duration is defined.
It’s a property, not an attribute, to get the current connection to the redis server.
It’s a tuple holding all parameters accepted by the worker’s constructor
parameters = ('queues', 'callback', 'queue_model', 'error_model', 'logger_name', 'logger_level', 'save_errors', 'save_tracebacks', 'max_loops', 'max_duration', 'terminate_gracefuly', 'timeout', 'fetch_priorities_delay', 'fetch_delayed_delay', 'requeue_times', 'requeue_priority_delta', 'requeue_delay_delta')
As said before, the Worker class is split into many small methods, to ease subclassing. Here is the list of public methods:
def __init__(self, queues=None, **kwargs):
It’s the constructor (you guessed it ;) ) of the Worker class, expecting all arguments (defined in parameters) that can also be defined as class attributes.
It validates these arguments, prepares the logging and initializes other attributes.
You can override it to add, validate, initialize other arguments or attributes.
It’s called in the constructor if terminate_gracefully is True. It plugs the SIGINT and SIGTERM signals to the catch_end_signal method.
You can override it to catch more signals or do some checks before plugging them to the catch_end_signal method.
It’s called at the end of the run method, as we don’t need to catch the SIGINT and SIGTERM signals anymore. It’s useful when launching a worker in a python shell to finally let the shell handle these signals. Useless in a script because the script is finished when the run method exits.
It’s called in the constructor to initialize the logger, using logger_name and logger_level, saving it in self.logger.
It’s called on the main loop, to exit it on some conditions: an end signal was caught, the max_loops number was reached, or end_forced was set to True.
Returns a tuple with a queue and a job
This method is called during the loop, to wait for an available job in the waiting lists. When one job is fetched, returns the queue (an instance of the model defined by queue_model) on which the job was found, and the job itself.
def get_job(self, job_ident):
Returns a job.
Called during wait_for_job to get a real job object based on the job’s ident (model + pk) fetched from the waiting lists.
def get_queue(self, queue_redis_key):
Returns a Queue.
Called during wait_for_job to get a real queue object (an instance of the model defined by queue_model) based on the key returned by redis telling us in which list the job was found. This key is not the primary key of the queue, but the redis key of it’s waiting field.
def catch_end_signal(self, signum, frame):
It’s called when a SIGINT/SIGTERM signal is caught. It simply sets end_signal_caught and end_forced to True, to tell the worker to terminate as soon as possible.
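The idea of only recording the signal and letting the current job finish can be sketched as follows. `SignalAwareLoop` and its `install_handlers` method are hypothetical names; the sketch simulates the arrival of a signal by calling the handler directly, so that it can run anywhere.

```python
import signal


class SignalAwareLoop(object):
    """Hypothetical stand-in for the signal-related part of a worker."""

    def __init__(self):
        self.end_forced = False
        self.end_signal_caught = False
        self.done = []

    def catch_end_signal(self, signum, frame):
        # Only record the request: the current job is left to finish.
        self.end_signal_caught = self.end_forced = True

    def install_handlers(self):
        # signal.signal must be called from the main thread.
        signal.signal(signal.SIGINT, self.catch_end_signal)
        signal.signal(signal.SIGTERM, self.catch_end_signal)

    def run(self, jobs):
        for job in jobs:
            if self.end_forced:
                break
            job(self)
            self.done.append(job.__name__)


def first_job(loop):
    # Simulate a SIGTERM arriving while this job is running.
    loop.catch_end_signal(signal.SIGTERM, None)


def second_job(loop):
    pass


loop = SignalAwareLoop()
loop.run([first_job, second_job])
print(loop.done)  # ['first_job']
```

The first job is completed even though the signal arrived during its execution, and the second one is never started.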
def execute(self, job, queue):
Returns nothing by default.
This method is called if no callback argument is provided when initiating the worker. It calls the run method of the job, which raises a NotImplementedError by default.
If the execution is successful, no return value is expected, but if there is one, it will be passed to the job_success method. If an error occurred, an exception must be raised, which will be passed to the job_error method.
Calling this method updates the internal keys attribute, which contains the redis keys of the waiting lists of all the queues the worker listens to.
It’s actually called at the beginning of the run method, and at intervals depending on fetch_priorities_delay. Note that if a queue with a specific priority doesn’t exist when this method is called, but is created later by adding a job with add_job, the worker will ignore it until this update_keys method is called again (programmatically or by waiting at least fetch_priorities_delay seconds).
It’s the main method of the worker, with all the logic: while we don’t have to stop (result of the must_stop method), fetch a job from redis, and if this job is really in waiting state, execute it, and do something depending on the status of the execution (success, error…).
In addition to the methods that do real stuff (update_keys, wait_for_job), some other methods are called during the execution: run_started, run_ended, about the run, and job_skipped, job_started, job_success and job_error about jobs. You can override these methods in subclasses to adapt the behavior depending on your needs.
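To show how the loop and its hooks fit together, here is a heavily simplified, in-memory sketch. It is not the real Worker: `MiniWorker` is a hypothetical class, jobs are plain dicts, hooks take fewer arguments than the real ones, and delayed jobs, requeuing and redis are left out.

```python
class MiniWorker(object):
    """In-memory stand-in for a worker; jobs are plain dicts."""

    def __init__(self, jobs, callback, max_loops=1000):
        self.jobs = list(jobs)      # stand-in for the waiting lists
        self.callback = callback
        self.max_loops = max_loops
        self.num_loops = 0
        self.end_forced = False
        self.events = []            # stand-in for the logger

    def must_stop(self):
        return self.end_forced or self.num_loops >= self.max_loops

    def wait_for_job(self):
        # The real worker blocks on redis here; this sketch just pops.
        return self.jobs.pop(0) if self.jobs else None

    def run(self):
        self.events.append('run_started')
        while not self.must_stop():
            job = self.wait_for_job()
            if job is None:
                break
            self.num_loops += 1
            if job['status'] != 'waiting':
                self.job_skipped(job)
                continue
            self.job_started(job)
            try:
                result = self.callback(job)
            except Exception as exc:
                self.job_error(job, exc)
            else:
                self.job_success(job, result)
        self.events.append('run_ended')

    def job_skipped(self, job):
        self.events.append('job_skipped %s' % job['identifier'])

    def job_started(self, job):
        job['status'] = 'running'
        self.events.append('job_started %s' % job['identifier'])

    def job_success(self, job, result):
        job['status'] = 'success'
        self.events.append('job_success %s -> %s' % (job['identifier'], result))

    def job_error(self, job, exc):
        job['status'] = 'error'
        self.events.append('job_error %s: %s' % (job['identifier'], exc))


def callback(job):
    if job['identifier'] == 'bad':
        raise ValueError('boom')
    return job['identifier'].upper()


jobs = [
    {'identifier': 'ok', 'status': 'waiting'},
    {'identifier': 'old', 'status': 'canceled'},
    {'identifier': 'bad', 'status': 'waiting'},
    {'identifier': 'late', 'status': 'waiting'},
]
worker = MiniWorker(jobs, callback, max_loops=3)
worker.run()
print(worker.events)
```

With max_loops set to 3, the skipped job counts as a loop, so the last job is left in the list when the run ends.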
This method is called in the run method after the keys are computed using update_keys, just before starting the loop. By default it does nothing but a log.info.
This method is called just before exiting the run method. By default it does nothing but a log.info.
def job_skipped(self, job, queue):
When a job is fetched in the run method, its status is checked. If it’s not STATUSES.WAITING, this job_skipped method is called, with two main arguments: the job and the queue in which it was found.
This method is also called when the job is canceled during its execution (ie if, when the execution is done, the job’s status is STATUSES.CANCELED).
This method removes the queued flag of the job, logs the message returned by the job_skipped_message method, then calls, if defined, the on_skipped method of the job.
def job_skipped_message(self, job, queue):
Returns a string to be logged in job_skipped.
def job_started(self, job, queue):
When the job is fetched and its status verified (it must be STATUSES.WAITING), the job_started method is called, just before the callback (or the execute method if no callback is defined), with the job and the queue in which it was found.
This method updates the start and status fields of the job, then logs the message returned by job_started_message and finally calls, if defined, the on_started method of the job.
def job_started_message(self, job, queue):
Returns a string to be logged in job_started.
def job_success(self, job, queue, job_result):
When the callback (or the execute method) is finished, without having raised any exception, the job is considered successful, and the job_success method is called, with the job and the queue in which it was found, and the return value of the callback method.
Note that this method is not called, and so the job is not considered a “success”, if, when the execution is done, the status of the job is either STATUSES.CANCELED or STATUSES.DELAYED. In these cases, the methods job_skipped and job_delayed are called respectively.
This method removes the queued flag of the job, updates its end and status fields, moves the job into the success list of the queue, then logs the message returned by job_success_message and finally calls, if defined, the on_success method of the job.
def job_success_message(self, job, queue, job_result):
Returns a string to be logged in job_success.
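The choice between job_success, job_skipped, job_delayed and job_error once the callback has finished can be summarised as a small decision function. This is a sketch of the rules stated in this section, not the worker's actual code: the STATUSES class below is a local stand-in (its values are made up) and handler_after_execution is a hypothetical helper name.

```python
class STATUSES(object):
    """Hypothetical stand-in: only the names matter here, not the values."""
    WAITING = "waiting"
    CANCELED = "canceled"
    DELAYED = "delayed"
    SUCCESS = "success"


def handler_after_execution(status, exception=None):
    """Return the name of the worker method called when execution ends."""
    if exception is not None:
        return "job_error"       # the callback raised
    if status == STATUSES.CANCELED:
        return "job_skipped"     # canceled while it was running
    if status == STATUSES.DELAYED:
        return "job_delayed"     # the callback asked for a delay
    return "job_success"


outcome = handler_after_execution(STATUSES.WAITING)
```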
def job_delayed(self, job, queue):
When the callback (or the execute method) is finished, without having raised an exception, and the status of the job at this moment is STATUSES.DELAYED, the job is not successful but not in error: it will be delayed.
This method is also called when a job found in the waiting queue has its status set to STATUSES.DELAYED. In this case, the job is not executed, but delayed by calling this method.
This method checks whether the job has a delayed_until value; if it has none, or an invalid one, the value is set to 60 seconds in the future. You may want to set this value explicitly, or at least clear the field: if the job was initially delayed, the field may still hold a date that is now in the past, in which case the job is “delayed” to that past date and so is not really delayed, just queued.
The enqueue_or_delay method of the queue is then called with this value to actually delay the job.
Finally, the method logs the message returned by job_delayed_message and calls, if defined, the on_delayed method of the job.
def job_delayed_message(self, job, queue):
Returns a string to be logged in job_delayed.
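The handling of delayed_until described above (missing or invalid value replaced by "now plus 60 seconds", stale value kept as-is) can be illustrated as follows. This is a sketch under assumptions: resolve_delayed_until is a hypothetical helper, and the string date format is chosen for the example only; the way the library stores and parses the field may differ.

```python
from datetime import datetime, timedelta

DEFAULT_DELAY = timedelta(seconds=60)
DATE_FORMAT = "%Y-%m-%d %H:%M:%S"   # assumption made for this sketch


def resolve_delayed_until(raw_value, now):
    """Return the datetime to which a job would be delayed."""
    if raw_value:
        try:
            return datetime.strptime(raw_value, DATE_FORMAT)
        except (TypeError, ValueError):
            pass  # invalid value: fall through to the default delay
    return now + DEFAULT_DELAY


now = datetime(2013, 10, 2, 0, 0, 0)
missing = resolve_delayed_until(None, now)
invalid = resolve_delayed_until("not a date", now)
stale = resolve_delayed_until("2013-10-01 12:00:00", now)
```

Note how the stale value is returned unchanged: because it is already in the past, such a job would be queued again immediately, which is why clearing the field before delaying a job a second time is recommended.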
def job_error(self, job, queue, exception, trace=None):
When the callback (or the execute method) is terminated by raising an exception, the job_error method is called, with the job and the queue in which it was found, and the raised exception and, if save_tracebacks is True, the traceback.
This method removes the queued flag of the job if it is not to be requeued, updates its end and status fields, moves the job into the error list of the queue, adds a new error object (if save_errors is True), then logs the message returned by job_error_message and calls the on_error method of the job, if defined.
And finally, if the must_be_cancelled_on_error property of the job is False, and the requeue_times worker attribute allows it (considering the tries attribute of the job, too), the requeue_job method is called.
def job_error_message(self, job, queue, to_be_requeued_exception, trace=None):
Returns a string to be logged in job_error.
def job_requeue_message(self, job, queue):
Returns a string to be logged in job_error when the job was requeued.
def additional_error_fields(self, job, queue, exception, trace=None):
Returns a dictionary of fields to add to the error object, empty by default.
This method is called by job_error to let you define a dictionary of fields/values to add to the error object which will be created, if you use a subclass of the Error model, defined in error_model.
To pass these additional fields to the error object, you have to override this method in your own subclass.
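An override could look like the sketch below. It is written as a mixin so that it runs without redis; the class name and the exception_class field are hypothetical, and the field would have to exist on the Error subclass you declare in error_model.

```python
class ExceptionClassFieldMixin(object):
    """Hypothetical mixin adding one extra field to every saved error."""

    def additional_error_fields(self, job, queue, exception, trace=None):
        # "exception_class" is an assumed field name, to be declared on
        # your own subclass of the Error model.
        return {"exception_class": type(exception).__name__}


fields = ExceptionClassFieldMixin().additional_error_fields(
    job=None, queue=None, exception=ValueError("bad input"))
```

In a real project the mixin would be combined with the Worker class in your own worker subclass.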
def requeue_job(self, job, queue, priority, delayed_for=None):
This method is called to requeue the job when its execution failed. It calls the requeue method of the job, then its requeued method, and finally logs the message returned by job_requeue_message.
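The requeue decision and the values passed to requeue_job can be sketched as two small functions. Both helpers are hypothetical. The defaults (-1 and 30) are the ones documented for the requeue_priority_delta and requeue_delay_delta settings, but the exact comparison between tries and requeue_times is an assumption of this sketch.

```python
def should_requeue(must_be_cancelled_on_error, tries, requeue_times):
    """Assumed rule: retry while the job was tried at most requeue_times times."""
    if must_be_cancelled_on_error:
        return False
    return requeue_times > 0 and tries <= requeue_times


def requeue_arguments(priority, requeue_priority_delta=-1, requeue_delay_delta=30):
    """Compute the priority and delay (in seconds) for the requeued job."""
    return {
        "priority": priority + requeue_priority_delta,
        "delayed_for": requeue_delay_delta,
    }


arguments = requeue_arguments(priority=2)
```

With the default deltas, a failing job of priority 2 would come back with priority 1 after a 30 second delay, so it does not immediately compete with fresh jobs.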
It’s a property returning a string identifying the current worker, used in logging to distinguish the log entries of each worker.
It’s a property returning, while the worker is running, the time elapsed since the run started. When the run method ends, it’s the time between start_date and end_date.
If the run method is not called, it will be set to None.
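The three cases of the elapsed property (never run, running, finished) can be illustrated with a stand-in class. RunTimer is hypothetical; only the start_date and end_date names come from this document.

```python
from datetime import datetime, timedelta


class RunTimer(object):
    """Hypothetical stand-in holding only the two dates used by elapsed."""

    def __init__(self):
        self.start_date = None
        self.end_date = None

    @property
    def elapsed(self):
        if self.start_date is None:
            return None                      # run was never called
        if self.end_date is None:
            return datetime.now() - self.start_date   # still running
        return self.end_date - self.start_date        # run is over


never_run = RunTimer()

running = RunTimer()
running.start_date = datetime.now() - timedelta(seconds=5)

finished = RunTimer()
finished.start_date = datetime(2013, 10, 2, 0, 0, 0)
finished.end_date = datetime(2013, 10, 2, 0, 0, 15)
```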
def log(self, message, level='info'):
log is a simple wrapper around self.logger, which automatically adds the id of the worker at the beginning of the message. It accepts a level argument, which is info by default.
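A wrapper of this kind can be sketched with the standard logging module. WorkerLogSketch and CapturingHandler are hypothetical names, and the "[id] message" prefix format is an assumption; the real worker may format its id differently.

```python
import logging


class CapturingHandler(logging.Handler):
    """Keeps (level name, message) pairs in memory so the demo can inspect them."""

    def __init__(self):
        logging.Handler.__init__(self)
        self.entries = []

    def emit(self, record):
        self.entries.append((record.levelname, record.getMessage()))


class WorkerLogSketch(object):
    """Hypothetical stand-in exposing only what the log wrapper needs."""

    def __init__(self, logger, worker_id):
        self.logger = logger
        self.id = worker_id

    def log(self, message, level="info"):
        # Look up logger.info, logger.warning, ... from the level name.
        getattr(self.logger, level)("[%s] %s" % (self.id, message))


handler = CapturingHandler()
logger = logging.getLogger("log-sketch")
logger.setLevel(logging.DEBUG)
logger.propagate = False
logger.addHandler(handler)

worker = WorkerLogSketch(logger, "worker-1")
worker.log("starting")
worker.log("something odd", level="warning")
```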
def set_status(self, status):
set_status simply updates the worker’s status field.
Returns the number of jobs in waiting state that can be run by this worker.
Returns the number of jobs in the delayed queues managed by this worker.
To help you use limpyd_jobs, an executable python script is provided: scripts/worker.py (usable as limpyd-jobs-worker, in your path, when installed from the package).
This script is highly configurable, so you can launch workers without having to write your own script or customize the included one.
With this script you don’t have to write a custom worker either, because all arguments expected by a worker can be passed as arguments to the script.
The script is based on a WorkerConfig class defined in limpyd_jobs.workers, that you can customize by subclassing it, and you can tell the script to use your class instead of the default one.
You can even pass one or many python paths to add to sys.path.
This script is designed to make launching workers as easy as possible.
Instead of explaining all arguments, see below the result of the --help command for this script:
    $ limpyd-jobs-worker --help
    Usage: worker.py [options]

    Run a worker using redis-limpyd-jobs

    Options:
      --pythonpath=PYTHONPATH
                            A directory to add to the Python path, e.g.
                            --pythonpath=/my/module
      --worker-config=WORKER_CONFIG
                            The worker config class to use, e.g. --worker-
                            config=my.module.MyWorkerConfig, default to
                            limpyd_jobs.workers.WorkerConfig
      --print-options       Print options used by the worker, e.g. --print-options
      --dry-run             Won't execute any job, just starts the worker and
                            finish it immediatly, e.g. --dry-run
      --queues=QUEUES       Name of the Queues to handle, comma separated e.g.
                            --queues=queue1,queue2
      --queue-model=QUEUE_MODEL
                            Name of the Queue model to use, e.g. --queue-
                            model=my.module.QueueModel
      --error-model=ERROR_MODEL
                            Name of the Error model to use, e.g. --queue-
                            model=my.module.ErrorModel
      --worker-class=WORKER_CLASS
                            Name of the Worker class to use, e.g. --worker-
                            class=my.module.WorkerClass
      --callback=CALLBACK   The callback to call for each job, e.g. --worker-
                            class=my.module.callback
      --logger-name=LOGGER_NAME
                            The base name to use for logging, e.g. --logger-base-
                            name="limpyd-jobs.%s"
      --logger-level=LOGGER_LEVEL
                            The level to use for logging, e.g. --worker-class=ERROR
      --save-errors         Save job errors in the Error model, e.g. --save-errors
      --no-save-errors      Do not save job errors in the Error model, e.g. --no-
                            save-errors
      --save-tracebacks     Save exception tracebacks on job error in the Error
                            model, e.g. --save-tracebacks
      --no-save-tracebacks  Do not save exception tracebacks on job error in the
                            Error model, e.g. --no-save-tracebacks
      --max-loops=MAX_LOOPS
                            Max number of jobs to run, e.g. --max-loops=100
      --max-duration=MAX_DURATION
                            Max duration of the worker, in seconds (None by
                            default), e.g. --max-duration=3600
      --terminate-gracefuly
                            Intercept SIGTERM and SIGINT signals to stop
                            gracefuly, e.g. --terminate-gracefuly
      --no-terminate-gracefuly
                            Do NOT intercept SIGTERM and SIGINT signals, so don't
                            stop gracefuly, e.g. --no-terminate-gracefuly
      --timeout=TIMEOUT     Max delay (seconds) to wait for a redis BLPOP call (0
                            for no timeout), e.g. --timeout=30
      --fetch-priorities-delay=FETCH_PRIORITIES_DELAY
                            Min delay (seconds) to wait before fetching new
                            priority queues, e.g. --fetch-priorities-delay=20
      --fetch-delayed-delay=FETCH_DELAYED_DELAY
                            Min delay (seconds) to wait before updating delayed
                            jobs, e.g. --fetch-delayed-delay=20
      --requeue-times=REQUEUE_TIMES
                            Number of time to requeue a failing job (default to
                            0), e.g. --requeue-times=5
      --requeue-priority-delta=REQUEUE_PRIORITY_DELTA
                            Delta to add to the actual priority of a failing job
                            to be requeued (default to -1, ie one level lower),
                            e.g. --requeue-priority-delta=-2
      --requeue-delay-delta=REQUEUE_DELAY_DELTA
                            How much time (seconds) to delay a job to be requeued
                            (default to 30), e.g. --requeue-delay-delta=15
      --database=DATABASE   Redis database to use (host:port:db), e.g.
                            --database=localhost:6379:15
      --no-title            Do not update the title of the worker's process, e.g.
                            --no-title
      --version             show program's version number and exit
      -h, --help            show this help message and exit
Except for --pythonpath, --worker-config, --print-options, --dry-run, --worker-class and --no-title, all options will be passed to the worker.
So, if you use the default models and the default worker with its default options, all you need to do to launch a worker on the queue “queue-name” is:
limpyd-jobs-worker --queues=queue-name --callback=python.path.to.callback
We use the setproctitle module to display useful information in the process name, to have stuff like this:
    limpyd-jobs-worker#1566090 [init] queues=foo,bar
    limpyd-jobs-worker#1566090 [starting] queues=foo,bar loop=0/1000 waiting=10 delayed=0
    limpyd-jobs-worker#1566090 [running] queues=foo,bar loop=1/1000 waiting=9 delayed=2 duration=0:00:15
    limpyd-jobs-worker#1566090 [terminated] queues=foo,bar loop=10/1000 waiting=0 delayed=0 duration=0:12:27
You can disable it by passing the --no-title argument.
Note that if no logging handler is set for the logger-name, a StreamHandler with a formatter will be automatically added by the script, giving logs like:
 2013-10-02 00:51:24,158 (limpyd-jobs) WARNING  [test|job:1] job skipped (current status: SUCCESS)
(the format used is "[%(process)d] %(asctime)s (%(name)s) %(levelname)-8s %(message)s")
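If you prefer to set up the same kind of output yourself, the standard logging module is enough. The sketch below reuses the format string quoted above; ensure_handler is a hypothetical helper, and checking logger.handlers is only one possible way to detect that "no handler is set" (the script may test it differently).

```python
import logging

LOG_FORMAT = "[%(process)d] %(asctime)s (%(name)s) %(levelname)-8s %(message)s"


def ensure_handler(logger_name):
    """Attach a formatted StreamHandler unless the logger already has a handler."""
    logger = logging.getLogger(logger_name)
    if not logger.handlers:
        handler = logging.StreamHandler()
        handler.setFormatter(logging.Formatter(LOG_FORMAT))
        logger.addHandler(handler)
    return logger


logger = ensure_handler("limpyd-jobs")
logger.warning("[test|job:1] job skipped (current status: SUCCESS)")
```

Calling ensure_handler a second time for the same name does not add a duplicate handler, so log lines are not printed twice.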
Sometimes you may want to do some initialization work before even loading the Worker class, for example calling django.setup() when using django.
For this, simply override the WorkerConfig class:
    import django

    from limpyd_jobs.workers import WorkerConfig


    class MyWorkerConfig(WorkerConfig):
        def __init__(self, argv=None):
            # Initialize django before the worker class gets loaded
            django.setup()
            super(MyWorkerConfig, self).__init__(argv)
And pass the python path to this class using the --worker-config option to the limpyd-jobs-worker script.
The redis-limpyd-jobs package is fully tested (coverage: 100%).
To run the tests, which are not installed via the setup.py file, you can do:
    $ python run_tests.py
    [...]
    Ran 136 tests in 19.353s

    OK
Or if you have nosetests installed:
    $ nosetests
    [...]
    Ran 136 tests in 20.471s

    OK
The nosetests configuration is provided in the setup.cfg file and includes coverage, if nose-cov is installed.
You can see a full example in example.py (in the source, not in the installed package).
To use limpyd_jobs models on your own redis database instead of the default one (localhost:6379:db=0), simply use the use_database method of the main model:
    from limpyd.contrib.database import PipelineDatabase

    from limpyd_jobs.models import BaseJobsModel

    database = PipelineDatabase(host='localhost', port=6379, db=15)
    BaseJobsModel.use_database(database)
Or simply change the connection settings:
    from limpyd_jobs.models import BaseJobsModel

    BaseJobsModel.database.connect(host='localhost', port=6379, db=15)
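If your settings come as a single host:port:db string, as accepted by the --database option of the worker script, they can be split into the keyword arguments used above. parse_database is a hypothetical helper written for this example; it is not part of limpyd_jobs.

```python
def parse_database(value):
    """Turn 'host:port:db' into keyword arguments for a redis connection."""
    host, port, db = value.split(":")
    return {"host": host, "port": int(port), "db": int(db)}


settings = parse_database("localhost:6379:15")
```

The resulting dictionary can then be unpacked into either call shown above, for example connect(**settings).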