A lightweight job queue system with SQLite backend

GigQ

GigQ is a lightweight job queue system that uses SQLite as its backend. It is designed for managing and executing small jobs ("gigs") locally with atomicity guarantees, and is particularly suited to tasks such as processing GitHub Archive data, without the complexity of a distributed job system.

Features

  • Simple Job Definition & Management

    • Define small jobs with parameters, priority, and basic dependencies
    • Organize jobs into simple workflows
    • Enable job cancellation and status checking
  • SQLite State Storage

    • Maintain job states in SQLite (pending, running, completed, failed)
    • Use transactions to ensure state consistency
    • Simple, efficient schema design optimized for local usage
    • Handle SQLite locking appropriately for local concurrency
  • Lightweight Concurrency

    • Prevent duplicate job execution using SQLite locking mechanisms
    • Support a modest number of workers processing jobs simultaneously
    • Implement transaction-based state transitions
    • Handle worker crashes and job recovery
  • Basic Recovery

    • Configurable retry for failed jobs with backoff
    • Timeout detection for hung jobs
    • Simple but effective error logging
  • Command-Line Interface

    • Submit and monitor jobs
    • View job queue and history
    • Simple worker management commands

Project Structure

The GigQ library is organized as follows:

gigq/                      # Root project directory
├── gigq/                  # Main package directory
│   ├── __init__.py        # Package initialization (exports main classes)
│   ├── core.py            # Core implementation (Job, JobQueue, Worker, Workflow)
│   └── cli.py             # Command-line interface
├── examples/              # Example applications
│   ├── __init__.py        # Empty file to make examples a package
│   └── github_archive.py  # GitHub Archive processing example
├── tests/                 # Test directory
│   ├── __init__.py        # Empty file to make tests a package
│   └── test_gigq.py       # Test suite
├── README.md              # Project documentation
├── LICENSE                # MIT License
├── setup.py               # Package configuration for installation
└── pyproject.toml         # Build system requirements (optional)

Installation

pip install gigq

Quick Start

Define and Submit a Job

from gigq import Job, JobQueue, Worker

# Define a job function
def process_data(filename, threshold=0.5):
    # Process some data
    print(f"Processing {filename} with threshold {threshold}")
    return {"processed": True, "count": 42}

# Define a job
job = Job(
    name="process_data_job",
    function=process_data,
    params={"filename": "data.csv", "threshold": 0.7},
    max_attempts=3,
    timeout=300
)

# Create or connect to a job queue
queue = JobQueue("jobs.db")
job_id = queue.submit(job)

print(f"Submitted job with ID: {job_id}")

Start a Worker

# Start a worker
worker = Worker("jobs.db")
worker.start()  # This blocks until the worker is stopped

Or use the CLI:

# Start a worker
gigq --db jobs.db worker

# Process just one job
gigq --db jobs.db worker --once

Check Job Status

# Check job status
status = queue.get_status(job_id)
print(f"Job status: {status['status']}")

Or use the CLI:

gigq --db jobs.db status your-job-id
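
To wait for a job to finish rather than check it once, the status call can be wrapped in a small polling loop. The sketch below is not part of GigQ: wait_for_job, poll_interval, and max_wait are hypothetical names, and it assumes that "completed" and "failed" (two of the states named in the Features list) are the only terminal states.

```python
import time

# Assumption: these are the terminal states. The Features list names
# pending, running, completed, and failed.
TERMINAL_STATES = {"completed", "failed"}


def wait_for_job(get_status, job_id, poll_interval=1.0, max_wait=60.0):
    """Poll get_status(job_id) until the job reaches a terminal state.

    get_status is any callable that returns a dict with a "status" key,
    for example queue.get_status from the Quick Start.
    """
    deadline = time.monotonic() + max_wait
    while True:
        status = get_status(job_id)
        if status["status"] in TERMINAL_STATES:
            return status
        if time.monotonic() >= deadline:
            raise TimeoutError(
                f"Job {job_id} still {status['status']} after {max_wait}s"
            )
        time.sleep(poll_interval)
```

With the queue from the Quick Start, this would be called as wait_for_job(queue.get_status, job_id).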

Creating Workflows

GigQ allows you to create workflows of dependent jobs:

from gigq import Job, JobQueue, Workflow

# Create a workflow
workflow = Workflow("data_processing")

# Define the jobs; download_data, process_data, and analyze_data are
# functions you define yourself (process_data is shown in the Quick Start)
job1 = Job(name="download", function=download_data, params={"url": "https://example.com/data.csv"})
job2 = Job(name="process", function=process_data, params={"filename": "data.csv"})
job3 = Job(name="analyze", function=analyze_data, params={"processed_file": "processed.csv"})

# Add jobs to the workflow, declaring each job's dependencies
workflow.add_job(job1)
workflow.add_job(job2, depends_on=[job1])
workflow.add_job(job3, depends_on=[job2])

# Submit all jobs in the workflow to a queue
queue = JobQueue("jobs.db")
job_ids = workflow.submit_all(queue)
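
The effect of depends_on can be illustrated without GigQ itself. The following is a minimal sketch of one way a queue could decide which jobs are runnable; ready_jobs and both dictionaries are hypothetical, and GigQ's internal logic may differ.

```python
def ready_jobs(dependencies, statuses):
    """Return names of pending jobs whose dependencies have all completed.

    dependencies: dict mapping job name -> list of job names it depends on
    statuses: dict mapping job name -> current status string
    """
    return [
        name
        for name, deps in dependencies.items()
        if statuses[name] == "pending"
        and all(statuses[dep] == "completed" for dep in deps)
    ]


# Mirrors the download -> process -> analyze chain above.
dependencies = {"download": [], "process": ["download"], "analyze": ["process"]}
statuses = {"download": "pending", "process": "pending", "analyze": "pending"}

print(ready_jobs(dependencies, statuses))  # ['download']
statuses["download"] = "completed"
print(ready_jobs(dependencies, statuses))  # ['process']
```

Only one job in the chain is runnable at a time, so the three jobs execute in order even when several workers are active.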

CLI Usage

GigQ comes with a command-line interface for common operations:

# Submit a job
gigq submit my_module.my_function --name "My Job" --param "filename=data.csv" --param "threshold=0.7"

# List jobs
gigq list
gigq list --status pending

# Check job status
gigq status your-job-id --show-result

# Cancel a job
gigq cancel your-job-id

# Requeue a failed job
gigq requeue your-job-id

# Start a worker
gigq worker

# Clear completed jobs
gigq clear
gigq clear --before 7  # Clear jobs completed more than 7 days ago
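
The submit command takes a dotted function path and key=value parameters. As an illustration only, the sketch below shows one plausible way to turn such arguments into a callable and a params dict using the standard library. resolve_function and parse_params are hypothetical helpers, and whether GigQ converts values such as 0.7 to numbers is an assumption, not something the documentation states.

```python
import importlib
import json


def resolve_function(dotted_path):
    """Import "package.module.function" and return the function object."""
    module_name, _, function_name = dotted_path.rpartition(".")
    if not module_name:
        raise ValueError(f"Expected 'module.function', got {dotted_path!r}")
    module = importlib.import_module(module_name)
    return getattr(module, function_name)


def parse_params(pairs):
    """Turn ["key=value", ...] into a dict, decoding JSON values when possible."""
    params = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"Expected key=value, got {pair!r}")
        try:
            params[key] = json.loads(raw)  # "0.7" becomes the float 0.7
        except json.JSONDecodeError:
            params[key] = raw  # "data.csv" stays a string
    return params


print(parse_params(["filename=data.csv", "threshold=0.7"]))
# {'filename': 'data.csv', 'threshold': 0.7}
```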

Example: GitHub Archive Processing

See the examples/github_archive.py script for a complete example of using GigQ to process GitHub Archive data.

Technical Details

SQLite Schema

GigQ uses a simple SQLite schema with two main tables:

  1. jobs - Stores job definitions and current state
  2. job_executions - Tracks individual execution attempts

The schema is designed for simplicity and efficiency with appropriate indexes for common operations.
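
To make the two-table layout concrete, here is a hypothetical schema built with Python's sqlite3 module. Only the table names jobs and job_executions come from the description above; every column name, default, and index is an assumption chosen for illustration, and the real schema in core.py may look different.

```python
import sqlite3

# Hypothetical schema: the table names come from the text above,
# but every column and index name here is an assumption.
SCHEMA = """
CREATE TABLE IF NOT EXISTS jobs (
    id           TEXT PRIMARY KEY,
    name         TEXT NOT NULL,
    status       TEXT NOT NULL DEFAULT 'pending',
    priority     INTEGER NOT NULL DEFAULT 0,
    attempts     INTEGER NOT NULL DEFAULT 0,
    max_attempts INTEGER NOT NULL DEFAULT 3,
    created_at   TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP
);
CREATE TABLE IF NOT EXISTS job_executions (
    id         INTEGER PRIMARY KEY AUTOINCREMENT,
    job_id     TEXT NOT NULL REFERENCES jobs(id),
    status     TEXT NOT NULL,
    started_at TEXT NOT NULL DEFAULT CURRENT_TIMESTAMP,
    error      TEXT
);
-- Index supporting "next pending job by priority" lookups.
CREATE INDEX IF NOT EXISTS idx_jobs_status_priority ON jobs (status, priority);
"""

conn = sqlite3.connect(":memory:")
conn.executescript(SCHEMA)
conn.execute("INSERT INTO jobs (id, name) VALUES (?, ?)", ("job-1", "process_data_job"))
conn.execute(
    "INSERT INTO job_executions (job_id, status) VALUES (?, ?)", ("job-1", "running")
)
conn.commit()
print(conn.execute("SELECT id, status, attempts FROM jobs").fetchall())
# [('job-1', 'pending', 0)]
```

Keeping one row per attempt in a separate table lets the jobs row hold only the current state while the attempt history remains available for debugging.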

Concurrency Handling

GigQ uses SQLite's built-in locking mechanisms to ensure safety when multiple workers are running. Each worker claims jobs using an exclusive transaction, preventing duplicate execution.
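
The claim step can be sketched with the standard sqlite3 module. This is an illustration of the exclusive-transaction technique under assumed table and column names (status, priority, worker_id), not GigQ's actual code; claim_next_job is a hypothetical function.

```python
import sqlite3


def claim_next_job(conn, worker_id):
    """Atomically move the highest-priority pending job to 'running'.

    Returns the claimed job id, or None if nothing is pending.
    """
    # BEGIN EXCLUSIVE takes the database write lock up front, so no other
    # connection can claim the same row between the SELECT and the UPDATE.
    conn.execute("BEGIN EXCLUSIVE")
    try:
        row = conn.execute(
            "SELECT id FROM jobs WHERE status = 'pending' "
            "ORDER BY priority DESC, id LIMIT 1"
        ).fetchone()
        if row is None:
            conn.execute("COMMIT")
            return None
        conn.execute(
            "UPDATE jobs SET status = 'running', worker_id = ? WHERE id = ?",
            (worker_id, row[0]),
        )
        conn.execute("COMMIT")
        return row[0]
    except Exception:
        conn.execute("ROLLBACK")
        raise


# isolation_level=None disables the sqlite3 module's implicit transactions,
# so the explicit BEGIN/COMMIT statements above are the only ones issued.
conn = sqlite3.connect(":memory:", isolation_level=None)
conn.execute(
    "CREATE TABLE jobs (id TEXT PRIMARY KEY, status TEXT, priority INTEGER, worker_id TEXT)"
)
conn.executemany(
    "INSERT INTO jobs VALUES (?, 'pending', ?, NULL)", [("a", 1), ("b", 5)]
)
print(claim_next_job(conn, "worker-1"))  # b
print(claim_next_job(conn, "worker-2"))  # a
print(claim_next_job(conn, "worker-3"))  # None
```

Because the read and the update happen inside one exclusive transaction, two workers can never both see the same job as pending and start it.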

Error Handling

Failed jobs can be automatically retried up to a configurable number of times. Detailed error information is stored in the database for debugging. Jobs that exceed their timeout are automatically detected and marked as failed or requeued.
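
The retry and timeout rules described above can be expressed as a few small functions. This is a sketch under assumptions: the documentation mentions backoff but not its formula, so exponential backoff is assumed here, and the function names, base_seconds, and cap_seconds are hypothetical. The rule that a job is requeued while attempts remain and marked failed otherwise is also an assumption.

```python
from datetime import datetime, timedelta


def backoff_delay(attempt, base_seconds=2.0, cap_seconds=300.0):
    """Exponential backoff: base, 2*base, 4*base, ... capped at cap_seconds."""
    return min(cap_seconds, base_seconds * 2 ** (attempt - 1))


def after_failure(attempts, max_attempts):
    """Decide the next state once an attempt has failed or timed out."""
    return "pending" if attempts < max_attempts else "failed"


def is_timed_out(started_at, timeout_seconds, now):
    """True if a running job has exceeded its timeout."""
    return now - started_at > timedelta(seconds=timeout_seconds)


print([backoff_delay(n) for n in (1, 2, 3)])      # [2.0, 4.0, 8.0]
print(after_failure(attempts=1, max_attempts=3))  # pending
print(after_failure(attempts=3, max_attempts=3))  # failed
```

With max_attempts=3 and timeout=300, as in the Quick Start job, a job that hangs would be detected after 300 seconds and, under these assumptions, retried twice more before being marked failed.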

Development and Contribution

For local development:

  1. Clone the repository
  2. Create a virtual environment
  3. Install in development mode: pip install -e .
  4. Run tests: python -m unittest discover tests

License

MIT

