Data pipeline library for civic.band - manages fetching, processing, and deploying civic meeting data

clerk

A Python library for managing civic data pipelines for civic.band. Clerk handles the complete workflow of fetching, processing, and deploying civic meeting data, including minutes and agendas.

Requires Python 3.12+.

Features

  • Distributed Task Queue: RQ-based distributed processing with parallel OCR and horizontal scaling
  • Site Management: Create and manage civic sites with metadata
  • Data Pipeline: Automated fetch → OCR → compilation → deploy workflow
  • Plugin System: Extensible architecture using pluggy for custom fetchers and deployers
  • Full-Text Search: Automatic FTS index generation for searchable meeting data
  • Database Management: SQLite-based storage with per-site and aggregate databases
  • Observability: Structured logging with Loki integration
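To make the full-text search feature concrete, here is a minimal sketch of building an FTS index over meeting text with Python's standard `sqlite3` module. It is not clerk's actual schema or code: the `pages` and `pages_fts` table names, the column names, and the choice of SQLite's FTS5 extension are all assumptions made for illustration, and the sketch needs a SQLite build with FTS5 enabled.

```python
import sqlite3


def create_pages_table(conn):
    # Hypothetical schema: one row of extracted text per meeting document.
    conn.execute(
        "CREATE TABLE IF NOT EXISTS pages ("
        "id INTEGER PRIMARY KEY, meeting TEXT, text TEXT)"
    )


def build_fts_index(conn):
    # Assumes FTS5 is available; `meeting` is stored but not indexed.
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS pages_fts "
        "USING fts5(meeting UNINDEXED, text)"
    )
    # Rebuild from scratch so the index always mirrors the pages table.
    conn.execute("DELETE FROM pages_fts")
    conn.execute(
        "INSERT INTO pages_fts (rowid, meeting, text) "
        "SELECT id, meeting, text FROM pages"
    )


def search(conn, query):
    # Return (meeting, text) pairs matching the query, best match first.
    rows = conn.execute(
        "SELECT meeting, text FROM pages_fts "
        "WHERE pages_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()
```

A typical use would be to call `create_pages_table`, insert the OCR output, call `build_fts_index`, and then pass user queries to `search`.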

Quick Install

pip install "civicband-clerk[pdf,extraction] @ git+https://github.com/civicband/clerk.git"

Documentation

  • Setup
  • Operations
  • Reference
  • Guides

Quick Start

# Create a new site
clerk new

# Update a site (enqueues fetch → OCR → compilation → deploy)
clerk update --subdomain example.civic.band

# Check status
clerk status

See Your First Site Tutorial for a complete walkthrough.
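If you want to trigger updates from a Python script (for example, a scheduler), one option is to shell out to the CLI shown above. The sketch below uses only the `clerk update --subdomain` invocation from this section; the helper names `update_command` and `run_update` are hypothetical and not part of clerk.

```python
import shutil
import subprocess


def update_command(subdomain):
    # Build the argv for the documented `clerk update --subdomain ...` call.
    return ["clerk", "update", "--subdomain", subdomain]


def run_update(subdomain):
    # Fail early with a clear message if the CLI is not installed.
    if shutil.which("clerk") is None:
        raise FileNotFoundError("clerk CLI not found on PATH")
    # check=True raises CalledProcessError on a non-zero exit status.
    return subprocess.run(update_command(subdomain), check=True)
```

Passing the arguments as a list rather than a single string avoids shell quoting issues with the subdomain value.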

Architecture

Clerk uses a distributed task queue (RQ) with specialized worker types:

  • fetch - Download meeting data from city websites
  • ocr - Extract text from PDFs (parallel, CPU-intensive)
  • compilation - Build databases and coordinate the pipeline
  • extraction - Entity and vote extraction (optional, memory-intensive)
  • deploy - Upload to storage/CDN

Workers can run on a single machine or be distributed across multiple machines to scale throughput.

See Worker Architecture Guide for details.
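The hand-off between worker types can be pictured with a small in-process model. This is a toy sketch using plain Python queues, not RQ and not clerk's implementation: only the stage names and their order (fetch → ocr → compilation → deploy) come from this README, while `run_pipeline`, `next_stage`, and the handler mapping are hypothetical. The optional extraction stage is left out because its position in the pipeline is not described here.

```python
from collections import deque

# Stage names and order are taken from the README; the rest is illustrative.
STAGES = ["fetch", "ocr", "compilation", "deploy"]


def next_stage(stage):
    """Return the stage that follows `stage`, or None after the last one."""
    index = STAGES.index(stage)
    return STAGES[index + 1] if index + 1 < len(STAGES) else None


def run_pipeline(subdomain, handlers):
    """Push one site through per-stage queues and record what ran."""
    queues = {stage: deque() for stage in STAGES}
    queues[STAGES[0]].append(subdomain)
    history = []
    for stage in STAGES:
        while queues[stage]:
            job = queues[stage].popleft()
            handlers[stage](job)  # do this stage's work for the site
            history.append((stage, job))
            following = next_stage(stage)
            if following is not None:
                # Finishing one stage enqueues the job for the next one.
                queues[following].append(job)
    return history
```

In a real deployment each queue would be served by its own worker processes, which is what allows the CPU-intensive OCR stage to be scaled independently of the others.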

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

BSD 3-Clause License. See LICENSE for details.
