Data pipeline library for civic.band - manages fetching, processing, and deploying civic meeting data

clerk

A Python library for managing civic data pipelines for civic.band. Clerk handles the complete workflow of fetching, processing, and deploying civic meeting data, including minutes and agendas.

Requires Python 3.12+.

Features

  • Distributed Task Queue: RQ-based distributed processing with parallel OCR and horizontal scaling
  • Site Management: Create and manage civic sites with metadata
  • Data Pipeline: Automated fetch → OCR → compilation → deploy workflow
  • Plugin System: Extensible architecture using pluggy for custom fetchers and deployers
  • Full-Text Search: Automatic FTS index generation for searchable meeting data
  • Database Management: SQLite-based storage with per-site and aggregate databases
  • Observability: Structured logging with Loki integration
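The full-text search and SQLite storage features can be illustrated with SQLite's FTS5 extension, which ships with most Python builds. This is a minimal sketch, not clerk's actual schema: the `pages` and `pages_fts` tables, their columns, and all function names are assumptions made for illustration.

```python
import sqlite3


def make_demo_db() -> sqlite3.Connection:
    """Create an in-memory database with a hypothetical `pages` table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE pages ("
        "id INTEGER PRIMARY KEY, meeting TEXT, date TEXT, body TEXT)"
    )
    conn.executemany(
        "INSERT INTO pages (id, meeting, date, body) VALUES (?, ?, ?, ?)",
        [
            (1, "City Council", "2024-01-09",
             "The council approved the bike lane budget."),
            (2, "Planning Commission", "2024-01-16",
             "Discussion of zoning for the new library."),
        ],
    )
    conn.commit()
    return conn


def build_fts_index(conn: sqlite3.Connection) -> None:
    """Create an external-content FTS5 index over `pages.body`."""
    conn.execute(
        "CREATE VIRTUAL TABLE IF NOT EXISTS pages_fts "
        "USING fts5(body, content='pages', content_rowid='id')"
    )
    # 'rebuild' repopulates the index from the content table.
    conn.execute("INSERT INTO pages_fts(pages_fts) VALUES ('rebuild')")
    conn.commit()


def search(conn: sqlite3.Connection, query: str) -> list[tuple[str, str]]:
    """Return (meeting, date) pairs whose body matches the FTS query."""
    rows = conn.execute(
        "SELECT pages.meeting, pages.date FROM pages_fts "
        "JOIN pages ON pages.id = pages_fts.rowid "
        "WHERE pages_fts MATCH ? ORDER BY pages.id",
        (query,),
    )
    return rows.fetchall()


if __name__ == "__main__":
    db = make_demo_db()
    build_fts_index(db)
    print(search(db, "budget"))
```

An external-content index keeps the text in one place (the `pages` table) and stores only the search index in `pages_fts`, which is one common way to add search to an existing SQLite database.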

Quick Install

pip install "civicband-clerk[pdf,extraction] @ git+https://github.com/civicband/clerk.git"

Documentation

  • Setup
  • Operations
  • Reference
  • Guides

Quick Start

# Create a new site
clerk new

# Update a site (enqueues fetch → OCR → compilation → deploy)
clerk update --subdomain example.civic.band

# Check status
clerk status

See Your First Site Tutorial for a complete walkthrough.
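To script the update command shown above from Python, a thin wrapper around `subprocess` is usually enough. The sketch below assumes the `clerk` executable is on `PATH` and uses only the subcommand and flag shown in this section; the helper names `update_command` and `run_update` are hypothetical and not part of clerk.

```python
import shutil
import subprocess


def update_command(subdomain: str) -> list[str]:
    """Build the argv for `clerk update --subdomain <subdomain>`."""
    # Reject empty values and anything that would be parsed as a flag.
    if not subdomain or subdomain.startswith("-"):
        raise ValueError(f"invalid subdomain: {subdomain!r}")
    return ["clerk", "update", "--subdomain", subdomain]


def run_update(subdomain: str) -> int:
    """Run the update command and return its exit code."""
    if shutil.which("clerk") is None:
        raise FileNotFoundError("clerk executable not found on PATH")
    completed = subprocess.run(update_command(subdomain), check=False)
    return completed.returncode


if __name__ == "__main__":
    print(update_command("example.civic.band"))
```

Passing the arguments as a list rather than a shell string avoids quoting problems if a subdomain ever contains unexpected characters.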

Architecture

Clerk uses a distributed task queue (RQ) with specialized worker types:

  • fetch - Download meeting data from city websites
  • ocr - Extract text from PDFs (parallel, CPU-intensive)
  • compilation - Build databases and coordinate pipeline
  • extraction - Entity and vote extraction (optional, memory-intensive)
  • deploy - Upload to storage/CDN

Workers can run on a single machine or be distributed across multiple machines for better performance.

See Worker Architecture Guide for details.
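One way to picture the worker types is as one queue per stage, with each job moving to the next stage's queue once it finishes. The sketch below runs in a single process using the standard library's `queue` module instead of RQ and Redis, and it leaves out the optional extraction stage. The `run_pipeline` function and the stage handling are hypothetical stand-ins, not clerk's implementation.

```python
from queue import Queue

# Stage order taken from the pipeline description: fetch, OCR,
# compilation, deploy. The optional extraction stage is omitted.
STAGES = ("fetch", "ocr", "compilation", "deploy")


def run_pipeline(subdomains: list[str]) -> list[tuple[str, str]]:
    """Push each site through every stage; return the (stage, site) log."""
    queues: dict[str, Queue] = {stage: Queue() for stage in STAGES}
    for subdomain in subdomains:
        queues["fetch"].put(subdomain)

    log: list[tuple[str, str]] = []
    for index, stage in enumerate(STAGES):
        current = queues[stage]
        # Drain this stage's queue, as a dedicated worker would.
        while not current.empty():
            job = current.get()
            log.append((stage, job))  # stand-in for the real work
            if index + 1 < len(STAGES):
                # Hand the job to the next stage's queue.
                queues[STAGES[index + 1]].put(job)
    return log


if __name__ == "__main__":
    for stage, site in run_pipeline(["example.civic.band"]):
        print(f"{stage}: {site}")
```

Keeping a separate queue per stage is what lets each worker type scale independently, for example running more OCR workers than deploy workers.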

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

BSD 3-Clause License - See LICENSE for details.
