Data pipeline library for civic.band - manages fetching, processing, and deploying civic meeting data
clerk
A Python library for managing civic data pipelines for civic.band. Clerk handles the complete workflow of fetching, processing, and deploying civic meeting data, including minutes and agendas.
Features
- Distributed Task Queue: RQ-based distributed processing with parallel OCR and horizontal scaling
- Site Management: Create and manage civic sites with metadata
- Data Pipeline: Automated fetch → OCR → compilation → deploy workflow
- Plugin System: Extensible architecture using pluggy for custom fetchers and deployers
- Full-Text Search: Automatic FTS index generation for searchable meeting data
- Database Management: SQLite-based storage with per-site and aggregate databases
- Observability: Structured logging with Loki integration
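To make the Full-Text Search and SQLite storage features more concrete, here is a minimal sketch of how a searchable index over meeting text could be built with Python's standard `sqlite3` module. It assumes SQLite's FTS5 extension is available in your Python build. The table name `pages_fts` and its columns are hypothetical and are not taken from Clerk's actual schema.

```python
import sqlite3


def build_index(pages):
    """Create an in-memory database with an FTS5 index over page text.

    `pages` is an iterable of (meeting, date, text) tuples. The table and
    column names are illustrative assumptions, not Clerk's real schema.
    """
    conn = sqlite3.connect(":memory:")
    # meeting and date are stored but not tokenized; only text is searchable.
    conn.execute(
        "CREATE VIRTUAL TABLE pages_fts "
        "USING fts5(meeting UNINDEXED, date UNINDEXED, text)"
    )
    conn.executemany(
        "INSERT INTO pages_fts (meeting, date, text) VALUES (?, ?, ?)", pages
    )
    conn.commit()
    return conn


def search(conn, query):
    """Return (meeting, date) pairs whose text matches the FTS query."""
    rows = conn.execute(
        "SELECT meeting, date FROM pages_fts "
        "WHERE pages_fts MATCH ? ORDER BY rank",
        (query,),
    )
    return rows.fetchall()


SAMPLE_PAGES = [
    ("City Council", "2024-01-09", "The council approved the zoning variance."),
    ("Planning Commission", "2024-02-13", "Discussion of the bike lane budget."),
]

if __name__ == "__main__":
    conn = build_index(SAMPLE_PAGES)
    for meeting, date in search(conn, "zoning"):
        print(meeting, date)
```

The sketch keeps everything in one virtual table for brevity. A real pipeline would more likely keep a regular table per site and index it separately.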
Quick Install
pip install "civicband-clerk[pdf,extraction] @ git+https://github.com/civicband/clerk.git"
Documentation
Setup
- macOS Setup - Complete installation guide for macOS
- Linux Setup - Complete installation guide for Linux
- Single-Machine Workers - Configure workers on one machine
- Distributed Workers - Scale across multiple machines
- Verification - Test your setup
Operations
- Daily Tasks - Common operational tasks
- Monitoring - Health checks and metrics
- Troubleshooting - Fix common issues
- Scaling - Add workers and scale horizontally
Reference
- CLI Reference - Complete command-line reference
- Python API - Python library reference
- Plugin API - Plugin development guide
Guides
- Your First Site - Complete beginner tutorial
- Worker Architecture - Understanding task queues
- Custom Fetcher - Build a fetcher plugin
- Production Checklist - Pre-launch validation
Quick Start
# Create a new site
clerk new
# Update a site (enqueues fetch → OCR → compilation → deploy)
clerk update --subdomain example.civic.band
# Check status
clerk status
See Your First Site Tutorial for a complete walkthrough.
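If you want to script the commands above, for example from a cron job, one option is a thin Python wrapper around the CLI. The sketch below only uses the `clerk update --subdomain` and `clerk status` invocations shown in this section. The helper names are hypothetical, and the meaning of the exit code is an assumption.

```python
import shutil
import subprocess


def update_command(subdomain):
    """Build the argv list for `clerk update --subdomain <subdomain>`."""
    return ["clerk", "update", "--subdomain", subdomain]


def status_command():
    """Build the argv list for `clerk status`."""
    return ["clerk", "status"]


def run(command):
    """Run a clerk command and return its exit code.

    Returns None if the `clerk` executable is not on PATH, so callers can
    tell "not installed" apart from a failed run.
    """
    if shutil.which(command[0]) is None:
        return None
    return subprocess.run(command, check=False).returncode


if __name__ == "__main__":
    code = run(update_command("example.civic.band"))
    print("clerk not found" if code is None else f"exit code: {code}")
```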
Architecture
Clerk uses a distributed task queue (RQ) with specialized worker types:
- fetch - Download meeting data from city websites
- ocr - Extract text from PDFs (parallel, CPU-intensive)
- compilation - Build databases and coordinate pipeline
- extraction - Entity and vote extraction (optional, memory-intensive)
- deploy - Upload to storage/CDN
Workers can run on a single machine or be distributed across multiple machines for better performance.
See Worker Architecture Guide for details.
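The hand-off between worker types can be pictured as one queue per stage, with each job moving to the next queue once a stage finishes. The following is a simplified, single-process simulation using only the standard library. It does not use RQ and does not reflect Clerk's real job functions. Placing the optional extraction stage after compilation and before deploy is an assumption made for illustration.

```python
from collections import deque

# Required stages, in the order given by the fetch -> OCR -> compilation ->
# deploy workflow described in this README.
PIPELINE = ("fetch", "ocr", "compilation", "deploy")
OPTIONAL_STAGE = "extraction"


def plan_stages(with_extraction=False):
    """Return the ordered list of stages a site update passes through."""
    stages = list(PIPELINE)
    if with_extraction:
        # Assumed position: after compilation, before deploy.
        stages.insert(stages.index("deploy"), OPTIONAL_STAGE)
    return stages


def run_pipeline(subdomain, with_extraction=False):
    """Simulate one job moving through per-stage queues.

    Returns the list of (stage, subdomain) pairs in processing order.
    """
    stages = plan_stages(with_extraction)
    queues = {stage: deque() for stage in stages}
    queues[stages[0]].append(subdomain)

    history = []
    for position, stage in enumerate(stages):
        while queues[stage]:
            job = queues[stage].popleft()
            history.append((stage, job))
            # Hand the job to the next stage's queue, if there is one.
            if position + 1 < len(stages):
                queues[stages[position + 1]].append(job)
    return history


if __name__ == "__main__":
    for stage, job in run_pipeline("example.civic.band", with_extraction=True):
        print(f"{stage:<12} {job}")
```

In a real deployment each queue would be served by its own worker processes, which is what allows the CPU-intensive OCR stage to scale independently of the others.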
Contributing
See CONTRIBUTING.md for development setup and guidelines.
License
BSD 3-Clause License - See LICENSE for details.
Links
- Documentation: docs/
- Issues: https://github.com/civicband/clerk/issues
- civic.band: https://civic.band
Download files
File details
Details for the file civicband_clerk-0.1.2.tar.gz.
File metadata
- Download URL: civicband_clerk-0.1.2.tar.gz
- Upload date:
- Size: 463.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bb3c42ce2e3b083a740d2ed7a0b808f7a779f0d997a12ec565429afc975f28fa |
| MD5 | a5216b0c643af457e02640d32f469df9 |
| BLAKE2b-256 | ddd414cd5a7fa23f89be3f5ca5bc354f533621af8bd4330e4054cbb0ed318e37 |
Provenance
The following attestation bundles were made for civicband_clerk-0.1.2.tar.gz:
Publisher: release.yml on civicband/clerk
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: civicband_clerk-0.1.2.tar.gz
  - Subject digest: bb3c42ce2e3b083a740d2ed7a0b808f7a779f0d997a12ec565429afc975f28fa
- Sigstore transparency entry: 1437115416
- Sigstore integration time:
- Permalink: civicband/clerk@9b79bdbd39278d50100c09b318eb91aaea8164e8
- Branch / Tag: refs/heads/main
- Owner: https://github.com/civicband
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@9b79bdbd39278d50100c09b318eb91aaea8164e8
- Trigger Event: workflow_dispatch
File details
Details for the file civicband_clerk-0.1.2-py3-none-any.whl.
File metadata
- Download URL: civicband_clerk-0.1.2-py3-none-any.whl
- Upload date:
- Size: 76.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? Yes
- Uploaded via: twine/6.1.0 CPython/3.13.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 425b1ed8eb642f25b7bcf5416ca36d2681c05c539b575ace78716715c8702a1c |
| MD5 | eaabb3a2fd834473b60a9edbe5e442af |
| BLAKE2b-256 | 5649877fb36e09fcde4dd420ba0ad311c954dff8e708b6c3c27422059bce0972 |
Provenance
The following attestation bundles were made for civicband_clerk-0.1.2-py3-none-any.whl:
Publisher: release.yml on civicband/clerk
- Statement:
  - Statement type: https://in-toto.io/Statement/v1
  - Predicate type: https://docs.pypi.org/attestations/publish/v1
  - Subject name: civicband_clerk-0.1.2-py3-none-any.whl
  - Subject digest: 425b1ed8eb642f25b7bcf5416ca36d2681c05c539b575ace78716715c8702a1c
- Sigstore transparency entry: 1437115419
- Sigstore integration time:
- Permalink: civicband/clerk@9b79bdbd39278d50100c09b318eb91aaea8164e8
- Branch / Tag: refs/heads/main
- Owner: https://github.com/civicband
- Access: public
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: release.yml@9b79bdbd39278d50100c09b318eb91aaea8164e8
- Trigger Event: workflow_dispatch