Policy document classification powered by LLMs

cat-pol

LLM-powered classification and analysis of political text. cat-pol is a policy-specific wrapper around cat-stack with built-in access to 15 political data sources hosted on HuggingFace.

Installation

pip install cat-pol

With optional extras:

pip install "cat-pol[pdf]"         # PDF document processing
pip install "cat-pol[embeddings]"  # Embedding-based similarity scoring
pip install "cat-pol[sources]"     # Data source loading (datasets, huggingface_hub)
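Before calling the source-loading functions, it can help to confirm that the optional dependencies are importable. The sketch below checks only the two packages named by the sources extra above (datasets and huggingface_hub). The helper name missing_source_deps is hypothetical and is not part of cat-pol.

```python
from importlib.util import find_spec

# Packages named by the "sources" extra in the install notes above.
SOURCE_DEPS = ("datasets", "huggingface_hub")


def missing_source_deps(packages=SOURCE_DEPS):
    """Return the subset of `packages` that cannot be imported."""
    return [name for name in packages if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_source_deps()
    if missing:
        # Hypothetical hint text; adjust to your own environment.
        print("Missing:", ", ".join(missing), '(try: pip install "cat-pol[sources]")')
    else:
        print("Source-loading dependencies are available.")
```

The check only looks up import specs, so it does not import the packages or touch the network.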

Quick Start

Classify ordinances from a built-in source

import cat_pol as pol

results = pol.classify(
    source="city_san_diego",
    categories=["Housing", "Public Safety", "Infrastructure", "Finance"],
    doc_type="ordinance",
    since="2022-01-01",
    n=50,
    api_key="sk-...",
)
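The examples in this section pass api_key="sk-..." inline as a placeholder. In real scripts it is usually safer to read the key from the environment. The following is a minimal sketch: the helper load_api_key and the default variable name OPENAI_API_KEY are assumptions for illustration, and nothing here implies that cat-pol reads that variable itself.

```python
import os


def load_api_key(env_var="OPENAI_API_KEY"):
    """Read an API key from the environment, failing loudly if it is unset."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    return key


# Usage (not executed here; requires cat-pol and a valid key):
#   results = pol.classify(
#       source="city_san_diego",
#       categories=["Housing", "Public Safety"],
#       api_key=load_api_key(),
#   )
```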

Classify raw text

results = pol.classify(
    input_data=[
        "The committee voted to approve the rezoning request for parcel 42.",
        "Motion to table the budget amendment until the next session.",
    ],
    categories=["Approval", "Rejection", "Deferral", "Amendment"],
    document_context="City council meeting minutes",
    api_key="sk-...",
)

Optimize prompts with user feedback

result = pol.prompt_tune(
    source="city_san_diego",
    categories=["Pro-Business", "Pro-Regulation", "Tax Increase", "Tax Decrease"],
    doc_type="ordinance",
    since="2020-01-01",
    n=100,
    api_key="sk-...",
    sample_size=15,
)

# Use the optimized prompt for full classification
results = pol.classify(
    source="city_san_diego",
    categories=["Pro-Business", "Pro-Regulation", "Tax Increase", "Tax Decrease"],
    system_prompt=result["system_prompt"],
    api_key="sk-...",
)

Summarize with different formats

# Bullet points
pol.summarize(source="federal_executive_orders", n=10, format="bullets", api_key="sk-...")

# Full report
pol.summarize(source="federal_laws", n=5, format="report", api_key="sk-...")

# One-liner
pol.summarize(source="social_trump_truth", since="2024-01-01", n=20, format="one-liner", api_key="sk-...")

Discover categories

result = pol.extract(
    source="city_berkeley",
    n=200,
    api_key="sk-...",
)
print(result["top_categories"])

Fetch raw data

# List all sources
pol.list_sources()
pol.list_sources(level="city")
pol.list_sources(level="federal")

# Fetch data
df = pol.fetch_source("city_san_diego", n=100, since="2020-01-01", doc_type="ordinance")
df = pol.fetch_source("federal_executive_orders", n=50)
df = pol.fetch_source("social_trump_truth", since="2024-01-01")

Data Sources

All datasets are public on HuggingFace; no authentication is required.

California Cities

Source               Rows     Types                     Repo
city_san_diego       87,983   ordinances, resolutions   chrissoria/san-diego-ordinances
city_los_angeles     34,427   ordinances                chrissoria/la-ordinances
city_berkeley        9,028    ordinances                chrissoria/berkeley-ordinances
city_san_francisco   4,033    ordinances                chrissoria/sf-ordinances
city_long_beach      3,898    ordinances, resolutions   chrissoria/long-beach-ordinances
city_newport_beach   2,719    ordinances                chrissoria/newport-beach-ordinances
city_bakersfield     2,655    ordinances                chrissoria/bakersfield-ordinances
city_salinas         2,574    ordinances, resolutions   chrissoria/salinas-ordinances
city_clovis          2,343    ordinances                chrissoria/clovis-ordinances
city_oakland         1,824    ordinances                chrissoria/oakland-ordinances
city_fresno          706      ordinances, resolutions   chrissoria/fresno-ordinances

Federal

Source                     Rows     Types                         Repo
federal_laws               5,915    public laws (1995–present)    chrissoria/federal-public-laws
federal_executive_orders   1,530+   executive orders              chrissoria/executive-orders
federal_speeches           305      SOTU, inaugurals              chrissoria/presidential-speeches

Social Media

Source               Rows      Types                Repo
social_trump_truth   32,000+   Truth Social posts   chrissoria/trump-truth-social

All sources are updated weekly (Sundays at 9 AM) via automated scrapers.
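The naming scheme in the tables above (a level prefix followed by the source name) can be sketched as a plain lookup. The mapping below is copied from those tables. The helper sources_by_level is hypothetical, and whether list_sources(level=...) filters by prefix internally is an assumption. The levels "city" and "federal" appear in the Quick Start, while "social" is inferred from the source name.

```python
# Source name -> HuggingFace repo, copied from the tables above.
SOURCES = {
    "city_san_diego": "chrissoria/san-diego-ordinances",
    "city_los_angeles": "chrissoria/la-ordinances",
    "city_berkeley": "chrissoria/berkeley-ordinances",
    "city_san_francisco": "chrissoria/sf-ordinances",
    "city_long_beach": "chrissoria/long-beach-ordinances",
    "city_bakersfield": "chrissoria/bakersfield-ordinances",
    "city_newport_beach": "chrissoria/newport-beach-ordinances",
    "city_salinas": "chrissoria/salinas-ordinances",
    "city_clovis": "chrissoria/clovis-ordinances",
    "city_oakland": "chrissoria/oakland-ordinances",
    "city_fresno": "chrissoria/fresno-ordinances",
    "federal_laws": "chrissoria/federal-public-laws",
    "federal_executive_orders": "chrissoria/executive-orders",
    "federal_speeches": "chrissoria/presidential-speeches",
    "social_trump_truth": "chrissoria/trump-truth-social",
}


def sources_by_level(level=None):
    """Return sorted source names, optionally limited to one prefix such as "city"."""
    if level is None:
        return sorted(SOURCES)
    return sorted(name for name in SOURCES if name.startswith(f"{level}_"))


if __name__ == "__main__":
    for name in sources_by_level("federal"):
        print(name, "->", SOURCES[name])
```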

API

Function         Description
classify()       Classify text into predefined categories
prompt_tune()    Optimize classification prompts via user feedback
extract()        Discover and normalize categories from text
explore()        Raw category extraction (no deduplication)
summarize()      Summarize text with format options (paragraph, bullets, one-liner, structured, report)
list_sources()   List available data sources
fetch_source()   Fetch raw data from a source

The text-processing functions (classify(), prompt_tune(), extract(), explore(), and summarize()) accept either input_data= (raw text, files, or directories) or source= (pull from HuggingFace). All cat-stack parameters (multi-model ensemble, batch mode, chain-of-thought, etc.) pass through via **kwargs.
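The pass-through pattern described above can be illustrated with a toy wrapper. This is a sketch of the general technique, not cat-pol's implementation: engine_classify and load_source are stand-ins, the keyword batch_mode is a made-up option name, and rejecting calls that supply both or neither input is an assumption about how "either" is enforced.

```python
def engine_classify(texts, categories, **options):
    """Stand-in for a domain-agnostic engine; echoes what it received."""
    return {"n_texts": len(texts), "categories": list(categories), "options": options}


def load_source(source, n=None):
    """Hypothetical loader; returns placeholder documents instead of real data."""
    count = n if n is not None else 3
    return [f"{source} document {i}" for i in range(count)]


def classify(categories, input_data=None, source=None, n=None, **kwargs):
    """Resolve the input, then forward every extra keyword to the engine."""
    if (input_data is None) == (source is None):
        raise ValueError("Pass exactly one of input_data= or source=.")
    if input_data is not None:
        texts = list(input_data)
    else:
        texts = load_source(source, n=n)
    # Anything the wrapper does not consume itself is handed on unchanged.
    return engine_classify(texts, categories, **kwargs)


if __name__ == "__main__":
    print(classify(["Approval", "Deferral"], input_data=["Motion to table."], batch_mode=True))
```

The wrapper consumes only the arguments it needs to locate the text and leaves every other keyword untouched, so new engine options become available without changes to the wrapper.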

Ecosystem

Package      Role
cat-stack    Domain-agnostic LLM classification engine
cat-pol      Political text classification (this package)
cat-vader    Social media text (Reddit, Twitter/X)
cat-ademic   Academic papers and citations

License

GPL-3.0-or-later
