Policy document classification powered by LLMs

cat-pol

Political text classification and analysis powered by LLMs. A policy-specific wrapper around cat-stack with built-in access to 15 political data sources on HuggingFace.

Installation

pip install cat-pol

With optional extras:

pip install "cat-pol[pdf]"         # PDF document processing
pip install "cat-pol[embeddings]"  # Embedding-based similarity scoring
pip install "cat-pol[sources]"     # Data source loading (datasets, huggingface_hub)
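Before loading a built-in source, it can help to confirm that the modules named for the sources extra are importable. The sketch below uses only the standard library; the helper name `missing_modules` is made up for this example and is not part of cat-pol.

```python
from importlib.util import find_spec

# Module names taken from the comment on the "sources" extra above.
SOURCES_EXTRA_MODULES = ("datasets", "huggingface_hub")


def missing_modules(modules=SOURCES_EXTRA_MODULES):
    """Return the names in `modules` that cannot be imported in this environment."""
    return [name for name in modules if find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_modules()
    if missing:
        # Suggest the extra that the README says provides these modules.
        print("Missing:", ", ".join(missing), '-> try: pip install "cat-pol[sources]"')
    else:
        print("Source-loading dependencies are available.")
```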

Quick Start

Classify ordinances from a built-in source

import cat_pol as pol

results = pol.classify(
    source="city_san_diego",
    categories=["Housing", "Public Safety", "Infrastructure", "Finance"],
    doc_type="ordinance",
    since="2022-01-01",
    n=50,
    api_key="sk-...",
)

Classify raw text

results = pol.classify(
    input_data=[
        "The committee voted to approve the rezoning request for parcel 42.",
        "Motion to table the budget amendment until the next session.",
    ],
    categories=["Approval", "Rejection", "Deferral", "Amendment"],
    document_context="City council meeting minutes",
    api_key="sk-...",
)
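The examples above pass the API key inline as a placeholder. One common alternative is to read it from an environment variable so it stays out of source control. This is a minimal sketch: the variable name `CAT_POL_API_KEY` and the helper `classify_kwargs` are assumptions made for illustration, not something cat-pol defines.

```python
import os


def classify_kwargs(categories, env_var="CAT_POL_API_KEY", **extra):
    """Build keyword arguments for pol.classify(), reading the key from the environment.

    CAT_POL_API_KEY is a hypothetical variable name chosen for this sketch.
    """
    api_key = os.environ.get(env_var)
    if not api_key:
        raise RuntimeError(f"Set the {env_var} environment variable first.")
    # Extra keyword arguments (input_data=, source=, ...) are passed through unchanged.
    return {"categories": list(categories), "api_key": api_key, **extra}
```

The resulting dictionary could then be unpacked into a call such as `pol.classify(**classify_kwargs(["Approval", "Rejection"], input_data=texts))`, where `texts` is your own list of strings.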

Optimize prompts with user feedback

result = pol.prompt_tune(
    source="city_san_diego",
    categories=["Pro-Business", "Pro-Regulation", "Tax Increase", "Tax Decrease"],
    doc_type="ordinance",
    since="2020-01-01",
    n=100,
    api_key="sk-...",
    sample_size=15,
)

# Use the optimized prompt for full classification
results = pol.classify(
    source="city_san_diego",
    categories=["Pro-Business", "Pro-Regulation", "Tax Increase", "Tax Decrease"],
    system_prompt=result["system_prompt"],
    api_key="sk-...",
)

Summarize with different formats

# Bullet points
pol.summarize(source="federal_executive_orders", n=10, format="bullets", api_key="sk-...")

# Full report
pol.summarize(source="federal_laws", n=5, format="report", api_key="sk-...")

# One-liner
pol.summarize(source="social_trump_truth", since="2024-01-01", n=20, format="one-liner", api_key="sk-...")
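A mistyped format name is easy to miss, so a small guard can catch it before any API call is made. The sketch below checks a name against the format options named for `summarize()` in the API table; the helper `summarize_kwargs` is hypothetical, and how cat-pol itself handles an unknown format is not documented here.

```python
# Format names as listed for summarize() in the API table.
SUMMARY_FORMATS = ("paragraph", "bullets", "one-liner", "structured", "report", "alt-text")


def summarize_kwargs(fmt, **extra):
    """Validate `fmt` and return keyword arguments suitable for pol.summarize()."""
    if fmt not in SUMMARY_FORMATS:
        raise ValueError(
            f"Unknown format {fmt!r}; expected one of: {', '.join(SUMMARY_FORMATS)}"
        )
    return {"format": fmt, **extra}
```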

Discover categories

result = pol.extract(
    source="city_berkeley",
    n=200,
    api_key="sk-...",
)
print(result["top_categories"])

Fetch raw data

# List all sources
pol.list_sources()
pol.list_sources(level="city")
pol.list_sources(level="federal")

# Fetch data
df = pol.fetch_source("city_san_diego", n=100, since="2020-01-01", doc_type="ordinance")
df = pol.fetch_source("federal_executive_orders", n=50)
df = pol.fetch_source("social_trump_truth", since="2024-01-01")

Data Sources

All datasets are public on Hugging Face; no authentication is required.

California Cities

Source	Rows	Types	Repo
`city_san_diego`	87,983	ordinances, resolutions	chrissoria/san-diego-ordinances
`city_los_angeles`	34,427	ordinances	chrissoria/la-ordinances
`city_berkeley`	9,028	ordinances	chrissoria/berkeley-ordinances
`city_san_francisco`	4,033	ordinances	chrissoria/sf-ordinances
`city_long_beach`	3,898	ordinances, resolutions	chrissoria/long-beach-ordinances
`city_newport_beach`	2,719	ordinances	chrissoria/newport-beach-ordinances
`city_bakersfield`	2,655	ordinances	chrissoria/bakersfield-ordinances
`city_salinas`	2,574	ordinances, resolutions	chrissoria/salinas-ordinances
`city_clovis`	2,343	ordinances	chrissoria/clovis-ordinances
`city_oakland`	1,824	ordinances	chrissoria/oakland-ordinances
`city_fresno`	706	ordinances, resolutions	chrissoria/fresno-ordinances

Federal

Source	Rows	Types	Repo
`federal_laws`	5,915	public laws (1995–present)	chrissoria/federal-public-laws
`federal_executive_orders`	1,530+	executive orders	chrissoria/executive-orders
`federal_speeches`	305	SOTU, inaugurals	chrissoria/presidential-speeches

Social Media

Source	Rows	Types	Repo
`social_trump_truth`	32,000+	Truth Social posts	chrissoria/trump-truth-social

All sources except Truth Social are updated weekly (Sundays at 9 AM) via automated scrapers. Truth Social is updated daily at 9 AM.
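Because the datasets are public, each entry in the Repo column corresponds to a dataset page on the Hugging Face Hub. The sketch below maps a few source names to their page URLs; the dictionary copies only a subset of the tables above, and the helper `dataset_url` is made up for this example.

```python
# Source name -> Hugging Face dataset repo, copied from the tables above (subset).
SOURCE_REPOS = {
    "city_san_diego": "chrissoria/san-diego-ordinances",
    "federal_laws": "chrissoria/federal-public-laws",
    "federal_executive_orders": "chrissoria/executive-orders",
    "social_trump_truth": "chrissoria/trump-truth-social",
}


def dataset_url(source):
    """Return the Hugging Face Hub page URL for a known source name."""
    repo = SOURCE_REPOS[source]  # raises KeyError for names not in the subset
    return f"https://huggingface.co/datasets/{repo}"
```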

Trump Truth Social Dataset Columns

The social_trump_truth dataset is enriched with metadata, market data, and image descriptions:

Post metadata:

Column	Description
`date`	Post date (YYYY-MM-DD)
`time`	Post time in UTC (HH:MM:SS)
`day_of_week`	Day name (Monday, Tuesday, etc.)
`datetime`	Full ISO timestamp
`text`	Post text content
`url`	Truth Social post URL
`post_id`	Unique post identifier
`is_president`	Whether Trump was president at time of post
`is_president_elect`	Whether Trump was president-elect at time of post
`replies_count`	Number of replies
`reblogs_count`	Number of reblogs
`favourites_count`	Number of favourites
`media_urls`	Image/video URLs attached to the post
`has_media`	Whether the post has media attachments
`image_alt_text`	AI-generated factual image description (alt-text format)
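The `date` and `time` columns can be combined into a timezone-aware timestamp, and the three count columns summed into a rough engagement figure. This is a sketch over a plain dictionary; the example row is invented for illustration and is not taken from the dataset, and both helper names are hypothetical.

```python
from datetime import datetime, timezone


def post_timestamp(row):
    """Combine the `date` (YYYY-MM-DD) and `time` (HH:MM:SS, UTC) columns."""
    naive = datetime.strptime(f"{row['date']} {row['time']}", "%Y-%m-%d %H:%M:%S")
    return naive.replace(tzinfo=timezone.utc)


def engagement(row):
    """Sum replies, reblogs, and favourites for one post."""
    return row["replies_count"] + row["reblogs_count"] + row["favourites_count"]


# Made-up row for illustration only.
example = {
    "date": "2024-03-05",
    "time": "14:30:00",
    "replies_count": 10,
    "reblogs_count": 20,
    "favourites_count": 70,
}
```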

Market data (18 tickers):

Each ticker has 7 columns following the convention {ticker}_{metric}:

Metric	Description
`{ticker}_open`	Daily open price
`{ticker}_close`	Daily close price
`{ticker}_1hr_before`	Price 1 hour before the post
`{ticker}_5min_before`	Price 5 minutes before the post
`{ticker}_at_post`	Price at time of post
`{ticker}_5min_after`	Price 5 minutes after the post
`{ticker}_1hr_after`	Price 1 hour after the post
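With these columns, a price move around a post can be computed directly. The sketch below calculates the percentage change between two of the metrics for one ticker; it assumes a row is available as a dictionary and that missing prices appear as `None`, which the README does not confirm.

```python
def pct_change(row, ticker, start="at_post", end="1hr_after"):
    """Percentage change in `ticker` between two metric columns of one row.

    Returns None when either price is missing or the starting price is zero.
    """
    before = row.get(f"{ticker}_{start}")
    after = row.get(f"{ticker}_{end}")
    if before is None or after is None or before == 0:
        return None
    return (after - before) / before * 100.0
```

For example, a row with `djt_at_post` of 50.0 and `djt_1hr_after` of 51.0 gives a change of about 2 percent.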

Tickers included:

Ticker	Name	Category
`sp500`	S&P 500 (^GSPC)	Broad market
`dia`	SPDR Dow Jones Industrial Average ETF	Broad market
`qqq`	Invesco QQQ (Nasdaq-100)	Tech/growth
`djt`	Trump Media & Technology Group	Trump-linked
`lmt`	Lockheed Martin	Defense
`war`	Themes US Military Academy ETF	Defense
`xli`	Industrial Select Sector SPDR	Industrials
`xlv`	Health Care Select Sector SPDR	Healthcare
`xph`	SPDR S&P Pharmaceuticals ETF	Pharma
`cnrg`	SPDR S&P Kensho Clean Power ETF	Clean energy
`gld`	SPDR Gold Shares	Gold/commodities
`uso`	United States Oil Fund	Oil/energy
`fxi`	iShares China Large-Cap ETF	China/trade
`eww`	iShares MSCI Mexico ETF	Mexico/trade
`vgk`	Vanguard FTSE Europe ETF	Europe
`ibit`	iShares Bitcoin Trust ETF	Crypto
`tlt`	iShares 20+ Year Treasury Bond ETF	Bonds/rates
`uup`	Invesco DB US Dollar Index	USD strength
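The `{ticker}_{metric}` convention means the full set of market column names can be generated rather than typed out. The sketch below builds them from the two tables above; the helper name `market_columns` is made up for this example.

```python
# Tickers and metrics copied from the tables above.
TICKERS = [
    "sp500", "dia", "qqq", "djt", "lmt", "war", "xli", "xlv", "xph",
    "cnrg", "gld", "uso", "fxi", "eww", "vgk", "ibit", "tlt", "uup",
]
METRICS = [
    "open", "close", "1hr_before", "5min_before",
    "at_post", "5min_after", "1hr_after",
]


def market_columns(tickers=TICKERS, metrics=METRICS):
    """Return every {ticker}_{metric} column name, grouped by ticker."""
    return [f"{ticker}_{metric}" for ticker in tickers for metric in metrics]
```

With 18 tickers and 7 metrics each, this yields 126 column names.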

Intraday prices use the highest available resolution: 1-minute (last ~7 days), 5-minute (last ~60 days), or hourly (last ~2 years). Weekend/holiday posts use the most recent trading day's values. The sp500_resolution column indicates the data resolution used.
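The resolution rule can be approximated from the age of a post at the time its prices were fetched. The sketch below is an assumption-heavy illustration: the cutoffs are the approximate figures quoted above, the returned labels are chosen for this example and may not match the values stored in `sp500_resolution`, and the README does not say what happens for posts older than about two years.

```python
from datetime import date


def expected_resolution(post_date, fetch_date):
    """Guess the intraday resolution available for a post of a given age.

    Cutoffs mirror the approximate windows in the README (7, 60, ~730 days).
    """
    age_days = (fetch_date - post_date).days
    if age_days <= 7:
        return "1-minute"
    if age_days <= 60:
        return "5-minute"
    if age_days <= 730:
        return "hourly"
    return None  # not covered by the README; intraday data may be unavailable
```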

API

Function	Description
`classify()`	Classify text into predefined categories
`prompt_tune()`	Optimize classification prompts via user feedback
`extract()`	Discover and normalize categories from text
`explore()`	Raw category extraction (no deduplication)
`summarize()`	Summarize text, PDFs, or image URLs with format options (paragraph, bullets, one-liner, structured, report, alt-text)
`list_sources()`	List available data sources
`fetch_source()`	Fetch raw data from a source

The text-processing functions (classify(), prompt_tune(), extract(), explore(), and summarize()) accept either input_data= (raw text, files, or directories) or source= (data pulled from Hugging Face). All cat-stack parameters (multi-model ensemble, batch mode, chain-of-thought, etc.) pass through via **kwargs.
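The input selection and keyword pass-through pattern can be sketched in a few lines. This is not cat-pol's implementation: `resolve_input`, `classify_sketch`, and the `engine` parameter are hypothetical, the engine here is a stand-in that echoes its arguments, and treating `input_data=` and `source=` as mutually exclusive is an assumption.

```python
def resolve_input(input_data=None, source=None):
    """Return ("input_data", value) or ("source", value); require exactly one."""
    if (input_data is None) == (source is None):
        raise ValueError("Pass exactly one of input_data= or source=.")
    if input_data is not None:
        return "input_data", input_data
    return "source", source


def classify_sketch(categories, input_data=None, source=None, engine=None, **kwargs):
    """Validate the input choice, then forward everything else to `engine`."""
    kind, value = resolve_input(input_data, source)
    if engine is None:
        # Stand-in engine: returns the keyword arguments it received.
        engine = lambda **received: received
    # Unrecognized keyword arguments are forwarded untouched.
    return engine(categories=categories, **{kind: value}, **kwargs)
```

Calling `classify_sketch(["A"], source="federal_laws", some_option=True)` returns the arguments the engine would have received, including the forwarded `some_option`.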

Ecosystem

Package	Role
cat-stack	Domain-agnostic LLM classification engine
cat-pol	Political text classification (this package)
cat-vader	Social media text (Reddit, Twitter/X)
cat-ademic	Academic papers and citations

License

GPL-3.0-or-later

Release history

Version	Date
1.2.0 (this version)	Apr 3, 2026
1.1.0	Mar 27, 2026
1.0.0	Mar 23, 2026
0.1.0	Mar 19, 2026


