Tools that capture public hearings, committee meetings, and symposiums from YouTube, then convert the recordings into searchable, analyzable transcripts.
Project description
Civic-AI-Recap (CAIR)
Tools to digitize, transcribe, and analyze public hearings, committees, and symposiums on YouTube.
Install from PyPI:
pip install civic-ai-recap
Install with transcription dependencies:
pip install "civic-ai-recap[transcription]"
Install from source:
git clone https://github.com/thoppe/Civic-AI-Recap/
cd Civic-AI-Recap
pip install .
The PyPI project name is civic-ai-recap, but the import remains CAIR.
Set required environment variables:
- YOUTUBE_API_KEY for fetching metadata via the YouTube Data API.
- OPENAI_API_KEY for LLM-powered analysis (used by Analyze).
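A quick pre-flight check can catch a missing key before any API call is made. The sketch below is not part of CAIR; the helper name missing_api_keys is hypothetical, and it assumes only that the two variables named above are read from the process environment.

```python
import os

# The two variables named in this README.
REQUIRED_KEYS = ("YOUTUBE_API_KEY", "OPENAI_API_KEY")


def missing_api_keys(environ=None):
    """Return the required variable names that are unset or empty.

    Hypothetical helper for illustration; CAIR does not ship this function.
    """
    env = os.environ if environ is None else environ
    return [name for name in REQUIRED_KEYS if not env.get(name)]


missing = missing_api_keys()
if missing:
    print("Missing environment variables:", ", ".join(missing))
else:
    print("All API keys are set.")
```

Passing a plain dict instead of os.environ makes the check easy to exercise in tests.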
The transcription extra installs Whisper, faster-whisper, Silero VAD, and Torch support.
Resolve a YouTube channel ID from a handle URL:
from CAIR import channel_id_from_url
channel_id = channel_id_from_url("https://www.youtube.com/@hanovercountyva")
print(channel_id)
'''
UCg0poGd4dTMOKXEXL4xPi4g
'''
from CAIR import Channel, Video, Transcription, Analyze
video_id = "P0rxq42sckU"
vid = Video(video_id)
channel = Channel(vid.channel_id)
uploads = channel.get_uploads()
print(vid.title)
print(channel.title, channel.n_videos)
print(uploads[["video_id", "title", "publishedAt"]].head())
'''
SEP 30, 2025 | City Council
City of San Jose, CA 1741
video_id title publishedAt
0 h1sCi9oiBSc NOV 6, 2025 | Police & Fire Department Retirem... 2025-11-08T07:05:34Z
1 4mvGLqa-G70 NOV 18, 2025 | City Council 2025-11-05T22:27:04Z
2 BAvwrwjsnZM 18 NOVIEMBRE 2025 | Reunión del Ayuntamiento 2025-11-05T22:24:28Z
3 KGeDIw6vUDo NOV 5, 2025 | Rules & Open Government/Committe... 2025-11-05T22:16:10Z
4 itaRH6GLzBw 4 NOVIEMBRE 2025 | Reunión del Ayuntamiento 2025-11-05T12:59:25Z
'''
# Download the audio track, then transcribe it into timestamped rows
f_audio = f"{video_id}.mp3"
vid.download_audio(f_audio)
df = Transcription().transcribe(f_audio, text_only=False)
print(df)
'''
start end text
0 0.00 29.28 All right.
1 29.28 30.28 Good afternoon.
2 30.28 31.28 Welcome, everyone.
3 31.28 36.40 I'm pleased to call to order this meeting of t...
4 36.40 38.10 of September 30th.
... ... ... ...
3682 19872.78 19874.34 I thought he was waiting to speak. Back to cou...
3683 19876.92 19879.92 Thank you. That concludes our meeting. Thank you.
3684 19881.48 19911.46 Thank you.
3685 19911.48 19912.20 Thank you.
'''
model = Analyze(model_name="gpt-5-mini")
text = model.preprocess_text(df)
summary = model(
    prompt=text,
    system_prompt="Provide a concise executive summary of this hearing.",
)
print(summary)
'''
1. Bottom Line Up Front (BLUF)
San Jose’s council advanced an ambitious, data-driven “Focus Area 2.0” performance model while
approving near-term actions with statewide implications: significant police labor concessions
to stabilize staffing, a city amicus joining litigation in defense of Planned Parenthood, an
ordinance limiting masked identities for law-enforcement/immigration agents, major downtown
land acquisition to preserve future convention/sports options, and a large subsidized downtown
workforce housing loan — all overlapping statewide priorities on public safety, homelessness,
housing affordability, labor enforcement, and immigrant/community trust.
2. Key State-Level Themes and Implications
- Homelessness and shelter operations are shifting from capacity-building to systems/integration
issues (throughput, CalAIM billing, HMIS integration, county coordination). San Jose’s
[...]
'''
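The row-based transcript shown earlier reports start and end as offsets in seconds, which are hard to read for a five-hour hearing. The following is a minimal sketch, independent of CAIR, that converts offsets to HH:MM:SS and finds rows mentioning a keyword. The helper names format_timestamp and find_mentions are hypothetical, and the sample rows are copied from the output above.

```python
def format_timestamp(seconds):
    """Convert an offset in seconds to HH:MM:SS, dropping the fraction."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours:02d}:{minutes:02d}:{secs:02d}"


def find_mentions(rows, keyword):
    """Return (timestamp, text) pairs for rows whose text contains keyword."""
    needle = keyword.lower()
    return [
        (format_timestamp(row["start"]), row["text"])
        for row in rows
        if needle in row["text"].lower()
    ]


# Sample rows in the same shape as the transcript output (start, end, text).
rows = [
    {"start": 0.00, "end": 29.28, "text": "All right."},
    {"start": 29.28, "end": 30.28, "text": "Good afternoon."},
    {"start": 19876.92, "end": 19879.92,
     "text": "Thank you. That concludes our meeting. Thank you."},
]

for stamp, text in find_mentions(rows, "meeting"):
    print(stamp, text)
```

If the transcript is a pandas DataFrame, df.to_dict("records") produces rows in this shape.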
The full upload list and the raw channel metadata are also available:
uploads = channel.get_uploads()
print(uploads[['video_id', 'title', 'publishedAt']])
'''
video_id title publishedAt
0 h1sCi9oiBSc NOV 6, 2025 | Police & Fire Department Retirem... 2025-11-08T07:05:34Z
1 4mvGLqa-G70 NOV 18, 2025 | City Council 2025-11-05T22:27:04Z
2 BAvwrwjsnZM 18 NOVIEMBRE 2025 | Reunión del Ayuntamiento 2025-11-05T22:24:28Z
3 KGeDIw6vUDo NOV 5, 2025 | Rules & Open Government/Committe... 2025-11-05T22:16:10Z
4 itaRH6GLzBw 4 NOVIEMBRE 2025 | Reunión del Ayuntamiento 2025-11-05T12:59:25Z
... ... ... ...
1747 BV2WEzVDrLw Fireworks Prevention en Español 2016-11-04T16:43:10Z
1748 nQWZLit5Kn0 Fireworks Prevention with Firefighter Alfred 2016-11-04T16:41:21Z
1749 2jH3dEH8gK0 SJ Journey To Fiscal SustainabilityHD 2016-11-04T00:02:36Z
1750 i2I98YY8btQ Bike Sharing arrives in San José 2016-11-03T23:59:27Z
1751 BpJ911ynFN0 Parks & Rec. 2013 Junior Games 2016-11-03T23:57:56Z
'''
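The publishedAt values above are UTC timestamp strings, so filtering uploads by date requires parsing them first. This sketch assumes every value follows the second-resolution format shown in the sample output; the helper names parse_published_at and published_since are hypothetical and not part of CAIR.

```python
from datetime import datetime, timezone


def parse_published_at(value):
    """Parse a timestamp like '2025-11-08T07:05:34Z' into an aware datetime.

    Assumes the second-resolution format shown in the sample output.
    """
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%SZ")
    return parsed.replace(tzinfo=timezone.utc)


def published_since(records, cutoff):
    """Keep records whose publishedAt is at or after the cutoff."""
    return [r for r in records if parse_published_at(r["publishedAt"]) >= cutoff]


# Two rows copied from the sample output above.
records = [
    {"video_id": "h1sCi9oiBSc", "publishedAt": "2025-11-08T07:05:34Z"},
    {"video_id": "BpJ911ynFN0", "publishedAt": "2016-11-03T23:57:56Z"},
]

cutoff = datetime(2025, 1, 1, tzinfo=timezone.utc)
recent = published_since(records, cutoff)
print([r["video_id"] for r in recent])
```

strptime is used rather than datetime.fromisoformat because the latter accepts a trailing Z only on Python 3.11 and later.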
channel.get_metadata()
{
"kind": "youtube#channelListResponse",
"etag": "I-t6Dq6TbsrHZb-C8Tvw3iLjn-0",
"pageInfo": {
"totalResults": 1,
"resultsPerPage": 5
},
"items": [
{
"kind": "youtube#channel",
"etag": "4WmqmG5PoRLHq5DgHM_Iix4UEJE",
"id": "UCeDiMzJEUbPgaruDcXnD4Cg",
"snippet": {
"title": "City of San Jose, CA",
"description": "With almost one million residents, San José is one of the safest, and most diverse cities in the United States. It is Northern California’s largest city and the 13th largest in the nation. Colloquially known as the Capital of Silicon Valley, San José’s transformation into a global innovation center has resulted in one of the nation’s highest concentrations of technology companies and expertise in the world.",
"customUrl": "@cityofsanjosecalifornia",
"publishedAt": "2013-07-15T19:52:00Z",
"localized": {
"title": "City of San Jose, CA",
"description": "With almost one million residents, San José is one of the safest, and most diverse cities in the United States. It is Northern California’s largest city and the 13th largest in the nation. Colloquially known as the Capital of Silicon Valley, San José’s transformation into a global innovation center has resulted in one of the nation’s highest concentrations of technology companies and expertise in the world."
},
"country": "US"
},
"contentDetails": {
"relatedPlaylists": {
"likes": "",
"uploads": "UUeDiMzJEUbPgaruDcXnD4Cg"
}
},
"statistics": {
"viewCount": "1428701",
"subscriberCount": "5340",
"hiddenSubscriberCount": false,
"videoCount": "1741"
},
"topicDetails": {
"topicIds": [
"/m/098wr",
"/m/05qt0"
],
"topicCategories": [
"https://en.wikipedia.org/wiki/Society",
"https://en.wikipedia.org/wiki/Politics"
]
},
"status": {
"privacyStatus": "public",
"isLinked": true,
"longUploadsStatus": "longUploadsUnspecified"
},
"brandingSettings": {
"channel": {
"title": "City of San Jose, CA",
"description": "With almost one million residents, San José is one of the safest, and most diverse cities in the United States. It is Northern California’s largest city and the 13th largest in the nation. Colloquially known as the Capital of Silicon Valley, San José’s transformation into a global innovation center has resulted in one of the nation’s highest concentrations of technology companies and expertise in the world.",
"unsubscribedTrailer": "mEd25UErtPw",
"country": "US"
},
"image": {
"bannerExternalUrl": "https://yt3.googleusercontent.com/Vp-n5GjLp9EkbgaWcJntExB2442KAHU3zYqo5NTMsJpiY2vCIxIlZwlLJxkeEE-EzvQ8oabm"
}
},
"contentOwnerDetails": {}
}
],
"download_date": "2025-11-10T11:25:38.516298"
}
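Most of the response is nested, and the statistics are strings rather than numbers. As a sketch, assuming get_metadata() returns a Python dict with the shape shown above, a few fields can be flattened like this. The helper name summarize_channel is hypothetical, and the sample dict is a trimmed copy of the output above.

```python
def summarize_channel(metadata):
    """Flatten a few fields from a channel metadata response.

    Assumes the dict has the shape shown above, with one entry in "items".
    """
    item = metadata["items"][0]
    stats = item["statistics"]
    return {
        "channel_id": item["id"],
        "title": item["snippet"]["title"],
        "uploads_playlist": item["contentDetails"]["relatedPlaylists"]["uploads"],
        # Counts arrive as strings in the response, so convert them.
        "video_count": int(stats["videoCount"]),
        "view_count": int(stats["viewCount"]),
    }


# Trimmed copy of the sample response above.
sample = {
    "items": [
        {
            "id": "UCeDiMzJEUbPgaruDcXnD4Cg",
            "snippet": {"title": "City of San Jose, CA"},
            "contentDetails": {
                "relatedPlaylists": {"uploads": "UUeDiMzJEUbPgaruDcXnD4Cg"}
            },
            "statistics": {"viewCount": "1428701", "videoCount": "1741"},
        }
    ]
}

summary = summarize_channel(sample)
print(summary["title"], summary["video_count"])
```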
Usage notes
Analyze module:
- Analyze wraps OpenAI chat calls and records per-call usage in Analyze.usage.
- Caching uses cache/<model_name>. Set force=True to skip cache reads, and cache_result=False to skip writes; both can be overridden per call.
- Set websearch=True on Analyze(...) to include the OpenAI web_search tool in requests.
- Per-call overrides include seed, timeout, force, and cache_result.
from CAIR import Analyze
model = Analyze(model_name="gpt-5-mini", force=True, websearch=True)
content = model(
    prompt="Summarize the hearing in 5 bullets.",
    system_prompt="You are a concise analyst.",
    cache_result=True,
)
print(model.usage)
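The interaction between force and cache_result is easiest to see in a toy cache. The sketch below is not CAIR's implementation: the function cached_call, the SHA-256 key scheme, and the one-JSON-file-per-prompt layout are all assumptions made for illustration. It reproduces only the documented semantics, where force skips the read and cache_result controls the write.

```python
import hashlib
import json
import tempfile
from pathlib import Path


def cached_call(func, prompt, cache_dir, force=False, cache_result=True):
    """Call func(prompt) behind a simple on-disk cache (illustrative only)."""
    cache_dir = Path(cache_dir)
    key = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
    path = cache_dir / f"{key}.json"

    # force=True skips the cache read, not the write.
    if not force and path.exists():
        return json.loads(path.read_text(encoding="utf-8"))

    result = func(prompt)

    # cache_result=False skips the write.
    if cache_result:
        cache_dir.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(result), encoding="utf-8")
    return result


calls = []


def fake_model(prompt):
    """Stand-in for an LLM call; records how often it was invoked."""
    calls.append(prompt)
    return f"response #{len(calls)}"


with tempfile.TemporaryDirectory() as tmp:
    cache = Path(tmp) / "cache" / "demo-model"
    first = cached_call(fake_model, "hello", cache)              # miss, then write
    second = cached_call(fake_model, "hello", cache)             # hit
    third = cached_call(fake_model, "hello", cache, force=True)  # re-run, overwrite
    fourth = cached_call(fake_model, "hello", cache)             # hit on new value

print(first, second, third, fourth, len(calls))
```

Because force=True still writes, the fourth call returns the refreshed value without invoking the model again.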
Transcription module:
- Transcription.transcribe_s3(s3_location, text_only=...) streams audio directly from S3 and reuses the same post-processing as transcribe(...).
- s3_location must be a full S3 URI like s3://my-bucket/path/to/audio.mp3.
- Transcription(method=...) supports whisper and faster_whisper.
- compute_vad=True enables Silero VAD and adds is_vad to row-based transcript output.
- Silero VAD prefers CUDA when available and falls back to CPU automatically.
- vad_progress=True shows a tqdm progress bar while Silero VAD scans the waveform.
- stitch_progress=True shows a tqdm progress bar while transcript segments are matched against VAD intervals.
- output_progress=True (faster_whisper only) shows a tqdm progress bar and prints (segment.start, segment.end, segment.text) while consuming segment output.
- force=True skips cache reads for that call while still writing fresh results.
from CAIR import Transcription
t = Transcription()
df = t.transcribe_s3("s3://my-bucket/path/to/audio.mp3", text_only=False)
print(df[["start", "end", "text"]].head())
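Since transcribe_s3 expects a full S3 URI, it can help to validate the string before starting a long job. This is a minimal sketch using only the standard library; the helper name split_s3_uri is hypothetical, and how CAIR itself parses the URI is not documented here.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_location):
    """Split 's3://bucket/key' into (bucket, key), or raise ValueError.

    Illustrative only; keys containing '?' or '#' are not handled.
    """
    parsed = urlparse(s3_location)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(f"Expected s3://bucket/key, got {s3_location!r}")
    return parsed.netloc, key


bucket, key = split_s3_uri("s3://my-bucket/path/to/audio.mp3")
print(bucket, key)
```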
from CAIR import Transcription
t = Transcription(
    method="faster_whisper",
    model_size="distil-large-v3",
    compute_vad=True,
    vad_progress=True,
    stitch_progress=True,
    output_progress=True,
)
df = t.transcribe("meeting_audio.wav", text_only=False)
print(df[["start", "end", "text", "is_vad"]].head())
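One use for the is_vad column is dropping segments that fall outside detected speech. The sketch below assumes is_vad is a boolean-like flag that is true when a segment overlaps detected speech; the README does not specify its exact type. The helper names and the is_vad values in the sample rows are hypothetical.

```python
def speech_only(rows):
    """Keep rows flagged as speech (assumes is_vad is boolean-like)."""
    return [row for row in rows if row["is_vad"]]


def speech_seconds(rows):
    """Total duration in seconds of the rows flagged as speech."""
    return sum(row["end"] - row["start"] for row in speech_only(rows))


# Hypothetical rows; the is_vad values are invented for illustration.
rows = [
    {"start": 0.0, "end": 2.0, "text": "Good afternoon.", "is_vad": True},
    {"start": 2.0, "end": 30.0, "text": "Thank you.", "is_vad": False},
    {"start": 30.0, "end": 31.5, "text": "Welcome, everyone.", "is_vad": True},
]

print([row["text"] for row in speech_only(rows)])
print(speech_seconds(rows))
```

On a pandas DataFrame the equivalent filter would be df[df["is_vad"]], again assuming the column holds booleans.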
File details
Details for the file civic_ai_recap-0.10.4.tar.gz.
File metadata
- Download URL: civic_ai_recap-0.10.4.tar.gz
- Upload date:
- Size: 26.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c689c453b78049008537d064afbd5e3d0e554305f92043c2b3a85340c9bd0408 |
| MD5 | 26606024439b01bcd875751c8f700c3c |
| BLAKE2b-256 | 57f5d48af0448b5529c70bc4eea22cf4cdac3e802939c58333b90fae173fdf28 |
File details
Details for the file civic_ai_recap-0.10.4-py3-none-any.whl.
File metadata
- Download URL: civic_ai_recap-0.10.4-py3-none-any.whl
- Upload date:
- Size: 20.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.12
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f9f4dc026dc6c95e920e167e2d63b0d718a56f679457a0febc55b9691eba74f6 |
| MD5 | 9b4f8d2b7e3a33efd9f7369b95415b7a |
| BLAKE2b-256 | 3dd644cf7c41e55275dc51b6cc5b302e6ddcea37ef10a3ba94aa75e6deb97b25 |