Tools that capture public hearings, committee meetings, and symposiums from YouTube, then convert the recordings into searchable, analyzable transcripts.

Civic-AI-Recap (CAIR)

Tools to digitize, transcribe, and analyze public hearings, committee meetings, and symposiums on YouTube.

Install from PyPI:

pip install civic-ai-recap

Install with transcription dependencies:

pip install "civic-ai-recap[transcription]"

Install from source:

git clone https://github.com/thoppe/Civic-AI-Recap/
cd Civic-AI-Recap
pip install .

The PyPI project name is civic-ai-recap, but the import remains CAIR.
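
Because the distribution name and the import name differ, a quick check like the one below can confirm that both are visible in the current environment. This is a minimal sketch using only the standard library; the helper name `describe_install` is hypothetical and not part of CAIR.

```python
from importlib import metadata, util


def describe_install(dist_name="civic-ai-recap", import_name="CAIR"):
    """Report the installed distribution version and whether the module is importable."""
    try:
        version = metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        version = None  # distribution is not installed in this environment
    importable = util.find_spec(import_name) is not None
    return {"version": version, "importable": importable}


print(describe_install())
```

A result with a version but `importable` set to False would suggest the package was installed into a different interpreter than the one running the check.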

Set required environment variables:

  • YOUTUBE_API_KEY for fetching metadata via the YouTube Data API.
  • OPENAI_API_KEY for LLM-powered analysis (used by Analyze).
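
A small pre-flight check can catch a missing key before any API call is made. The sketch below assumes only the two variable names listed above; `missing_env_vars` is a hypothetical helper, not a CAIR function.

```python
import os

REQUIRED_VARS = ("YOUTUBE_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the given environment mapping."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```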

The transcription extra installs Whisper, faster-whisper, Silero VAD, and Torch support.

Resolve a YouTube channel ID from a handle URL:

from CAIR import channel_id_from_url

channel_id = channel_id_from_url("https://www.youtube.com/@hanovercountyva")
print(channel_id)

'''
UCg0poGd4dTMOKXEXL4xPi4g
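'''

End-to-end example: look up a video and its channel, download and transcribe the audio, then summarize the transcript:

from CAIR import Channel, Video, Transcription, Analyze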

video_id = "P0rxq42sckU"
vid = Video(video_id)
channel = Channel(vid.channel_id)
uploads = channel.get_uploads()

print(vid.title)
print(channel.title, channel.n_videos)
print(uploads[["video_id", "title", "publishedAt"]].head())

'''
SEP 30, 2025 | City Council
City of San Jose, CA 1741
      video_id                                              title           publishedAt
0  h1sCi9oiBSc  NOV 6, 2025 | Police & Fire Department Retirem...  2025-11-08T07:05:34Z
1  4mvGLqa-G70                       NOV 18, 2025 | City Council  2025-11-05T22:27:04Z
2  BAvwrwjsnZM      18 NOVIEMBRE 2025 | Reunión del Ayuntamiento  2025-11-05T22:24:28Z
3  KGeDIw6vUDo  NOV 5, 2025 | Rules & Open Government/Committe...  2025-11-05T22:16:10Z
4  itaRH6GLzBw       4 NOVIEMBRE 2025 | Reunión del Ayuntamiento  2025-11-05T12:59:25Z
'''

f_audio = f"{video_id}.mp3"
vid.download_audio(f_audio)
df = Transcription().transcribe(f_audio, text_only=False)
print(df)

'''
      start    end                                               text
0      0.00  29.28                                         All right.
1     29.28  30.28                                    Good afternoon.
2     30.28  31.28                                 Welcome, everyone.
3     31.28  36.40  I'm pleased to call to order this meeting of t...
4     36.40  38.10                                 of September 30th.
...     ...    ...                                                ...
3682  19872.78  19874.34  I thought he was waiting to speak. Back to cou...
3683  19876.92  19879.92  Thank you. That concludes our meeting. Thank you.
3684  19881.48  19911.46                                         Thank you.
3685  19911.48  19912.20                                         Thank you.
'''
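
Once the transcript is in row form, it can be searched and linked back to the recording. The sketch below is an illustration rather than part of CAIR: `find_mentions` is a hypothetical helper, it works on plain dictionaries with the `start` and `text` columns shown above, and it assumes the standard YouTube `t=` query parameter for seeking to a time offset.

```python
def find_mentions(rows, keyword, video_id):
    """Return (timestamp, link, text) for rows whose text contains the keyword."""
    hits = []
    for row in rows:
        if keyword.lower() in row["text"].lower():
            seconds = int(row["start"])
            hours, rem = divmod(seconds, 3600)
            minutes, secs = divmod(rem, 60)
            stamp = f"{hours:02d}:{minutes:02d}:{secs:02d}"
            link = f"https://www.youtube.com/watch?v={video_id}&t={seconds}s"
            hits.append((stamp, link, row["text"]))
    return hits


# Sample rows copied from the transcript output above. With a real DataFrame,
# df.to_dict("records") yields rows of the same shape.
rows = [
    {"start": 29.28, "end": 30.28, "text": "Good afternoon."},
    {"start": 19876.92, "end": 19879.92,
     "text": "Thank you. That concludes our meeting. Thank you."},
]
for stamp, link, text in find_mentions(rows, "concludes", "P0rxq42sckU"):
    print(stamp, link, text)
```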

model = Analyze(model_name="gpt-5-mini")
text = model.preprocess_text(df)
summary = model(
    prompt=text,
    system_prompt="Provide a concise executive summary of this hearing.",
)
print(summary)

'''
1. Bottom Line Up Front (BLUF)
San Jose’s council advanced an ambitious, data-driven “Focus Area 2.0” performance model while
approving near-term actions with statewide implications: significant police labor concessions
to stabilize staffing, a city amicus joining litigation in defense of Planned Parenthood, an
ordinance limiting masked identities for law-enforcement/immigration agents, major downtown
land acquisition to preserve future convention/sports options, and a large subsidized downtown
workforce housing loan — all overlapping statewide priorities on public safety, homelessness,
housing affordability, labor enforcement, and immigrant/community trust.

2. Key State-Level Themes and Implications
- Homelessness and shelter operations are shifting from capacity-building to systems/integration
issues (throughput, CalAIM billing, HMIS integration, county coordination). San Jose’s
[...]
'''

List a channel's uploads and fetch its raw metadata:

uploads = channel.get_uploads()
print(uploads[['video_id', 'title', 'publishedAt']])

'''
         video_id                                              title           publishedAt
0     h1sCi9oiBSc  NOV 6, 2025 | Police & Fire Department Retirem...  2025-11-08T07:05:34Z
1     4mvGLqa-G70                       NOV 18, 2025 |  City Council  2025-11-05T22:27:04Z
2     BAvwrwjsnZM       18 NOVIEMBRE 2025 | Reunión del Ayuntamiento  2025-11-05T22:24:28Z
3     KGeDIw6vUDo  NOV 5, 2025 | Rules & Open Government/Committe...  2025-11-05T22:16:10Z
4     itaRH6GLzBw        4 NOVIEMBRE 2025 | Reunión del Ayuntamiento  2025-11-05T12:59:25Z
...           ...                                                ...                   ...
1747  BV2WEzVDrLw                    Fireworks Prevention en Español  2016-11-04T16:43:10Z
1748  nQWZLit5Kn0       Fireworks Prevention with Firefighter Alfred  2016-11-04T16:41:21Z
1749  2jH3dEH8gK0              SJ Journey To Fiscal SustainabilityHD  2016-11-04T00:02:36Z
1750  i2I98YY8btQ                   Bike Sharing arrives in San José  2016-11-03T23:59:27Z
1751  BpJ911ynFN0                     Parks & Rec. 2013 Junior Games  2016-11-03T23:57:56Z

'''

channel.get_metadata()

{
  "kind": "youtube#channelListResponse",
  "etag": "I-t6Dq6TbsrHZb-C8Tvw3iLjn-0",
  "pageInfo": {
    "totalResults": 1,
    "resultsPerPage": 5
  },
  "items": [
    {
      "kind": "youtube#channel",
      "etag": "4WmqmG5PoRLHq5DgHM_Iix4UEJE",
      "id": "UCeDiMzJEUbPgaruDcXnD4Cg",
      "snippet": {
        "title": "City of San Jose, CA",
        "description": "With almost one million residents, San José is one of the safest, and most diverse cities in the United States. It is Northern California’s largest city and the 13th largest in the nation. Colloquially known as the Capital of Silicon Valley, San José’s transformation into a global innovation center has resulted in one of the nation’s highest concentrations of technology companies and expertise in the world.",
        "customUrl": "@cityofsanjosecalifornia",
        "publishedAt": "2013-07-15T19:52:00Z",
        "localized": {
          "title": "City of San Jose, CA",
          "description": "With almost one million residents, San José is one of the safest, and most diverse cities in the United States. It is Northern California’s largest city and the 13th largest in the nation. Colloquially known as the Capital of Silicon Valley, San José’s transformation into a global innovation center has resulted in one of the nation’s highest concentrations of technology companies and expertise in the world."
        },
        "country": "US"
      },
      "contentDetails": {
        "relatedPlaylists": {
          "likes": "",
          "uploads": "UUeDiMzJEUbPgaruDcXnD4Cg"
        }
      },
      "statistics": {
        "viewCount": "1428701",
        "subscriberCount": "5340",
        "hiddenSubscriberCount": false,
        "videoCount": "1741"
      },
      "topicDetails": {
        "topicIds": [
          "/m/098wr",
          "/m/05qt0"
        ],
        "topicCategories": [
          "https://en.wikipedia.org/wiki/Society",
          "https://en.wikipedia.org/wiki/Politics"
        ]
      },
      "status": {
        "privacyStatus": "public",
        "isLinked": true,
        "longUploadsStatus": "longUploadsUnspecified"
      },
      "brandingSettings": {
        "channel": {
          "title": "City of San Jose, CA",
          "description": "With almost one million residents, San José is one of the safest, and most diverse cities in the United States. It is Northern California’s largest city and the 13th largest in the nation. Colloquially known as the Capital of Silicon Valley, San José’s transformation into a global innovation center has resulted in one of the nation’s highest concentrations of technology companies and expertise in the world.",
          "unsubscribedTrailer": "mEd25UErtPw",
          "country": "US"
        },
        "image": {
          "bannerExternalUrl": "https://yt3.googleusercontent.com/Vp-n5GjLp9EkbgaWcJntExB2442KAHU3zYqo5NTMsJpiY2vCIxIlZwlLJxkeEE-EzvQ8oabm"
        }
      },
      "contentOwnerDetails": {}
    }
  ],
  "download_date": "2025-11-10T11:25:38.516298"
}
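
The response is deeply nested, and the counts under `statistics` are strings rather than integers. The sketch below pulls out a few commonly needed fields; it assumes `channel.get_metadata()` returns a dictionary shaped like the JSON above, and `summarize_channel` is a hypothetical helper, not part of CAIR.

```python
def summarize_channel(metadata):
    """Flatten a channel metadata response into a small dictionary."""
    item = metadata["items"][0]
    stats = item["statistics"]
    return {
        "channel_id": item["id"],
        "title": item["snippet"]["title"],
        "uploads_playlist": item["contentDetails"]["relatedPlaylists"]["uploads"],
        # The API reports counts as strings, so convert before doing arithmetic.
        "video_count": int(stats["videoCount"]),
        "subscriber_count": int(stats["subscriberCount"]),
    }


# Trimmed copy of the response shown above.
sample = {
    "items": [
        {
            "id": "UCeDiMzJEUbPgaruDcXnD4Cg",
            "snippet": {"title": "City of San Jose, CA"},
            "contentDetails": {
                "relatedPlaylists": {"likes": "", "uploads": "UUeDiMzJEUbPgaruDcXnD4Cg"}
            },
            "statistics": {"subscriberCount": "5340", "videoCount": "1741"},
        }
    ]
}
print(summarize_channel(sample))
```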

Usage notes

Analyze module:

  • Analyze wraps OpenAI chat calls and records per-call usage in Analyze.usage.
  • Caching uses cache/<model_name>. Set force=True to skip cache reads, and cache_result=False to skip writes; both can be overridden per call.
  • Set websearch=True on Analyze(...) to include the OpenAI web_search tool in requests.
  • Per-call overrides include seed, timeout, force, and cache_result.

from CAIR import Analyze

model = Analyze(model_name="gpt-5-mini", force=True, websearch=True)
content = model(
    prompt="Summarize the hearing in 5 bullets.",
    system_prompt="You are a concise analyst.",
    cache_result=True,
)
print(model.usage)
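
The interaction between force and cache_result is easy to misread, so here is a toy stand-in that mimics the behavior described in the bullets above. It is not CAIR's implementation: `ToyCache`, its in-memory dictionary, and the `compute` callback are all hypothetical and exist only to illustrate skipping reads versus skipping writes.

```python
class ToyCache:
    """In-memory illustration of force / cache_result semantics (not CAIR code)."""

    def __init__(self, force=False, cache_result=True):
        self.force = force
        self.cache_result = cache_result
        self.store = {}

    def __call__(self, key, compute, force=None, cache_result=None):
        # Per-call values override the instance defaults when provided.
        force = self.force if force is None else force
        cache_result = self.cache_result if cache_result is None else cache_result
        if not force and key in self.store:
            return self.store[key]  # cache hit: skip the computation
        value = compute(key)
        if cache_result:
            self.store[key] = value  # write (or overwrite) the cached value
        return value


calls = []


def compute(key):
    calls.append(key)  # record every real computation
    return key.upper()


cache = ToyCache()
cache("a", compute)                      # miss: computes and stores
cache("a", compute)                      # hit: no recompute
cache("a", compute, force=True)          # forced: recomputes and overwrites
cache("b", compute, cache_result=False)  # computes but does not store
print(calls, sorted(cache.store))
```

In this toy model, force=True combined with cache_result=True refreshes a stale entry, which matches the combination used in the Analyze example above.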

Transcription module:

  • Transcription.transcribe_s3(s3_location, text_only=...) streams audio directly from S3 and reuses the same post-processing as transcribe(...).
  • s3_location must be a full S3 URI like s3://my-bucket/path/to/audio.mp3.
  • Transcription(method=...) supports whisper and faster_whisper.
  • compute_vad=True enables Silero VAD and adds is_vad to row-based transcript output.
  • Silero VAD prefers CUDA when available and falls back to CPU automatically.
  • Progress bars are enabled by default for Silero VAD, VAD stitching, and faster_whisper segment consumption. Set vad_progress=False, stitch_progress=False, or output_progress=False to disable them.
  • force=True skips cache reads for that call while still writing fresh results.

from CAIR import Transcription

t = Transcription()
df = t.transcribe_s3("s3://my-bucket/path/to/audio.mp3", text_only=False)
print(df[["start", "end", "text"]].head())
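
Since s3_location must be a full S3 URI, it can help to validate the string before starting a long transcription job. The sketch below uses only the standard library; `split_s3_uri` is a hypothetical helper and says nothing about how CAIR parses the URI internally.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_location):
    """Split s3://bucket/key into (bucket, key), raising ValueError if malformed."""
    parsed = urlparse(s3_location)
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not parsed.netloc or not key:
        raise ValueError(
            f"Expected a URI like s3://bucket/path/to/file, got {s3_location!r}"
        )
    return parsed.netloc, key


print(split_s3_uri("s3://my-bucket/path/to/audio.mp3"))
```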

from CAIR import Transcription

t = Transcription(
    method="faster_whisper",
    model_size="distil-large-v3",
    compute_vad=True,
    vad_progress=True,
    stitch_progress=True,
    output_progress=True,
)
df = t.transcribe("meeting_audio.wav", text_only=False)
print(df[["start", "end", "text", "is_vad"]].head())
