
Tools that capture public hearings, committee meetings, and symposiums from YouTube, then convert the recordings into searchable, analyzable transcripts.


Civic-AI-Recap (CAIR)

Tools to digitize, transcribe, and analyze public hearings, committee meetings, and symposiums on YouTube.

Install from PyPI:

pip install civic-ai-recap

Install with transcription dependencies:

pip install "civic-ai-recap[transcription]"

Install from source:

git clone https://github.com/thoppe/Civic-AI-Recap/
cd Civic-AI-Recap
pip install .

The PyPI project name is civic-ai-recap, but the import remains CAIR.
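
Because the distribution name and the import name differ, it can help to check which version is installed by looking up the distribution name rather than the module. The sketch below uses only the standard library; `installed_version` is a hypothetical helper, not part of CAIR.

```python
from importlib import metadata


def installed_version(dist_name="civic-ai-recap"):
    """Return the installed version of a distribution, or None if absent.

    Hypothetical helper: the lookup uses the PyPI distribution name,
    not the import name (CAIR).
    """
    try:
        return metadata.version(dist_name)
    except metadata.PackageNotFoundError:
        return None


print(installed_version())
```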

Set required environment variables:

  • YOUTUBE_API_KEY for fetching metadata via the YouTube Data API.
  • OPENAI_API_KEY for LLM-powered analysis (used by Analyze).
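
A quick pre-flight check can catch a missing key before any API call is made. This is a minimal sketch using only the standard library; `missing_env_vars` is a hypothetical helper and not part of CAIR.

```python
import os

# The two variables named above.
REQUIRED_VARS = ("YOUTUBE_API_KEY", "OPENAI_API_KEY")


def missing_env_vars(names=REQUIRED_VARS, environ=None):
    """Return the names that are unset or empty in the environment."""
    env = os.environ if environ is None else environ
    return [name for name in names if not env.get(name)]


missing = missing_env_vars()
if missing:
    print("Missing environment variables:", ", ".join(missing))
```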

The transcription extra installs Whisper, faster-whisper, Silero VAD, and Torch support.

Resolve a YouTube channel ID from a handle URL:

from CAIR import channel_id_from_url

channel_id = channel_id_from_url("https://www.youtube.com/@hanovercountyva")
print(channel_id)

'''
UCg0poGd4dTMOKXEXL4xPi4g
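'''

Fetch video and channel metadata, download and transcribe the audio, then summarize the transcript:

from CAIR import Channel, Video, Transcription, Analyze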

video_id = "P0rxq42sckU"
vid = Video(video_id)
channel = Channel(vid.channel_id)
uploads = channel.get_uploads()

print(vid.title)
print(channel.title, channel.n_videos)
print(uploads[["video_id", "title", "publishedAt"]].head())

'''
SEP 30, 2025 | City Council
City of San Jose, CA 1741
      video_id                                              title           publishedAt
0  h1sCi9oiBSc  NOV 6, 2025 | Police & Fire Department Retirem...  2025-11-08T07:05:34Z
1  4mvGLqa-G70                       NOV 18, 2025 | City Council  2025-11-05T22:27:04Z
2  BAvwrwjsnZM      18 NOVIEMBRE 2025 | Reunión del Ayuntamiento  2025-11-05T22:24:28Z
3  KGeDIw6vUDo  NOV 5, 2025 | Rules & Open Government/Committe...  2025-11-05T22:16:10Z
4  itaRH6GLzBw       4 NOVIEMBRE 2025 | Reunión del Ayuntamiento  2025-11-05T12:59:25Z
'''

f_audio = f"{video_id}.mp3"
vid.download_audio(f_audio)
df = Transcription().transcribe(f_audio, text_only=False)
print(df)

'''
      start    end                                               text
0      0.00  29.28                                         All right.
1     29.28  30.28                                    Good afternoon.
2     30.28  31.28                                 Welcome, everyone.
3     31.28  36.40  I'm pleased to call to order this meeting of t...
4     36.40  38.10                                 of September 30th.
...     ...    ...                                                ...
3682  19872.78  19874.34  I thought he was waiting to speak. Back to cou...
3683  19876.92  19879.92  Thank you. That concludes our meeting. Thank you.
3684  19881.48  19911.46                                         Thank you.
3685  19911.48  19912.20                                         Thank you.
'''

model = Analyze(model_name="gpt-5-mini")
text = model.preprocess_text(df)
summary = model(
    prompt=text,
    system_prompt="Provide a concise executive summary of this hearing.",
)
print(summary)

'''
1. Bottom Line Up Front (BLUF)
San Jose’s council advanced an ambitious, data-driven “Focus Area 2.0” performance model while
approving near-term actions with statewide implications: significant police labor concessions
to stabilize staffing, a city amicus joining litigation in defense of Planned Parenthood, an
ordinance limiting masked identities for law-enforcement/immigration agents, major downtown
land acquisition to preserve future convention/sports options, and a large subsidized downtown
workforce housing loan — all overlapping statewide priorities on public safety, homelessness,
housing affordability, labor enforcement, and immigrant/community trust.

2. Key State-Level Themes and Implications
- Homelessness and shelter operations are shifting from capacity-building to systems/integration
issues (throughput, CalAIM billing, HMIS integration, county coordination). San Jose’s
[...]
'''
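
Since each transcript row carries a start offset in seconds, a passage can be linked back to the moment it was spoken. The sketch below assumes the rows are available as plain dictionaries (for example via `df.to_dict("records")`). The helpers `format_timestamp` and `timestamp_url` are hypothetical and not part of CAIR. The `t=` query parameter is YouTube's usual way of addressing an offset in a video.

```python
def format_timestamp(seconds):
    """Format an offset in seconds as H:MM:SS (fractions are truncated)."""
    hours, remainder = divmod(int(seconds), 3600)
    minutes, secs = divmod(remainder, 60)
    return f"{hours}:{minutes:02d}:{secs:02d}"


def timestamp_url(video_id, seconds):
    """Build a YouTube watch URL that starts playback at the given offset."""
    return f"https://www.youtube.com/watch?v={video_id}&t={int(seconds)}s"


# Two rows copied from the transcript output shown above.
rows = [
    {"start": 30.28, "end": 31.28, "text": "Welcome, everyone."},
    {"start": 19876.92, "end": 19879.92,
     "text": "Thank you. That concludes our meeting. Thank you."},
]

for row in rows:
    stamp = format_timestamp(row["start"])
    url = timestamp_url("P0rxq42sckU", row["start"])
    print(f"[{stamp}] {row['text']} ({url})")
```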

List a channel's uploads and fetch its raw metadata:

uploads = channel.get_uploads()
print(uploads[['video_id', 'title', 'publishedAt']])

'''
         video_id                                              title           publishedAt
0     h1sCi9oiBSc  NOV 6, 2025 | Police & Fire Department Retirem...  2025-11-08T07:05:34Z
1     4mvGLqa-G70                       NOV 18, 2025 |  City Council  2025-11-05T22:27:04Z
2     BAvwrwjsnZM       18 NOVIEMBRE 2025 | Reunión del Ayuntamiento  2025-11-05T22:24:28Z
3     KGeDIw6vUDo  NOV 5, 2025 | Rules & Open Government/Committe...  2025-11-05T22:16:10Z
4     itaRH6GLzBw        4 NOVIEMBRE 2025 | Reunión del Ayuntamiento  2025-11-05T12:59:25Z
...           ...                                                ...                   ...
1747  BV2WEzVDrLw                    Fireworks Prevention en Español  2016-11-04T16:43:10Z
1748  nQWZLit5Kn0       Fireworks Prevention with Firefighter Alfred  2016-11-04T16:41:21Z
1749  2jH3dEH8gK0              SJ Journey To Fiscal SustainabilityHD  2016-11-04T00:02:36Z
1750  i2I98YY8btQ                   Bike Sharing arrives in San José  2016-11-03T23:59:27Z
1751  BpJ911ynFN0                     Parks & Rec. 2013 Junior Games  2016-11-03T23:57:56Z

'''

channel.get_metadata()

{
  "kind": "youtube#channelListResponse",
  "etag": "I-t6Dq6TbsrHZb-C8Tvw3iLjn-0",
  "pageInfo": {
    "totalResults": 1,
    "resultsPerPage": 5
  },
  "items": [
    {
      "kind": "youtube#channel",
      "etag": "4WmqmG5PoRLHq5DgHM_Iix4UEJE",
      "id": "UCeDiMzJEUbPgaruDcXnD4Cg",
      "snippet": {
        "title": "City of San Jose, CA",
        "description": "With almost one million residents, San José is one of the safest, and most diverse cities in the United States. It is Northern California’s largest city and the 13th largest in the nation. Colloquially known as the Capital of Silicon Valley, San José’s transformation into a global innovation center has resulted in one of the nation’s highest concentrations of technology companies and expertise in the world.",
        "customUrl": "@cityofsanjosecalifornia",
        "publishedAt": "2013-07-15T19:52:00Z",
        "localized": {
          "title": "City of San Jose, CA",
          "description": "With almost one million residents, San José is one of the safest, and most diverse cities in the United States. It is Northern California’s largest city and the 13th largest in the nation. Colloquially known as the Capital of Silicon Valley, San José’s transformation into a global innovation center has resulted in one of the nation’s highest concentrations of technology companies and expertise in the world."
        },
        "country": "US"
      },
      "contentDetails": {
        "relatedPlaylists": {
          "likes": "",
          "uploads": "UUeDiMzJEUbPgaruDcXnD4Cg"
        }
      },
      "statistics": {
        "viewCount": "1428701",
        "subscriberCount": "5340",
        "hiddenSubscriberCount": false,
        "videoCount": "1741"
      },
      "topicDetails": {
        "topicIds": [
          "/m/098wr",
          "/m/05qt0"
        ],
        "topicCategories": [
          "https://en.wikipedia.org/wiki/Society",
          "https://en.wikipedia.org/wiki/Politics"
        ]
      },
      "status": {
        "privacyStatus": "public",
        "isLinked": true,
        "longUploadsStatus": "longUploadsUnspecified"
      },
      "brandingSettings": {
        "channel": {
          "title": "City of San Jose, CA",
          "description": "With almost one million residents, San José is one of the safest, and most diverse cities in the United States. It is Northern California’s largest city and the 13th largest in the nation. Colloquially known as the Capital of Silicon Valley, San José’s transformation into a global innovation center has resulted in one of the nation’s highest concentrations of technology companies and expertise in the world.",
          "unsubscribedTrailer": "mEd25UErtPw",
          "country": "US"
        },
        "image": {
          "bannerExternalUrl": "https://yt3.googleusercontent.com/Vp-n5GjLp9EkbgaWcJntExB2442KAHU3zYqo5NTMsJpiY2vCIxIlZwlLJxkeEE-EzvQ8oabm"
        }
      },
      "contentOwnerDetails": {}
    }
  ],
  "download_date": "2025-11-10T11:25:38.516298"
}
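
The response nests the useful fields several levels deep, and the counts arrive as strings. The sketch below pulls out a few of them, assuming `channel.get_metadata()` returns a dictionary shaped like the output above. `summarize_channel` is a hypothetical helper, and the sample dictionary is trimmed from that output.

```python
def summarize_channel(metadata):
    """Flatten a few fields from a channel metadata response."""
    item = metadata["items"][0]
    stats = item["statistics"]
    return {
        "channel_id": item["id"],
        "title": item["snippet"]["title"],
        "uploads_playlist": item["contentDetails"]["relatedPlaylists"]["uploads"],
        # The API reports counts as strings, so convert them explicitly.
        "video_count": int(stats["videoCount"]),
        "view_count": int(stats["viewCount"]),
    }


# Trimmed copy of the response shown above.
metadata = {
    "items": [
        {
            "id": "UCeDiMzJEUbPgaruDcXnD4Cg",
            "snippet": {"title": "City of San Jose, CA"},
            "contentDetails": {
                "relatedPlaylists": {"likes": "", "uploads": "UUeDiMzJEUbPgaruDcXnD4Cg"}
            },
            "statistics": {"viewCount": "1428701", "videoCount": "1741"},
        }
    ]
}

print(summarize_channel(metadata))
```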

Usage notes

Analyze module:

  • Analyze wraps OpenAI chat calls and records per-call usage in Analyze.usage.
  • Responses are cached under cache/<model_name>. Set force=True to skip cache reads and cache_result=False to skip cache writes; both can be overridden per call.
  • Set websearch=True on Analyze(...) to include the OpenAI web_search tool in requests.
  • Per-call overrides include seed, timeout, force, and cache_result.

from CAIR import Analyze

model = Analyze(model_name="gpt-5-mini", force=True, websearch=True)
content = model(
    prompt="Summarize the hearing in 5 bullets.",
    system_prompt="You are a concise analyst.",
    cache_result=True,
)
print(model.usage)
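
The interaction between force and cache_result is easier to see in isolation. The sketch below is a generic read/write cache wrapper that mirrors the semantics described in the notes above. It is not CAIR's implementation: `CachedCall`, `fake_model`, and the hash-based file naming are all assumptions made for illustration.

```python
import hashlib
import json
import tempfile
from pathlib import Path


class CachedCall:
    """Wrap a function with a file cache (illustrative only, not CAIR code)."""

    def __init__(self, func, cache_dir, force=False, cache_result=True):
        self.func = func
        self.cache_dir = Path(cache_dir)
        self.force = force
        self.cache_result = cache_result

    def _path(self, prompt):
        # Assumed naming scheme: one JSON file per prompt hash.
        digest = hashlib.sha256(prompt.encode("utf-8")).hexdigest()
        return self.cache_dir / f"{digest}.json"

    def __call__(self, prompt, force=None, cache_result=None):
        # Per-call arguments override the instance defaults.
        force = self.force if force is None else force
        cache_result = self.cache_result if cache_result is None else cache_result
        path = self._path(prompt)
        if not force and path.exists():
            return json.loads(path.read_text(encoding="utf-8"))
        result = self.func(prompt)
        if cache_result:
            self.cache_dir.mkdir(parents=True, exist_ok=True)
            path.write_text(json.dumps(result), encoding="utf-8")
        return result


calls = []


def fake_model(prompt):
    """Stand-in for an LLM call that records each invocation."""
    calls.append(prompt)
    return prompt.upper()


with tempfile.TemporaryDirectory() as tmp:
    cached = CachedCall(fake_model, Path(tmp) / "gpt-5-mini")
    cached("hello")              # cache miss: calls the function, writes the result
    cached("hello")              # cache hit: served from disk
    cached("hello", force=True)  # skips the read, calls again, rewrites the file
    n_calls = len(calls)

print(n_calls)
```

In this sketch force only affects reads, so a forced call still refreshes the cached file unless cache_result=False is passed as well.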

Transcription module:

  • Transcription.transcribe_s3(s3_location, text_only=...) streams audio directly from S3 and reuses the same post-processing as transcribe(...).
  • s3_location must be a full S3 URI like s3://my-bucket/path/to/audio.mp3.
  • Transcription(method=...) supports whisper and faster_whisper.
  • compute_vad=True enables Silero VAD and adds is_vad to row-based transcript output.
  • Silero VAD prefers CUDA when available and falls back to CPU automatically.
  • Progress bars are enabled by default for Silero VAD, VAD stitching, and faster_whisper segment consumption. Set vad_progress=False, stitch_progress=False, or output_progress=False to disable them.
  • force=True skips cache reads for that call while still writing fresh results.

from CAIR import Transcription

t = Transcription()
df = t.transcribe_s3("s3://my-bucket/path/to/audio.mp3", text_only=False)
print(df[["start", "end", "text"]].head())

Use faster_whisper with Silero VAD and progress bars enabled:

from CAIR import Transcription

t = Transcription(
    method="faster_whisper",
    model_size="distil-large-v3",
    compute_vad=True,
    vad_progress=True,
    stitch_progress=True,
    output_progress=True,
)
df = t.transcribe("meeting_audio.wav", text_only=False)
print(df[["start", "end", "text", "is_vad"]].head())
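
Because transcribe_s3 expects a full S3 URI, it can be worth validating the string before starting a long transcription job. The sketch below uses only the standard library; `split_s3_uri` is a hypothetical helper and not part of CAIR.

```python
from urllib.parse import urlparse


def split_s3_uri(s3_location):
    """Split an s3://bucket/key URI into (bucket, key), or raise ValueError."""
    parsed = urlparse(s3_location)
    bucket = parsed.netloc
    key = parsed.path.lstrip("/")
    if parsed.scheme != "s3" or not bucket or not key:
        raise ValueError(
            f"Expected a full S3 URI like s3://bucket/key, got {s3_location!r}"
        )
    return bucket, key


print(split_s3_uri("s3://my-bucket/path/to/audio.mp3"))
```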


Source Distribution

civic_ai_recap-0.10.7.tar.gz (26.6 kB view details)

Uploaded Source

Built Distribution

civic_ai_recap-0.10.7-py3-none-any.whl (20.6 kB view details)

Uploaded Python 3

File details

Details for the file civic_ai_recap-0.10.7.tar.gz.

File metadata

  • Download URL: civic_ai_recap-0.10.7.tar.gz
  • Upload date:
  • Size: 26.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.12

File hashes

Hashes for civic_ai_recap-0.10.7.tar.gz
Algorithm Hash digest
SHA256 93019c47b51d3c4e68ec9fe3e39f6671dc23f4b10b5bbe01035d1412bfea4988
MD5 b6ba123b6799108dbaa5663f7046d7e7
BLAKE2b-256 ef700fb88b0330c01f5dc3fc6731ffa9922de63d0e2b9db6663635276e8eca5f

File details

Details for the file civic_ai_recap-0.10.7-py3-none-any.whl.

File hashes

Hashes for civic_ai_recap-0.10.7-py3-none-any.whl
Algorithm Hash digest
SHA256 5d2672a7b9e8e650a5f9615bafd91a1e6af7173fff31828fef1fb92165a6dc2b
MD5 9bd7a023fe1249abedb6822fa5a1b1ff
BLAKE2b-256 8739be979dfed945db206670f2c19913cdbddc4c517d0852032b39dd9bc4fa6f
