Extract transcripts from HAR files, particularly from Fathom video calls
Fathom Extractor
A Python tool for extracting transcripts from HAR (HTTP Archive) files, particularly optimized for Fathom video call transcripts.
Features
- 🎥 Fathom Video Support: Specialized extraction for Fathom video call transcripts
- 🔍 Generic Transcript Detection: Finds transcripts from various APIs (Whisper, Deepgram, etc.)
- 📄 Multiple Output Formats: JSON, clean text, and beautiful Markdown with YAML frontmatter
- 🎯 Smart Pattern Matching: Automatically detects transcript-related network requests
- 📋 Rich Metadata: Extracts speakers, Q&A clips, AI notes, and meeting summaries
- ⚡ CLI Tool: Easy-to-use command-line interface
Installation
From PyPI
pip install fathom-extractor
From Source
git clone https://github.com/igutekunst/fathom-extractor.git
cd fathom-extractor
pip install -e .
Quick Start
- Download a HAR file (see How to Download HAR Files)
- Extract transcripts:
fathom-extractor recording.har
- Also write Markdown and clean-text versions, with verbose logging:
fathom-extractor recording.har -m transcript.md -c clean.txt -v
Usage
Basic Usage
# Extract to JSON (default)
fathom-extractor recording.har
# Specify output file
fathom-extractor recording.har -o my_transcripts.json
# Create multiple output formats
fathom-extractor recording.har -m beautiful.md -c readable.txt
Command Line Options
fathom-extractor [-h] [-o OUTPUT] [-c CLEAN] [-m MARKDOWN] [-v] [--version] har_file

positional arguments:
  har_file              Path to the HAR file to extract transcripts from

options:
  -h, --help            show this help message and exit
  -o OUTPUT, --output OUTPUT
                        Output JSON file (default: extracted_transcripts.json)
  -c CLEAN, --clean CLEAN
                        Also create a clean, readable transcript file
  -m MARKDOWN, --markdown MARKDOWN
                        Create a beautiful markdown transcript with YAML frontmatter
  -v, --verbose         Enable verbose output
  --version             show program version and exit
Examples
# Basic extraction
fathom-extractor meeting.har
# Extract with all output formats and verbose logging
fathom-extractor meeting.har -o data.json -c transcript.txt -m report.md -v
# Just create a markdown report
fathom-extractor meeting.har -m meeting_notes.md
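To run the same extraction over a folder of HAR files, the command line can be assembled in a small script. The sketch below uses only the flags documented above (-o, -c, -m). The helper names `build_command` and `run_batch`, the output naming scheme, and the treatment of a nonzero exit status as a failure are assumptions made for this example, not part of the package.

```python
import subprocess
from pathlib import Path


def build_command(har_path, out_dir):
    """Build a fathom-extractor argument list for one HAR file.

    Output names are derived from the HAR file's stem; this naming
    scheme is an assumption made for the example.
    """
    har_path = Path(har_path)
    out_dir = Path(out_dir)
    stem = har_path.stem
    return [
        "fathom-extractor",
        str(har_path),
        "-o", str(out_dir / f"{stem}.json"),
        "-c", str(out_dir / f"{stem}.txt"),
        "-m", str(out_dir / f"{stem}.md"),
    ]


def run_batch(har_dir, out_dir):
    """Run the CLI once per *.har file in har_dir; return the paths that failed."""
    Path(out_dir).mkdir(parents=True, exist_ok=True)
    failed = []
    for har_path in sorted(Path(har_dir).glob("*.har")):
        # Assumption: a nonzero exit status means the extraction failed.
        result = subprocess.run(build_command(har_path, out_dir))
        if result.returncode != 0:
            failed.append(har_path)
    return failed
```

Calling `run_batch("recordings", "out")` would write one JSON, text, and Markdown file per recording into `out`.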
How to Download HAR Files
A HAR (HTTP Archive) file is a JSON log of the network requests and responses that your browser's developer tools recorded while a page loaded. Here's how to export one:
Chrome/Chromium
- Open Developer Tools
  - Press F12 or Ctrl+Shift+I (Windows/Linux)
  - Press Cmd+Option+I (Mac)
  - Or right-click → "Inspect"
- Go to Network Tab
  - Click the "Network" tab in Developer Tools
  - Make sure recording is enabled (red circle should be active)
- Navigate and Capture
  - Go to your Fathom video page or transcript page
  - Let the page fully load and display the transcript
  - Scroll through the transcript if needed
- Download HAR File
  - Right-click in the Network tab
  - Select "Save all as HAR with content"
  - Choose a filename and save
Firefox
- Open Developer Tools
  - Press F12 or Ctrl+Shift+I (Windows/Linux)
  - Press Cmd+Option+I (Mac)
- Go to Network Tab
  - Click the "Network" tab
  - Ensure recording is active
- Capture Traffic
  - Navigate to your transcript page
  - Wait for full page load
- Export HAR
  - Click the gear icon (⚙️) in the Network tab
  - Select "Save All As HAR"
Safari
- Enable Developer Menu
  - Safari → Preferences → Advanced
  - Check "Show Develop menu in menu bar"
- Open Web Inspector
  - Develop → Show Web Inspector
  - Go to Network tab
- Capture and Export
  - Navigate to transcript page
  - Right-click in Network tab → "Export HAR"
Tips for Better Results
- Clear browser cache before recording to capture all requests
- Disable ad blockers temporarily to avoid missing requests
- Wait for full page load before saving the HAR file
- Interact with the page (scroll, click) to trigger all network requests
- For Fathom: Make sure you can see the full transcript on screen
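Before running the extractor, it can help to confirm that the exported file actually contains response bodies, since a HAR saved without content has nothing to extract. The sketch below relies on the standard HAR 1.2 layout (log.entries[].response.content.text); the function names are made up for this example and are not part of fathom-extractor.

```python
import json


def summarize_har(har):
    """Count entries and those whose response body was saved.

    `har` is a parsed HAR document (a dict). Response bodies are stored
    under log.entries[].response.content.text in the HAR 1.2 format.
    """
    entries = har.get("log", {}).get("entries", [])
    with_body = [
        e for e in entries
        if e.get("response", {}).get("content", {}).get("text")
    ]
    # URLs of saved responses whose MIME type mentions JSON.
    json_urls = [
        e.get("request", {}).get("url", "")
        for e in with_body
        if "json" in (e["response"]["content"].get("mimeType") or "")
    ]
    return {
        "entries": len(entries),
        "with_body": len(with_body),
        "json_urls": json_urls,
    }


def summarize_har_file(path):
    """Load a HAR file from disk and summarize it."""
    # utf-8-sig also accepts files that start with a byte order mark.
    with open(path, encoding="utf-8-sig") as f:
        return summarize_har(json.load(f))
```

If `with_body` comes back as 0, the HAR was most likely exported without content and should be captured again.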
Output Formats
JSON Output
Raw extracted data with full metadata and transcript content.
Clean Text Output
Human-readable format with:
- Meeting metadata
- Speaker information
- Q&A sections
- Full transcript with timestamps
Markdown Output
Beautiful formatted document with:
- YAML frontmatter with metadata
- Structured sections with emojis
- Proper formatting for speakers and timestamps
- Q&A sections with time ranges
- Meeting summaries and AI notes
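Because the Markdown output begins with YAML frontmatter, downstream scripts may want to separate the metadata from the transcript body. The following is a minimal sketch using only the standard library. It assumes the conventional layout in which the frontmatter sits between two `---` lines at the top of the file; the field in the sample is hypothetical and the real field names may differ.

```python
def split_frontmatter(markdown_text):
    """Return (frontmatter_text, body_text).

    Assumes the first line is '---' and the frontmatter ends at the next
    line that is exactly '---'. If that layout is absent, the frontmatter
    is returned as an empty string and the text is returned unchanged.
    """
    lines = markdown_text.splitlines()
    if not lines or lines[0].strip() != "---":
        return "", markdown_text
    for i in range(1, len(lines)):
        if lines[i].strip() == "---":
            return "\n".join(lines[1:i]), "\n".join(lines[i + 1:])
    return "", markdown_text


# Hypothetical sample; the real frontmatter fields may differ.
sample = "---\ntitle: Weekly sync\n---\n# Transcript\nHello"
frontmatter, body = split_frontmatter(sample)
```

The frontmatter string can then be handed to a YAML parser such as PyYAML's `yaml.safe_load` if structured access is needed.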
What Gets Extracted
For Fathom Videos
- 👥 Speakers: Names and email addresses
- 📋 Meeting Summary: AI-generated meeting notes
- 💬 Q&A Clips: Questions and answers with timestamps
- 🤖 AI Notes: Additional AI-generated insights
- 📄 Full Transcript: Complete conversation with speaker attribution
- ⏰ Metadata: Meeting title, duration, host information
For Generic Transcripts
- 📝 Transcript Text: Raw or structured transcript data
- 🕒 Timestamps: When available
- 👤 Speaker Information: If present in the data
- 📊 Confidence Scores: From speech recognition APIs
Supported Sources
- Fathom Video: Full support for Fathom's transcript format
- OpenAI Whisper: API responses
- Deepgram: Transcript API responses
- Rev.ai: Speech-to-text API responses
- Google Speech-to-Text: API responses
- Azure Speech: API responses
- AWS Transcribe: API responses
- Generic APIs: Any API returning transcript-like JSON
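To give a sense of what "transcript-like JSON" could mean in practice, here is one possible heuristic: walk a parsed JSON payload and look for key names that speech-to-text responses commonly use. This is an illustration only. The key list and function names are assumptions, and the tool's actual detection logic may work differently.

```python
import json

# Key names chosen for illustration only; the tool's real patterns may differ.
TRANSCRIPT_KEYS = {"transcript", "transcripts", "utterances", "segments", "words"}


def looks_like_transcript(payload):
    """Return True if any dict nested in payload has a transcript-style key."""
    stack = [payload]
    while stack:
        node = stack.pop()
        if isinstance(node, dict):
            if TRANSCRIPT_KEYS & {str(k).lower() for k in node}:
                return True
            stack.extend(node.values())
        elif isinstance(node, list):
            stack.extend(node)
    return False


def body_looks_like_transcript(text):
    """Parse a response body as JSON and apply the heuristic above."""
    try:
        return looks_like_transcript(json.loads(text))
    except (TypeError, ValueError):
        # Not JSON (or no body at all), so it cannot match.
        return False
```

A key-name check like this is cheap but will produce false positives (for example, any payload with a "words" field), which is why a real implementation would likely combine it with URL patterns or content checks.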
Python API
You can also use the tool programmatically:
from fathom_extractor import HARTranscriptExtractor
# Create extractor
extractor = HARTranscriptExtractor('recording.har')
# Extract all transcripts
transcripts = extractor.extract_all_transcripts()
# Save in different formats
extractor.save_transcripts(transcripts, 'output.json')
extractor.create_clean_transcript(transcripts, 'clean.txt')
extractor.create_markdown_transcript(transcripts, 'beautiful.md')
# Access transcript data
for transcript in transcripts:
    print(f"Source: {transcript['source']}")
    print(f"URL: {transcript['url']}")
    if transcript['source'] == 'fathom':
        data = transcript['transcript_data']
        print(f"Speakers: {len(data.get('speakers', []))}")
        print(f"Q&A Clips: {len(data.get('qa_clips', []))}")
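Once the transcripts are in memory, they are ordinary Python data and can be post-processed freely. The sketch below tallies results per source and counts Fathom speakers. It assumes only the keys used in the example above ('source', 'transcript_data', 'speakers'); the `summarize` helper and the sample records, including the 'generic' source value, are hypothetical.

```python
from collections import Counter


def summarize(transcripts):
    """Count extracted items per source and total Fathom speakers."""
    by_source = Counter(t.get("source", "unknown") for t in transcripts)
    fathom_speakers = sum(
        len(t.get("transcript_data", {}).get("speakers", []))
        for t in transcripts
        if t.get("source") == "fathom"
    )
    return {"by_source": dict(by_source), "fathom_speakers": fathom_speakers}


# Hypothetical records shaped like the items in the example above.
sample = [
    {
        "source": "fathom",
        "url": "https://example.com/call/1",
        "transcript_data": {"speakers": [{"name": "A"}, {"name": "B"}]},
    },
    {"source": "generic", "url": "https://example.com/api/stt"},
]
summary = summarize(sample)
```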
Troubleshooting
No Transcripts Found
If the tool doesn't find any transcripts:
- Check the HAR file: Make sure you captured network traffic while viewing the transcript
- Verify page loading: Ensure the transcript was fully loaded when you captured the HAR
- Try verbose mode: Use the -v flag to see what URLs were analyzed
- Check browser: Some browsers or extensions might block certain requests
Incomplete Transcripts
If transcripts are missing content:
- Scroll through the page: Some transcripts load content dynamically
- Wait longer: Let the page fully load before capturing
- Check network requests: Look for additional API calls in the Network tab
Large HAR Files
HAR files can be large. If you encounter memory issues:
- Clear browser data before recording
- Close other tabs to reduce network noise
- Use incognito/private mode to avoid extension interference
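If a capture is still unwieldy, one option is to shrink it before extraction by dropping entries that cannot hold transcript data, such as images, fonts, and scripts. The sketch below keeps only entries whose response MIME type mentions JSON. That filter is an assumption: a transcript embedded in an HTML response would be discarded, so keep the original file. The function names are made up for this example.

```python
import json


def trim_har(har, keep_mime_substring="json"):
    """Return a copy of the HAR keeping only entries whose response
    MIME type contains keep_mime_substring. The input is not modified."""
    log = har.get("log", {})
    kept = [
        e for e in log.get("entries", [])
        if keep_mime_substring
        in (e.get("response", {}).get("content", {}).get("mimeType") or "")
    ]
    return {**har, "log": {**log, "entries": kept}}


def trim_har_file(src, dst):
    """Read a HAR file, trim it, and write the smaller copy to dst."""
    with open(src, encoding="utf-8-sig") as f:
        har = json.load(f)
    with open(dst, "w", encoding="utf-8") as f:
        json.dump(trim_har(har), f)
```

Note that trimming still loads the full file once, so it reduces the size of later runs rather than the cost of the first read.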
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
License
MIT License - see LICENSE file for details.
Author
Isaac Harrison Gutekunst
- GitHub: @igutekunst
- Email: isaac@gutekunst.com
Changelog
v1.0.0
- Initial release
- Fathom video transcript extraction
- Generic transcript API support
- Multiple output formats
- CLI tool with comprehensive options
File details
Details for the file fathom_extractor-1.0.0.tar.gz.
File metadata
- Download URL: fathom_extractor-1.0.0.tar.gz
- Upload date:
- Size: 16.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3d3ed6fa42064211237769e1b5366b1f038a946dc3d97f097184b453af7e37d9 |
| MD5 | a68244aef4cd2592a2be3223bf48742d |
| BLAKE2b-256 | 59644703988914668add0691df8bf5dedc558c43e00d5592b93dd38b45b50d77 |
File details
Details for the file fathom_extractor-1.0.0-py3-none-any.whl.
File metadata
- Download URL: fathom_extractor-1.0.0-py3-none-any.whl
- Upload date:
- Size: 13.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.10
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9162f7efccf91593bcc94c60b39161124d397c4ee03d11725780d7b0ad643344 |
| MD5 | 3f73cc7ee48df4a2a613e5eac45c76e4 |
| BLAKE2b-256 | c08f547762f000b34f62632e5b1a61db2c616bb921dc94d0b10c06197b3fe407 |