Human-readable Microsoft Teams meeting transcripts.

Teams transcript formatter

This tool makes Microsoft Teams meeting transcripts easier to read and analyse using tools such as NVivo, QualCoder, or (slightly unconventionally) Obsidian via the obsidian-chat-view plugin.

It processes .vtt transcripts downloaded from Microsoft Teams/Stream, merges adjacent blocks from the same speaker, and outputs a clean, formatted text file. Speaker names can optionally be renamed and assigned prefixes, and the output format is customisable via a template.
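
The merging step can be pictured with a short sketch. This is not the tool's actual code: the Block class and merge_adjacent function are hypothetical names, and keeping the start time of the first block in each run is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Block:
    """One transcript block (hypothetical structure, for illustration only)."""
    speaker: str
    speech: str
    timestamp: str  # start time of the block, e.g. "00:00:10"


def merge_adjacent(blocks: list[Block]) -> list[Block]:
    """Join consecutive blocks from the same speaker into a single block."""
    merged: list[Block] = []
    for block in blocks:
        if merged and merged[-1].speaker == block.speaker:
            # Same speaker as the previous block: append the speech and
            # keep the earlier timestamp (an assumption about the tool).
            merged[-1].speech += " " + block.speech
        else:
            # Copy the block so the input list is left unmodified.
            merged.append(Block(block.speaker, block.speech, block.timestamp))
    return merged


blocks = [
    Block("John Smith", "Hello,", "00:00:10"),
    Block("John Smith", "I am the interviewer.", "00:00:11"),
    Block("Jane Doe", "Nice.", "00:00:13"),
]
result = merge_adjacent(blocks)
for block in result:
    print(block.speaker, "|", block.speech, "|", block.timestamp)
```

Here the two John Smith blocks collapse into one, while the Jane Doe block is left as it is.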

Installation

Run with uvx

No installation required; run it as a one-off with uvx:

uvx teams-transcript-formatter transcript.vtt

Install with pip or uv

Install from PyPI:

pip install teams-transcript-formatter
# or
uv tool install teams-transcript-formatter

After installation, teams-transcript-formatter will be available on your PATH:

teams-transcript-formatter transcript.vtt

From source

If you want to make changes to the source code, clone the repository and install it in editable mode:

git clone https://github.com/jmarshrossney/teams-transcript-formatter
cd teams-transcript-formatter
uv sync

Usage

Command-line tool

The teams-transcript-formatter script takes one or more .vtt files and prints the formatted output to stdout. To save the output to .txt files instead (with the naming convention <original_stem>_formatted.txt), use the -o flag to specify an output directory.

# Basic: keep original speaker names, default formatting
teams-transcript-formatter transcript.vtt

# Rename speakers (e.g. for an interview)
teams-transcript-formatter \
    --rename "John Smith=Interviewer" --rename "Jane Doe=Student" \
    --prefix "Interviewer=> " --prefix "Student=< " \
    transcript.vtt

# Custom output format
teams-transcript-formatter \
    --rename "John Smith=JS" --rename "Jane Doe=JD" \
    --template "{speaker}: {speech} [{timestamp}]" \
    transcript.vtt
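
The <original_stem>_formatted.txt naming convention used with -o can be sketched with pathlib. The output_path helper below is a hypothetical name for illustration, not a function exposed by the tool.

```python
from pathlib import Path


def output_path(input_file: str, output_dir: str) -> Path:
    """Build <output_dir>/<original_stem>_formatted.txt for one input file."""
    stem = Path(input_file).stem  # "transcript.vtt" -> "transcript"
    return Path(output_dir) / f"{stem}_formatted.txt"


for name in ["recordings/transcript.vtt", "interview_02.vtt"]:
    print(output_path(name, "out"))
```

Only the file stem is carried over, so under this sketch inputs from different directories that share a file name would map to the same output file.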

Run teams-transcript-formatter -h for full guidance, including shell completion.

Flags

Flag            Description
--rename        Map original speaker names to display names: "OriginalName=DisplayName". Repeat for each speaker.
--prefix        Assign a prefix to each display name: "DisplayName=>". Repeat for each speaker.
--template      Python format string for output. Placeholders: {prefix}, {speaker}, {speech}, {timestamp}.
-o, --output    Directory to save .txt files. If not given, prints to stdout.
--force         Overwrite existing output files instead of refusing.
-q, --quiet     Suppress all non-error output.
--version       Show the version and exit.
-h, --help      Show the help message and exit.

Examples

Say we have a Teams transcript file named transcript.vtt:

$ head -11 transcript.vtt
WEBVTT

91b3f3c3-44c6-4a8b-8c0a-add105d816bd/32-0
00:00:10.087 --> 00:00:13.130
<v John Smith>Hello, I am the interviewer.</v>

91b3f3c3-44c6-4a8b-8c0a-add105d816bd/32-1
00:00:13.130 --> 00:00:16.270
<v Jane Doe>Nice. I am the student being interviewed,
and I have many things to say.</v>
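
To make the structure of this file concrete, here is a minimal sketch of pulling the speaker, speech, and start time out of each cue. It assumes every cue has exactly the shape shown above (a timing line followed by a single <v Name>...</v> span); the CUE_RE pattern and parse_cues function are illustrative names, not the tool's own parser.

```python
import re

# Matches a timing line followed by one voice span. The start time is
# captured without its milliseconds.
CUE_RE = re.compile(
    r"(?P<start>\d{2}:\d{2}:\d{2})\.\d{3} --> \d{2}:\d{2}:\d{2}\.\d{3}\n"
    r"<v (?P<speaker>[^>]+)>(?P<speech>.*?)</v>",
    re.DOTALL,
)


def parse_cues(vtt_text: str) -> list[tuple[str, str, str]]:
    """Return one (speaker, speech, start) tuple per cue."""
    cues = []
    for match in CUE_RE.finditer(vtt_text):
        # Speech may wrap across lines; collapse all whitespace to single spaces.
        speech = " ".join(match["speech"].split())
        cues.append((match["speaker"], speech, match["start"]))
    return cues


sample = """WEBVTT

91b3f3c3-44c6-4a8b-8c0a-add105d816bd/32-0
00:00:10.087 --> 00:00:13.130
<v John Smith>Hello, I am the interviewer.</v>

91b3f3c3-44c6-4a8b-8c0a-add105d816bd/32-1
00:00:13.130 --> 00:00:16.270
<v Jane Doe>Nice. I am the student being interviewed,
and I have many things to say.</v>
"""

for cue in parse_cues(sample):
    print(cue)
```

A regular expression is enough for this fixed layout; files with other cue features would need a fuller WebVTT parser.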

Default format

No flags: original speaker names, default template, output printed to stdout.

$ teams-transcript-formatter transcript.vtt
John Smith | Hello, I am the interviewer. | 00:00:10

Jane Doe | Nice. I am the student being interviewed, and I have many things to say. | 00:00:13

Rename speakers

Map original names to display names with --rename.

$ teams-transcript-formatter \
    --rename "John Smith=Interviewer" --rename "Jane Doe=Student" \
    -o . transcript.vtt
$ head -3 transcript_formatted.txt
Interviewer | Hello, I am the interviewer. | 00:00:10

Student | Nice. I am the student being interviewed, and I have many things to say. | 00:00:13

Add prefixes

Combine --rename with --prefix to visually distinguish speakers. Prefixes are keyed on the display name (after renaming).

$ teams-transcript-formatter \
    --rename "John Smith=Interviewer" --rename "Jane Doe=Student" \
    --prefix "Interviewer=> " --prefix "Student=< " \
    -o . transcript.vtt
$ head -3 transcript_formatted.txt
> Interviewer | Hello, I am the interviewer. | 00:00:10

< Student | Nice. I am the student being interviewed, and I have many things to say. | 00:00:13

Custom output template

Control the output format with --template. Available placeholders: {prefix}, {speaker}, {speech}, {timestamp}.

$ teams-transcript-formatter \
    --rename "John Smith=JS" --rename "Jane Doe=JD" \
    --template "[{timestamp}] {speaker}: {speech}" \
    -o . transcript.vtt
$ head -3 transcript_formatted.txt
[00:00:10] JS: Hello, I am the interviewer.

[00:00:13] JD: Nice. I am the student being interviewed, and I have many things to say.
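
Since the template is described as a Python format string, its behaviour can be tried directly with str.format. The sketch below also shows one possible way to reject unknown placeholders early; check_template and ALLOWED are hypothetical names, and whether the tool validates templates like this is an assumption.

```python
from string import Formatter

ALLOWED = {"prefix", "speaker", "speech", "timestamp"}


def check_template(template: str) -> None:
    """Raise ValueError if the template uses a placeholder outside ALLOWED."""
    # Formatter.parse yields (literal_text, field_name, format_spec, conversion).
    fields = {name for _, name, _, _ in Formatter().parse(template) if name}
    unknown = fields - ALLOWED
    if unknown:
        raise ValueError(f"unknown placeholders: {sorted(unknown)}")


template = "[{timestamp}] {speaker}: {speech}"
check_template(template)

# str.format ignores unused keyword arguments, so a template that omits
# {prefix} can still be given all four values.
line = template.format(
    prefix="> ",
    speaker="JS",
    speech="Hello, I am the interviewer.",
    timestamp="00:00:10",
)
print(line)  # [00:00:10] JS: Hello, I am the interviewer.
```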

Full customisation

All three flags together: rename, prefix, and template.

$ teams-transcript-formatter \
    --rename "John Smith=Interviewer" --rename "Jane Doe=Student" \
    --prefix "Interviewer=> " --prefix "Student=< " \
    --template "{prefix}{speaker}: {speech} [{timestamp}]" \
    -o . transcript.vtt
$ head -3 transcript_formatted.txt
> Interviewer: Hello, I am the interviewer. [00:00:10]

< Student: Nice. I am the student being interviewed, and I have many things to say. [00:00:13]

Selective prefixes

Pass an empty value to --prefix to suppress the prefix for a given speaker.

$ teams-transcript-formatter \
    --rename "John Smith=Interviewer" --rename "Jane Doe=Student" \
    --prefix "Interviewer=> " --prefix "Student=" \
    -o . transcript.vtt
$ head -3 transcript_formatted.txt
> Interviewer | Hello, I am the interviewer. | 00:00:10

Student | Nice. I am the student being interviewed, and I have many things to say. | 00:00:13
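
The way --rename and --prefix values combine in the examples above can be sketched as two dictionary lookups. parse_mapping and label are hypothetical helpers, and splitting on the first "=" is an assumption about how the tool reads "Key=Value" options.

```python
def parse_mapping(pairs: list[str]) -> dict[str, str]:
    """Turn ["Key=Value", ...] into {"Key": "Value", ...}."""
    mapping = {}
    for pair in pairs:
        # Split on the first "=" only, so values such as "> " survive intact
        # and "Student=" yields an empty value.
        key, sep, value = pair.partition("=")
        if not sep:
            raise ValueError(f"expected 'Key=Value', got {pair!r}")
        mapping[key] = value
    return mapping


renames = parse_mapping(["John Smith=Interviewer", "Jane Doe=Student"])
prefixes = parse_mapping(["Interviewer=> ", "Student="])


def label(original_name: str) -> str:
    """Apply the rename, then prepend the prefix for the display name."""
    # Speakers without a rename keep their original name.
    display = renames.get(original_name, original_name)
    # Prefixes are looked up under the display name, not the original one.
    return prefixes.get(display, "") + display


print(label("John Smith"))  # > Interviewer
print(label("Jane Doe"))    # Student
```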

Privacy

Speaker names can be replaced using the --rename flag. All other redactions of sensitive and identifiable information must be performed before running this script.

Tip: the auto-generated transcripts can be edited in place using the Microsoft Stream app.

Remember to delete the original transcripts after running this script!

Roadmap & contributing

There are some fairly simple additions that would make this more generally useful:

  • Handle meetings with >2 participants
  • User can configure how names are handled
  • Configure the output format, e.g. using a template
  • Handle Zoom meetings
  • Output to different file formats (realistically, .docx would probably be the most useful to folks)

Suggestions for improvements are welcome. Contributions even more so! Just open an issue or pull request.

