Audio transcription and speech analysis for the analyser family

Project description

speech-analyser

Transcribes audio and video files and returns speech metrics: word count, speaking rate, filler word detection, silence ratio, and a quality score with natural-language insights. Optionally identifies individual speakers.

Part of the analyser family.

Install

pip install speech-analyser

Requires Python 3.11+. Runs on the CPU by default and switches to a CUDA GPU automatically when one is available.

For speaker diarization, install the diarization extra and set a Hugging Face token:

pip install "speech-analyser[diarization]"
export HF_TOKEN=hf_...

Usage

Python

from speech_analyser import SpeechAnalyser

analyser = SpeechAnalyser()                  # model_size="base" by default
result = analyser.analyse("recording.mp3")   # returns the dict shown under Output

m = result["speech_metrics"]
print(f"Duration:  {result['duration']:.1f}s")
print(f"Words:     {m['word_count']} ({m['speaking_rate_wpm']} wpm, {m['pace_category']})")
print(f"Quality:   {m['quality_score']}/100")
print(result["transcript"])
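
To process several recordings in one run, a single analyser instance can be reused. A minimal sketch: analyse_many is a hypothetical helper, not part of the package, and the idea that reusing one SpeechAnalyser avoids reloading the model for every file is an assumption the documentation does not state. It accepts any object with an analyse(path) method, so it can be exercised without downloading a model.

```python
from collections.abc import Iterable
from typing import Any, Protocol


class Analyser(Protocol):
    """Anything with an analyse(path) -> dict method, e.g. SpeechAnalyser()."""

    def analyse(self, path: str) -> dict[str, Any]: ...


def analyse_many(analyser: Analyser, paths: Iterable[str]) -> dict[str, dict[str, Any]]:
    """Analyse each file with one shared instance, keyed by path.

    A failure on one file is recorded under an "error" key instead of
    stopping the whole batch.
    """
    results: dict[str, dict[str, Any]] = {}
    for path in paths:
        try:
            results[path] = analyser.analyse(path)
        except Exception as exc:  # record the failure and continue with the next file
            results[path] = {"error": str(exc)}
    return results
```

With the package installed, a call might look like analyse_many(SpeechAnalyser(model_size="small"), ["a.mp3", "b.wav"]), using the model_size keyword from the comment in the example.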

CLI

# Human-readable summary
speech-analyser analyse recording.mp3

# Larger model for better accuracy
speech-analyser analyse lecture.wav --model small

# Machine-readable JSON
speech-analyser analyse recording.m4a --json

# Speaker diarization
speech-analyser analyse interview.mp3 --diarize

# Start the HTTP server
speech-analyser serve --port 8001

HTTP API

curl -X POST http://localhost:8001/analyse \
  -F "file=@recording.mp3"
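
The same request can be made from Python with only the standard library. This sketch mirrors the curl call: the file is sent in a multipart form field named file, as with -F. It assumes the endpoint responds with the JSON document described under Output; build_multipart and analyse_over_http are hypothetical names, not part of the package:

```python
import json
import urllib.request
import uuid
from pathlib import Path
from typing import Any


def build_multipart(field: str, filename: str, data: bytes, boundary: str) -> bytes:
    """Encode a single file field as multipart/form-data, as curl -F does."""
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail


def analyse_over_http(path: str, url: str = "http://localhost:8001/analyse") -> dict[str, Any]:
    """POST a file to the analyse endpoint and decode the JSON response."""
    file_path = Path(path)
    boundary = uuid.uuid4().hex  # random boundary, unlikely to occur in the file bytes
    body = build_multipart("file", file_path.name, file_path.read_bytes(), boundary)
    request = urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": f"multipart/form-data; boundary={boundary}"},
        method="POST",
    )
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The whole file is read into memory, which is fine for typical recordings; very large videos would call for a streaming client.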

Supported formats

Audio: .mp3 .wav .m4a .ogg .flac .aac .wma .opus

Video: .mp4 .mov .avi .mkv .webm (the audio track is extracted automatically).

Model sizes

Model	Speed	Accuracy
`tiny`	fastest	lowest
`base`	fast	good (default)
`small`	medium	better
`medium`	slow	very good
`large-v3`	slowest	best

Models download on first use (~75 MB for base, ~1.5 GB for large-v3).

Output

{
  "transcript": "Good morning everyone...",
  "language": "en",
  "duration": 62.4,
  "segments": [{"start": 0.0, "end": 3.2, "text": "Good morning everyone", "speaker": null}],
  "speech_metrics": {
    "word_count": 120,
    "speaking_rate_wpm": 115.4,
    "pace_category": "natural",
    "filler_word_count": 3,
    "filler_word_rate": 0.025,
    "filler_words_found": ["um", "basically"],
    "silence_ratio": 0.18,
    "actual_speaking_time": 51.2,
    "quality_score": 78,
    "quality_factors": {"clarity": 23, "depth": 18, "balance": 18, "pace": 19},
    "quality_ratings": {"clarity": "excellent", "depth": "good", "balance": "good", "pace": "good"},
    "insights": {
      "strengths": ["Very few filler words — speech is clear"],
      "observations": ["Speaking rate is slightly slow — aim for 130–170 wpm"]
    }
  },
  "diarization_available": false,
  "speakers": null,
  "talk_time": null,
  "file_path": "/path/to/recording.mp3",
  "file_size": 2048000
}
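
Several fields in the sample are related arithmetically: 120 words over 62.4 s gives 115.4 wpm, 1 - 51.2/62.4 ≈ 0.18 silence, 3/120 = 0.025 fillers per word, and the four quality factors sum to the quality score of 78. The sketch below recomputes those relationships from a result dict. They are inferred from the sample values only, not from the library's source, so treat them as a consistency check rather than the package's definitions. derived_metrics is a hypothetical name:

```python
from typing import Any


def derived_metrics(result: dict[str, Any]) -> dict[str, float]:
    """Recompute ratios the sample output appears to follow (inferred, not documented)."""
    m = result["speech_metrics"]
    duration = result["duration"]
    if duration <= 0:
        raise ValueError("duration must be positive")
    words = m["word_count"]
    return {
        # words per minute over the whole file, silences included
        "speaking_rate_wpm": round(words / duration * 60, 1),
        # share of the file not covered by actual speech
        "silence_ratio": round(1 - m["actual_speaking_time"] / duration, 2),
        # filler words per word spoken (0.0 for an empty transcript)
        "filler_word_rate": round(m["filler_word_count"] / words, 3) if words else 0.0,
        # overall score as the sum of the four quality factors
        "quality_score": sum(m["quality_factors"].values()),
    }
```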

When diarization is enabled, the speakers field contains each speaker's word count, duration, and percentage, and talk_time.is_balanced flags whether one speaker dominates the conversation.
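
A sketch of how per-speaker percentages and a balance flag could be derived from speaking durations. The input shape (speaker label mapped to seconds) and the 60% dominance threshold are hypothetical; the package does not document how is_balanced is decided. talk_time_summary is an illustrative name:

```python
from typing import Any

# Hypothetical threshold: the package's real rule for is_balanced is not documented
BALANCE_THRESHOLD = 0.6


def talk_time_summary(durations: dict[str, float], threshold: float = BALANCE_THRESHOLD) -> dict[str, Any]:
    """Turn speaker -> seconds into percentages plus a dominance-based balance flag."""
    total = sum(durations.values())
    if total <= 0:
        return {"percentages": {}, "is_balanced": True}
    percentages = {spk: round(100 * sec / total, 1) for spk, sec in durations.items()}
    dominant_share = max(durations.values()) / total
    # "balanced" here means no single speaker exceeds the threshold share of talk time
    return {"percentages": percentages, "is_balanced": dominant_share <= threshold}
```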

The analyser family

Low-level analysis tools. Each accepts files directly and returns structured JSON. Build your own UI or pipeline on top.

Package	Handles
speech-analyser	audio and video files — transcript and speech metrics
video-analyser	video files — frames, scenes, and visual quality
document-analyser	PDF, DOCX, PPTX, TXT — text and readability
code-analyser	source code — style, complexity, and quality metrics
records-analyser	CSV, Excel, SQLite, Parquet, JSON — data profiling
auto-analyser	any file — detects format and routes to the right tool

Licence

MIT

Project details

Release history

Version	Release date
0.3.1 (this version)	May 23, 2026
0.3.0	May 23, 2026
0.2.3	May 8, 2026
0.2.2	May 7, 2026
0.2.1	May 7, 2026
0.2.0	May 7, 2026
0.1.0	May 5, 2026

Download files

Source Distribution

speech_analyser-0.3.1.tar.gz (253.9 kB)

Uploaded May 23, 2026 Source

Built Distribution

speech_analyser-0.3.1-py3-none-any.whl (17.6 kB)

Uploaded May 23, 2026 Python 3

File details

Details for the file speech_analyser-0.3.1.tar.gz.

File metadata

Download URL: speech_analyser-0.3.1.tar.gz
Upload date: May 23, 2026
Size: 253.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for speech_analyser-0.3.1.tar.gz
Algorithm	Hash digest
SHA256	`698dd448d2e287796f5287a578cf99d196bea7005ebac1392fec280eca473ebd`
MD5	`e52c0f72a758d45d6cce7cfaac91e2e9`
BLAKE2b-256	`65eccab9a3043d047d3c3b72ac84c22b842b95d154b48f0a506603593b843c84`

File details

Details for the file speech_analyser-0.3.1-py3-none-any.whl.

File metadata

Download URL: speech_analyser-0.3.1-py3-none-any.whl
Upload date: May 23, 2026
Size: 17.6 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.0

File hashes

Hashes for speech_analyser-0.3.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2cfaa84aba699b99e5fe3ca878c7e62434c1b3525fb6ec2e54d80cf874775832`
MD5	`2a279537fcd3b2a0047641627bfb1673`
BLAKE2b-256	`2c03f3e9f384210474b5f52d06b061f97a79eaccce949299f8d31c0fbb0699b5`
