Audio transcription and speech analysis for the analyser family
speech-analyser
Transcribes audio and video files and returns speech metrics: word count, speaking rate, filler word detection, silence ratio, and a quality score with natural-language insights. Optionally identifies individual speakers.
Part of the analyser family.
Install
pip install speech-analyser
Requires Python 3.11+. Runs on CPU unless a CUDA GPU is available, in which case the GPU is used automatically.
For speaker diarization, install the extra and set a Hugging Face token:
pip install "speech-analyser[diarization]"
export HF_TOKEN=hf_...
Usage
Python
from speech_analyser import SpeechAnalyser
lens = SpeechAnalyser() # model_size="base" by default
result = lens.analyse("recording.mp3")
m = result["speech_metrics"]
print(f"Duration: {result['duration']:.1f}s")
print(f"Words: {m['word_count']} ({m['speaking_rate_wpm']} wpm, {m['pace_category']})")
print(f"Quality: {m['quality_score']}/100")
print(result["transcript"])
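One way to act on the returned dictionary is to gate recordings on a few of its metrics. The sketch below assumes the result has the shape shown under Output; `needs_review` and its thresholds are hypothetical and not part of speech-analyser.

```python
def needs_review(result, min_quality=60, max_filler_rate=0.05):
    """Return the reasons a recording should be reviewed (empty list if none).

    Hypothetical helper: the thresholds are illustrative, not library defaults.
    """
    m = result["speech_metrics"]
    reasons = []
    if m["quality_score"] < min_quality:
        reasons.append(f"quality score {m['quality_score']} below {min_quality}")
    if m["filler_word_rate"] > max_filler_rate:
        reasons.append(f"filler rate {m['filler_word_rate']:.3f} above {max_filler_rate}")
    return reasons


# Stand-in for the dict returned by lens.analyse(...), using values from the sample output.
sample = {"speech_metrics": {"quality_score": 78, "filler_word_rate": 0.025}}
print(needs_review(sample))                  # []
print(needs_review(sample, min_quality=80))  # one reason: quality score below 80
```

In real use, `sample` would be replaced by the result of `lens.analyse(...)`.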
CLI
# Human-readable summary
speech-analyser analyse recording.mp3
# Larger model for better accuracy
speech-analyser analyse lecture.wav --model small
# Machine-readable JSON
speech-analyser analyse recording.m4a --json
# Speaker diarization
speech-analyser analyse interview.mp3 --diarize
# Start the HTTP server
speech-analyser serve --port 8001
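The CLI can also be driven from a script. The following is a minimal sketch that uses only the flags listed above; it assumes those flags can be combined and that `--json` writes nothing but the JSON document to stdout, neither of which is stated here. `build_command` and `analyse_via_cli` are hypothetical helper names.

```python
import json
import subprocess


def build_command(path, model=None, diarize=False):
    """Build the argument list for `speech-analyser analyse <path> --json`."""
    cmd = ["speech-analyser", "analyse", str(path), "--json"]
    if model:
        cmd += ["--model", model]
    if diarize:
        cmd.append("--diarize")
    return cmd


def analyse_via_cli(path, **options):
    """Run the CLI and parse its stdout as JSON.

    Assumption: with --json, stdout holds only the JSON document.
    Raises subprocess.CalledProcessError if the command exits non-zero.
    """
    proc = subprocess.run(
        build_command(path, **options),
        capture_output=True,
        text=True,
        check=True,
    )
    return json.loads(proc.stdout)


print(build_command("lecture.wav", model="small"))
# ['speech-analyser', 'analyse', 'lecture.wav', '--json', '--model', 'small']
```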
HTTP API
curl -X POST http://localhost:8001/analyse \
-F "file=@recording.mp3"
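The same request can be sent from Python with only the standard library. This sketch mirrors the curl call (a multipart form with a single `file` field) and assumes the server replies with a JSON body; `build_multipart` and `analyse_via_http` are hypothetical helpers, not part of the package.

```python
import json
import os
import urllib.request
import uuid


def build_multipart(field, filename, data, content_type="application/octet-stream"):
    """Encode one file as multipart/form-data.

    Returns (body_bytes, value_for_the_Content-Type_header).
    """
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        f"Content-Type: {content_type}\r\n\r\n"
    ).encode()
    tail = f"\r\n--{boundary}--\r\n".encode()
    return head + data + tail, f"multipart/form-data; boundary={boundary}"


def analyse_via_http(path, url="http://localhost:8001/analyse"):
    """POST a file to a running server. Assumes the response body is JSON."""
    with open(path, "rb") as fh:
        body, ctype = build_multipart("file", os.path.basename(path), fh.read())
    req = urllib.request.Request(
        url, data=body, headers={"Content-Type": ctype}, method="POST"
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

With the server started via `speech-analyser serve --port 8001`, `analyse_via_http("recording.mp3")` would send the file to the endpoint shown above.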
Supported formats
Audio: .mp3 .wav .m4a .ogg .flac .aac .wma .opus
Video: .mp4 .mov .avi .mkv .webm — audio track is extracted automatically.
Model sizes
| Model | Speed | Accuracy |
|---|---|---|
| tiny | fastest | lowest |
| base | fast | good (default) |
| small | medium | better |
| medium | slow | very good |
| large-v3 | slowest | best |
Models download on first use (~75 MB for base, ~1.5 GB for large-v3).
Output
{
"transcript": "Good morning everyone...",
"language": "en",
"duration": 62.4,
"segments": [{"start": 0.0, "end": 3.2, "text": "Good morning everyone", "speaker": null}],
"speech_metrics": {
"word_count": 120,
"speaking_rate_wpm": 115.4,
"pace_category": "natural",
"filler_word_count": 3,
"filler_word_rate": 0.025,
"filler_words_found": ["um", "basically"],
"silence_ratio": 0.18,
"actual_speaking_time": 51.2,
"quality_score": 78,
"quality_factors": {"clarity": 23, "depth": 18, "balance": 18, "pace": 19},
"quality_ratings": {"clarity": "excellent", "depth": "good", "balance": "good", "pace": "good"},
"insights": {
"strengths": ["Very few filler words — speech is clear"],
"observations": ["Speaking rate is slightly slow — aim for 130–170 wpm"]
}
},
"diarization_available": false,
"speakers": null,
"talk_time": null,
"file_path": "/path/to/recording.mp3",
"file_size": 2048000
}
When diarization is enabled, speakers contains per-speaker word count, duration, and percentage; talk_time.is_balanced flags whether one speaker dominates.
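Per-speaker talk time can also be approximated directly from `segments`, since each segment carries `start`, `end`, and `speaker`. The sketch below is an independent illustration, not how the library computes `speakers` or `talk_time`; the function names and the speaker labels "A" and "B" are made up for the example.

```python
from collections import defaultdict


def talk_time_by_speaker(segments):
    """Sum segment durations per speaker label; returns {speaker: seconds}."""
    totals = defaultdict(float)
    for seg in segments:
        # "speaker" is null (None) when diarization is off.
        speaker = seg.get("speaker") or "unknown"
        totals[speaker] += seg["end"] - seg["start"]
    return dict(totals)


def talk_time_shares(segments):
    """Return each speaker's fraction of the total segment time."""
    totals = talk_time_by_speaker(segments)
    spoken = sum(totals.values())
    return {s: t / spoken for s, t in totals.items()} if spoken else {}


# Illustrative segments in the documented shape.
segments = [
    {"start": 0.0, "end": 3.0, "text": "Good morning everyone", "speaker": "A"},
    {"start": 3.0, "end": 4.0, "text": "Morning", "speaker": "B"},
    {"start": 4.0, "end": 8.0, "text": "Let's begin", "speaker": "A"},
]
print(talk_time_by_speaker(segments))  # {'A': 7.0, 'B': 1.0}
print(talk_time_shares(segments))      # {'A': 0.875, 'B': 0.125}
```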
The analyser family
Low-level analysis tools. Each accepts files directly and returns structured JSON. Build your own UI or pipeline on top.
| Package | Handles |
|---|---|
| speech-analyser | audio and video files — transcript and speech metrics |
| video-analyser | video files — frames, scenes, and visual quality |
| document-analyser | PDF, DOCX, PPTX, TXT — text and readability |
| code-analyser | source code — style, complexity, and quality metrics |
| records-analyser | CSV, Excel, SQLite, Parquet, JSON — data profiling |
| auto-analyser | any file — detects format and routes to the right tool |
Licence
MIT