Audio transcription and speech analysis for the analyser family

speech-analyser

Transcribes audio and video files and returns speech metrics: word count, speaking rate, filler word detection, silence ratio, and a quality score with natural-language insights. Optionally identifies individual speakers.

Part of the analyser family.

Install

pip install speech-analyser

Requires Python 3.11+. Runs on the CPU by default and switches to a CUDA GPU automatically when one is available.

For speaker diarization, install the extra and set a Hugging Face token:

pip install "speech-analyser[diarization]"
export HF_TOKEN=hf_...

Usage

Python

from speech_analyser import SpeechAnalyser

lens = SpeechAnalyser()             # model_size="base" by default
result = lens.analyse("recording.mp3")

m = result["speech_metrics"]
print(f"Duration:  {result['duration']:.1f}s")
print(f"Words:     {m['word_count']} ({m['speaking_rate_wpm']} wpm, {m['pace_category']})")
print(f"Quality:   {m['quality_score']}/100")
print(result["transcript"])

CLI

# Human-readable summary
speech-analyser analyse recording.mp3

# Larger model for better accuracy
speech-analyser analyse lecture.wav --model small

# Machine-readable JSON
speech-analyser analyse recording.m4a --json

# Speaker diarization
speech-analyser analyse interview.mp3 --diarize

# Start the HTTP server
speech-analyser serve --port 8001
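
For scripting, the --json flag can also be driven from Python. The following is a minimal sketch, not part of the package: build_command and analyse_via_cli are hypothetical helper names, and the code assumes that the CLI prints the JSON document to standard output, exits non-zero on failure, and accepts --json together with --model and --diarize.

```python
import json
import subprocess


def build_command(path, model=None, diarize=False):
    """Assemble the argv list for `speech-analyser analyse <path> --json`."""
    cmd = ["speech-analyser", "analyse", str(path), "--json"]
    if model is not None:
        cmd += ["--model", model]
    if diarize:
        cmd.append("--diarize")
    return cmd


def analyse_via_cli(path, model=None, diarize=False):
    """Run the CLI and parse its output (assumes JSON is written to stdout)."""
    completed = subprocess.run(
        build_command(path, model, diarize),
        capture_output=True,
        text=True,
        check=True,  # raises CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Calling analyse_via_cli("recording.m4a") would then return a dictionary like the one shown under Output, provided those assumptions hold.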

HTTP API

curl -X POST http://localhost:8001/analyse \
  -F "file=@recording.mp3"
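
The same request can be sent from Python with the standard library only. This is a sketch under assumptions: build_multipart and analyse_remote are hypothetical helpers, the form field name "file" and the URL are taken from the curl call above, and the server is assumed to reply with the JSON document shown under Output.

```python
import json
import urllib.request
import uuid
from pathlib import Path


def build_multipart(field, filename, payload):
    """Encode one file as a multipart/form-data body; returns (body, content_type)."""
    boundary = uuid.uuid4().hex
    head = (
        f"--{boundary}\r\n"
        f'Content-Disposition: form-data; name="{field}"; filename="{filename}"\r\n'
        "Content-Type: application/octet-stream\r\n\r\n"
    ).encode("utf-8")
    tail = f"\r\n--{boundary}--\r\n".encode("utf-8")
    return head + payload + tail, f"multipart/form-data; boundary={boundary}"


def analyse_remote(path, url="http://localhost:8001/analyse"):
    """POST a file to a running `speech-analyser serve` instance."""
    path = Path(path)
    body, content_type = build_multipart("file", path.name, path.read_bytes())
    request = urllib.request.Request(
        url, data=body, headers={"Content-Type": content_type}, method="POST"
    )
    with urllib.request.urlopen(request) as response:
        # Assumption: the response body is the JSON result document.
        return json.load(response)
```

The whole file is read into memory before sending, which is fine for short recordings but may not suit very large video files.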

Supported formats

Audio: .mp3 .wav .m4a .ogg .flac .aac .wma .opus

Video: .mp4 .mov .avi .mkv .webm — audio track is extracted automatically.
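
When processing a folder in bulk, it can help to filter files by extension before calling the analyser. The sketch below copies the extension lists from this section; media_kind and supported_files are hypothetical helpers, and the case-insensitive match is an assumption rather than documented behaviour of the package.

```python
from pathlib import Path

# Extensions as listed under "Supported formats".
AUDIO_EXTENSIONS = {".mp3", ".wav", ".m4a", ".ogg", ".flac", ".aac", ".wma", ".opus"}
VIDEO_EXTENSIONS = {".mp4", ".mov", ".avi", ".mkv", ".webm"}


def media_kind(path):
    """Return "audio", "video", or None based on the file extension alone."""
    suffix = Path(path).suffix.lower()
    if suffix in AUDIO_EXTENSIONS:
        return "audio"
    if suffix in VIDEO_EXTENSIONS:
        return "video"
    return None


def supported_files(directory):
    """List files in a directory (non-recursive) that have a supported extension."""
    return sorted(
        p for p in Path(directory).iterdir() if p.is_file() and media_kind(p)
    )
```

The check looks only at the file name, so a mislabelled file would still pass it.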

Model sizes

Model     Speed    Accuracy
tiny      fastest  lowest
base      fast     good (default)
small     medium   better
medium    slow     very good
large-v3  slowest  best

Models download on first use (~75 MB for base, ~1.5 GB for large-v3).

Output

{
  "transcript": "Good morning everyone...",
  "language": "en",
  "duration": 62.4,
  "segments": [{"start": 0.0, "end": 3.2, "text": "Good morning everyone", "speaker": null}],
  "speech_metrics": {
    "word_count": 120,
    "speaking_rate_wpm": 115.4,
    "pace_category": "natural",
    "filler_word_count": 3,
    "filler_word_rate": 0.025,
    "filler_words_found": ["um", "basically"],
    "silence_ratio": 0.18,
    "actual_speaking_time": 51.2,
    "quality_score": 78,
    "quality_factors": {"clarity": 23, "depth": 18, "balance": 18, "pace": 19},
    "quality_ratings": {"clarity": "excellent", "depth": "good", "balance": "good", "pace": "good"},
    "insights": {
      "strengths": ["Very few filler words — speech is clear"],
      "observations": ["Speaking rate is slightly slow — aim for 130–170 wpm"]
    }
  },
  "diarization_available": false,
  "speakers": null,
  "talk_time": null,
  "file_path": "/path/to/recording.mp3",
  "file_size": 2048000
}

When diarization is enabled, speakers contains per-speaker word count, duration, and percentage; talk_time.is_balanced flags whether one speaker dominates.
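
The derived fields in the sample output can be cross-checked against the raw ones. The formulas below are inferred from the sample numbers and are not stated by the package, so treat them as assumptions; recompute_metrics is a hypothetical helper.

```python
def recompute_metrics(result):
    """Recompute derived fields from the raw ones (formulas inferred from the sample)."""
    m = result["speech_metrics"]
    duration = result["duration"]
    return {
        # words per minute over the full duration, silences included
        "speaking_rate_wpm": round(m["word_count"] / duration * 60, 1),
        # filler words as a fraction of all words
        "filler_word_rate": round(m["filler_word_count"] / m["word_count"], 3),
        # share of the recording with no speech
        "silence_ratio": round(1 - m["actual_speaking_time"] / duration, 2),
        # the four quality factors appear to sum to the overall score
        "quality_score": sum(m["quality_factors"].values()),
    }


sample = {
    "duration": 62.4,
    "speech_metrics": {
        "word_count": 120,
        "filler_word_count": 3,
        "actual_speaking_time": 51.2,
        "quality_factors": {"clarity": 23, "depth": 18, "balance": 18, "pace": 19},
    },
}

print(recompute_metrics(sample))
# {'speaking_rate_wpm': 115.4, 'filler_word_rate': 0.025, 'silence_ratio': 0.18, 'quality_score': 78}
```

The sketch does not guard against a zero duration or an empty transcript; a real pipeline would need to handle both.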

The analyser family

Low-level analysis tools. Each accepts files directly and returns structured JSON. Build your own UI or pipeline on top.

Package            Handles
speech-analyser    audio and video files: transcript and speech metrics
video-analyser     video files: frames, scenes, and visual quality
document-analyser  PDF, DOCX, PPTX, TXT: text and readability
code-analyser      source code: style, complexity, and quality metrics
records-analyser   CSV, Excel, SQLite, Parquet, JSON: data profiling
auto-analyser      any file: detects format and routes to the right tool

Licence

MIT
