Advanced FLAC authenticity analyzer - Detects MP3-to-FLAC transcodes with high precision

🎵 FLAC Detective

Advanced FLAC Authenticity Analyzer for Detecting MP3-to-FLAC Transcodes

FLAC Detective is a professional-grade command-line tool that analyzes FLAC audio files to detect MP3-to-FLAC transcodes with high precision. Using advanced spectral analysis and an 11-rule scoring system, it helps you maintain an authentic lossless music collection.


✨ Key Features

  • 🎯 High Precision Detection: 11-rule scoring system with intelligent protection mechanisms
  • 📊 4-Level Verdict System: Clear confidence ratings from AUTHENTIC to FAKE_CERTAIN
  • ⚡ Performance Optimized: 80% faster than baseline through smart caching and parallel processing
  • 🔍 Advanced Analysis: Spectral analysis, compression artifact detection, and multi-segment validation
  • 🛡️ Protection Layers: Prevents false positives for vinyl rips, cassette transfers, and high-quality MP3s
  • 📝 Flexible Output: Console reports with Rich formatting, JSON export, and detailed logging
  • 🔧 Robust Error Handling: Automatic retries, partial file reading, and comprehensive diagnostic tracking
  • 🔨 Automatic Repair: Corrupted FLAC files are automatically repaired with full metadata preservation

🚀 Quick Start

Installation

Option 1: Install via pip (Recommended)

pip install flac-detective

Option 2: Run with Docker

# Pull from GitHub Container Registry
docker pull ghcr.io/guillainm/flac-detective:latest

# Analyze files
docker run --rm -v /path/to/audio:/data ghcr.io/guillainm/flac-detective:latest /data

📦 See Docker Guide for complete Docker usage documentation.

Basic Usage

Command Line

# Analyze current directory
flac-detective .

# Analyze specific directory
flac-detective /path/to/music

# Generate JSON report
flac-detective /path/to/music --format json

# Verbose output with detailed analysis
flac-detective /path/to/music --verbose

Docker

# Analyze a directory
docker run --rm -v /path/to/audio:/data ghcr.io/guillainm/flac-detective:latest /data

# With repair enabled
docker run --rm -v /path/to/audio:/data ghcr.io/guillainm/flac-detective:latest /data --repair

# Generate JSON report
docker run --rm -v /path/to/audio:/data ghcr.io/guillainm/flac-detective:latest /data --format json > report.json

📖 How It Works

Detection Rules

FLAC Detective uses 11 independent rules with additive scoring (0-150 points):

Rule      Description                             Points
Rule 1    MP3 Spectral Signature (CBR patterns)   +50
Rule 2    Cutoff Frequency Analysis               +50
Rule 3    Bitrate Inflation Detection             +50
Rule 4    Suspicious 24-bit Detection             +30
Rule 5    High Variance Protection (VBR)          -40
Rule 6    High Quality Protection                 -30
Rule 7    Vinyl & Silence Analysis                -100
Rule 8    Nyquist Exception                       -50
Rule 9    Compression Artifacts                   +30
Rule 10   Multi-Segment Consistency               Variable
Rule 11   Cassette Detection                      -60
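
As a rough illustration of additive scoring, the sketch below sums the point values from the table and clamps the total to the stated 0-150 range. The dictionary keys, the total_score function, and the clamping step are assumptions made for this example, not FLAC Detective's actual API. It also assumes every triggered rule contributes exactly the value listed, and it takes Rule 10's variable contribution as a plain argument.

```python
# Hypothetical sketch: the names and the clamping behaviour are assumptions;
# only the point values come from the rule table.
RULE_POINTS = {
    "rule_1_mp3_signature": 50,
    "rule_2_cutoff_frequency": 50,
    "rule_3_bitrate_inflation": 50,
    "rule_4_suspicious_24bit": 30,
    "rule_5_high_variance": -40,
    "rule_6_high_quality": -30,
    "rule_7_vinyl_silence": -100,
    "rule_8_nyquist_exception": -50,
    "rule_9_compression_artifacts": 30,
    "rule_11_cassette": -60,
}


def total_score(triggered, rule_10_points=0, low=0, high=150):
    """Sum the points of the triggered rules and clamp to [low, high]."""
    raw = sum(RULE_POINTS[name] for name in triggered) + rule_10_points
    return max(low, min(high, raw))


# Two positive rules fire, then a protection rule pulls the total down.
print(total_score(["rule_1_mp3_signature", "rule_2_cutoff_frequency"]))  # 100
print(total_score(["rule_1_mp3_signature", "rule_2_cutoff_frequency",
                   "rule_5_high_variance"]))  # 60
```

Negative values act as protections: a strongly negative rule such as Rule 7 can cancel several positive indicators at once.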

Verdict System

Based on the total score, FLAC Detective assigns one of four verdicts:

Score ≤ 30   → ✅ AUTHENTIC      (High confidence - genuine lossless)
Score 31-60  → ⚡ WARNING        (Manual review recommended)
Score 61-85  → ⚠️  SUSPICIOUS    (Likely transcode)
Score ≥ 86   → ❌ FAKE_CERTAIN   (Definite transcode)
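
The score bands translate directly into a lookup. The function name verdict_for is made up for this sketch; only the thresholds and verdict names come from the list above.

```python
def verdict_for(score):
    """Map a total score to one of the four verdict names."""
    if score <= 30:
        return "AUTHENTIC"
    if score <= 60:
        return "WARNING"
    if score <= 85:
        return "SUSPICIOUS"
    return "FAKE_CERTAIN"


# Boundary values on each side of the thresholds.
for score in (30, 31, 60, 61, 85, 86):
    print(score, verdict_for(score))
```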

Protection Mechanisms

The tool implements a multi-layer protection system to prevent false positives:

  1. Absolute Protection (Rule 8): Protects files whose cutoff lies near the Nyquist frequency
  2. MP3 320k Protection (Rule 1): Exception for high-quality 320 kbps MP3s
  3. Analog Source Protection (Rules 7, 11): Detects vinyl rips and cassette transfers
  4. Dynamic Protection (Rule 10): Validates consistency across file segments
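
For context on Rule 8: the Nyquist frequency is half the sample rate, so a 44.1 kHz file can carry content up to 22.05 kHz. The sketch below shows one way to express "cutoff near Nyquist". The function names and the 95% ratio are illustrative assumptions, not the thresholds FLAC Detective uses.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency representable at the given sample rate."""
    return sample_rate_hz / 2


def cutoff_near_nyquist(cutoff_hz, sample_rate_hz, ratio=0.95):
    """True when the cutoff sits within `ratio` of the Nyquist frequency.

    The 0.95 default is an arbitrary value chosen for this example.
    """
    return cutoff_hz >= ratio * nyquist_hz(sample_rate_hz)


print(nyquist_hz(44100))                   # 22050.0
print(cutoff_near_nyquist(21500, 44100))   # True
print(cutoff_near_nyquist(16000, 44100))   # False
```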

🆕 What's New in v0.8.0

Automatic FLAC Repair with Metadata Preservation

  • Smart Corruption Detection: Automatically identifies corrupted FLAC files during analysis
  • Decode-Through-Errors: Recovers maximum audio data from corrupted files using flac --decode-through-errors
  • Complete Metadata Preservation: Extracts and restores all tags (TITLE, ARTIST, ALBUM, etc.) and album art
  • Automatic Source Replacement: Replaces corrupted files with repaired versions (creates .corrupted.bak backups)
  • Integrity Verification: All repaired files validated with flac --test before replacement
  • 6-Step Repair Process:
    1. Extract metadata (tags + pictures)
    2. Decode with error recovery
    3. Re-encode to clean FLAC
    4. Restore all metadata
    5. Verify integrity
    6. Replace the source file with the repaired version, keeping a .corrupted.bak backup
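
The six steps could be scripted roughly as follows. This is a sketch under assumptions, not FLAC Detective's implementation: the function names, the scratch file names, and the exact backup naming are invented for illustration. The flac and metaflac options used (--decode-through-errors, --test, --export-tags-to, --import-tags-from, --export-picture-to, --import-picture-from) are standard options of the reference FLAC tools. The first function only builds command lines, so it can be inspected without flac installed.

```python
import shutil
from pathlib import Path
from typing import List


def build_repair_commands(src: Path, workdir: Path) -> List[List[str]]:
    """Return the flac/metaflac command lines for repair steps 1-5."""
    if workdir.resolve() == src.parent.resolve():
        raise ValueError("workdir must differ from the source directory")
    tags = workdir / "tags.txt"          # hypothetical scratch files
    picture = workdir / "cover.bin"
    wav = workdir / (src.stem + ".wav")
    clean = workdir / src.name
    return [
        # 1. Extract metadata (tags + one embedded picture)
        ["metaflac", f"--export-tags-to={tags}", str(src)],
        ["metaflac", f"--export-picture-to={picture}", str(src)],
        # 2. Decode with error recovery
        ["flac", "--decode", "--decode-through-errors", "--force",
         "-o", str(wav), str(src)],
        # 3. Re-encode to a clean FLAC
        ["flac", "--force", "-o", str(clean), str(wav)],
        # 4. Restore metadata
        ["metaflac", f"--import-tags-from={tags}", str(clean)],
        ["metaflac", f"--import-picture-from={picture}", str(clean)],
        # 5. Verify integrity
        ["flac", "--test", str(clean)],
    ]


def replace_with_backup(src: Path, clean: Path) -> Path:
    """Step 6: keep the original as a backup, then swap in the repaired file."""
    backup = src.with_name(src.name + ".corrupted.bak")  # assumed naming
    shutil.move(str(src), str(backup))
    shutil.move(str(clean), str(src))
    return backup
```

Each command would be run with subprocess.run(cmd, check=True), stopping at the first failure so that the source is only replaced after flac --test succeeds.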

Enhanced Diagnostics

  • Detailed repair logging with step-by-step progress
  • Diagnostic tracking for all repair operations
  • Clear success/failure indicators
  • Reduces false "CORRUPTED" verdicts

Energy-Based Cutoff Detection

  • Critical Fix: Bass-heavy music no longer misidentified as MP3
  • Added 15 kHz minimum threshold to distinguish bass from MP3 artifacts
  • Impact: 77% reduction in false positives

Quality Improvements

  • False positives: 198 → 46 (-77%)
  • Authentic detection: 59 → 244 (+314%)
  • Overall quality score: 20.2% → 83.6%
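
The first two percentages are relative changes and can be checked with a one-line formula. The helper name percent_change is made up for this example. The overall quality score rises by 63.4 percentage points; how that score is computed is not stated here, so it is not reproduced.

```python
def percent_change(before, after):
    """Relative change from `before` to `after`, in percent."""
    return (after - before) / before * 100


print(f"{percent_change(198, 46):+.0f}%")   # -77%
print(f"{percent_change(59, 244):+.0f}%")   # +314%
```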

💻 Usage Examples

Command Line

# Basic analysis
flac-detective /path/to/music

# Save report to file
flac-detective /path/to/music --output report.txt

# JSON output for automation
flac-detective /path/to/music --format json > results.json

# Verbose mode with detailed rule execution
flac-detective /path/to/music --verbose

Python API

from flac_detective import FLACAnalyzer
from pathlib import Path

# Create analyzer
analyzer = FLACAnalyzer(sample_duration=30.0)

# Analyze a file
result = analyzer.analyze_file(Path('song.flac'))

print(f"Verdict: {result['verdict']}")
print(f"Score: {result['score']}/100")
print(f"Reason: {result['reason']}")

📚 Full API documentation available at flac-detective.readthedocs.io
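
When analyzing many files through the Python API, a small tally of verdicts is often enough. The sketch below assumes each result is a dict with a 'verdict' key, as in the example above; the summarize_verdicts helper is not part of FLAC Detective.

```python
from collections import Counter


def summarize_verdicts(results):
    """Count how many results fall under each verdict."""
    return Counter(result["verdict"] for result in results)


# Stand-in results shaped like the dicts printed in the API example.
sample = [
    {"verdict": "AUTHENTIC", "score": 10, "reason": "full spectrum"},
    {"verdict": "FAKE_CERTAIN", "score": 120, "reason": "16 kHz cutoff"},
    {"verdict": "AUTHENTIC", "score": 0, "reason": "full spectrum"},
]
for verdict, count in summarize_verdicts(sample).most_common():
    print(f"{verdict}: {count}")

# With the real analyzer, the list could be built like this:
# analyzer = FLACAnalyzer(sample_duration=30.0)
# results = [analyzer.analyze_file(p) for p in Path("music").rglob("*.flac")]
```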


📦 Requirements

Python Dependencies

  • Python 3.8 or higher (3.8 to 3.12 are supported and tested)
  • numpy >= 1.20.0
  • scipy >= 1.7.0
  • mutagen >= 1.45.0
  • soundfile >= 0.10.0
  • rich >= 13.0.0

Optional System Dependencies

The flac command-line tool is recommended for advanced features such as automatic repair, which relies on flac --decode-through-errors and flac --test:

Linux (Debian/Ubuntu):

sudo apt-get install flac

macOS:

brew install flac

Windows: Download FLAC from Xiph.org


🏗️ Development

Installation from Source

# Clone the repository
git clone https://github.com/GuillainM/FLAC_Detective.git
cd FLAC_Detective

# Create virtual environment
python -m venv venv

# Activate virtual environment
# Linux/macOS:
source venv/bin/activate
# Windows:
venv\Scripts\activate

# Install in development mode
pip install -e ".[dev]"

Running Tests

# Run all tests
pytest

# Run with coverage report
pytest --cov=flac_detective --cov-report=html

# Run specific test file
pytest tests/test_new_scoring_rules.py -v

Version Management & Releases

FLAC Detective uses Commitizen for automated changelog generation and version management.

# Install pre-commit hooks (includes commit message validation)
pre-commit install --hook-type commit-msg

# Create a conventional commit interactively
cz commit

# Bump version and update CHANGELOG automatically
cz bump --changelog

# Or use the helper script
python scripts/bump_version.py --dry-run  # Preview changes
python scripts/bump_version.py --push     # Bump and push to trigger release

All commits must follow the Conventional Commits format:

  • feat: - New features (bumps MINOR version)
  • fix: - Bug fixes (bumps PATCH version)
  • docs: - Documentation changes
  • refactor: - Code refactoring
  • perf: - Performance improvements

See docs/ci-cd/CHANGELOG_AUTOMATION.md for detailed documentation.
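
To make the commit-type rules concrete, the sketch below reads the type prefix from a commit message and looks up the bump listed above. It is an illustration only: bump_for and the regular expression are assumptions, not Commitizen's code. Types for which the list states no bump return None here, and breaking-change markers are not handled; the project's actual Commitizen configuration may treat both differently.

```python
import re

# Only the two bumps stated in the list above.
BUMP_BY_TYPE = {"feat": "MINOR", "fix": "PATCH"}

# "type: subject" or "type(scope): subject"
HEADER = re.compile(r"^(?P<type>[a-z]+)(?:\([^)]+\))?: .+")


def bump_for(message):
    """Return 'MINOR', 'PATCH', or None for a commit message."""
    lines = message.splitlines()
    match = HEADER.match(lines[0]) if lines else None
    if match is None:
        return None  # not in "type: subject" form
    return BUMP_BY_TYPE.get(match.group("type"))


print(bump_for("feat(cli): add --repair flag"))   # MINOR
print(bump_for("fix: handle empty files"))        # PATCH
print(bump_for("docs: update README"))            # None
```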

Project Structure

src/flac_detective/
├── analysis/
│   ├── new_scoring/          # 11-rule scoring system
│   │   ├── rules/            # Individual rule implementations
│   │   ├── calculator.py     # Score orchestration
│   │   └── verdict.py        # Score interpretation
│   ├── spectrum.py           # Spectral analysis
│   └── audio_cache.py        # Optimized file reading
├── reporting/                # Report generation
└── main.py                   # CLI entry point

📚 Documentation

📖 Official Documentation

Read the full documentation on Read the Docs

The complete documentation includes:

  • User Guide: Getting started, usage examples, troubleshooting
  • API Reference: Complete Python API documentation with examples
  • Technical Documentation: Architecture, algorithms, scoring rules
  • Development Guide: Contributing, testing, code quality

📄 Additional Resources

Complete documentation is available in the docs/ directory.


🎯 Use Cases

✅ Ideal For

  • Library Maintenance: Clean your music collection of fake lossless files
  • Quality Verification: Validate FLAC authenticity before archiving
  • Batch Processing: Analyze large music libraries efficiently
  • Format Validation: Ensure genuine lossless quality for critical listening

⚠️ Limitations

  • Only analyzes FLAC files (other lossless formats not supported)
  • Designed for batch analysis, not real-time processing
  • Detects transcodes, not subjective audio quality
  • May require manual review for edge cases (WARNING verdicts)

🤝 Contributing

Contributions are welcome! Please read our CONTRIBUTING.md for detailed guidelines and CODE_OF_CONDUCT.md for community standards.

📋 Issue Templates

We provide templates for different types of contributions:

  1. 🐛 Bug Report: Report bugs or unexpected behavior
  2. ✨ Feature Request: Suggest new features or enhancements
  3. ⚡ Performance Issue: Report slow performance or resource issues
  4. 📝 Documentation Issue: Report documentation problems
  5. ❓ Question: Ask questions about usage

See the Issue Templates Guide for detailed information.

How to Contribute

  1. Report Issues: Use the appropriate issue template
  2. Suggest Features: Submit a feature request
  3. Start Discussions: Join GitHub Discussions
  4. Submit PRs: Read CONTRIBUTING.md first, then fork the repo, create a feature branch, and submit a pull request
  5. Improve Docs: Documentation improvements are always appreciated

Community Guidelines

Please follow our Code of Conduct to maintain a welcoming and inclusive environment for all contributors.

Development Workflow

# Fork and clone the repository
git clone https://github.com/YOUR_USERNAME/FLAC_Detective.git
cd FLAC_Detective

# Install development dependencies
pip install -e ".[dev]"

# Set up pre-commit hooks for code quality
python scripts/setup_precommit.py
# Or manually: pre-commit install

# Create feature branch
git checkout -b feature/amazing-feature

# Make changes and run tests
pytest tests/unit/ -v                    # Unit tests
pytest tests/integration/ -v             # Integration tests
pytest --cov=flac_detective              # With coverage

# Code quality checks (runs automatically on commit via pre-commit hooks)
pre-commit run --all-files               # Run all checks manually
black src tests                          # Format code
isort src tests                          # Sort imports
flake8 src tests                         # Lint code
mypy src                                 # Type check

# Commit and push (pre-commit hooks run automatically)
git commit -m "feat: add amazing feature"
git push origin feature/amazing-feature

# Open Pull Request on GitHub

Python Version Requirements:

  • Supported: Python 3.8 - 3.12
  • Testing: Use Python 3.8-3.12 for running tests (scipy/numpy compatibility)

Running Tests:

# Run all unit tests
pytest tests/unit/ -v

# Run integration tests
pytest tests/integration/ -v

# Run with coverage report
pytest --cov=flac_detective --cov-report=html

# See tests/TESTING_STATUS.md for detailed testing guide

🔒 Security

Security is a priority for FLAC Detective. We use multiple automated tools to ensure code and dependency security.

Security Features

  • 🛡️ Dependabot: Automated dependency updates for security patches
  • 🔍 CodeQL: Static code analysis for vulnerability detection
  • 🚨 Bandit: Python security linter
  • 📦 Safety & Pip-audit: Dependency vulnerability scanners
  • 📋 Security Policy: Responsible disclosure process

Reporting Vulnerabilities

Please do NOT report security vulnerabilities through public GitHub issues.

Email security issues to: guillain@poulpe.us

See SECURITY.md for:

  • Supported versions
  • Reporting guidelines
  • Security best practices
  • Vulnerability disclosure process

Security Documentation


📝 License

This project is licensed under the MIT License - see the LICENSE file for details.


🙏 Acknowledgments

  • Audio analysis community for MP3 compression research
  • Contributors to NumPy, SciPy, and Soundfile libraries
  • Beta testers and community feedback

📞 Support


FLAC Detective v0.9.0 - Maintaining authentic lossless audio collections

File details

Details for the file flac_detective-0.9.0.tar.gz.

File metadata

  • Download URL: flac_detective-0.9.0.tar.gz
  • Upload date:
  • Size: 210.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for flac_detective-0.9.0.tar.gz

Algorithm    Hash digest
SHA256       029ddae254576eef0d548737c0f02a5b1aa3d16d108937164591d98e48e84bca
MD5          3798784cacc276579f8f57f949108e30
BLAKE2b-256  9d22f3a235626b67b9bd7fb47635a1658c9911b7cea6fa9b076f38821a937d45

File details

Details for the file flac_detective-0.9.0-py3-none-any.whl.

File metadata

  • Download URL: flac_detective-0.9.0-py3-none-any.whl
  • Upload date:
  • Size: 105.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.10.19

File hashes

Hashes for flac_detective-0.9.0-py3-none-any.whl

Algorithm    Hash digest
SHA256       fdbc1bbab4b3b4b64f6a3181a7b7d336fe69051487ad847c372bb9f8dcc8efed
MD5          42cf91a48c80337659363dd728cfea21
BLAKE2b-256  8fee93059e0fadf82a3d4660b24fff68a44a6c3deec246f4c9afdd40feba2b1d
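
A downloaded file can be checked against the published SHA256 digests with Python's standard hashlib module. The helper names below are made up for this sketch; the digests are copied from the tables above.

```python
import hashlib
from pathlib import Path

# SHA256 digests as listed for the 0.9.0 release files.
PUBLISHED_SHA256 = {
    "flac_detective-0.9.0.tar.gz":
        "029ddae254576eef0d548737c0f02a5b1aa3d16d108937164591d98e48e84bca",
    "flac_detective-0.9.0-py3-none-any.whl":
        "fdbc1bbab4b3b4b64f6a3181a7b7d336fe69051487ad847c372bb9f8dcc8efed",
}


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large downloads need little memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path):
    """Compare a file's SHA256 with the digest listed for its file name."""
    expected = PUBLISHED_SHA256[Path(path).name]  # KeyError for unknown names
    return sha256_of(path) == expected
```

For example, matches_published("flac_detective-0.9.0.tar.gz") should return True for an intact download and False for a truncated or altered one.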
