Advanced FLAC authenticity analyzer - Detects MP3-to-FLAC transcodes with high precision
FLAC Detective
Advanced FLAC Authenticity Analyzer for Detecting MP3-to-FLAC Transcodes
FLAC Detective is a professional-grade command-line tool that analyzes FLAC audio files to detect MP3-to-FLAC transcodes with high precision. Using advanced spectral analysis and an 11-rule scoring system, it helps you maintain an authentic lossless music collection.
What's new in v0.14.0: Stereo CNN (May 2026)
v0.13 gated around Rule 12's blind spot on band-limited music; v0.14 fixes it. The insight: the model was listening in mono, but MP3 joint-stereo coding leaves its clearest fingerprint in the side channel (L−R), exactly where a band-limited transcode is otherwise invisible. A controlled probe nailed it: on band-limited material a mid-only CNN is a coin flip (AUC 0.51), while the same CNN on mid+side hits 0.72, even at 320 kbps. So we retrained EfficientNet-B0 with a 2-channel (mid+side) input.
| Held-out test | v3 (mono) | v4 (stereo) |
|---|---|---|
| Balanced accuracy | 0.834 | 0.905 |
| Recall (transcoded) | 86.9 % | 94.1 % |
| Specificity (recall on authentic) | 80.0 % | 86.9 % |
On the real library of 11 234 authentic FLACs, false positives drop in every rolloff regime; shipped as v4 + the v0.13 reliability gate, real-world specificity reaches 95.1 % (from v3's 80.2 %). The reversal of the v0.13 "fundamental limit" conclusion, along with the bit-depth confound and audit-offset bug caught along the way, is written up in ml/README.md.
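The mid/side decomposition behind the 2-channel input can be sketched in a few lines. This is a minimal illustration, not the project's feature extractor: the function name `to_mid_side` is hypothetical, and the common 0.5 scaling convention is assumed (some tools omit the scaling).

```python
def to_mid_side(left, right):
    """Convert left/right sample sequences to (mid, side).

    Assumes the common convention mid = (L + R) / 2, side = (L - R) / 2.
    """
    if len(left) != len(right):
        raise ValueError("left and right channels must have the same length")
    mid = [(l + r) / 2 for l, r in zip(left, right)]
    side = [(l - r) / 2 for l, r in zip(left, right)]
    return mid, side


# A perfectly mono signal (L == R) has an all-zero side channel,
# so anything left in `side` comes from stereo differences.
mid, side = to_mid_side([0.5, -0.25, 0.0], [0.5, -0.25, 0.0])
```

A real pipeline would do this on arrays rather than lists, but the arithmetic is the same.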
What's new in v0.13.0: Reliability Gate (May 2026)
No new model, just a small, empirically grounded gate that fixes v3's one weak spot: false alarms on band-limited music (baroque, historical, acoustic). An audit of all 11 234 certified-authentic FLACs showed the model's false-positive rate ran from 5 % on full-range material to 57 % on material whose rolloff falls below 4 kHz. When a recording already rolls off that early, an MP3 transcode removes nothing detectable, so authentic and fake are physically near-identical.
We exhausted the alternatives (threshold tuning, compression ratio, stereo and in-band texture, MP3 frame-rate modulation; none separate them) and concluded it's a near-fundamental limit. So Rule 12 now abstains where its precision is a coin flip (rolloff < 7 kHz) and defers to the heuristic rules:
| Metric | v0.12 (v3) | v0.13 |
|---|---|---|
| Specificity (recall on authentic) | 80.2 % | 92.8 % |
The only detection given up is in a regime where Rule 12 was guessing anyway, and where a transcode is the least harmful (a 320 kbps MP3 of a 5 kHz-bandwidth source is sonically transparent). The full R&D write-up (the audit, the four dead ends, the debunked false discovery) is in ml/README.md.
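The gate itself is a simple abstention rule. A minimal sketch, assuming the rolloff frequency has already been measured; the names `gated_cnn_score` and `ROLLOFF_GATE_HZ` are hypothetical and only the 7 kHz threshold comes from the text above:

```python
ROLLOFF_GATE_HZ = 7_000  # threshold quoted in the release notes above


def gated_cnn_score(cnn_probability, rolloff_hz, gate_hz=ROLLOFF_GATE_HZ):
    """Return the CNN probability, or None when the rule should abstain."""
    if rolloff_hz < gate_hz:
        return None  # abstain: defer to the heuristic rules
    return cnn_probability
```

Returning `None` rather than a neutral probability keeps "no opinion" distinct from "50/50", which is one plausible way to let the remaining rules decide alone.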
What's new in v0.12.0: ML v3 (May 2026)
Smaller, faster, more accurate. Same conservative philosophy.
| Metric | v0.11 (v2) | v0.12 (v3) | Δ |
|---|---|---|---|
| Balanced accuracy | 0.811 | 0.834 | +0.023 |
| Recall on transcoded | 82.7 % | 86.9 % | +4.2 pp |
| Recall on authentic (specificity) | 80.0 % | 80.0 % | unchanged |
| Model size (bundled) | 43 MB | 16 MB | −63 % |
| Architecture | ResNet-18 | EfficientNet-B0 | |
4 more transcoded files out of every 100 are now caught, at the same false-positive rate. The wheel is also 27 MB lighter.
Under the hood: more training data (5 964 × 10 codecs = 65 244 samples vs 24 451), EfficientNet-B0 pretrained replacing ResNet-18, Mixup augmentation, cosine annealing LR, and mmap-backed feature loading (the 27 GB feature tensor stays on disk so the training process plays nice on shared hosts). Full story in the CHANGELOG.
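To illustrate what mmap-backed loading buys, here is a small stdlib-only sketch that reads one feature row from a flat float32 file without loading the rest. It is not the project's loader: the file layout, `FEATURE_DIM`, and both function names are assumptions made for the example. In a NumPy-based pipeline, `numpy.memmap` is the usual equivalent.

```python
import mmap
import os
import struct
import tempfile

FEATURE_DIM = 4  # hypothetical; real feature rows would be far larger


def write_features(path, rows):
    """Write rows of float32 values to a flat little-endian binary file."""
    with open(path, "wb") as f:
        for row in rows:
            f.write(struct.pack(f"<{FEATURE_DIM}f", *row))


def read_row(path, index):
    """Read a single feature row; only the touched pages are paged in."""
    row_bytes = FEATURE_DIM * 4  # 4 bytes per float32
    with open(path, "rb") as f:
        with mmap.mmap(f.fileno(), 0, access=mmap.ACCESS_READ) as mm:
            start = index * row_bytes
            return struct.unpack(f"<{FEATURE_DIM}f", mm[start:start + row_bytes])


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "features.bin")
    write_features(path, [(0.0, 1.0, 2.0, 3.0), (4.0, 5.0, 6.0, 7.0)])
    second_row = read_row(path, 1)
```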
What's new in v0.11.0: ML v2, Properly Trained (May 2026)
The 12th scoring rule, introduced in v0.10.0, was technically functional but had a 95 % false-positive rate on authentic FLAC files. v0.11.0 ships a properly-trained model that fixes this.
| Metric | v0.10 (v1) | v0.11 (v2) |
|---|---|---|
| Balanced accuracy | ~0.55 | 0.81 |
| Specificity (recall on authentic) | 4.5 % | 80 % |
| Precision (transcoded) | 87.5 % | 97.6 % |
| Threshold needed for safe use | 0.85 (hack) | 0.5 (natural) |
| Architecture | Custom 5-block CNN | ResNet-18 (ImageNet-pretrained) |
| Model size | 1.6 MB | 43 MB |
The 80 % specificity is the headline: out of 333 known-authentic test files, v1 misclassified 318 as transcoded; v2 misclassifies 68. That is roughly a 4.7× drop in false positives, and specificity rises almost 18× (4.5 % to 80 %).
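The rates follow directly from the counts quoted above. A short worked sketch; the helper names are illustrative, and the 82.7 % recall for v2 is taken from the v0.12 comparison table:

```python
AUTHENTIC_TEST_FILES = 333


def specificity(false_positives, negatives=AUTHENTIC_TEST_FILES):
    """Fraction of authentic files correctly left unflagged."""
    return (negatives - false_positives) / negatives


def balanced_accuracy(recall, spec):
    """Mean of recall on transcodes and recall on authentic files."""
    return (recall + spec) / 2


v1_spec = specificity(318)                   # about 0.045
v2_spec = specificity(68)                    # about 0.796
fp_ratio = 318 / 68                          # about 4.7
v2_ba = balanced_accuracy(0.827, v2_spec)    # about 0.811
```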
The path to a working model was five training attempts that taught specific lessons (focal loss double-balancing, biased F1 selection, insufficient model capacity, and the root cause: feature extraction was downsampling audio to 22 kHz, erasing the very MP3 cutoff signal we were trying to learn). The full story is in the CHANGELOG and ml/README.md.
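Why the downsampling was fatal comes down to the Nyquist limit: a signal sampled at rate fs cannot represent anything above fs / 2. The sketch below assumes "22 kHz" means 22 050 Hz and uses an illustrative 19 kHz low-pass edge for a high-bitrate MP3; both values are assumptions, not measurements from the project.

```python
def nyquist_hz(sample_rate_hz):
    """Highest frequency a signal sampled at this rate can represent."""
    return sample_rate_hz / 2


def cutoff_is_observable(cutoff_hz, sample_rate_hz):
    """True if a codec low-pass edge falls below the Nyquist frequency."""
    return cutoff_hz < nyquist_hz(sample_rate_hz)


# Illustrative low-pass edge for a high-bitrate MP3 (an assumption).
ASSUMED_MP3_CUTOFF_HZ = 19_000

visible_at_native_rate = cutoff_is_observable(ASSUMED_MP3_CUTOFF_HZ, 44_100)
visible_after_downsampling = cutoff_is_observable(ASSUMED_MP3_CUTOFF_HZ, 22_050)
```

If the features were loaded with librosa, note that `librosa.load` resamples to 22 050 Hz by default unless `sr=None` is passed; whether that was the exact cause here is not stated above.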
- Opt-in via `pip install "flac-detective[ml]"`. PyTorch and librosa are optional; without them, Rule 12 is a graceful no-op and the existing 11-rule pipeline runs unchanged.
- Trained on Hetzner GPU (RTX 4000 Ada) over 2 237 certified-authentic FLACs (CD rips verified by EAC / XLD / Audiochecker logs) plus 22 258 transcodes generated across 10 codec/bitrate combinations (MP3 CBR 128/192/256/320, MP3 VBR V0/V2, AAC 192/256, Opus 128, Vorbis q5).
- Reproducible: the full training pipeline lives in `ml/`. Eight scripts, one `run_pipeline.sh` to chain them, ~2 h end-to-end on a modest GPU.
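The "graceful no-op" behaviour for missing optional dependencies can be sketched with a dependency probe. This is one common pattern, not necessarily how the package does it; `ml_extra_available`, `rule_12_score`, and the `predict` callable are hypothetical names.

```python
import importlib.util


def ml_extra_available(modules=("torch", "librosa")):
    """True only if every optional dependency can be found."""
    return all(importlib.util.find_spec(name) is not None for name in modules)


def rule_12_score(features, predict=None):
    """Return a CNN score, or None (no-op) when the ML extra is missing.

    `predict` is a hypothetical callable standing in for the bundled model.
    """
    if predict is None or not ml_extra_available():
        return None  # graceful no-op: the other rules decide alone
    return predict(features)
```

Probing with `find_spec` avoids importing heavy libraries just to check that they exist.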
For the v0.9.7 → v0.10.1 fix trail (circular import, Docker image, documentation refresh, CLI catch-up, branch protection, …) see the CHANGELOG.
Key Features
- High Precision Detection: 11-rule scoring system with intelligent protection mechanisms
- 4-Level Verdict System: Clear confidence ratings from AUTHENTIC to FAKE_CERTAIN
- Performance Optimized: 80% faster than baseline through smart caching and parallel processing
- Advanced Analysis: Spectral analysis, compression artifact detection, and multi-segment validation
- Protection Layers: Prevents false positives for vinyl rips, cassette transfers, and high-quality MP3s
- Flexible Output: Console reports with Rich formatting, JSON export, and detailed logging
- Robust Error Handling: Automatic retries, partial file reading, and comprehensive diagnostic tracking
- Optional Repair: Corrupted FLAC files can be repaired with the --repair flag, with full metadata preservation
- CNN classifier (optional): A small ML model bundled with the package adds a 12th scoring rule on borderline cases. Run `pip install "flac-detective[ml]"` to enable.
Quick Start
Installation
# Install via pip (Recommended)
pip install flac-detective
# OR with the optional CNN classifier (Rule 12)
pip install "flac-detective[ml]"
# OR run with Docker (multi-arch: linux/amd64 + linux/arm64)
docker pull ghcr.io/guillain-rdcde/flac_detective:latest
Upgrading to the latest version
`pip install flac-detective` does not upgrade an existing install: if you already have an older version, pip prints "Requirement already satisfied" and exits without doing anything. To get the latest release, add the `--upgrade` flag (short form `-U`):
# Upgrade to the latest version on PyPI
pip install --upgrade flac-detective
# Same thing with the optional ML extra
pip install --upgrade "flac-detective[ml]"
# Verify the new version
flac-detective --version
# Docker: pull again to refresh the image
docker pull ghcr.io/guillain-rdcde/flac_detective:latest
See Getting Started for complete installation instructions.
Basic Usage
# Analyze current directory
flac-detective .
# Analyze specific directory
flac-detective /path/to/music
# Interactive mode (prompts for paths, accepts drag-and-drop in Windows cmd)
flac-detective
Common Options
# Show version and help
flac-detective --version
flac-detective --help
# Verbose log + JSON output to a custom path
flac-detective -v --format json --output report.json /music
# Quick scan (15 s sample instead of default 30 s)
flac-detective --sample-duration 15 /music
See User Guide for detailed usage examples and command line options.
Try it Now (No Installation Required)
Option 1: Docker with Sample File
# Download a sample FLAC file (public domain)
curl -O https://archive.org/download/test_flac/sample.flac
# Run analysis with Docker (mount current directory)
docker run --rm -v "$(pwd)":/data ghcr.io/guillain-rdcde/flac_detective:latest /data/sample.flac
Option 2: Quick Python Test
# Using Python (if you have pip installed)
pip install flac-detective
flac-detective --version
flac-detective --help
Option 3: Interactive Demo Script (Best for Quick Test)
# Clone and run demo with synthetic test files
git clone https://github.com/Guillain-RDCDE/FLAC_Detective.git
cd FLAC_Detective
pip install -e .
python examples/quick_test.py
This creates test files and shows FLAC Detective in action in 30 seconds!
Option 4: GitHub Codespaces (Fully Interactive Online)
- Click the "Code" button → "Codespaces" → "Create codespace"
- Wait for environment setup (~30 seconds)
- Run: `pip install -e . && python examples/quick_test.py`
No sample files? The tool works with any FLAC file from your music collection!
Demo
Live Demo
Watch FLAC Detective analyze files with real-time progress bars and colored output!
Example Output
======================================================================
FLAC AUTHENTICITY ANALYZER
Detection of MP3s transcoded to FLAC
======================================================================
โ Analyzing audio files... โโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโโ 15% 0:02:34
======================================================================
ANALYSIS COMPLETE
======================================================================
FLAC files analyzed: 245
Authentic files: 215 (87.8%)
Fake/Suspicious files: 12 (4.9%)
Text report: flac_report_20251220_143022.txt
======================================================================
Performance
FLAC Detective is optimized for both speed and accuracy:
- Speed: 2-5 seconds per file (30s sample, default)
- Throughput: 700-1,800 files/hour on modern hardware
- Memory: ~150-300 MB peak usage
- Optimization: 80% faster than baseline through intelligent caching and parallel processing
- Scalability: Handles libraries with 10,000+ files efficiently
Customizable Performance:
# Faster analysis (15s per file) - good for quick scans
flac-detective /music --sample-duration 15
# Balanced (30s per file) - default, recommended
flac-detective /music
# More thorough (60s per file) - maximum accuracy
flac-detective /music --sample-duration 60
Frequently Asked Questions
Does it work on Windows/Mac/Linux?
Yes! FLAC Detective is cross-platform and works on:
- Windows (7, 10, 11)
- macOS (10.14+)
- Linux (all major distributions)
How accurate is the detection?
FLAC Detective uses an 11-rule scoring system with protection layers:
- High confidence: >95% accuracy for AUTHENTIC and FAKE_CERTAIN verdicts
- Protection mechanisms: Prevents false positives for vinyl rips, cassette transfers, and high-quality sources
- 4-level system: AUTHENTIC, WARNING, SUSPICIOUS, FAKE_CERTAIN for nuanced results
Will it damage or modify my files?
No! FLAC Detective is read-only by default:
- Only analyzes files; never modifies them unless you opt in
- Safe for your entire music collection
- Optional `--repair` flag for corrupted files (preserves all metadata)
Can I trust the results?
Yes, but use common sense:
- AUTHENTIC (score ≤ 30): Very high confidence, keep the file
- WARNING (31-60): Borderline case, manual verification recommended
- SUSPICIOUS (61-85): High-confidence transcode, consider replacing
- FAKE_CERTAIN (≥ 86): Multiple indicators, definitely a transcode
For critical decisions, use complementary tools (e.g., Spek for visual spectral analysis) to confirm.
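The score bands above map to verdicts as follows. A minimal sketch; `verdict_for` is a hypothetical helper, not part of the documented API, and it only encodes the thresholds listed in this FAQ:

```python
def verdict_for(score):
    """Map a numeric score to one of the four verdict labels."""
    if score <= 30:
        return "AUTHENTIC"
    if score <= 60:
        return "WARNING"
    if score <= 85:
        return "SUSPICIOUS"
    return "FAKE_CERTAIN"
```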
What file formats are supported?
Currently:
- FLAC files (.flac)
- Future: WAV, ALAC, APE (planned for v1.0)
How long does analysis take?
- Single file: 2-5 seconds (30s sample)
- 100 files: ~5-10 minutes
- 1,000 files: ~50-90 minutes
- 10,000 files: ~8-15 hours
Use --sample-duration 15 for faster scans of large libraries.
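For a rough estimate on your own library, the per-file figure converts to throughput and total time with simple arithmetic. The sketch assumes a sequential scan at 2-5 seconds per file and ignores file discovery, I/O, and reporting overhead, which may explain why the figures quoted above run somewhat higher; both function names are illustrative.

```python
def files_per_hour(seconds_per_file):
    """Throughput implied by a fixed per-file analysis time."""
    return 3600 / seconds_per_file


def scan_time_hours(file_count, seconds_per_file):
    """Estimated hours for a sequential scan, excluding overhead."""
    return file_count * seconds_per_file / 3600


best_case = scan_time_hours(10_000, 2)    # about 5.6 hours
worst_case = scan_time_hours(10_000, 5)   # about 13.9 hours
```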
Can I use it in my own application?
Yes! FLAC Detective provides a Python API:
from flac_detective import FLACAnalyzer
analyzer = FLACAnalyzer()
result = analyzer.analyze_file("song.flac")
print(result['verdict']) # AUTHENTIC, WARNING, SUSPICIOUS, or FAKE_CERTAIN
See examples/ directory for integration examples.
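Once you have a list of results, a small summary is easy to build. The sketch below assumes only that each result is a dict with a 'verdict' key, as in the snippet above; `summarize` and the stand-in data are hypothetical, and no other part of the API is assumed.

```python
from collections import Counter


def summarize(results):
    """Count verdicts and return (counts, share_authentic_percent)."""
    counts = Counter(r["verdict"] for r in results)
    total = sum(counts.values())
    share = 100 * counts["AUTHENTIC"] / total if total else 0.0
    return counts, share


# Stand-in results; in real use these would come from analyzer.analyze_file(...)
sample = [
    {"verdict": "AUTHENTIC"},
    {"verdict": "AUTHENTIC"},
    {"verdict": "WARNING"},
    {"verdict": "FAKE_CERTAIN"},
]
counts, share = summarize(sample)
```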
Is it free and open source?
Yes! MIT License:
- Free for personal and commercial use
- Open source on GitHub
- Contributions welcome
How can I contribute?
See CONTRIBUTING.md for:
- Bug reports and feature requests
- Code contributions
- Documentation improvements
- Testing and feedback
Documentation
Detailed documentation is available in the docs/ directory:
- Documentation Index - Overview and navigation
- Getting Started - Installation and first analysis
- User Guide - Complete usage guide with examples
- Technical Details - Deep dive into detection rules and algorithms
- API Reference - Python API documentation
- Contributing - Development guide
Use Cases
- Library Maintenance: Clean your music collection of fake lossless files
- Quality Verification: Validate FLAC authenticity before archiving
- Batch Processing: Analyze large music libraries efficiently
- Format Validation: Ensure genuine lossless quality for critical listening
Quick Examples
See the examples/ directory for ready-to-run scripts:
- basic_usage.py - Simple file and directory analysis
- batch_processing.py - Process multiple directories with statistics
- json_export.py - Export results to JSON for further processing
- api_integration.py - Advanced API usage and integration patterns
Contributing
Contributions are welcome! Please read our CONTRIBUTING.md for detailed guidelines and CODE_OF_CONDUCT.md for community standards.
Security
For security policy and vulnerability reporting, please see SECURITY.md.
License
This project is licensed under the MIT License - see the LICENSE file for details.
Support
- Issues: GitHub Issues
- Discussions: GitHub Discussions
- Security: see SECURITY.md
Acknowledgements
Thanks to the community members who took the time to report bugs and confirm fixes. First issues are special.
- @GearKite: Filed #7 with a clean traceback that pinpointed the circular import in v0.9.6, and #6 spotting the underscore-vs-dash Docker image name.
- @Aakiles: Diagnosed the circular import end-to-end and shipped a working patch via comment. The v0.9.7 fix is a refinement of his approach.
- @AnotherMuggle and @tomelephant-git: Confirmed the fix across operating systems, including Windows 11 LTSC.
- @AKHwyJunkie: Confirmed the v0.9.6 import crash, validating @GearKite's report.
- @pblue3: First reported the Docker image inaccessibility (#6).
FLAC Detective - Maintaining authentic lossless audio collections