Human vs AI audio detection via Shannon entropy features
AudioSentinel
AudioSentinel detects whether an audio file is human-recorded or AI-generated, using Shannon entropy features and a Random Forest classifier.
- ✅ 100% accuracy on 294-sample blind test
- ✅ 30/30 cross-verified on held-out samples
- ✅ Lightweight — no GPU required, runs on CPU in <1s per file
- ✅ 52 handcrafted features: temporal, spectral & phase entropy + MFCC + spectral descriptors
Install
pip install audiosentinel
Or from source:
git clone https://github.com/yourname/audiosentinel
cd audiosentinel
pip install -e .
Quick Start
from audiosentinel import predict_audio, predict_int, predict_batch
# Full result with confidence
predict_audio('recording.wav')
# File : recording.wav
# Result : HUMAN
# Confidence : 74.8%
# P(AI)=0.252 P(Human)=0.748
# Integer only — 0=AI, 1=Human
label = predict_int('recording.wav')
print(label) # 1
# Batch
import glob
results = predict_batch(glob.glob('audio/*.wav'))
for r in results:
    if r is not None:  # failed files come back as None
        print(r['label'], r['prob_human'])
CLI
audiosentinel recording.wav
audiosentinel path/to/audio/*.wav
API Reference
predict_audio(path, verbose=True) → dict
| Key | Type | Description |
|---|---|---|
| `label` | str | "HUMAN" or "AI" |
| `pred` | int | 1 = Human, 0 = AI |
| `prob_ai` | float | Probability of AI origin |
| `prob_human` | float | Probability of Human origin |
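The returned dict can be post-processed with plain Python. As a minimal sketch, assuming the reported confidence is the larger of `prob_ai` and `prob_human` (consistent with the Quick Start output, where P(Human)=0.748 is shown as 74.8% confidence), a hypothetical helper could flag low-confidence predictions for manual review. The names `confidence` and `needs_review` and the 0.9 threshold are illustrative and not part of AudioSentinel:

```python
def confidence(result: dict) -> float:
    """Return the larger class probability from a predict_audio-style dict."""
    return max(result["prob_ai"], result["prob_human"])


def needs_review(result: dict, threshold: float = 0.9) -> bool:
    """Hypothetical helper: True when the winning probability is below threshold."""
    return confidence(result) < threshold


# Dict shaped like the Quick Start example output (values copied from it).
example = {"label": "HUMAN", "pred": 1, "prob_ai": 0.252, "prob_human": 0.748}
print(confidence(example))    # 0.748
print(needs_review(example))  # True
```

A threshold like this is a policy choice for the caller; the right value depends on how costly a wrong label is in your application.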
predict_int(path) → int
Returns 0 (AI) or 1 (Human) only.
predict_batch(paths, verbose=False) → list[dict]
Runs inference on a list of WAV paths. A file that fails to process yields None in place of its result dict.
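Because a batch can mix successes and failures, it may help to tally the outcomes before using them. The sketch below assumes failed files appear as None entries in the returned list; `summarize` is a hypothetical helper, and the input is stand-in data shaped like `predict_batch` output rather than real model results:

```python
from collections import Counter


def summarize(results):
    """Count labels in a predict_batch-style list, treating None as a failure."""
    counts = Counter()
    for r in results:
        if r is None:
            counts["FAILED"] += 1
        else:
            counts[r["label"]] += 1
    return dict(counts)


# Stand-in data shaped like predict_batch output; not real model results.
fake_results = [
    {"label": "HUMAN", "pred": 1, "prob_ai": 0.1, "prob_human": 0.9},
    None,
    {"label": "AI", "pred": 0, "prob_ai": 0.8, "prob_human": 0.2},
]
print(summarize(fake_results))
```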
extract_entropy_features(path) → dict
Returns raw 52-feature dict for a WAV file.
How It Works
- Audio loaded at 24kHz, silence trimmed
- Temporal entropy — Shannon entropy over time-domain frames
- Spectral entropy — Shannon entropy over STFT magnitude frames
- Phase entropy — Shannon entropy over STFT phase frames
- MFCC — 13 coefficients × mean + std = 26 features
- Spectral descriptors — ZCR, RMS, centroid, rolloff × mean + std = 8 features
- Random Forest (200 trees) classifies the 52-feature vector
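To illustrate the entropy steps above, here is a minimal pure-Python sketch of temporal Shannon entropy, H = -Σ p·log2(p), computed per frame from an amplitude histogram. The bin count, amplitude range, frame length, and hop size are assumptions chosen for illustration; AudioSentinel's actual parameters are not documented here, and a real implementation would more likely use NumPy. Spectral and phase entropy would apply the same formula to a normalized distribution of STFT magnitudes or phases instead of raw amplitudes.

```python
import math
from collections import Counter


def shannon_entropy(probabilities):
    """Shannon entropy in bits over the non-zero probabilities."""
    return sum(-p * math.log2(p) for p in probabilities if p > 0)


def frame_entropy(frame, n_bins=16, lo=-1.0, hi=1.0):
    """Entropy of one frame's amplitude histogram (bins and range are assumptions)."""
    width = (hi - lo) / n_bins
    counts = Counter()
    for x in frame:
        # Clamp the index so out-of-range samples land in the edge bins.
        idx = min(int((x - lo) / width), n_bins - 1)
        counts[max(idx, 0)] += 1
    total = len(frame)
    return shannon_entropy(c / total for c in counts.values())


def temporal_entropy(signal, frame_len=256, hop=128):
    """Per-frame entropies over a sliding window (frame sizes are assumptions)."""
    return [
        frame_entropy(signal[i:i + frame_len])
        for i in range(0, len(signal) - frame_len + 1, hop)
    ]


# A constant signal occupies one bin; a sine wave spreads across many bins.
silence = [0.0] * 1024
tone = [math.sin(2 * math.pi * 5 * n / 1024) for n in range(1024)]
print(temporal_entropy(silence)[0])
print(temporal_entropy(tone)[0] > 0)
```

Per-frame values like these would typically be reduced to summary statistics before being passed to the classifier, in the same way the MFCC and spectral descriptors are reduced to mean and std.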
Training Data
| Source | Class | Samples |
|---|---|---|
| LibriSpeech | Human | 1,500 |
| Kokoro TTS | AI | 1,500 |
Sample rate: 24kHz — all samples resampled internally.
Performance
| Model | CV Accuracy |
|---|---|
| LogReg (3-feat) | 83.2% |
| LogReg (all) | 94.7% |
| Gradient Boost | 95.5% |
| Random Forest | 96.7% |
| Tuned RF (final) | 100.0% |
Blind test (294 samples, unseen): 100% — 0 misclassifications
Limitations
- Trained on Kokoro TTS — confidence may vary on other TTS engines
- Best performance on speech audio; music/noise not tested
- Requires `scikit-learn==1.6.1` to match the model pickle version
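Since the bundled model is tied to a specific scikit-learn release, a quick environment check before running inference can save a confusing unpickling error. The following is a sketch using only the standard library's `importlib.metadata`; the helper names are hypothetical and not part of AudioSentinel:

```python
from importlib import metadata

REQUIRED_SKLEARN = "1.6.1"  # version pinned in the Limitations list


def installed_version(package: str):
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def sklearn_matches(required: str = REQUIRED_SKLEARN) -> bool:
    """True only when the installed scikit-learn equals the pinned version."""
    return installed_version("scikit-learn") == required


if not sklearn_matches():
    print(f"Warning: expected scikit-learn=={REQUIRED_SKLEARN}, "
          f"found {installed_version('scikit-learn')}")
```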
Support This Work
If AudioSentinel is useful to you, consider buying a coffee or supporting development:
☕ Buy Me a Coffee: https://buymeacoffee.com
🤝 GitHub Sponsors: https://github.com/sponsors
Crypto donations welcome:
| Chain | Address |
|---|---|
| BTC | bc1qxz2qgfkh0fgs7ff3m0ft6wtluzk5rqhv472vws |
| ETH | 0x70282a83f0d6ef2f207d252cf3f7874c7663f625 |
| SOL | 91s2TYpn5P2W5xXyEk3q8nFPusY937YEiCNdFCKiYirz |
| LTC | ltc1qfcucqw08kus6vncc8egft7feswgflp0wee7rxj |
License
MIT — see LICENSE
Download files
File details
Details for the file audiosentinel-0.1.0.tar.gz.
File metadata
- Download URL: audiosentinel-0.1.0.tar.gz
- Upload date:
- Size: 200.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 89d86c73c8c117bd2ef6a048f82763c41855ec3f7dfcb1e1b49bfb43461ac9fe |
| MD5 | 1d1c4aef8c6e1207a425c82dc8d159dc |
| BLAKE2b-256 | 2328e3172187579eff7c121514070e90445e6a1e43aa54829cfdac744957bb1c |
File details
Details for the file audiosentinel-0.1.0-py3-none-any.whl.
File metadata
- Download URL: audiosentinel-0.1.0-py3-none-any.whl
- Upload date:
- Size: 206.8 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3f972a106e4df0940e639afb92d0f77c218547d0a946a1433c3a1f4ae8f0db24 |
| MD5 | 5867830911e375b74ebf734935bd81ff |
| BLAKE2b-256 | a562c443f48ac17ba9f69a16f7c0e529b979335449c0e300e4a9789145f865c7 |