
Human vs AI audio detection via Shannon entropy features


AudioSentinel


AudioSentinel detects whether an audio file is human-recorded or AI-generated, using Shannon entropy features and a Random Forest classifier.

  • ✅ 100% accuracy on 294-sample blind test
  • ✅ 30/30 cross-verified on held-out samples
  • ✅ Lightweight — no GPU required, runs on CPU in <1s per file
  • ✅ 52 handcrafted features: temporal, spectral & phase entropy + MFCC + spectral descriptors

Install

pip install audiosentinel

Or from source:

git clone https://github.com/yourname/audiosentinel
cd audiosentinel
pip install -e .

Quick Start

from audiosentinel import predict_audio, predict_int, predict_batch

# Full result with confidence
predict_audio('recording.wav')
# File       : recording.wav
# Result     : HUMAN
# Confidence : 74.8%
# P(AI)=0.252  P(Human)=0.748

# Integer only — 0=AI, 1=Human
label = predict_int('recording.wav')
print(label)  # 1

# Batch (entries are None for files that failed)
import glob
results = predict_batch(glob.glob('audio/*.wav'))
for r in results:
    if r is not None:
        print(r['label'], r['prob_human'])

CLI

audiosentinel recording.wav
audiosentinel path/to/audio/*.wav

API Reference

predict_audio(path, verbose=True) → dict

Key         Type   Description
label       str    "HUMAN" or "AI"
pred        int    1 = Human, 0 = AI
prob_ai     float  Probability of AI origin
prob_human  float  Probability of Human origin
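
As a rough illustration of how the returned dict might be consumed, the sketch below applies a stricter decision threshold to the two probabilities instead of relying on label alone. The helper name triage and the 0.9 threshold are hypothetical and not part of AudioSentinel; only the dict keys come from the table above.

```python
def triage(result, threshold=0.9):
    """Map a predict_audio-style dict to 'AI', 'HUMAN', or 'UNCERTAIN'.

    `triage` and `threshold` are illustrative, not part of the package.
    """
    if result["prob_ai"] >= threshold:
        return "AI"
    if result["prob_human"] >= threshold:
        return "HUMAN"
    return "UNCERTAIN"


# Values taken from the Quick Start output.
example = {"label": "HUMAN", "pred": 1, "prob_ai": 0.252, "prob_human": 0.748}
print(triage(example))        # UNCERTAIN at the 0.9 threshold
print(triage(example, 0.7))   # HUMAN
```

A three-way outcome like this leaves room for manual review of borderline files; the right threshold depends on the use case.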

predict_int(path) → int

Returns 0 (AI) or 1 (Human) only.

predict_batch(paths, verbose=False) → list[dict]

Runs inference on a list of WAV paths. The returned list holds one result dict per file, with None in place of any file that failed.
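
Because failed files come back as None, it can help to keep each result next to its path. A minimal sketch, assuming the returned list has the same length and order as the input paths (the description above does not state this explicitly); split_results is a hypothetical helper, not part of the package.

```python
def split_results(paths, results):
    """Separate successful predictions from failed files.

    Assumes `results` is aligned one-to-one with `paths`.
    """
    ok, failed = {}, []
    for path, result in zip(paths, results):
        if result is None:
            failed.append(path)
        else:
            ok[path] = result
    return ok, failed


# Stand-in data shaped like predict_batch output.
paths = ["a.wav", "b.wav"]
results = [{"label": "AI", "pred": 0, "prob_ai": 0.9, "prob_human": 0.1}, None]
ok, failed = split_results(paths, results)
print(sorted(ok), failed)  # ['a.wav'] ['b.wav']
```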

extract_entropy_features(path) → dict

Returns the raw 52-feature dict for a WAV file.


How It Works

  1. Audio loaded at 24kHz, silence trimmed
  2. Temporal entropy — Shannon entropy over time-domain frames
  3. Spectral entropy — Shannon entropy over STFT magnitude frames
  4. Phase entropy — Shannon entropy over STFT phase frames
  5. MFCC — 13 coefficients × mean + std = 26 features
  6. Spectral descriptors — ZCR, RMS, centroid, rolloff × mean + std = 8 features
  7. Random Forest (200 trees) classifies the 52-feature vector
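
Steps 2 and 3 can be sketched with NumPy alone. This is an illustration of frame-wise Shannon entropy, not AudioSentinel's actual extractor: the frame length, hop size, histogram bin count, and Hann window are assumed values, and all function names are hypothetical.

```python
import numpy as np


def shannon_entropy(p):
    """Shannon entropy in bits of a non-negative vector, normalised to sum to 1."""
    p = np.asarray(p, dtype=float)
    total = p.sum()
    if total <= 0:
        return 0.0
    p = p[p > 0] / total
    return float(-(p * np.log2(p)).sum())


def frame_signal(y, frame_len=1024, hop=512):
    """Slice a 1-D signal into overlapping frames, one frame per row."""
    n_frames = 1 + (len(y) - frame_len) // hop
    idx = np.arange(frame_len)[None, :] + hop * np.arange(n_frames)[:, None]
    return y[idx]


def temporal_entropy(y, frame_len=1024, hop=512, n_bins=64):
    """Entropy of the amplitude histogram of each time-domain frame.

    Assumes samples are scaled to [-1, 1].
    """
    frames = frame_signal(y, frame_len, hop)
    return np.array([
        shannon_entropy(np.histogram(f, bins=n_bins, range=(-1.0, 1.0))[0])
        for f in frames
    ])


def spectral_entropy(y, frame_len=1024, hop=512):
    """Entropy of the magnitude spectrum of each Hann-windowed frame."""
    frames = frame_signal(y, frame_len, hop) * np.hanning(frame_len)
    mags = np.abs(np.fft.rfft(frames, axis=1))
    return np.array([shannon_entropy(m) for m in mags])


sr = 24_000
t = np.arange(sr) / sr
tone = 0.5 * np.sin(2 * np.pi * 440 * t)                  # energy in one band
noise = np.random.default_rng(0).uniform(-0.5, 0.5, sr)   # energy spread out

print(spectral_entropy(tone).mean(), spectral_entropy(noise).mean())
```

A pure tone concentrates its energy in a few frequency bins and should score lower than white noise, whose spectrum is close to flat. Per-frame values like these would then need to be aggregated (for example by mean and standard deviation) before reaching a classifier.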

Training Data

Source       Class  Samples
LibriSpeech  Human  1,500
Kokoro TTS   AI     1,500

Sample rate: 24kHz — all samples resampled internally.


Performance

Model             CV Accuracy
LogReg (3-feat)   83.2%
LogReg (all)      94.7%
Gradient Boost    95.5%
Random Forest     96.7%
Tuned RF (final)  100.0%

Blind test (294 samples, unseen): 100% — 0 misclassifications
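
To put the zero-error blind test in context, one can ask how high the true error rate could be while still producing 0 misclassifications in 294 trials. A back-of-the-envelope sketch, assuming the 294 samples are independent draws from the distribution of interest (an assumption, not something stated here); the function name is hypothetical.

```python
def max_error_rate(n, confidence=0.95):
    """Largest error rate p with (1 - p)**n >= 1 - confidence.

    Exact one-sided bound for zero observed errors in n independent trials.
    """
    return 1.0 - (1.0 - confidence) ** (1.0 / n)


upper = max_error_rate(294)
print(f"{upper:.4f}")   # about 0.0101, close to the 'rule of three' 3/294
```

Under that assumption, a clean run on 294 samples is consistent at the 95% level with a true error rate of up to roughly 1%.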


Limitations

  • Trained on Kokoro TTS only; accuracy and confidence may differ on other TTS engines
  • Best performance on speech audio; music/noise not tested
  • Requires scikit-learn==1.6.1 to match model pickle version
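
Since the pickled model is tied to scikit-learn 1.6.1, a pre-flight check can fail fast with a clearer message than an unpickling error. A minimal sketch using the standard library's importlib.metadata; the function name and return strings are illustrative, and AudioSentinel may already perform a check of its own.

```python
from importlib.metadata import PackageNotFoundError, version

REQUIRED = "1.6.1"  # pinned version named in the Limitations list


def sklearn_status(required=REQUIRED):
    """Return 'ok', 'missing', or 'mismatch:<installed>' for scikit-learn."""
    try:
        installed = version("scikit-learn")
    except PackageNotFoundError:
        return "missing"
    return "ok" if installed == required else f"mismatch:{installed}"


print(sklearn_status())
```

If the status is not ok, installing the pinned version with pip install scikit-learn==1.6.1 in a fresh virtual environment is the simplest fix.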

Support This Work

If AudioSentinel is useful to you, consider buying a coffee or supporting development:

Buy Me a Coffee: https://buymeacoffee.com
🤝 GitHub Sponsors: https://github.com/sponsors

Crypto donations welcome:

Chain Address
BTC bc1qxz2qgfkh0fgs7ff3m0ft6wtluzk5rqhv472vws
ETH 0x70282a83f0d6ef2f207d252cf3f7874c7663f625
SOL 91s2TYpn5P2W5xXyEk3q8nFPusY937YEiCNdFCKiYirz
LTC ltc1qfcucqw08kus6vncc8egft7feswgflp0wee7rxj

License

MIT — see LICENSE
