Machine learning system for detecting AI-generated audio using Benford's Law and advanced spectral features
AI Audio Detector
A machine learning system for detecting AI-generated audio using Benford's Law analysis and advanced audio feature extraction. The system employs ensemble learning with adaptive model updating capabilities.
Features
- Multi-Model Ensemble: Uses Random Forest, Gradient Boosting, SGD, and Passive Aggressive classifiers
- Benford's Law Analysis: Analyzes frequency distributions for AI detection patterns
- Comprehensive Audio Features: Extracts spectral, temporal, and compression-related features
- Adaptive Learning: Supports incremental model updates with new data
- Batch Processing: Parallel processing for large audio datasets
- Spectrogram Generation: Creates and compares various types of spectrograms
- Interactive CLI: User-friendly command-line interface
Supported Audio Formats
- WAV (.wav)
- MP3 (.mp3)
- FLAC (.flac)
- OGG (.ogg)
- M4A (.m4a)
- AAC (.aac)
Installation
Option 1: Install from PyPI (Recommended)
pip install ai-audio-detector
Option 2: Install from Source
- Clone the repository:
git clone https://github.com/ajprice16/AI_Audio_Detection.git
cd AI_Audio_Detection
- Install dependencies:
pip install -r requirements.txt
System Dependencies
On Ubuntu/Debian:
sudo apt-get install libsndfile1 ffmpeg
On macOS:
brew install libsndfile ffmpeg
Quick Start
Training Initial Models
- Prepare your data: Organize your audio files into two directories:
  - human_audio/ - Human-generated audio files
  - ai_audio/ - AI-generated audio files
- Run the detector:
If installed from PyPI:
ai-audio-detector --interactive
# or
ai-audio-detector --predict-file path/to/audio.wav
If running from source:
python -m ai_audio_detector --interactive
# or
python -m ai_audio_detector --predict-file path/to/audio.wav
- Choose option 1 to train new models and follow the prompts.
Command Line Usage
Train models:
ai-audio-detector --train --human-dir path/to/human/audio --ai-dir path/to/ai/audio
Predict single file:
ai-audio-detector --predict-file path/to/audio.wav
Predict batch:
ai-audio-detector --predict-batch path/to/audio/directory
Interactive mode:
ai-audio-detector --interactive
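The flags above could be wired together with Python's argparse roughly as follows. This is a hypothetical sketch, not the package's own parser: treating the four modes as mutually exclusive and requiring both --human-dir and --ai-dir for --train are assumptions.

```python
import argparse


def build_parser() -> argparse.ArgumentParser:
    # Hypothetical parser mirroring the documented flags; not the package's code.
    parser = argparse.ArgumentParser(prog="ai-audio-detector")
    mode = parser.add_mutually_exclusive_group(required=True)  # assumption: one mode per run
    mode.add_argument("--train", action="store_true", help="train new models")
    mode.add_argument("--predict-file", metavar="PATH", help="classify one audio file")
    mode.add_argument("--predict-batch", metavar="DIR", help="classify every file in a directory")
    mode.add_argument("--interactive", action="store_true", help="start the menu-driven CLI")
    parser.add_argument("--human-dir", metavar="DIR", help="human audio (used with --train)")
    parser.add_argument("--ai-dir", metavar="DIR", help="AI audio (used with --train)")
    return parser


def parse_args(argv=None) -> argparse.Namespace:
    parser = build_parser()
    args = parser.parse_args(argv)
    # Assumption: training needs both labelled directories.
    if args.train and not (args.human_dir and args.ai_dir):
        parser.error("--train requires --human-dir and --ai-dir")  # exits with status 2
    return args
```

For example, `parse_args(["--predict-file", "path/to/audio.wav"]).predict_file` yields the path string; argparse converts dashes in flag names to underscores in attribute names.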
Predicting Single Files
Interactive mode:
ai-audio-detector --interactive
# Choose option 2 and enter the path to your audio file
Direct command:
ai-audio-detector --predict-file path/to/audio.wav
Batch Prediction
Interactive mode:
ai-audio-detector --interactive
# Choose option 3 and enter the directory path
Direct command:
ai-audio-detector --predict-batch path/to/audio/directory
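A batch run first has to decide which files in the directory to analyze. A minimal sketch, assuming selection is by file extension using the formats listed under Supported Audio Formats; the helper name `find_audio_files` is hypothetical, and whether the package searches subdirectories is not stated, so recursion is opt-in here.

```python
from pathlib import Path

# Extensions from the "Supported Audio Formats" list.
SUPPORTED_EXTENSIONS = {".wav", ".mp3", ".flac", ".ogg", ".m4a", ".aac"}


def find_audio_files(directory, recursive=False):
    """Return supported audio files in `directory`, sorted by path.

    Hypothetical helper: recursion into subdirectories is an assumption.
    """
    root = Path(directory)
    candidates = root.rglob("*") if recursive else root.glob("*")
    return sorted(
        p for p in candidates
        # Compare extensions case-insensitively so "clip.MP3" is also picked up.
        if p.is_file() and p.suffix.lower() in SUPPORTED_EXTENSIONS
    )
```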
Advanced Usage
Programmatic Usage
import pandas as pd
from ai_audio_detector import AIAudioDetector
from pathlib import Path
# Initialize detector
detector = AIAudioDetector(base_dir=Path.cwd())
# Train models
human_features = detector.extract_features_from_directory("human_audio/", is_ai_directory=False)
ai_features = detector.extract_features_from_directory("ai_audio/", is_ai_directory=True)
all_features = human_features + ai_features
df_results = pd.DataFrame(all_features)
training_results = detector.train_models(df_results)
# Make predictions
result = detector.predict_file("test_audio.wav")
print(f"Prediction: {'AI' if result['is_ai'] else 'Human'}")
print(f"Confidence: {result['confidence']:.3f}")
Adaptive Learning
The system supports adaptive learning to improve accuracy with new data:
# Add new AI data
detector.add_ai_data("new_ai_audio/", retrain_batch_models=True)
# Add new human data
detector.add_human_data("new_human_audio/", retrain_batch_models=True)
# Add mixed data batch
directories = [
{'path': 'dataset1/', 'is_ai': True},
{'path': 'dataset2/', 'is_ai': False}
]
detector.add_mixed_data_batch(directories, retrain_batch_models=True)
Features Extracted
Benford's Law Features
- Chi-square test statistics
- Kolmogorov-Smirnov test statistics
- Mean absolute deviation from expected distribution
- Maximum deviation
- Entropy measures
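The Benford statistics listed above can be sketched with NumPy and SciPy. Which audio values the package feeds into the test is not stated here; applying it to spectral magnitudes (for example `np.abs(np.fft.rfft(y))`) is an assumption. The KS entry below is a discrete analogue computed on the digit CDFs, not `scipy.stats.kstest`, and the feature names are illustrative.

```python
import numpy as np
from scipy import stats

# Benford's expected first-digit probabilities: P(d) = log10(1 + 1/d) for d = 1..9.
BENFORD = np.log10(1 + 1 / np.arange(1, 10))


def first_digits(values):
    """Leading decimal digit (1-9) of every finite, nonzero value."""
    v = np.abs(np.asarray(values, dtype=float))
    v = v[np.isfinite(v) & (v > 0)]
    exponent = np.floor(np.log10(v))
    # Divide/multiply by an exact power of ten to move each value into [1, 10).
    factor = 10.0 ** np.abs(exponent)
    scaled = np.where(exponent >= 0, v / factor, v * factor)
    # Repair rare float round-off that lands just outside [1, 10).
    scaled = np.where(scaled >= 10, scaled / 10, scaled)
    scaled = np.where(scaled < 1, scaled * 10, scaled)
    return scaled.astype(int)


def benford_features(values):
    counts = np.bincount(first_digits(values), minlength=10)[1:]  # digits 1..9
    n = counts.sum()
    observed = counts / n
    chi2, chi2_p = stats.chisquare(counts, f_exp=BENFORD * n)
    deviation = np.abs(observed - BENFORD)
    # KS-style statistic on the digit CDFs (discrete analogue, not scipy's kstest).
    ks = np.max(np.abs(np.cumsum(observed) - np.cumsum(BENFORD)))
    nonzero = observed[observed > 0]
    return {
        "chi2": chi2,
        "chi2_p": chi2_p,
        "ks": ks,
        "mad": deviation.mean(),
        "max_dev": deviation.max(),
        "entropy": -np.sum(nonzero * np.log2(nonzero)),  # in bits
    }
```

Values spread log-uniformly over several orders of magnitude follow Benford closely (small `mad`), while values drawn uniformly from [1, 10) give roughly equal digit frequencies and a much larger deviation.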
Spectral Features
- Spectral centroid, bandwidth, rolloff
- MFCCs (13 coefficients + standard deviations)
- Chroma features
- Spectral contrast
- Zero crossing rate
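The package uses librosa for these features; the plain-NumPy sketch below only illustrates what centroid, bandwidth, rolloff, and zero crossing rate measure. It treats the whole signal as one Hann-windowed frame (librosa works frame by frame), uses librosa's default 85% rolloff threshold as an assumption, and omits MFCCs, chroma, and spectral contrast.

```python
import numpy as np


def spectral_features(y, sr, roll_percent=0.85):
    """Single-frame spectral descriptors of a mono signal (plain-NumPy sketch)."""
    # Hann-windowed one-sided magnitude spectrum and the frequency (Hz) of each bin.
    mag = np.abs(np.fft.rfft(y * np.hanning(len(y))))
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    weights = mag / mag.sum()
    # Centroid: magnitude-weighted mean frequency ("center of mass" of the spectrum).
    centroid = np.sum(freqs * weights)
    # Bandwidth: magnitude-weighted standard deviation around the centroid.
    bandwidth = np.sqrt(np.sum(weights * (freqs - centroid) ** 2))
    # Rolloff: lowest frequency below which roll_percent of the magnitude lies.
    cumulative = np.cumsum(mag)
    rolloff = freqs[np.searchsorted(cumulative, roll_percent * cumulative[-1])]
    # Zero crossing rate: fraction of adjacent samples whose sign differs.
    signs = np.signbit(y)
    zcr = np.mean(signs[1:] != signs[:-1])
    return {"centroid": centroid, "bandwidth": bandwidth, "rolloff": rolloff, "zcr": zcr}
```

For a 1 kHz sine sampled at 16 kHz, the centroid and rolloff sit near 1000 Hz and the zero crossing rate is close to 2000 / 16000 = 0.125.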
Temporal Features
- RMS energy (mean and standard deviation)
- Tempo estimation
- Spectral flatness
- Dynamic range
- Peak-to-RMS ratio
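A NumPy sketch of the frame-based temporal measures. The frame and hop sizes (2048/512) are assumed, and "dynamic range" is assumed to mean the dB spread between the loudest and quietest frame; the package's exact definitions may differ. Tempo is omitted because it needs a beat tracker such as librosa's `librosa.beat.beat_track`.

```python
import numpy as np


def frame_signal(y, frame_length=2048, hop_length=512):
    """Split a 1-D signal into overlapping frames, one frame per row."""
    y = np.pad(y, (0, max(0, frame_length - len(y))))  # guarantee at least one frame
    n_frames = 1 + (len(y) - frame_length) // hop_length
    idx = np.arange(frame_length)[None, :] + hop_length * np.arange(n_frames)[:, None]
    return y[idx]


def temporal_features(y, eps=1e-10):
    frames = frame_signal(y)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))  # per-frame RMS energy
    # Spectral flatness: geometric mean / arithmetic mean of each frame's power spectrum.
    power = np.abs(np.fft.rfft(frames * np.hanning(frames.shape[1]), axis=1)) ** 2 + eps
    flatness = np.exp(np.mean(np.log(power), axis=1)) / np.mean(power, axis=1)
    rms_db = 20 * np.log10(rms + eps)
    overall_rms = np.sqrt(np.mean(y ** 2))
    return {
        "rms_mean": rms.mean(),
        "rms_std": rms.std(),
        "spectral_flatness": flatness.mean(),
        # Assumed definition: spread between loudest and quietest frame, in dB.
        "dynamic_range_db": rms_db.max() - rms_db.min(),
        # Crest factor: peak amplitude relative to overall RMS, in dB.
        "peak_to_rms_db": 20 * np.log10(np.max(np.abs(y)) / (overall_rms + eps)),
    }
```

As a sanity check, a full-scale sine has a peak-to-RMS ratio of 20·log10(√2) ≈ 3.01 dB, while white noise has a far higher spectral flatness than a pure tone.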
Compression Features
- Estimated bit depth
- Clipping detection
- DC offset
- High frequency content ratio
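The compression-related features can be approximated with simple heuristics. The sketch below is an assumption throughout: bit depth is estimated as the smallest PCM grid (up to 24 bits) that every sample lies on, clipping counts samples at or above 0.999 of full scale, and the 16 kHz high-frequency cutoff is an arbitrary choice aimed at 44.1/48 kHz audio. None of these thresholds come from the package.

```python
import numpy as np


def estimate_bit_depth(y, max_bits=24, tol=1e-6):
    """Smallest bit depth b such that every sample lies on the b-bit PCM grid.

    Returns None when the samples fit no grid up to max_bits (e.g. float audio).
    """
    for bits in range(1, max_bits + 1):
        scaled = np.asarray(y, dtype=float) * 2 ** (bits - 1)
        if np.all(np.abs(scaled - np.round(scaled)) < tol):
            return bits
    return None


def compression_features(y, sr, clip_threshold=0.999, cutoff_hz=16000.0):
    power = np.abs(np.fft.rfft(y)) ** 2
    freqs = np.fft.rfftfreq(len(y), d=1.0 / sr)
    return {
        "bit_depth": estimate_bit_depth(y),
        # Fraction of samples at (or numerically near) full scale.
        "clipping_ratio": np.mean(np.abs(y) >= clip_threshold),
        "dc_offset": np.mean(y),
        # Share of spectral power above the (assumed) cutoff frequency.
        "hf_ratio": power[freqs >= cutoff_hz].sum() / power.sum(),
    }
```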
Model Architecture
The system uses an ensemble of four different models:
- Incremental Models (for adaptive learning):
  - SGD Classifier with log loss
  - Passive Aggressive Classifier
- Batch Models (for maximum accuracy):
  - Random Forest (200 estimators)
  - Gradient Boosting (200 estimators)
All features are standardized using StandardScaler, and final predictions use ensemble averaging.
Configuration
Modify config.yaml to customize:
- Model parameters
- Feature extraction settings
- Processing options
- Output directories
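Settings from config.yaml are typically overlaid on built-in defaults. A minimal sketch of that merge step; every key name below is hypothetical (the real schema of config.yaml is not documented here), and the default values simply echo numbers stated elsewhere in this README.

```python
import copy

# Hypothetical defaults: the actual config.yaml keys are not documented here.
DEFAULTS = {
    "models": {
        "random_forest": {"n_estimators": 200},
        "gradient_boosting": {"n_estimators": 200},
    },
    "processing": {"parallel_threshold": 3},
    "output": {"models_dir": "models", "spectrogram_dir": "spectrograms"},
}


def merge_config(defaults, overrides):
    """Recursively overlay user settings on defaults without mutating either."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = merge_config(merged[key], value)  # merge nested sections
        else:
            merged[key] = copy.deepcopy(value)  # scalars and lists replace outright
    return merged
```

With PyYAML installed, the overrides could come from `yaml.safe_load(open("config.yaml")) or {}`; only the keys a user sets are replaced, and everything else keeps its default.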
Interactive Menu Options
Running ai-audio-detector --interactive presents the following options:
- Train new models - Initial training from audio directories
- Predict single file - Analyze one audio file
- Predict batch - Analyze all files in a directory
- Update models - Adaptive learning with new data
- Add AI data - Add new AI samples to existing models
- Add Human data - Add new human samples to existing models
- Batch directories - Add multiple directories at once
- Training history - View model training history
- Data balance - Check AI vs Human data balance
- Create visualizations - Generate analysis plots
- Generate spectrograms - Create spectrograms for audio files
- Spectrogram comparison - Compare AI vs Human spectrograms
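The spectrogram options can be illustrated with a NumPy short-time Fourier transform. This sketch is not the package's implementation: the FFT size, hop length, and 80 dB display floor are assumed values, and comparing two files by the difference of their time-averaged spectra is one hypothetical way to quantify an AI-vs-human comparison.

```python
import numpy as np


def stft_db(y, sr, n_fft=2048, hop_length=512, top_db=80.0):
    """Log-magnitude spectrogram in dB relative to its peak, shape (n_fft//2 + 1, n_frames)."""
    y = np.pad(y, (0, max(0, n_fft - len(y))))  # at least one full frame
    n_frames = 1 + (len(y) - n_fft) // hop_length
    idx = np.arange(n_fft)[None, :] + hop_length * np.arange(n_frames)[:, None]
    mag = np.abs(np.fft.rfft(y[idx] * np.hanning(n_fft), axis=1)).T  # (freq, time)
    ref = max(mag.max(), 1e-10)
    db = 20 * np.log10(np.maximum(mag, 1e-10) / ref)
    freqs = np.fft.rfftfreq(n_fft, d=1.0 / sr)
    times = np.arange(n_frames) * hop_length / sr
    return np.maximum(db, -top_db), freqs, times  # floor at -top_db for display


def mean_spectrum_difference(db_a, db_b):
    """Per-frequency gap (dB) between two spectrograms' time-averaged spectra.

    Both must share n_fft and sample rate so their frequency bins line up.
    """
    return db_a.mean(axis=1) - db_b.mean(axis=1)
```

To view the result, matplotlib's `plt.imshow(db, origin="lower", aspect="auto")` puts low frequencies at the bottom.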
Output Files
- models/ai_audio_detector.joblib - Trained models and metadata
- training_results.csv - Detailed training data and features
- ai_detection_analysis.png - Visualization plots
- spectrograms/ - Generated spectrogram images
- spectrogram_comparisons/ - Side-by-side comparisons
Performance Considerations
- Multiprocessing: Automatically used for batches of more than 3 files
- Memory Management: Spectrograms are generated efficiently with proper cleanup
- Scalability: Incremental learning allows handling large datasets over time
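The "more than 3 files" rule can be sketched with the standard library's process pool. This is a hypothetical illustration, not the package's code: the function names are invented, and `worker` stands in for whatever per-file analysis is run. With spawn-based start methods (the default on Windows and macOS), the calling script must create the pool under an `if __name__ == "__main__":` guard, and `worker` must be a picklable top-level function.

```python
from concurrent.futures import ProcessPoolExecutor

PARALLEL_THRESHOLD = 3  # use a process pool only for batches larger than this


def use_multiprocessing(n_files, threshold=PARALLEL_THRESHOLD):
    return n_files > threshold


def process_batch(paths, worker, max_workers=None):
    """Apply `worker` to every path, in input order.

    Small batches run sequentially, avoiding process start-up overhead.
    """
    paths = list(paths)
    if not use_multiprocessing(len(paths)):
        return [worker(p) for p in paths]
    with ProcessPoolExecutor(max_workers=max_workers) as pool:
        # pool.map preserves input order in its results.
        return list(pool.map(worker, paths))
```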
Requirements
- Python 3.7+
- librosa (audio processing)
- scikit-learn (machine learning)
- pandas, numpy (data manipulation)
- matplotlib (visualization)
- scipy (statistical tests)
Contributing
- Fork the repository
- Create a feature branch
- Make your changes
- Add tests if applicable
- Submit a pull request
License
This project is licensed under the MIT License - see the LICENSE file for details.
Acknowledgments
- Uses Benford's Law for detecting artificial patterns in audio
- Built on librosa for robust audio feature extraction
- Employs scikit-learn for machine learning capabilities
Citation
If you use this work in your research, please cite:
@software{ai_audio_detector,
title={AI Audio Detector: Machine Learning System for Detecting AI-Generated Audio},
author={Alex Price},
year={2025},
url={https://github.com/ajprice16/AI_Audio_Detection}
}
File details
Details for the file ai_audio_detector-1.1.0.tar.gz.
File metadata
- Download URL: ai_audio_detector-1.1.0.tar.gz
- Upload date:
- Size: 35.1 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | be9bf784b0030a6db23d659eb828f71aa46fcafcfda7401c8fb45079c291700a |
| MD5 | 404e37e2bf19699ad778fe7bf54b157d |
| BLAKE2b-256 | 03289d0c2d3d02438e73b8f58193b18b02ac2b321501d944eeb557da09e7f3ce |
File details
Details for the file ai_audio_detector-1.1.0-py3-none-any.whl.
File metadata
- Download URL: ai_audio_detector-1.1.0-py3-none-any.whl
- Upload date:
- Size: 34.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.9.23
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 5cca6a3005a3618b5f2b50338e2fdcf15e7b3c1d1c898bc91ce201cd96135598 |
| MD5 | 172d33b00598346a258765f58f84d4d2 |
| BLAKE2b-256 | fdb8764da42b101f30737518a5a2467b660e8eecb19168ad7c19e46acb7431ad |