AI Text Detector
An English AI-generated text detection library using statistical features and machine learning.
Overview
ai-text-detector is a Python library that detects whether English text is likely AI-generated or human-written. It uses 60 statistical features and an XGBoost classifier trained on the RAID benchmark dataset.
Key Features:
- English text detection
- Fast and lightweight detection
- Pre-trained model included
- Simple, intuitive API
- Batch processing support
Installation
```shell
pip install ai-text-detector
```
Requirements
- Python >= 3.7
- numpy
- scipy
- xgboost
- scikit-learn
Quick Start
```python
from ai_detector import AITextDetector

# Initialize detector
detector = AITextDetector()

# Detect AI-generated text
text = """
Artificial intelligence is transforming the world in unprecedented ways.
Machine learning algorithms are becoming increasingly sophisticated,
enabling computers to perform tasks that were once thought to be
exclusively human.
"""

result = detector.detect(text)
print(f"AI Probability: {result['ai_probability']:.2%}")
print(f"Label: {result['label']}")
print(f"Confidence: {result['confidence']}")
```
Output:
```
AI Probability: 85.32%
Label: AI
Confidence: high
```
Usage
Basic Detection
```python
from ai_detector import AITextDetector

detector = AITextDetector()
result = detector.detect("Your English text here...")

if result['is_ai']:
    print(f"This text is likely AI-generated ({result['ai_probability']:.2%})")
else:
    print(f"This text is likely human-written ({result['ai_probability']:.2%})")
```
Getting AI Score Only
```python
from ai_detector import AITextDetector

detector = AITextDetector()
ai_score = detector.get_ai_score("Your English text here...")
print(f"AI probability: {ai_score:.2%}")
```
Boolean Classification
```python
from ai_detector import AITextDetector

detector = AITextDetector()
if detector.is_ai_generated("Your English text here..."):
    print("This is AI-generated text")
```
Batch Processing
```python
from ai_detector import AITextDetector

detector = AITextDetector()
texts = ["Text 1...", "Text 2...", "Text 3..."]
results = detector.detect_batch(texts)

# One result per input text, in the same order
for text, result in zip(texts, results):
    print(f"{text[:50]}... -> {result['label']} ({result['ai_probability']:.2%})")
```
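Since `detect_batch` yields one result dictionary per input (as the `zip` above assumes), a batch can be summarized using only the documented fields. The following is a minimal sketch; the helper `summarize_batch` and the sample dictionaries are hypothetical and stand in for real `detect_batch` output.

```python
from collections import Counter


def summarize_batch(results):
    """Count labels and collect the indices of low-confidence results.

    Hypothetical helper: assumes each result is a dict carrying the
    'label' and 'confidence' fields described under Result Format.
    """
    label_counts = Counter(r["label"] for r in results)
    low_confidence = [i for i, r in enumerate(results) if r["confidence"] == "low"]
    return label_counts, low_confidence


# Hypothetical results, shaped like detect_batch output
results = [
    {"ai_probability": 0.91, "is_ai": True, "confidence": "high", "label": "AI"},
    {"ai_probability": 0.42, "is_ai": False, "confidence": "low", "label": "Human"},
    {"ai_probability": 0.12, "is_ai": False, "confidence": "high", "label": "Human"},
]
counts, uncertain = summarize_batch(results)
print(dict(counts))  # {'AI': 1, 'Human': 2}
print(uncertain)     # [1]
```

Low-confidence entries are good candidates for manual review rather than automatic labeling.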
Custom Parameters
```python
from ai_detector import AITextDetector

# Adjust decision threshold (default: 0.5)
detector = AITextDetector(threshold=0.7)

# Set minimum character length (default: 100)
detector = AITextDetector(min_chars=50)

# Both parameters
detector = AITextDetector(threshold=0.7, min_chars=50)
```
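The threshold is the cutoff applied to `ai_probability` when deciding `is_ai` and `label`. Below is a minimal sketch of that decision rule. It assumes a text is labeled AI when its probability is at or above the threshold; whether the library uses `>=` or `>` at the boundary is not documented. `apply_threshold` is a hypothetical name.

```python
def apply_threshold(ai_probability, threshold=0.5):
    """Map a probability to (is_ai, label).

    Assumption: probabilities at or above the threshold count as AI.
    """
    is_ai = ai_probability >= threshold
    return is_ai, "AI" if is_ai else "Human"


# The same probability flips label when the threshold is raised
print(apply_threshold(0.6))                 # (True, 'AI')
print(apply_threshold(0.6, threshold=0.7))  # (False, 'Human')
```

Raising the threshold reduces the chance of flagging human-written text as AI, at the cost of missing more AI-generated text.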
Result Format
The detect() method returns a dictionary with the following fields:
| Field | Type | Description |
|---|---|---|
| ai_probability | float | Probability (0-1) that text is AI-generated |
| is_ai | bool | True if classified as AI (based on threshold) |
| confidence | str | 'high', 'medium', or 'low' confidence |
| label | str | 'AI' or 'Human' classification |
| warning | str (optional) | Warning if text is too short |
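For type-checked code, the fields above can be described with a `TypedDict` (Python 3.8+). This is a sketch derived from the table, not a type exported by `ai_detector`: `DetectionResult` and `describe` are hypothetical names, and `warning` is modeled as an optional key because it only appears for short texts.

```python
from typing import TypedDict


class _RequiredFields(TypedDict):
    ai_probability: float  # probability (0-1) that the text is AI-generated
    is_ai: bool            # True if classified as AI (based on threshold)
    confidence: str        # 'high', 'medium', or 'low'
    label: str             # 'AI' or 'Human'


class DetectionResult(_RequiredFields, total=False):
    warning: str           # optional: present only when the text is too short


def describe(result: DetectionResult) -> str:
    """Render a one-line summary, appending the warning if present."""
    line = f"{result['label']} ({result['ai_probability']:.2%}, {result['confidence']} confidence)"
    if "warning" in result:
        line += f" [warning: {result['warning']}]"
    return line


example: DetectionResult = {
    "ai_probability": 0.8532,
    "is_ai": True,
    "confidence": "high",
    "label": "AI",
}
print(describe(example))  # AI (85.32%, high confidence)
```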
Confidence Levels
The confidence level reflects how far ai_probability lies from the decision threshold:
- high: differs from the threshold by more than 0.3
- medium: differs from the threshold by 0.15 to 0.3
- low: differs from the threshold by less than 0.15
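These bands can be expressed as a function of the absolute distance between `ai_probability` and the threshold. A minimal sketch; the name `confidence_level` is hypothetical, and how the library treats a distance of exactly 0.15 or 0.3 is not documented (here both boundaries fall into "medium").

```python
def confidence_level(ai_probability, threshold=0.5):
    """Return 'high', 'medium', or 'low' based on |probability - threshold|."""
    distance = abs(ai_probability - threshold)
    if distance > 0.3:
        return "high"
    if distance >= 0.15:
        return "medium"
    return "low"


print(confidence_level(0.9))   # high   (distance 0.4)
print(confidence_level(0.7))   # medium (distance 0.2)
print(confidence_level(0.55))  # low    (distance 0.05)
```

Because the distance is measured from the threshold, changing `threshold` also shifts these bands: with `threshold=0.7`, a probability of 0.9 is only "medium".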
Text Length Recommendations
For best results, use English text with at least 100 characters. Shorter texts will return a warning and low confidence.
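A minimal sketch of such a length check, assuming the detector compares `len(text)` against `min_chars` (default 100) and attaches a warning when the text is shorter. The helper name and the warning wording are hypothetical; the library's actual check and message may differ.

```python
from typing import Optional


def length_warning(text: str, min_chars: int = 100) -> Optional[str]:
    """Return a warning string if text is shorter than min_chars, else None.

    Hypothetical helper: the comparison (len(text) < min_chars) and the
    message wording are assumptions, not the library's implementation.
    """
    if len(text) < min_chars:
        return f"Text has {len(text)} characters; at least {min_chars} recommended."
    return None


print(length_warning("Too short to judge."))  # Text has 19 characters; at least 100 recommended.
print(length_warning("x" * 150))              # None
```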
Model Information
Get information about the loaded model:
```python
from ai_detector import AITextDetector

detector = AITextDetector()
info = detector.get_model_info()
print(f"Model version: {info['version']}")
print(f"Model AUC: {info['auc']:.4f}")
print(f"Model accuracy: {info['accuracy']:.4f}")
```
Model Details
- Features: 60 statistical features (compression ratio, entropy, burstiness, etc.)
- Training Data: RAID benchmark (English text from multiple domains)
- Algorithm: XGBoost classifier
- Language Support: English text only
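The full list of 60 features is not given here. As an illustration only, the sketch below implements common textbook definitions of the three named feature families using the standard library; these are not necessarily the formulas ai_text_detector computes. Burstiness is taken here as the coefficient of variation of sentence lengths, which is one of several definitions in use.

```python
import math
import re
import statistics
import zlib
from collections import Counter


def compression_ratio(text: str) -> float:
    """Compressed size / raw size in UTF-8 bytes; repetitive text compresses more."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw)) / len(raw)


def char_entropy(text: str) -> float:
    """Shannon entropy of the character distribution, in bits per character."""
    if not text:
        return 0.0
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in Counter(text).values())


def burstiness(text: str) -> float:
    """Coefficient of variation (stdev / mean) of sentence lengths in words."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.stdev(lengths) / statistics.mean(lengths)


sample = "Short one. This sentence is quite a bit longer than the first. Tiny."
print(f"compression ratio: {compression_ratio(sample):.3f}")
print(f"entropy (bits/char): {char_entropy(sample):.3f}")
print(f"burstiness: {burstiness(sample):.3f}")
```

Features like these are computed per text and stacked into a fixed-length vector that the classifier scores.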
Limitations
- English Only: This model is trained on English text and works best with English content
- Text Length: Short texts (< 100 characters) have unreliable results
- Domain Specific: Model trained on general text; accuracy on specialized domains may vary
- Evolving AI: As AI models improve, detection accuracy may decrease
Contributing
Contributions are welcome! Please feel free to submit issues or pull requests.
License
This project is licensed under the MIT License.
Citation
If you use this library in your research, please cite:
```bibtex
@software{ai_text_detector,
  title={AI Text Detector: English AI-Generated Text Detection},
  author={Your Name},
  year={2025},
  url={https://github.com/yourusername/ai-text-detector}
}
```
References
This model is based on research using the RAID benchmark:
- Dugan et al., "RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors," ACL 2024
Support
For issues, questions, or contributions, please visit the GitHub repository.
Download files
File details
Details for the file ai_text_detector-1.0.1.tar.gz.
File metadata
- Download URL: ai_text_detector-1.0.1.tar.gz
- Upload date:
- Size: 3.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | edf4d54dae182b895d5733a4dcb419c5f112f17082d7099459db1ba5ea2958df |
| MD5 | 25b93949535f0d6dc1e11eed2a116a7f |
| BLAKE2b-256 | d9ea664a6f5039fa8523468c796e2ca3b480de5703af4a3c129ab86dfac5ae84 |
File details
Details for the file ai_text_detector-1.0.1-py3-none-any.whl.
File metadata
- Download URL: ai_text_detector-1.0.1-py3-none-any.whl
- Upload date:
- Size: 3.1 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3c47999489c0e62af75fb8e2f7935448071d1ae588c9306212d20e79d5bf5f1e |
| MD5 | e962006fbda68b1768cddfdeba7df85f |
| BLAKE2b-256 | 37477c7f4f343602e57ba649d6a483c443a2fa58f0417a6765f081fcc4195b87 |