AI Text Detector

An English AI-generated text detection library using statistical features and machine learning.

Overview

ai-text-detector is a Python library that detects whether English text is likely AI-generated or human-written. It uses 60 statistical features and an XGBoost classifier trained on the RAID benchmark dataset.

Key Features:

  • English text detection
  • Fast and lightweight detection
  • Pre-trained model included
  • Simple, intuitive API
  • Batch processing support

Installation

pip install ai-text-detector

Requirements

  • Python >= 3.7
  • numpy
  • scipy
  • xgboost
  • scikit-learn

Quick Start

from ai_detector import AITextDetector

# Initialize detector
detector = AITextDetector()

# Detect AI-generated text
text = """
Artificial intelligence is transforming the world in unprecedented ways.
Machine learning algorithms are becoming increasingly sophisticated,
enabling computers to perform tasks that were once thought to be
exclusively human.
"""

result = detector.detect(text)

print(f"AI Probability: {result['ai_probability']:.2%}")
print(f"Label: {result['label']}")
print(f"Confidence: {result['confidence']}")

Example output (actual values will vary):

AI Probability: 85.32%
Label: AI
Confidence: high

Usage

Basic Detection

from ai_detector import AITextDetector

detector = AITextDetector()
result = detector.detect("Your English text here...")

if result['is_ai']:
    print(f"This text is likely AI-generated ({result['ai_probability']:.2%})")
else:
    print(f"This text is likely human-written ({result['ai_probability']:.2%})")

Getting AI Score Only

from ai_detector import AITextDetector

detector = AITextDetector()
ai_score = detector.get_ai_score("Your English text here...")
print(f"AI probability: {ai_score:.2%}")

Boolean Classification

from ai_detector import AITextDetector

detector = AITextDetector()
if detector.is_ai_generated("Your English text here..."):
    print("This is AI-generated text")

Batch Processing

from ai_detector import AITextDetector

detector = AITextDetector()
texts = ["Text 1...", "Text 2...", "Text 3..."]
results = detector.detect_batch(texts)

for text, result in zip(texts, results):
    print(f"{text[:50]}... -> {result['label']} ({result['ai_probability']:.2%})")
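
Because each result is a plain dictionary, batch output can be summarized with ordinary Python. The following is a minimal sketch and not part of the library: summarize_batch is a hypothetical helper, and the sample dictionaries (including the warning text) are hand-written stand-ins for detect_batch() output that follow the fields documented under Result Format.

```python
from collections import Counter


def summarize_batch(results):
    """Count labels and collect indices of results that carry a warning.

    Hypothetical helper; `results` is assumed to be a list of dicts shaped
    like the ones documented under Result Format.
    """
    label_counts = Counter(r["label"] for r in results)
    warned = [i for i, r in enumerate(results) if "warning" in r]
    return dict(label_counts), warned


# Hand-written sample data standing in for detect_batch() output.
sample_results = [
    {"ai_probability": 0.91, "is_ai": True, "confidence": "high", "label": "AI"},
    {"ai_probability": 0.12, "is_ai": False, "confidence": "high", "label": "Human"},
    {"ai_probability": 0.55, "is_ai": True, "confidence": "low", "label": "AI",
     "warning": "text is too short"},  # illustrative warning text
]

counts, warned = summarize_batch(sample_results)
print(counts)
print(warned)
```

Results listed in warned are the ones worth re-checking with longer input before acting on the label.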

Custom Parameters

from ai_detector import AITextDetector

# Adjust decision threshold (default: 0.5)
detector = AITextDetector(threshold=0.7)

# Set minimum character length (default: 100)
detector = AITextDetector(min_chars=50)

# Both parameters
detector = AITextDetector(threshold=0.7, min_chars=50)
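
To see what raising the threshold does, the sketch below applies several thresholds to one fixed probability. It does not call the library: label_for is a hypothetical helper, and the comparison probability >= threshold is an assumption (the library may use a strict > instead).

```python
def label_for(ai_probability, threshold=0.5):
    """Map a probability to 'AI' or 'Human' (assumed rule: >= threshold)."""
    return "AI" if ai_probability >= threshold else "Human"


probability = 0.62  # illustrative value, not real detector output
for threshold in (0.5, 0.6, 0.7):
    print(f"threshold={threshold}: {label_for(probability, threshold)}")
```

A higher threshold flags less text as AI, which reduces false AI verdicts at the cost of missing more AI-generated text.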

Result Format

The detect() method returns a dictionary with the following fields:

Field            Type             Description
ai_probability   float            Probability (0-1) that the text is AI-generated
is_ai            bool             True if classified as AI (based on threshold)
confidence       str              'high', 'medium', or 'low'
label            str              'AI' or 'Human'
warning          str (optional)   Warning if the text is too short

Confidence Levels

  • high: Probability differs from threshold by > 0.3
  • medium: Probability differs from threshold by 0.15-0.3
  • low: Probability differs from threshold by < 0.15
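
The rules above can be sketched as a small function. This is an illustration of the documented rules, not the library's code: confidence_level is a hypothetical name, and placing distances of exactly 0.15 or 0.3 in the medium band is an assumption, since the list does not say how boundary values are handled.

```python
def confidence_level(ai_probability, threshold=0.5):
    """Return 'high', 'medium', or 'low' from the distance to the threshold.

    Assumption: distances of exactly 0.15 or 0.3 count as 'medium'.
    """
    distance = abs(ai_probability - threshold)
    if distance > 0.3:
        return "high"
    if distance >= 0.15:
        return "medium"
    return "low"


# Illustrative probabilities against the default threshold of 0.5.
for p in (0.85, 0.70, 0.55):
    print(p, confidence_level(p))
```

Since confidence depends on the distance to the threshold, changing the threshold changes the confidence assigned to the same probability.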

Text Length Recommendations

For best results, use English text of at least 100 characters (the default min_chars). Shorter texts still return a result, but it carries a warning and low confidence.

Model Information

Get information about the loaded model:

from ai_detector import AITextDetector

detector = AITextDetector()
info = detector.get_model_info()
print(f"Model version: {info['version']}")
print(f"Model AUC: {info['auc']:.4f}")
print(f"Model accuracy: {info['accuracy']:.4f}")

Model Details

  • Features: 60 statistical features (compression ratio, entropy, burstiness, etc.)
  • Training Data: RAID benchmark (English text from multiple domains)
  • Algorithm: XGBoost classifier
  • Language Support: English text only
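
To give a feel for the kind of statistical features named above, here is a minimal sketch of three of them using only the standard library. The exact definitions the library uses are not documented here, so every function below is an assumption: compression ratio via zlib, character-level Shannon entropy, and burstiness taken as the coefficient of variation of sentence lengths.

```python
import math
import re
import statistics
import zlib
from collections import Counter


def compression_ratio(text):
    """Compressed size divided by raw size (lower means more repetitive)."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw)) / len(raw)


def char_entropy(text):
    """Shannon entropy of the character distribution, in bits per character."""
    counts = Counter(text)
    total = len(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def burstiness(text):
    """Coefficient of variation of sentence lengths in words (assumed definition)."""
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    return statistics.pstdev(lengths) / statistics.mean(lengths)


sample = "Short one. Then a noticeably longer sentence follows it here. Done."
print(f"compression ratio: {compression_ratio(sample):.3f}")
print(f"entropy: {char_entropy(sample):.3f} bits/char")
print(f"burstiness: {burstiness(sample):.3f}")
```

A classifier such as XGBoost would take a vector of values like these as input; the sketch covers only the feature side and does not train or load a model.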

Limitations

  1. English Only: This model is trained on English text and works best with English content
  2. Text Length: Short texts (< 100 characters) produce unreliable results
  3. Domain Specific: The model is trained on general text; accuracy on specialized domains may vary
  4. Evolving AI: As AI models improve, detection accuracy may decrease

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

This project is licensed under the MIT License.

Citation

If you use this library in your research, please cite:

@software{ai_text_detector,
  title={AI Text Detector: English AI-Generated Text Detection},
  author={Your Name},
  year={2025},
  url={https://github.com/yourusername/ai-text-detector}
}

References

This model is based on research using the RAID benchmark:

  • RAID: A Shared Benchmark for Robust Evaluation of Machine-Generated Text Detectors (ACL 2024)

Support

For issues, questions, or contributions, please visit the GitHub repository.
