An English AI-generated text detection library

AI Text Detector

An English AI-generated text detection library using statistical features and machine learning.

Overview

ai-text-detector is a Python library that detects whether English text is likely AI-generated or human-written. It uses 60 statistical features and an XGBoost classifier trained on the RAID benchmark dataset.

Key Features:

  • English text detection
  • Fast and lightweight detection
  • Pre-trained model included
  • Simple, intuitive API
  • Batch processing support

Installation

pip install ai-text-detector

Requirements

  • Python >= 3.7
  • numpy
  • scipy
  • xgboost
  • scikit-learn

Quick Start

from ai_detector import AITextDetector

# Initialize detector
detector = AITextDetector()

# Detect AI-generated text
text = """
Artificial intelligence is transforming the world in unprecedented ways.
Machine learning algorithms are becoming increasingly sophisticated,
enabling computers to perform tasks that were once thought to be
exclusively human.
"""

result = detector.detect(text)

print(f"AI Probability: {result['ai_probability']:.2%}")
print(f"Label: {result['label']}")
print(f"Confidence: {result['confidence']}")

Example output (actual values depend on the input text):

AI Probability: 85.32%
Label: AI
Confidence: high

Usage

Basic Detection

from ai_detector import AITextDetector

detector = AITextDetector()
result = detector.detect("Your English text here...")

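# ai_probability is the probability of AI authorship in both branches
if result['is_ai']:
    print(f"This text is likely AI-generated (AI probability: {result['ai_probability']:.2%})")
else:
    print(f"This text is likely human-written (AI probability: {result['ai_probability']:.2%})")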

Getting AI Score Only

from ai_detector import AITextDetector

detector = AITextDetector()
ai_score = detector.get_ai_score("Your English text here...")
print(f"AI probability: {ai_score:.2%}")

Boolean Classification

from ai_detector import AITextDetector

detector = AITextDetector()
if detector.is_ai_generated("Your English text here..."):
    print("This text is likely AI-generated")

Batch Processing

from ai_detector import AITextDetector

detector = AITextDetector()
texts = ["Text 1...", "Text 2...", "Text 3..."]
results = detector.detect_batch(texts)

# Results are paired with their input texts by position
for text, result in zip(texts, results):
    print(f"{text[:50]}... -> {result['label']} ({result['ai_probability']:.2%})")
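Batch results are often easier to read as a summary. The sketch below is not part of the library: summarize_results is a hypothetical helper that only relies on the documented result fields (label, ai_probability), and the sample dictionaries are hand-written stand-ins for detect_batch() output rather than real model scores.

```python
from collections import Counter
from statistics import mean


def summarize_results(results):
    """Summarize a list of result dicts shaped like detect() output."""
    if not results:
        return {"count": 0, "labels": {}, "mean_ai_probability": None}
    labels = Counter(r["label"] for r in results)
    return {
        "count": len(results),
        "labels": dict(labels),
        "mean_ai_probability": mean(r["ai_probability"] for r in results),
    }


# Hand-written stand-ins for detect_batch() output, not real model scores.
sample_results = [
    {"ai_probability": 0.9, "is_ai": True, "confidence": "high", "label": "AI"},
    {"ai_probability": 0.2, "is_ai": False, "confidence": "medium", "label": "Human"},
    {"ai_probability": 0.7, "is_ai": True, "confidence": "medium", "label": "AI"},
]

summary = summarize_results(sample_results)
print(summary["labels"], f"{summary['mean_ai_probability']:.2f}")
```

In real use, the list returned by detector.detect_batch(texts) would be passed in place of sample_results.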

Custom Parameters

from ai_detector import AITextDetector

# Adjust decision threshold (default: 0.5)
detector = AITextDetector(threshold=0.7)

# Set minimum character length (default: 100)
detector = AITextDetector(min_chars=50)

# Both parameters
detector = AITextDetector(threshold=0.7, min_chars=50)
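To see what a higher threshold does, the following sketch applies two thresholds to the same scores. It does not call the library: classify is a hypothetical stand-in, the scores are illustrative, and the assumption that a score exactly at the threshold counts as AI is a guess, since the README does not say how ties are handled.

```python
def classify(ai_probability, threshold=0.5):
    """Map a probability to a label.

    Assumption: a score at or above the threshold counts as AI.
    """
    return "AI" if ai_probability >= threshold else "Human"


# Illustrative scores, not real model output.
scores = [0.35, 0.55, 0.65, 0.85]

for threshold in (0.5, 0.7):
    labels = [classify(score, threshold) for score in scores]
    print(f"threshold={threshold}: {labels}")
```

Raising the threshold from 0.5 to 0.7 moves the two middle scores from AI to Human, which trades missed AI text for fewer false positives on human text.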

Result Format

The detect() method returns a dictionary with the following fields:

Field           Type   Description
ai_probability  float  Probability (0-1) that the text is AI-generated
is_ai           bool   True if classified as AI (based on threshold)
confidence      str    'high', 'medium', or 'low'
label           str    'AI' or 'Human'
warning         str    Optional; present if the text is too short
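For code that consumes these dictionaries, a typed description can help. This is a sketch only: DetectionResult and check_result are hypothetical names that the library does not export, the field list is taken from the table, and typing.TypedDict needs Python 3.8 or later.

```python
from numbers import Real
from typing import TypedDict


class _RequiredFields(TypedDict):
    ai_probability: float
    is_ai: bool
    confidence: str
    label: str


class DetectionResult(_RequiredFields, total=False):
    # Optional field, documented as present only for short texts.
    warning: str


def check_result(result):
    """Return a list of problems found in a result dict (empty if none)."""
    problems = []
    for key in ("ai_probability", "is_ai", "confidence", "label"):
        if key not in result:
            problems.append(f"missing field: {key}")
    prob = result.get("ai_probability")
    if isinstance(prob, Real) and not 0 <= prob <= 1:
        problems.append("ai_probability outside 0-1")
    if "confidence" in result and result["confidence"] not in ("high", "medium", "low"):
        problems.append("unexpected confidence value")
    if "label" in result and result["label"] not in ("AI", "Human"):
        problems.append("unexpected label value")
    return problems


# Hand-written example, not real model output.
example: DetectionResult = {
    "ai_probability": 0.85,
    "is_ai": True,
    "confidence": "high",
    "label": "AI",
}
print(check_result(example))
```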

Confidence Levels

  • high: Probability is more than 0.3 away from the threshold
  • medium: Probability is 0.15-0.3 away from the threshold
  • low: Probability is less than 0.15 away from the threshold
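The three levels can be sketched as a function of the distance between the probability and the threshold. This is an illustration rather than the library's code: confidence_level is a hypothetical name, and treating distances of exactly 0.15 and 0.3 as medium is an assumption, since the list leaves the boundaries open.

```python
def confidence_level(ai_probability, threshold=0.5):
    """Bucket a score by its distance from the decision threshold.

    Assumption: distances of exactly 0.15 and 0.3 fall in 'medium'.
    """
    distance = abs(ai_probability - threshold)
    if distance > 0.3:
        return "high"
    if distance >= 0.15:
        return "medium"
    return "low"


# Illustrative scores, not real model output.
for score in (0.9, 0.7, 0.55, 0.1):
    print(score, confidence_level(score))
```

Because the distance is measured from the threshold, the same probability can land in a different bucket after the threshold is changed: 0.9 is high with the default threshold of 0.5 but only medium with threshold=0.7.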

Text Length Recommendations

For best results, use English text with at least 100 characters (the default min_chars). For shorter texts, the result includes a warning and low confidence.
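One way to act on the warning is to separate flagged results before using them. The helper below is a hypothetical sketch that depends only on the documented warning field; the warning string in the sample data is made up for illustration and is not the library's actual message.

```python
def split_by_warning(texts, results):
    """Split (text, result) pairs into reliable and flagged lists."""
    reliable, flagged = [], []
    for text, result in zip(texts, results):
        # .get() covers both a missing key and an empty or None value.
        if result.get("warning"):
            flagged.append((text, result))
        else:
            reliable.append((text, result))
    return reliable, flagged


# Hand-written stand-ins; the warning text is invented for illustration.
sample_texts = ["A long enough paragraph...", "Too short"]
sample_results = [
    {"ai_probability": 0.82, "is_ai": True, "confidence": "high", "label": "AI"},
    {"ai_probability": 0.51, "is_ai": True, "confidence": "low", "label": "AI",
     "warning": "text is too short"},
]

reliable, flagged = split_by_warning(sample_texts, sample_results)
print(len(reliable), "reliable,", len(flagged), "flagged")
```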

Model Information

Get information about the loaded model:

from ai_detector import AITextDetector

detector = AITextDetector()
info = detector.get_model_info()
print(f"Model version: {info['version']}")
print(f"Model AUC: {info['auc']:.4f}")
print(f"Model accuracy: {info['accuracy']:.4f}")

Model Details

  • Features: 60 statistical features (compression ratio, entropy, burstiness, etc.)
  • Training Data: RAID benchmark (English text from multiple domains)
  • Algorithm: XGBoost classifier
  • Language Support: English text only
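To give a feel for the kind of statistical features named above, here is a minimal standard-library sketch of three of them. These are common textbook definitions chosen for illustration, not the library's implementation: the function names are hypothetical, the 60 actual features are not documented here, and the burstiness formula (sigma - mu) / (sigma + mu) over sentence lengths is an assumption.

```python
import math
import re
import zlib
from collections import Counter
from statistics import mean, pstdev


def compression_ratio(text):
    """Compressed size divided by raw size; repetitive text scores lower."""
    raw = text.encode("utf-8")
    if not raw:
        return 0.0
    return len(zlib.compress(raw)) / len(raw)


def char_entropy(text):
    """Shannon entropy of the character distribution, in bits."""
    if not text:
        return 0.0
    total = len(text)
    counts = Counter(text)
    return -sum((n / total) * math.log2(n / total) for n in counts.values())


def burstiness(text):
    """(sigma - mu) / (sigma + mu) over sentence lengths in words.

    Ranges from -1 (all sentences the same length) towards 1 (highly uneven).
    """
    sentences = [s for s in re.split(r"[.!?]+", text) if s.strip()]
    lengths = [len(s.split()) for s in sentences]
    if len(lengths) < 2:
        return 0.0
    mu, sigma = mean(lengths), pstdev(lengths)
    return (sigma - mu) / (sigma + mu)


sample = "Short one. This sentence is quite a bit longer than the first. Tiny."
print(compression_ratio(sample), char_entropy(sample), burstiness(sample))
```

A classifier such as XGBoost would take a vector of values like these as input; which features matter, and how they are scaled, is determined by the trained model.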

Limitations

  1. English Only: This model is trained on English text and works best with English content
  2. Text Length: Short texts (< 100 characters) have unreliable results
  3. Domain Specific: The model is trained on general text; accuracy on specialized domains may vary
  4. Evolving AI: As AI models improve, detection accuracy may decrease

Contributing

Contributions are welcome! Please feel free to submit issues or pull requests.

License

This project is licensed under the MIT License.

Citation

If you use this library in your research, please cite:

@software{ai_text_detector,
  title={AI Text Detector: English AI-Generated Text Detection},
  author={Your Name},
  year={2025},
  url={https://github.com/yourusername/ai-text-detector}
}

References

This model is based on research using the RAID benchmark:

  • RAID: A Benchmark for AI-Generated Text Detection (ACL 2024)

Support

For issues, questions, or contributions, please visit the GitHub repository.
