Transparent, probabilistic classification of text as human-generated or LLM-generated

llm-detector

Research WIP: Transparent, probabilistic classification of text as human-generated or LLM-generated.

Installation

pip install llm-detector

Quick Start

from llm_detector import classify_text

# Simple classification
result = classify_text("Your text here")
print(f"LLM probability: {result['p_llm']:.2%}")
print(f"Classification: {'LLM' if result['is_llm'] else 'Human'}")

Advanced Usage

Using the Runtime API

from llm_detector import DetectorRuntime
from llm_detector.assets import default_artifacts

# Initialize detector with default models
with default_artifacts() as (model_path, baseline_path):
    detector = DetectorRuntime(
        model_path=model_path,
        baseline_path=baseline_path
    )

    # Single text classification
    result = detector.predict("This is a sample text.")
    print(f"LLM: {result.p_llm:.2%}, Human: {result.p_human:.2%}")

    # Access detailed metrics
    print(f"Confidence: {result.confidence:.4f}")
    print("Document metrics:", result.details['document_metrics'])

Detailed Results with Diagnostics

from llm_detector import classify_text

result = classify_text(
    "Your text here",
    include_diagnostics=True
)

# Access classification
print(f"Classification: {'LLM' if result['is_llm'] else 'Human'}")
print(f"LLM probability: {result['p_llm']:.4f}")
print(f"Confidence: {result['confidence']:.4f}")

# Diagnostic metrics for analysis
if 'diagnostics' in result:
    diag = result['diagnostics']
    print(f"Simple mean: {diag.get('simple_mean', 0):.4f}")
    print(f"Max score: {diag.get('max_score', 0):.4f}")

CLI Usage

# Classify text from command line
llm-detector --text "Your text here"

# Classify from file
llm-detector --file input.txt

# Get detailed output with diagnostics
llm-detector --text "Your text" --show-diagnostics --json
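
The CLI can also be driven from a script. The sketch below is a minimal wrapper, not part of the package: `build_command` and `run_detector` are hypothetical helper names, the flags are copied from the examples above, and it assumes that `--json` writes a single JSON document to stdout.

```python
import json
import subprocess


def build_command(text):
    """Build the argument list using only the flags shown in the CLI examples."""
    return ["llm-detector", "--text", text, "--show-diagnostics", "--json"]


def run_detector(text):
    """Run the CLI and parse its output.

    Assumption: stdout holds exactly one JSON document when --json is passed.
    """
    completed = subprocess.run(
        build_command(text),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError on a non-zero exit code
    )
    return json.loads(completed.stdout)
```

Passing the arguments as a list avoids shell quoting problems when the text contains quotes or newlines.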

Research Notes

This is an active research project exploring transparent statistical methods for LLM detection. The approach combines:

  • Statistical features: Lexical diversity, punctuation patterns, repetition metrics
  • Tokenizer divergence: Cross-tokenizer efficiency and consistency metrics
  • Ensemble aggregation: Logit-weighted mean with diagnostic fallbacks
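
To make the first and third ideas concrete, here is a small illustrative sketch. It is not the package's implementation: the function names, the exact feature definitions, and the reading of "logit-weighted mean" (each score weighted by the absolute value of its logit, with a simple mean as fallback) are all assumptions.

```python
import math
import re
import string
from collections import Counter


def statistical_features(text):
    """Compute three toy features: lexical diversity, punctuation, repetition."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    bigrams = list(zip(words, words[1:]))
    bigram_counts = Counter(bigrams)
    # Number of bigram occurrences that belong to a bigram seen more than once.
    repeated = sum(c for c in bigram_counts.values() if c > 1)
    punctuation = sum(ch in string.punctuation for ch in text)
    return {
        "type_token_ratio": len(set(words)) / len(words) if words else 0.0,
        "punctuation_ratio": punctuation / len(text) if text else 0.0,
        "repeated_bigram_fraction": repeated / len(bigrams) if bigrams else 0.0,
    }


def logit(p, eps=1e-6):
    """Log-odds of a probability, clipped away from 0 and 1."""
    p = min(max(p, eps), 1 - eps)
    return math.log(p / (1 - p))


def logit_weighted_mean(scores):
    """Average scores so that confident ones (far from 0.5) count more.

    Assumed interpretation: weight_i = |logit(score_i)|.
    Falls back to the simple mean when every weight is zero.
    """
    if not scores:
        raise ValueError("scores must not be empty")
    weights = [abs(logit(p)) for p in scores]
    total = sum(weights)
    if total == 0:
        return sum(scores) / len(scores)
    return sum(w * p for w, p in zip(weights, scores)) / total
```

With this weighting, a score of exactly 0.5 contributes nothing, so `logit_weighted_mean([0.9, 0.5])` is driven entirely by the 0.9 score.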

Current limitations:

  • Performance varies by text length (best with 3+ sentences)
  • Optimized for general English text
  • Models require ongoing updates as LLM capabilities evolve
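
Given the length sensitivity noted above, a caller may want to flag short inputs before trusting a score. The sketch below uses a naive sentence splitter; `count_sentences` and `is_long_enough` are hypothetical helpers, not part of the package, and the splitter miscounts abbreviations such as "e.g.".

```python
import re


def count_sentences(text):
    """Count sentences by splitting after '.', '!' or '?' followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return len([p for p in parts if p])


def is_long_enough(text, min_sentences=3):
    """Return True when the text meets the suggested minimum of three sentences."""
    return count_sentences(text) >= min_sentences
```

A check such as `is_long_enough(text)` could gate a call to `classify_text`, or simply attach a warning to the result for short inputs.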

Development

# Install with development dependencies
pip install -e ".[dev,training]"

# Run tests
pytest

# Train custom models (requires training extras)
python -m llm_detector.training.cli --help

License

MIT

Download files

Source Distribution

llm_detector-0.1.0.tar.gz (6.3 MB)

Uploaded Source

Built Distribution

llm_detector-0.1.0-py3-none-any.whl (6.4 MB)

Uploaded Python 3

File details

Details for the file llm_detector-0.1.0.tar.gz.

File metadata

  • Download URL: llm_detector-0.1.0.tar.gz
  • Upload date:
  • Size: 6.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.13

File hashes

Hashes for llm_detector-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        0423b49732cb28a4f9d6cab18eff0b77550753c3d6370b3ad7f68d9a3af5250e
MD5           770b7142a49ef0a455fea67b122fe8e2
BLAKE2b-256   c5da2ad7b56687a0fb8152fd3a43060067284c503f12da1625a89d048e8423c3
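
A downloaded archive can be checked against the published SHA256 digest with the standard library. This is a minimal sketch: `sha256_of` and `matches_published_hash` are hypothetical helper names, and the local file path is whatever the download was saved as.

```python
import hashlib

# SHA256 digest listed above for llm_detector-0.1.0.tar.gz.
EXPECTED_SHA256 = "0423b49732cb28a4f9d6cab18eff0b77550753c3d6370b3ad7f68d9a3af5250e"


def sha256_of(path, chunk_size=1 << 20):
    """Hash a file in chunks so large archives are not loaded into memory at once."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals the expected hex string."""
    return sha256_of(path) == expected.lower()
```

pip can perform the same check at install time when a requirements file pins hashes and `--require-hashes` is used.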

File details

Details for the file llm_detector-0.1.0-py3-none-any.whl.

File hashes

Hashes for llm_detector-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        f2461924c9fdc04e9f671a1e72f66608d6a193deeaf4e219d2c1ceb446be77f7
MD5           990dbe6d60527c06886300f35fc81808
BLAKE2b-256   08f6d7ce3a6ffae96ed63c12e587881ae3e10a7f8e74c5e67b84fabc63985aa7
