llm-detector

Research WIP: Transparent, probabilistic classification of text as human-generated or LLM-generated.

Installation

pip install llm-detector

Quick Start

from llm_detector import classify_text

# Simple classification
result = classify_text("Your text here")
print(f"LLM probability: {result['p_llm']:.2%}")
print(f"Classification: {'LLM' if result['is_llm'] else 'Human'}")

Advanced Usage

Using the Runtime API

from llm_detector import DetectorRuntime
from llm_detector.assets import default_artifacts

# Initialize detector with default models
with default_artifacts() as (model_path, baseline_path):
    detector = DetectorRuntime(
        model_path=model_path,
        baseline_path=baseline_path
    )

    # Single text classification
    result = detector.predict("This is a sample text.")
    print(f"LLM: {result.p_llm:.2%}, Human: {result.p_human:.2%}")

    # Access detailed metrics
    print(f"Confidence: {result.confidence:.4f}")
    print("Document metrics:", result.details['document_metrics'])

Detailed Results with Diagnostics

from llm_detector import classify_text

result = classify_text(
    "Your text here",
    include_diagnostics=True
)

# Access classification
print(f"Classification: {'LLM' if result['is_llm'] else 'Human'}")
print(f"LLM probability: {result['p_llm']:.4f}")
print(f"Confidence: {result['confidence']:.4f}")

# Diagnostic metrics for analysis
if 'diagnostics' in result:
    diag = result['diagnostics']
    print(f"Simple mean: {diag.get('simple_mean', 0):.4f}")
    print(f"Max score: {diag.get('max_score', 0):.4f}")
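The diagnostic keys above read like summary statistics over several underlying scores. As a rough illustration only, the sketch below shows how values such as `simple_mean` and `max_score`, plus the logit-weighted mean mentioned in the Research Notes, could be derived from a list of per-segment LLM probabilities. The function names, the idea of per-segment scores, and the choice to weight each segment by the magnitude of its log-odds are all assumptions made for this example; they are not taken from the package's source.

```python
import math


def logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p, clamped away from 0 and 1 so the result stays finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def sigmoid(x: float) -> float:
    """Inverse of logit: map log-odds back to a probability."""
    return 1.0 / (1.0 + math.exp(-x))


def summarize_scores(scores: list[float]) -> dict[str, float]:
    """Summarize hypothetical per-segment LLM probabilities.

    Assumption: "logit-weighted" is read here as averaging in log-odds
    space, with each segment weighted by |logit|, so confident segments
    count for more than segments near 0.5.
    """
    if not scores:
        raise ValueError("scores must not be empty")

    logits = [logit(p) for p in scores]
    weights = [abs(z) for z in logits]
    total = sum(weights)

    if total == 0.0:
        # Every score is exactly 0.5, so there is no evidence either way.
        weighted = 0.5
    else:
        weighted = sigmoid(sum(w * z for w, z in zip(weights, logits)) / total)

    return {
        "simple_mean": sum(scores) / len(scores),
        "max_score": max(scores),
        "logit_weighted_mean": weighted,
    }


summary = summarize_scores([0.9, 0.6, 0.2])
print(summary)
```

Comparing `simple_mean` with the weighted value is one way to see how much a few confident segments pull the overall estimate.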

CLI Usage

# Classify text from command line
llm-detector --text "Your text here"

# Classify from file
llm-detector --file input.txt

# Get detailed output with diagnostics
llm-detector --text "Your text" --show-diagnostics --json

Research Notes

This is an active research project exploring transparent statistical methods for LLM detection. The approach combines:

  • Statistical features: Lexical diversity, punctuation patterns, repetition metrics
  • Tokenizer divergence: Cross-tokenizer efficiency and consistency metrics
  • Ensemble aggregation: Logit-weighted mean with diagnostic fallbacks
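To make the first bullet concrete, here is a minimal sketch of three statistical features of the kind listed above: a type-token ratio for lexical diversity, a punctuation ratio, and a bigram repetition rate. The definitions and function names are illustrative assumptions; the features the project actually computes may be defined differently.

```python
import re
import string


def tokenize(text: str) -> list[str]:
    """Lowercase word tokens; a deliberately simple stand-in for a real tokenizer."""
    return re.findall(r"[a-z0-9']+", text.lower())


def lexical_diversity(tokens: list[str]) -> float:
    """Type-token ratio: unique tokens divided by total tokens."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def punctuation_ratio(text: str) -> float:
    """Share of non-whitespace characters that are ASCII punctuation."""
    chars = [c for c in text if not c.isspace()]
    if not chars:
        return 0.0
    return sum(c in string.punctuation for c in chars) / len(chars)


def repetition_rate(tokens: list[str], n: int = 2) -> float:
    """Fraction of n-gram occurrences that repeat an earlier n-gram."""
    ngrams = [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]
    if not ngrams:
        return 0.0
    return 1.0 - len(set(ngrams)) / len(ngrams)


def extract_features(text: str) -> dict[str, float]:
    """Bundle the three illustrative features into one dictionary."""
    tokens = tokenize(text)
    return {
        "lexical_diversity": lexical_diversity(tokens),
        "punctuation_ratio": punctuation_ratio(text),
        "repetition_rate": repetition_rate(tokens),
    }


print(extract_features("The cat sat. The cat sat."))
```

Each value lies between 0 and 1, which makes the features easy to inspect by hand before they are passed to any classifier.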

Current limitations:

  • Performance varies by text length (best with 3+ sentences)
  • Optimized for general English text
  • Models need continuous updates as LLM capabilities evolve
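Given the text-length limitation, it can help to check input length before trusting a score. The sketch below uses a naive regex-based sentence counter; `count_sentences`, `is_long_enough`, and the `MIN_SENTENCES` constant are hypothetical helpers written for this example and are not part of the package.

```python
import re

# Taken from the "best with 3+ sentences" note; adjust as needed.
MIN_SENTENCES = 3


def count_sentences(text: str) -> int:
    """Rough sentence count: split after ., ! or ? when followed by whitespace.

    This is naive: abbreviations such as "Dr. Smith" are counted as a
    sentence boundary.
    """
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return sum(1 for part in parts if part)


def is_long_enough(text: str, minimum: int = MIN_SENTENCES) -> bool:
    """Return True when the text has at least `minimum` sentences."""
    return count_sentences(text) >= minimum


sample = "Short input."
if not is_long_enough(sample):
    print("Warning: fewer than 3 sentences; the score may be unreliable.")
```

A caller could run this check first and treat scores for shorter inputs as low-confidence rather than discarding them.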

Development

# Install with development dependencies
pip install -e ".[dev,training]"

# Run tests
pytest

# Train custom models (requires training extras)
python -m llm_detector.training.cli --help

License

MIT
