

llm-detector

Research work in progress: transparent, probabilistic classification of text as human-generated or LLM-generated.

Installation

pip install llm-detector

Quick Start

from llm_detector import classify_text

# Simple classification
result = classify_text("Your text here")
print(f"LLM probability: {result['p_llm']:.2%}")
print(f"Classification: {'LLM' if result['is_llm'] else 'Human'}")
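
The `is_llm` flag applies the package's own decision rule. If your use case needs a stricter rule, you could threshold `p_llm` yourself and leave a middle band undecided. The sketch below does that. The helper name `label_with_margin` and the cut-offs 0.2 and 0.8 are illustrative assumptions, not part of llm-detector, and they are not calibrated.

```python
def label_with_margin(result: dict, low: float = 0.2, high: float = 0.8) -> str:
    """Map a classify_text-style result dict to a label with an abstain band.

    Hypothetical helper: it only reads result["p_llm"], the probability
    shown in the Quick Start. Thresholds are illustrative, not calibrated.
    """
    if not 0.0 <= low < high <= 1.0:
        raise ValueError("expected 0 <= low < high <= 1")
    p_llm = result["p_llm"]
    if p_llm >= high:
        return "LLM"
    if p_llm <= low:
        return "Human"
    return "Uncertain"


# Example with hand-written result dicts (not real detector output).
print(label_with_margin({"p_llm": 0.93}))  # LLM
print(label_with_margin({"p_llm": 0.55}))  # Uncertain
```

An abstain band trades coverage for fewer confident mistakes, which may matter when a wrong "LLM" label has consequences for a human author.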

Advanced Usage

Using the Runtime API

from llm_detector import DetectorRuntime
from llm_detector.assets import default_artifacts

# Initialize detector with default models
with default_artifacts() as (model_path, baseline_path):
    detector = DetectorRuntime(
        model_path=model_path,
        baseline_path=baseline_path
    )

    # Single text classification
    result = detector.predict("This is a sample text.")
    print(f"LLM: {result.p_llm:.2%}, Human: {result.p_human:.2%}")

    # Access detailed metrics
    print(f"Confidence: {result.confidence:.4f}")
    print("Document metrics:", result.details['document_metrics'])

Detailed Results with Diagnostics

from llm_detector import classify_text

result = classify_text(
    "Your text here",
    include_diagnostics=True
)

# Access classification
print(f"Classification: {'LLM' if result['is_llm'] else 'Human'}")
print(f"LLM probability: {result['p_llm']:.4f}")
print(f"Confidence: {result['confidence']:.4f}")

# Diagnostic metrics for analysis
if 'diagnostics' in result:
    diag = result['diagnostics']
    print(f"Simple mean: {diag.get('simple_mean', 0):.4f}")
    print(f"Max score: {diag.get('max_score', 0):.4f}")
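
The diagnostic names `simple_mean` and `max_score`, together with the "logit-weighted mean" mentioned under Research Notes, suggest that several component scores are combined into one probability. The README does not document the formula. The sketch below shows one plausible reading, stated as an assumption. Each score in (0, 1) is weighted by the magnitude of its logit, so confident scores count more. When every score is exactly 0.5, the result falls back to the simple mean. The function name `aggregate_scores` is hypothetical.

```python
import math
from typing import Dict, Sequence


def _logit(p: float, eps: float = 1e-6) -> float:
    """Log-odds of p, clipped away from 0 and 1 to stay finite."""
    p = min(max(p, eps), 1.0 - eps)
    return math.log(p / (1.0 - p))


def aggregate_scores(scores: Sequence[float]) -> Dict[str, float]:
    """Combine per-component probabilities three ways (illustrative only)."""
    if not scores:
        raise ValueError("need at least one score")
    simple_mean = sum(scores) / len(scores)
    max_score = max(scores)
    # Weight each score by how far its logit is from 0 (i.e. from p = 0.5).
    weights = [abs(_logit(s)) for s in scores]
    total = sum(weights)
    if total == 0.0:
        # Every score is exactly 0.5: nothing is informative, so fall back.
        weighted = simple_mean
    else:
        weighted = sum(w * s for w, s in zip(weights, scores)) / total
    return {
        "logit_weighted_mean": weighted,
        "simple_mean": simple_mean,
        "max_score": max_score,
    }


print(aggregate_scores([0.5, 0.55, 0.95]))
```

In this example the uninformative 0.5 score gets zero weight, so the weighted mean moves toward the confident 0.95 while the simple mean stays near 0.67.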

CLI Usage

# Classify text from command line
llm-detector --text "Your text here"

# Classify from file
llm-detector --file input.txt

# Get detailed output with diagnostics
llm-detector --text "Your text" --show-diagnostics --json

Research Notes

This is an active research project exploring transparent statistical methods for LLM detection. The approach combines:

  • Statistical features: Lexical diversity, punctuation patterns, repetition metrics
  • Tokenizer divergence: Cross-tokenizer efficiency and consistency metrics
  • Ensemble aggregation: Logit-weighted mean with diagnostic fallbacks
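
The README names these feature families but does not define them. As a rough illustration only, the sketch below computes one simple stand-in per family:

- type-token ratio for lexical diversity
- punctuation marks per word
- the share of repeated word bigrams
- a characters-per-token ratio between two tokenizers

Every function name here is an assumption and does not describe llm-detector's actual features. The same applies to the two toy tokenizers, which stand in for real subword tokenizers.

```python
import re
from collections import Counter
from typing import Callable, Dict, List

Tokenizer = Callable[[str], List[str]]


def words(text: str) -> List[str]:
    """Lowercased word tokens (letters, digits, apostrophes)."""
    return re.findall(r"[a-z0-9']+", text.lower())


def statistical_features(text: str) -> Dict[str, float]:
    toks = words(text)
    n = len(toks)
    if n == 0:
        return {"type_token_ratio": 0.0, "punct_per_word": 0.0, "repeated_bigram_share": 0.0}
    bigrams = list(zip(toks, toks[1:]))
    counts = Counter(bigrams)
    # Bigram occurrences that belong to a bigram seen more than once.
    repeated = sum(c for c in counts.values() if c > 1)
    return {
        "type_token_ratio": len(set(toks)) / n,
        "punct_per_word": len(re.findall(r"[^\w\s]", text)) / n,
        "repeated_bigram_share": repeated / len(bigrams) if bigrams else 0.0,
    }


def chars_per_token(text: str, tokenize: Tokenizer) -> float:
    tokens = tokenize(text)
    return len(text) / len(tokens) if tokens else 0.0


def tokenizer_divergence(text: str, tok_a: Tokenizer, tok_b: Tokenizer) -> float:
    """Ratio of tokenizer A's efficiency (chars/token) to tokenizer B's."""
    b = chars_per_token(text, tok_b)
    return chars_per_token(text, tok_a) / b if b else 0.0


# Toy tokenizers standing in for real subword tokenizers.
def whitespace_tok(text: str) -> List[str]:
    return text.split()


def word_punct_tok(text: str) -> List[str]:
    return re.findall(r"\w+|[^\w\s]", text)


sample = "The cat sat. The cat sat again, and again!"
print(statistical_features(sample))
print(round(tokenizer_divergence(sample, whitespace_tok, word_punct_tok), 3))
```

Real cross-tokenizer features would plug in the tokenizers of different language models. The ratio stays meaningful only if both tokenizers see the exact same input string.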

Current limitations:

  • Detection performance varies with text length; results are most reliable with three or more sentences
  • Optimized for general English text
  • Models require continuous updates as LLM capabilities evolve
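
Short inputs are a stated weak spot, so a caller may want to check length before trusting a score. The sketch below counts sentences with a naive regular-expression splitter. This is an assumption: it will miscount abbreviations such as "e.g.". `has_enough_sentences` and its default of three are illustrative names and values that mirror the limitation above.

```python
import re
from typing import List


def split_sentences(text: str) -> List[str]:
    """Naive splitter: break after ., ! or ? followed by whitespace."""
    parts = re.split(r"(?<=[.!?])\s+", text.strip())
    return [p for p in parts if p]


def has_enough_sentences(text: str, minimum: int = 3) -> bool:
    """True if the text has at least `minimum` sentences by the naive count."""
    return len(split_sentences(text)) >= minimum


print(has_enough_sentences("One sentence only."))                # False
print(has_enough_sentences("First point. Second point! Third?"))  # True
```

When the check fails, you might skip classification or report the score as low-confidence rather than as a firm label.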

Development

# Install with development dependencies
pip install -e ".[dev,training]"

# Run tests
pytest

# Train custom models (requires training extras)
python -m llm_detector.training.cli --help

License

MIT


Download files

Source Distribution

llm_detector-0.1.1.tar.gz (6.3 MB)

Built Distribution

llm_detector-0.1.1-py3-none-any.whl (6.4 MB)

File details

Details for the file llm_detector-0.1.1.tar.gz.

File metadata

  • Download URL: llm_detector-0.1.1.tar.gz
  • Size: 6.3 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.13

File hashes

Hashes for llm_detector-0.1.1.tar.gz
Algorithm    Hash digest
SHA256       66cfbaa68ca2a9cf400648010894dd1c614604b9a611572c516651302414f5ef
MD5          40254bf3f6dd42cf1499590fe3dfc528
BLAKE2b-256  2e613243a95769c826e0a95ea520f47d571196cabff5749eebdeae07d5c31f73

File details

Details for the file llm_detector-0.1.1-py3-none-any.whl.

File hashes

Hashes for llm_detector-0.1.1-py3-none-any.whl
Algorithm    Hash digest
SHA256       3f2a0b751669d33910ac70b71edfe9f8cca80703e069f724f9e3e7519c07ee5c
MD5          7bc7702e80a6e272b33dcd8b6df33330
BLAKE2b-256  22d4c29c68113ad96a0f2ea3a52181054bb4cae2f6caabb2542ebd3c2512496f
