Transparent, probabilistic classification of text as human-generated or LLM-generated
llm-detector
Research WIP: Transparent, probabilistic classification of text as human-generated or LLM-generated.
Installation
pip install llm-detector
Quick Start
from llm_detector import classify_text
# Simple classification
result = classify_text("Your text here")
print(f"LLM probability: {result['p_llm']:.2%}")
print(f"Classification: {'LLM' if result['is_llm'] else 'Human'}")
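To score several texts at once, the same call can be wrapped in a small helper. The sketch below is an illustration, not part of the package: `rank_by_llm_probability` is a hypothetical name, and it relies only on the `p_llm` and `is_llm` keys shown above. The classifier is passed in as an argument so the helper can also be tried with a stand-in function.

```python
from typing import Callable, Iterable


def rank_by_llm_probability(
    texts: Iterable[str],
    classify: Callable[[str], dict],
) -> list[tuple[str, float, bool]]:
    """Classify each text and return (text, p_llm, is_llm), highest p_llm first.

    `classify` is assumed to return a mapping with 'p_llm' and 'is_llm' keys,
    as classify_text does in the Quick Start example.
    """
    rows = []
    for text in texts:
        result = classify(text)
        rows.append((text, float(result["p_llm"]), bool(result["is_llm"])))
    # Sort descending by LLM probability; sorted() is stable, so ties keep input order.
    return sorted(rows, key=lambda row: row[1], reverse=True)
```

With the package installed, this would be called as `rank_by_llm_probability(texts, classify_text)`.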
Advanced Usage
Using the Runtime API
from llm_detector import DetectorRuntime
from llm_detector.assets import default_artifacts
# Initialize detector with default models
with default_artifacts() as (model_path, baseline_path):
    detector = DetectorRuntime(
        model_path=model_path,
        baseline_path=baseline_path
    )
    # Single text classification
    result = detector.predict("This is a sample text.")
    print(f"LLM: {result.p_llm:.2%}, Human: {result.p_human:.2%}")
    # Access detailed metrics
    print(f"Confidence: {result.confidence:.4f}")
    print("Document metrics:", result.details['document_metrics'])
Detailed Results with Diagnostics
from llm_detector import classify_text
result = classify_text(
    "Your text here",
    include_diagnostics=True
)
# Access classification
print(f"Classification: {'LLM' if result['is_llm'] else 'Human'}")
print(f"LLM probability: {result['p_llm']:.4f}")
print(f"Confidence: {result['confidence']:.4f}")
# Diagnostic metrics for analysis
if 'diagnostics' in result:
    diag = result['diagnostics']
    print(f"Simple mean: {diag.get('simple_mean', 0):.4f}")
    print(f"Max score: {diag.get('max_score', 0):.4f}")
CLI Usage
# Classify text from command line
llm-detector --text "Your text here"
# Classify from file
llm-detector --file input.txt
# Get detailed output with diagnostics
llm-detector --text "Your text" --show-diagnostics --json
Research Notes
This is an active research project exploring transparent statistical methods for LLM detection. The approach combines:
- Statistical features: Lexical diversity, punctuation patterns, repetition metrics
- Tokenizer divergence: Cross-tokenizer efficiency and consistency metrics
- Ensemble aggregation: Logit-weighted mean with diagnostic fallbacks
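As a rough illustration of the first and third items, the sketch below computes three simple statistical features and combines per-detector probabilities by averaging them in logit space. Everything here is an assumption made for illustration: the function names, the exact feature definitions, and the reading of "logit-weighted mean" as a weighted mean of logits mapped back through a sigmoid. It is not the package's implementation, and tokenizer divergence is left out because it depends on specific tokenizers.

```python
import math
import re
from collections import Counter


def text_features(text: str) -> dict[str, float]:
    """Illustrative features: lexical diversity, punctuation rate, repetition."""
    words = re.findall(r"[A-Za-z']+", text.lower())
    if not words:
        return {
            "type_token_ratio": 0.0,
            "punctuation_rate": 0.0,
            "repeated_bigram_rate": 0.0,
        }
    bigrams = list(zip(words, words[1:]))
    counts = Counter(bigrams)
    # Number of bigram occurrences that belong to a bigram seen more than once.
    repeated = sum(c for c in counts.values() if c > 1)
    return {
        "type_token_ratio": len(set(words)) / len(words),
        "punctuation_rate": sum(ch in ".,;:!?" for ch in text) / len(text),
        "repeated_bigram_rate": repeated / len(bigrams) if bigrams else 0.0,
    }


def logit_weighted_mean(probs, weights=None, eps=1e-6):
    """Average probabilities in logit space (assumed interpretation)."""
    if not probs:
        raise ValueError("probs must not be empty")
    if weights is None:
        weights = [1.0] * len(probs)
    if len(weights) != len(probs):
        raise ValueError("weights and probs must have the same length")
    total = sum(weights)
    if total <= 0:
        raise ValueError("weights must sum to a positive value")
    # Clip to (eps, 1 - eps) so that p = 0 or p = 1 does not produce infinite logits.
    clipped = [min(max(p, eps), 1.0 - eps) for p in probs]
    logits = [math.log(p / (1.0 - p)) for p in clipped]
    mean_logit = sum(w * z for w, z in zip(weights, logits)) / total
    return 1.0 / (1.0 + math.exp(-mean_logit))
```

Averaging in logit space lets confident scores near 0 or 1 pull the result harder than an arithmetic mean of probabilities would, which is one reason an ensemble might prefer it.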
Current limitations:
- Performance varies by text length (best with 3+ sentences)
- Optimized for general English text
- Models need ongoing updates as LLM capabilities evolve
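Because short inputs are less reliable, a caller may want to check length before trusting a score. The helper below is a hypothetical pre-check, not part of the package. It counts sentences with a naive regular expression (terminal punctuation followed by whitespace or the end of the text), so abbreviations such as "e.g." will be miscounted. The default threshold of 3 is taken from the first limitation above.

```python
import re

# Terminal punctuation that is followed by whitespace or the end of the text.
_SENTENCE_END = re.compile(r"[.!?]+(?=\s|$)")


def count_sentences(text: str) -> int:
    """Naive sentence count; good enough for a rough length check."""
    stripped = text.strip()
    if not stripped:
        return 0
    count = len(_SENTENCE_END.findall(stripped))
    # A trailing fragment without terminal punctuation counts as one more sentence.
    if not re.search(r"[.!?]+$", stripped):
        count += 1
    return count


def long_enough(text: str, min_sentences: int = 3) -> bool:
    """Return True if the text has at least `min_sentences` sentences."""
    return count_sentences(text) >= min_sentences
```

A caller could, for example, skip classification or flag the result as low-confidence when `long_enough(text)` is false.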
Development
# Install with development dependencies
pip install -e ".[dev,training]"
# Run tests
pytest
# Train custom models (requires training extras)
python -m llm_detector.training.cli --help
License
MIT
Download files
File details
Details for the file llm_detector-0.1.0.tar.gz.
File metadata
- Download URL: llm_detector-0.1.0.tar.gz
- Upload date:
- Size: 6.3 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 0423b49732cb28a4f9d6cab18eff0b77550753c3d6370b3ad7f68d9a3af5250e |
| MD5 | 770b7142a49ef0a455fea67b122fe8e2 |
| BLAKE2b-256 | c5da2ad7b56687a0fb8152fd3a43060067284c503f12da1625a89d048e8423c3 |
File details
Details for the file llm_detector-0.1.0-py3-none-any.whl.
File metadata
- Download URL: llm_detector-0.1.0-py3-none-any.whl
- Upload date:
- Size: 6.4 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f2461924c9fdc04e9f671a1e72f66608d6a193deeaf4e219d2c1ceb446be77f7 |
| MD5 | 990dbe6d60527c06886300f35fc81808 |
| BLAKE2b-256 | 08f6d7ce3a6ffae96ed63c12e587881ae3e10a7f8e74c5e67b84fabc63985aa7 |