
LUCID

Linguistic Understanding, Classification, Identification & Defense — See through the text.

Offline-first AI content detection and humanization engine for LaTeX, Markdown, and plain text documents. Runs entirely on local hardware via Ollama for LLM inference and ONNX-optimized models for detection/evaluation.

Features

  • AI Detection — RoBERTa classifier + statistical features + ensemble scoring
  • Humanization — Ollama-powered paraphrasing with adversarial refinement loop
  • Semantic Evaluation — MiniLM embedding similarity, DeBERTa NLI entailment, BERTScore quality
  • Format-Preserving — LaTeX byte-position reconstruction, Markdown line-range replacement, plain text paragraph segmentation
  • Checkpoint/Resume — JSON checkpoints after each chunk, resume interrupted runs
  • Batch Processing — Process entire directories of documents
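The "format-preserving" idea above can be sketched in a few lines: each detected chunk remembers the byte span it came from, and replacements are spliced back in so every untouched byte of the source survives verbatim. This is a minimal illustration, not LUCID's actual reconstructor (which lives in `src/lucid/reconstructor/`); the function name and tuple layout are hypothetical.

```python
# Hypothetical sketch of byte-position reconstruction. Replacements are
# applied in reverse offset order so earlier spans' offsets stay valid.

def reconstruct(source: bytes, replacements: list[tuple[int, int, bytes]]) -> bytes:
    """Apply (start, end, new_text) byte-span replacements to source."""
    out = source
    for start, end, new_text in sorted(replacements, reverse=True):
        out = out[:start] + new_text + out[end:]
    return out
```

Because only the listed spans change, LaTeX preambles, comments, and math environments outside the detected chunks come back byte-identical.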

Requirements

  • Python 3.12+
  • Ollama running locally (for humanization)
  • 16GB RAM minimum (32GB recommended for quality profile)
  • macOS (Apple Silicon optimized) or Linux x86-64

Installation

# Clone and install with uv
git clone https://github.com/AetherForge/lucid.git
cd lucid
uv sync

# Verify installation
uv run lucid --version

First-Run Setup

# Guided setup — checks Ollama, downloads models
uv run lucid setup

# Setup for a specific profile
uv run lucid setup --profile quality

Quick Start

# Check model availability
uv run lucid models

# Download missing models
uv run lucid models --download

# Detect AI content in a document
uv run lucid detect paper.tex
uv run lucid detect paper.tex --output-format json

# Run full pipeline (detect → humanize → evaluate → reconstruct)
uv run lucid pipeline paper.tex -o paper_humanized.tex

# Humanize a document directly
uv run lucid humanize paper.tex -o paper_humanized.tex

# Process a directory of documents
uv run lucid detect ./papers/

# View current configuration
uv run lucid config

CLI Reference

lucid [OPTIONS] COMMAND [ARGS]...

Global Options:
  --profile [fast|balanced|quality]  Quality profile
  --config PATH                      Custom config TOML file
  -v, --verbose                      Verbose output
  -q, --quiet                        Suppress all output
  --version                          Show version

Commands:
  detect     Detect AI-generated content in a document
  humanize   Humanize AI-detected content in a document
  pipeline   Full detect → humanize → evaluate pipeline
  config     View or modify configuration
  models     Check or download required models
  setup      First-run setup: check Ollama, download models

detect

lucid detect <INPUT> [OPTIONS]
  --output-format [json|text]   Report format (default: text)
  --threshold FLOAT             Detection threshold override
  -o, --output PATH             Write report to file

humanize

lucid humanize <INPUT> [OPTIONS]
  -o, --output PATH                  Output file path
  --model TEXT                       Override Ollama model tag
  --adversarial / --no-adversarial   Enable adversarial loop (default: on)
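The adversarial loop behind the `--adversarial` flag can be pictured as a simple fixed point: re-paraphrase until the detector score drops below a threshold or the iteration budget runs out. The sketch below is a guess at the control flow; `paraphrase` and `detect` stand in for the Ollama call and the ensemble detector, and the names are hypothetical.

```python
from typing import Callable

def adversarial_humanize(
    text: str,
    paraphrase: Callable[[str], str],  # stand-in for the Ollama paraphraser
    detect: Callable[[str], float],    # AI-likelihood score in [0, 1]
    threshold: float = 0.5,
    max_iters: int = 3,
) -> str:
    """Re-paraphrase until the detector score drops below threshold."""
    for _ in range(max_iters):
        if detect(text) < threshold:
            break
        text = paraphrase(text)
    return text
```

The `max_iters` budget corresponds to the per-profile "Adversarial iterations" setting shown in the profile comparison below.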

pipeline

lucid pipeline <INPUT> [OPTIONS]
  -o, --output PATH                  Output file path
  --report PATH                      Write report file
  --output-format [json|text|annotated]  Report format (default: json)
  --resume / --no-resume             Resume from checkpoint (default: on)
  --checkpoint-dir PATH              Checkpoint directory

setup

lucid setup [OPTIONS]
  --profile [fast|balanced|quality]   Profile to set up (default: balanced)

Configuration

LUCID uses TOML configuration with three built-in profiles:

Profile    Model Size  Speed     Quality  Use Case
fast       3B          Fastest   Good     Quick passes, drafts
balanced   7B          Moderate  Better   Default for most documents
quality    14B+        Slow      Best     Final submissions

# View config
uv run lucid config

# Override settings
uv run lucid config --set detection.use_binoculars true

Configuration files: config/default.toml, config/profiles/.

Model Recommendations

Profile    Default Model  Size    RAM Required  License
fast       phi3:3.8b      2.4GB   8GB           MIT
balanced   qwen2.5:7b     4.5GB   12GB          Apache 2.0
quality    llama3.1:8b    4.9GB   16GB          Meta Community

Profile Comparison

Feature                  fast   balanced  quality
Statistical detection    No     Yes       Yes
Binoculars (Tier 3)      No     No        Yes
Adversarial iterations   1      3         5
LaTeX validation         No     Yes       Yes
Embedding threshold      0.75   0.80      0.85
BERTScore threshold      0.82   0.88      0.90
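The two evaluation thresholds in the table act as an acceptance gate: a paraphrase is kept only if it clears both the embedding-similarity and BERTScore floors for the active profile. A sketch of that gate, using the table's values (the conjunctive both-must-pass logic is an assumption):

```python
def passes_evaluation(embedding_sim: float, bertscore_f1: float,
                      profile: str = "balanced") -> bool:
    """Accept a paraphrase only if both semantic scores clear the
    profile's floors (values from the profile-comparison table)."""
    thresholds = {
        "fast":     (0.75, 0.82),
        "balanced": (0.80, 0.88),
        "quality":  (0.85, 0.90),
    }
    emb_min, bert_min = thresholds[profile]
    return embedding_sim >= emb_min and bertscore_f1 >= bert_min
```

A rewrite that evades the detector but fails this gate would be discarded rather than written into the output.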

Web UI

LUCID includes an optional Gradio web interface for browser-based detection and humanization.

# Install web extras
uv sync --extra web

# Launch web UI
uv run lucid-web

The web UI provides two tabs: Detect (upload and analyze documents) and Full Pipeline (detect, humanize, and download results).

Architecture

Input Document
    │
    ▼
┌─────────┐     ┌──────────┐     ┌────────────┐     ┌───────────┐     ┌──────────────┐
│  Parser  │────▶│ Detector │────▶│ Humanizer  │────▶│ Evaluator │────▶│Reconstructor │
│          │     │          │     │            │     │           │     │              │
│ LaTeX    │     │ RoBERTa  │     │ Ollama LLM │     │ MiniLM    │     │ Position-    │
│ Markdown │     │ Stats    │     │ Adversarial│     │ DeBERTa   │     │ based        │
│ Plain    │     │ Ensemble │     │ Loop       │     │ BERTScore │     │ Replacement  │
└─────────┘     └──────────┘     └────────────┘     └───────────┘     └──────────────┘
    │                                                                         │
    └─────────────── Checkpoint after each chunk ─────────────────────────────┘

Project Structure

src/lucid/
├── cli.py              # Click CLI interface
├── pipeline.py         # Pipeline orchestrator
├── checkpoint.py       # Checkpoint/resume system
├── progress.py         # Rich progress reporting
├── output.py           # Output formatting (JSON, text, annotated)
├── config.py           # TOML config with profile merging
├── parser/             # Document parsers (LaTeX, Markdown, plain text)
├── detector/           # AI detection (RoBERTa, statistical, ensemble)
├── humanizer/          # Ollama paraphrasing with adversarial refinement
├── evaluator/          # Semantic evaluation (embedding, NLI, BERTScore)
├── reconstructor/      # Format-preserving document reconstruction
└── models/
    ├── manager.py      # Model lifecycle management
    ├── download.py     # Model availability checker and downloader
    └── results.py      # Result dataclasses

Benchmarks

Metric                      Target
Detection TPR (AI text)     >85% at 5% FPR
Evasion rate (single-pass)  >70%
Evasion rate (adversarial)  >85%
Semantic similarity         >0.85 embedding, >0.88 BERTScore
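"TPR at 5% FPR" means: pick the detection threshold so that at most 5% of human-written texts are flagged, then measure what fraction of AI texts are still caught. One way to compute that metric (a generic sketch, not necessarily how the benchmark suite does it):

```python
def tpr_at_fpr(ai_scores: list[float], human_scores: list[float],
               max_fpr: float = 0.05) -> float:
    """True-positive rate at the tightest threshold whose FPR <= max_fpr."""
    allowed_fp = int(max_fpr * len(human_scores))
    human_sorted = sorted(human_scores, reverse=True)
    # Scores strictly above this value flag at most allowed_fp humans.
    threshold = human_sorted[allowed_fp] if allowed_fp < len(human_sorted) else 0.0
    return sum(s > threshold for s in ai_scores) / len(ai_scores)
```

With 20 human samples and `max_fpr=0.05`, at most one human text may score above the chosen threshold.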

Run benchmarks: uv run pytest tests/benchmarks/ -m benchmark -v

Full results: docs/benchmarks/

Development

# Install with dev dependencies
uv sync --extra dev

# Run unit tests
uv run pytest

# Run integration tests
uv run pytest -m integration

# Run all tests
uv run pytest -m ""

# Lint
uv run ruff check src/ tests/

# Run example scripts
uv run python examples/detect_latex.py tests/corpus/latex/simple.tex
uv run python examples/full_pipeline.py tests/corpus/markdown/simple.md

# Type check
uv run mypy src/lucid/

License

MIT — See LICENSE for details.

See RESPONSIBLE_USE.md for the ethical framework and responsible use policy.
