Algorithmic text humanization with AI detection, tone analysis, paraphrasing, and spinning — 17-stage pipeline, 14 languages, zero dependencies
TextHumanize
The most advanced open-source text naturalization engine
Normalize style, improve readability, and ensure brand-safe content — offline, private, and blazing fast
42,375 lines of code · 75 Python modules · 17-stage pipeline · 14 languages + universal · 1,802 tests
Quick Start · Features · Documentation · Live Demo · License
TextHumanize is a pure-algorithmic text processing engine that normalizes style, improves readability, and removes mechanical patterns from text. No neural networks, no API keys, no internet — just 42K+ lines of finely tuned rules, dictionaries, and statistical methods.
Built-in toolkit: AI Detection · Paraphrasing · Tone Analysis · Watermark Cleaning · Content Spinning · Coherence Analysis · Readability Scoring · Stylistic Fingerprinting · Auto-Tuner · Perplexity Analysis · Plagiarism Detection · Async API · SSE Streaming
Platforms: Python (full) · TypeScript/JavaScript (core) · PHP (full)
Languages: 🇷🇺 RU · 🇺🇦 UK · 🇬🇧 EN · 🇩🇪 DE · 🇫🇷 FR · 🇪🇸 ES · 🇵🇱 PL · 🇧🇷 PT · 🇮🇹 IT · 🇸🇦 AR · 🇨🇳 ZH · 🇯🇵 JA · 🇰🇷 KO · 🇹🇷 TR · 🌍 any language via universal processor
Why TextHumanize?
Problem: Machine-generated text has uniform sentence lengths, bureaucratic vocabulary, formulaic connectors, and low stylistic diversity — reducing readability, engagement, and brand authenticity.
Solution: TextHumanize algorithmically normalizes text style while preserving meaning. Configurable intensity, deterministic output, full change reports. No cloud APIs, no rate limits, no data leaks.
| Advantage | Details |
|---|---|
| 30,000+ chars/sec | Process a full article in milliseconds, not seconds |
| 100% private | All processing is local — your text never leaves your machine |
| Precise control | Intensity 0–100, 9 profiles, keyword preservation, max change ratio |
| 14 languages | Full dictionaries for 14 languages; statistical processor for any other |
| Zero dependencies | Pure Python stdlib — no pip packages, no model downloads, starts in <100ms |
| Reproducible | Seed-based PRNG — same input + same seed = identical output |
| AI detection | 13-metric ensemble + 35-feature statistical detector — no ML required |
| Enterprise-ready | Dual license, 1,802 tests, CI/CD, benchmarks, on-prem deployment |
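The reproducibility row is easy to picture: every random choice is keyed to a seed. Below is a minimal sketch of the idea using the stdlib `random.Random` — the function name and seed-keying scheme are illustrative, not the library's actual internals:

```python
import random

def pick_synonym(word: str, candidates: list, seed: int) -> str:
    """Seed-keyed choice: the same word + seed always yields the same pick."""
    # Derive a deterministic per-word stream from the seed and the word itself
    rng = random.Random(f"{seed}:{word}")
    return rng.choice(candidates)

# Same inputs, same output — on every run and every machine:
first = pick_synonym("utilize", ["use", "apply", "employ"], seed=42)
second = pick_synonym("utilize", ["use", "apply", "employ"], seed=42)
```

Because the stream is derived from the seed rather than global state, results stay stable regardless of call order.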
Comparison with Competitors
| Criterion | TextHumanize | Online Humanizers | GPT/LLM Rewriting |
|---|---|---|---|
| Works offline | ✅ | ❌ | ❌ |
| Privacy | ✅ Local only | ❌ Third-party servers | ❌ Cloud API |
| Speed | 30K+ chars/sec | 2–10 sec (network) | ~500 chars/sec |
| Cost per 1M chars | $0 | $10–50/month | $15–60 (GPT-4) |
| API key required | No | Yes | Yes |
| Deterministic | ✅ Seed-based | ❌ | ❌ |
| Languages | 14 + universal | 1–3 | 10+ but expensive |
| Built-in AI detector | ✅ 13 metrics | ❌ or basic | ❌ |
| Max change control | ✅ max_change_ratio | ❌ | ❌ Unpredictable |
| Open source | ✅ | ❌ | ❌ |
| Self-hosted | ✅ | ❌ | ❌ |
vs. Other Open-Source Libraries
| Feature | TextHumanize | Typical Alternatives |
|---|---|---|
| Pipeline stages | 17 | 2–4 |
| Languages | 14 + universal | 1–2 |
| AI detection | ✅ 13 metrics + statistical ML | ❌ |
| Python tests | 1,802 | 10–50 |
| Codebase size | 42,375 lines | 500–2K |
| Platforms | Python + JS + PHP | Single |
| Plugin system | ✅ | ❌ |
| Tone analysis | ✅ 7 levels | ❌ |
| REST API | ✅ 12 endpoints | ❌ |
| Readability metrics | ✅ 6 indices | 0–1 |
| Morphological engine | ✅ 4 languages | ❌ |
Installation
pip install texthumanize
From source:
git clone https://github.com/ksanyok/TextHumanize.git
cd TextHumanize && pip install -e .
PHP / TypeScript
# PHP
cd php/ && composer install
# TypeScript
cd js/ && npm install
Quick Start
from texthumanize import humanize, analyze, detect_ai, explain
# Humanize text
result = humanize("This text utilizes a comprehensive methodology for implementation.", lang="en")
print(result.text) # → "This text uses a complete method for setup."
print(result.change_ratio) # → 0.15
print(result.quality_score) # → 0.85
# With profile and intensity
result = humanize(text, lang="en", profile="web", intensity=70)
# AI Detection — 13-metric ensemble
ai = detect_ai("Text to check for AI generation.", lang="en")
print(f"AI: {ai['score']:.0%} | {ai['verdict']} | Confidence: {ai['confidence']:.0%}")
# Analyze text metrics
report = analyze("Text to analyze.", lang="en")
print(f"Artificiality: {report.artificiality_score:.1f}/100")
# Full change report
print(explain(result))
All Features at a Glance
from texthumanize import (
    humanize, humanize_batch, humanize_chunked, humanize_ai,
    detect_ai, detect_ai_batch, detect_ai_sentences, detect_ai_mixed,
    paraphrase, analyze_tone, adjust_tone,
    detect_watermarks, clean_watermarks,
    spin, spin_variants, analyze_coherence, full_readability,
    AutoTuner, BenchmarkSuite, STYLE_PRESETS,
)
# Paraphrasing
print(paraphrase("The system works efficiently.", lang="en"))
# Tone — 7-level formality scale
tone = analyze_tone("Please submit the documentation.", lang="en")
casual = adjust_tone("It is imperative to proceed.", target="casual", lang="en")
# Watermarks
clean = clean_watermarks("Te\u200bxt wi\u200bth hid\u200bden chars")
# Spinning
variants = spin_variants("Original text.", count=5, lang="en")
# Batch + chunked processing
results = humanize_batch(["Text 1", "Text 2"], lang="en", max_workers=4)
result = humanize_chunked(large_doc, chunk_size=3000, lang="ru")
# Async API — native asyncio support
from texthumanize import async_humanize, async_detect_ai
result = await async_humanize("Text to process", lang="en")
ai = await async_detect_ai("Text to check", lang="en")
Before & After
Before (AI-generated):
Furthermore, it is important to note that the implementation of cloud computing facilitates the optimization of business processes. Additionally, the utilization of microservices constitutes a significant advancement.
After (TextHumanize, profile="web", intensity=70):
But cloud computing helps optimize how businesses work. Also, microservices are a big step forward.
Feature Matrix
| Category | Feature | Python | JS | PHP |
|---|---|---|---|---|
| Core | humanize() — 17-stage pipeline | ✅ | ✅ | ✅ |
| | humanize_batch() — parallel processing | ✅ | — | ✅ |
| | humanize_chunked() — large text support | ✅ | — | ✅ |
| | humanize_ai() — three-tier AI + rules | ✅ | — | — |
| | analyze() — artificiality scoring | ✅ | ✅ | ✅ |
| | explain() — change report | ✅ | — | ✅ |
| AI Detection | detect_ai() — 13-metric + statistical ML | ✅ | ✅ | ✅ |
| | detect_ai_batch() — batch detection | ✅ | — | — |
| | detect_ai_sentences() — per-sentence | ✅ | — | — |
| | detect_ai_mixed() — mixed content | ✅ | — | — |
| NLP | paraphrase() — syntactic transforms | ✅ | — | ✅ |
| | POSTagger — rule-based POS (EN/RU/UK/DE) | ✅ | — | — |
| | CJKSegmenter — zh/ja/ko word segmentation | ✅ | — | — |
| | SyntaxRewriter — 8 sentence transforms | ✅ | — | — |
| | WordLanguageModel — perplexity (14 langs) | ✅ | — | — |
| | CollocEngine — PMI collocation scoring | ✅ | — | — |
| Tone | analyze_tone() — formality analysis | ✅ | — | ✅ |
| | adjust_tone() — 7-level adjustment | ✅ | — | ✅ |
| Watermarks | detect_watermarks() — 5 types | ✅ | — | ✅ |
| | clean_watermarks() — removal | ✅ | — | ✅ |
| Spinning | spin() / spin_variants() | ✅ | — | ✅ |
| Analysis | analyze_coherence() — paragraph flow | ✅ | — | ✅ |
| | full_readability() — 6 indices | ✅ | — | ✅ |
| | Stylistic fingerprinting | ✅ | — | — |
| Quality | BenchmarkSuite — 6-dimension scoring | ✅ | — | — |
| | FingerprintRandomizer — anti-detection | ✅ | — | — |
| Advanced | Style presets (5 personas) | ✅ | — | — |
| | Auto-Tuner (feedback loop) | ✅ | — | — |
| | Plugin system | ✅ | — | ✅ |
| | REST API (12 endpoints) | ✅ | — | — |
| | CLI (15+ commands) | ✅ | — | — |
| Languages | Full dictionary support | 14 | 2 | 14 |
| | Universal processor | ✅ | ✅ | ✅ |
Profiles
| Profile | Use Case | Sentence Length | Colloquialisms | Default Intensity |
|---|---|---|---|---|
| chat | Messaging, social media | 8–18 words | High | 80 |
| web | Blog posts, articles | 10–22 words | Medium | 60 |
| seo | SEO content (keyword-safe) | 12–25 words | None | 40 |
| docs | Technical documentation | 12–28 words | None | 50 |
| formal | Academic, legal | 15–30 words | None | 30 |
| academic | Research papers | 15–30 words | None | 25 |
| marketing | Sales, promo copy | 8–20 words | Medium | 70 |
| social | Social media posts | 6–15 words | High | 85 |
| email | Business emails | 10–22 words | Medium | 50 |
Style presets: student · copywriter · scientist · journalist · blogger
result = humanize(text, profile="seo", intensity=40,
                  constraints={"keep_keywords": ["API", "cloud"]})
Processing Pipeline
Input → Watermark Cleaning → Segmentation → CJK Segmentation → Typography
→ Debureaucratization → Structure → Repetitions → Liveliness
→ Paraphrasing → Syntax Rewriting → Tone Harmonization → Universal
→ Naturalization → Word LM Quality Gate → Readability → Grammar
→ Coherence Repair → Fingerprint Diversification → Validation → Output
17 stages with adaptive intensity (auto-reduces processing for already-natural text) and graduated retry (retries at lower intensity if change ratio exceeds limit).
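The graduated-retry behavior can be sketched in a few lines. This is an illustrative stand-in, not the pipeline's actual code — `humanize_once`, the stub, and the step size are all hypothetical:

```python
def humanize_with_retry(text, humanize_once, intensity=70,
                        max_change_ratio=0.3, step=20):
    """Graduated retry: if the rewrite changed too much of the text,
    try again at a lower intensity until the change ratio fits the limit."""
    while intensity > 0:
        out, ratio = humanize_once(text, intensity)
        if ratio <= max_change_ratio:
            return out, intensity
        intensity -= step  # back off and retry more conservatively
    return text, 0  # give up: return the input unchanged

# Demo with a stub pipeline whose change ratio simply tracks intensity:
def stub_pipeline(text, intensity):
    return text.upper(), intensity / 100

out, used_intensity = humanize_with_retry("hello world", stub_pipeline)
```

With the stub, the loop tries intensity 70 and 50 (too much change) and settles at 30.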
AI Detection
13-metric ensemble + 35-feature statistical detector. No ML models, no APIs.
| Metric | What It Measures |
|---|---|
| AI Patterns | Formulaic phrases ("it is important to note", "furthermore") |
| Burstiness | Sentence length uniformity (humans vary, AI doesn't) |
| Opening Diversity | Repetitive sentence starts |
| Entropy | Word predictability (Shannon entropy) |
| Vocabulary | Lexical richness (type-to-token ratio) |
| Perplexity | Character-level predictability |
| + 7 more | Stylometry, coherence, grammar perfection, punctuation, rhythm, readability, Zipf |
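Two of these metrics are simple enough to sketch directly. Burstiness here is the coefficient of variation of sentence lengths, and entropy is word-level Shannon entropy — illustrative formulations, not necessarily the detector's exact ones:

```python
import math
import re

def burstiness(text: str) -> float:
    """Coefficient of variation of sentence lengths (0 = perfectly uniform)."""
    lengths = [len(s.split()) for s in re.split(r"[.!?]+", text) if s.strip()]
    if len(lengths) < 2:
        return 0.0
    mean = sum(lengths) / len(lengths)
    var = sum((x - mean) ** 2 for x in lengths) / len(lengths)
    return math.sqrt(var) / mean

def shannon_entropy(text: str) -> float:
    """Word-level Shannon entropy in bits (higher = less predictable)."""
    words = text.lower().split()
    if not words:
        return 0.0
    counts = {}
    for w in words:
        counts[w] = counts.get(w, 0) + 1
    total = len(words)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())
```

Uniform sentence lengths (a typical AI tell) give a burstiness near zero, while human prose with short and long sentences mixed scores higher.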
Ensemble: Weighted sum (50%) + Strong signal detector (30%) + Majority voting (20%)
Verdicts: human_written (< 35%) · mixed (35–65%) · ai_generated (≥ 65%)
result = detect_ai("Text to check.", lang="en")
print(f"{result['score']:.0%} — {result['verdict']}")
# Per-sentence detection
for s in detect_ai_sentences(text, lang="en"):
    print(f"{'🤖' if s['label'] == 'ai' else '👤'} {s['text'][:80]}")
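The three-part ensemble described above can be sketched as follows. Only the 50/30/20 combination and the verdict cutoffs come from the docs — the per-metric weights and the strong-signal threshold here are illustrative:

```python
def ensemble_score(metric_scores: dict, weights: dict):
    """Combine per-metric AI scores (each 0..1) into a final score + verdict:
    weighted sum (50%) + strong-signal detector (30%) + majority vote (20%)."""
    weighted = (sum(weights[m] * s for m, s in metric_scores.items())
                / sum(weights.values()))
    # Strong signal: any single metric firing very high (threshold assumed)
    strong = 1.0 if any(s >= 0.9 for s in metric_scores.values()) else 0.0
    # Majority vote: fraction of metrics above the midpoint
    majority = sum(1 for s in metric_scores.values() if s >= 0.5) / len(metric_scores)
    score = 0.5 * weighted + 0.3 * strong + 0.2 * majority
    if score >= 0.65:
        verdict = "ai_generated"
    elif score >= 0.35:
        verdict = "mixed"
    else:
        verdict = "human_written"
    return score, verdict
```

The strong-signal term lets one decisive metric (e.g. a formulaic-phrase hit) outvote an otherwise bland average.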
CLI
texthumanize input.txt -l en -p web -i 70 -o output.txt
texthumanize input.txt --detect-ai
texthumanize input.txt --analyze
texthumanize input.txt --paraphrase -o out.txt
texthumanize input.txt --tone casual
texthumanize dummy --api --port 8080
echo "Text" | texthumanize - -l en
REST API
python -m texthumanize.api --port 8080
| Method | Endpoint | Description |
|---|---|---|
| POST | /humanize | Humanize text |
| POST | /detect-ai | AI detection (single or batch) |
| POST | /analyze | Text metrics |
| POST | /paraphrase | Paraphrase |
| POST | /tone/analyze | Tone analysis |
| POST | /tone/adjust | Tone adjustment |
| POST | /watermarks/detect | Detect watermarks |
| POST | /watermarks/clean | Clean watermarks |
| POST | /spin | Text spinning |
| POST | /coherence | Coherence analysis |
| POST | /readability | Readability metrics |
| GET | /health | Health check |
curl -X POST http://localhost:8080/humanize \
-H "Content-Type: application/json" \
-d '{"text": "Your text here.", "lang": "en", "profile": "web"}'
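The same call from Python needs nothing beyond the stdlib, in keeping with the zero-dependency theme. A small client sketch against the `/humanize` endpoint — the helper names are ours, and the response fields depend on what the server returns:

```python
import json
import urllib.request

API_URL = "http://localhost:8080/humanize"  # default port from the docs

def build_request(text: str, lang: str = "en",
                  profile: str = "web") -> urllib.request.Request:
    """Build the POST request with a JSON body for /humanize."""
    body = json.dumps({"text": text, "lang": lang, "profile": profile})
    return urllib.request.Request(
        API_URL,
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )

def humanize_remote(text: str, lang: str = "en", profile: str = "web") -> dict:
    """Send the request and decode the JSON response (server must be running)."""
    with urllib.request.urlopen(build_request(text, lang, profile), timeout=30) as resp:
        return json.loads(resp.read().decode("utf-8"))
```

Usage: start the server with `python -m texthumanize.api --port 8080`, then call `humanize_remote("Your text here.")`.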
Language Support
| Language | Code | Bureaucratic | Synonyms | Collocations |
|---|---|---|---|---|
| Russian | ru | 70+ | 50+ | 408 |
| Ukrainian | uk | 50+ | 48 | 38 |
| English | en | 40+ | 35+ | 1,578 |
| German | de | 64 | 45 | 125 |
| French | fr | 20 | 20 | 128 |
| Spanish | es | 18 | 18 | 126 |
| Polish | pl | 18 | 18 | 34 |
| Portuguese | pt | 16 | 17 | 36 |
| Italian | it | 16 | 17 | 38 |
| Arabic | ar | 81 | 80 | — |
| Chinese | zh | 80 | 80 | — |
| Japanese | ja | 60+ | 60+ | — |
| Korean | ko | 60+ | 60+ | — |
| Turkish | tr | 60+ | 60+ | — |
The universal processor handles any language using statistical methods (burstiness, perplexity, punctuation normalization).
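Character-level perplexity is one such language-agnostic signal: it needs no dictionary, only a reference sample in the target script. An illustrative add-one-smoothed character-bigram version (not the library's actual model):

```python
import math
from collections import Counter

def char_perplexity(text: str, reference: str) -> float:
    """Perplexity of `text` under a char-bigram model trained on `reference`.
    Add-one smoothing keeps unseen bigrams from zeroing the probability."""
    bigrams = Counter(zip(reference, reference[1:]))
    unigrams = Counter(reference)
    vocab = len(set(reference)) or 1
    pairs = list(zip(text, text[1:]))
    log_prob = 0.0
    for a, b in pairs:
        p = (bigrams[(a, b)] + 1) / (unigrams[a] + vocab)
        log_prob += math.log2(p)
    return 2 ** (-log_prob / max(len(pairs), 1))
```

Text that follows the reference's character patterns scores low perplexity; gibberish or out-of-distribution text scores high — in any script, no language pack needed.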
Performance
All benchmarks on Apple Silicon (M-series), Python 3.12, single thread.
| Function | Text Size | Avg Latency | Per 1K Words | Peak Memory |
|---|---|---|---|---|
| humanize() | 30 words | 0.1 ms | ~5 ms | 4 KB |
| humanize() | 80 words | 1.5 ms | ~19 ms | 4 KB |
| humanize() | 400 words | 0.1 ms | < 1 ms | 6 KB |
| detect_ai() | 30 words | 4.3 ms | — | 22 KB |
| detect_ai() | 80 words | 36.8 ms | — | 71 KB |
| detect_ai() | 400 words | 162 ms | — | 196 KB |
| analyze() | 80 words | 478 ms | — | 362 KB |
| paraphrase() | 80 words | 0.2 ms | — | 8 KB |
| Property | Value |
|---|---|
| LRU cache hit | 11× faster than cold call |
| External network calls | 0 (offline-first) |
| Deterministic (same seed) | ✅ Always |
| Pipeline timeout | 30 s (configurable) |
| Rate limiting (API) | 10 req/s per IP, burst 20 |
Run benchmarks yourself:
python benchmarks/run_benchmark.py
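The LRU-cache speedup works the way `functools.lru_cache` does: a repeated identical call skips the pipeline entirely and returns the memoized result. A toy illustration — the `text.upper()` body is a placeholder, not the real pipeline:

```python
from functools import lru_cache

@lru_cache(maxsize=1024)
def cached_process(text: str, lang: str = "en") -> str:
    # Stand-in for an expensive pipeline call; the speedup comes from
    # skipping recomputation on repeated identical (text, lang) inputs.
    return text.upper()

cached_process("hello", "en")          # cold call: cache miss, result computed
warm = cached_process("hello", "en")   # warm call: served straight from the cache
info = cached_process.cache_info()     # hits=1, misses=1
```

Since inputs must hash identically to hit the cache, the benefit shows up in batch workloads with duplicate texts, not in one-off calls.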
Plugin System
from texthumanize import Pipeline, humanize
def add_disclaimer(text: str, lang: str) -> str:
    return text + "\n\n---\nProcessed by TextHumanize."
Pipeline.register_hook(add_disclaimer, after="naturalization")
result = humanize("Your text here.")
Pipeline.clear_plugins()
Available stages: watermark → segmentation → typography → debureaucratization → structure → repetitions → liveliness → universal → naturalization → validation → restore
Architecture
texthumanize/ # 75 Python modules, 42,375 lines
├── core.py # Facade: humanize(), analyze(), detect_ai()
├── pipeline.py # 17-stage pipeline + adaptive intensity
├── api.py # REST API server (12 endpoints)
├── cli.py # CLI (15+ commands)
├── exceptions.py # Exception hierarchy
│
├── analyzer.py # Artificiality scoring + 6 readability metrics
├── detectors.py # AI detector: 13 metrics + ensemble
├── statistical_detector.py # 35-feature ML classifier
├── pos_tagger.py # POS tagger (EN/RU/UK/DE)
├── collocation_engine.py # PMI collocation scoring (2,511 collocations)
├── word_lm.py # Word-level LM (14 langs)
│
├── normalizer.py # Typography (stage 2)
├── decancel.py # Debureaucratization (stage 3)
├── structure.py # Sentence diversification (stage 4)
├── naturalizer.py # Burstiness + perplexity (stage 10)
├── paraphraser_ext.py # Semantic paraphrasing (stage 7)
├── syntax_rewriter.py # Structural transforms (stage 7b)
├── grammar_fix.py # Grammar correction (stage 12)
├── coherence_repair.py # Coherence repair (stage 13)
├── validator.py # Quality validation (stage 14)
│
├── tone.py # Tone analysis & adjustment
├── watermark.py # Watermark detection & cleaning
├── spinner.py # Text spinning
├── coherence.py # Coherence analysis
├── morphology.py # Morphological engine (RU/UK/EN/DE)
├── ... # 30+ more modules
│
└── lang/ # 14 language packs + registry
├── en.py, ru.py, de.py ... # Data only, no logic
└── ar.py, zh.py, ja.py ... # Including CJK + RTL
Design principles: Modular · Declarative rules · Idempotent · Safe defaults · Extensible · Zero dependencies · Lazy imports
Testing & Quality
| Platform | Tests | Status |
|---|---|---|
| Python | 1,802 | ✅ All passing |
| PHP | 223 | ✅ All passing |
| TypeScript | 28 | ✅ All passing |
| Total | 2,053 | ✅ |
pytest -q # 1802 passed
ruff check texthumanize/ # Lint
mypy texthumanize/ # Type check
cd php && php vendor/bin/phpunit # 223 tests
CI/CD runs on every push: Python 3.9–3.13 + PHP 8.1–8.3 matrix, ruff, mypy, pytest with coverage ≥70%.
Security
| Aspect | Implementation |
|---|---|
| Input limits | 1 MB text, 5 MB API body |
| Network calls | Zero. No telemetry, no analytics |
| Dependencies | Zero. Pure stdlib |
| Regex safety | All linear-time, no user input compiled to regex |
| Reproducibility | Seed-based PRNG, deterministic output |
| Sandboxing | Resource limits documented for production |
Docker
docker build -t texthumanize .
docker run -p 8080:8080 texthumanize
# API mode
docker run -p 8080:8080 texthumanize --api --port 8080
# Process a file
docker run -v $(pwd):/data texthumanize /data/input.txt -o /data/output.txt
For Business & Enterprise
| Requirement | How TextHumanize Delivers |
|---|---|
| Predictability | Seed-based PRNG — same input + seed = identical output |
| Privacy | 100% local. Zero network calls. No data leaves your server |
| Auditability | Every call returns change_ratio, quality_score, similarity, explain() report |
| Integration | Python SDK · JS SDK · PHP SDK · CLI · REST API · Docker |
| Reliability | 2,053 tests across 3 platforms, CI/CD with ruff + mypy |
| No vendor lock-in | Zero dependencies. No cloud APIs, no API keys, no rate limits |
| Language coverage | 14 full language packs + universal processor for any language |
Contributing
See CONTRIBUTING.md for development setup, testing, and PR guidelines.
License & Pricing
TextHumanize uses a dual license model:
| Use Case | License | Cost |
|---|---|---|
| Personal / Academic / Open-source | Free License | Free |
| Commercial — 1 dev, 1 project | Indie | $199/year |
| Commercial — up to 5 devs | Startup | $499/year |
| Commercial — up to 20 devs | Business | $1,499/year |
| Enterprise / On-prem / SLA | Enterprise | Contact us |
All commercial licenses include full source code, updates for 1 year, and email support.
Full licensing details → · See LICENSE for legal text · Contact: ksanyok@me.com
Documentation · Live Demo · GitHub · Issues · Discussions · Commercial License
Download files
Source Distribution
Built Distribution
File details
Details for the file texthumanize-0.24.0.tar.gz.
File metadata
- Download URL: texthumanize-0.24.0.tar.gz
- Upload date:
- Size: 1.4 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 80580edbc1dcb41d863c3c9cb12ab7b4b786b2edfe9d3b156ecb887ae7f23e2f |
| MD5 | c9d65dda1d054415f8a61b9344ac37fb |
| BLAKE2b-256 | ae6d97d1a83d338682ef26b9b5cdc5bb5b3f44823d49425e1901ab750074ba90 |
File details
Details for the file texthumanize-0.24.0-py3-none-any.whl.
File metadata
- Download URL: texthumanize-0.24.0-py3-none-any.whl
- Upload date:
- Size: 1.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.12.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 56b7d7d1157d27fc8c6cd0be58b351b662b5eefd6d62757a0ead81d64be97a6e |
| MD5 | 4aa8152c6b4660dedb0b39d8786c7b7d |
| BLAKE2b-256 | 4c76df9bc951320781af9d42dfaa89b2a0ef00e33f2960d7c6a38aeaa6dae8a5 |