# texthumanizer 📝
Offline AI-text humanizer & plagiarism reducer for students and researchers. No internet needed after model download. Preserves research context, citations, and semantic meaning.
## ✨ Features
| Feature | Detail |
|---|---|
| Humanize AI text | Rewrites ChatGPT / Claude / Gemini output to sound natural |
| Plagiarism reduction | Paraphrase-based, not just synonym swap |
| Semantic preservation | Meaning, tone, and argument structure kept intact |
| Research-aware | Citations [1], abbreviations DNA, units 95%, et al. — all preserved |
| DOCX support | Paragraph-level processing; headings untouched |
| 100% offline | T5-based model runs locally after first download (~250 MB) |
| Lightweight | CPU-friendly, no GPU required |
## 📦 Installation

**Full install (recommended):**

```bash
pip install texthumanizer[all]
```

**Minimal (text only, no DOCX):**

```bash
pip install texthumanizer[ml]
```

**With DOCX support:**

```bash
pip install texthumanizer[ml,docx]
```

The first run downloads the T5 model (~250 MB) from HuggingFace once and caches it locally.
## 🚀 Quick Start

### 1. Humanize pasted text

```python
from texthumanizer import TextHumanizer

th = TextHumanizer()

ai_text = """
Artificial intelligence has rapidly transformed numerous sectors of society,
demonstrating unprecedented capabilities in natural language processing,
computer vision, and decision-making systems.
"""

result = th.humanize_text(ai_text)
print(result)
```

### 2a. Humanize a .docx → save a new .docx

```python
from texthumanizer import TextHumanizer

th = TextHumanizer()

# Saves "humanized_my_essay.docx" next to the original
output_path = th.humanize_doc("my_essay.docx", output="doc")
print(f"Saved: {output_path}")

# Custom output path
th.humanize_doc("my_essay.docx", output="doc", output_path="D:/final_essay.docx")
```

### 2b. Humanize a .docx → get plain text back

```python
from texthumanizer import TextHumanizer

th = TextHumanizer()
text = th.humanize_doc("my_essay.docx", output="text")
print(text)
```
## ⚙️ Configuration

```python
th = TextHumanizer(
    diversity=0.7,  # 0.0 = minimal changes, 1.0 = maximum rewriting (default: 0.7)
    device=-1,      # -1 = CPU, 0 = GPU (default: -1)
    verbose=True,   # show progress (default: True)
)
```

| Parameter | Range | Effect |
|---|---|---|
| `diversity=0.3` | Low | Light rewording; very safe for technical papers |
| `diversity=0.7` | Medium | Balanced; good for essays and reports ✅ |
| `diversity=0.9` | High | Heavy rewriting; good for blog posts or general text |
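A knob like `diversity` typically controls the model's sampling parameters (temperature, top-k, top-p). The mapping below is a purely hypothetical illustration of that idea, not texthumanizer's actual internals:

```python
# Hypothetical sketch only: how a 0.0-1.0 "diversity" knob could be mapped
# onto temperature / top-k / top-p sampling. This is NOT texthumanizer's
# actual implementation.

def sampling_params(diversity: float) -> dict:
    """Interpolate between conservative and aggressive sampling settings."""
    d = max(0.0, min(1.0, diversity))  # clamp to [0, 1]
    return {
        "temperature": 0.7 + 0.8 * d,  # 0.7 (safe) .. 1.5 (creative)
        "top_k": round(20 + 100 * d),  # 20 .. 120 candidate tokens
        "top_p": 0.85 + 0.1 * d,       # 0.85 .. 0.95 nucleus mass
    }

print(sampling_params(0.7))
```

The intuition: low diversity keeps sampling close to the model's most likely paraphrase, while high diversity widens the candidate pool and produces heavier rewrites.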
## 🖥️ CLI Usage

### Interactive mode

```bash
python -m texthumanizer.cli
```

### Direct text

```bash
python -m texthumanizer.cli text "Your AI-generated text here" --diversity 0.7
```

### Pipe from a file

```bash
cat essay.txt | python -m texthumanizer.cli text
```

### DOCX → humanized DOCX

```bash
python -m texthumanizer.cli doc essay.docx --output doc
```

### DOCX → print text

```bash
python -m texthumanizer.cli doc essay.docx --output text
```
## 🔬 How It Works

```
Input Text
    │
    ▼
[Mask technical terms]   ← citations, abbreviations, units, years
    │
    ▼
[Split into sentences]   ← smart splitter (handles abbreviations)
    │
    ▼
[T5 paraphrasing model]  ← humarin/chatgpt_paraphraser_on_T5_base
    │                      temperature + top-k + top-p sampling
    ▼
[Restore masked terms]   ← [1], DNA, 2023 put back exactly
    │
    ▼
Output Text
```

**Why T5 and not a GPT-style model?** T5 is an encoder–decoder model trained specifically on paraphrase tasks. It is:

- Much smaller (~250 MB vs. multi-GB GPT models)
- CPU-friendly and fast
- Better at preserving meaning than decoder-only models
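The mask-and-restore step above can be sketched as follows. The regex patterns and placeholder scheme here are illustrative assumptions, not texthumanizer's actual code:

```python
import re

# Sketch of the mask -> paraphrase -> restore strategy. The patterns and
# the "__MASKn__" placeholder format are our own assumptions.

PATTERNS = [
    re.compile(r"\[\d+(?:,\s*\d+)*\]"),  # citations like [1] or [1,2,3]
    re.compile(r"\b[A-Z]{2,}\b"),        # abbreviations like DNA, LSTM
    re.compile(r"\b\d+(?:\.\d+)?%"),     # percentages like 95%
    re.compile(r"\b(19|20)\d{2}\b"),     # years like 2023
]

def mask(text):
    """Replace protected spans with numbered placeholders; return text + table."""
    table = []
    for pat in PATTERNS:
        def _sub(m):
            table.append(m.group(0))
            return f"__MASK{len(table) - 1}__"
        text = pat.sub(_sub, text)
    return text, table

def restore(text, table):
    """Put the original spans back exactly where the placeholders ended up."""
    for i, original in enumerate(table):
        text = text.replace(f"__MASK{i}__", original)
    return text

masked, table = mask("Accuracy exceeded 95% in trials [1,2] using LSTM models.")
# ... run the paraphraser on `masked` here ...
print(restore(masked, table))
```

Because the paraphraser only ever sees opaque placeholders, it cannot reword a citation or unit; restoring the table afterwards puts each protected span back verbatim.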
## 📋 What Gets Preserved

| Type | Example | Preserved? |
|---|---|---|
| Academic citations | `[1]`, `[1,2,3]` | ✅ |
| Author citations | (Smith et al., 2021) | ✅ |
| Abbreviations | DNA, AI, COVID, LSTM | ✅ |
| Years | 2023, 1990 | ✅ |
| Percentages | 95%, 3.5% | ✅ |
| Scientific units | kg, MHz, nm, kcal | ✅ |
| Figure/Table refs | Fig. 3, Table 1 | ✅ |
| DOIs / URLs | doi:10.xxx, https://... | ✅ |
| Latin abbreviations | et al., e.g., i.e. | ✅ |
| Headings (in .docx) | Section titles | ✅ untouched |
## 🧪 Example Output

**Input (AI-generated):**

> The utilization of machine learning algorithms has demonstrated significant efficacy in the domain of medical diagnosis, achieving accuracy rates exceeding 95% in several clinical trials [1,2].

**Output (humanized):**

> Using machine learning methods has shown strong results in medical diagnosis, reaching accuracy levels above 95% in a number of clinical studies [1,2].
## 💡 Tips for Best Results

- **Research papers:** use `diversity=0.4–0.6` to keep technical accuracy
- **Essays / assignments:** use `diversity=0.7` (the default)
- **Blog posts / creative writing:** use `diversity=0.8–0.9`
- **Long papers:** process them section by section for the best control
- **GPU users:** set `device=0` for a ~5× speed improvement
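A simple driver for the section-by-section tip might look like this. The blank-line splitting heuristic and the `humanize_long_paper` helper are our own illustration; only `TextHumanizer` and `humanize_text` come from the library:

```python
def split_sections(text: str) -> list[str]:
    """Split a long document into sections on blank lines (our own heuristic)."""
    return [s.strip() for s in text.split("\n\n") if s.strip()]

def humanize_long_paper(path: str, diversity: float = 0.5) -> str:
    """Humanize a long paper section by section for finer control."""
    from texthumanizer import TextHumanizer  # imported lazily

    th = TextHumanizer(diversity=diversity, verbose=False)
    with open(path, encoding="utf-8") as f:
        sections = split_sections(f.read())
    return "\n\n".join(th.humanize_text(section) for section in sections)
```

Processing smaller chunks lets you spot-check each section's rewrite and rerun just the ones that drift too far from the original.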
## 🔬 How It Works (In-place Replacement)

Unlike other humanizers that strip away formatting, texthumanizer uses an in-place, run-level replacement strategy:

1. It creates a temporary copy of your `.docx`.
2. It identifies text-bearing "runs" within each paragraph.
3. It humanizes the text while skipping runs that contain images or drawings.
4. It injects the new text back into the original XML structure, keeping your layout 100% intact.
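The run-level idea can be sketched with the standard library on a raw WordprocessingML paragraph. This illustrates the concept only; it is not texthumanizer's code, which presumably operates on the full `.docx` package:

```python
import xml.etree.ElementTree as ET

# Namespace used by WordprocessingML paragraph XML inside a .docx.
W = "{http://schemas.openxmlformats.org/wordprocessingml/2006/main}"

def rewrite_runs(paragraph_xml: str, rewrite) -> str:
    """Apply `rewrite` to each text run, skipping runs that contain drawings."""
    root = ET.fromstring(paragraph_xml)
    for run in root.iter(f"{W}r"):
        if run.find(f"{W}drawing") is not None:
            continue  # leave image/drawing runs untouched
        for t in run.iter(f"{W}t"):
            if t.text:
                t.text = rewrite(t.text)
    return ET.tostring(root, encoding="unicode")

para = (
    '<w:p xmlns:w="http://schemas.openxmlformats.org/wordprocessingml/2006/main">'
    "<w:r><w:t>machine learning works well</w:t></w:r>"
    "<w:r><w:drawing/></w:r>"
    "</w:p>"
)
print(rewrite_runs(para, str.upper))
```

Because only the text nodes of text-bearing runs are touched, run-level formatting (bold, fonts, embedded images) survives the rewrite untouched.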
## 📄 License
MIT License — free for personal and academic use.
## ⚠️ Disclaimer & Ethics
This tool is designed to assist researchers in improving the readability of their own writing. It is NOT intended for academic dishonesty or bypassing plagiarism checks for unoriginal work. Use responsibly and always cite your AI assistance if required by your institution.
## Download files
### Source distribution: texthumanizer-1.1.0.tar.gz

- Size: 14.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `e0b7b9e2de2c808be770b98c6dabec7ba9a39b19ace78279b792602efa755410` |
| MD5 | `e028b59da564fe672a372008ecc407d2` |
| BLAKE2b-256 | `0286d2ab3d929522e2d397b3e6fea2cff63e800495724341019bff9dc826e6e7` |
### Built distribution: texthumanizer-1.1.0-py3-none-any.whl

- Size: 13.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.13.7

| Algorithm | Hash digest |
|---|---|
| SHA256 | `9569326ca802e8f04086d89188b59adb5df0d027837af61a14c6fead32111644` |
| MD5 | `61628e89355aad8ebb43cf1bdcc4456d` |
| BLAKE2b-256 | `2aa30299bb34c7ba175dc1cb0e32e9d0ee6f9520f352350badf09ad899ef95a0` |