Offline AI-text humanizer that preserves images, tables, and research context.

texthumanizer 📝

Offline AI-text humanizer & plagiarism reducer for students and researchers. No internet needed after model download. Preserves research context, citations, and semantic meaning.


✨ Features

| Feature | Detail |
| --- | --- |
| Humanize AI text | Rewrites ChatGPT / Claude / Gemini output to sound natural |
| Plagiarism reduction | Paraphrase-based, not just synonym swapping |
| Semantic preservation | Meaning, tone, and argument structure kept intact |
| Research-aware | Citations ([1]), abbreviations (DNA), units (95%), et al. all preserved |
| DOCX support | Paragraph-level processing; headings untouched |
| 100% offline | T5-based model runs locally after the first download (~250 MB) |
| Lightweight | CPU-friendly; no GPU required |

📦 Installation

Full install (recommended)

pip install texthumanizer[all]

Minimal (text only, no docx)

pip install texthumanizer[ml]

With ML and DOCX support

pip install texthumanizer[ml,docx]

The first run downloads the T5 model (~250 MB) from Hugging Face and caches it locally; later runs need no network access.
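If you need a hard guarantee that later runs never touch the network (e.g. on an air-gapped machine), the Hugging Face libraries honor offline environment variables. This assumes texthumanizer loads its model through transformers/huggingface_hub, which the model name suggests but the source does not state explicitly:

```python
import os

# Force Hugging Face libraries to use only the local cache.
# Set these before importing or instantiating TextHumanizer.
os.environ["HF_HUB_OFFLINE"] = "1"        # huggingface_hub: no network calls
os.environ["TRANSFORMERS_OFFLINE"] = "1"  # transformers: load from cache only
```

With these set, a missing cache raises an error instead of silently downloading.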


🚀 Quick Start

1. Humanize pasted text

from texthumanizer import TextHumanizer

th = TextHumanizer()

ai_text = """
Artificial intelligence has rapidly transformed numerous sectors of society,
demonstrating unprecedented capabilities in natural language processing,
computer vision, and decision-making systems.
"""

result = th.humanize_text(ai_text)
print(result)

2a. Humanize a .docx → save new .docx

from texthumanizer import TextHumanizer

th = TextHumanizer()

# Saves "humanized_my_essay.docx" next to the original
output_path = th.humanize_doc("my_essay.docx", output="doc")
print(f"Saved: {output_path}")

# Custom output path
th.humanize_doc("my_essay.docx", output="doc", output_path="D:/final_essay.docx")

2b. Humanize a .docx → get plain text back

from texthumanizer import TextHumanizer

th = TextHumanizer()
text = th.humanize_doc("my_essay.docx", output="text")
print(text)

⚙️ Configuration

th = TextHumanizer(
    diversity=0.7,    # 0.0 = minimal changes, 1.0 = maximum rewriting (default: 0.7)
    device=-1,        # -1 = CPU, 0 = GPU (default: -1)
    verbose=True,     # Show progress (default: True)
)
| Parameter | Range | Effect |
| --- | --- | --- |
| diversity=0.3 | Low | Light rewording; very safe for technical papers |
| diversity=0.7 | Medium | Balanced; good for essays and reports ✅ |
| diversity=0.9 | High | Heavy rewriting; good for blog posts or general text |

🖥️ CLI Usage

Interactive mode

python -m texthumanizer.cli

Direct text

python -m texthumanizer.cli text "Your AI-generated text here" --diversity 0.7

Pipe from file

cat essay.txt | python -m texthumanizer.cli text

DOCX → humanized DOCX

python -m texthumanizer.cli doc essay.docx --output doc

DOCX → print text

python -m texthumanizer.cli doc essay.docx --output text

🔬 How It Works

Input Text
    │
    ▼
[Mask technical terms]         ← citations, abbreviations, units, years
    │
    ▼
[Split into sentences]         ← smart splitter (handles abbreviations)
    │
    ▼
[T5 Paraphrasing model]        ← humarin/chatgpt_paraphraser_on_T5_base
    │                            temperature + top-k + top-p sampling
    ▼
[Restore masked terms]         ← [1], DNA, 2023 put back exactly
    │
    ▼
Output Text
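The mask → paraphrase → restore loop above can be sketched in a few lines. This is an illustrative re-implementation of the idea, not texthumanizer's actual code; the placeholder format and the exact regex patterns are assumptions:

```python
import re

# Patterns for tokens that must survive paraphrasing unchanged
# (a small subset of what texthumanizer protects, chosen for illustration).
PROTECT = re.compile(r"\[\d+(?:,\d+)*\]"      # citations like [1] or [1,2,3]
                     r"|\b[A-Z]{2,}\b"        # abbreviations like DNA, LSTM
                     r"|\b\d+(?:\.\d+)?%")    # percentages like 95%

def mask(text):
    """Replace protected tokens with numbered placeholders."""
    found = []
    def _sub(m):
        found.append(m.group(0))
        return f"⟦{len(found) - 1}⟧"
    return PROTECT.sub(_sub, text), found

def restore(text, found):
    """Put the original tokens back exactly where the placeholders are."""
    return re.sub(r"⟦(\d+)⟧", lambda m: found[int(m.group(1))], text)

masked, tokens = mask("Accuracy exceeded 95% in trials of DNA assays [1,2].")
# ...the masked text goes through the paraphraser here...
print(restore(masked, tokens))  # round-trips to the original token values
```

Because the paraphraser only ever sees opaque placeholders, it cannot reword a citation or a unit, and `restore` puts each token back verbatim.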

Why T5 and not GPT-style? T5 is an encoder–decoder model trained specifically on paraphrase tasks. It is:

  • Much smaller (~250 MB vs multi-GB GPT models)
  • CPU-friendly and fast
  • Better at preserving meaning than decoder-only models

📋 What Gets Preserved

| Type | Example | Preserved? |
| --- | --- | --- |
| Academic citations | [1], [1,2,3] | ✅ |
| Author citations | (Smith et al., 2021) | ✅ |
| Abbreviations | DNA, AI, COVID, LSTM | ✅ |
| Years | 2023, 1990 | ✅ |
| Percentages | 95%, 3.5% | ✅ |
| Scientific units | kg, MHz, nm, kcal | ✅ |
| Figure/Table refs | Fig. 3, Table 1 | ✅ |
| DOIs / URLs | doi:10.xxx, https://... | ✅ |
| Latin abbreviations | et al., e.g., i.e. | ✅ |
| Headings (in .docx) | Section titles | ✅ untouched |

🧪 Example Output

Input (AI-generated):

The utilization of machine learning algorithms has demonstrated significant efficacy in the domain of medical diagnosis, achieving accuracy rates exceeding 95% in several clinical trials [1,2].

Output (humanized):

Using machine learning methods has shown strong results in medical diagnosis, reaching accuracy levels above 95% in a number of clinical studies [1,2].


💡 Tips for Best Results

  • Research papers: use diversity=0.4–0.6 to keep technical accuracy
  • Essays / assignments: use diversity=0.7 (default)
  • Blog posts / creative writing: use diversity=0.8–0.9
  • For long papers, process section-by-section for the best control
  • GPU users: set device=0 for roughly 5× faster processing
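The section-by-section tip is easy to script. The splitter below is a plain helper, not part of texthumanizer's API; it splits on blank lines so each block can be passed to th.humanize_text individually:

```python
def split_sections(text):
    """Split a document into blank-line-separated sections."""
    sections = [s.strip() for s in text.split("\n\n")]
    return [s for s in sections if s]

paper = "Introduction...\n\nMethods...\n\nResults..."
for section in split_sections(paper):
    # result = th.humanize_text(section)  # humanize each section separately
    print(section[:20])
```

Processing per section keeps each paraphrasing call short and lets you spot-check (or re-run with a different diversity) one section at a time.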

🔬 How It Works (In-place Replacement)

Unlike humanizers that strip away formatting, texthumanizer uses an in-place, run-level replacement strategy:

  1. It creates a temporary copy of your .docx.
  2. It identifies text-bearing "runs" within each paragraph.
  3. It humanizes the text while skipping runs that contain images or drawings.
  4. It injects the new text back into the original XML structure, keeping your layout 100% intact.
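The run-level strategy can be modeled without touching the .docx XML. In the simplified sketch below, a Run is just text plus a flag for embedded drawings; the names here (Run, rewrite_runs) are illustrative stand-ins, not texthumanizer's API:

```python
from dataclasses import dataclass

@dataclass
class Run:
    text: str
    has_drawing: bool = False  # runs holding images/drawings are skipped

def rewrite_runs(runs, humanize):
    """Replace text in place, leaving drawing-bearing runs untouched."""
    for run in runs:
        if run.has_drawing or not run.text.strip():
            continue                   # step 3: skip images and empty runs
        run.text = humanize(run.text)  # step 4: inject new text back
    return runs

para = [Run("Old sentence. "), Run("", has_drawing=True), Run("More text.")]
rewrite_runs(para, str.upper)
```

Because only the text of eligible runs is swapped, everything attached to the run objects themselves (fonts, styles, anchored images) is left exactly as it was.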

📄 License

MIT License — free for personal and academic use.


⚠️ Disclaimer & Ethics

This tool is designed to assist researchers in improving the readability of their own writing. It is NOT intended for academic dishonesty or bypassing plagiarism checks for unoriginal work. Use responsibly and always cite your AI assistance if required by your institution.

