
Offline AI-text humanizer that preserves images, tables, and research context.


texthumanizer 📝

Offline AI-text humanizer & plagiarism reducer for students and researchers. No internet needed after model download. Preserves research context, citations, and semantic meaning.


✨ Features

| Feature | Detail |
|---|---|
| Humanize AI text | Rewrites ChatGPT / Claude / Gemini output to sound natural |
| Plagiarism reduction | Paraphrase-based, not just synonym swaps |
| Semantic preservation | Meaning, tone, and argument structure kept intact |
| Research-aware | Citations ([1]), abbreviations (DNA), percentages (95%), et al.: all preserved |
| DOCX support | Paragraph-level processing; headings untouched |
| 100% offline | T5-based model runs locally after the first download (~250 MB) |
| Lightweight | CPU-friendly; no GPU required |

📦 Installation

Full install (recommended)

pip install texthumanizer[all]

Minimal (text only, no docx)

pip install texthumanizer[ml]

Text plus DOCX support

pip install texthumanizer[ml,docx]

First run downloads the T5 model (~250 MB) from HuggingFace once and caches it locally.


🚀 Quick Start

1. Humanize pasted text

from texthumanizer import TextHumanizer

th = TextHumanizer()

ai_text = """
Artificial intelligence has rapidly transformed numerous sectors of society,
demonstrating unprecedented capabilities in natural language processing,
computer vision, and decision-making systems.
"""

result = th.humanize_text(ai_text)
print(result)

2a. Humanize a .docx → save new .docx

from texthumanizer import TextHumanizer

th = TextHumanizer()

# Saves "humanized_my_essay.docx" next to the original
output_path = th.humanize_doc("my_essay.docx", output="doc")
print(f"Saved: {output_path}")

# Custom output path
th.humanize_doc("my_essay.docx", output="doc", output_path="D:/final_essay.docx")

2b. Humanize a .docx → get plain text back

from texthumanizer import TextHumanizer

th = TextHumanizer()
text = th.humanize_doc("my_essay.docx", output="text")
print(text)

⚙️ Configuration

th = TextHumanizer(
    diversity=0.7,    # 0.0 = minimal changes, 1.0 = maximum rewriting (default: 0.7)
    device=-1,        # -1 = CPU, 0 = GPU (default: -1)
    verbose=True,     # Show progress (default: True)
)
| Parameter | Range | Effect |
|---|---|---|
| diversity=0.3 | Low | Light rewording; very safe for technical papers |
| diversity=0.7 | Medium | Balanced; good for essays and reports ✅ |
| diversity=0.9 | High | Heavy rewriting; good for blog posts or general text |
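The table above can be encoded as a small helper. This is a hypothetical convenience (`pick_diversity` and the document-type names are not part of the package), shown only to make the presets concrete:

```python
# Hypothetical presets mirroring the table above;
# not part of the texthumanizer API.
DIVERSITY_PRESETS = {
    "technical_paper": 0.3,  # light rewording, safest
    "essay": 0.7,            # balanced default
    "blog_post": 0.9,        # heavy rewriting
}

def pick_diversity(doc_type: str) -> float:
    """Return a suggested diversity value, falling back to the 0.7 default."""
    return DIVERSITY_PRESETS.get(doc_type, 0.7)
```

You could then construct the humanizer with `TextHumanizer(diversity=pick_diversity("essay"))`.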

🖥️ CLI Usage

Interactive mode

python -m texthumanizer.cli

Direct text

python -m texthumanizer.cli text "Your AI-generated text here" --diversity 0.7

Pipe from file

cat essay.txt | python -m texthumanizer.cli text

DOCX → humanized DOCX

python -m texthumanizer.cli doc essay.docx --output doc

DOCX → print text

python -m texthumanizer.cli doc essay.docx --output text

🔬 How It Works

Input Text
    │
    ▼
[Mask technical terms]         ← citations, abbreviations, units, years
    │
    ▼
[Split into sentences]         ← smart splitter (handles abbreviations)
    │
    ▼
[T5 Paraphrasing model]        ← humarin/chatgpt_paraphraser_on_T5_base
    │                            temperature + top-k + top-p sampling
    ▼
[Restore masked terms]         ← [1], DNA, 2023 put back exactly
    │
    ▼
Output Text
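The mask-and-restore steps can be sketched roughly as follows. This is a simplified regex illustration of the idea, not the package's actual masking logic, and the pattern set here is deliberately small:

```python
import re

# Simplified patterns for tokens that must survive paraphrasing:
# citations [1,2], ALL-CAPS abbreviations, 4-digit years, percentages.
PROTECT = re.compile(r'\[\d+(?:,\s*\d+)*\]|\b[A-Z]{2,}\b|\b\d{4}\b|\d+(?:\.\d+)?%')

def mask(text):
    """Replace protected tokens with numbered placeholders."""
    saved = []
    def _sub(m):
        saved.append(m.group(0))
        return f"MASK{len(saved) - 1}"
    return PROTECT.sub(_sub, text), saved

def restore(text, saved):
    """Put the original tokens back exactly (highest index first,
    so MASK10 is never clobbered by MASK1)."""
    for i in range(len(saved) - 1, -1, -1):
        text = text.replace(f"MASK{i}", saved[i])
    return text

masked, saved = mask("DNA studies reached 95% accuracy in 2023 [1,2].")
# ...the paraphraser would rewrite `masked` here...
print(restore(masked, saved))
# → DNA studies reached 95% accuracy in 2023 [1,2].
```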

Why T5 and not GPT-style? T5 is an encoder–decoder model trained specifically on paraphrase tasks. It is:

  • Much smaller (~250 MB vs multi-GB GPT models)
  • CPU-friendly and fast
  • Better at preserving meaning than decoder-only models

📋 What Gets Preserved

| Type | Example | Preserved? |
|---|---|---|
| Academic citations | [1], [1,2,3] | ✅ |
| Author citations | (Smith et al., 2021) | ✅ |
| Abbreviations | DNA, AI, COVID, LSTM | ✅ |
| Years | 2023, 1990 | ✅ |
| Percentages | 95%, 3.5% | ✅ |
| Scientific units | kg, MHz, nm, kcal | ✅ |
| Figure/Table refs | Fig. 3, Table 1 | ✅ |
| DOIs / URLs | doi:10.xxx, https://... | ✅ |
| Latin abbreviations | et al., e.g., i.e. | ✅ |
| Headings (in .docx) | Section titles | ✅ untouched |

🧪 Example Output

Input (AI-generated):

The utilization of machine learning algorithms has demonstrated significant efficacy in the domain of medical diagnosis, achieving accuracy rates exceeding 95% in several clinical trials [1,2].

Output (humanized):

Using machine learning methods has shown strong results in medical diagnosis, reaching accuracy levels above 95% in a number of clinical studies [1,2].
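One quick way to confirm that the protected tokens survived is to extract them from both versions and compare. This is an independent regex sanity check, not a feature of the package:

```python
import re

src = ("The utilization of machine learning algorithms has demonstrated "
       "significant efficacy in the domain of medical diagnosis, achieving "
       "accuracy rates exceeding 95% in several clinical trials [1,2].")
out = ("Using machine learning methods has shown strong results in medical "
       "diagnosis, reaching accuracy levels above 95% in a number of "
       "clinical studies [1,2].")

# Citations and percentages, in reading order.
TOKENS = re.compile(r'\[\d+(?:,\d+)*\]|\d+(?:\.\d+)?%')

assert TOKENS.findall(src) == TOKENS.findall(out)  # ['95%', '[1,2]']
```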


💡 Tips for Best Results

  • Research papers: use diversity=0.4–0.6 to keep technical accuracy
  • Essays / assignments: use diversity=0.7 (default)
  • Blog posts / creative writing: use diversity=0.8–0.9
  • Process long papers section by section for finer control
  • GPU users: set device=0 for a ~5× speedup
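The GPU tip can be automated with a small probe. This helper is hypothetical (not part of the package) and falls back to CPU when PyTorch or CUDA is unavailable:

```python
def pick_device() -> int:
    """Return 0 (first GPU) if CUDA is usable, else -1 (CPU),
    matching the `device` convention shown in Configuration."""
    try:
        import torch  # optional; may not be installed
        return 0 if torch.cuda.is_available() else -1
    except ImportError:
        return -1

# th = TextHumanizer(device=pick_device())
```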

🔬 How DOCX Processing Works (In-place Replacement)

Unlike other humanizers that strip away formatting, texthumanizer uses an "In-place run-level replacement" strategy:

  1. It creates a temporary copy of your .docx.
  2. It identifies text-bearing "runs" within each paragraph.
  3. It humanizes the text while skipping runs that contain images or drawings.
  4. It injects the new text back into the original XML structure, keeping your layout 100% intact.
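In WordprocessingML terms, steps 3–4 amount to rewriting `w:t` elements while leaving any run that carries a `w:drawing` alone. Here is a minimal standard-library sketch of that idea, illustrative only; the package's actual implementation may differ (e.g. it may use python-docx):

```python
import xml.etree.ElementTree as ET

# WordprocessingML namespace used inside a .docx document.xml
W = "http://schemas.openxmlformats.org/wordprocessingml/2006/main"

para = ET.fromstring(
    f'<w:p xmlns:w="{W}">'
    '<w:r><w:t>Original sentence.</w:t></w:r>'
    '<w:r><w:drawing/></w:r>'  # image run: must not be rewritten
    '</w:p>'
)

def rewrite_paragraph(p, new_text):
    """Put new_text into the first text run, blank the remaining
    text runs, and never touch runs containing drawings (images)."""
    first = True
    for run in p.findall(f'{{{W}}}r'):
        if run.find(f'{{{W}}}drawing') is not None:
            continue  # skip image runs, preserving layout
        t = run.find(f'{{{W}}}t')
        if t is None:
            continue
        t.text = new_text if first else ""
        first = False

rewrite_paragraph(para, "Humanized sentence.")
```

Because only the text nodes change, every other part of the paragraph's XML (styles, images, spacing) is carried through untouched.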

📄 License

MIT License — free for personal and academic use.


⚠️ Disclaimer & Ethics

This tool is designed to assist researchers in improving the readability of their own writing. It is NOT intended for academic dishonesty or bypassing plagiarism checks for unoriginal work. Use responsibly and always cite your AI assistance if required by your institution.
