Generate stylistic paraphrases of texts using local transformer models.

diversify-text

This package helps you generate stylistically diverse paraphrases of your own texts using Hugging Face transformer models that run locally.

pip install diversify-text

Usage

For file inputs (CSV, TSV, TXT), output options, punctuation splitting, and creating custom methods, see the full usage guide.

Single text

from diversify_text import diversify

results = diversify("The experiment was conducted in a controlled lab setting.")

[{
    "original": "The experiment was conducted in a controlled lab setting.",
    "paraphrases": [
        "They ran the experiment in a controlled lab setting.",
        "The experiment took place in a controlled lab.",
        "A controlled lab was where the experiment was conducted.",
        "In a controlled lab, the experiment was carried out.",
        "The study was performed in a controlled lab environment.",
    ]
}]
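
Each entry in the returned list is a plain dictionary with an "original" string and a "paraphrases" list, as the output above shows. The sketch below formats one entry for display. Note that format_entry is a hypothetical helper, not part of diversify_text, and the sample data is copied from the example above rather than generated.

```python
def format_entry(entry: dict) -> str:
    """Render one result entry as a numbered, human-readable block."""
    lines = [f"Original: {entry['original']}"]
    for i, paraphrase in enumerate(entry["paraphrases"], start=1):
        lines.append(f"  {i}. {paraphrase}")
    return "\n".join(lines)


# Sample data copied from the output above (not generated here).
sample = {
    "original": "The experiment was conducted in a controlled lab setting.",
    "paraphrases": [
        "They ran the experiment in a controlled lab setting.",
        "The experiment took place in a controlled lab.",
    ],
}

print(format_entry(sample))
```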

Control number of paraphrases

results = diversify("Some text.", n=3)

[{"original": "Some text.", "paraphrases": ["...", "...", "..."]}]

Prompting method

Use the prompting method to generate paraphrases via a causal language model (default: SmolLM3-3B):

results = diversify("The experiment was conducted in a controlled lab setting.", methods=["prompting"])

Select specific prompt styles:

results = diversify(
    "The experiment was conducted in a controlled lab setting.",
    methods=["prompting"],
    method_kwargs={
        "prompting": {
            "prompt_keys": ["simple_kew", "complex_kew", "caps_reif"]
        }
    },
)

Available prompt keys: wikipedia_paraphrase, simple_kew, complex_kew, formal_reif, simple_reif, passive_reif, caps_reif, lowcaps_reif, text_emojis_reif, less_common_verbs_reif, humanize_llm-as-coauthor_original, and all finephrase_* templates. See the full prompt reference for details.
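
A typo in a prompt key is easy to make, so it can help to check keys before passing them on. The sketch below builds the method_kwargs dictionary shown above and rejects keys that are not in the list from this README. prompting_kwargs and KNOWN_PROMPT_KEYS are hypothetical names, not part of diversify_text, and the key list may differ between versions.

```python
# Prompt keys as listed in this README; the set may differ between versions.
KNOWN_PROMPT_KEYS = {
    "wikipedia_paraphrase", "simple_kew", "complex_kew", "formal_reif",
    "simple_reif", "passive_reif", "caps_reif", "lowcaps_reif",
    "text_emojis_reif", "less_common_verbs_reif",
    "humanize_llm-as-coauthor_original",
}


def prompting_kwargs(prompt_keys: list[str]) -> dict:
    """Build a method_kwargs dict for the prompting method (hypothetical helper)."""
    unknown = [
        key for key in prompt_keys
        # finephrase_* templates are accepted by prefix, since the README
        # does not enumerate them individually.
        if key not in KNOWN_PROMPT_KEYS and not key.startswith("finephrase_")
    ]
    if unknown:
        raise ValueError(f"Unrecognised prompt keys: {unknown}")
    return {"prompting": {"prompt_keys": list(prompt_keys)}}


kwargs = prompting_kwargs(["simple_kew", "caps_reif"])
print(kwargs)
```

The resulting dictionary can be passed as method_kwargs=kwargs together with methods=["prompting"].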

Caching

The diversify() function automatically caches loaded models between calls. The generation model and the semantic filter are cached independently, so toggling semantic_filter does not reload the generation model and vice versa. Call clear_cache() to drop cached models and allow memory to be reclaimed when possible:

from diversify_text import clear_cache

clear_cache()
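
To illustrate what independent caching means in practice, here is a minimal sketch with two separate caches, one per component. It is not the library's implementation: ComponentCache, the loader functions, and the model names are all hypothetical stand-ins.

```python
from typing import Callable


class ComponentCache:
    """Cache loaded objects by key so repeated requests skip the loader."""

    def __init__(self, loader: Callable[[str], object]):
        self._loader = loader
        self._items: dict[str, object] = {}
        self.loads = 0  # number of times the (expensive) loader actually ran

    def get(self, key: str) -> object:
        if key not in self._items:
            self._items[key] = self._loader(key)
            self.loads += 1
        return self._items[key]

    def clear(self) -> None:
        # Dropping the references allows the memory to be reclaimed later.
        self._items.clear()


# Stand-in loaders; a real loader would load model weights here.
generation_cache = ComponentCache(lambda name: f"<generation model {name}>")
filter_cache = ComponentCache(lambda name: f"<semantic filter {name}>")

generation_cache.get("generator-a")  # first request runs the loader
filter_cache.get("filter-a")         # loads the filter only
generation_cache.get("generator-a")  # served from the cache, no reload
print(generation_cache.loads, filter_cache.loads)  # prints: 1 1
```

Because each component has its own cache, requesting or clearing one never touches the other.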

Using the class directly

You can also instantiate a Diversifier yourself for full control over the model lifecycle:

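from diversify_text import Diversifier

# Placeholder inputs; substitute your own lists of texts.
texts_1 = ["The experiment was conducted in a controlled lab setting."]
texts_2 = ["She graduated from MIT in 2019."]

div = Diversifier(device="cuda", methods=["tinystyler"])

batch_1 = div.diversify(texts_1, n=5)
batch_2 = div.diversify(texts_2, n=5)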

List of texts

results = diversify([
    "The experiment was conducted in a controlled lab setting.",
    "She graduated from MIT in 2019.",
])

[
    {"original": "The experiment ...", "paraphrases": ["...", "...", ...]},
    {"original": "She graduated ...", "paraphrases": ["...", "...", ...]},
]
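
For downstream use such as data augmentation, it is often convenient to flatten this nested structure into (original, paraphrase) pairs. A minimal sketch, assuming the result shape shown above; to_pairs is a hypothetical helper and the sample data is hand-written, not model output.

```python
def to_pairs(results: list[dict]) -> list[tuple[str, str]]:
    """Flatten results into (original, paraphrase) pairs."""
    return [
        (entry["original"], paraphrase)
        for entry in results
        for paraphrase in entry["paraphrases"]
    ]


# Hand-written sample in the documented shape (not real model output).
sample_results = [
    {"original": "Text A.", "paraphrases": ["A one.", "A two."]},
    {"original": "Text B.", "paraphrases": ["B one."]},
]

pairs = to_pairs(sample_results)
print(pairs)
```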

Customising the TinyStyler style bank

TinyStyler generates each paraphrase by conditioning on a style example: a short sentence that demonstrates the target writing style. The style bank is the collection of such examples, which are cycled through when producing multiple paraphrases.

The default bank is a dictionary mapping style labels to lists of example sentences (drawn from the CORE corpus). You can replace or extend it by passing a custom bank via method_kwargs.

A style bank can be a dict[str, list[str]] or a list[list[str]]:

from diversify_text import diversify

custom_bank = {
    "academic": ["The results demonstrate a statistically significant effect."],
    "enthusiastic": ["We found something really interesting — check this out!"],
    "telegraphic": ["Key finding: effect confirmed. Details follow."],
}

results = diversify(
    "The experiment was conducted in a controlled lab setting.",
    method_kwargs={"tinystyler": {"style_bank": custom_bank}},
)
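
The two accepted shapes carry the same examples; the list form simply drops the labels. The sketch below converts a dict bank to the list form and illustrates what cycling through a bank means when more paraphrases are requested than there are styles. The cycling order shown is an assumption for illustration, and styles_for is a hypothetical helper; the library's actual selection logic may differ.

```python
from itertools import cycle, islice

# A small bank in the dict[str, list[str]] shape, adapted from the example above.
custom_bank = {
    "academic": ["The results demonstrate a statistically significant effect."],
    "enthusiastic": ["We found something really interesting, check this out!"],
    "telegraphic": ["Key finding: effect confirmed. Details follow."],
}

# Equivalent list[list[str]] form: same examples, labels dropped.
list_bank = list(custom_bank.values())


def styles_for(bank: dict[str, list[str]], n: int) -> list[str]:
    """Pick n style labels by cycling through the bank (illustrative only)."""
    return list(islice(cycle(bank), n))


print(styles_for(custom_bank, 5))
# ['academic', 'enthusiastic', 'telegraphic', 'academic', 'enthusiastic']
```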

DEFAULT_STYLE_BANK is exported from diversify_text.styles so you can build on it:

from diversify_text.styles import DEFAULT_STYLE_BANK

extended_bank = {
    **DEFAULT_STYLE_BANK,
    "scientific": ["The data clearly indicate a statistically significant result."],
}

You can also select specific styles by key name with styles, instead of cycling through the entire bank. The number of paraphrases is determined by the number of selected styles:

results = diversify(
    "The experiment was conducted in a controlled lab setting.",
    method_kwargs={"tinystyler": {"styles": ["research_article", "personal_blog", "recipe"]}},
)

Creating a custom method

from diversify_text import Diversifier
from diversify_text.method import DiversificationMethod


class MyMethod(DiversificationMethod):
    name = "my_method"

    def generate(self, texts, *, n, max_new_tokens, temperature, top_p, **kwargs):
        return [[f"{text} :: variant {i}" for i in range(n)] for text in texts]


results = Diversifier(methods=[MyMethod()]).diversify("Hello", n=3)

[{"original": "Hello", "paraphrases": ["Hello :: variant 0", "Hello :: variant 1", "Hello :: variant 2"]}]
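
Judging from the example above, a custom method needs a name and a generate method that returns one list of n strings per input text. The sketch below follows that same contract with a toy rule-based method that produces casing variants. CaseVariants is hypothetical, and the fallback base class is only a stand-in so the snippet also runs when diversify_text is not installed.

```python
try:
    from diversify_text.method import DiversificationMethod
except ImportError:
    # Stand-in base class so the sketch runs without the package installed.
    class DiversificationMethod:
        name = "base"


class CaseVariants(DiversificationMethod):
    """Toy method: returns casing variants instead of real paraphrases."""

    name = "case_variants"
    _transforms = (str.lower, str.upper, str.title)

    def generate(self, texts, *, n, max_new_tokens, temperature, top_p, **kwargs):
        # One inner list per input text, each holding exactly n variants.
        # The sampling parameters are accepted but unused by this toy method.
        return [
            [self._transforms[i % len(self._transforms)](text) for i in range(n)]
            for text in texts
        ]


demo = CaseVariants().generate(
    ["Hello world"], n=3, max_new_tokens=0, temperature=1.0, top_p=1.0
)
print(demo)  # [['hello world', 'HELLO WORLD', 'Hello World']]
```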

Install

pip install diversify-text

Requires Python 3.10+.

Contributing

Development setup

Note: You must have uv installed. Full installation guide: https://docs.astral.sh/uv/getting-started/installation/

git clone https://github.com/AnnaWegmann/diversify_text.git
cd diversify_text
uv sync --group dev
source .venv/bin/activate

Running tests

# Run all tests
pytest

# Run a specific test file
pytest tests/test_core.py

# Run a specific test class or method
pytest tests/test_core.py::TestDiversifier
pytest tests/test_core.py::TestDiversifier::test_single_text_returns_one_result

Tests are also individually runnable via PyCharm's built-in test runner (right-click any test class or method).

Working with uv

Adding packages with `uv add`

To add packages to your project, always use uv add rather than uv pip install. This ensures that your dependencies are properly managed and recorded in your pyproject.toml.

uv add <package-name>

Adding packages to the dev group

If you need to add a package specifically for your development environment:

uv add --group dev <package-name>

Switching between dev and standard mode

After you are done with testing and want to go back to standard mode, you can remove the dev-only packages:

uv sync --no-group dev

This will disable all additional groups and just load your main project dependencies.

Best practice: run `uv lock -U`

Whenever you upgrade, downgrade, or change versions of packages, it's good practice to run:

uv lock -U

This re-resolves your lock file and upgrades every locked package to the newest version that the constraints in your pyproject.toml allow, so that all versions stay consistent and in sync.

Building docs locally

uv sync --group docs
sphinx-build -b html docs docs/_build/html
open docs/_build/html/index.html  # macOS; use xdg-open on Linux

Project details

Maintainers

AnnaWegmann

Release history

- 0.2.1 (this version): Apr 15, 2026
- 0.2.0: Apr 9, 2026
- 0.1.3: Mar 12, 2026
- 0.1.2: Mar 12, 2026
- 0.1.1: Mar 12, 2026

Download files

Source Distribution

diversify_text-0.2.1.tar.gz (52.7 kB)

Uploaded Apr 15, 2026 Source

Built Distribution

diversify_text-0.2.1-py3-none-any.whl (63.2 kB)

Uploaded Apr 15, 2026 Python 3

File details

Details for the file diversify_text-0.2.1.tar.gz.

File metadata

Download URL: diversify_text-0.2.1.tar.gz
Upload date: Apr 15, 2026
Size: 52.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for diversify_text-0.2.1.tar.gz
Algorithm	Hash digest
SHA256	`d83f2d44223b56b4ba599e9f60880430c86dc88cc0094d8ab35cc74ae8f04416`
MD5	`8db8928569ca27fe39f210d5e445928b`
BLAKE2b-256	`4da3f2bb88544b48b5c02c1d818e0d21ed92b66f8eea85dbaa50cf8646195ccc`
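
A downloaded archive can be checked against the SHA256 digest listed above. The sketch below uses only the Python standard library. sha256_of and matches_published_hash are hypothetical helper names, and the local file path in the usage note is an assumption about where the download was saved.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for diversify_text-0.2.1.tar.gz.
EXPECTED_SHA256 = "d83f2d44223b56b4ba599e9f60880430c86dc88cc0094d8ab35cc74ae8f04416"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path, expected: str = EXPECTED_SHA256) -> bool:
    """Compare a local file's digest with the published one."""
    return sha256_of(path) == expected.lower()
```

For example, matches_published_hash(Path("diversify_text-0.2.1.tar.gz")) should return True for an unmodified copy of the source distribution saved in the current directory.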

Provenance

The following attestation bundles were made for diversify_text-0.2.1.tar.gz:

Publisher: publish.yml on AnnaWegmann/diversify_text

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: diversify_text-0.2.1.tar.gz
- Subject digest: d83f2d44223b56b4ba599e9f60880430c86dc88cc0094d8ab35cc74ae8f04416
- Sigstore transparency entry: 1306126222
- Sigstore integration time: Apr 15, 2026
Source repository:
- Permalink: AnnaWegmann/diversify_text@638ba8acd02ad29ded0ccab7e725efdda6c9e461
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/AnnaWegmann
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@638ba8acd02ad29ded0ccab7e725efdda6c9e461
- Trigger Event: push

File details

Details for the file diversify_text-0.2.1-py3-none-any.whl.

File metadata

Download URL: diversify_text-0.2.1-py3-none-any.whl
Upload date: Apr 15, 2026
Size: 63.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for diversify_text-0.2.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`371024b7a2f9c984802443dc0e6ecc042790ca56174e8c7c5822cc1e5831e7c5`
MD5	`ed516f0b46872e837584d63f6db41fa7`
BLAKE2b-256	`21aa3fd9580df7b7191589baebc799f1a7d396e1c7ebdbc61c1797a5aab199d1`

Provenance

The following attestation bundles were made for diversify_text-0.2.1-py3-none-any.whl:

Publisher: publish.yml on AnnaWegmann/diversify_text

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: diversify_text-0.2.1-py3-none-any.whl
- Subject digest: 371024b7a2f9c984802443dc0e6ecc042790ca56174e8c7c5822cc1e5831e7c5
- Sigstore transparency entry: 1306126334
- Sigstore integration time: Apr 15, 2026
Source repository:
- Permalink: AnnaWegmann/diversify_text@638ba8acd02ad29ded0ccab7e725efdda6c9e461
- Branch / Tag: refs/tags/v0.2.1
- Owner: https://github.com/AnnaWegmann
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@638ba8acd02ad29ded0ccab7e725efdda6c9e461
- Trigger Event: push
