Generate stylistic paraphrases of texts using local transformer models.

diversify-text

This package generates stylistically diverse paraphrases of your own texts using Hugging Face transformer models that run locally.

pip install diversify-text

Usage

For file inputs (CSV, TSV, TXT), output options, punctuation splitting, and creating custom methods, see the full usage guide.

Single text

from diversify_text import diversify

results = diversify("The experiment was conducted in a controlled lab setting.")
[{
    "original": "The experiment was conducted in a controlled lab setting.",
    "paraphrases": [
        "They ran the experiment in a controlled lab setting.",
        "The experiment took place in a controlled lab.",
        "A controlled lab was where the experiment was conducted.",
        "In a controlled lab, the experiment was carried out.",
        "The study was performed in a controlled lab environment.",
    ]
}]

Control number of paraphrases

results = diversify("Some text.", n=3)
[{"original": "Some text.", "paraphrases": ["...", "...", "..."]}]
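Each result is a dictionary with an "original" string and a "paraphrases" list, as in the outputs above. Here is a minimal sketch of flattening that structure into (original, paraphrase) pairs, for example to build a dataset. The results literal is hand-written sample data standing in for a real diversify() call, and flatten_results is a hypothetical helper, not part of the package:

```python
def flatten_results(results):
    """Turn diversify-style results into a flat list of (original, paraphrase) pairs."""
    return [
        (item["original"], paraphrase)
        for item in results
        for paraphrase in item["paraphrases"]
    ]


# Hand-written sample in the documented result shape (not real model output).
results = [
    {
        "original": "The experiment was conducted in a controlled lab setting.",
        "paraphrases": [
            "They ran the experiment in a controlled lab setting.",
            "The experiment took place in a controlled lab.",
        ],
    }
]

pairs = flatten_results(results)
for original, paraphrase in pairs:
    print(f"{original!r} -> {paraphrase!r}")
```

Because the helper only relies on the two documented keys, it should work unchanged on the output for a list of texts.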

Prompting method

Use the prompting method to generate paraphrases via a causal language model (default: SmolLM3-3B):

results = diversify("The experiment was conducted in a controlled lab setting.", methods=["prompting"])

Select specific prompt styles:

results = diversify(
    "The experiment was conducted in a controlled lab setting.",
    methods=["prompting"],
    method_kwargs={
        "prompting": {
            "prompt_keys": ["simple_kew", "complex_kew", "caps_reif"]
        }
    },
)

Available prompt keys: wikipedia_paraphrase, simple_kew, complex_kew, formal_reif, simple_reif, passive_reif, caps_reif, lowcaps_reif, text_emojis_reif, less_common_verbs_reif, humanize_llm-as-coauthor_original, and all finephrase_* templates. See the full prompt reference for details.

Caching

The diversify() function automatically caches loaded models between calls. The generation model and the semantic filter are cached independently, so toggling semantic_filter does not reload the generation model and vice versa. Call clear_cache() to drop cached models and allow memory to be reclaimed when possible:

from diversify_text import clear_cache

clear_cache()
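To illustrate what independent caching means in practice, here is a toy sketch. It is not the package's implementation: get_model, clear_model_cache, run, and the model names are all hypothetical, and the "load" is only a counter standing in for an expensive model load.

```python
_cache = {}  # maps (kind, model_name) -> loaded object
load_count = {"generator": 0, "semantic_filter": 0}


def _load(kind, name):
    """Stand-in for an expensive model load; it only counts how often it runs."""
    load_count[kind] += 1
    return f"<{kind}:{name}>"


def get_model(kind, name):
    """Return the cached object for this key, loading it only on first use."""
    key = (kind, name)
    if key not in _cache:
        _cache[key] = _load(kind, name)
    return _cache[key]


def clear_model_cache():
    """Drop every cached object so memory can be reclaimed."""
    _cache.clear()


def run(semantic_filter):
    """Hypothetical call path: the generator is always needed, the filter only on request."""
    get_model("generator", "demo-generator")
    if semantic_filter:
        get_model("semantic_filter", "demo-filter")


run(semantic_filter=True)
run(semantic_filter=False)  # the generator is reused, not reloaded
run(semantic_filter=True)   # the filter is still cached from the first call
print(load_count)  # {'generator': 1, 'semantic_filter': 1}
```

Keying the two entries separately is what lets one be toggled without touching the other; clearing the dictionary makes the next call load again.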

Using the class directly

You can also instantiate a Diversifier yourself for full control over the model lifecycle:

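from diversify_text import Diversifier

# Placeholder inputs; replace these with your own lists of texts.
texts_1 = ["The experiment was conducted in a controlled lab setting."]
texts_2 = ["She graduated from MIT in 2019."]

# Reuse the same instance for several batches.
div = Diversifier(device="cuda", methods=["tinystyler"])

batch_1 = div.diversify(texts_1, n=5)
batch_2 = div.diversify(texts_2, n=5)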

List of texts

results = diversify([
    "The experiment was conducted in a controlled lab setting.",
    "She graduated from MIT in 2019.",
])
[
    {"original": "The experiment ...", "paraphrases": ["...", "...", ...]},
    {"original": "She graduated ...", "paraphrases": ["...", "...", ...]},
]
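For a list of texts, the output holds one entry per input. Here is a minimal sketch of writing such results as CSV rows with the standard library. It is independent of the package's own output options; results_to_csv is a hypothetical helper and the results literal is hand-written sample data:

```python
import csv
import io


def results_to_csv(results):
    """Serialise diversify-style results as CSV text, one row per paraphrase."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["original", "paraphrase_index", "paraphrase"])
    for item in results:
        for index, paraphrase in enumerate(item["paraphrases"]):
            writer.writerow([item["original"], index, paraphrase])
    return buffer.getvalue()


# Hand-written sample in the documented result shape (not real model output).
results = [
    {
        "original": "The experiment was conducted in a controlled lab setting.",
        "paraphrases": ["The experiment took place in a controlled lab."],
    },
    {
        "original": "She graduated from MIT in 2019.",
        "paraphrases": ["She finished her degree at MIT in 2019."],
    },
]

csv_text = results_to_csv(results)
print(csv_text)
```

To write a file instead, pass an open file handle to csv.writer in place of the StringIO buffer.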

Customising the TinyStyler style bank

TinyStyler generates each paraphrase by conditioning on a style example: a short sentence that demonstrates the target writing style. The style bank is the collection of such examples, which is cycled through when producing multiple paraphrases.

The default bank is a dictionary mapping style labels to lists of example sentences (drawn from the CORE corpus). You can replace or extend it by passing a custom bank via method_kwargs.

A style bank can be a dict[str, list[str]] or a list[list[str]]:

from diversify_text import diversify

custom_bank = {
    "academic": ["The results demonstrate a statistically significant effect."],
    "enthusiastic": ["We found something really interesting — check this out!"],
    "telegraphic": ["Key finding: effect confirmed. Details follow."],
}

results = diversify(
    "The experiment was conducted in a controlled lab setting.",
    method_kwargs={"tinystyler": {"style_bank": custom_bank}},
)
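The cycling behaviour described above can be pictured with a short sketch. It is not TinyStyler's actual code: flatten_bank and pick_style_examples are hypothetical helpers, and it assumes that one example sentence is used per paraphrase, in insertion order.

```python
from itertools import cycle, islice


def flatten_bank(style_bank):
    """Accept dict[str, list[str]] or list[list[str]] and return a flat list of sentences."""
    groups = style_bank.values() if isinstance(style_bank, dict) else style_bank
    return [example for group in groups for example in group]


def pick_style_examples(style_bank, n):
    """Cycle through the bank to pick one style example per requested paraphrase."""
    examples = flatten_bank(style_bank)
    if not examples:
        raise ValueError("style bank is empty")
    return list(islice(cycle(examples), n))


custom_bank = {
    "academic": ["The results demonstrate a statistically significant effect."],
    "telegraphic": ["Key finding: effect confirmed. Details follow."],
}

# With two examples and n=5, the examples repeat in order.
for example in pick_style_examples(custom_bank, n=5):
    print(example)
```

The same helper accepts the list[list[str]] form, since only the grouped sentences are used.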

DEFAULT_STYLE_BANK is exported from diversify_text.styles so you can build on it:

from diversify_text.styles import DEFAULT_STYLE_BANK

extended_bank = {
    **DEFAULT_STYLE_BANK,
    "scientific": ["The data clearly indicate a statistically significant result."],
}

Instead of cycling through the entire bank, you can select specific styles by key name with the styles option. The number of paraphrases then equals the number of selected styles:

results = diversify(
    "The experiment was conducted in a controlled lab setting.",
    method_kwargs={"tinystyler": {"styles": ["research_article", "personal_blog", "recipe"]}},
)
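Selecting styles by key amounts to taking a subset of the bank. Here is a rough sketch of that idea, assuming unknown keys should be rejected. select_styles is a hypothetical helper, and the bank contents are placeholder sentences, not the real default bank:

```python
def select_styles(style_bank, styles):
    """Return the sub-bank for the requested keys, preserving the requested order."""
    missing = [key for key in styles if key not in style_bank]
    if missing:
        raise KeyError(f"unknown style keys: {missing}")
    return {key: style_bank[key] for key in styles}


# Placeholder bank; the keys mirror the example above, the sentences are invented.
bank = {
    "research_article": ["We report results from a controlled study."],
    "personal_blog": ["So last week I finally tried this out."],
    "recipe": ["Stir gently until the mixture thickens."],
    "news": ["Officials confirmed the findings on Tuesday."],
}

selected = select_styles(bank, ["research_article", "recipe"])
n = len(selected)  # one paraphrase per selected style
print(n, list(selected))
```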

Creating a custom method

from diversify_text import Diversifier
from diversify_text.method import DiversificationMethod


class MyMethod(DiversificationMethod):
    name = "my_method"

    def generate(self, texts, *, n, max_new_tokens, temperature, top_p, **kwargs):
        return [[f"{text} :: variant {i}" for i in range(n)] for text in texts]


results = Diversifier(methods=[MyMethod()]).diversify("Hello", n=3)
[{"original": "Hello", "paraphrases": ["Hello :: variant 0", "Hello :: variant 1", "Hello :: variant 2"]}]
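Judging from the example above, generate receives a list of texts and returns one list of variants per text. The sketch below checks that shape without installing the package. MethodBase is a hypothetical stand-in for DiversificationMethod, and requiring exactly n variants per text is an assumption drawn from the example output, not a documented rule:

```python
from abc import ABC, abstractmethod


class MethodBase(ABC):
    """Hypothetical stand-in for DiversificationMethod so the sketch runs on its own."""

    name = "base"

    @abstractmethod
    def generate(self, texts, *, n, max_new_tokens, temperature, top_p, **kwargs):
        """Return one list of variants for each input text."""


class UppercaseMethod(MethodBase):
    name = "uppercase"

    def generate(self, texts, *, n, max_new_tokens, temperature, top_p, **kwargs):
        # Toy rule-based method: the sampling parameters are accepted but unused.
        return [[f"{text.upper()} (v{i})" for i in range(n)] for text in texts]


def check_output_shape(method, texts, n):
    """Call generate and verify the assumed shape: len(texts) lists of n strings."""
    outputs = method.generate(texts, n=n, max_new_tokens=32, temperature=1.0, top_p=0.9)
    if len(outputs) != len(texts):
        raise ValueError("expected one list of variants per input text")
    if any(len(variants) != n for variants in outputs):
        raise ValueError("expected n variants per input text")
    return outputs


outputs = check_output_shape(UppercaseMethod(), ["Hello", "Bye"], n=2)
print(outputs)
```

A check like this can be useful in a unit test for a new method before passing it to Diversifier.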

Install

pip install diversify-text

Requires Python 3.10+.

Contributing

Development setup

Note: You must have uv installed. Full installation guide: https://docs.astral.sh/uv/getting-started/installation/

git clone https://github.com/AnnaWegmann/diversify_text.git
cd diversify_text
uv sync --group dev
source .venv/bin/activate

Running tests

# Run all tests
pytest

# Run a specific test file
pytest tests/test_core.py

# Run a specific test class or method
pytest tests/test_core.py::TestDiversifier
pytest tests/test_core.py::TestDiversifier::test_single_text_returns_one_result

Tests are also individually runnable via PyCharm's built-in test runner (right-click any test class or method).

Working with uv

Adding packages with uv add

To add packages to your project, always use uv add rather than uv pip install. This ensures that your dependencies are properly managed and recorded in your pyproject.toml.

uv add <package-name>

Adding packages to the dev group

If you need to add a package specifically for your development environment:

uv add --group dev <package-name>

Switching between dev and standard mode

After you are done with testing and want to go back to standard mode, you can remove the dev-only packages:

uv sync --no-group dev

This excludes the dev group, so only your main project dependencies (and any other default groups) stay installed.

Best practice: run uv lock -U

Whenever you upgrade, downgrade, or change versions of packages, it's good practice to run:

uv lock -U

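This re-resolves your lock file and lets locked packages move to the newest versions that satisfy the constraints in pyproject.toml, so the lock file and your declared dependencies stay in sync.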

Building docs locally

uv sync --group docs
sphinx-build -b html docs docs/_build/html
# "open" is the macOS command; on Linux, xdg-open is the usual equivalent
open docs/_build/html/index.html
