
Generate stylistic paraphrases of texts using local transformer models.


diversify-text

This package helps you generate stylistically diverse paraphrases of your own texts, running Hugging Face transformer models locally.

pip install diversify-text

Full documentation


Usage

For file inputs (CSV, TSV, TXT), output options, punctuation splitting, and creating custom methods, see the full usage guide.

Single text

from diversify_text import diversify

results = diversify("The experiment was conducted in a controlled lab setting.")
[{
    "original": "The experiment was conducted in a controlled lab setting.",
    "paraphrases": [
        "They ran the experiment in a controlled lab setting.",
        "The experiment took place in a controlled lab.",
        "A controlled lab was where the experiment was conducted.",
        "In a controlled lab, the experiment was carried out.",
        "The study was performed in a controlled lab environment.",
    ]
}]
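Each result is a dict with an "original" key and a "paraphrases" list. The package has its own output options (see the full usage guide). If you just need a flat table of (original, paraphrase) rows from the returned Python objects, a small helper is enough. The sketch below assumes only the output shape shown above. to_rows is a hypothetical helper, not part of diversify_text, and the sample data stands in for a real diversify() call.

```python
import csv
import io


def to_rows(results):
    """Hypothetical helper: flatten diversify()-style results into (original, paraphrase) pairs.

    Assumes each item is a dict with "original" (str) and "paraphrases" (list of str).
    """
    return [
        (item["original"], paraphrase)
        for item in results
        for paraphrase in item["paraphrases"]
    ]


# Sample data standing in for the return value of diversify(...)
results = [{
    "original": "The experiment was conducted in a controlled lab setting.",
    "paraphrases": [
        "They ran the experiment in a controlled lab setting.",
        "The experiment took place in a controlled lab.",
    ],
}]

rows = to_rows(results)

# Write the rows as CSV to an in-memory buffer (use open(path, "w", newline="") for a real file)
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["original", "paraphrase"])
writer.writerows(rows)
print(buffer.getvalue())
```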

Control number of paraphrases

results = diversify("Some text.", n_styles=3)
[{"original": "Some text.", "paraphrases": ["...", "...", "..."]}]

Using the class directly

Recommended when you process texts over several calls: the model is loaded once and then reused for every call.

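from diversify_text import Diversifier

# Load the model once and keep it on this instance
div = Diversifier(device="cuda", methods=["tinystyler"])

texts_1 = ["The experiment was conducted in a controlled lab setting."]
texts_2 = ["She graduated from MIT in 2019."]

# Both calls reuse the already-loaded model
batch_1 = div.diversify(texts_1, n_styles=5)
batch_2 = div.diversify(texts_2, n_styles=5)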
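The saving comes from keeping the loaded model on the instance instead of reloading it per call. The sketch below illustrates that load-once pattern with a stand-in loader. CachedParaphraser and load_model are hypothetical names, and this is not a description of how Diversifier is implemented internally.

```python
class CachedParaphraser:
    """Hypothetical illustration of the load-once, reuse-many-times pattern."""

    def __init__(self, load_model):
        self._load_model = load_model
        self._model = None  # loaded lazily on first use
        self.load_count = 0

    def _get_model(self):
        # Load only if nothing is cached yet
        if self._model is None:
            self._model = self._load_model()
            self.load_count += 1
        return self._model

    def diversify(self, texts, n_styles=5):
        model = self._get_model()
        return [{"original": t, "paraphrases": model(t, n_styles)} for t in texts]


def load_model():
    # Stand-in "model" (hypothetical): returns n labelled copies of the text
    return lambda text, n: [f"{text} ({i})" for i in range(n)]


div = CachedParaphraser(load_model)
batch_1 = div.diversify(["First text."], n_styles=2)
batch_2 = div.diversify(["Second text."], n_styles=2)
print(div.load_count)  # 1: the model was loaded once for both calls
```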

List of texts

results = diversify([
    "The experiment was conducted in a controlled lab setting.",
    "She graduated from MIT in 2019.",
])
[
    {"original": "The experiment ...", "paraphrases": ["...", "...", ...]},
    {"original": "She graduated ...", "paraphrases": ["...", "...", ...]},
]
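For very large inputs you may prefer to pass texts in chunks, for example to save intermediate results between calls. A generic chunking helper pairs naturally with a reused Diversifier instance. chunked is a hypothetical helper, not part of diversify_text.

```python
from itertools import islice


def chunked(iterable, size):
    """Hypothetical helper: yield successive lists of at most `size` items."""
    if size < 1:
        raise ValueError("size must be >= 1")
    it = iter(iterable)
    while chunk := list(islice(it, size)):
        yield chunk


texts = [f"Sentence {i}." for i in range(7)]
chunks = list(chunked(texts, 3))
print([len(c) for c in chunks])  # [3, 3, 1]

# With the library, each chunk would go to one call on a reused instance, e.g.:
# results = []
# for chunk in chunked(texts, 3):
#     results.extend(div.diversify(chunk, n_styles=5))
```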

Customising the TinyStyler style bank

TinyStyler generates each paraphrase by conditioning on a style example: a short sentence that demonstrates the target writing style. The style bank is the list of such examples that are cycled through when producing multiple paraphrases.

The default bank is a dictionary mapping style labels to lists of example sentences (drawn from the CORE corpus). You can replace or extend it by passing a custom bank via method_kwargs.
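One simple way to cycle through a bank, including when more paraphrases are requested than there are styles, is round-robin selection. The sketch below shows that idea with itertools.cycle. The exact order and sampling that TinyStyler uses inside diversify_text may differ, and pick_style_examples is a hypothetical name.

```python
from itertools import cycle, islice


def pick_style_examples(style_bank, n_styles):
    """Hypothetical helper: round-robin over (label, example) pairs until n_styles are chosen.

    Assumes a dict[str, list[str]] bank and uses the first example of each style.
    """
    pairs = [(label, examples[0]) for label, examples in style_bank.items() if examples]
    if not pairs:
        raise ValueError("style bank is empty")
    return list(islice(cycle(pairs), n_styles))


bank = {
    "academic": ["The results demonstrate a statistically significant effect."],
    "telegraphic": ["Key finding: effect confirmed. Details follow."],
}
for label, example in pick_style_examples(bank, 3):
    print(label, "->", example)
```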

A style bank can be a dict[str, list[str]] or a list[list[str]]:

from diversify_text import diversify

custom_bank = {
    "academic": ["The results demonstrate a statistically significant effect."],
    "enthusiastic": ["We found something really interesting — check this out!"],
    "telegraphic": ["Key finding: effect confirmed. Details follow."],
}

results = diversify(
    "The experiment was conducted in a controlled lab setting.",
    method_kwargs={"tinystyler": {"style_bank": custom_bank}},
)
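Because a bank may arrive in either form, code that builds banks programmatically may want to treat the two uniformly. The sketch below converts either form into a list of example lists and drops the dict labels. normalize_bank is a hypothetical helper, and the library's own handling of the two forms may differ.

```python
def normalize_bank(bank):
    """Hypothetical helper: return a style bank as list[list[str]].

    Accepts dict[str, list[str]] or list[list[str]]; labels from the dict form are
    dropped and insertion order is preserved.
    """
    if isinstance(bank, dict):
        groups = list(bank.values())
    elif isinstance(bank, list):
        groups = bank
    else:
        raise TypeError("style bank must be a dict or a list")
    for group in groups:
        if not isinstance(group, list) or not all(isinstance(s, str) for s in group):
            raise TypeError("each style must be a list of strings")
    # Copy the inner lists so callers cannot mutate the original bank through the result
    return [list(group) for group in groups]


as_dict = {"academic": ["The results demonstrate an effect."], "casual": ["So, it worked!"]}
as_list = [["The results demonstrate an effect."], ["So, it worked!"]]
print(normalize_bank(as_dict) == normalize_bank(as_list))  # True
```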

DEFAULT_STYLE_BANK is exported from diversify_text.method.tinystyler so you can build on it:

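from diversify_text import diversify
from diversify_text.method.tinystyler import DEFAULT_STYLE_BANK

extended_bank = {
    **DEFAULT_STYLE_BANK,
    "scientific": ["The data clearly indicate a statistically significant result."],
}

results = diversify(
    "The experiment was conducted in a controlled lab setting.",
    method_kwargs={"tinystyler": {"style_bank": extended_bank}},
)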
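Note that {**DEFAULT_STYLE_BANK, "key": [...]} replaces the whole list if "key" already exists in the default bank. To add examples to an existing style instead, merge the lists. The sketch below uses a small stand-in dict in place of DEFAULT_STYLE_BANK so it runs without the package. merge_banks is a hypothetical helper.

```python
def merge_banks(base, extra):
    """Hypothetical helper: combine two dict[str, list[str]] banks, concatenating lists for shared keys."""
    merged = {label: list(examples) for label, examples in base.items()}
    for label, examples in extra.items():
        merged.setdefault(label, []).extend(examples)
    return merged


# Stand-in for DEFAULT_STYLE_BANK (the real contents differ)
base = {"research_article": ["We report results from three experiments."]}
extra = {
    "research_article": ["Our findings extend prior work."],
    "scientific": ["The data indicate an effect."],
}

overridden = {**base, **extra}     # shared key: extra's list replaces base's list
merged = merge_banks(base, extra)  # shared key: the lists are concatenated
print(len(overridden["research_article"]), len(merged["research_article"]))  # 1 2
```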

You can also select specific styles by key name with the styles option instead of cycling through the entire bank. The number of paraphrases is then determined by the number of selected styles:

results = diversify(
    "The experiment was conducted in a controlled lab setting.",
    method_kwargs={"tinystyler": {"styles": ["research_article", "personal_blog", "recipe"]}},
)
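Since the number of paraphrases equals the number of selected styles, it can help to check the names against the bank before a long run. The sketch below validates requested names against a bank's keys. select_styles is hypothetical, the bank is a stand-in, and how the library itself reacts to an unknown name is not described here.

```python
def select_styles(style_bank, styles):
    """Hypothetical helper: return {name: examples} for the requested names, in the requested order.

    Raises KeyError listing any names missing from the bank.
    """
    missing = [name for name in styles if name not in style_bank]
    if missing:
        raise KeyError(f"unknown styles: {missing}; available: {sorted(style_bank)}")
    return {name: style_bank[name] for name in styles}


# Stand-in bank with placeholder examples
bank = {"research_article": ["..."], "personal_blog": ["..."], "recipe": ["..."], "news": ["..."]}
chosen = select_styles(bank, ["research_article", "personal_blog", "recipe"])
print(len(chosen))  # 3, i.e. three paraphrases per input text
```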

Creating a custom method

from diversify_text import Diversifier
from diversify_text.method import DiversificationMethod


class MyMethod(DiversificationMethod):
    name = "my_method"

    def generate(self, texts, *, n_styles, max_new_tokens, temperature, top_p, **kwargs):
        return [[f"{text} :: variant {i}" for i in range(n_styles)] for text in texts]


results = Diversifier(methods=[MyMethod()]).diversify("Hello", n_styles=3)
[{"original": "Hello", "paraphrases": ["Hello :: variant 0", "Hello :: variant 1", "Hello :: variant 2"]}]
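As a slightly fuller sketch, the method below produces rule-based variants and keeps the output contract visible in MyMethod: one list of exactly n_styles strings per input text. It does not subclass DiversificationMethod, so it runs without the package. In real use, subclass it as in MyMethod. CasingMethod and its transforms are hypothetical, and the sampling arguments are accepted only to match the generate signature.

```python
from itertools import cycle, islice


class CasingMethod:
    """Hypothetical rule-based method; in real use, subclass DiversificationMethod."""

    name = "casing"
    # Simple string transforms standing in for a real paraphrase model
    TRANSFORMS = (str.lower, str.upper, str.title)

    def generate(self, texts, *, n_styles, max_new_tokens, temperature, top_p, **kwargs):
        # Sampling arguments are accepted to match the signature but unused here.
        # Repeat the transforms round-robin so each text gets exactly n_styles variants.
        transforms = list(islice(cycle(self.TRANSFORMS), n_styles))
        return [[transform(text) for transform in transforms] for text in texts]


method = CasingMethod()
out = method.generate(
    ["Hello world", "Lab results"],
    n_styles=4, max_new_tokens=64, temperature=1.0, top_p=0.9,
)
print(out[0])  # ['hello world', 'HELLO WORLD', 'Hello World', 'hello world']
```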

Install

pip install diversify-text

Requires Python 3.10+.

Contributing

Development setup

Note: You must have uv installed. Full installation guide: https://docs.astral.sh/uv/getting-started/installation/

git clone https://github.com/AnnaWegmann/diversify_text.git
cd diversify_text
uv sync --group dev
source .venv/bin/activate

Running tests

# Run all tests
pytest

# Run a specific test file
pytest tests/test_core.py

# Run a specific test class or method
pytest tests/test_core.py::TestDiversifier
pytest tests/test_core.py::TestDiversifier::test_single_text_returns_one_result

Tests are also individually runnable via PyCharm's built-in test runner (right-click any test class or method).
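If you add a custom method, a small contract test can catch shape errors early. The sketch below is a plain pytest module: pytest collects functions named test_*, and bare assert statements are enough. The file name, the stub method and the test names are hypothetical. In the repository you would import your real method instead of the stub.

```python
# tests/test_my_method.py (hypothetical file name)


class StubMethod:
    """Hypothetical stand-in with the same generate() signature as MyMethod."""

    name = "stub"

    def generate(self, texts, *, n_styles, max_new_tokens, temperature, top_p, **kwargs):
        return [[f"{text} :: variant {i}" for i in range(n_styles)] for text in texts]


def test_one_list_per_input_text():
    out = StubMethod().generate(["a", "b"], n_styles=2, max_new_tokens=16, temperature=1.0, top_p=1.0)
    assert len(out) == 2


def test_each_list_has_n_styles_strings():
    out = StubMethod().generate(["a"], n_styles=3, max_new_tokens=16, temperature=1.0, top_p=1.0)
    assert len(out[0]) == 3
    assert all(isinstance(p, str) for p in out[0])
```

Run it like the other test files, e.g. pytest tests/test_my_method.py.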

Working with uv

Adding packages with uv add

To add packages to your project, always use uv add rather than uv pip install. uv add records the dependency in your pyproject.toml and updates uv.lock, whereas uv pip install only changes the current environment.

uv add <package-name>

Adding packages to the dev group

If you need to add a package specifically for your development environment:

uv add --group dev <package-name>

Switching between dev and standard mode

After you are done with testing and want to go back to standard mode, you can remove the dev-only packages:

uv sync --no-group dev

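uv sync removes installed packages that are not part of the requested set, so this drops the dev group (and any other non-default groups you had synced, such as docs) and keeps only your main project dependencies.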

Best practice: run uv lock -U

Whenever you upgrade, downgrade, or change versions of packages, it's good practice to run:

uv lock -U

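-U is short for --upgrade: uv re-resolves the lock file and lets every package move to the newest version allowed by the constraints in pyproject.toml. Run uv sync afterwards to apply the updated lock file to your environment.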

Building docs locally

uv sync --group docs
sphinx-build -b html docs docs/_build/html
open docs/_build/html/index.html
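open is a macOS command (on Linux, xdg-open plays the same role). A cross-platform alternative is Python's standard-library webbrowser module. The sketch assumes the Sphinx output path used above. docs_index_uri and open_docs are hypothetical helper names.

```python
import webbrowser
from pathlib import Path


def docs_index_uri(build_dir="docs/_build/html"):
    """Hypothetical helper: return a file:// URI for the built Sphinx index page."""
    # resolve() makes the path absolute, which as_uri() requires
    return (Path(build_dir) / "index.html").resolve().as_uri()


def open_docs(build_dir="docs/_build/html"):
    """Hypothetical helper: open the built docs in the default browser."""
    return webbrowser.open(docs_index_uri(build_dir))
```

Calling open_docs() from the repository root after sphinx-build has run should open the same page as the open command.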

Download files


Source Distribution

diversify_text-0.1.2.tar.gz (37.2 kB)

Uploaded Source

Built Distribution


diversify_text-0.1.2-py3-none-any.whl (44.8 kB)

Uploaded Python 3

File details

Details for the file diversify_text-0.1.2.tar.gz.

File metadata

  • Download URL: diversify_text-0.1.2.tar.gz
  • Upload date:
  • Size: 37.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.17

File hashes

Hashes for diversify_text-0.1.2.tar.gz
Algorithm Hash digest
SHA256 24026ae1415e59a8dcda746caf9b43ba83e8c8e408374b68b758935ad44dd429
MD5 2f79d074f43233bde7cce4892d5466fe
BLAKE2b-256 5539b9530c9905a715a2391a5814c13df12ea6c5a13363e8a338f5f6801789d0


File details

Details for the file diversify_text-0.1.2-py3-none-any.whl.

File hashes

Hashes for diversify_text-0.1.2-py3-none-any.whl
Algorithm Hash digest
SHA256 c7491373f58dbccac3ea9ad233e2034bca8132424622421895e4cd8c0368f1b1
MD5 9b06afa3df57daf8e6838a383464d812
BLAKE2b-256 40f329861969230a99a0af47f838f49d4a93fe9e85e65d738c98f20f4c53012f
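To check a downloaded file against the SHA256 digests listed above, you can hash it with Python's hashlib. Below is a minimal sketch in which sha256_of and verify are hypothetical helper names. It compares the result with the published digest for the exact file name you downloaded.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 16):
    """Hypothetical helper: compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# SHA256 digests copied from the tables above
EXPECTED = {
    "diversify_text-0.1.2.tar.gz": "24026ae1415e59a8dcda746caf9b43ba83e8c8e408374b68b758935ad44dd429",
    "diversify_text-0.1.2-py3-none-any.whl": "c7491373f58dbccac3ea9ad233e2034bca8132424622421895e4cd8c0368f1b1",
}


def verify(path):
    """Hypothetical helper: True if the file's SHA256 matches the published digest for its name."""
    return sha256_of(path) == EXPECTED[Path(path).name]
```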

