Generate stylistic paraphrases of texts using local transformer models.
diversify-text
This package helps you generate stylistically diverse paraphrases of your own texts by running Hugging Face transformer models locally.
Usage
For file inputs (CSV, TSV, TXT), output options, punctuation splitting, and creating custom methods, see the full usage guide.
Single text
from diversify_text import diversify
results = diversify("The experiment was conducted in a controlled lab setting.")
[{
"original": "The experiment was conducted in a controlled lab setting.",
"paraphrases": [
"They ran the experiment in a controlled lab setting.",
"The experiment took place in a controlled lab.",
"A controlled lab was where the experiment was conducted.",
"In a controlled lab, the experiment was carried out.",
"The study was performed in a controlled lab environment.",
]
}]
Control number of paraphrases
results = diversify("Some text.", n_styles=3)
[{"original": "Some text.", "paraphrases": ["...", "...", "..."]}]
Caching
The diversify() function automatically caches loaded models between calls.
The generation model and the similarity filter are cached independently, so
toggling similarity_filter does not reload the generation model and vice
versa. Call clear_cache() to drop cached models and allow memory to be reclaimed when possible:
from diversify_text import clear_cache
clear_cache()
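The independent caching described above can be pictured with the standard library's `functools.lru_cache`. This is a sketch of the general pattern only, not the package's implementation: `load_generator`, `load_similarity_filter`, and `clear_all` are hypothetical names, and the "models" are placeholder objects rather than real transformer weights.

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def load_generator(model_name: str) -> object:
    # Hypothetical loader: a real one would load transformer weights here.
    return object()


@lru_cache(maxsize=None)
def load_similarity_filter(model_name: str) -> object:
    # Separate cache, so using or dropping the filter never evicts the generator.
    return object()


def clear_all() -> None:
    # Drop both caches; the objects can be garbage-collected once nothing else references them.
    load_generator.cache_clear()
    load_similarity_filter.cache_clear()


gen_a = load_generator("gen-model")
gen_b = load_generator("gen-model")
print(gen_a is gen_b)  # True: the second call is served from the cache

filt = load_similarity_filter("sim-model")
print(load_generator.cache_info().currsize)  # 1: the generator entry is untouched

clear_all()
print(load_generator("gen-model") is gen_a)  # False: a fresh object after clearing
```

Keeping one cache per model type is what lets a flag like `similarity_filter` be toggled without paying the cost of reloading the other model.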
Using the class directly
You can also instantiate a Diversifier yourself for full control over the
model lifecycle:
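from diversify_text import Diversifier
# The instance keeps its models loaded, so reuse it across batches
div = Diversifier(device="cuda", methods=["tinystyler"])
texts_1 = ["The experiment was conducted in a controlled lab setting."]
texts_2 = ["She graduated from MIT in 2019."]
batch_1 = div.diversify(texts_1, n_styles=5)
batch_2 = div.diversify(texts_2, n_styles=5)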
List of texts
results = diversify([
"The experiment was conducted in a controlled lab setting.",
"She graduated from MIT in 2019.",
])
[
{"original": "The experiment ...", "paraphrases": ["...", "...", ...]},
{"original": "She graduated ...", "paraphrases": ["...", "...", ...]},
]
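Each input text yields one dictionary, in input order, so the results flatten naturally into one row per paraphrase, for example before saving them as CSV. The sketch below uses only the standard library. The `results` literal is a stand-in with the same shape as the sample output above (the second text's paraphrases are placeholders), and `to_rows` is a hypothetical helper, not part of the package.

```python
import csv
import io

# Stand-in for the value returned by diversify(); same shape as the sample output.
results = [
    {"original": "The experiment was conducted in a controlled lab setting.",
     "paraphrases": ["They ran the experiment in a controlled lab setting.",
                     "The experiment took place in a controlled lab."]},
    {"original": "She graduated from MIT in 2019.",
     "paraphrases": ["(paraphrase 1)", "(paraphrase 2)"]},
]


def to_rows(results):
    """Yield (original, paraphrase_index, paraphrase) tuples, one per paraphrase."""
    for item in results:
        for i, paraphrase in enumerate(item["paraphrases"]):
            yield item["original"], i, paraphrase


# Write to an in-memory buffer; swap in open("out.csv", "w", newline="") for a real file.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["original", "index", "paraphrase"])
writer.writerows(to_rows(results))
print(buffer.getvalue())
```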
Customising the TinyStyler style bank
TinyStyler generates each paraphrase by conditioning on a style example — a short sentence that demonstrates the target writing style. The style bank is the list of such examples that get cycled through when producing multiple paraphrases.
The default bank is a dictionary mapping style labels to lists of example sentences (drawn from the CORE corpus). You can replace or extend it by passing a custom bank via method_kwargs.
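To make "cycled through" concrete, here is a minimal sketch of one way a bank could be walked to pick one style example per requested paraphrase. It assumes every example in the bank is visited in insertion order and that the walk wraps around when more paraphrases are requested than the bank holds; `pick_style_examples` is a hypothetical helper, and the package's actual selection order may differ.

```python
from itertools import cycle, islice


def pick_style_examples(bank: dict[str, list[str]], n_styles: int) -> list[tuple[str, str]]:
    """Return n_styles (label, example) pairs, cycling through every example in the bank."""
    # Flatten the bank into (label, example) pairs, preserving insertion order.
    pairs = [(label, example) for label, examples in bank.items() for example in examples]
    if not pairs:
        raise ValueError("style bank is empty")
    # Wrap around to the start when n_styles exceeds the number of examples.
    return list(islice(cycle(pairs), n_styles))


bank = {"academic": ["A1"], "casual": ["C1", "C2"]}
print(pick_style_examples(bank, 4))
# [('academic', 'A1'), ('casual', 'C1'), ('casual', 'C2'), ('academic', 'A1')]
```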
A style bank can be a dict[str, list[str]] or a list[list[str]]:
from diversify_text import diversify
custom_bank = {
"academic": ["The results demonstrate a statistically significant effect."],
"enthusiastic": ["We found something really interesting — check this out!"],
"telegraphic": ["Key finding: effect confirmed. Details follow."],
}
results = diversify(
"The experiment was conducted in a controlled lab setting.",
method_kwargs={"tinystyler": {"style_bank": custom_bank}},
)
DEFAULT_STYLE_BANK is exported from diversify_text.method.tinystyler so you can build on it:
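from diversify_text.method.tinystyler import DEFAULT_STYLE_BANK
extended_bank = {
    **DEFAULT_STYLE_BANK,
    "scientific": ["The data clearly indicate a statistically significant result."],
}
# Pass the extended bank the same way as a custom one
results = diversify(
    "The experiment was conducted in a controlled lab setting.",
    method_kwargs={"tinystyler": {"style_bank": extended_bank}},
)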
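Note that with `{**DEFAULT_STYLE_BANK, ...}` a key that already exists in the default bank has its examples replaced, not extended. If you would rather add examples to an existing style, a small merge helper does that. `merge_style_banks` is hypothetical, and `base` below is a stand-in, not the real default bank.

```python
def merge_style_banks(base: dict[str, list[str]], extra: dict[str, list[str]]) -> dict[str, list[str]]:
    """Merge two banks, appending examples for shared keys instead of replacing them."""
    # Copy the lists so the base bank is never mutated.
    merged = {label: list(examples) for label, examples in base.items()}
    for label, examples in extra.items():
        merged.setdefault(label, []).extend(examples)
    return merged


base = {"academic": ["The results demonstrate a statistically significant effect."]}
extra = {
    "academic": ["Prior work has largely overlooked this confound."],
    "telegraphic": ["Key finding: effect confirmed. Details follow."],
}

print(len({**base, **extra}["academic"]))            # 1: dict unpacking replaced the list
print(len(merge_style_banks(base, extra)["academic"]))  # 2: both sentences kept
```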
Instead of cycling through the entire bank, you can select specific styles by key name with the styles option.
The number of paraphrases then equals the number of selected styles:
results = diversify(
"The experiment was conducted in a controlled lab setting.",
method_kwargs={"tinystyler": {"styles": ["research_article", "personal_blog", "recipe"]}},
)
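Selecting by key can be sketched as a lookup that keeps the requested order and fails loudly on unknown names, which is why the paraphrase count follows the number of keys. `select_styles` is hypothetical, using the first example under each key is an assumption, and the bank below is a stand-in whose sentences are placeholders, not entries from `DEFAULT_STYLE_BANK`.

```python
def select_styles(bank: dict[str, list[str]], styles: list[str]) -> list[tuple[str, str]]:
    """Return one (label, example) pair per requested style, in the requested order."""
    missing = [s for s in styles if s not in bank]
    if missing:
        raise KeyError(f"unknown style keys: {missing}; available: {sorted(bank)}")
    # One paraphrase per selected style: take the first example under each key (assumption).
    return [(s, bank[s][0]) for s in styles]


bank = {
    "research_article": ["We report a statistically significant effect."],
    "personal_blog": ["So I finally tried this, and honestly, wow."],
    "recipe": ["Preheat the oven to 180 degrees."],
}
picked = select_styles(bank, ["recipe", "research_article"])
print(len(picked))  # 2
```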
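Creating a custom method
Subclass DiversificationMethod, give it a name, and implement generate(), which returns one list of n_styles strings per input text: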
from diversify_text import Diversifier
from diversify_text.method import DiversificationMethod
class MyMethod(DiversificationMethod):
    name = "my_method"

    def generate(self, texts, *, n_styles, max_new_tokens, temperature, top_p, **kwargs):
        # Return one list of n_styles variants per input text
        return [[f"{text} :: variant {i}" for i in range(n_styles)] for text in texts]
results = Diversifier(methods=[MyMethod()]).diversify("Hello", n_styles=3)
[{"original": "Hello", "paraphrases": ["Hello :: variant 0", "Hello :: variant 1", "Hello :: variant 2"]}]
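Judging from the example above, `generate` receives a list of texts and returns one list of `n_styles` strings per text. Because that logic is plain Python, it can be checked without loading any model. The sketch below is a hypothetical rule-based method written as a standalone class so it runs without the package installed; in real use you would subclass `DiversificationMethod` as shown above. The casing rules are placeholders, not genuine stylistic paraphrases.

```python
class CasingMethod:
    """Hypothetical toy method: each 'style' is a different casing rule."""

    name = "casing"

    # Placeholder transformations standing in for real stylistic rewrites.
    _rules = (str.lower, str.upper, str.title, str.swapcase)

    def generate(self, texts, *, n_styles, max_new_tokens, temperature, top_p, **kwargs):
        # Sampling parameters are accepted for interface compatibility but unused here.
        return [
            [self._rules[i % len(self._rules)](text) for i in range(n_styles)]
            for text in texts
        ]


out = CasingMethod().generate(
    ["Hello World"], n_styles=3, max_new_tokens=32, temperature=1.0, top_p=0.9
)
print(out)  # [['hello world', 'HELLO WORLD', 'Hello World']]
```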
Install
pip install diversify-text
Requires Python 3.10+.
Contributing
Development setup
[!NOTE] You must have uv installed. Full installation guide: https://docs.astral.sh/uv/getting-started/installation/
git clone https://github.com/AnnaWegmann/diversify_text.git
cd diversify_text
uv sync --group dev
source .venv/bin/activate
Running tests
# Run all tests
pytest
# Run a specific test file
pytest tests/test_core.py
# Run a specific test class or method
pytest tests/test_core.py::TestDiversifier
pytest tests/test_core.py::TestDiversifier::test_single_text_returns_one_result
Tests are also individually runnable via PyCharm's built-in test runner (right-click any test class or method).
Working with uv
Adding packages with uv add
To add packages to your project, always use uv add rather than uv pip install. uv add records the dependency in your pyproject.toml and updates uv.lock, whereas uv pip install only changes the current environment.
uv add <package-name>
Adding packages to the dev group
If you need to add a package specifically for your development environment:
uv add --group dev <package-name>
Switching between dev and standard mode
After you are done with testing and want to go back to standard mode, you can remove the dev-only packages:
uv sync --no-group dev
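This excludes the dev group. Because uv sync removes packages that are not part of the requested dependencies, the dev-only packages are uninstalled and your main project dependencies remain.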
Best practice: run uv lock -U
Whenever you upgrade, downgrade, or change versions of packages, it's good practice to run:
uv lock -U
The -U flag (short for --upgrade) re-resolves every package to the latest version your pyproject.toml constraints allow, so the lock file stays consistent with your declared dependencies.
Building docs locally
uv sync --group docs
sphinx-build -b html docs docs/_build/html
open docs/_build/html/index.html  # macOS; on Linux, use xdg-open instead
File details
Details for the file diversify_text-0.1.3.tar.gz.
File metadata
- Download URL: diversify_text-0.1.3.tar.gz
- Upload date:
- Size: 39.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 99e3d5528d79de3ce657472ccd7d47fa5e656093847fe223c90f7251dab6158c |
| MD5 | fc80fd96fc76acc359c292f59f1eeac3 |
| BLAKE2b-256 | 4b63e72d6fdb667badffe531995efd313c8346301a2a20a028076c6fd0c3173f |
File details
Details for the file diversify_text-0.1.3-py3-none-any.whl.
File metadata
- Download URL: diversify_text-0.1.3-py3-none-any.whl
- Upload date:
- Size: 47.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.17
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3b6c966ad33e9a5e7a99e9e335e9559294482242e43b4cbbd9d0c2e210d14153 |
| MD5 | b53987b7a61bd5744cb9b5e20867735b |
| BLAKE2b-256 | 152fd6e393c98a80330e357f0dc06a228d89621228a8633a92ed9a10f001de77 |