Library for procedurally generating text that resembles a particular language.

ipsum

Ipsum is a Python library for the generation of international placeholder text.

Unlike most other generators, which work by scrambling a particular text (e.g. Lorem Ipsum generators with Cicero's "De Finibus Bonorum et Malorum"), Ipsum uses Markov models to generate a vocabulary of meaningless new words that resemble the language it was trained on. The resulting text is typographically similar to the specified language (i.e. it uses the same alphabet and punctuation, in the same manner and at the same frequency) but is semantically meaningless.
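
To make the idea concrete, here is a minimal sketch of a character-level Markov chain that invents words resembling its training vocabulary. It is an illustration only and not Ipsum's actual implementation: the function names train and generate_word, the order parameter, and the start/end markers are all assumptions introduced for this example.

```python
import random
from collections import defaultdict

# Hypothetical markers for word boundaries; training words must not contain them.
START, END = "^", "$"


def train(words, order=2):
    """Record which character follows each `order`-character context."""
    transitions = defaultdict(list)
    for word in words:
        padded = START * order + word.lower() + END
        for i in range(len(padded) - order):
            context = padded[i:i + order]
            transitions[context].append(padded[i + order])
    return transitions


def generate_word(transitions, order=2, rng=random, max_len=20):
    """Walk the chain from the start context until END is drawn or max_len is hit."""
    context = START * order
    letters = []
    while len(letters) < max_len:
        # Repeated entries in the list make frequent successors more likely.
        nxt = rng.choice(transitions[context])
        if nxt == END:
            break
        letters.append(nxt)
        context = context[1:] + nxt
    return "".join(letters)


if __name__ == "__main__":
    model = train(["placeholder", "generator", "language", "paragraph"])
    print([generate_word(model) for _ in range(5)])
```

Because every successor is drawn from contexts seen in training, the output only ever uses letters and letter sequences that occur in the source vocabulary, which is what gives the generated words their familiar look.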

You can read more about how Ipsum works here.

You can use Ipsum directly from your browser by accessing the web app at ipsum.trifunovski.me.

It currently supports the following languages:

  • Albanian
  • Bulgarian
  • Dutch
  • English
  • French
  • German
  • Greek
  • Italian
  • Macedonian
  • Serbian
  • Spanish
  • Swedish

Installing

Note that ipsum requires Python >= 3.8.1.
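
If you are unsure whether your interpreter is recent enough, a quick check along these lines can help. This is a small sketch; the helper name meets_minimum is made up for illustration and is not part of the library.

```python
import sys

# Minimum version stated in this README.
MINIMUM = (3, 8, 1)


def meets_minimum(version=sys.version_info):
    """Return True if the given (major, minor, micro) version is at least MINIMUM."""
    return tuple(version[:3]) >= MINIMUM


if __name__ == "__main__":
    print("OK" if meets_minimum() else "Python >= 3.8.1 is required")
```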

Run

pip install ipsum

to install the latest published version of the library, or clone the repo and use poetry

git clone git@github.com:dtrifuno/ipsum
cd ipsum/ipsum
poetry install

to install a development copy.

Usage

import ipsum

# Load the English language model
model = ipsum.load_model("en")

# Returns a list of 3 strings, each resembling a paragraph of English
paragraphs = model.generate_paragraphs(3)

# Returns a list of 10 strings, each resembling a full sentence of English
sentences = model.generate_sentences(10)

# Returns a list of 50 words (does not include any punctuation)
words = model.generate_words(50)

Development

Typechecking, linting and testing

You can run

poetry run mypy src tests

to typecheck,

poetry run flake8

to lint, or

poetry run pytest --cov

to test the code.

Additional scripts

This repository contains several scripts that are useful in development, but are not included with the PyPI package. If you want to make a change to this library, please clone the repository instead. You can check out these scripts and what they do by running poetry run dev.

Adding a language

  1. Find out the two-letter ISO 639-1 code of the language you want to add (xx for the rest of this subsection). Add the full English name and ISO 639-1 code of the language to supported_languages.py.
  2. Prepare a corpus of texts in the language. The corpus should be packaged as a zip archive of .txt files.
  3. Write a parser for the language (look at src/ipsum/parse/en_parser.py for an example). Name the Parser instance xx_parser and save it as src/ipsum/parse/language/xx.py. Add the parser instance to load_parser in src/ipsum/parse/__init__.py.
  4. Run poetry run dev parser-diagnostics xx. Ideally, the parser should detect around 100,000 sentences and parse more than 50–60% of them into skeletons.
  5. Run poetry run dev build_model xx && poetry run dev model_diagnostics xx.
  6. Inspect diagnostics/xx.png. If it looks good, congrats, you are done! Otherwise, return to Step 2 and try to figure out what went wrong.
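
Step 2 asks for the corpus to be packaged as a zip archive of .txt files. A minimal sketch of doing that with the standard library follows. The function name package_corpus and the example paths are assumptions, and this README does not say where the dev scripts expect the archive to be placed or whether they need a flat layout, so check poetry run dev for that.

```python
import zipfile
from pathlib import Path


def package_corpus(source_dir, archive_path):
    """Zip every .txt file found under source_dir and return how many were added."""
    source = Path(source_dir)
    txt_files = sorted(source.rglob("*.txt"))
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in txt_files:
            # Store paths relative to source_dir so the archive has no absolute paths.
            archive.write(path, arcname=path.relative_to(source).as_posix())
    return len(txt_files)


if __name__ == "__main__":
    # Hypothetical locations; adjust to wherever your texts live.
    count = package_corpus("corpus/xx", "xx.zip")
    print(f"Packaged {count} text files")
```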

Corpora

The models were trained on the following corpora:

License

MIT
