Library for procedurally generating text that resembles a particular language.

ipsum

Ipsum is a Python library for the generation of international placeholder text.

Unlike most other generators, which work by scrambling a particular text (e.g. Lorem Ipsum generators with Cicero's "De Finibus Bonorum et Malorum"), Ipsum uses Markov models to generate a vocabulary of meaningless new words that resemble the language it was trained on. The generated text is typographically similar to the specified language (i.e. it uses the same alphabet and punctuation, in the same manner and at the same frequency) but is semantically meaningless.
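To make the idea of Markov-generated words concrete, here is a minimal sketch of a character-level Markov chain. It is not Ipsum's actual implementation: the function names (train, generate_word), the START/END markers and the order-2 context are all assumptions made for illustration.

```python
import random
from collections import defaultdict

# Hypothetical boundary markers; assumed not to occur in the training words.
START, END = "^", "$"


def train(words, order=2):
    """Record which character follows each `order`-character context."""
    transitions = defaultdict(list)
    for word in words:
        padded = START * order + word.lower() + END
        for i in range(len(padded) - order):
            context = padded[i:i + order]
            transitions[context].append(padded[i + order])
    return transitions


def generate_word(transitions, order=2, rng=random, max_len=20):
    """Walk the chain from the start context until END or max_len letters."""
    context = START * order
    letters = []
    while len(letters) < max_len:
        # Followers are stored with repetition, so frequent ones are drawn more often.
        nxt = rng.choice(transitions[context])
        if nxt == END:
            break
        letters.append(nxt)
        context = context[1:] + nxt
    return "".join(letters)


if __name__ == "__main__":
    corpus = ["lorem", "ipsum", "dolor", "amet", "tempor", "labore"]
    model = train(corpus)
    print([generate_word(model) for _ in range(5)])
```

Because every letter is drawn from the followers observed in the training words, the output only ever uses the training alphabet, which is what makes the result look like the source language without meaning anything.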

You can read more about how Ipsum works here.

You can use Ipsum directly from your browser by accessing the web app at ipsum.trifunovski.me.

It currently supports the following languages:

  • Albanian
  • Bulgarian
  • Dutch
  • English
  • French
  • German
  • Greek
  • Italian
  • Macedonian
  • Serbian
  • Spanish
  • Swedish

Installing

Note that ipsum requires Python >= 3.8.1.

Run

pip install ipsum

to install the latest published version of the library, or clone the repo and use poetry

git clone git@github.com:dtrifuno/ipsum
cd ipsum/ipsum
poetry install

to install a development copy.

Usage

import ipsum

# Load the English language model
model = ipsum.load_model("en")

# Returns a list of 3 strings, each resembling a paragraph of English
paragraphs = model.generate_paragraphs(3)

# Returns a list of 10 strings, each resembling a full sentence of English
sentences = model.generate_sentences(10)

# Returns a list of 50 words (does not include any punctuation)
words = model.generate_words(50)
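The calls above return lists of strings, so a little glue code is usually needed to turn them into a block of placeholder text. The helpers below are a sketch built only on the three methods shown above; placeholder_text and placeholder_title are hypothetical names and are not part of the library.

```python
def placeholder_text(model, n_paragraphs=3):
    """Join generated paragraphs into one string, separated by blank lines."""
    return "\n\n".join(model.generate_paragraphs(n_paragraphs))


def placeholder_title(model, n_words=4):
    """Build a short title from generated words, with the first letter capitalized."""
    return " ".join(model.generate_words(n_words)).capitalize()


# Example usage (requires the ipsum package to be installed):
#
#   import ipsum
#   model = ipsum.load_model("en")
#   print(placeholder_title(model))
#   print(placeholder_text(model, n_paragraphs=2))
```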

Development

Typechecking, linting and testing

You can run

poetry run mypy src tests

to typecheck,

poetry run flake8

to lint, or

poetry run pytest --cov

to test the code.

Additional scripts

This repository contains several scripts that are useful in development, but are not included with the PyPI package. If you want to make a change to this library, please clone the repository instead. You can check out these scripts and what they do by running poetry run dev.

Adding a language

  1. Find out the two-letter ISO 639-1 code of the language you want to add (xx for the rest of this subsection). Add the full English name and ISO 639-1 code of the language to supported_languages.py.
  2. Prepare a corpus of texts in the language. The corpus should be packaged as a zip archive of .txt files.
  3. Write a parser for the language (look at src/ipsum/parse/en_parser.py for an example). Name the Parser instance xx_parser and save it as src/ipsum/parse/language/xx.py. Add the parser instance to load_parser in src/ipsum/parse/__init__.py.
  4. Run poetry run dev parser-diagnostics xx. Ideally, the parser should detect around 100,000 sentences and be able to parse more than 50–60% of them into skeletons.
  5. Run poetry run dev build_model xx && poetry run dev model_diagnostics xx.
  6. Inspect diagnostics/xx.png. If it looks good, congrats, you are done! Otherwise, return to Step 2 and try to figure out what went wrong.
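Step 2 asks for the corpus to be packaged as a zip archive of .txt files. One way to do that with the standard library is sketched below. package_corpus is a hypothetical helper, not one of the repository's scripts, and the source does not say where the dev tooling expects the archive to live or what it should be called, so the paths in the usage comment are placeholders.

```python
import zipfile
from pathlib import Path


def package_corpus(source_dir, archive_path):
    """Zip every .txt file under source_dir and return how many were added."""
    source = Path(source_dir)
    txt_files = sorted(source.rglob("*.txt"))
    with zipfile.ZipFile(archive_path, "w", compression=zipfile.ZIP_DEFLATED) as archive:
        for path in txt_files:
            # Store paths relative to the corpus root so the archive is portable.
            archive.write(path, arcname=path.relative_to(source).as_posix())
    return len(txt_files)


# Example usage (placeholder paths):
#
#   count = package_corpus("my_corpus/", "xx.zip")
#   print(f"Packaged {count} text files")
```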

Corpora

The models were trained on the following corpora:

License

MIT
