Monsters for your language games.

     .─') _                                       .─') _                  
    (  OO) )                                     ( OO ) )            
  ░██████  ░██ ░██   ░██               ░██        ░██ ░██                                 
 ░██   ░██ ░██       ░██                ░██        ░██                                     
░██        ░██ ░██░████████  ░███████   ░████████  ░██ ░██░████████   ░████████ ░███████  
░██  █████ ░██ ░██   ░██    ░██('─.░██ ░██    ░██ ░██ ░██░██    ░██ ░██.─')░██ ░██        
░██     ██ ░██ ░██   ░██    ░██( OO ) ╱░██    ░██ ░██ ░██░██    ░██ ░██(OO)░██ ░███████  
  ░██  ░███ ░██ ░██   ░██    ░██    ░██ ░██    ░██ ░██ ░██░██    ░██ ░██ o ░███      ░██ 
  ░█████░█ ░██ ░██   ░████   ░███████  ░██    ░██ ░██ ░██░██    ░██  ░█████░██ ░███████  
                                                                          ░██            
                                                                  ░███████             

                        Every language game breeds monsters.


Glitchlings are utilities for corrupting the text inputs to your language models in deterministic, linguistically principled ways.
Each embodies a different way that documents can be compromised in the wild.

If reinforcement learning environments are games, then Glitchlings are enemies to breathe new life into old challenges.

They do this by breaking surface patterns in the input while keeping the target output intact.

Some Glitchlings are petty nuisances. Some Glitchlings are eldritch horrors.
Together, they create truly nightmarish scenarios for your language models.

After all, what good is general intelligence if it can't handle a little chaos?

-The Curator

Motivation

If your model performs well on a particular task, but not when Glitchlings are present, it's a sign that it hasn't actually generalized to the problem.

Conversely, training a model to perform well in the presence of the types of perturbations introduced by Glitchlings should help it generalize better.

Quickstart

pip install -U glitchlings

The fastest way to get started is to ask my assistant, Auggie, to prepare a custom mix of glitchlings for you:

from glitchlings import Auggie, SAMPLE_TEXT

auggie = (
    Auggie(seed=404)
    .typo(rate=0.015)
    .confusable(rate=0.01)
    .homophone(rate=0.02)
)

print(auggie(SAMPLE_TEXT))

One morning, when Gregor Samsa woke from troubld dreams, he found himself transformed in his bed into a horible vermin. He layed on his armour-like back, and if he lifted his head a little he could see his brown belly, slightly domed and divided by arches into stiff sections. The bedding was hardly able to cover it and seemed ready to slide off any moment. His many legs, pitifully thin compared with the size of the rest of him, waved about helplessly as he looked.

You're more than welcome to summon them directly, if you're feeling brave:

from glitchlings import Gaggle, SAMPLE_TEXT, Typogre, Mim1c, Wherewolf

gaggle = Gaggle(
    [
        Typogre(rate=0.015),
        Mim1c(rate=0.01),
        Wherewolf(rate=0.02),
    ],
    seed=404
)

Consult the Glitchlings Usage Guide for end-to-end instructions spanning the Python API, CLI, and third-party integrations.

Your First Battle

Summon your chosen Glitchling (or a few, if ya nasty) and call it on your text or slot it into Dataset.map(...), supplying a seed if desired. Glitchlings are standard Python classes:

from glitchlings import Gaggle, Typogre, Mim1c

custom_typogre = Typogre(rate=0.1)
selective_mimic = Mim1c(rate=0.05, classes=["LATIN", "GREEK"])

gaggle = Gaggle([custom_typogre, selective_mimic], seed=99)
corrupted = gaggle("We Await Silent Tristero's Empire.")
print(corrupted)

Calling a Glitchling on a str transparently calls .corrupt(str, ...) -> str. This means that as long as your glitchlings get along logically, they play nicely with one another.

When summoned as or gathered into a Gaggle, the Glitchlings will automatically order themselves into attack waves, based on the scope of the change they make:

  1. Document
  2. Paragraph
  3. Sentence
  4. Word
  5. Character

They're horrible little gremlins, but they're not unreasonable.
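
The wave ordering above can be pictured as a stable sort on scope. This is only a minimal sketch under assumptions: the `Scope` enum, the `order_by_scope` helper, and the roster entries are hypothetical names for illustration, not the library's internals. Only the five scope levels come from the list above.

```python
from enum import IntEnum


class Scope(IntEnum):
    """Scope levels from the list above; lower values attack first."""
    DOCUMENT = 1
    PARAGRAPH = 2
    SENTENCE = 3
    WORD = 4
    CHARACTER = 5


def order_by_scope(roster):
    """Sort (name, scope) pairs by scope.

    sorted() is stable, so entries sharing a scope keep their input order.
    """
    return sorted(roster, key=lambda entry: entry[1])


# Hypothetical roster; names and scope assignments are illustrative only.
roster = [
    ("char_level_a", Scope.CHARACTER),
    ("word_level_a", Scope.WORD),
    ("char_level_b", Scope.CHARACTER),
    ("word_level_b", Scope.WORD),
]

waves = order_by_scope(roster)
print([name for name, _ in waves])
# ['word_level_a', 'word_level_b', 'char_level_a', 'char_level_b']
```

Running the broad edits first means the fine-grained ones operate on text whose word and sentence boundaries have already settled.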

Command-Line Interface (CLI)

Keyboard warriors can challenge them directly via the glitchlings command (see the generated CLI reference in docs/cli.md for the full contract):

# Discover which glitchlings are currently on the loose.
glitchlings --list
 
# Review the full CLI contract.
glitchlings --help
 
# Run Typogre against the contents of a file and inspect the diff.
glitchlings -g typogre --input-file documents/report.txt --diff

# Configure glitchlings inline by passing keyword arguments.
glitchlings -g "Typogre(rate=0.05)" "Ghouls just wanna have fun"

# Pipe text straight into the CLI for an on-the-fly corruption.
echo "Beware LLM-written flavor-text" | glitchlings -g mim1c

# Emit an Attack summary with metrics and counts.
glitchlings --attack --sample

# Emit a full Attack report with tokens, token IDs, and metrics.
glitchlings --report --sample

Configuration Files

Configurations live in plain YAML files so you can version-control experiments without touching code:

# Load a roster from a YAML attack configuration.
glitchlings --config experiments/chaos.yaml "Let slips the glitchlings of war"

# experiments/chaos.yaml
seed: 31337
glitchlings:
  - name: Typogre
    rate: 0.04
  - "Rushmore(rate=0.12, unweighted=True)"
  - name: Zeedub
    parameters:
      rate: 0.02
      characters: ["\u200b", "\u2060"]

Attack on Token

Looking to compare before/after corruption with metrics and stable seeds? Reach for the Attack helper, which bundles tokenization, metrics, and transcript batching into a single utility. It accepts plain list[str] batches, renders quick summary() reports, and can compare multiple tokenizers via Attack.compare(...) when you need a metrics matrix.
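
To make the idea of a before/after comparison concrete, here is a rough standard-library sketch. It does not use or reproduce the Attack helper: the `tokenize`, `edit_distance`, and `compare` functions are hypothetical stand-ins, the tokenizer is a plain whitespace split, and the single metric shown (token-level Levenshtein distance) is an assumption about the kind of number such a report might contain.

```python
def tokenize(text):
    """Stand-in tokenizer: split on whitespace. A real run would use a model tokenizer."""
    return text.split()


def edit_distance(a, b):
    """Levenshtein distance between two token sequences (row-by-row dynamic programming)."""
    previous = list(range(len(b) + 1))
    for i, token_a in enumerate(a, start=1):
        current = [i]
        for j, token_b in enumerate(b, start=1):
            cost = 0 if token_a == token_b else 1
            current.append(min(
                previous[j] + 1,         # delete token_a
                current[j - 1] + 1,      # insert token_b
                previous[j - 1] + cost,  # substitute or match
            ))
        previous = current
    return previous[-1]


def compare(original, corrupted):
    """Summarize how far the corrupted token sequence drifted from the original."""
    before, after = tokenize(original), tokenize(corrupted)
    distance = edit_distance(before, after)
    return {
        "tokens_before": len(before),
        "tokens_after": len(after),
        "edit_distance": distance,
        "normalized_distance": distance / max(len(before), len(after), 1),
    }


report = compare("the quick brown fox", "the quikc brown brown fox")
print(report)
```

Here one token is substituted and one is duplicated, so the edit distance is 2 over a longest sequence of 5 tokens.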

Development

Follow the development setup guide for editable installs, automated tests, and tips on enabling the Rust pipeline while you hack on new glitchlings.

Starter 'lings

For maintainability reasons, all Glitchlings have consented to be given nicknames once they're in your care. See the Monster Manual for a complete bestiary.

Typogre

What a nice word, would be a shame if something happened to it.

Fatfinger. Typogre introduces character-level errors (duplicating, dropping, adding, or swapping) based on the layout of a keyboard (QWERTY by default, with Dvorak and Colemak variants built-in).

Typogre supports motor coordination weighting based on biomechanical research from the Aalto 136M Keystrokes dataset. Use motor_weighting="wet_ink" for uncorrected errors (cross-hand typos slip through) or motor_weighting="hastily_edited" for raw typing patterns before correction.
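
As a rough illustration of keyboard-aware errors, the sketch below swaps characters for a neighboring key. It is not Typogre's implementation: the `NEIGHBORS` table is a tiny hand-written slice of QWERTY, `fat_finger` is a hypothetical name, and only substitution is shown (no duplicating, dropping, or motor weighting).

```python
import random

# A tiny, hand-written slice of QWERTY adjacency; illustrative only.
NEIGHBORS = {
    "a": "qwsz",
    "e": "wsdr",
    "o": "iklp",
    "s": "awedxz",
    "t": "rfgy",
}


def fat_finger(text, rate, seed):
    """Replace some characters with a keyboard neighbor.

    Each character that has an entry in NEIGHBORS is swapped with
    probability `rate`, using a private random.Random so the global
    RNG is never touched.
    """
    rng = random.Random(seed)
    out = []
    for ch in text:
        options = NEIGHBORS.get(ch)
        if options and rng.random() < rate:
            out.append(rng.choice(options))
        else:
            out.append(ch)
    return "".join(out)


sample = "a nice word to test"
print(fat_finger(sample, rate=0.3, seed=404))
```

Because every substitution is one character for one character, the output always has the same length as the input, and the same seed always gives the same result.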

Mim1c

Wait, was that...?

Confusion. Mim1c replaces non-space characters with Unicode Confusables: characters with distinct code points that look alike, so a human reader would usually not notice the swap.

Hokey

She's soooooo coooool!

Passionista. Hokey gets a little excited and streeeeetches words for emphasis.

Apocryphal Glitchling contributed by Chloé Nunes

Scannequin

How can a computer need reading glasses?

OCArtifacts. Scannequin mimics optical character recognition errors by swapping visually similar character sequences (like rn↔m, cl↔d, O↔0, l/I/1).
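
A minimal sketch of this kind of sequence swap, assuming a one-directional table built from the pairs named above. The `CONFUSIONS` list and `ocr_noise` function are hypothetical; the three-way l/I/1 group is reduced to l→1 to keep the example short.

```python
import random

# Pairs taken from the description above, applied left to right only.
CONFUSIONS = [("rn", "m"), ("cl", "d"), ("O", "0"), ("l", "1")]


def ocr_noise(text, rate, seed):
    """Scan left to right, swapping matched sequences with probability `rate`."""
    rng = random.Random(seed)
    out = []
    i = 0
    while i < len(text):
        for source, target in CONFUSIONS:
            if text.startswith(source, i) and rng.random() < rate:
                out.append(target)
                i += len(source)  # skip the whole matched sequence
                break
        else:
            # No confusion fired at this position; copy the character through.
            out.append(text[i])
            i += 1
    return "".join(out)


print(ocr_noise("Old modern clocks", rate=1.0, seed=0))
# 01d modem docks
```

With `rate=1.0` every match fires, which makes the result easy to check by hand; lower rates leave most matches alone.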

Zeedub

Watch your step around here.

Invisible Ink. Zeedub slips zero-width codepoints between non-space character pairs, forcing models to reason about text whose visible form masks hidden glyphs.
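
The effect can be sketched in a few lines of standard-library Python. This is an illustration under assumptions, not Zeedub's code: `slip_zero_width` and `strip_zero_width` are hypothetical helpers, and the two code points are the ones that appear in the sample YAML configuration earlier in this document.

```python
import random

ZERO_WIDTH = ["\u200b", "\u2060"]  # zero-width space, word joiner


def slip_zero_width(text, rate, seed, characters=ZERO_WIDTH):
    """Insert a zero-width code point between adjacent non-space characters."""
    rng = random.Random(seed)
    out = []
    for index, ch in enumerate(text):
        out.append(ch)
        # Treat the end of the text like a space so nothing is appended there.
        nxt = text[index + 1] if index + 1 < len(text) else " "
        if not ch.isspace() and not nxt.isspace() and rng.random() < rate:
            out.append(rng.choice(characters))
    return "".join(out)


def strip_zero_width(text, characters=ZERO_WIDTH):
    """Undo the corruption by dropping the inserted code points."""
    return "".join(ch for ch in text if ch not in characters)


clean = "watch your step"
dirty = slip_zero_width(clean, rate=1.0, seed=7)
print(len(clean), len(dirty))  # 15 25
```

The two strings render identically in most fonts, yet the corrupted one is ten code points longer, so a tokenizer sees something quite different.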

Wherewolf

Did you hear what I heard?

Echo Chamber. Wherewolf swaps words with curated homophones so the text still sounds right while the spelling drifts. Groups are normalised to prevent duplicates and casing is preserved when substitutions fire.

Jargoyle

Uh oh. The worst person you know just bought a thesaurus.

Sesquipedalianism. Jargoyle insufferably replaces words with synonyms at random, without regard for connotational or denotational differences.

Rushmore

I accidentally an entire word.

Tactical Scrambler. Rushmore randomly drops, duplicates, or swaps words in the text to simulate hasty writing, editing mistakes, or transmission errors.

Redactyl

Oops, that was my black highlighter.

FOIA Reply. Redactyl obscures random words in your document like an NSA analyst with a bad sense of humor.

Apocrypha

Cave paintings and oral tradition contain many depictions of strange, otherworldly Glitchlings.
These Apocryphal Glitchlings are said to possess unique abilities or behaviors.
If you encounter one of these elusive beings, please document your findings and share them with The Curator.

Ensuring Reproducible Corruption

Every Glitchling should own its own independent random.Random instance. That means:

  • No random.seed(...) calls touch Python's global RNG.
  • Supplying a seed when you construct a Glitchling (or when you summon(...)) makes its behavior reproducible.
  • Re-running a Gaggle with the same master seed and the same input text (and same external data!) yields identical corruption output.
  • Corruption functions are written to accept an rng parameter internally so that all randomness is centralized and testable.
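
The points above can be sketched as follows. Everything here is an assumption made for illustration: `derive_seed`, `drop_words`, `upper_words`, and `run_pipeline` are hypothetical names, and the way child seeds are derived from the master seed is one plausible scheme, not necessarily the one the library uses.

```python
import random


def derive_seed(master_seed, index):
    """Hypothetical derivation: one child seed per pipeline position."""
    return random.Random(f"{master_seed}:{index}").getrandbits(64)


def drop_words(text, rate, rng):
    """Drop each word with probability `rate`, drawing only from `rng`."""
    return " ".join(w for w in text.split() if rng.random() >= rate)


def upper_words(text, rate, rng):
    """Uppercase each word with probability `rate`, drawing only from `rng`."""
    return " ".join(w.upper() if rng.random() < rate else w for w in text.split())


def run_pipeline(text, master_seed):
    """Run each step with its own private random.Random instance."""
    steps = [(drop_words, 0.2), (upper_words, 0.3)]
    for index, (step, rate) in enumerate(steps):
        rng = random.Random(derive_seed(master_seed, index))
        text = step(text, rate, rng)
    return text


text = "every language game breeds monsters sooner or later"
first = run_pipeline(text, master_seed=404)
random.random()  # advancing the global RNG has no effect on the pipeline
second = run_pipeline(text, master_seed=404)
print(first == second)  # True
```

Passing `rng` explicitly into each corruption function is what makes the randomness easy to pin down in a test.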

At Wits' End?

If you're trying to add a new glitchling and can't seem to make it deterministic, here are some places to look for determinism-breaking code:

  1. Search for direct calls to random.choice or random.shuffle that bypass the provided rng, and for code that relies on set(...) iteration order.
  2. Ensure you sort collections before shuffling or sampling.
  3. Make sure indices are chosen from a stable reference (e.g., original text) when applying length‑changing edits.
  4. Make sure sort keys are specific enough to break every tie, so that equally ranked items always come out in the same order.
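
A short sketch of the checklist above, using only the standard library. The helper names (`pick_targets`, `delete_at`, `rank`) are hypothetical and exist only to show each habit in isolation.

```python
import random


def pick_targets(words, count, rng):
    """Sample `count` distinct words deterministically.

    Iteration order of a set of strings can vary between interpreter
    runs, so sort the candidates before sampling from them.
    """
    candidates = sorted(set(words))
    return rng.sample(candidates, min(count, len(candidates)))


def delete_at(text, indices):
    """Delete characters at positions given relative to the ORIGINAL text.

    Working from the highest index down keeps the remaining positions
    valid even though each deletion shortens the string.
    """
    chars = list(text)
    for index in sorted(set(indices), reverse=True):
        del chars[index]
    return "".join(chars)


def rank(items):
    """Sort by score (descending), then by name, so ties are fully resolved."""
    return sorted(items, key=lambda item: (-item[1], item[0]))


words = "the cat and the hat and the bat".split()
same = pick_targets(words, 2, random.Random(99)) == pick_targets(words, 2, random.Random(99))
print(same)                                      # True
print(delete_at("glitchling", [0, 5, 9]))        # litclin
print(rank([("b", 1), ("a", 1), ("c", 2)]))      # [('c', 2), ('a', 1), ('b', 1)]
```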

References

Glitchlings incorporates research from the following sources:

  • Aalto 136M Keystrokes Dataset — Motor coordination weights for Typogre's biomechanically-informed error sampling:

    Dhakal, V., Feit, A. M., Kristensson, P. O., & Oulasvirta, A. (2018). Observations on Typing from 136 Million Keystrokes. Proceedings of the 2018 CHI Conference on Human Factors in Computing Systems (CHI '18), Article 646. https://doi.org/10.1145/3173574.3174220

  • Expressive Lengthening Research — Linguistic foundations for Hokey's stretchability scoring and site selection:

    Brody, S., & Diakopoulos, N. (2011). Cooooooooooooooollllllllllllll!!!!!!!!!!!!!!: Using Word Lengthening to Detect Sentiment in Microtext. Proceedings of the 2011 Conference on Empirical Methods in Natural Language Processing (EMNLP '11), 562–570. https://aclanthology.org/D11-1052

    Gray, B., Bruxvoort, C., Beigman Klebanov, B., & Leong, B. (2020). Expressive Lengthening in Social Media. Proceedings of the 12th Language Resources and Evaluation Conference (LREC 2020), 4517–4523. https://aclanthology.org/2020.lrec-1.556
