Skip to main content

Lyndon words, Duval factorization, necklace and bracelet enumeration, and De Bruijn sequences in pure Python.

Project description

lyndon-words

lyndon-words logo

CI License: MIT

Combinatorics on words in pure Python with zero dependencies: Lyndon-word testing, Duval's Chen-Fox-Lyndon factorization, necklace and bracelet enumeration, and De Bruijn sequences.

What is this?

A Lyndon word is a non-empty word that is strictly smaller than all of its rotations. Lyndon words are the building blocks of combinatorics on words: every word factors uniquely into a non-increasing concatenation of Lyndon words (the Chen-Fox-Lyndon theorem), and Lyndon words index the necklaces (rotation classes) and the De Bruijn sequences over an alphabet.

This library implements the classic algorithms directly, with no dependencies and strict typing.

Install

pip install lyndon-words

PyPI release pending. Install from source:

git clone https://github.com/amaar-mc/lyndon-words
cd lyndon-words
pip install -e .

Quick start

from lyndon_words import is_lyndon, factorize, enumerate_necklaces, enumerate_bracelets, de_bruijn, lyndon_array

# Lyndon-word test (accepts str, list, or tuple)
is_lyndon("aab")     # True
is_lyndon("aba")     # False
is_lyndon((0, 0, 1)) # True

# Duval's Chen-Fox-Lyndon factorization (non-increasing Lyndon factors)
factorize("banana")  # ['b', 'an', 'an', 'a']
factorize("aababab") # ['aabab', 'ab']

# Necklaces: rotation-class representatives over {0, 1} of length 3
list(enumerate_necklaces(alphabet_size=2, length=3))
# [(0, 0, 0), (0, 0, 1), (0, 1, 1), (1, 1, 1)]

# Bracelets: rotation + reflection classes over {0, 1} of length 4
list(enumerate_bracelets(alphabet_size=2, length=4))
# [(0, 0, 0, 0), (0, 0, 0, 1), (0, 0, 1, 1), (0, 1, 0, 1), (0, 1, 1, 1), (1, 1, 1, 1)]

# De Bruijn sequence B(2, 3): every length-3 binary word appears once cyclically
de_bruijn(alphabet_size=2, length=3)
# (0, 0, 0, 1, 0, 1, 1, 1)

# Lyndon array: longest Lyndon prefix length at each position
lyndon_array("banana")   # [1, 2, 1, 2, 1, 1]
lyndon_array("abcd")     # [4, 3, 2, 1]   (entire increasing suffix is Lyndon)
lyndon_array("aaaa")     # [1, 1, 1, 1]   (all equal, no run > 1 is Lyndon)
lyndon_array("0010011")  # [7, 2, 1, 4, 3, 1, 1]

Word representation

Words over an integer alphabet are represented as tuple[int, ...], for example (0, 0, 1) for a 3-symbol word over {0, 1}. This is the canonical type, and every enumeration and generation function (enumerate_necklaces, enumerate_bracelets, de_bruijn) yields or returns tuples of ints.

The word predicates is_lyndon and factorize are generic over any comparable Sequence, so str, list, and tuple all work. The factors returned by factorize have the same type as the input: pass a str and you get a list of str factors, pass a tuple and you get a list of tuple factors. String examples like "aab" behave exactly like integer tuples because string comparison is lexicographic.

API

Enumeration and generation functions take keyword-only alphabet_size and length.

Function Description
is_lyndon(word) True iff word is non-empty and strictly smaller than all proper rotations
factorize(word) Duval's O(n) Chen-Fox-Lyndon factorization into non-increasing Lyndon factors
lyndon_array(word) For each index i, the length of the longest Lyndon prefix of word[i:]
enumerate_necklaces(*, alphabet_size, length) FKM enumeration of rotation classes, lexicographic order
enumerate_bracelets(*, alphabet_size, length) Enumeration of rotation + reflection classes, lexicographic order
de_bruijn(*, alphabet_size, length) De Bruijn sequence B(alphabet_size, length) via FKM

alphabet_size >= 1 and length >= 1 are required; otherwise a ValueError is raised.

Definitions

Lyndon word. A non-empty word strictly smaller, in lexicographic order, than every one of its proper rotations (equivalently, every proper suffix). Lyndon words are aperiodic.

Chen-Fox-Lyndon factorization. Every non-empty word factors uniquely into Lyndon words l_1 >= l_2 >= ... >= l_m. Duval's algorithm computes this in O(n) time and O(1) extra space.

Necklace. A rotation-equivalence class of length-n words. The representative is the lexicographically least rotation. The count is (1/n) * sum over d dividing n of phi(d) * k^(n/d).

Bracelet. An equivalence class under rotation and reflection (the dihedral group). The representative is the lexicographically least element of the orbit.

Lyndon array. For a word s of length n, the Lyndon array is the integer array L where L[i] is the length of the longest Lyndon word that is a prefix of s[i:]. Every entry satisfies 1 <= L[i] <= n - i. The first entry L[0] equals the length of the first (leftmost) factor in the Chen-Fox-Lyndon factorization of s: that factor is the unique longest Lyndon prefix of the entire word.

De Bruijn sequence. A cyclic sequence B(k, n) of length k^n in which every length-n word appears exactly once as a contiguous cyclic subword. The FKM construction concatenates, in lexicographic order, every Lyndon word whose length divides n.

References

  • Duval, J.-P. (1983). Factorizing words over an ordered alphabet. Journal of Algorithms.
  • Chen, K.-T., Fox, R. H., Lyndon, R. C. (1958). Free differential calculus IV. Annals of Mathematics.
  • Fredricksen, H., Maiorana, J. (1978). Necklaces of beads in k colors and k-ary de Bruijn sequences. Discrete Mathematics.
  • Fredricksen, H., Kessler, I. J. (1986). An algorithm for generating necklaces of beads in two colors. Discrete Mathematics.
  • Ruskey, F. (2003). Combinatorial Generation. University of Victoria.

License

MIT. Copyright (c) 2026 Amaar Chughtai.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lyndon_words-0.2.0.tar.gz (880.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

lyndon_words-0.2.0-py3-none-any.whl (11.8 kB view details)

Uploaded Python 3

File details

Details for the file lyndon_words-0.2.0.tar.gz.

File metadata

  • Download URL: lyndon_words-0.2.0.tar.gz
  • Upload date:
  • Size: 880.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for lyndon_words-0.2.0.tar.gz
Algorithm Hash digest
SHA256 f11e967a3cb250542fa69ead6aed64247c1ef7c2d6870b7f6abf5063b97742ec
MD5 b223e8209d65635342405f0af5fadd45
BLAKE2b-256 5f91b58a2f43b891dcb61b4accda31efd7c104c1972346121aacdf2c3911a104

See more details on using hashes here.

File details

Details for the file lyndon_words-0.2.0-py3-none-any.whl.

File metadata

  • Download URL: lyndon_words-0.2.0-py3-none-any.whl
  • Upload date:
  • Size: 11.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.13.3

File hashes

Hashes for lyndon_words-0.2.0-py3-none-any.whl
Algorithm Hash digest
SHA256 6ece1746a620eb34fb85df09766d89691c44b19ddf99524c64a49af259578049
MD5 5902a52d423e93ab5906f3a515c18a43
BLAKE2b-256 be3e8c9181dce23b9638c5aa400da7e1855b5a2c81093c36cddd6392f7680a60

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page