Simple Python DSL for parsing text

Turtles

Turtles is a small Python DSL for writing parsers that feel like dataclasses: you define a grammar as a collection of Rule classes, parse some input, and get back a hydrated object you can inspect, transform, or serialize.

This is especially useful when you have a custom format (or “mostly structured” text) and want a real parser instead of a giant regex or a hand-rolled parser.

NOTE: Implementation is still evolving. Please open an issue if you hit unexpected behavior.

Install

pip install turtles

Requires Python 3.12+. On Python versions before 3.14, we recommend adding `from __future__ import annotations` at the top of your grammar modules so that rules can forward-reference one another and be defined in any order.
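
To see why postponed annotations make definition order irrelevant, here is a minimal sketch that uses plain classes rather than Turtles rules. It compiles a two-class module with the same effect as the `__future__` import and shows that the annotation is stored as a string instead of being evaluated. The class names, the `define` helper, and the use of `compile`/`exec` are illustrative assumptions; real Turtles rules must live in a source file.

```python
import __future__

# Plain classes, not Turtles rules: Row's annotation names KV before KV exists.
MODULE_SOURCE = '''
class Row:
    items: list[KV]

class KV:
    key: str
'''

def define(source: str, postponed: bool) -> dict:
    """Execute `source`, optionally as if it began with `from __future__ import annotations`."""
    flags = __future__.annotations.compiler_flag if postponed else 0
    namespace: dict = {}
    code = compile(source, "<sketch>", "exec", flags=flags, dont_inherit=True)
    exec(code, namespace)
    return namespace

ns = define(MODULE_SOURCE, postponed=True)
# The annotation is kept as text, so nothing had to resolve KV at class-definition time.
print(ns["Row"].__annotations__)  # {'items': 'list[KV]'}
```

Calling `define(MODULE_SOURCE, postponed=False)` on a Python version before 3.14 should raise NameError instead, because the annotation is evaluated as soon as the class body runs.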

Quickstart

Define a grammar (in a .py file), parse input, and use the structured result.

from turtles import Rule, char, repeat, at_least, separator

# define rules for our grammar
class Int(Rule, int):
    value: repeat[char["0-9"], at_least[1]]

class Float(Rule, float):
    whole: Int
    "."
    frac: Int

Number = Float | Int

class KV(Rule):
    key: repeat[char["a-zA-Z_"], at_least[1]]
    "="
    value: Number

class Row(Rule):
    items: repeat[KV, separator[" "], at_least[1]]


# parse some input with the grammar
src = "temp=21.5 humidity=45 retries=0"
row = Row(src)

# Work with hydrated objects
assert row.items[0].key == "temp"
assert row.items[0].value == 21.5

# Convert the whole parse result to plain Python containers
data = row.as_dict()
assert data == {
    "items": [
        {"key": "temp", "value": 21.5},
        {"key": "humidity", "value": 45},
        {"key": "retries", "value": 0},
    ]
}

# Helpful while iterating on a grammar
print(repr(row))
# Row
# └── items: [3 items]
#     ├── [0]: KV
#     │   ├── key: temp
#     │   └── value: Float(float)
#     │       ├── whole: Int(int)
#     │       │   └── value: 21
#     │       └── frac: Int(int)
#     │           └── value: 5
#     ├── [1]: KV
#     │   ├── key: humidity
#     │   └── value: Int(int)
#     │       └── value: 45
#     └── [2]: KV
#         ├── key: retries
#         └── value: Int(int)
#             └── value: 0
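
Because `.as_dict()` returns plain Python containers, its output can be handed to standard serializers. The sketch below does not call Turtles: it starts from the dictionary literal used in the assertion above and round-trips it through the standard-library `json` module. The variable names are illustrative, and it assumes the values behave like built-in numbers, as that assertion suggests.

```python
import json

# The same structure that the Quickstart asserts row.as_dict() returns.
data = {
    "items": [
        {"key": "temp", "value": 21.5},
        {"key": "humidity", "value": 45},
        {"key": "retries", "value": 0},
    ]
}

encoded = json.dumps(data, indent=2)  # dicts, lists, str, int and float serialize directly
decoded = json.loads(encoded)

# Flatten the key/value pairs into a single mapping for convenient lookup.
settings = {item["key"]: item["value"] for item in decoded["items"]}
print(settings)  # {'temp': 21.5, 'humidity': 45, 'retries': 0}
```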

Important notes

Rules must be defined in a real source file (not a REPL / exec) because Turtles inspects source to build the grammar.
Named fields (e.g. key: ...) become attributes on the hydrated result.
Unnamed fields are anonymous: they guide parsing but are omitted from the result.
optional[...] and Rule | None captures are omitted from .as_dict() when absent.
Repeats of terminals become strings; repeats of Rules become lists of hydrated Rule instances.
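
The first three notes follow from how Python treats a class body. Annotations are recorded on the class, but a bare string literal is an expression statement that leaves no trace at runtime (or, if it is the first statement, becomes the class docstring). The sketch below uses the standard-library `ast` module on a source string to recover both kinds of statement in order. It only illustrates source inspection in general; it is not Turtles' implementation, and `describe_class_body` is a hypothetical helper.

```python
import ast

# A rule-shaped class body, held as text so it can be inspected without running it.
GRAMMAR_SOURCE = '''
class KV:
    key: repeat[char["a-zA-Z_"], at_least[1]]
    "="
    value: Number
'''

def describe_class_body(source: str) -> list[tuple[str, str]]:
    """Return ("named" | "anonymous", text) for each statement of the first class in `source`."""
    tree = ast.parse(source)
    cls = next(node for node in tree.body if isinstance(node, ast.ClassDef))
    parts = []
    for stmt in cls.body:
        if isinstance(stmt, ast.AnnAssign) and isinstance(stmt.target, ast.Name):
            # `name: annotation` is a named capture
            parts.append(("named", f"{stmt.target.id}: {ast.unparse(stmt.annotation)}"))
        elif isinstance(stmt, ast.Expr):
            # a bare expression such as "=" has no name
            parts.append(("anonymous", ast.unparse(stmt.value)))
    return parts

parts = describe_class_body(GRAMMAR_SOURCE)
for kind, text in parts:
    print(kind, text)
```

The order of statements is preserved, which is what lets a literal such as "=" sit between two named fields.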

Type Mixins

Rules can inherit from Python's built-in types (int, str, float, bool) to make parsed values behave like native types:

from turtles import Rule, char, repeat, at_least, sequence

class Integer(Rule, int):
    value: '0' | sequence[char['1-9'], repeat[char['0-9']]]

result = Integer("42")
assert result == 42              # Compares as int
assert isinstance(result, int)   # Type checks pass
assert result + 8 == 50          # Arithmetic works
assert result.as_dict() == 42    # as_dict() returns the int value

# Fields are still accessible
assert result.value == "42"

This is useful when you want parsed numeric or string values to integrate seamlessly with Python code.
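
Plain Python already lets a subclass of int carry extra attributes, which is the general mechanism that makes a result like this possible. The sketch below is not Turtles code: `CapturedInt` and its `value` attribute are hypothetical names, used to show an object that is a real int yet still exposes the text it was built from.

```python
class CapturedInt(int):
    """An int that also remembers the text it was built from (illustrative only)."""

    def __new__(cls, text: str) -> "CapturedInt":
        obj = super().__new__(cls, text)  # same conversion as int("42")
        obj.value = text                  # keep the original text as a field
        return obj

n = CapturedInt("42")
total = n + 8  # arithmetic falls back to int and returns a plain int

assert n == 42 and isinstance(n, int)
assert total == 50 and type(total) is int
assert n.value == "42"
```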

Custom Converters

For more complex transformations, define a __convert__ method to transform parsed results into any Python type:

from turtles import Rule, char, repeat, at_least

class Point(Rule):
    x: repeat[char['0-9'], at_least[1]]
    ','
    y: repeat[char['0-9'], at_least[1]]
    
    def __convert__(self):
        return (int(self.x), int(self.y))

result = Point("10,20")
assert result == (10, 20)         # Compares as tuple
assert result.__class__ is tuple  # Type is tuple
assert result.as_dict() == (10, 20)

# Original fields still accessible
assert result.x == "10"
assert result.y == "20"

The converter runs after hydration, so all fields are populated before __convert__ is called. You can convert to any type: tuples, dataclasses, named tuples, custom classes, etc.

from dataclasses import dataclass
from turtles import Rule, char, repeat, at_least

@dataclass
class Coordinate:
    x: int
    y: int

class CoordRule(Rule):
    x: repeat[char['0-9'], at_least[1]]
    ','
    y: repeat[char['0-9'], at_least[1]]
    
    def __convert__(self):
        return Coordinate(int(self.x), int(self.y))

result = CoordRule("5,10")
assert result == Coordinate(5, 10)

Parse errors

When an input fails to parse, Turtles raises a ParseError whose message is a user-friendly, annotated diagnostic:

from turtles import ParseError

# Row is the rule from the Quickstart; this input has no "=" after the key
try:
    Row("not_a_kv_pair")
except ParseError as e:
    print(e)

Example output:

Error: incomplete KV: missing key

    ╭─[test.py:1:1]
  1 | not_a_kv_pair
    · ┬           ╱╲
    · │           ╰─ expected "="
    · ╰─ Row started here
    ╰───
  help: The input appears incomplete. Try adding "=".

Example grammars

The turtles/examples/ directory contains complete grammar examples:

File	Description
`semver.py`	Semantic versioning (`SemVer("1.2.3-alpha.1+build.5")`)
`json_toy.py`	Minimal JSON subset (good for learning)
`json.py`	Full RFC 8259 JSON grammar
`csv.py`	RFC 4180 CSV grammar

The test suite is also a great source of patterns:

File	Coverage
`tests/test_hydration.py`	Field captures, repeats, optionals, unions, mixins, converters
`tests/test_as_dict.py`	Serialization with `.as_dict()`
`tests/test_csv.py`	Real-world CSV parsing scenarios

Contributions welcome! Open a PR with new example grammars.

More Examples

Semantic Versions

from turtles import Rule, repeat, char, separator, sequence, at_least

class NumId(Rule):
    id: '0' | sequence[char['1-9'], repeat[char['0-9']]]

class Id(Rule):
    id: repeat[char['a-zA-Z0-9-'], at_least[1]]

class Prerelease(Rule):
    "-"
    ids: repeat[Id, separator['.'], at_least[1]]

class Build(Rule):
    "+"
    ids: repeat[Id, separator['.'], at_least[1]]

class SemVer(Rule):
    major: NumId
    "."
    minor: NumId
    "."
    patch: NumId
    prerelease: Prerelease | None
    build: Build | None


result = SemVer('1.2.3-alpha+build.5')

assert result.major.id == '1'
assert result.minor.id == '2'
assert result.patch.id == '3'
assert result.prerelease.ids[0].id == 'alpha'
assert result.build.ids[0].id == 'build'
assert result.build.ids[1].id == '5'

Toy JSON Parser

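# Needed on Python < 3.14: JArray and Pair refer to JSONValue before it is defined
from __future__ import annotations

from turtles import Rule, char, repeat, at_least, separator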

class Whitespace(Rule):
    repeat[char[' \t\n\r']]

class Comma(Rule):
    Whitespace
    ','
    Whitespace

class JNull(Rule):
    "null"

class JBool(Rule):
    value: "true" | "false"

class JNumber(Rule):
    value: repeat[char['0-9'], at_least[1]]

class JString(Rule):
    '"'
    value: repeat[char["A-Za-z0-9 !#$%&'()*+,-./:;<=>?@^_`{|}~"]]
    '"'

class JArray(Rule):
    '['
    Whitespace
    items: repeat[JSONValue, separator[Comma]]
    Whitespace
    ']'

class Pair(Rule):
    key: JString
    Whitespace
    ':'
    Whitespace
    value: JSONValue

class JObject(Rule):
    '{'
    Whitespace
    pairs: repeat[Pair, separator[Comma]]
    Whitespace
    '}'

# Rule union - JSONValue can be any of these types
JSONValue = JNull | JBool | JNumber | JString | JArray | JObject


src = '{ "A": { "a": null }, "B": [ true, false, 1, 2, 3 ] }'
result = JSONValue(src)

assert isinstance(result, JObject)
assert len(result.pairs) == 2
assert result.pairs[0].key.value == "A"
assert isinstance(result.pairs[1].value, JArray)

# Tree visualization
print(repr(result))

# Convert to plain Python containers
result.as_dict()

Note: This is a simplified grammar. See turtles/examples/json.py for a complete RFC 8259 implementation with floats, escapes, and full unicode support.

DSL Reference

Construct	Description	Example	BNF Equivalent
`"literal"`	Match exact string	`"hello"`	`"hello"`
`char['a-z']`	Character class	`char['0-9A-Fa-f']`	`[0-9A-Fa-f]`
`repeat[X]`	Zero or more	`repeat[char['0-9']]`	`[0-9]*`
`repeat[X, at_least[n]]`	At least n	`repeat[char['a-z'], at_least[1]]`	`[a-z]+` / `[a-z]{1,}`
`repeat[X, at_most[n]]`	At most n	`repeat[Int, at_most[10]]`	`Int{0,10}`
`repeat[X, exactly[n]]`	Exactly n	`repeat[Int, exactly[3]]`	`Int{3,3}`
`repeat[X, separator[Y]]`	Separated list	`repeat[Item, separator[',']]`	`Item (',' Item)*`
`optional[X]`	Zero or one	`optional[Sign]`	`Sign?`
`X | None`	Optional rule	`prefix: Sign | None`	`Sign?`
`A | B | C`	Rule union	`Value = Int | Float | String`	`Int | Float | String`
`sequence[A, B]`	Explicit sequence	`sequence[char['1-9'], repeat[char['0-9']]]`	`[1-9] [0-9]*`
`field: X`	Named capture	`value: repeat[char['0-9']]`	—
`Rule, int`	Type mixin	`class Num(Rule, int): ...`	—
`__convert__`	Custom converter	`def __convert__(self): return int(self.x)`	—
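
The BNF column can be cross-checked with the standard-library `re` module. The sketch below writes three constructs from the table as regular expressions and tries a few strings against them. The pattern names and the `matches` helper are illustrative, and the regexes are hand translations, not something Turtles generates.

```python
import re

# '0' | sequence[char['1-9'], repeat[char['0-9']]]  ->  0 | [1-9][0-9]*
NUM_ID = re.compile(r"0|[1-9][0-9]*")

# repeat[char['a-z'], at_least[1]]  ->  [a-z]+
WORD = re.compile(r"[a-z]+")

# repeat[Item, separator[','], at_least[1]] with Item = [0-9]+  ->  Item (',' Item)*
INT_LIST = re.compile(r"[0-9]+(?:,[0-9]+)*")

def matches(pattern: re.Pattern[str], text: str) -> bool:
    """True when the pattern matches the whole string, not just a prefix."""
    return pattern.fullmatch(text) is not None

print(matches(NUM_ID, "42"), matches(NUM_ID, "042"))          # True False
print(matches(WORD, "abc"), matches(WORD, ""))                # True False
print(matches(INT_LIST, "1,2,3"), matches(INT_LIST, "1,,3"))  # True False
```

The first pattern shows why the numeric rules in the examples are written as a union: a lone "0" is accepted, but a leading zero on a longer number is not.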

Backend

Turtles uses a GLL (Generalized LL) parser backend. GLL handles arbitrary context-free grammars, including ambiguous and left-recursive ones, and the implementation stays reasonably small.

At a high level, parsing works like this:

1. The Rule class body is inspected and compiled into a context-free grammar.
2. The GLL parser runs against the input and produces a compact shared parse forest.
3. Turtles extracts a parse tree (with optional disambiguation rules like precedence/associativity) and hydrates it back into instances of your Rule classes.
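
One reason to produce a shared forest rather than a list of trees is that an ambiguous grammar can admit very many parses of the same input. As a small illustration (this is not the GLL algorithm, and `count_parses` is a hypothetical helper), the sketch below counts the distinct parse trees of the ambiguous grammar E -> E '+' E | 'n' using a memoized recursion over token spans.

```python
from functools import lru_cache

def count_parses(tokens: tuple[str, ...]) -> int:
    """Count parse trees of `tokens` under the ambiguous grammar E -> E '+' E | 'n'."""

    @lru_cache(maxsize=None)
    def count(start: int, end: int) -> int:
        # Number of ways tokens[start:end] can be derived from E.
        if end - start == 1:
            return 1 if tokens[start] == "n" else 0
        total = 0
        # Try every '+' as the top-level operator, splitting the span in two.
        for mid in range(start + 1, end - 1):
            if tokens[mid] == "+":
                total += count(start, mid) * count(mid + 1, end)
        return total

    return count(0, len(tokens))

for text in ("n", "n+n", "n+n+n", "n+n+n+n"):
    print(text, count_parses(tuple(text)))  # 1, 1, 2, 5
```

The counts follow the Catalan numbers, so they grow quickly with input length. Sharing sub-results between alternatives, as the memoized spans do here, is the same idea that keeps a parse forest compact.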

Looking for the old Turtles?

⚠️ The turtles project has been rebooted. v2.0.0 and onward will not be compatible with the original v1.0.0 release. If you are looking for the original project, see Roguelazer/turtles.

Release history

Version	Released
2.0.6 (this version)	Jan 26, 2026
2.0.5	Jan 25, 2026
2.0.4	Jan 25, 2026
2.0.3	Jan 25, 2026
2.0.2	Jan 25, 2026
2.0.1	Jan 24, 2026
2.0.0	Jan 23, 2026
2.0.0rc1 (pre-release)	Jan 22, 2026
2.0.0rc0 (pre-release)	Nov 6, 2025
1.0.0	Aug 31, 2017
