Serialization format readable for LLMs and humans

These details have been verified by PyPI

Maintainers

FilipMalczak

These details have not been verified by PyPI

Development Status
- 3 - Alpha
Intended Audience
- Developers
License
- OSI Approved :: MIT License
Project description

lmxml

A serialization format readable by both LLMs and humans.

I found that structured prompting yields great results. Instead of feeding the model a wall of text (possibly formatted with Markdown or sprinkled with pseudo-XML tags), you can build your prompt in memory as an object (anything JSON-like), serialize it, and use it as a system or user prompt.

In my (non-benchmarked, but battle-tested) experience, models understand your intent much better that way — especially the smaller ones.

Why lmxml?

What do you use to serialize the prompts, though?

JSON is a first-class citizen in the world of tool-capable models, so why not that?

Because you might end up with a single-line prompt that’s hard to read and even harder to debug. You can pretty-print it, but once you introduce multi-line text fields, you’re parsing \n in your own head — and mine started hurting the first time I tried.
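A quick way to see the problem, using only the standard library json module (the prompt content below is made up for illustration):

```python
import json

# A hypothetical prompt object with one multi-line text field.
prompt = {
    "role": "reviewer",
    "instructions": "Read the diff.\nList the bugs.\nSuggest fixes.",
}

compact = json.dumps(prompt)           # everything on a single line
pretty = json.dumps(prompt, indent=2)  # keys on separate lines...

print(compact)
print(pretty)  # ...but the multi-line field is still one line full of \n escapes
```

Pretty-printing only helps with the structure; the text inside each string stays escaped.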

So, maybe YAML?

It is nicely formatted for human ingestion and supports |-style multi-line strings.
Well… yes, but have fun configuring YAML serializers to reliably emit that flavor.

There’s TOON as well, but it’s optimized for token usage, not human readability.
Let’s not even talk about INI or TOML — I think you already see how that won’t do us any good here.

But hey — every prompting guide tells you that XML-like <tags> make models understand structure better.
So… maybe XML?

I liked that idea best, but the eXtensible part turns out to be a curse, not a feature, for this use case. There’s no simple xml.dumps(...) in major XML libraries, so you’re forced to decide:

- which values become attributes
- how lists are represented
- whether to use CDATA
- how much whitespace matters

That’s basically what I’ve done: I made those choices once, so you don’t have to.
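To illustrate why a generic xml.dumps is hard to define, here is a small sketch with the standard library's xml.etree.ElementTree. It encodes the same dictionary in two equally valid ways; the sample data and variable names are made up for this example.

```python
import xml.etree.ElementTree as ET

user = {"id": 42, "name": "Ada"}

# Choice A: every value becomes an attribute on a single element.
as_attributes = ET.Element("user", {k: str(v) for k, v in user.items()})

# Choice B: every value becomes a child element.
as_children = ET.Element("user")
for key, value in user.items():
    ET.SubElement(as_children, key).text = str(value)

attr_xml = ET.tostring(as_attributes, encoding="unicode")
child_xml = ET.tostring(as_children, encoding="unicode")

print(attr_xml)
print(child_xml)
```

Neither encoding is wrong, which is exactly why someone has to pick one.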

lmxml stands for Language Model XML and is an opinionated way to produce XML-like text from JSON-ish data.

Yes, I know that if you expand the acronym, you technically get
Language Model eXtensible Markup Language.
I figure that “XML” is a proper noun these days (like YAML), so I’m willing to pay this silly price for a cute name.

Core design principles

- Indentation is mandatory and deterministic
- No attributes, except one permitted attribute: index on <item> inside <list>
- Everything is a tag; most leaves are single-line: <tag>value</tag>
- Multiline strings are supported, but not indented:
  - opening and closing tags are indented
  - inner lines are raw (no leading spaces)
- Collections: only lists are first-class (tuples and sets are silently converted to lists)
- Top-level primitives are emitted as raw values (no wrapping <None> tag)
- Primitives are serialized using Python str() semantics (True, False, 42, 3.14)

The goal is not expressiveness — it’s predictability.

Usage

As simple as this (which is the whole API surface btw):

import lmxml

data = {
    "user": {
        "id": 42,
        "name": "Ada",
        "bio": "Researcher\nLoves coffee"
    },
    "tags": ["ml", "nlp"]
}

print(lmxml.dumps(data))

That snippet prints:

<user>
  <id>42</id>
  <name>Ada</name>
  <bio>
Researcher
Loves coffee
  </bio>
</user>
<tags>
  <list>
    <item index="0">ml</item>
    <item index="1">nlp</item>
  </list>
</tags>
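To make the design principles concrete, here is a minimal re-implementation sketch that reproduces the output above. It is not lmxml's actual code: sketch_dumps and its helpers are hypothetical names, and the HTML escaping of leaf values and the handling of dictionaries inside list items are assumptions on my part.

```python
from html import escape

INDENT = "  "  # two spaces per level, matching the output shown above


def _element(tag, value, level, attrs=""):
    """Render one tag at the given nesting level."""
    pad = INDENT * level
    open_tag = f"{pad}<{tag}{attrs}>"
    close_tag = f"{pad}</{tag}>"
    if isinstance(value, dict):
        body = [_element(str(k), v, level + 1) for k, v in value.items()]
        return "\n".join([open_tag, *body, close_tag])
    if isinstance(value, (list, tuple, set)):
        # Tuples and sets are treated like lists.
        return "\n".join([open_tag, _list(value, level + 1), close_tag])
    # Assumption: leaves go through str() and are then HTML-escaped.
    text = escape(str(value), quote=False)
    if "\n" in text:
        # Multi-line: tags are indented, inner lines are raw.
        return "\n".join([open_tag, text, close_tag])
    return f"{open_tag}{text}</{tag}>"


def _list(items, level):
    """Render a <list> whose <item> children carry the only allowed attribute."""
    pad = INDENT * level
    body = [
        _element("item", item, level + 1, f' index="{i}"')
        for i, item in enumerate(items)
    ]
    return "\n".join([f"{pad}<list>", *body, f"{pad}</list>"])


def sketch_dumps(value):
    """Hypothetical stand-in for lmxml.dumps, for illustration only."""
    if isinstance(value, dict):
        return "\n".join(_element(str(k), v, 0) for k, v in value.items())
    if isinstance(value, (list, tuple, set)):
        return _list(value, 0)
    return str(value)  # top-level primitives are emitted raw


data = {
    "user": {"id": 42, "name": "Ada", "bio": "Researcher\nLoves coffee"},
    "tags": ["ml", "nlp"],
}
output = sketch_dumps(data)
print(output)
```

The whole format fits in three small functions, which is the point: there is nothing to configure.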

Pydantic support

If pydantic is importable, you can pass any BaseModel instance to lmxml.dumps. The following invariant holds:

x: pydantic.BaseModel
lmxml.dumps(x) == lmxml.dumps(x.model_dump(mode="json"))

Pydantic is not a dependency, not even an optional one. I just detect whether it is installed and, if so, add that tweak (including typing).
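This kind of detection is commonly done with a guarded import. The sketch below shows the general pattern, not lmxml's internals; to_jsonish is a hypothetical helper name, and model_dump(mode="json") is the Pydantic v2 call used in the invariant above.

```python
try:
    # Only used if the user already has pydantic installed.
    from pydantic import BaseModel
except ImportError:
    BaseModel = None  # pydantic is absent; plain data still works


def to_jsonish(obj):
    """Hypothetical helper: turn a BaseModel into JSON-like data, pass anything else through."""
    if BaseModel is not None and isinstance(obj, BaseModel):
        return obj.model_dump(mode="json")
    return obj


plain = to_jsonish({"name": "Ada", "tags": ["ml", "nlp"]})
print(plain)
```

With this pattern, a serializer can normalize its input first and never needs to list pydantic in its package metadata.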

Where's the deserializer?

There isn’t one.

This is not a data transport format. It’s a way to take structured concepts and feed them to a model reliably, while preserving both human readability and structural cues.

You can run the output through a standard XML parser, but you won’t get the original structure back out-of-the-box. That’s intentional.

Unless you serialized a primitive (which is emitted as raw str()), parsing should always succeed — as in no exceptions should be raised.
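Here is a hedged sketch of reading such output back with the standard library's xml.etree.ElementTree. One caveat I am sure of: a strict parser needs a single root element, so output with several top-level tags (like the Usage example) has to be wrapped first. The <root> wrapper name is my own choice, not something lmxml emits.

```python
import xml.etree.ElementTree as ET

# Text in the shape shown in the Usage section (two top-level tags).
lmxml_text = """<user>
  <id>42</id>
  <bio>
Researcher
Loves coffee
  </bio>
</user>
<tags>
  <list>
    <item index="0">ml</item>
    <item index="1">nlp</item>
  </list>
</tags>"""

# Wrap in a synthetic root so the text forms one well-formed document.
root = ET.fromstring(f"<root>{lmxml_text}</root>")

user_id = root.find("user/id").text   # a string, not the original int
bio = root.find("user/bio").text      # includes the surrounding whitespace
tags = [item.text for item in root.iter("item")]

print(repr(user_id), repr(bio), tags)
```

Types and exact whitespace are lost along the way, which matches the "no round-tripping" stance: parsing works, reconstruction is up to you.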

There are some minor gotchas (HTML escaping, no CDATA for multi-line strings, XML character restrictions), but you’re unlikely to hit them in normal prompting scenarios. If you do, open an issue — we’ll figure out whether it’s a bug or a feature.

lmxml is intentionally boring.

If you need schemas, validation, round-tripping, or extensibility — use something else.

If you want prompts that are easy to read, hard to break, and easy for models to follow — lmxml is for you.

Disclaimer for the AI age

I admit, this has been vibe-slapped together. ChatGPT can do a surprisingly good job of writing code, although I worked by chatting with it and copying its snippets. Anyway, even though most of the code has been conjured via LLM magic, it has all been reviewed by yours truly (not that there was a lot to review).

Project details

Release history

This version

0.1.0

Feb 6, 2026

Download files

Source Distribution

lmxml-0.1.0.tar.gz (7.7 kB)

Uploaded Feb 6, 2026 Source

Built Distribution

lmxml-0.1.0-py3-none-any.whl (6.0 kB)

Uploaded Feb 6, 2026 Python 3

File details

Details for the file lmxml-0.1.0.tar.gz.

File metadata

Download URL: lmxml-0.1.0.tar.gz
Upload date: Feb 6, 2026
Size: 7.7 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lmxml-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`4babbd5b0d019409fb0573b8099d7bf9dd0ccac343b8f82700ca2161d798db84`
MD5	`c8d457499c246522f6874ff0ac64b6d1`
BLAKE2b-256	`7c04a4a7c8477e2afa4a00b7811985dcdcf01b7d66003490cc356cc5f550e117`

Provenance

The following attestation bundles were made for lmxml-0.1.0.tar.gz:

Publisher: on_release.yml on FilipMalczak/lmxml

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lmxml-0.1.0.tar.gz
- Subject digest: 4babbd5b0d019409fb0573b8099d7bf9dd0ccac343b8f82700ca2161d798db84
- Sigstore transparency entry: 924517571
- Sigstore integration time: Feb 6, 2026
Source repository:
- Permalink: FilipMalczak/lmxml@8cb00c0456a9c24092e33b29469a8be5c470a061
- Branch / Tag: refs/tags/0.1.0
- Owner: https://github.com/FilipMalczak
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: on_release.yml@8cb00c0456a9c24092e33b29469a8be5c470a061
- Trigger Event: release

File details

Details for the file lmxml-0.1.0-py3-none-any.whl.

File metadata

Download URL: lmxml-0.1.0-py3-none-any.whl
Upload date: Feb 6, 2026
Size: 6.0 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for lmxml-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`818726edfa260b47898821eb6d19480144b5ec29ad26e91592396de0e23e4a8e`
MD5	`0dc1662efdb1575b7d77e2eda88bcdec`
BLAKE2b-256	`edc12ce169b71a9290c26050b89ad117a265bde5bd6f0773b272ddbc9e370040`

Provenance

The following attestation bundles were made for lmxml-0.1.0-py3-none-any.whl:

Publisher: on_release.yml on FilipMalczak/lmxml

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: lmxml-0.1.0-py3-none-any.whl
- Subject digest: 818726edfa260b47898821eb6d19480144b5ec29ad26e91592396de0e23e4a8e
- Sigstore transparency entry: 924517578
- Sigstore integration time: Feb 6, 2026
Source repository:
- Permalink: FilipMalczak/lmxml@8cb00c0456a9c24092e33b29469a8be5c470a061
- Branch / Tag: refs/tags/0.1.0
- Owner: https://github.com/FilipMalczak
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: on_release.yml@8cb00c0456a9c24092e33b29469a8be5c470a061
- Trigger Event: release
