LaTeX <-> JSON parser/generator for structured math documents

The library transforms raw LaTeX files into a structured JSON format based on logical blocks. This allows LLMs to process mathematical content without being overwhelmed by LaTeX formatting commands.

JSON Structure Produced

The parser outputs a consistent hierarchical structure:

env = "other": Captures plain text or LaTeX commands located outside any \begin...\end environment.

env = "theorem", "definition", "equation", etc.: Captures the content of the named LaTeX environment.

Preamble Configuration 🛠️

The generator can prepend a LaTeX preamble so that the output is a stand-alone, compilable document. By default, it looks for a file named preambule.tex.

Example preambule.tex

Create a preambule.tex in your working directory; a complete example is given in the "LaTeX preamble (preambule.tex)" section.

Note: Including \begin{document} in your preamble is required if you want the generated output to be directly compilable.

Usage (Python API) 🐍

  1. Parse a LaTeX file to JSON
  2. Parse raw LaTeX text directly
  3. Generate LaTeX from a JSON file
  4. Generate LaTeX from an in-memory JSON dictionary

Command Line Interface (CLI) 💻

If enabled in your pyproject.toml, you can use the library directly from your terminal:

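Convert LaTeX to JSON:

laguchori-latex parse course.tex -o extracted_elements.json

Convert JSON to LaTeX:

laguchori-latex generate extracted_elements.json -o output.tex --preamble preambule.tex

Notes & Current Limitations ⚠️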

Scope: The parser currently targets standard environments of the form \begin{ENV} ... \end{ENV}.

Metadata: Environment options (e.g., \begin{theorem}[Optional title]) are currently kept within the content block and not yet extracted as separate metadata.

Applications: This library is ideal for:

    RAG / AI Search: Indexing mathematical proofs by semantic chunks.

    Educational Platforms: Segmenting long courses into manageable units.

    Automated Publishing: Programmatically generating LaTeX documents from structured data.
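For the RAG / AI search use case, the parsed structure can be flattened into one chunk per block. The following is a minimal sketch, assuming the dict layout shown in the "JSON format" section; the function name `to_chunks` and the chunk fields (`id`, `section`, `env`, `text`) are hypothetical and not part of the library.

```python
def to_chunks(data: dict, source: str) -> list[dict]:
    """Flatten a parsed document into one indexable chunk per block.

    Hypothetical helper: the chunk schema is an illustration, not a library API.
    """
    chunks = []
    for s_index, section in enumerate(data["document"]["sections"]):
        # Walk blocks in their recorded order so chunk ids follow the document.
        for block in sorted(section.get("blocks", []), key=lambda b: b["order"]):
            chunks.append({
                "id": f"{source}:{s_index}:{block['order']}",
                "section": section["title"],
                "env": block["env"],
                "text": block["content"],
            })
    return chunks
```

Keeping `env` on every chunk lets an index filter by block type, for example retrieving only theorem blocks.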

What this library does

Parsing (LaTeX → JSON)

Reads LaTeX content (from a file or a string)

Removes some commands that are not needed for structure extraction (e.g., \label, \cite, \ref, \eqref)

Extracts the content between \begin{document} and \end{document}

Splits the document into sections at each \section{...}

Extracts LaTeX environments inside each section:

\begin{theorem}...\end{theorem}

\begin{definition}...\end{definition}

etc.

Anything that is not inside an environment is stored as a block with env="other".
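The parsing steps above can be sketched in plain Python. This is a simplified illustration built on regular expressions, not the library's actual implementation: the name `sketch_parse` and both patterns are assumptions, nested environments are not handled, and text before the first \section is dropped.

```python
import re

# Commands dropped before structure extraction (the list given above).
STRIP_COMMANDS = re.compile(r"\\(?:label|cite|ref|eqref)\{[^{}]*\}")
# A non-nested environment: \begin{ENV} ... \end{ENV}
ENVIRONMENT = re.compile(r"\\begin\{(\w+\*?)\}(.*?)\\end\{\1\}", re.DOTALL)


def sketch_parse(latex: str) -> dict:
    """Hypothetical re-implementation of the parsing steps, for illustration."""
    text = STRIP_COMMANDS.sub("", latex)

    # Keep only the document body when \begin{document} is present.
    body = re.search(r"\\begin\{document\}(.*?)\\end\{document\}", text, re.DOTALL)
    text = body.group(1) if body else text

    # With one capture group, re.split yields [before, title1, body1, title2, ...].
    parts = re.split(r"\\section\{([^{}]*)\}", text)
    sections = []
    for title, content in zip(parts[1::2], parts[2::2]):
        blocks, cursor = [], 0
        for match in ENVIRONMENT.finditer(content):
            outside = content[cursor:match.start()].strip()
            if outside:
                blocks.append({"env": "other", "content": outside})
            blocks.append({"env": match.group(1), "content": match.group(2).strip()})
            cursor = match.end()
        tail = content[cursor:].strip()
        if tail:
            blocks.append({"env": "other", "content": tail})
        # Number the blocks so the original sequence can be restored later.
        for order, block in enumerate(blocks):
            block["order"] = order
        sections.append({"title": title, "content": content.strip(), "blocks": blocks})
    return {"document": {"sections": sections}}
```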

Generation (JSON → LaTeX)

Loads JSON (file or JSON string) or accepts a Python dict

Generates LaTeX sections and blocks in their original order (see the order field under "Blocks")

Optionally prepends a LaTeX preamble from preambule.tex

Appends \end{document} at the end
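The generation steps above can be sketched as follows. This is an illustration only, not the library's code: the name `sketch_generate` is hypothetical, and sorting blocks by their `order` value is an assumption about how the original sequence is restored.

```python
def sketch_generate(data: dict, preamble: str = "") -> str:
    """Hypothetical re-implementation of the generation steps, for illustration."""
    lines = []
    if preamble:
        # The preamble is expected to end with \begin{document}.
        lines.append(preamble.rstrip())
    for section in data["document"]["sections"]:
        lines.append(f"\\section{{{section['title']}}}")
        # Assumption: blocks are emitted in ascending "order".
        for block in sorted(section.get("blocks", []), key=lambda b: b.get("order", 0)):
            if block["env"] == "other":
                # Free text is written back verbatim.
                lines.append(block["content"])
            else:
                lines.append(f"\\begin{{{block['env']}}}")
                lines.append(block["content"])
                lines.append(f"\\end{{{block['env']}}}")
    lines.append("\\end{document}")
    return "\n".join(lines) + "\n"
```

Without a preamble, the result still ends with \end{document} but has no matching \begin{document}, which is why the preamble matters for a compilable output.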

JSON format

The parser returns a structure like:

{
  "document": {
    "sections": [
      {
        "title": "Intro",
        "content": "...",
        "blocks": [
          { "env": "other", "content": "Some text", "order": 0 },
          { "env": "theorem", "content": "Theorem text", "order": 1 }
        ]
      }
    ]
  }
}

Blocks

env = "other": LaTeX text outside any \begin...\end... environment.

env = "theorem", "definition", "remark", etc.: content captured from that environment.

order: used to preserve the original order of blocks in a section.
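When a JSON file is edited by hand or produced by another tool, a quick structural check before generation can catch malformed blocks. The sketch below assumes that every block carries the three keys shown above and that `order` values are unique within a section; the second point is implied rather than stated here, and `check_blocks` is a hypothetical helper, not a library function.

```python
def check_blocks(data: dict) -> list[str]:
    """Return human-readable problems; an empty list means the structure looks fine."""
    problems = []
    for section in data.get("document", {}).get("sections", []):
        title = section.get("title", "<untitled>")
        seen = set()
        for block in section.get("blocks", []):
            # Every block is expected to have env, content and order.
            missing = {"env", "content", "order"} - block.keys()
            if missing:
                problems.append(f"{title}: block missing {sorted(missing)}")
                continue
            # Assumption: two blocks in one section never share an order value.
            if block["order"] in seen:
                problems.append(f"{title}: duplicate order {block['order']}")
            seen.add(block["order"])
    return problems
```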

LaTeX preamble (preambule.tex)

The generator can prepend a preamble at the beginning of the generated .tex.

Default path: preambule.tex

You can change it via LatexGenerator(preamble_path="...")

Example preambule.tex

Create a file called preambule.tex next to your scripts:

\documentclass[11pt,a4paper]{article}
\usepackage[utf8]{inputenc}
\usepackage[T1]{fontenc}
\usepackage{lmodern}
\usepackage{amsmath,amssymb,amsthm}
\begin{document}

Important: include \begin{document} in your preamble if you want the output .tex to be directly compilable.
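A small pre-flight check can confirm that the preamble file actually opens the document before you generate anything. This is a minimal sketch: `preamble_opens_document` is a hypothetical helper, and its comment stripping is naive (it cuts each line at the first %, so an escaped \% earlier on the same line would be misread).

```python
from pathlib import Path


def preamble_opens_document(path: str = "preambule.tex") -> bool:
    """Return True if the file exists and contains an active \\begin{document}."""
    file = Path(path)
    if not file.is_file():
        return False
    text = file.read_text(encoding="utf-8")
    # Drop everything after % so a commented-out \begin{document} does not count.
    active = [line.split("%", 1)[0] for line in text.splitlines()]
    return any("\\begin{document}" in line for line in active)
```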

Python API

LatexParser

Import:

from laguchori_latex import LatexParser

Methods

parse_file(path: str) -> dict
Parse a .tex file and return the structured data as a dict.

parse_text(text: str) -> dict
Parse LaTeX content provided as a string.

save_json(data: dict, path: str) -> None (static)
Save the parsed dict to a JSON file.

Example: parse a file

from laguchori_latex import LatexParser

parser = LatexParser()
data = parser.parse_file("course.tex")

LatexParser.save_json(data, "extracted_elements.json")

Example: parse LaTeX text

from laguchori_latex import LatexParser

latex_text = r"""
\begin{document}
\section{Intro}
Hello
\begin{theorem}
A theorem text.
\end{theorem}
\end{document}
"""
parser = LatexParser()
data = parser.parse_text(latex_text)
print(data)

LatexGenerator

Import:

from laguchori_latex import LatexGenerator

Constructor

LatexGenerator(preamble_path: str = "preambule.tex")
Loads a preamble from preamble_path if the file exists.

Methods

json_file_to_latex(json_path: str) -> str
Load JSON from a file and generate LaTeX.

json_text_to_latex(json_text: str) -> str
Load JSON from a string and generate LaTeX.

json_data_to_latex(data: dict) -> str
Generate LaTeX from an in-memory dict.

save_latex(latex_code: str, path: str) -> None (static)
Save LaTeX code into a .tex file.

Example: generate .tex from JSON file

from laguchori_latex import LatexGenerator

gen = LatexGenerator(preamble_path="preambule.tex")
latex_code = gen.json_file_to_latex("extracted_elements.json")

LatexGenerator.save_latex(latex_code, "output.tex")

Example: generate .tex from Python dict

from laguchori_latex import LatexGenerator

data = {
  "document": {
    "sections": [
      {
        "title": "Section 1",
        "blocks": [
          {"env": "other", "content": "Free text.", "order": 0},
          {"env": "theorem", "content": "Theorem content.", "order": 1}
        ]
      }
    ]
  }
}
gen = LatexGenerator(preamble_path="preambule.tex")
latex_code = gen.json_data_to_latex(data)
print(latex_code)

CLI

If you enabled the CLI entrypoint in pyproject.toml, you can use:

LaTeX → JSON

laguchori-latex parse course.tex -o extracted_elements.json

JSON → LaTeX (with preamble)

laguchori-latex generate extracted_elements.json -o output.tex --preamble preambule.tex
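For readers wiring a similar entry point themselves, the two commands above map naturally onto argparse subcommands. The sketch below only mirrors the arguments visible in those commands (`parse`, `generate`, `-o`, `--preamble`); it is not the library's actual CLI code, and the function name `build_cli`, the `dest` names, making `-o` required, and the `--preamble` default are all assumptions.

```python
import argparse


def build_cli() -> argparse.ArgumentParser:
    """Hypothetical argument parser mirroring the two commands shown above."""
    cli = argparse.ArgumentParser(prog="laguchori-latex")
    sub = cli.add_subparsers(dest="command", required=True)

    # laguchori-latex parse INPUT -o OUTPUT
    parse = sub.add_parser("parse", help="LaTeX -> JSON")
    parse.add_argument("input")
    parse.add_argument("-o", dest="output", required=True)

    # laguchori-latex generate INPUT -o OUTPUT --preamble PREAMBLE
    generate = sub.add_parser("generate", help="JSON -> LaTeX")
    generate.add_argument("input")
    generate.add_argument("-o", dest="output", required=True)
    generate.add_argument("--preamble", default="preambule.tex")
    return cli
```

A `main()` function would then dispatch on `args.command` to the parser or the generator.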

End-to-end example

Step 1 — Parse LaTeX into JSON

laguchori-latex parse course.tex -o extracted_elements.json

Step 2 — Regenerate a compilable .tex

laguchori-latex generate extracted_elements.json -o output.tex --preamble preambule.tex

Limitations

The parser extracts environments of the form:

\begin{ENV} ... \end{ENV}

Environment options like:

\begin{theorem}[Optional title]

are not extracted yet.
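Until options are extracted by the parser, they can be split off in a post-processing step. The sketch below assumes that the optional argument is left at the very start of the block's content, as the notes earlier on this page describe; `split_optional_title` and the added `title` key are hypothetical. A block whose text genuinely begins with a bracket (for example an interval such as [0,1]) would be misread, and nested brackets are not handled.

```python
import re

# A leading [ ... ] with no nested brackets, plus surrounding whitespace.
_OPTION = re.compile(r"^\s*\[([^\]]*)\]\s*")


def split_optional_title(block: dict) -> dict:
    """Return a copy of the block with a leading [...] moved into a "title" key."""
    if block["env"] == "other":
        # Free text has no environment options.
        return dict(block)
    match = _OPTION.match(block["content"])
    if not match:
        return dict(block)
    return {
        **block,
        "title": match.group(1).strip(),
        "content": block["content"][match.end():],
    }
```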

Only \section{...} is handled (no \subsection yet).

Development

Install dev dependencies and run the tests:

pip install -e ".[dev]"
pytest -q

License 📄

MIT
