🌌 GravitasML

Lightweight Markup Parsing for Python - Perfect for LLMs

A lightweight Python library for parsing custom markup languages, built and used by AutoGPT.


🤔 Why use GravitasML?

GravitasML is purpose-built for parsing simple markup structures, particularly LLM-generated outputs.

By design, it excludes XML features that can introduce security risks:

  • No DTD processing - Prevents billion laughs and quadratic blowup attacks
  • No external entities - Prevents XXE attacks
  • No entity expansion - Prevents decompression bombs
  • Simple and predictable - No namespaces, no attributes, just tags and content

Perfect for:

  • Parsing LLM outputs that use XML-style tags
  • Simple configuration formats
  • Data extraction from controlled markup
  • Any scenario where you need safe, simple markup parsing

🛡️ Security by Design

GravitasML is immune to common XML vulnerabilities because it simply doesn't implement the features that enable them:

| Attack Type                     | GravitasML                   |
|---------------------------------|------------------------------|
| Billion Laughs                  | Safe (no entity support)     |
| Quadratic Blowup                | Safe (no entity expansion)   |
| External Entity Expansion (XXE) | Safe (no external resources) |
| DTD Retrieval                   | Safe (no DTD support)        |
| Decompression Bomb              | Safe (no decompression)      |

Perfect for parsing LLM outputs and other scenarios where you need simple, secure markup processing.

✨ Features

GravitasML transforms custom markup into Python data structures:

  • Simple API - Parse markup to dictionaries with just a few lines of code
  • Pydantic Integration - Convert parsed data directly to Pydantic models for validation
  • Nested Structure Support - Handles nested tags, multiple roots, and repeated elements
  • Tag Normalization - Automatic whitespace handling and case conversion
  • Error Detection - Syntax error detection for unmatched or improperly nested tags
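The Error Detection feature boils down to checking that every closing tag matches the most recently opened one. Below is a minimal stack-based sketch of that check, assuming tags are plain `<name>` / `</name>` pairs with no attributes. `check_nesting` and `TAG_RE` are hypothetical names for illustration, not part of GravitasML's API, and the library's real exception type and messages may differ.

```python
import re

# Hypothetical pattern: matches <name> or </name>. Names may contain spaces,
# as in the README's "<First Name>" example. Not GravitasML's actual regex.
TAG_RE = re.compile(r"<(/?)([^<>/]+)>")


def check_nesting(markup: str) -> None:
    """Raise SyntaxError if tags are unmatched or improperly nested (sketch)."""
    stack: list[str] = []  # names of currently open tags, innermost last
    for match in TAG_RE.finditer(markup):
        is_close = match.group(1) == "/"
        name = match.group(2).strip()
        if not is_close:
            stack.append(name)
        elif not stack:
            raise SyntaxError(f"closing tag </{name}> has no opening tag")
        elif stack[-1] != name:
            raise SyntaxError(f"expected </{stack[-1]}>, found </{name}>")
        else:
            stack.pop()
    if stack:
        raise SyntaxError(f"unclosed tag <{stack[-1]}>")


check_nesting("<person><name>John</name></person>")  # well-formed: no error
```

Because the check only ever compares against the top of the stack, it runs in a single pass over the markup.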

📦 Installation

pip install gravitasml

Or with Poetry:

poetry add gravitasml

🚀 Quick Start

Basic Usage

from gravitasml.token import tokenize
from gravitasml.parser import Parser

# Parse simple markup
markup = "<name>GravitasML</name>"
tokens = tokenize(markup)
parser = Parser(tokens)
result = parser.parse()

print(result)  # {'name': 'GravitasML'}

Nested Structure Example

from gravitasml.token import tokenize
from gravitasml.parser import Parser

markup = """
<person>
    <name>John Doe</name>
    <contact>
        <email>john@example.com</email>
        <phone>555-0123</phone>
    </contact>
</person>
"""

tokens = tokenize(markup)
result = Parser(tokens).parse()

# Result: {
#     'person': {
#         'name': 'John Doe',
#         'contact': {
#             'email': 'john@example.com',
#             'phone': '555-0123'
#         }
#     }
# }

🎓 Advanced Usage

Pydantic Model Integration

Transform your markup directly into validated Pydantic models:

from pydantic import BaseModel
from gravitasml.token import tokenize
from gravitasml.parser import Parser

class Contact(BaseModel):
    email: str
    phone: str

class Person(BaseModel):
    name: str
    contact: Contact

markup = """
<person>
    <name>Jane Smith</name>
    <contact>
        <email>jane@example.com</email>
        <phone>555-9876</phone>
    </contact>
</person>
"""

tokens = tokenize(markup)
parser = Parser(tokens)
person = parser.parse_to_pydantic(Person)

print(person.name)  # Jane Smith
print(person.contact.email)  # jane@example.com

Handling Repeated Tags

GravitasML automatically converts repeated tags into lists:

from gravitasml.token import tokenize
from gravitasml.parser import Parser

markup = "<tag><a>value1</a><a>value2</a></tag>"
tokens = tokenize(markup)
result = Parser(tokens).parse()
# Result: {'tag': [{'a': 'value1'}, {'a': 'value2'}]}

# Multiple root tags with the same name also become a list
markup2 = "<tag>content1</tag><tag>content2</tag>"
tokens2 = tokenize(markup2)
result2 = Parser(tokens2).parse()
# Result: [{'tag': 'content1'}, {'tag': 'content2'}]
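Both results above are consistent with one simple rule: if every sibling tag name is unique, the siblings form a single dict; if any name repeats, they become a list of one-key dicts. The sketch below encodes that rule. It is an assumption inferred from the two documented examples, not GravitasML's source; `merge_children` is a hypothetical helper, and how the library treats a mix of unique and repeated names is not documented here.

```python
from __future__ import annotations

from typing import Any


def merge_children(children: list[tuple[str, Any]]) -> dict[str, Any] | list[dict[str, Any]]:
    """Combine (tag name, value) sibling pairs into Python data (assumed rule).

    Unique names   -> one dict, e.g. {'email': ..., 'phone': ...}
    Repeated names -> list of one-key dicts, e.g. [{'a': ...}, {'a': ...}]
    """
    names = [name for name, _ in children]
    if len(names) == len(set(names)):
        return dict(children)
    return [{name: value} for name, value in children]


# Children of <tag> in "<tag><a>value1</a><a>value2</a></tag>"
print({"tag": merge_children([("a", "value1"), ("a", "value2")])})
# Two root tags with the same name
print(merge_children([("tag", "content1"), ("tag", "content2")]))
```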

Tag Name Normalization

Tag names are automatically normalized - spaces become underscores and names are lowercased:

from gravitasml.token import tokenize
from gravitasml.parser import Parser

# Spaces in tag names are converted to underscores
markup = "<User Profile><First Name>Alice</First Name></User Profile>"
tokens = tokenize(markup)
result = Parser(tokens).parse()
# Result: {'user_profile': {'first_name': 'Alice'}}
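The normalization rule can be expressed as a one-line function. `normalize_tag` is a hypothetical helper, not GravitasML's implementation. Trimming surrounding whitespace is an assumption based on the "automatic whitespace handling" feature, and the library may treat runs of spaces differently.

```python
def normalize_tag(name: str) -> str:
    """Hypothetical: trim, turn each space into '_', then lowercase."""
    return name.strip().replace(" ", "_").lower()


# Rebuild the documented result for the example above.
result = {normalize_tag("User Profile"): {normalize_tag("First Name"): "Alice"}}
print(result)  # {'user_profile': {'first_name': 'Alice'}}
```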

🏗️ Architecture

GravitasML uses a two-stage parsing approach:

  1. Tokenization (gravitasml.token) - Converts raw markup into a stream of tokens
  2. Parsing (gravitasml.parser) - Builds a tree structure and converts to Python objects
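The two stages can be made concrete with a compact sketch: a regex tokenizer that emits open/close/text tokens, followed by a stack-based parser that builds nested dicts and reproduces the Quick Start results. Everything here (`Token`, `tokenize_sketch`, `parse_sketch`) is hypothetical and is not GravitasML's source. It also deliberately leaves out repeated-tag lists, so a repeated sibling name simply overwrites the earlier value.

```python
from __future__ import annotations

import re
from dataclasses import dataclass
from typing import Any

# Hypothetical tag pattern: <name> or </name>, no attributes or self-closing tags.
TAG_RE = re.compile(r"<(/?)([^<>/]+)>")


@dataclass
class Token:
    kind: str  # "open", "close" or "text"
    value: str


def tokenize_sketch(markup: str) -> list[Token]:
    """Stage 1: turn raw markup into a flat list of tokens."""
    tokens: list[Token] = []
    pos = 0
    for match in TAG_RE.finditer(markup):
        text = markup[pos : match.start()].strip()
        if text:
            tokens.append(Token("text", text))
        kind = "close" if match.group(1) else "open"
        # Assumed normalization, per the README: spaces -> "_", lowercase.
        name = match.group(2).strip().replace(" ", "_").lower()
        tokens.append(Token(kind, name))
        pos = match.end()
    tail = markup[pos:].strip()
    if tail:
        tokens.append(Token("text", tail))
    return tokens


def parse_sketch(tokens: list[Token]) -> dict[str, Any]:
    """Stage 2: build nested dicts with a stack of open elements."""
    # Each frame: (tag name, child (name, value) pairs, text pieces).
    # The bottom frame is a nameless sentinel that collects the root tags.
    stack: list[tuple[str, list[tuple[str, Any]], list[str]]] = [("", [], [])]
    for token in tokens:
        if token.kind == "open":
            stack.append((token.value, [], []))
        elif token.kind == "text":
            stack[-1][2].append(token.value)
        else:
            name, children, texts = stack.pop()
            if not stack or name != token.value:
                raise SyntaxError(f"unexpected closing tag </{token.value}>")
            # Elements with child tags become dicts; leaves become their text.
            # (Simplification: text mixed with child tags is dropped.)
            value = dict(children) if children else " ".join(texts)
            stack[-1][1].append((name, value))
    if len(stack) > 1:
        raise SyntaxError(f"unclosed tag <{stack[-1][0]}>")
    return dict(stack[0][1])


print(parse_sketch(tokenize_sketch("<name>GravitasML</name>")))  # {'name': 'GravitasML'}
```

Keeping tokenization separate means the parser only ever sees three token kinds, which is what reduces the nesting check to a comparison against the top of the stack.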

🧪 Testing

GravitasML comes with a test suite. To run the tests, execute the following command:

python -m unittest discover -v

📊 Dependencies

GravitasML has minimal dependencies:

  • Python 3.10, 3.11, or 3.12 (tested in CI)
  • Pydantic 2.x (for model validation features)
  • Black (development dependency for code formatting)
  • Pytest (development dependency)

🤝 Contributing

We welcome contributions! GravitasML uses:

  • Poetry for dependency management
  • Black for code formatting
  • GitHub Actions for CI/CD
  • unittest for testing

To contribute:

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Make your changes and add tests
  4. Ensure all tests pass and code is formatted with Black
  5. Commit your changes (git commit -m 'Add amazing feature')
  6. Push to the branch (git push origin feature/amazing-feature)
  7. Open a Pull Request

See our CI/CD workflow for the automated checks your PR must pass.

📝 Current Limitations

GravitasML is designed for simplicity. It currently does not support:

  • XML namespaces or schema validation
  • Tag attributes (e.g., <tag attr="value">)
  • Processing instructions or CDATA sections
  • Writing/generating markup (parsing only)
  • Streaming parsing for very large documents
  • Self-closing tags (e.g., <tag />)

These limitations are intentional to keep the library focused and easy to use. If you need these features, consider using Python's built-in xml.etree.ElementTree or third-party libraries like lxml.
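For comparison, here is the standard-library route that paragraph mentions. `xml.etree.ElementTree` handles both attributes and self-closing tags, which GravitasML does not support:

```python
import xml.etree.ElementTree as ET

# A self-closing tag with an attribute.
element = ET.fromstring('<tag attr="value" />')
print(element.tag, element.attrib)  # tag {'attr': 'value'}

# Attributes on repeated child elements; a self-closing element has no text.
root = ET.fromstring('<config><item key="a">1</item><item key="b" /></config>')
for item in root.iter("item"):
    print(item.get("key"), item.text)  # "a 1", then "b None"
```

The Python documentation warns that `xml.etree.ElementTree` is not secure against maliciously constructed data, so check its "XML vulnerabilities" section before using it on untrusted input.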

🎯 Philosophy

GravitasML is built on the principle that not every markup parsing task needs the complexity of full XML processing. Sometimes you just want to convert simple markup to Python dictionaries without the overhead of namespaces, DTDs, or complex validation rules.

📄 License

GravitasML is licensed under the MIT License - see the LICENSE file for details.

🙏 Acknowledgments

Built by the AutoGPT Team and used in the AutoGPT project.


Simple markup parsing for modern Python applications.


Download files

Source Distribution

gravitasml-0.1.4.tar.gz (7.4 kB)

Built Distribution

gravitasml-0.1.4-py3-none-any.whl (8.4 kB)

File details

Details for the file gravitasml-0.1.4.tar.gz.

File metadata

  • Download URL: gravitasml-0.1.4.tar.gz
  • Upload date:
  • Size: 7.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gravitasml-0.1.4.tar.gz
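| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 35d0d9fec7431817482d53d9c976e375557c3e041d1eb6928e809324a8c866e3 |
| MD5         | 8911a2e2451f3270f7723dda7fd21bb2                                 |
| BLAKE2b-256 | 031489ec16093615cb9b3f6902879140c8ae0895b8133726dfbd78f3fb55a9b5 |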
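To check a downloaded archive against the SHA256 digest for gravitasml-0.1.4.tar.gz, Python's standard `hashlib` module is enough. This is a minimal sketch that assumes the file was saved as `gravitasml-0.1.4.tar.gz` in the current directory. `sha256_of` is a hypothetical helper name.

```python
import hashlib
from pathlib import Path

# SHA256 digest published for gravitasml-0.1.4.tar.gz.
EXPECTED_SHA256 = "35d0d9fec7431817482d53d9c976e375557c3e041d1eb6928e809324a8c866e3"


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


# Usage (assumes the archive has been downloaded):
# assert sha256_of(Path("gravitasml-0.1.4.tar.gz")) == EXPECTED_SHA256
```

On Python 3.11 and later, `hashlib.file_digest(f, "sha256")` performs the same chunked read for you.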


Provenance

The following attestation bundles were made for gravitasml-0.1.4.tar.gz:

Publisher: cicd.yml on Significant-Gravitas/gravitasml

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gravitasml-0.1.4-py3-none-any.whl.

File metadata

  • Download URL: gravitasml-0.1.4-py3-none-any.whl
  • Upload date:
  • Size: 8.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for gravitasml-0.1.4-py3-none-any.whl
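| Algorithm   | Hash digest                                                      |
|-------------|------------------------------------------------------------------|
| SHA256      | 671a18b11d3d8a0e270c6a80c72cd058458b18d5ef7560d00010e962ab1bca74 |
| MD5         | 4b2a81b7501dbfca7be3b5bdcec998da                                 |
| BLAKE2b-256 | e1d64fdcac30962e243b7ec5793661ac589b95ca0295808b4a6e89a3aca99b1e |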


Provenance

The following attestation bundles were made for gravitasml-0.1.4-py3-none-any.whl:

Publisher: cicd.yml on Significant-Gravitas/gravitasml

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
