🌌 GravitasML
Lightweight Markup Parsing for Python - Perfect for LLMs
A lightweight Python library for parsing custom markup languages, built and used by AutoGPT
🤔 Why use GravitasML?
GravitasML is purpose-built for parsing simple markup structures, particularly LLM-generated outputs.
By design, it excludes XML features that can introduce security risks:
- No DTD processing - Prevents DTD retrieval and leaves no way to declare entities
- No external entities - Prevents XXE attacks
- No entity expansion - Prevents billion laughs and quadratic blowup attacks
- No decompression - Not exposed to decompression bombs
- Simple and predictable - No namespaces, no attributes, just tags and content
Perfect for:
- Parsing LLM outputs that contain XML-style tags
- Simple configuration formats
- Data extraction from controlled markup
- Any scenario where you need safe, simple markup parsing
🛡️ Security by Design
GravitasML is immune to common XML vulnerabilities because it simply doesn't implement the features that enable them:
| Attack Type | GravitasML |
|---|---|
| Billion Laughs | ✅ Safe (no entity support) |
| Quadratic Blowup | ✅ Safe (no entity expansion) |
| External Entity Expansion (XXE) | ✅ Safe (no external resources) |
| DTD Retrieval | ✅ Safe (no DTD support) |
| Decompression Bomb | ✅ Safe (no decompression) |
Perfect for parsing LLM outputs and other scenarios where you need simple, secure markup processing.
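To make the first rows of the table concrete, the sketch below works out how large a classic "billion laughs" payload would become in a parser that does expand entities. The fan-out, level count, and base string are the values commonly used to illustrate the attack and are assumptions here, not anything taken from GravitasML; `expanded_size` is a hypothetical helper. A parser with no entity support never performs this expansion.

```python
# Illustrative arithmetic only: no XML is parsed and no entities are expanded.
# Assumed payload shape: each entity level references the previous level
# `fanout` times, starting from a short base string.

def expanded_size(base: str, fanout: int, levels: int) -> int:
    """Return the byte length of the fully expanded top-level entity."""
    return len(base.encode("utf-8")) * fanout ** levels


size = expanded_size("lol", fanout=10, levels=9)
print(f"{size:,} bytes")  # 3,000,000,000 bytes
```

A document of well under a kilobyte can therefore demand gigabytes of memory from an entity-expanding parser, which is the cost GravitasML avoids by leaving the feature out.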
✨ Features
GravitasML transforms custom markup into Python data structures:
- Simple API - Parse markup to dictionaries with just a few lines of code
- Pydantic Integration - Convert parsed data directly to Pydantic models for validation
- Nested Structure Support - Handles nested tags, multiple roots, and repeated elements
- Tag Normalization - Automatic whitespace handling and case conversion
- Error Detection - Syntax error detection for unmatched or improperly nested tags
📦 Installation
pip install gravitasml
Or with Poetry:
poetry add gravitasml
🚀 Quick Start
Basic Usage
from gravitasml.token import tokenize
from gravitasml.parser import Parser
# Parse simple markup
markup = "<name>GravitasML</name>"
tokens = tokenize(markup)
parser = Parser(tokens)
result = parser.parse()
print(result) # {'name': 'GravitasML'}
Nested Structure Example
from gravitasml.token import tokenize
from gravitasml.parser import Parser
markup = """
<person>
    <name>John Doe</name>
    <contact>
        <email>john@example.com</email>
        <phone>555-0123</phone>
    </contact>
</person>
"""
tokens = tokenize(markup)
result = Parser(tokens).parse()
# Result: {
#     'person': {
#         'name': 'John Doe',
#         'contact': {
#             'email': 'john@example.com',
#             'phone': '555-0123'
#         }
#     }
# }
🎓 Advanced Usage
Pydantic Model Integration
Transform your markup directly into validated Pydantic models:
from pydantic import BaseModel
from gravitasml.token import tokenize
from gravitasml.parser import Parser
class Contact(BaseModel):
    email: str
    phone: str

class Person(BaseModel):
    name: str
    contact: Contact
markup = """
<person>
    <name>Jane Smith</name>
    <contact>
        <email>jane@example.com</email>
        <phone>555-9876</phone>
    </contact>
</person>
"""
tokens = tokenize(markup)
parser = Parser(tokens)
person = parser.parse_to_pydantic(Person)
print(person.name) # Jane Smith
print(person.contact.email) # jane@example.com
Handling Repeated Tags
GravitasML automatically converts repeated tags into lists:
from gravitasml.token import tokenize
from gravitasml.parser import Parser
markup = "<tag><a>value1</a><a>value2</a></tag>"
tokens = tokenize(markup)
result = Parser(tokens).parse()
# Result: {'tag': [{'a': 'value1'}, {'a': 'value2'}]}
# Multiple root tags with the same name also become a list
markup2 = "<tag>content1</tag><tag>content2</tag>"
tokens2 = tokenize(markup2)
result2 = Parser(tokens2).parse()
# Result: [{'tag': 'content1'}, {'tag': 'content2'}]
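Because the same tag can produce either a single value or a list depending on how often it occurs, consuming code may want to normalize the shape before iterating. The sketch below shows one way to do that. `as_list` is a hypothetical helper, not part of GravitasML, and the `single` input is the shape one would expect for a lone child by analogy with the nested example earlier, not a result confirmed by this README.

```python
from typing import Any


def as_list(value: Any) -> list:
    """Wrap a parsed value so callers can always iterate over it."""
    if value is None:
        return []
    if isinstance(value, list):
        return value
    return [value]


# Hand-written result shapes; no parsing happens in this sketch.
repeated = {"tag": [{"a": "value1"}, {"a": "value2"}]}  # copied from the example above
single = {"tag": {"a": "value1"}}                       # assumed single-child shape

for result in (repeated, single):
    values = [item["a"] for item in as_list(result["tag"])]
    print(values)
```

This keeps downstream code free of `isinstance` checks when an LLM sometimes emits one item and sometimes several.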
Tag Name Normalization
Tag names are automatically normalized - spaces become underscores and names are lowercased:
from gravitasml.token import tokenize
from gravitasml.parser import Parser
# Spaces in tag names are converted to underscores
markup = "<User Profile><First Name>Alice</First Name></User Profile>"
tokens = tokenize(markup)
result = Parser(tokens).parse()
# Result: {'user_profile': {'first_name': 'Alice'}}
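The normalization rule described above can be approximated in a few lines. This is a minimal sketch under stated assumptions, not the library's implementation: `normalize_tag_name` is a hypothetical name, and collapsing runs of whitespace into a single underscore is a choice made here that the README does not confirm.

```python
def normalize_tag_name(raw: str) -> str:
    """Lowercase a tag name and join its words with underscores."""
    # str.split() with no arguments also trims the ends and collapses
    # repeated whitespace, which is an assumption of this sketch.
    return "_".join(raw.lower().split())


print(normalize_tag_name("User Profile"))  # user_profile
print(normalize_tag_name("First Name"))    # first_name
```

Applying the same function to the keys you expect makes lookups in the parsed dictionary independent of how the tag was capitalized in the source markup.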
🏗️ Architecture
GravitasML uses a two-stage parsing approach:
- Tokenization (gravitasml.token) - Converts raw markup into a stream of tokens
- Parsing (gravitasml.parser) - Builds a tree structure and converts it to Python objects
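The two stages can be illustrated with a deliberately small, self-contained sketch. It is not GravitasML's code: `TOKEN_RE`, `tokenize_sketch`, and `parse_sketch` are hypothetical names, raising `SyntaxError` is a choice made for this example, and repeated tags, mixed text-and-tag content, and space-to-underscore normalization are left out to keep it short.

```python
import re
from typing import Any, Iterable, Iterator

Token = tuple[str, str]  # (kind, value); kind is "open", "close", or "text"

# Hypothetical pattern: a tag like <name> or </name>, or a run of plain text.
TOKEN_RE = re.compile(r"<(/?)([^<>]+)>|([^<>]+)")


def tokenize_sketch(markup: str) -> Iterator[Token]:
    """Stage 1: turn raw markup into a flat stream of tokens."""
    for match in TOKEN_RE.finditer(markup):
        closing, name, text = match.groups()
        if name is not None:
            yield ("close" if closing else "open", name.strip().lower())
        elif text.strip():  # skip whitespace-only text between tags
            yield ("text", text.strip())


def parse_sketch(tokens: Iterable[Token]) -> dict[str, Any]:
    """Stage 2: build nested dictionaries, tracking open tags on a stack."""
    root: dict[str, Any] = {}
    # Each stack frame is (tag name, child dict, text fragments).
    stack: list[tuple[str, dict[str, Any], list[str]]] = [("", root, [])]
    for kind, value in tokens:
        if kind == "open":
            stack.append((value, {}, []))
        elif kind == "text":
            stack[-1][2].append(value)
        else:  # "close"
            if len(stack) == 1 or stack[-1][0] != value:
                raise SyntaxError(f"unexpected closing tag </{value}>")
            name, children, texts = stack.pop()
            # A tag with child tags becomes a dict; otherwise it keeps its text.
            stack[-1][1][name] = children if children else " ".join(texts)
    if len(stack) != 1:
        raise SyntaxError(f"unclosed tag <{stack[-1][0]}>")
    return root


print(parse_sketch(tokenize_sketch("<person><name>John Doe</name></person>")))
# {'person': {'name': 'John Doe'}}
```

Separating the stages keeps each one simple: the tokenizer only needs to recognize tag boundaries, and the parser only needs to check that every closing tag matches the most recently opened one.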
🧪 Testing
GravitasML comes with a test suite. To run the tests, execute the following command:
python -m unittest discover -v
📊 Dependencies
GravitasML has minimal dependencies:
- Python 3.10, 3.11, or 3.12 (tested in CI)
- Pydantic 2.x (for model validation features)
- Black (development dependency for code formatting)
- Pytest (development dependency)
🤝 Contributing
We welcome contributions! GravitasML uses:
- Poetry for dependency management
- Black for code formatting
- GitHub Actions for CI/CD
- unittest for testing
To contribute:
- Fork the repository
- Create your feature branch (git checkout -b feature/amazing-feature)
- Make your changes and add tests
- Ensure all tests pass and code is formatted with Black
- Commit your changes (git commit -m 'Add amazing feature')
- Push to the branch (git push origin feature/amazing-feature)
- Open a Pull Request
See our CI/CD workflow for the automated checks your PR must pass.
📝 Current Limitations
GravitasML is designed for simplicity. It currently does not support:
- XML namespaces or schema validation
- Tag attributes (e.g., <tag attr="value">)
- Processing instructions or CDATA sections
- Writing/generating markup (parsing only)
- Streaming parsing for very large documents
- Self-closing tags (e.g., <tag />)
These limitations are intentional to keep the library focused and easy to use. If you need these features, consider using Python's built-in xml.etree.ElementTree or third-party libraries like lxml.
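If switching parsers is more than the task needs, some of these gaps can be worked around by preprocessing the text. As one example, the sketch below rewrites self-closing tags into explicit open/close pairs before the markup is handed to a parser. `SELF_CLOSING_RE` and `expand_self_closing` are hypothetical names, and whether GravitasML accepts the resulting empty element is an assumption this README does not confirm, so test it against your own inputs.

```python
import re

# Matches <name/> or <name />. The tag body may not contain "/", "<", or ">",
# so closing tags such as </name> are left alone.
SELF_CLOSING_RE = re.compile(r"<\s*([^<>/]+?)\s*/>")


def expand_self_closing(markup: str) -> str:
    """Rewrite <tag /> as <tag></tag>, leaving all other markup untouched."""
    return SELF_CLOSING_RE.sub(r"<\1></\1>", markup)


print(expand_self_closing("<item><done /></item>"))
# <item><done></done></item>
```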
🎯 Philosophy
GravitasML is built on the principle that not every markup parsing task needs the complexity of full XML processing. Sometimes you just want to convert simple markup to Python dictionaries without the overhead of namespaces, DTDs, or complex validation rules.
📄 License
GravitasML is licensed under the MIT License - see the LICENSE file for details.
🙏 Acknowledgments
Built by the AutoGPT Team and used in the AutoGPT project.
Simple markup parsing for modern Python applications.