General-purpose web data processing library with IRI handling and MicroXML/XML processing

Maintainers

uche

Project description

Amara is a general-purpose web data processing library with IRI handling and MicroXML/XML processing.

Features

- IRI (Internationalized Resource Identifier) processing: complete implementation for handling IRIs, including percent encoding/decoding, joining, splitting, and validation
- MicroXML/XML parsing and processing: simplified XML data model based on MicroXML, with support for full XML 1.0
- HTML5 parsing: parse HTML5 documents with html5lib-modern
- XPath-like queries: MicroXPath support for querying XML documents
- Command-line tool: microx, for rapid XML/MicroXML processing and extraction

Installation

Requires Python 3.12 or later.

pip install amara

Or with uv (recommended):

uv pip install amara

You can also install directly from the latest source version:

git clone https://github.com/OoriData/Amara.git
cd Amara
pip install -U .

Quick Start

IRI Processing

from amara.iri import I, iri

# Create and manipulate IRIs
url = I('http://example.org/path/to/resource')
print(url.scheme)  # 'http'
print(url.host)    # 'example.org'

# Join relative paths with base URLs
joined = iri.join('http://example.org/a/b', '../c')
print(joined)  # 'http://example.org/a/c'

# Percent encoding/decoding
encoded = iri.percent_encode('hello world!')
print(encoded)  # 'hello%20world%21'
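
The join output shown above suggests the base is treated like a directory: `b` is kept and `..` steps up from it. For comparison, here is a minimal sketch using only Python's standard library `urllib.parse` (not Amara's API), which follows strict RFC 3986 reference resolution. There, a trailing slash on the base decides whether the last segment is kept. The variable names are illustrative.

```python
from urllib.parse import quote, unquote, urljoin

# Strict RFC 3986 resolution: without a trailing slash, the last
# segment of the base ("b") is dropped before "../c" is applied.
no_slash = urljoin('http://example.org/a/b', '../c')
print(no_slash)    # 'http://example.org/c'

# With a trailing slash, "b" counts as a directory, so ".." only steps out of it.
with_slash = urljoin('http://example.org/a/b/', '../c')
print(with_slash)  # 'http://example.org/a/c'

# Percent-encode everything outside the unreserved set (safe='' also encodes '/').
encoded = quote('hello world!', safe='')
print(encoded)     # 'hello%20world%21'

# Decoding restores the original text.
decoded = unquote(encoded)
print(decoded)     # 'hello world!'
```

If you move between the two libraries, it is worth checking which joining convention a given call follows before relying on relative references.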

XML Processing

from amara.uxml import parse

SAMPLE_XML = '''<monty>
  <python spam="eggs">What do you mean "bleh"</python>
  <python ministry="abuse">But I was looking for argument</python>
</monty>'''

# Parse XML
root = parse(SAMPLE_XML)
print(root.xml_name)  # "monty"

# Access children and attributes
for child in root.xml_children:
    if hasattr(child, 'xml_attributes'):
        print(f'Element: {child.xml_name}')
        print(f'Spam attr: {child.xml_attributes.get("spam")}')
        print(f'Text: {child.xml_value}')

# Iterate through all elements
for elem in root.xml_descendants():
    print(f'Found element: {elem.xml_name}')

"MicroXML?" What's that?

MicroXML is a W3C Community Group project and specification. Many XML veterans, including Uche, Amara's founder, had become fed up with the unnecessary complexity of the XML stack, including XML Namespaces, which imposes a huge technical cost in order to solve an overstated problem. Amara implements the MicroXML data model, and lets you parse into it from both traditional XML and the MicroXML serialization.

In reality, most of the XML-like data you’ll be dealing with is full XML 1.0, so the Amara package provides capabilities to parse legacy XML and reduce it to MicroXML. In many cases the biggest implication of this is that namespace information is stripped. You can get very far by just ignoring this, and it opens up the much simpler processing encouraged by MicroXML.
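
To make "namespace information is stripped" concrete, the following sketch uses the standard library's `xml.etree.ElementTree`, not Amara, to reduce namespaced element names to their local parts. The sample document, the example.org namespace URIs, and the `local_name` helper are all made up for illustration, and Amara's own reduction may differ in detail.

```python
import xml.etree.ElementTree as ET

# Hypothetical namespaced input; the URIs are placeholders.
NAMESPACED_XML = '''<doc xmlns="http://example.org/ns" xmlns:x="http://example.org/x">
  <x:item code="1">First</x:item>
  <item code="2">Second</item>
</doc>'''


def local_name(tag):
    # ElementTree spells namespaced names as '{uri}local'; keep only 'local'.
    return tag.rsplit('}', 1)[-1]


root = ET.fromstring(NAMESPACED_XML)

# Walk the tree in document order and collect namespace-free names.
names = [local_name(elem.tag) for elem in root.iter()]
print(names)  # ['doc', 'item', 'item']
```

Note that the two `item` elements come from different namespaces but become indistinguishable by name once the namespaces are dropped, which is the trade-off described above.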

HTML5 Processing

from amara.uxml import html5

HTML_DOC = '''<!DOCTYPE html>
<html>
  <head><title>Example</title></head>
  <body><p class="plain">Hello World</p></body>
</html>'''

doc = html5.parse(HTML_DOC)
print(doc.xml_name)  # "html"

XPath-like Queries (MicroXPath)

from amara.uxml import parse

SAMPLE_XML = '''<catalog>
  <book id="1">
    <title>Python Programming</title>
    <author>John Doe</author>
  </book>
  <book id="2">
    <title>Web Development</title>
    <author>Jane Smith</author>
  </book>
</catalog>'''

root = parse(SAMPLE_XML)

# Find all book titles
titles = root.xml_xpath('//book/title')
for title in titles:
    print(title.xml_value)

# Find book by ID
book = list(root.xml_xpath("//book[@id='2']"))
if book:
    # First child is whitespace. 2nd is the "title" element
    print(f'Found: {book[0].xml_children[1].xml_value}')
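
The comment about the first child being whitespace reflects how indentation in the source becomes text content between elements. As a rough illustration, here is the same effect in the standard library's `xml.dom.minidom` (a different API from Amara's), along with a way to pick out child elements without relying on position. The shortened sample document is an assumption for this sketch; with Amara, the `hasattr` check from the XML Processing example serves a similar filtering purpose.

```python
from xml.dom import minidom

SAMPLE_XML = '''<catalog>
  <book id="2">
    <title>Web Development</title>
    <author>Jane Smith</author>
  </book>
</catalog>'''

doc = minidom.parseString(SAMPLE_XML)
book = doc.getElementsByTagName('book')[0]

# The newline and indentation after <book id="2"> form the first child.
first = book.childNodes[0]
print(first.nodeType == first.TEXT_NODE)  # True
print(first.data.isspace())               # True

# Filtering on node type skips whitespace without relying on position.
elements = [n for n in book.childNodes if n.nodeType == n.ELEMENT_NODE]
print([e.tagName for e in elements])      # ['title', 'author']

title_text = elements[0].firstChild.data
print(title_text)                         # Web Development
```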

Command-Line Tool

The microx command provides powerful XML/MicroXML querying and processing:

# Extract elements by name
microx file.xml --match=item

# XPath-like expressions
microx file.xml --expr="//item[@id='2']"

# Extract text content from specific elements
microx file.xml --match=name --foreach="text()"

# Process multiple files
microx *.xml --match=title --foreach="text()"

# Pretty-print XML
microx file.xml --pretty

# Convert to MicroXML
microx file.xml --microxml

For more options, run:

microx --help

Requirements

Python 3.12+
Dependencies: ply, html5lib-modern, nameparser

Development

Amara is primarily developed by the crew at Oori Data. We offer LLMOps, data pipelines and software engineering services around AI/LLM applications.

History

Amara was originally an open source project I created, renaming and expanding on Anobind 2003, looking to simplify and rethink XML and related technology processing, with an eye to Python. It went through a few evolutions, and progress slowed down from the late 2010s.

Quote from the revival ticket:

The Amara saga continues! I don't exactly remember why I decided to dead end the Amara PyPI project when it hit 2.0, but I moved to a series of Amara 3 generation projects (amara3.iri, amara3.xml & amara3-names). Those were far more lone wolf efforts, but at Oori Data we're seeing a lot of need for the sorts of capability that's inchoate in Amara 3.

Release history

Version	Release date
4.1.0 (this version)	Jul 1, 2026
4.0.2	Nov 14, 2025
4.0.1	Nov 12, 2025
2.0.0	Mar 23, 2014
2.0.0a6 (pre-release)	Aug 15, 2011
2.0a4 (pre-release)	Mar 5, 2010
2.0a1 (pre-release)	Feb 25, 2009
1.2.0.2	Jun 23, 2007
1.2.0.1	Jan 5, 2007
1.2	Dec 31, 2006
1.2rc1 (pre-release)	Dec 26, 2006
1.2b1 (pre-release)	Dec 2, 2006
1.2a2 (pre-release)	Oct 30, 2006
1.1.9	Sep 15, 2006
1.1.8b2 (pre-release)	Jun 13, 2006
1.1.8b1 (pre-release)	Jun 13, 2006
1.1.6	Oct 28, 2005
1.1.5	Oct 28, 2005
1.0	Aug 9, 2005
1.0b3 (pre-release)	Jun 14, 2005
1.0b2 (pre-release)	Apr 20, 2005
1.0b1 (pre-release)	Mar 26, 2005
0.9.4	Feb 2, 2005
0.9.3	Jan 22, 2005
0.9.2	Jan 12, 2005
0.9.1	Dec 30, 2004

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

amara-4.1.0.tar.gz (96.6 kB)

Uploaded Jul 1, 2026 Source

Built Distribution

amara-4.1.0-py3-none-any.whl (88.2 kB)

Uploaded Jul 1, 2026 Python 3

File details

Details for the file amara-4.1.0.tar.gz.

File metadata

Download URL: amara-4.1.0.tar.gz
Upload date: Jul 1, 2026
Size: 96.6 kB
Tags: Source
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for amara-4.1.0.tar.gz
Algorithm	Hash digest
SHA256	`75d176d09090f03de0b48c43808321ce44896f8fb267350fd62714dd2c661b78`
MD5	`6b277cb2e7166f5d5a84b5fd9e794784`
BLAKE2b-256	`3a1e9325e619b3087d35765d24edee3bb57e9f42a2f82f18cda19e44338f836c`
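
One way to use these digests is to recompute the SHA256 of a downloaded file and compare it with the published value. The following is a minimal sketch using only Python's standard library; the function names `sha256_of_file` and `matches_published` are hypothetical helpers, not part of Amara or pip.

```python
import hashlib
import hmac


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, 'rb') as f:
        # Read fixed-size chunks so large files need not fit in memory.
        for chunk in iter(lambda: f.read(chunk_size), b''):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return hmac.compare_digest(sha256_of_file(path), expected_hex.strip().lower())
```

Calling `matches_published('amara-4.1.0.tar.gz', ...)` with the SHA256 value from the table above should return `True` for an intact download, assuming the file sits in the current directory.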

Provenance

The following attestation bundles were made for amara-4.1.0.tar.gz:

Publisher: publish.yml on OoriData/Amara

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: amara-4.1.0.tar.gz
- Subject digest: 75d176d09090f03de0b48c43808321ce44896f8fb267350fd62714dd2c661b78
- Sigstore transparency entry: 2038804571
- Sigstore integration time: Jul 1, 2026
Source repository:
- Permalink: OoriData/Amara@44a3144b69c800e6a870f2634068b9905b3142c7
- Branch / Tag: refs/tags/v4.1.0
- Owner: https://github.com/OoriData
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@44a3144b69c800e6a870f2634068b9905b3142c7
- Trigger Event: release

File details

Details for the file amara-4.1.0-py3-none-any.whl.

File metadata

Download URL: amara-4.1.0-py3-none-any.whl
Upload date: Jul 1, 2026
Size: 88.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? Yes
Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for amara-4.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`10153f68428c581a8e8e394648655da116671ff1efb4ae377400e9cdab4ec0c2`
MD5	`f97150ad9dae74e4566cce11a2fe1dbe`
BLAKE2b-256	`5cfb4a5e2b1017eb1b90442f980ab12abcbebb99814276b56db0ccaaa710c59f`

Provenance

The following attestation bundles were made for amara-4.1.0-py3-none-any.whl:

Publisher: publish.yml on OoriData/Amara

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

Statement:
- Statement type: https://in-toto.io/Statement/v1
- Predicate type: https://docs.pypi.org/attestations/publish/v1
- Subject name: amara-4.1.0-py3-none-any.whl
- Subject digest: 10153f68428c581a8e8e394648655da116671ff1efb4ae377400e9cdab4ec0c2
- Sigstore transparency entry: 2038805177
- Sigstore integration time: Jul 1, 2026
Source repository:
- Permalink: OoriData/Amara@44a3144b69c800e6a870f2634068b9905b3142c7
- Branch / Tag: refs/tags/v4.1.0
- Owner: https://github.com/OoriData
- Access: public
Publication detail:
- Token Issuer: https://token.actions.githubusercontent.com
- Runner Environment: github-hosted
- Publication workflow: publish.yml@44a3144b69c800e6a870f2634068b9905b3142c7
- Trigger Event: release
