Amara

General-purpose web data processing library with IRI handling and MicroXML/XML processing

Features

  • IRI (Internationalized Resource Identifier) processing - Complete implementation for handling IRIs, including percent encoding/decoding, joining, splitting, and validation
  • MicroXML/XML parsing and processing - Simplified XML data model based on MicroXML, with support for full XML 1.0
  • HTML5 parsing - Parse HTML5 documents using html5lib-modern
  • XPath-like queries - MicroXPath support for querying XML documents
  • Command-line tool - microx for rapid XML/MicroXML processing and extraction

Installation

Requires Python 3.12 or later.

pip install amara

Or with uv (recommended):

uv pip install amara

Note: This package is currently in development. For the latest features and bug fixes, you can install directly from source:

git clone https://github.com/OoriData/Amara.git
cd Amara
pip install -U .

Quick Start

IRI Processing

from amara.iri import I, iri

# Create and manipulate IRIs
url = I('http://example.org/path/to/resource')
print(url.scheme)  # 'http'
print(url.host)    # 'example.org'

# Join relative paths with base URLs
joined = iri.join('http://example.org/a/b', '../c')
print(joined)  # RFC 3986 resolution gives 'http://example.org/c' (the last segment 'b' is dropped before '..' applies)

# Percent encoding/decoding
encoded = iri.percent_encode('hello world!')
print(encoded)  # 'hello%20world%21'
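
As a cross-check on the reference resolution and percent-encoding shown above, the same operations can be reproduced with Python's standard library. This is only a sketch using urllib.parse, not Amara's API, and Amara's IRI-aware functions may treat some characters differently. Under RFC 3986 the last segment of the base path is dropped before a leading '..' is applied, so a trailing slash on the base changes the result.

```python
from urllib.parse import quote, urljoin

# Without a trailing slash, 'b' is the last segment and is removed first,
# then '..' climbs out of 'a/'.
no_slash = urljoin('http://example.org/a/b', '../c')
print(no_slash)    # http://example.org/c

# With a trailing slash, 'b/' is a directory, so '..' only climbs out of it.
with_slash = urljoin('http://example.org/a/b/', '../c')
print(with_slash)  # http://example.org/a/c

# Percent-encode everything outside the unreserved set (safe='' keeps nothing extra).
encoded = quote('hello world!', safe='')
print(encoded)     # hello%20world%21

# Non-ASCII characters are encoded as their UTF-8 bytes, which is the usual
# IRI-to-URI mapping for path segments.
iri_segment = quote('résumé', safe='')
print(iri_segment)  # r%C3%A9sum%C3%A9
```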

XML Processing

from amara.uxml import xml

SAMPLE_XML = """<monty>
  <python spam="eggs">What do you mean "bleh"</python>
  <python ministry="abuse">But I was looking for argument</python>
</monty>"""

# Parse XML
builder = xml.treebuilder()
root = builder.parse(SAMPLE_XML)
print(root.xml_name)  # "monty"

# Access children and attributes
for child in root.xml_children:
    if hasattr(child, 'xml_attributes'):
        print(f"Element: {child.xml_name}")
        print(f"Spam attr: {child.xml_attributes.get('spam')}")
        print(f"Text: {child.xml_text}")

# Iterate through all elements
for elem in root.xml_iter():
    print(f"Found element: {elem.xml_name}")
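
The hasattr check in the loop above suggests that xml_children mixes elements with text nodes. That matches the MicroXML data model, in which an element is a name, an attribute map, and an ordered list of children that are either strings or elements. The following is a minimal standard-library sketch of that model, not Amara's implementation; the Element class and from_etree helper are hypothetical names introduced only for illustration.

```python
from dataclasses import dataclass, field
import xml.etree.ElementTree as ET


@dataclass
class Element:
    """Hypothetical MicroXML-style element: name, attributes, mixed children."""
    name: str
    attributes: dict[str, str] = field(default_factory=dict)
    children: list["Element | str"] = field(default_factory=list)


def from_etree(node: ET.Element) -> Element:
    """Convert an ElementTree node into the name/attributes/children shape."""
    elem = Element(node.tag, dict(node.attrib))
    if node.text:
        elem.children.append(node.text)      # text before the first child
    for child in node:
        elem.children.append(from_etree(child))
        if child.tail:
            elem.children.append(child.tail)  # text following this child
    return elem


SAMPLE_XML = """<monty>
  <python spam="eggs">What do you mean "bleh"</python>
  <python ministry="abuse">But I was looking for argument</python>
</monty>"""

root = from_etree(ET.fromstring(SAMPLE_XML))

# Whitespace between elements shows up as string children, which is why
# code that wants only elements has to filter them out.
element_children = [c for c in root.children if isinstance(c, Element)]
text_children = [c for c in root.children if isinstance(c, str)]
print(len(root.children), len(element_children), len(text_children))  # 5 2 3
```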

HTML5 Processing

from amara.uxml import html5

HTML_DOC = """<!DOCTYPE html>
<html>
  <head><title>Example</title></head>
  <body><p class="plain">Hello World</p></body>
</html>"""

doc = html5.parse(HTML_DOC)
print(doc.xml_name)  # "html"

XPath-like Queries

from amara.uxml import xml
from amara.uxpath import xpath

SAMPLE_XML = """<catalog>
  <book id="1">
    <title>Python Programming</title>
    <author>John Doe</author>
  </book>
  <book id="2">
    <title>Web Development</title>
    <author>Jane Smith</author>
  </book>
</catalog>"""

builder = xml.treebuilder()
root = builder.parse(SAMPLE_XML)

# Find all book titles
titles = xpath.select(root, '//book/title')
for title in titles:
    print(title.xml_text)

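# Find book by ID
book = xpath.select(root, "//book[@id='2']")
if book:
    # xml_children may include whitespace text nodes, so pick the first child element
    first_elem = next(c for c in book[0].xml_children if hasattr(c, 'xml_attributes'))
    print(f"Found: {first_elem.xml_text}")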
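
To see what the two queries above should return, the same catalog can be queried with the limited XPath subset in the standard library's xml.etree.ElementTree. This is a reference sketch only, not MicroXPath. ElementTree does not accept absolute paths on an element, so the expressions start with './/' instead of '//'.

```python
import xml.etree.ElementTree as ET

SAMPLE_XML = """<catalog>
  <book id="1">
    <title>Python Programming</title>
    <author>John Doe</author>
  </book>
  <book id="2">
    <title>Web Development</title>
    <author>Jane Smith</author>
  </book>
</catalog>"""

root = ET.fromstring(SAMPLE_XML)

# All book titles, in document order
titles = [t.text for t in root.findall('.//book/title')]
print(titles)  # ['Python Programming', 'Web Development']

# Book selected by attribute value, then its title child
book = root.find(".//book[@id='2']")
found_title = book.findtext('title') if book is not None else None
print(found_title)  # Web Development
```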

Command-Line Tool

The microx command queries and processes XML/MicroXML documents from the shell:

# Extract elements by name
microx file.xml --match=item

# XPath-like expressions
microx file.xml --expr="//item[@id='2']"

# Extract text content from specific elements
microx file.xml --match=name --foreach="text()"

# Process multiple files
microx *.xml --match=title --foreach="text()"

# Pretty-print XML
microx file.xml --pretty

# Convert to MicroXML
microx file.xml --microxml

For more options, run:

microx --help
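
For scripting the same kind of extraction from Python, the sketch below approximates the --match=name --foreach="text()" combination using only the standard library. It assumes that --match selects elements by name anywhere in the document and that text() yields each matched element's text content; those semantics are inferred from the examples above rather than taken from microx itself, and match_text is a hypothetical helper name.

```python
import xml.etree.ElementTree as ET


def match_text(xml_source: str, name: str) -> list[str]:
    """Return the text content of every element called `name`."""
    root = ET.fromstring(xml_source)
    # iter(name) visits the root and all descendants with that tag;
    # itertext() concatenates text from nested elements as well.
    return ["".join(elem.itertext()) for elem in root.iter(name)]


DOC = """<items>
  <item id="1"><name>Widget</name></item>
  <item id="2"><name>Gad<b>get</b></name></item>
</items>"""

names = match_text(DOC, 'name')
print(names)  # ['Widget', 'Gadget']
```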

Requirements

Development

This project is actively developed by Oori Data. For development setup:

git clone https://github.com/OoriData/Amara.git
cd Amara
pip install -U .

History

Amara began as an open source project I created by renaming and expanding on Anobind 2003, with the aim of simplifying and rethinking the processing of XML and related technologies. It went through a few evolutions, and progress slowed from the late 2010s.

Quote from the revival ticket:

The Amara saga continues! I don't exactly remember why I decided to dead end the Amara PyPI project when it hit 2.0, but I moved to a series of Amara 3 generation projects (amara3.iri, amara3.xml & amara3-names). Those were far more lone wolf efforts, but at Oori Data we're seeing a lot of need for the sorts of capability that's inchoate in Amara 3.
