Parser for XML generated by Axiell EMu
Emu XML Parser
- Purpose: Parse XML files produced by Axiell EMu into Python-native records (lists of dicts) using schema information embedded in the XML processing instruction. The parser preserves nested tables/tuples and fills missing fields with sensible defaults.
Quick Install
- Using pip (run from a checkout of the repository):
# create and activate a venv (recommended)
python3 -m venv .venv
source .venv/bin/activate
# install the package in editable mode for development
pip install -e .
# install test runner
pip install pytest
- Other managers:
conda or poetry also work: create and activate an environment, then run pip install -e . inside it.
Basic Usage (Python)
- Import and parse an EMu XML file. The public function is parse, exposed at the package root.
from emu_xml_parser import parse
rows = parse("/path/to/emu_export.xml")
- If you want date fields parsed into Python date objects:
rows = parse("/path/to/emu_export.xml", parse_dates=True)
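To make the effect of parse_dates=True concrete, here is a minimal sketch of the kind of conversion involved. The helper name to_date is hypothetical and not part of the package; it assumes EMu date atoms arrive as ISO strings like those in the examples in this README, and it leaves anything it cannot parse untouched.

```python
from datetime import date


def to_date(value):
    """Return a date for an ISO 'YYYY-MM-DD' string, else the value unchanged.

    Hypothetical helper for illustration only; the package's own
    conversion rules may differ.
    """
    try:
        return date.fromisoformat(value)
    except (TypeError, ValueError):
        # Not a string, or not a parseable ISO date: keep the original value.
        return value


modified = to_date("2023-05-18")
print(modified.year, modified.month, modified.day)
```

With parse_dates left at its default, the same field would remain the plain string "2023-05-18".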
Single-Column Tables
EMu tables defined with only one field are automatically flattened to lists of strings instead of lists of dicts. This makes the data easier to work with.
Example Schema:
<?schema
  table ecatalogue
    table common_name
      text short ComName
    end
    table element
      text long IPAnatomy
    end
  end
?>
XML Data:
<tuple>
  <table name="common_name">
    <tuple>
      <atom name="ComName">Indian Bush Lark</atom>
    </tuple>
    <tuple>
      <atom name="ComName">Rufous-tailed Lark</atom>
    </tuple>
  </table>
  <table name="element">
    <tuple>
      <atom name="IPAnatomy">shell(s)</atom>
    </tuple>
  </table>
</tuple>
Python Output:
{
    "common_name": ["Indian Bush Lark", "Rufous-tailed Lark"],  # List of strings
    "element": ["shell(s)"]  # Not [{"IPAnatomy": "shell(s)"}]
}
Contrast with Multi-Column Tables:
Multi-field tables remain as lists of dicts:
<?schema
  table ecatalogue
    table SitSiteRef_tab
      text long locality
      integer locality_irn
    end
  end
?>
Python Output:
{
    "SitSiteRef_tab": [
        {"locality": "San Pedro", "locality_irn": 368989},
        {"locality": "Los Angeles", "locality_irn": 363879}
    ]
}
Why This Matters:
- Simplicity: Access values with row["common_name"][0] instead of row["common_name"][0]["ComName"].
- Common pattern: Many EMu exports have single-field reference tables (taxonomy names, elements, etc.).
- Backwards compatible: Multi-field tables still come back as lists of dicts.
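The flattening rule described in this section can be sketched in a few lines. This is an illustration under stated assumptions, not the package's code: flatten_single_column is a hypothetical name, and it takes the table's field names as a plain list rather than reading them from the schema.

```python
def flatten_single_column(table_rows, field_names):
    """Flatten a table to a list of values when it has exactly one field.

    table_rows:  list of dicts, one per <tuple> in the table
    field_names: field names declared for the table (hypothetical input;
                 the real parser derives these from the schema)
    """
    if len(field_names) == 1:
        key = field_names[0]
        # A row that lacks the field contributes an empty string.
        return [row.get(key, "") for row in table_rows]
    # Tables with two or more fields are returned as-is.
    return table_rows


single = flatten_single_column(
    [{"ComName": "Indian Bush Lark"}, {"ComName": "Rufous-tailed Lark"}],
    ["ComName"],
)
multi = flatten_single_column(
    [{"locality": "San Pedro", "locality_irn": 368989}],
    ["locality", "locality_irn"],
)
print(single)
print(multi)
```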
Minimal XML Example
Input (EMu XML contains a <?schema ... ?> processing instruction):
<?xml version="1.0"?>
<?schema
  table ecatalogue
    date date_emu_record_modified
    date date_emu_record_inserted
    integer irn
    text short emu_guid
    text short department
    text short catalogue_number
    table SitSiteRef_tab
      text long locality
      integer locality_irn
    end
    tuple SpeTaxonRef
      text short taxon_irn
      table common_name
        text short ComName
      end
    end
  end
?>
<root>
  <tuple>
    <atom name="date_emu_record_modified">2023-05-18</atom>
    <atom name="date_emu_record_inserted">2012-10-30</atom>
    <atom name="irn">368521</atom>
    <atom name="emu_guid">8767ccff-...</atom>
    <atom name="department">Ornithology</atom>
    <atom name="catalogue_number">89334</atom>
    <tuple name="SpeTaxonRef">
      <atom name="taxon_irn">24960</atom>
    </tuple>
    <table name="common_name">
      <tuple>
        <atom name="ComName">Indian Bush Lark</atom>
      </tuple>
    </table>
  </tuple>
</root>
Expected Python output (approx):
[
    {
        "date_emu_record_modified": "2023-05-18",
        "date_emu_record_inserted": "2012-10-30",
        "irn": 368521,
        "emu_guid": "8767ccff-...",
        "department": "Ornithology",
        "catalogue_number": "89334",
        "SpeTaxonRef": [{"taxon_irn": 24960}],
        "SitSiteRef_tab": [
            {
                "locality": None,
                "locality_irn": None
            }
        ],
        "common_name": ["Indian Bush Lark"]
    }
]
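As a rough picture of how nested tuple, table, and atom elements map onto dicts and lists, the sketch below walks the XML recursively with the standard library. It is an assumption-laden illustration: tuple_to_dict is a hypothetical name, and because it ignores the schema it returns every atom as a string and does not flatten single-field tables or fill in missing fields, unlike the output shown above.

```python
import xml.etree.ElementTree as ET


def tuple_to_dict(tuple_elem):
    """Recursively convert one <tuple> element into a dict (schema-free sketch)."""
    record = {}
    for child in tuple_elem:
        name = child.get("name")
        if child.tag == "atom":
            # Atom text is kept as a string; an empty atom becomes "".
            record[name] = child.text or ""
        elif child.tag == "table":
            # A table is a list of its row tuples.
            record[name] = [tuple_to_dict(row) for row in child.findall("tuple")]
        elif child.tag == "tuple":
            # A named nested tuple becomes a nested dict.
            record[name] = tuple_to_dict(child)
    return record


XML = """<root>
  <tuple>
    <atom name="irn">368521</atom>
    <tuple name="SpeTaxonRef">
      <atom name="taxon_irn">24960</atom>
    </tuple>
    <table name="common_name">
      <tuple><atom name="ComName">Indian Bush Lark</atom></tuple>
    </table>
  </tuple>
</root>"""

rows = [tuple_to_dict(t) for t in ET.fromstring(XML).findall("tuple")]
print(rows)
```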
Notes:
- Atom fields become strings by default; fields the schema declares as integer, such as irn, become int. When parse_dates=True, date-like fields are converted to Python date objects.
- Multi-field tables (tables/tuples with multiple field definitions) are represented as lists of dicts. Single-field tables become lists of strings.
- Missing fields are filled with empty strings or empty lists per the schema.
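The default-filling step mentioned in the last note might look something like the following. Everything here is hypothetical: the fill_missing name, the kind/name dict layout used for schema fields, and the choice of defaults, which the sketch takes as a parameter because the package decides the actual values.

```python
def fill_missing(record, fields, scalar_default=""):
    """Return a copy of record with every schema field present.

    fields: list of {"kind": ..., "name": ...} dicts (hypothetical layout)
    Missing tables get an empty list; missing scalars get scalar_default.
    """
    filled = dict(record)
    for field in fields:
        name = field["name"]
        if name in filled:
            continue
        filled[name] = [] if field["kind"] == "table" else scalar_default
    return filled


FIELDS = [
    {"kind": "integer", "name": "irn"},
    {"kind": "text", "name": "department"},
    {"kind": "table", "name": "common_name"},
]

complete = fill_missing({"irn": 368521}, FIELDS)
print(complete)
```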
Testing
- Run the test suite (after installing dev/test deps):
pytest -q
If you used a virtual environment, ensure it's activated before running pytest.
Working with Real / Large Fixtures
- Keep small, anonymized fixtures under tests/fixtures and reference them in tests.
- For large or private datasets, do not commit originals; point tests to a folder via TEST_EMU_XML_DIR and skip if unset.
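One way to wire up the TEST_EMU_XML_DIR convention is sketched below. The helper private_fixture_files and the test name are hypothetical, and matching on *.xml is an assumption about how the exports are named; only parse and the environment variable come from this README.

```python
import os
from pathlib import Path


def private_fixture_files(env=None):
    """List .xml files in the folder named by TEST_EMU_XML_DIR, or [] if unset."""
    env = os.environ if env is None else env
    root = env.get("TEST_EMU_XML_DIR")
    if not root:
        return []
    return sorted(Path(root).glob("*.xml"))


def test_private_exports_parse():
    # Imports are local so the helper above stays usable without pytest.
    import pytest

    files = private_fixture_files()
    if not files:
        pytest.skip("TEST_EMU_XML_DIR is unset or contains no .xml files")

    from emu_xml_parser import parse

    for path in files:
        rows = parse(str(path))
        assert isinstance(rows, list)
```

Running TEST_EMU_XML_DIR=/path/to/private/exports pytest -q would then exercise the private files, while a plain pytest -q reports the test as skipped.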
Extending / Customizing
- Conversion helpers live in emu_xml_parser.converter (e.g. date parsing/serialization) and validation/enforcement lives in emu_xml_parser.validator.
- If you need different conversion rules, you can adapt convert_value or wrap the parser in a small class that injects custom converters.
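The "small wrapper class" option could be sketched as follows. ConvertingParser is a hypothetical name, and the sketch applies converters after parsing and only to top-level fields; it does not hook into convert_value, whose signature is not documented here. The parse function is passed in so the wrapper works with any callable that returns a list of dicts.

```python
class ConvertingParser:
    """Wrap a parse function and post-process selected top-level fields."""

    def __init__(self, parse_func, converters):
        self._parse = parse_func
        # Mapping of field name -> one-argument callable.
        self._converters = dict(converters)

    def parse(self, path, **kwargs):
        rows = self._parse(path, **kwargs)
        return [self._convert(row) for row in rows]

    def _convert(self, row):
        converted = dict(row)
        for name, convert in self._converters.items():
            if name in converted:
                converted[name] = convert(converted[name])
        return converted


def fake_parse(path, **kwargs):
    # Stand-in for emu_xml_parser.parse so the sketch runs on its own.
    return [{"catalogue_number": "89334", "department": "Ornithology"}]


parser = ConvertingParser(fake_parse, {"catalogue_number": int, "department": str.upper})
converted_rows = parser.parse("/path/to/emu_export.xml")
print(converted_rows)
```

In real use, the parse function imported from emu_xml_parser would take the place of fake_parse.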
Files of Interest
- src/emu_xml_parser/core.py: entry point parse() for the package
- src/emu_xml_parser/extractor.py: reads the <?schema ... ?> processing instruction
- src/emu_xml_parser/schema.py: schema text → structured schema
- src/emu_xml_parser/tuple_parser.py: recursive XML → dict conversion
- src/emu_xml_parser/converter.py: value conversion utilities
- src/emu_xml_parser/validator.py: schema enforcement and normalization
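For readers curious what the first two stages involve, here is a standard-library sketch of reading the schema processing instruction and turning its text into a nested structure. It is not the package's implementation: read_schema_text, parse_schema, and the kind/name/fields dict layout are hypothetical, and the grammar (table or tuple opens a block, end closes it, any other line is a field whose last word is its name) is inferred from the schema examples in this README.

```python
from xml.dom import minidom


def read_schema_text(xml_text):
    """Return the body of the <?schema ... ?> processing instruction, or None."""
    doc = minidom.parseString(xml_text)
    for node in doc.childNodes:
        if node.nodeType == node.PROCESSING_INSTRUCTION_NODE and node.target == "schema":
            return node.data
    return None


def parse_schema(schema_text):
    """Turn schema text into nested dicts (hypothetical layout)."""
    root = {"kind": "root", "name": None, "fields": []}
    stack = [root]
    for line in schema_text.splitlines():
        parts = line.split()
        if not parts:
            continue
        if parts[0] in ("table", "tuple"):
            # Open a nested block and make it the current container.
            node = {"kind": parts[0], "name": parts[1], "fields": []}
            stack[-1]["fields"].append(node)
            stack.append(node)
        elif parts[0] == "end":
            stack.pop()
        else:
            # e.g. "text short ComName": first word is the type, last is the name.
            stack[-1]["fields"].append({"kind": parts[0], "name": parts[-1]})
    return root["fields"]


SAMPLE = """<?xml version="1.0"?>
<?schema
  table ecatalogue
    integer irn
    table common_name
      text short ComName
    end
  end
?>
<root/>"""

schema = parse_schema(read_schema_text(SAMPLE))
print(schema[0]["name"], [f["name"] for f in schema[0]["fields"]])
```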
License & Contributing
- Add your preferred license and contribution guidelines to the repository root.
File details
Details for the file emu_xml_parser-0.1.1.tar.gz.
File metadata
- Download URL: emu_xml_parser-0.1.1.tar.gz
- Upload date:
- Size: 6.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.2 CPython/3.14.3 Darwin/25.2.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 630fbff151aa0d315bfdd13dd11dd075c7a712246379d2f139c6638a5ca75e9d |
| MD5 | 5d174ab139854523d17fbbacc59c4a5a |
| BLAKE2b-256 | 84ac82e1c25067be65df8997e7b7709aa97a4b6e2208db53b051f3fddb228425 |
File details
Details for the file emu_xml_parser-0.1.1-py3-none-any.whl.
File metadata
- Download URL: emu_xml_parser-0.1.1-py3-none-any.whl
- Upload date:
- Size: 9.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/2.3.2 CPython/3.14.3 Darwin/25.2.0
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e4184f1d0d140debb986ab211a72580fde431c5bdde0d42bf93eaf7e85339992 |
| MD5 | bacd3edc9763449db83d6531c06c71cb |
| BLAKE2b-256 | aec605bb885a7a652de89bb8dcfa4d01c1cf71d2aea167d72781363d4bcff466 |