EADPy
A Python library for working with Encoded Archival Description (EAD) XML documents.
Features
- Parse and manipulate EAD XML documents
- Convert EAD to various formats (JSON, CSV)
- Tools for batch processing of EAD files
Installation
Install EADPy using pip:
pip install eadpy
Install using uv:
uv tool install eadpy
EADPy requires Python 3.8 or higher.
Command-line Usage
The following command will process an EAD XML file and export it to JSON format:
eadpy file path/to/finding_aid.xml -o output.json
To export to CSV format instead:
eadpy file path/to/finding_aid.xml -o output.csv -f csv
For batch processing of multiple EAD XML files in a directory:
eadpy dir path/to/ead_directory -o path/to/output_directory
To process subdirectories recursively:
eadpy dir path/to/ead_directory -r -o path/to/output_directory
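The difference between the two dir invocations above can be pictured with a small sketch. This is not EADPy's implementation: the helper name find_ead_files is hypothetical, and it assumes that input files carry a .xml extension.

```python
from pathlib import Path


def find_ead_files(input_dir, recursive=False):
    """Return the sorted paths of .xml files under input_dir.

    Hypothetical helper: it imitates what a recursive flag could do and
    assumes EAD files end in .xml.
    """
    root = Path(input_dir)
    if not root.is_dir():
        raise NotADirectoryError(f"Not a directory: {root}")
    # rglob descends into subdirectories; glob stays at the top level.
    matches = root.rglob("*.xml") if recursive else root.glob("*.xml")
    return sorted(p for p in matches if p.is_file())
```

Without recursive=True only files directly inside the directory are returned, which is the behavior the plain dir command suggests.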
Use the verbose flag for detailed information during processing:
eadpy file path/to/finding_aid.xml -v
Run the following to view all available options:
eadpy --help
Python Usage
EADPy provides multiple ways to create an EAD instance depending on your source data:
import eadpy
# Load an EAD file from a file path
ead = eadpy.from_path("path/to/finding_aid.xml")
# Create an EAD instance from an XML string
xml_string = """<?xml version="1.0" encoding="UTF-8"?>
<ead xmlns="urn:isbn:1-931666-22-9">
<!-- EAD content here -->
</ead>"""
ead = eadpy.from_string(xml_string)
# Create an EAD instance from bytes
from eadpy import from_bytes
with open("path/to/finding_aid.xml", "rb") as f:
    xml_bytes = f.read()
ead = from_bytes(xml_bytes)
# Create an EAD instance from a file-like object
from eadpy import from_file
with open("path/to/finding_aid.xml", "r") as f:
    ead = from_file(f)
# Also works with StringIO or BytesIO objects
from io import StringIO, BytesIO
from eadpy import from_file, from_string, from_bytes
string_io = StringIO(xml_string)
ead = from_file(string_io)
bytes_io = BytesIO(xml_bytes)
ead = from_file(bytes_io)
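Accepting both text and binary file-like objects usually comes down to normalizing whatever read() returns. The following is a minimal sketch of that idea, not EADPy's actual code; read_as_bytes is a hypothetical helper, and the UTF-8 default is an assumption.

```python
from io import BytesIO, StringIO


def read_as_bytes(file_like, encoding="utf-8"):
    """Read a file-like object and return its content as bytes.

    Hypothetical helper, not part of EADPy's documented API.
    """
    if not hasattr(file_like, "read"):
        raise TypeError("expected an object with a read() method")
    data = file_like.read()
    # Text-mode objects return str; encode so a parser receives bytes.
    if isinstance(data, str):
        return data.encode(encoding)
    return bytes(data)


xml = '<ead xmlns="urn:isbn:1-931666-22-9"/>'
from_text = read_as_bytes(StringIO(xml))
from_binary = read_as_bytes(BytesIO(xml.encode("utf-8")))
```

Both calls yield the same bytes, so a single parsing path can serve either kind of input.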
Class-Based API Style
from eadpy import EAD
# Load an EAD file from a file path
ead = EAD.from_path("path/to/finding_aid.xml")
# Create an EAD instance from an XML string
ead = EAD.from_string(xml_string)
# Create an EAD instance from bytes
with open("path/to/finding_aid.xml", "rb") as f:
    xml_bytes = f.read()
ead = EAD.from_bytes(xml_bytes)
# Create an EAD instance from a file-like object
with open("path/to/finding_aid.xml", "r") as f:
    ead = EAD.from_file(f)
Export to JSON chunks
JSON chunks are useful for embedding or display in applications:
# Create chunks and save them to a file
chunks = ead.create_and_save_chunks("output.json")
# Or create chunks without saving
chunks = ead.create_item_chunks()
# Then save them separately if needed
ead.save_chunks_to_json(chunks, "output.json")
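Once chunks are on disk, a downstream application can read them back with the standard json module. The sketch below assumes the file holds a top-level JSON array and that each chunk has text and metadata keys; both are assumptions for illustration, not EADPy's documented schema, and load_chunks is a hypothetical helper.

```python
import json
import tempfile
from pathlib import Path

# Illustrative data only: the "text" and "metadata" keys and their values
# are assumptions about the chunk layout.
sample_chunks = [
    {"text": "Series 1 > Letter to the editor", "metadata": {"level": "item"}},
    {"text": "Series 1 > Photograph, undated", "metadata": {"level": "item"}},
]


def load_chunks(json_path):
    """Load a JSON file that holds a list of chunk dictionaries."""
    with open(json_path, encoding="utf-8") as f:
        chunks = json.load(f)
    if not isinstance(chunks, list):
        raise ValueError("expected a JSON array of chunks")
    return chunks


# Write the sample to a temporary file, then read it back.
with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "output.json"
    path.write_text(json.dumps(sample_chunks, indent=2), encoding="utf-8")
    loaded = load_chunks(path)
```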
Export to CSV
CSV export is useful for tabular analysis:
# Create CSV data and save it to a file
csv_data = ead.create_and_save_csv("output.csv")
# Or create CSV data without saving
csv_data = ead.create_csv_data()
# Then save it separately if needed
ead.save_csv_data(csv_data, "output.csv")
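Because the CSV data is a list of dictionaries, it maps directly onto the standard csv module. The sketch below shows one way such rows could be serialized and read back; the column names are invented for illustration and rows_to_csv_text is a hypothetical helper, not an EADPy function.

```python
import csv
import io

# Illustrative rows only: these column names are assumptions, not the
# columns EADPy actually emits.
rows = [
    {"level": "series", "title": "Correspondence", "parent": ""},
    {"level": "item", "title": "Letter, 1921", "parent": "Correspondence"},
]


def rows_to_csv_text(rows):
    """Serialize a list of dictionaries to CSV text with a header row."""
    if not rows:
        return ""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=list(rows[0].keys()))
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


csv_text = rows_to_csv_text(rows)
# Reading the text back yields the same dictionaries; the title with a
# comma survives because the writer quotes it.
round_tripped = list(csv.DictReader(io.StringIO(csv_text)))
```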
API Reference
Package Level Functions
- from_path(file_path: str) -> EAD: Creates an EAD instance from a file path. Validates that the file exists, is not a directory, and is readable.
- from_string(xml_string: str, encoding: str = 'utf-8') -> EAD: Creates an EAD instance from an XML string. Handles encoding the string to bytes for proper XML parsing.
- from_bytes(xml_bytes: bytes) -> EAD: Creates an EAD instance from XML bytes. Useful when working with binary data from HTTP responses or other sources.
- from_file(file_like_object) -> EAD: Creates an EAD instance from a file-like object with a read() method. Works with both text-based (StringIO) and binary (BytesIO) file objects.
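The three checks described for from_path (exists, is not a directory, is readable) can be expressed with the standard library. This is a sketch under assumptions: validate_readable_file is a hypothetical helper, and the exception types are this example's choice rather than what EADPy is known to raise.

```python
import os
from pathlib import Path


def validate_readable_file(file_path):
    """Return file_path as a Path after checking that it can be read.

    Hypothetical helper mirroring the checks described for from_path.
    """
    path = Path(file_path)
    if not path.exists():
        raise FileNotFoundError(f"File not found: {path}")
    if path.is_dir():
        raise IsADirectoryError(f"Expected a file, got a directory: {path}")
    if not os.access(path, os.R_OK):
        raise PermissionError(f"File is not readable: {path}")
    return path
```

Checking up front produces a specific error message before any XML parsing starts.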
Class Methods (Object Creation)
- EAD.from_path(file_path: str) -> EAD: Creates an EAD instance from a file path. Validates that the file exists, is not a directory, and is readable.
- EAD.from_string(xml_string: str, encoding: str = 'utf-8') -> EAD: Creates an EAD instance from an XML string. Handles encoding the string to bytes for proper XML parsing.
- EAD.from_bytes(xml_bytes: bytes) -> EAD: Creates an EAD instance from XML bytes. Useful when working with binary data from HTTP responses or other sources.
- EAD.from_file(file_like_object) -> EAD: Creates an EAD instance from a file-like object with a read() method. Works with both text-based (StringIO) and binary (BytesIO) file objects.
Instance Methods (Data Export)
- create_item_chunks() -> list: Creates item-focused chunks that include relevant information from their parent hierarchy. Returns a list of dictionaries, each containing a text representation and metadata for each item.
- save_chunks_to_json(chunks: list, output_file: str) -> None: Saves chunks to a JSON file. Takes a list of chunks and an output file path.
- create_and_save_chunks(output_file: str) -> list: Creates item-focused chunks and saves them to a JSON file. Returns the chunks that were created and saved.
- create_csv_data() -> list: Creates a flattened hierarchy representation suitable for CSV export. Returns a list of dictionaries, each representing a row in the CSV.
- save_csv_data(csv_data: list, output_file: str) -> None: Saves CSV data to a file. Takes a list of dictionaries and an output file path.
- create_and_save_csv(output_file: str) -> list: Creates flattened CSV data and saves it to a file. Returns the CSV data that was created and saved.
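To make "flattened hierarchy" and "item-focused chunks" concrete, here is a small standalone sketch that walks a tiny EAD document with the standard library and records each component together with its ancestor titles. It does not use EADPy and does not reproduce its output format; the sample document, the function names, and the row keys (level, title, path) are all assumptions for illustration.

```python
import xml.etree.ElementTree as ET

NS = {"ead": "urn:isbn:1-931666-22-9"}

SAMPLE = """<ead xmlns="urn:isbn:1-931666-22-9">
  <archdesc level="collection">
    <did><unittitle>Smith Family Papers</unittitle></did>
    <dsc>
      <c01 level="series">
        <did><unittitle>Correspondence</unittitle></did>
        <c02 level="item">
          <did><unittitle>Letter, 1921</unittitle></did>
        </c02>
      </c01>
    </dsc>
  </archdesc>
</ead>"""


def title_of(element):
    """Return the unittitle text of an element's did, or an empty string."""
    node = element.find("ead:did/ead:unittitle", NS)
    return (node.text or "").strip() if node is not None else ""


def is_component(element):
    """True for c, c01 ... c12 elements, ignoring the namespace prefix."""
    local = element.tag.split("}")[-1]
    return local == "c" or (local.startswith("c") and local[1:].isdigit())


def walk(component, ancestors, rows):
    """Append one row per component, carrying its ancestor titles."""
    title = title_of(component)
    rows.append({
        "level": component.get("level", ""),
        "title": title,
        "path": " > ".join(ancestors + [title]),
    })
    for child in component:
        if is_component(child):
            walk(child, ancestors + [title], rows)


def flatten(xml_text):
    """Return one dictionary per component in document order."""
    root = ET.fromstring(xml_text)
    archdesc = root.find("ead:archdesc", NS)
    rows = []
    for child in archdesc.find("ead:dsc", NS):
        if is_component(child):
            walk(child, [title_of(archdesc)], rows)
    return rows


rows = flatten(SAMPLE)
# Keeping only item-level rows gives an item-focused view with parent context.
item_chunks = [r for r in rows if r["level"] == "item"]
```

Every row carries its full ancestor path, so an item stays meaningful when read apart from the rest of the finding aid.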
Command-line Reference
Global options
- --version: Show the version number and exit
- --help: Show help message and exit
File command options
- input: Path to the EAD XML file (required)
- -o, --output: Path to the output file
- -f, --format: Output format ('json' or 'csv')
- -v, --verbose: Print detailed information
Directory command options
- input_dir: Path to the directory containing EAD XML files (required)
- -o, --output-dir: Directory for output files
- -f, --format: Output format ('json' or 'csv', default: 'json')
- -r, --recursive: Process subdirectories recursively
- -v, --verbose: Print detailed information
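The options listed above map naturally onto argparse subcommands. The sketch below shows one way a command-line interface with these options could be declared; it is not EADPy's source, the version string is a placeholder, and the 'json' default for the file command is an assumption inferred from the usage examples.

```python
import argparse


def build_parser():
    """Build a parser that mirrors the documented options (illustrative)."""
    parser = argparse.ArgumentParser(prog="eadpy")
    # Placeholder version string; the real value is not reproduced here.
    parser.add_argument("--version", action="version", version="eadpy (sketch)")
    sub = parser.add_subparsers(dest="command", required=True)

    file_cmd = sub.add_parser("file", help="Process a single EAD XML file")
    file_cmd.add_argument("input")
    file_cmd.add_argument("-o", "--output")
    # Assumed default: the documentation states it only for the dir command.
    file_cmd.add_argument("-f", "--format", choices=["json", "csv"], default="json")
    file_cmd.add_argument("-v", "--verbose", action="store_true")

    dir_cmd = sub.add_parser("dir", help="Process a directory of EAD XML files")
    dir_cmd.add_argument("input_dir")
    dir_cmd.add_argument("-o", "--output-dir")
    dir_cmd.add_argument("-f", "--format", choices=["json", "csv"], default="json")
    dir_cmd.add_argument("-r", "--recursive", action="store_true")
    dir_cmd.add_argument("-v", "--verbose", action="store_true")
    return parser


# Parse an example command line instead of sys.argv.
args = build_parser().parse_args(["dir", "eads", "-r", "-o", "out", "-f", "csv"])
```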
Development
Setting up the development environment
EADPy uses uv for dependency management and virtual environment setup.
- Clone the repository:
git clone https://github.com/nulib-labs/eadpy
cd eadpy
- Create and activate a virtual environment:
uv venv --python 3.13
source .venv/bin/activate # On Unix/macOS
# or
.venv\Scripts\activate # On Windows
- Install development dependencies:
uv pip install -e ".[dev]"
Running tests
pytest
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
Acknowledgements
Special thanks to the ArcLight project, which inspired the EAD processing approach taken here. Thank you to the developers and contributors of ArcLight for their work in the archival community!
License
This project is licensed under the MIT License - see the LICENSE file for details.