A command-line tool for fetching IIIF Collections on the Web.
loam-iiif
A command-line tool for traversing IIIF collections and extracting manifest URLs. This tool helps you explore and collect IIIF manifest URLs from collections, with support for nested collections and paginated results.
Features
- Recursively Traverses IIIF Collections: Finds all manifest URLs within a collection, including those in nested collections.
- Supports Multiple IIIF Presentation API Versions: Compatible with both IIIF Presentation API 2.0 and 3.0.
- Multiple Output Formats: Choose between json, jsonl (JSON Lines), and formatted tables.
- Download Full Manifest JSONs: Save the complete JSON content of each manifest, named by their IDs.
- Save Results to File or Display in Terminal: Flexible output options to suit your workflow.
- Debug Mode for Detailed Logging: Provides comprehensive logs for troubleshooting and monitoring.
- Robust Error Handling with Automatic Retries: Ensures reliable data fetching even in the face of transient network issues.
- Support for Paginated Collections: Handles collections that span multiple pages seamlessly.
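The recursive traversal described above can be sketched in a few lines of Python. This is an illustration only, not the tool's actual implementation: the function name `collect_manifests`, the injected `fetch` callable, and the `example.org` sample data are all hypothetical. It assumes Presentation API 3.0 collections list children under `items` (with `id`/`type`), that 2.x collections use `manifests`, `collections`, or `members` (with `@id`/`@type`), and that 2.x paged collections link pages through `first` and `next`.

```python
from typing import Callable, Optional


def _ref_id(ref) -> Optional[str]:
    """Return the URL of a reference, which may be a bare string or an object."""
    if isinstance(ref, str):
        return ref
    return ref.get("id") or ref.get("@id")


def _ref_type(ref) -> str:
    """Return the declared type ("Manifest", "sc:Collection", ...) or ""."""
    if isinstance(ref, str):
        return ""
    return str(ref.get("type") or ref.get("@type") or "")


def collect_manifests(root_url: str, fetch: Callable[[str], dict]) -> list:
    """Depth-first walk that returns every manifest URL reachable from root_url."""
    seen = set()      # collection/page URLs already visited (guards against cycles)
    manifests = {}    # dict keys keep insertion order and drop duplicates

    def visit(url: str) -> None:
        if url in seen:
            return
        seen.add(url)
        data = fetch(url)

        # Mixed, typed children: "items" (3.0) and "members" (2.x).
        for ref in list(data.get("items", [])) + list(data.get("members", [])):
            rid, kind = _ref_id(ref), _ref_type(ref)
            if rid is None:
                continue
            if kind.endswith("Manifest"):
                manifests[rid] = None
            elif kind.endswith("Collection"):
                visit(rid)

        # Untyped 2.x lists.
        for ref in data.get("manifests", []):
            rid = _ref_id(ref)
            if rid:
                manifests[rid] = None
        for ref in data.get("collections", []):
            rid = _ref_id(ref)
            if rid:
                visit(rid)

        # 2.x paging: follow "first" from the collection and "next" from each page.
        for key in ("first", "next"):
            rid = _ref_id(data[key]) if data.get(key) else None
            if rid:
                visit(rid)

    visit(root_url)
    return list(manifests)


# Hypothetical in-memory "server" mixing a 3.0 collection with a paged 2.x one.
SAMPLE = {
    "https://example.org/top": {
        "type": "Collection",
        "items": [
            {"id": "https://example.org/m1", "type": "Manifest"},
            {"id": "https://example.org/nested", "type": "Collection"},
        ],
    },
    "https://example.org/nested": {
        "@type": "sc:Collection",
        "manifests": [{"@id": "https://example.org/m2", "@type": "sc:Manifest"}],
        "next": "https://example.org/nested?page=2",
    },
    "https://example.org/nested?page=2": {
        "@type": "sc:Collection",
        "manifests": [{"@id": "https://example.org/m3", "@type": "sc:Manifest"}],
    },
}

urls = collect_manifests("https://example.org/top", SAMPLE.__getitem__)
print(urls)
```

Passing `fetch` in as a parameter keeps the walk independent of the HTTP layer, so retries and caching could be added there without changing the traversal logic.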
Installation
Requires Python 3.10 or higher.
pip install loam-iiif
Usage
The basic command structure is:
loamiiif [OPTIONS] URL
Options
- -o, --output PATH: If used with --download-manifests, specifies directory to save manifest JSON files. Otherwise saves manifest URLs list to a file (JSON or plain text format)
- -f, --format [json|jsonl|table]: Output format (default: json)
- -d, --download-manifests: Download full JSON contents of each manifest
- --cache-dir, -c PATH: Directory to cache manifest JSON files (defaults to system temp directory)
- --skip-cache: Skip reading from cache but still write to it
- --no-cache: Disable manifest caching completely
- --debug: Enable debug mode with detailed logs
- --help: Show help message
- --max-manifests, -m INTEGER: Maximum number of manifests to retrieve
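When scripting the tool, it can help to assemble the argument list programmatically. The sketch below uses only the flags listed above. The helper `build_command` and its keyword names are hypothetical conveniences and not part of loam-iiif.

```python
import shlex

VALID_FORMATS = {"json", "jsonl", "table"}


def build_command(url, *, output=None, fmt=None, download_manifests=False,
                  cache_dir=None, skip_cache=False, no_cache=False,
                  debug=False, max_manifests=None):
    """Assemble a loamiiif argument list from the documented options."""
    if fmt is not None and fmt not in VALID_FORMATS:
        raise ValueError(f"unsupported format: {fmt!r}")
    cmd = ["loamiiif"]
    if output is not None:
        cmd += ["--output", str(output)]
    if fmt is not None:
        cmd += ["--format", fmt]
    if download_manifests:
        cmd.append("--download-manifests")
    if cache_dir is not None:
        cmd += ["--cache-dir", str(cache_dir)]
    if skip_cache:
        cmd.append("--skip-cache")
    if no_cache:
        cmd.append("--no-cache")
    if debug:
        cmd.append("--debug")
    if max_manifests is not None:
        cmd += ["--max-manifests", str(max_manifests)]
    cmd.append(url)  # matches the "loamiiif [OPTIONS] URL" layout
    return cmd


cmd = build_command(
    "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif",
    fmt="jsonl",
    output="manifests.jsonl",
    max_manifests=42,
)
print(shlex.join(cmd))  # shell-quoted form, safe to paste into a terminal
```

With the package installed, the resulting list can be handed to `subprocess.run(cmd, check=True)`; passing a list rather than a string avoids shell-quoting problems with the `?` in the URL.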
Examples
Basic Usage
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections/c69bb1ed-accb-4cfb-b60e-495b9911690f?as=iiif"
Output Options
Output as a formatted table:
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --format table
Save manifest URLs to different formats:
# JSON output
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --output manifests.json
# JSON Lines (jsonl) output
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --format jsonl --output manifests.jsonl
Download manifest contents to a directory:
# Downloads full manifest JSON files to ./manifest_downloads/ directory
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --download-manifests --output ./manifest_downloads
Advanced Features
Download manifests and save JSON output:
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --format json --output manifests.json --download-manifests
Limit the number of manifests:
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --max-manifests=42
Enable debug logging:
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --debug
Cache Control
Use a custom cache directory:
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --cache-dir ./my-cache-dir
Skip reading from cache but still write to it:
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --skip-cache
Disable caching completely:
loamiiif "https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif" --no-cache
Example output from a run with --debug (truncated):
[2025-01-17 14:14:48] DEBUG Starting traversal of IIIF collection: https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif
INFO Processing collection: https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif
DEBUG Fetching URL: https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif
DEBUG Successfully fetched data from https://api.dc.library.northwestern.edu/api/v2/collections?as=iiif
DEBUG Found nested collection: https://api.dc.library.northwestern.edu/api/v2/collections/ba35820a-525a-4cfa-8f23-4891c9f798c4?as=iiif
INFO Processing collection: https://api.dc.library.northwestern.edu/api/v2/collections/ba35820a-525a-4cfa-8f23-4891c9f798c4?as=iiif
DEBUG Added manifest: https://api.dc.library.northwestern.edu/api/v2/works/e40479c4-06cb-48be-9d6b-adf47f238852?as=iiif
DEBUG Added manifest: https://api.dc.library.northwestern.edu/api/v2/works/f4720687-61b6-4dcd-aed0-b70eff985583?as=iiif
# ... more manifests and collections ...
Caching Behavior
The tool implements manifest caching to improve performance and reduce load on IIIF servers:
- By default, manifests are cached in your system's temporary directory (/tmp on Unix-like systems)
- Use --cache-dir to specify a custom cache location
- --skip-cache will ignore existing cache but still write new cache entries (useful for refreshing stale data)
- --no-cache completely disables caching (not recommended for large collections)
Cached manifests are stored as JSON files named using a sanitized version of their URLs. The cache is particularly useful when:
- Working with large collections that you'll need to process multiple times
- Using the --download-manifests option to save full manifest contents
- Running the tool repeatedly during development or testing
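The caching behavior described above can be pictured with a small sketch. It is not the tool's code: the sanitization rule (replace anything outside letters, digits, `.`, `_`, and `-` with `_`), the helper names `cache_path` and `get_manifest`, and the `fake_fetch` stand-in are all assumptions made for illustration. The real filename scheme may differ.

```python
import json
import re
import tempfile
from pathlib import Path


def cache_path(cache_dir, url: str) -> Path:
    """Map a manifest URL to a cache file (hypothetical sanitization rule)."""
    safe = re.sub(r"[^A-Za-z0-9._-]", "_", url)
    return Path(cache_dir) / f"{safe}.json"


def get_manifest(url, fetch, cache_dir, *, skip_cache=False, no_cache=False):
    """Return manifest JSON, mirroring the --skip-cache / --no-cache semantics."""
    path = cache_path(cache_dir, url)
    # Read from cache only when caching is on and reads are not skipped.
    if not no_cache and not skip_cache and path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    data = fetch(url)
    # Write unless caching is disabled entirely.
    if not no_cache:
        path.parent.mkdir(parents=True, exist_ok=True)
        path.write_text(json.dumps(data), encoding="utf-8")
    return data


calls = []


def fake_fetch(url):
    """Stand-in for an HTTP request; records each call."""
    calls.append(url)
    return {"id": url, "type": "Manifest"}


with tempfile.TemporaryDirectory() as tmp:
    url = "https://example.org/m1?as=iiif"
    get_manifest(url, fake_fetch, tmp)                   # miss: fetch and write
    get_manifest(url, fake_fetch, tmp)                   # hit: served from disk
    get_manifest(url, fake_fetch, tmp, skip_cache=True)  # refetch and overwrite
    cached_name = cache_path(tmp, url).name

print(len(calls), cached_name)
```

Three lookups trigger only two fetches here, because the second is served from disk. A production scheme would probably also need to handle very long URLs, for example by hashing them.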
Output Formats
JSON
The JSON output includes both manifests and collections:
{
"manifests": [
"https://api.dc.library.northwestern.edu/api/v2/works/9d87853e-3955-4912-906f-6ddf0e2e3825?as=iiif",
"..."
],
"collections": []
}
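A minimal sketch of consuming this output from Python, assuming the structure shown above (top-level `manifests` and `collections` arrays of URL strings). The inline `sample` string stands in for the contents of a file written with `--output manifests.json`.

```python
import json

# Stand-in for the text of manifests.json; the URLs are taken from this README.
sample = """
{
  "manifests": [
    "https://api.dc.library.northwestern.edu/api/v2/works/9d87853e-3955-4912-906f-6ddf0e2e3825?as=iiif",
    "https://api.dc.library.northwestern.edu/api/v2/works/e40479c4-06cb-48be-9d6b-adf47f238852?as=iiif"
  ],
  "collections": []
}
"""

result = json.loads(sample)
manifest_urls = result["manifests"]
collection_urls = result["collections"]

print(f"{len(manifest_urls)} manifests, {len(collection_urls)} collections")
for manifest_url in manifest_urls:
    print(manifest_url)
```

To read a real file, replace `json.loads(sample)` with `json.load(open_file)` on the path passed to `--output`.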
JSON Lines (jsonl)
Each line contains a single manifest or collection URL:
{"manifest": "https://api.dc.library.northwestern.edu/api/v2/works/9d87853e-3955-4912-906f-6ddf0e2e3825?as=iiif"}
{"manifest": "..."}
{"collection": "https://api.dc.library.northwestern.edu/api/v2/collections/ba35820a-525a-4cfa-8f23-4891c9f798c4?as=iiif"}
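Because every line is an independent JSON object, the jsonl output can be processed as a stream without loading the whole file. The sketch below assumes each record has exactly one key, `manifest` or `collection`, as in the sample above; `read_jsonl` is a hypothetical helper name.

```python
import json


def read_jsonl(lines):
    """Split an iterable of JSON Lines records into manifest and collection URLs."""
    manifests, collections = [], []
    for line in lines:
        line = line.strip()
        if not line:
            continue  # tolerate blank lines
        record = json.loads(line)
        if "manifest" in record:
            manifests.append(record["manifest"])
        elif "collection" in record:
            collections.append(record["collection"])
    return manifests, collections


sample_lines = [
    '{"manifest": "https://api.dc.library.northwestern.edu/api/v2/works/9d87853e-3955-4912-906f-6ddf0e2e3825?as=iiif"}',
    '{"collection": "https://api.dc.library.northwestern.edu/api/v2/collections/ba35820a-525a-4cfa-8f23-4891c9f798c4?as=iiif"}',
]

manifests, collections = read_jsonl(sample_lines)
print(len(manifests), len(collections))
```

An open file object can be passed directly in place of `sample_lines`, since iterating over a text file yields one line at a time.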
Table
The table format provides a readable view of manifests and collections with indexed entries.
Development
Requirements
- Python 3.10+
- click>=8.1.8
- requests>=2.32.3
- rich>=13.9.4
Development Installation
- Clone the repository:
git clone https://github.com/nulib-labs/loam-iiif.git
cd loam-iiif
- Create and activate a virtual environment with uv:
uv venv --python 3.10
source .venv/bin/activate # On Windows use: .venv\Scripts\activate
- Install dependencies:
uv sync
License
This project is licensed under the MIT License - see the LICENSE file for details.
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.