# OpenAleph

Python client for the OpenAleph data API.
## Installation

```shell
pip install openaleph-client
```
## Command-Line Interface

All commands share the same global options:

```shell
openaleph --host URL --api-key KEY [--retries N] <command> [options]
```

- `--host`: OpenAleph API host URL (default from the `OPAL_HOST` env var)
- `--api-key`: API key for authentication (default from the `OPAL_API_KEY` env var)
- `--retries`: Number of retry attempts on server failure (default: 5)
- `--version`: Show the current version and exit
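For example, the host and API key can be set once per shell session through the environment variables named above (the URL and key below are placeholders):

```shell
# Set once per session; subsequent `openaleph` invocations pick these up
# in place of --host and --api-key. Example values are placeholders.
export OPAL_HOST="https://aleph.example.org"
export OPAL_API_KEY="your-api-key-here"
```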
### crawldir

Recursively upload the contents of a folder to a collection, with optional pause/resume:

```shell
openaleph crawldir -f <foreign-id> [--resume] [--state-file PATH] [--parallel N] [--noindex] [--casefile] [-l LANG] <path>
```

- `-f, --foreign-id`: Foreign ID of the target collection (required)
- `--resume`: Resume from an existing state database; omit to start fresh (this will delete the state file!)
- `--state-file PATH`: Path to the state file (for resuming from custom locations)
- `-p, --parallel N`: Number of parallel upload threads (default: 1)
- `-i, --noindex`: Skip indexing on ingest
- `--casefile`: Treat files as case files
- `-l, --language LANG`: Language hints (ISO 639; repeatable)
### fetchdir

Download all entities in a collection (or a single entity) into a folder tree:

```shell
openaleph fetchdir -f <foreign-id> [-e <entity-id>] [-p <path>] [--overwrite]
```
### Other commands

- `reingest`: Re-ingest all documents in a collection
- `reindex`: Re-index all entities in a collection
- `delete`: Delete a collection and its contents
- `flush`: Delete all contents of a collection
- `write-entity`: Index a single entity from stdin
- `write-entities`: Bulk-index entities from stdin
- `stream-entities`: Stream entities to stdout
- `entitysets`: List entity sets
- `entitysetitems`: List items in an entity set
- `make-list`: Create a new list entity set
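As an illustration of the stdin-based commands, entities for bulk indexing are typically streamed as one JSON object per line. The sketch below builds such a stream in Python; the exact entity schema accepted by `write-entities` is defined by OpenAleph's data model, so the FollowTheMoney-style fields here (`id`, `schema`, `properties`) are an assumption:

```python
import json

# Hypothetical FollowTheMoney-style entities; the real accepted schema
# is defined by OpenAleph, not by this sketch.
entities = [
    {"id": "person-1", "schema": "Person", "properties": {"name": ["Jane Doe"]}},
    {"id": "company-1", "schema": "Company", "properties": {"name": ["Acme Ltd"]}},
]

# One JSON object per line, suitable for piping to `openaleph write-entities`.
ndjson = "\n".join(json.dumps(entity) for entity in entities)
print(ndjson)
```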
## State Persistence

When running `crawldir`, OpenAleph maintains a small SQLite database file to track upload progress.

### Default Behavior (Writable Directories)

For directories where you have write permissions, the state file is created in your crawl root:

```
<crawl-root>/.openaleph_crawl_state.db
```

### Read-Only Directory Support

When crawling read-only directories (e.g., mounted filesystems, archived data), OpenAleph automatically detects the lack of write permissions and creates the state file in your system's temporary directory with a unique name:

```
/tmp/openaleph_crawl_state_<hash>.db
```

The hash is based on the target directory path, ensuring multiple crawls of different read-only directories don't conflict.
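One plausible way to derive such a collision-free name is to hash the resolved directory path. This is only an illustrative sketch; the hash scheme OpenAleph actually uses may differ:

```python
import hashlib
import tempfile
from pathlib import Path

def state_file_for(crawl_root: str) -> Path:
    """Map a crawl directory to a unique state-file path in the temp dir.

    Illustrative only: not the library's actual implementation.
    """
    # Resolve the path first so equivalent spellings hash identically.
    digest = hashlib.sha256(str(Path(crawl_root).resolve()).encode("utf-8")).hexdigest()[:8]
    return Path(tempfile.gettempdir()) / f"openaleph_crawl_state_{digest}.db"
```

Because the digest depends only on the directory path, rerunning the same crawl finds the same state file, while crawls of different directories never collide.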
### Key Features

- **Purpose**: track which files have already been successfully uploaded.
- **Resume support**:
  - Passing `--resume` skips any files recorded in this DB.
  - Omitting `--resume` deletes any existing state DB and starts fresh.
- **Custom state files**: use `--state-file PATH` to specify a custom location for the state database.
- **Thread-safe**: uploads are recorded under a lock to support parallel threads.
- **Update datasets later**: the DB file persists, allowing you to update your local repository at any time and only sync new files to OpenAleph.
- **Clear logging**: OpenAleph logs the exact state file location and provides resume commands for easy reference.
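The thread-safe recording described above can be sketched as a small SQLite ledger guarded by a lock. This is a simplified model; OpenAleph's actual table schema is not documented here:

```python
import sqlite3
import threading

class CrawlState:
    """Simplified model of a thread-safe upload ledger (not OpenAleph's schema)."""

    def __init__(self, path: str) -> None:
        self._lock = threading.Lock()
        self._conn = sqlite3.connect(path, check_same_thread=False)
        self._conn.execute("CREATE TABLE IF NOT EXISTS uploaded (path TEXT PRIMARY KEY)")

    def mark_uploaded(self, rel_path: str) -> None:
        # Serialize writes so parallel upload threads never interleave commits.
        with self._lock:
            self._conn.execute("INSERT OR IGNORE INTO uploaded VALUES (?)", (rel_path,))
            self._conn.commit()

    def already_uploaded(self, rel_path: str) -> bool:
        with self._lock:
            row = self._conn.execute(
                "SELECT 1 FROM uploaded WHERE path = ?", (rel_path,)
            ).fetchone()
            return row is not None
```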
## Usage Examples

Standard crawl (writable directory):

```shell
openaleph crawldir -f my_collection /path/to/data
# State file: /path/to/data/.openaleph_crawl_state.db

# Resume later:
openaleph crawldir --resume -f my_collection /path/to/data
```

Read-only directory crawl:

```shell
openaleph crawldir -f my_collection /readonly/mount/data
# Output: Using state file: /tmp/openaleph_crawl_state_a1b2c3d4.db
# Output: To resume this crawl, use: --resume --state-file /tmp/openaleph_crawl_state_a1b2c3d4.db

# Resume later:
openaleph crawldir --resume --state-file /tmp/openaleph_crawl_state_a1b2c3d4.db -f my_collection /readonly/mount/data
```

Custom state file location:

```shell
openaleph crawldir --state-file ~/my_crawl_state.db -f my_collection /any/path
```
## Ignore File

You can create a file named:

```
<crawl-root>/.openalephignore
```

and list glob patterns for any files or directories you want to skip entirely:

```
# Skip hidden files
.*

# Common junk
.DS_Store
Thumbs.db

# Temporary directories
tmp/
build/

# Log files
*.log
```
- Patterns are matched against the relative path of each file or folder.
- A pattern ending in `/` only matches directories (and their contents).
- Blank lines and lines beginning with `#` are ignored.
- Anything matched here is never enqueued or uploaded.
- The `.openalephignore` file itself is ignored by default, and so is the state file.
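The matching rules above can be modelled with Python's `fnmatch`. This sketch follows the documented semantics; matching each path component individually (so a pattern like `.DS_Store` is caught at any depth) is an assumption on my part:

```python
import fnmatch

def is_ignored(rel_path: str, is_dir: bool, patterns: list[str]) -> bool:
    """Return True if rel_path matches any .openalephignore-style pattern."""
    for raw in patterns:
        pattern = raw.strip()
        if not pattern or pattern.startswith("#"):
            continue  # blank lines and comments are ignored
        if pattern.endswith("/"):
            # Trailing slash: only directories, plus everything inside them.
            base = pattern.rstrip("/")
            if (is_dir and fnmatch.fnmatch(rel_path, base)) or rel_path.startswith(base + "/"):
                return True
        elif fnmatch.fnmatch(rel_path, pattern) or any(
            fnmatch.fnmatch(part, pattern) for part in rel_path.split("/")
        ):
            return True
    return False
```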
## Final Report

After a crawl completes, OpenAleph prints a summary to the console including:
- Number of files successfully uploaded
- Number of failed uploads
- State file location for future resume operations
## Failed Files Log

If any failures occurred, a file is written containing the relative paths of the files that could not be uploaded.

For writable directories:

```
<crawl-root>/.openaleph-failed.txt
```

For read-only directories:

```
/tmp/openaleph_failed_<hash>.txt
```

The list contains one relative path per line for each file that permanently failed to upload. You can inspect this file to retry or investigate failures.
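For example, the failed-files list could be used to stage just the failing files into a fresh directory for a second `crawldir` pass. The helper below is my own illustration, not part of the client, and assumes the writable-directory layout:

```python
import shutil
from pathlib import Path

def stage_failed_for_retry(crawl_root: str, staging_dir: str) -> list[str]:
    """Copy files listed in <crawl-root>/.openaleph-failed.txt into staging_dir.

    Assumes one relative path per line in the failed-files log.
    """
    root = Path(crawl_root)
    staged = []
    for line in (root / ".openaleph-failed.txt").read_text().splitlines():
        rel = line.strip()
        if not rel:
            continue
        destination = Path(staging_dir) / rel
        destination.parent.mkdir(parents=True, exist_ok=True)
        shutil.copy2(root / rel, destination)  # keep timestamps for the retry crawl
        staged.append(rel)
    return staged
```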