
OpenAleph

Python client for the OpenAleph data API.

Installation

pip install openaleph

Command-Line Interface

All commands share the same global options:

openaleph --host URL --api-key KEY [--retries N] <command> [options]
  • --host URL: OpenAleph API host URL (defaults to the OPAL_HOST environment variable)
  • --api-key KEY: API key for authentication (defaults to the OPAL_API_KEY environment variable)
  • --retries N: Number of retry attempts on server failure (default: 5)
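The precedence described above, where an explicit flag wins and the environment variable is the fallback, can be sketched in Python as follows. `resolve_settings` and `Settings` are hypothetical names used only for illustration. They are not part of the openaleph package, whose real option handling may differ in detail.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass
class Settings:
    host: Optional[str]
    api_key: Optional[str]
    retries: int


def resolve_settings(
    host: Optional[str] = None,
    api_key: Optional[str] = None,
    retries: int = 5,
    env: Mapping[str, str] = os.environ,
) -> Settings:
    """Hypothetical helper: explicit values win, otherwise fall back to OPAL_* env vars."""
    return Settings(
        host=host if host is not None else env.get("OPAL_HOST"),
        api_key=api_key if api_key is not None else env.get("OPAL_API_KEY"),
        retries=retries,
    )


if __name__ == "__main__":
    print(resolve_settings())
```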

crawldir

Recursively upload the contents of a folder to a collection, with optional pause/resume:

openaleph crawldir -f <foreign-id> [--resume] [--parallel N] [--noindex] [--casefile] [-l LANG] <path>
  • -f, --foreign-id: Foreign ID of the target collection (required)
  • --resume: Resume from the existing state database. Without this flag, any existing state file is deleted and the crawl starts from scratch.
  • -p, --parallel N: Number of parallel upload threads (default: 1)
  • -i, --noindex: Skip indexing on ingest
  • --casefile: Treat files as case files
  • -l, --language LANG: Language hint as an ISO 639 code; repeat the flag to give several languages
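If you drive crawldir from a script, a small helper that assembles the argument list keeps the flag spelling in one place. This is a minimal sketch: `crawldir_command` is a hypothetical helper, and it assumes OPAL_HOST and OPAL_API_KEY are already set in the environment, so no global options are passed.

```python
import shlex
from typing import List, Sequence


def crawldir_command(
    foreign_id: str,
    path: str,
    resume: bool = False,
    parallel: int = 1,
    noindex: bool = False,
    casefile: bool = False,
    languages: Sequence[str] = (),
) -> List[str]:
    """Hypothetical helper: build an argv list for `openaleph crawldir`."""
    cmd = ["openaleph", "crawldir", "-f", foreign_id]
    if resume:
        cmd.append("--resume")  # keep the state DB and skip already-uploaded files
    if parallel != 1:
        cmd += ["--parallel", str(parallel)]
    if noindex:
        cmd.append("--noindex")
    if casefile:
        cmd.append("--casefile")
    for lang in languages:
        cmd += ["-l", lang]  # -l is repeatable: one flag per language
    cmd.append(path)  # the folder to crawl comes last
    return cmd


if __name__ == "__main__":
    cmd = crawldir_command("my-investigation", "./documents", resume=True, parallel=4, languages=["eng", "deu"])
    print(shlex.join(cmd))
```

To actually run the upload, pass the list to `subprocess.run(cmd, check=True)`. Passing a list rather than a single string avoids shell-quoting problems with paths that contain spaces.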

fetchdir

Download all entities in a collection (or a single entity) into a folder tree:

openaleph fetchdir -f <foreign-id> [-e <entity-id>] [-p <path>] [--overwrite]

Other commands

  • reingest: Re-ingest all documents in a collection
  • reindex: Re-index all entities in a collection
  • delete: Delete a collection and its contents
  • flush: Delete all contents of a collection
  • write-entity: Index a single entity read from stdin
  • write-entities: Bulk-index entities read from stdin
  • stream-entities: Stream entities to stdout
  • entitysets: List entity sets
  • entitysetitems: List the items in an entity set
  • make-list: Create a new list entity set
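write-entity and write-entities read from stdin. The sketch below assumes the expected input is newline-delimited JSON with one FollowTheMoney entity (`id`, `schema`, `properties`) per line; check that assumption against your OpenAleph version before relying on it. The helper names, the `Person` schema choice and the sample records are hypothetical.

```python
import json
import sys
from typing import Dict, Iterable, Iterator, Optional, TextIO


def to_entities(rows: Iterable[Dict[str, str]]) -> Iterator[dict]:
    """Map simple records onto FollowTheMoney-style entity dicts (assumed input format)."""
    for row in rows:
        yield {
            "id": row["id"],  # should be unique within the collection
            "schema": "Person",
            "properties": {"name": [row["name"]]},  # FtM property values are lists
        }


def write_jsonl(entities: Iterable[dict], out: Optional[TextIO] = None) -> None:
    """Write one JSON object per line (newline-delimited JSON)."""
    out = out if out is not None else sys.stdout
    for entity in entities:
        out.write(json.dumps(entity) + "\n")


if __name__ == "__main__":
    people = [{"id": "person-1", "name": "Jane Doe"}, {"id": "person-2", "name": "John Roe"}]
    write_jsonl(to_entities(people))
```

Saved as, say, make_entities.py (a hypothetical file name), its output could be piped into `openaleph write-entities`, adding whichever target-collection option that command expects.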

State Persistence

When running crawldir, OpenAleph maintains a small SQLite database file in your crawl root:

<crawl-root>/.openaleph_crawl_state.db
  • Purpose: track which files have already been successfully uploaded.
  • Resume support:
    • Passing --resume skips any files recorded in this DB.
    • Omitting --resume deletes any existing state DB and starts fresh.
  • Thread-safe: uploads are recorded under a lock, so parallel upload threads can share the database safely.
  • Incremental updates: the database file stays in the crawl root, so you can add files to the local folder later and re-run crawldir with --resume to upload only the new ones.
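To make the resume behaviour concrete, here is a minimal sketch of a state store that follows the rules above: it deletes the database unless resuming, records successful uploads, and serialises access with a lock. The `CrawlState` class and its single-table schema are assumptions for illustration, not OpenAleph's actual implementation.

```python
import os
import sqlite3
import threading


class CrawlState:
    """Hypothetical resumable crawl-state store (not OpenAleph's real schema)."""

    FILENAME = ".openaleph_crawl_state.db"

    def __init__(self, crawl_root: str, resume: bool = False) -> None:
        self.path = os.path.join(crawl_root, self.FILENAME)
        if not resume and os.path.exists(self.path):
            os.remove(self.path)  # starting fresh discards earlier progress
        # check_same_thread=False lets worker threads share one connection;
        # the lock serialises every access to it.
        self.conn = sqlite3.connect(self.path, check_same_thread=False)
        self.lock = threading.Lock()
        with self.lock, self.conn:  # the connection context manager commits
            self.conn.execute("CREATE TABLE IF NOT EXISTS uploaded (relpath TEXT PRIMARY KEY)")

    def is_done(self, relpath: str) -> bool:
        with self.lock:
            row = self.conn.execute(
                "SELECT 1 FROM uploaded WHERE relpath = ?", (relpath,)
            ).fetchone()
        return row is not None

    def mark_done(self, relpath: str) -> None:
        with self.lock, self.conn:
            self.conn.execute("INSERT OR IGNORE INTO uploaded (relpath) VALUES (?)", (relpath,))

    def close(self) -> None:
        self.conn.close()
```

A crawler built this way would call `is_done` before enqueueing a file and `mark_done` only after the upload succeeds, so failed files are naturally retried on the next resumed run.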

Ignore File

You can create a file named:

<crawl-root>/.openalephignore

and list glob patterns for any files or directories you want to skip entirely:

# Skip hidden files
.*

# Common junk
.DS_Store
Thumbs.db

# Temporary directories
tmp/
build/

# Log files
*.log
  • Patterns are matched against the relative path of each file or folder.
  • A pattern ending in / only matches directories (and their contents).
  • Blank lines and lines beginning with # are ignored.
  • Anything matched here is never enqueued or uploaded.
  • The .openalephignore file itself and the state database file are ignored by default.
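The matching rules above can be approximated with Python's `fnmatch` module. This sketch rests on one stated assumption: a pattern matches if it fits either the whole relative path or any single path component, which is how `.*` can catch hidden files in subfolders. OpenAleph's own matcher may resolve edge cases differently, and `load_patterns` and `is_ignored` are hypothetical names.

```python
from fnmatch import fnmatchcase
from typing import List


def load_patterns(text: str) -> List[str]:
    """Parse .openalephignore content, skipping blank lines and # comments."""
    patterns = []
    for raw in text.splitlines():
        line = raw.strip()
        if line and not line.startswith("#"):
            patterns.append(line)
    return patterns


def is_ignored(relpath: str, is_dir: bool, patterns: List[str]) -> bool:
    """Check a '/'-separated path, relative to the crawl root, against the patterns.

    Assumption: a pattern may match the whole path or any single component.
    """
    parts = relpath.split("/")
    # Directories containing this path (plus the path itself if it is a directory).
    dir_parts = parts if is_dir else parts[:-1]
    dir_prefixes = ["/".join(dir_parts[: i + 1]) for i in range(len(dir_parts))]
    for pattern in patterns:
        if pattern.endswith("/"):
            # Trailing slash: only directories (and everything inside them) match.
            target = pattern.rstrip("/")
            candidates = dir_parts + dir_prefixes
        else:
            target = pattern
            candidates = [relpath] + parts
        if any(fnmatchcase(c, target) for c in candidates):
            return True
    return False
```

`fnmatchcase` is used instead of `fnmatch` so that matching is case-sensitive on every platform.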

Final Report

After a crawl completes, OpenAleph prints a summary to the console. If any uploads failed, it also writes (by default) a report to:

<crawl-root>/.openaleph-failed.txt

The report lists the relative path of each file whose upload ultimately failed, one per line, so you can investigate those failures or retry them.
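A short script can read the report back, for example to review failures before retrying. `read_failed` is a hypothetical helper, and the UTF-8 encoding of the report file is an assumption.

```python
import os
from typing import List


def read_failed(crawl_root: str, filename: str = ".openaleph-failed.txt") -> List[str]:
    """Return the relative paths listed in the failure report (empty if there is none)."""
    report = os.path.join(crawl_root, filename)
    if not os.path.exists(report):
        return []
    with open(report, encoding="utf-8") as fh:
        # One path per line; drop the newline but keep any other characters intact.
        return [line.rstrip("\n") for line in fh if line.strip()]


if __name__ == "__main__":
    for relpath in read_failed("."):
        print("failed:", relpath)
```

Because the state database records only successful uploads, re-running crawldir with --resume should pick up the failed files again (along with any new ones), as far as the behaviour described above suggests.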



Download files


Source Distribution

openaleph-1.0.1.tar.gz (17.2 kB)

Built Distribution


openaleph-1.0.1-py3-none-any.whl (17.5 kB)

File details

Details for the file openaleph-1.0.1.tar.gz.

File metadata

  • Download URL: openaleph-1.0.1.tar.gz
  • Upload date:
  • Size: 17.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for openaleph-1.0.1.tar.gz
Algorithm Hash digest
SHA256 33e1ce96ed9b621262942c20f7b7c2180dbf89b39efb64c59b6c47e5989a4268
MD5 07ba2e70c6a40b22b5a04df1ad005d42
BLAKE2b-256 c7f55ad5d00086464f72a33d1e342ececbf31bd47c41e1408918b022bf1c13be


File details

Details for the file openaleph-1.0.1-py3-none-any.whl.

File metadata

  • Download URL: openaleph-1.0.1-py3-none-any.whl
  • Upload date:
  • Size: 17.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.3

File hashes

Hashes for openaleph-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 7fcb6c87bf4621d8ae5f8736573c68d4672c79e6b7a00333b7126262a57676eb
MD5 0d5889ce0caed7e2ff01b1bdc819cbcd
BLAKE2b-256 a34988a838cf9b1cf9868f9f2ef05dcc1b0b14a8b277c35b2f93211f1819137a

