
BDRC Utilities


BDRC-UTIL

Overview

BDRC UTIL is a Python package containing modules for use by the Buddhist Digital Resource Center (BDRC). It is offered to the public under the MIT License. This document describes its contents and features.

At this time, the source repository is not publicly available.

Development

archive-ops uses the Python packages installed in archive-ops/venv. To set up that environment:

# be in project main dir
python -m venv venv
source venv/bin/activate
openpecha-266-fix-install.sh
pip install -r requirements.txt

Deployment

# be in project main dir
python -m setup bdist_wheel
# test
twine upload --verbose  -r testpypi dist/bdrc_util-x.MM.mm-py3-none-any.whl
# prod
twine upload --verbose  dist/bdrc_util-x.MM.mm-py3-none-any.whl

Installation

bdrc-util is published on PyPI (pypi.org): pip install bdrc-util

Debian requirements

The pip dependency mysqlclient needs this package (and its dependencies) to install: sudo apt install default-libmysqlclient-dev

MacOS requirements

The pip dependency mysqlclient needs this package (and its dependencies) to install: brew install mysql

Contents

Publicly available scripts

As defined in setup.py

locators

Maps a work and a destination parent to a specific directory using various BDRC mapping schemes
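As an illustration of what a mapping scheme can look like, the sketch below spreads works across subdirectories of a destination parent by hashing the work name. The two-character MD5 prefix and the names two_hex_bucket and locate_work are assumptions made for this example; they are not the locators module's API or a documented BDRC scheme.

```python
import hashlib
from pathlib import PurePosixPath


def two_hex_bucket(work_name: str) -> str:
    """Return the first two hex digits of the MD5 digest of the work name.

    Hypothetical bucketing rule, used here only to illustrate a mapping scheme.
    """
    return hashlib.md5(work_name.encode("utf-8")).hexdigest()[:2]


def locate_work(parent: str, work_name: str) -> PurePosixPath:
    """Map a destination parent and a work name to <parent>/<bucket>/<work_name>."""
    return PurePosixPath(parent) / two_hex_bucket(work_name) / work_name


# Example: locate_work("Works", "W12345") gives Works/<two hex digits>/W12345
```

Hashing spreads works roughly evenly over 256 subdirectories, and the bucket can be recomputed from the work name alone, so no lookup table is needed.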

migrate works

Scripts to migrate and log works into BDRC’s 2021 archival strategy

log_dip

Log creation and distribution of Distribution Information Packages (DIPs). DIP is an OAIS term to describe a unit of publication.

User Guides

log_dip

The command log_dip is intended for use by BDRC staff to instrument their publication activities. log_dip takes arguments from the shell and transfers them into a database table.

Synopsis

log_dip --help
usage: log_dip | -d DBAppSection:DbAppFile log_dip [OPTIONS] [dip_source_path] [dip_dest_path]

Logs a number of different publication strategies

positional arguments:
  source_path           Source path (optional) - string
  dest_path             Destination path (optional) - string

options:
  -h, --help            show this help message and exit
  -d DRSDBCONFIG, --drsDbConfig DRSDBCONFIG
                        specify section:configFileName
  -l {info,warning,error,debug,critical}, --log-level {info,warning,error,debug,critical}
                        choice values are from python logging module
  -a ACTIVITY_TYPE, --activity_type ACTIVITY_TYPE
                        Activity type
  -w WORK_NAME, --work_name WORK_NAME
                        work being distributed
  -i DIP_ID, --dip_id DIP_ID
                        ID to update
  -r ACTIVITY_RETURN_CODE, --activity_return_code ACTIVITY_RETURN_CODE
                        Integer result of operation.
  -b BEGIN_TIME, --begin_time BEGIN_TIME
                        time of beginning - yyyy-mm-dd hh:mm:ss bash format date +'%Y-%m-%d
                        %R:%S'
  -e END_TIME, --end_time END_TIME
                        time of end. Default is invocation time. yyyy-mm-dd hh:mm:ss bash format
                        date +'%Y-%m-%d %R:%S'
  -c COMMENT, --comment COMMENT
                        Any text up to 4GB in length
  -s DIP_SOURCE_PATH, --dip_source_path DIP_SOURCE_PATH
                        Source path (optional) - string
  -t DIP_DEST_PATH, --dip_dest_path DIP_DEST_PATH
                        Destination path (optional) - string
  -L, --resolve-sym-links
                        True to resolve file paths, false to accept input as is
  -n INVENTORY, --inventory INVENTORY
                        path to inventory (only used for ARCHIVE)

Argument structure

log_dip creates a database record that captures the beginning or end of a DIP event.

Every invocation returns an opaque identifier that references the record. In bash, capture it with command substitution, as in the first example below.

You can reference the record later in one of two ways:

  • passing in the id returned by the initial call (or any subsequent call):

dip_id=$(dip_log --drsDbConfig sec:some.config --begin_time "2021-05-11 01:23:45" --activity_type DRS --work_name W12345)

dip_log -d sec:some.config --activity_return_code 42 --end_time "2021-05-11 12:34:56" --dip_id $dip_id
  • using the work name, activity type, and begin time:

dip_log -d sec:some.config -b "2021-05-11 01:23:45" -a DRS -w W12345

dip_log -d sec:some.config -b "2021-05-11 01:23:45" -a DRS -w W12345 -r 42 -e "2021-05-11 12:34:56"

Both of the above examples perform the same function:

  1. log the start of a DRS job for work W12345 at “2021-05-11 01:23:45”

  2. log the end_time of the job at “2021-05-11 12:34:56”, with a return code of 42
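For scripts that drive log_dip from Python, the same two-step sequence can be sketched with the standard subprocess module. This assumes the command is installed as log_dip (the name used in the synopsis) and that it prints the record id on stdout, as the $(...) capture in the bash examples implies. The helper names log_dip_args and run_log_dip are hypothetical.

```python
import subprocess
from typing import List, Optional

LOG_DIP = "log_dip"  # command name from the synopsis; adjust if installed differently


def log_dip_args(db_config: str, *, dip_id: Optional[str] = None,
                 work_name: Optional[str] = None, activity_type: Optional[str] = None,
                 begin_time: Optional[str] = None, end_time: Optional[str] = None,
                 return_code: Optional[int] = None,
                 comment: Optional[str] = None) -> List[str]:
    """Build a log_dip argument list, passing only the options that were given."""
    args = [LOG_DIP, "--drsDbConfig", db_config]
    options = [
        ("--dip_id", dip_id),
        ("--work_name", work_name),
        ("--activity_type", activity_type),
        ("--begin_time", begin_time),
        ("--end_time", end_time),
        ("--activity_return_code", None if return_code is None else str(return_code)),
        ("--comment", comment),
    ]
    for flag, value in options:
        if value is not None:
            args += [flag, value]
    return args


def run_log_dip(args: List[str]) -> str:
    """Run log_dip and return what it prints on stdout (assumed to be the record id)."""
    result = subprocess.run(args, capture_output=True, text=True, check=True)
    return result.stdout.strip()


# Step 1: log the start of the job and keep the returned id.
# dip_id = run_log_dip(log_dip_args("sec:some.config", work_name="W12345",
#                                   activity_type="DRS", begin_time="2021-05-11 01:23:45"))
# Step 2: log the end time and return code against that id.
# run_log_dip(log_dip_args("sec:some.config", dip_id=dip_id,
#                          end_time="2021-05-11 12:34:56", return_code=42))
```

Passing an argument list rather than a single shell string means values containing spaces, such as the timestamps, need no extra quoting.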

Argument hints

  • To give an end time, you must identify the job, either with the id or with the (work_name, begin_time, activity_type) tuple.

  • You can add as much information as you want in one call. If you have already captured the begin time, you can log the begin time, end time, and return code in a single call. This is perfectly legal, though not best practice, because it eliminates the system’s ability to check for in-progress jobs:

dip_log -d sec:some.config -b "2021-05-11 01:23:45" -a DRS -w W12345 -r 42 -e "2021-05-11 12:34:56" -c "Hi Mom, I'm re-writing history"
  • Begin and end times must match a strict format. In a shell, generate a time in the format log_dip requires with: date +'%Y-%m-%d %R:%S' (GNU/Linux, or macOS with GNU coreutils). The format must be quoted because it contains a space.

  • You can update these DIP log properties:

    • comments

    • end time

    • operation return code

  • Because these fields identify the transaction, you cannot modify:

    • work name

    • begin time

    • activity type

    • dip_external_id (a read-only argument supplied by the caller of log_dip)
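When a Python script supplies the begin or end time, it can produce the same layout as date +'%Y-%m-%d %R:%S' with strftime (%R is shorthand for %H:%M) and reject malformed input before calling log_dip. Whether log_dip interprets these times as local time or UTC is not stated here; this sketch assumes local time, and the function names are hypothetical.

```python
from datetime import datetime
from typing import Optional

# Equivalent to the shell format date +'%Y-%m-%d %R:%S', since %R means %H:%M.
LOG_DIP_TIME_FORMAT = "%Y-%m-%d %H:%M:%S"


def log_dip_timestamp(when: Optional[datetime] = None) -> str:
    """Format a datetime (default: the current local time) for --begin_time/--end_time."""
    if when is None:
        when = datetime.now()
    return when.strftime(LOG_DIP_TIME_FORMAT)


def check_log_dip_timestamp(text: str) -> datetime:
    """Parse a timestamp string; raises ValueError if it does not match the format."""
    return datetime.strptime(text, LOG_DIP_TIME_FORMAT)


# log_dip_timestamp(datetime(2021, 5, 11, 1, 23, 45)) returns "2021-05-11 01:23:45"
```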

In this example, a record is created with a comment, and its comment field is then updated.

dip_log_id=$(dip_log -d sec:some.config -b "2021-05-11 01:23:45" -a DRS -w W12345 -r 42 -e "2021-05-11 12:34:56" -c "Experienced some discomfort")
dip_log -d sec:some.config -i "$dip_log_id" -c "But it passed."
  • Any property not given on the command line is preserved. (In the example above, the second call preserves the begin and end times of the DIP transaction.)

  • The comment field is free-form text of up to 4GB. You can store XML or JSON data in it for later use, such as error messages or summary information about the process or the objects being processed. Update: the deep-archive utility reads the comment field for coded data.
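Structured data can be kept in the comment field by serializing it to JSON and shell-quoting the whole command, so that quotes and spaces inside the JSON survive the shell. The keys used below are hypothetical; the format the deep-archive utility expects in the comment field is not documented here, so this sketch should not be assumed to match it.

```python
import json
import shlex
from typing import Any


def json_comment(**fields: Any) -> str:
    """Serialize arbitrary fields to compact JSON for log_dip's --comment option."""
    return json.dumps(fields, sort_keys=True, separators=(",", ":"))


def comment_update_command(db_config: str, dip_id: str, **fields: Any) -> str:
    """Return a shell-quoted log_dip command that updates only the comment of a record.

    No other properties are passed, so log_dip preserves them.
    """
    return shlex.join(["log_dip", "-d", db_config, "-i", dip_id, "-c", json_comment(**fields)])


# print(comment_update_command("sec:some.config", "17", files=12, errors=["0003.tif unreadable"]))
```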

Deep Archive and Inversion

Inversion

Version 1.0.2 of bdrc-util introduced the deep-archive utility, which sends individual image groups to Glacier Deep Archive. This allowed large works to be sent as separate, smaller segments. (It also allowed material that is not categorized by image group to be sent to Glacier.) The process packages all the media types (sources, archive, images) for an image group into one bagged zip file.
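The inversion can be pictured as regrouping files from work/media/image-group order into image-group/media order. The sketch below assumes a hypothetical on-disk layout of <work>/<media>/<image group dir>/... and writes a plain zip with Python's zipfile module; it omits the bagging step (for example, BagIt manifests), and the function name invert_image_group is illustrative only.

```python
import zipfile
from pathlib import Path
from typing import List

MEDIA_TYPES = ("sources", "archive", "images")  # media types named in the text


def invert_image_group(work_dir: Path, ig_dir: str, out_zip: Path) -> List[str]:
    """Pack every media type's copy of one image group into a single zip.

    Reads the hypothetical layout <work_dir>/<media>/<ig_dir>/... and stores entries
    as <ig_dir>/<media>/<relative path>: image group first, media type second.
    Returns the archive names written, in order.
    """
    written: List[str] = []
    with zipfile.ZipFile(out_zip, "w", compression=zipfile.ZIP_DEFLATED) as zf:
        for media in MEDIA_TYPES:
            src = work_dir / media / ig_dir
            if not src.is_dir():
                continue  # an image group need not exist under every media type
            for path in sorted(p for p in src.rglob("*") if p.is_file()):
                arcname = f"{ig_dir}/{media}/{path.relative_to(src).as_posix()}"
                zf.write(path, arcname)
                written.append(arcname)
    return written
```

Because each zip holds exactly one image group, a large work is shipped as several smaller archives rather than one.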

Sync and Deep Archive

archive-ops-1087 (sync by image group) specifies enhancements to the sync process so that it can sync fragments of image groups. README.md documents these requirements and provides examples.

API

A simple API, inspired by openpecha.buda.api, is provided as a central library for commonly used utilities, including Legacy Hack Image Group Translation.

TODO: Document API

To use it in your code, install it with: pip install 'bdrc-util>=0.9.44' (quote the requirement so the shell does not treat >= as a redirection).
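Legacy Hack Image Group Translation is not described further here. Purely as a hedged illustration, the sketch below assumes a rule of this shape: an image group id made of "I" followed by exactly four digits is stored on disk under the digits alone, any other id is kept as-is, and the folder is named <work>-<suffix>. Both the rule and the function names are assumptions, not taken from the bdrc-util source; check the API before relying on them.

```python
def legacy_image_group_suffix(image_group_id: str) -> str:
    """Hypothetical legacy rule: 'I' plus exactly four ASCII digits maps to the digits alone."""
    prefix, rest = image_group_id[:1], image_group_id[1:]
    if prefix == "I" and len(rest) == 4 and all(c in "0123456789" for c in rest):
        return rest
    return image_group_id


def image_group_folder(work_id: str, image_group_id: str) -> str:
    """Name of the on-disk folder for an image group under this assumed rule."""
    return f"{work_id}-{legacy_image_group_suffix(image_group_id)}"


# image_group_folder("W12345", "I1234")     -> "W12345-1234"
# image_group_folder("W12345", "I1KG12345") -> "W12345-I1KG12345"
```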

bdrc-util Changelog

Version  Commit    Comments
-------  --------  ---------------------------------------------------------------------
1.0.11   3b9ba53   SQLAlchemy 1.4 support for Airflow
1.0.10   7ff6a79   Fix default bucket name
1.0.9    d0d73f51  Fix do_archive_incremental parameter mismatch
1.0.5    df8de377  Release fixes
1.0.5    (many)    Integration fixes
1.0.4    (many)    Support volume-manifest-builder by image group
1.0.3    1dfef221  Silence deep archive empty file error
1.0.2    (many)    Invert works for deep archive
1.0.1    ccd9865   dip_log passes db config to ORM
0.9.48   9573f3c   Optional symlink resolution
0.9.47   192c43f4  Add s3pathlib to install requirements
0.9.46   e14b3a6   Decommission web in favor of api
0.9.45   89724ee   Raise pageSize for Get volumes
0.9.44   TBD       Move Resolvers to api
0.9.43   013242a   Caching to reduce load on server
0.9.42   146bc43a  Support buda-dld
0.9.41   0d01394   Print, don't return, from disk_ig_from_buda
0.9.40             Rename get_image_groups
0.9.39             Added measure archive fixity; shorten log file name
0.9.38             Added RST documentation to setup; added minimum requirement for bdrc-db-lib
0.9.34             Use external address for resolver
0.9.32   be754999  Create entry points for image group renaming
0.9.31   192eea17  (not released) Single entry point for image group renames
0.9.30   83c5062a  Add work size calculation to script



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

bdrc_util-1.0.11-py3-none-any.whl (58.1 kB)

Uploaded Python 3

File details

Details for the file bdrc_util-1.0.11-py3-none-any.whl.

File metadata

  • Download URL: bdrc_util-1.0.11-py3-none-any.whl
  • Upload date:
  • Size: 58.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/8.5.0 pkginfo/1.11.2 requests/2.32.3 requests-toolbelt/1.0.0 tqdm/4.67.0 CPython/3.10.13

File hashes

Hashes for bdrc_util-1.0.11-py3-none-any.whl
Algorithm Hash digest
SHA256 2ea9d2f8314246a990220c0df621a260ddab08c7fd55b134fcdab63d92b454ff
MD5 5c709a2b0add959ba80e6c418b31bbdf
BLAKE2b-256 b1e808f014bd327986a03dc22721eb7beda21b4eed64291ed407d47608101aec

