Utilities for the Laboratory Catalog and Archive Service of the Early Detection Research Network

LabCAS Utilities

This is a hodge-podge collection of various utilities for managing, reporting on, and maintaining instances of the Laboratory Catalog and Archive System (LabCAS).

📀 Installation

Using Python 3.9 or newer, create a virtual environment and install it:

python3 -m venv venv
venv/bin/pip install jpl.labcas.utils

🔧 The Utilities

The numerous utilities are quickly summarized below:

  • assign-uuids — finds Solr documents without a UUID and assigns one
  • backup-labcas — issues backup commands for the LabCAS Solr cores "collections", "datasets", and "files"
  • common-prefixes — allegedly finds common prefixes in FileLocation fields in Solr (but doesn't?)
  • date-report — writes to stdout a CSV of protocols and their dates, alongside the dates of the corresponding LabCAS collections
  • dcm-header-usage — shows usage of DICOM headers in the Lung_Team_Project_2 and Prostate_MRI collections, using a Google spreadsheet as input
  • delete-collection — deletes a collection (including all of its datasets and files) from LabCAS Solr
  • delete-datasets — deletes datasets from LabCAS Solr while producing a CSV of the corresponding files that will need to be deleted from disk
  • delete-field — deletes a field from documents in Solr
  • field-usage — tells what fields are in use by collections, datasets, and files in Solr and marks those that appear in collection and dataset .cfg files
  • fix-event-ids — repairs event IDs in Solr after publication of a dataset (or collection) that uses event IDs based on a LabCAS publish alias.json file
  • fix-patient-ids — overwrites the PatientID field in DICOM files with event IDs from Solr
  • fix-principals — sets the OwnerPrincipal fields of the "collections", "datasets", and "files" cores in Solr based on information in metadata .cfg files
  • mangle-headers — mangles DICOM headers according to Radka's specifications
  • mass-spec-fix — fixes misspelled "mass spectrometry" in Solr cores
  • missing-bbd-dcis — finds missing anonymized BBD and DCIS files not specified in a given .csv file
  • missing-event-ids — given a list of event IDs, reports which are missing from LabCAS Solr
  • populate-bbd-dcis — populates the BBD or DCIS files in Solr with filenames from a BBD or DCIS .csv file
  • replace-field — replaces a list field in Solr with a single new value
  • replace-fields — replaces a list field in Solr with multiple values
  • report — generates various reports: events, privacy, event correlation, availability, and patient IDs
  • report-fields — generates a report on requested fields
  • report-file-size — reports the total size of all files in LabCAS using Solr metadata
  • restructure-bbd-dcis — makes symlinks into the Validation and Discovery folders for BBD and DCIS data on disk
  • s3-report — generates a report on files, sub-folders, and the average number of files per sub-folder in S3
  • split-brsi — splits the contents of a gzip'd tar file into training and validation folders based on an input spreadsheet
  • sub-field — substitutes the value of a field in multiple documents
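
Several utilities in the list (replace-field, replace-fields, delete-field, sub-field) change field values on existing Solr documents. Solr's atomic update syntax is one standard way to do that: a `set` modifier replaces a field's value or values, and `set` with `null` removes the field. The sketch below only builds such request bodies; it is an assumption that these utilities work this way, and the document IDs and the field name `ExampleField` are hypothetical.

```python
import json
from typing import Any


def set_field(doc_id: str, field: str, value: Any) -> dict:
    """Build one Solr atomic update that replaces every value of `field`.

    A single value mirrors what replace-field describes, a list mirrors
    replace-fields, and None asks Solr to remove the field altogether.
    """
    # Assumes the core's uniqueKey field is named "id".
    return {"id": doc_id, field: {"set": value}}


def build_update_body(updates: list) -> str:
    """Serialize atomic updates as a JSON array for a core's /update handler."""
    return json.dumps(updates)


# Hypothetical document IDs and field name, for illustration only.
updates = [
    set_field("doc-1", "ExampleField", "new value"),
    set_field("doc-2", "ExampleField", ["first", "second"]),
    set_field("doc-3", "ExampleField", None),
]
print(build_update_body(updates))
```

Such a body would be POSTed with `Content-Type: application/json` to a core's update handler, for example `https://localhost:8984/solr/files/update?commit=true`. Solr only accepts atomic updates when the affected fields are stored or have docValues, so the schema has to allow it.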

Many of these utilities are one-offs, which is typical for LabCAS.

🔁 Looping

Some of these utilities loop over large collections of data, paginating through results and making updates as they go. You may have to run such a utility several times, until it reports that it updated no more documents.
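
One plausible reason repeated runs are needed (an assumption; the text above does not say why): if a utility pages through a query with a start offset while its own updates remove documents from that query's results, later pages shift and some documents are skipped. The in-memory simulation below uses no real Solr; the "uuid" fix-up and the document IDs are hypothetical. It shows one offset-paginated pass missing documents and repeated passes converging until a pass updates nothing.

```python
def run_pass(docs: dict, page_size: int) -> int:
    """One pass over documents lacking a "uuid", paging by offset (like Solr's start/rows).

    The query is re-evaluated for every page, as a fresh Solr request would be,
    so documents fixed on earlier pages drop out and the rest shift forward.
    """
    updated = 0
    start = 0
    while True:
        matching = sorted(k for k, doc in docs.items() if "uuid" not in doc)
        page = matching[start:start + page_size]
        if not page:
            break
        for doc_id in page:
            docs[doc_id]["uuid"] = f"uuid-{doc_id}"  # hypothetical fix-up
            updated += 1
        start += page_size
    return updated


def run_until_stable(docs: dict, page_size: int) -> list:
    """Repeat passes until one updates nothing; return the per-pass update counts."""
    counts = []
    while True:
        counts.append(run_pass(docs, page_size))
        if counts[-1] == 0:
            return counts


docs = {f"doc{i:02d}": {} for i in range(10)}
print(run_until_stable(docs, page_size=3))  # prints [6, 3, 1, 0]
```

The first pass fixes only 6 of the 10 documents because each fixed page shrinks the result set under the advancing offset. Re-querying from offset 0 on every page is one simple way a single pass could avoid those skips.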

🛤️ Solr and Tunneling

Many of these utilities assume Solr is at https://localhost:8984/solr/ with a self-signed certificate. You can override the location with the --solr option.

Feel free to tunnel these connections over ssh to a preferred Solr, or run the utilities directly on a host such as edrn-labcas, mcl-labcas, labcas-dev, and so forth.
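
A minimal standard-library sketch of talking to that default Solr location follows. It builds a `/select` URL for one of the cores and accepts the self-signed certificate by disabling verification, which is only reasonable for a trusted endpoint such as a local ssh tunnel (for instance `ssh -N -L 8984:localhost:8984 edrn-labcas`, assuming Solr listens on port 8984 on that host). The function names are hypothetical and not part of jpl.labcas.utils.

```python
import json
import ssl
import urllib.parse
import urllib.request

# Default location the utilities assume; their --solr option overrides it.
DEFAULT_SOLR = "https://localhost:8984/solr/"


def select_url(solr: str, core: str, query: str, rows: int = 10) -> str:
    """Build a Solr /select URL for `core` (e.g. "collections", "datasets", "files")."""
    base = solr if solr.endswith("/") else solr + "/"
    params = urllib.parse.urlencode({"q": query, "rows": rows, "wt": "json"})
    return f"{base}{core}/select?{params}"


def insecure_context() -> ssl.SSLContext:
    """SSL context that accepts a self-signed certificate (no verification at all)."""
    ctx = ssl.create_default_context()
    ctx.check_hostname = False  # must be cleared before verification can be disabled
    ctx.verify_mode = ssl.CERT_NONE
    return ctx


def count_docs(solr: str, core: str, query: str = "*:*") -> int:
    """Return numFound for `query`; needs a reachable Solr, e.g. through a tunnel."""
    url = select_url(solr, core, query, rows=0)
    with urllib.request.urlopen(url, context=insecure_context()) as resp:
        return json.load(resp)["response"]["numFound"]
```

With the tunnel open, `count_docs(DEFAULT_SOLR, "files")` would return the number of documents in the "files" core.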

🖥️ Development

To install from source:

git clone https://github.com/jpl-labcas/jpl.labcas.utils.git
cd jpl.labcas.utils
pre-commit install
python3 -m venv .venv
source .venv/bin/activate  # or activate.csh if you're a csh/tcsh user
pip install --editable .

To release to PyPI:

python3 -m build .
twine upload dist/*

👩‍🎨 Creators

The principal developer is:

To contact the team as a whole, email the Informatics Center.

📃 License

The project is licensed under the Apache version 2 license.

Download files

Source Distribution

jpl_labcas_utils-0.0.2.tar.gz (37.8 kB)

Built Distribution

jpl_labcas_utils-0.0.2-py3-none-any.whl (61.1 kB)

File details

Details for the file jpl_labcas_utils-0.0.2.tar.gz.

File metadata

  • Download URL: jpl_labcas_utils-0.0.2.tar.gz
  • Upload date:
  • Size: 37.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for jpl_labcas_utils-0.0.2.tar.gz

  • SHA256: df5bb96ed48472d233b5cf9f7a94245cef3f74acbb88dc67236bc15cbbb8d204
  • MD5: 8034f2240b4914a5d4286efa4becfaac
  • BLAKE2b-256: 4e75c6d37ffa4c4b85f51a5b2604775d80a8ba26fd956bc580a99f15a2956759

File details

Details for the file jpl_labcas_utils-0.0.2-py3-none-any.whl.

File metadata

File hashes

Hashes for jpl_labcas_utils-0.0.2-py3-none-any.whl

  • SHA256: 7cde4df6c081b1e2d13fabecc2f1644a4f1f14e909bebdabc4f86e673c185277
  • MD5: 5550669cc2fc1aa67effeda73a339ef5
  • BLAKE2b-256: 1f2b92f08adaf592c88b2ff6ca1e741be838442ba1d442b5cb88bfe0fe0ef5d7
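
To check a download against the SHA256 digests listed above, a short standard-library sketch like the following can help. It streams the file through `hashlib.sha256` in chunks; the assumption that the wheel was saved to the current directory under its published filename is illustrative.

```python
import hashlib
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path, expected_sha256: str) -> bool:
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


# SHA256 published above for the wheel.
WHEEL_SHA256 = "7cde4df6c081b1e2d13fabecc2f1644a4f1f14e909bebdabc4f86e673c185277"

# Assumes the wheel was downloaded into the current directory.
wheel = Path("jpl_labcas_utils-0.0.2-py3-none-any.whl")
if wheel.exists():
    print("OK" if verify(wheel, WHEEL_SHA256) else "MISMATCH")
```

pip can also enforce this at install time through its hash-checking mode, using `--hash=sha256:<digest>` entries in a requirements file.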
