
The CIDC data model and tools for working with it.


cidc-schemas

This repository contains formal definitions of the CIDC metadata model, written using JSON Schema syntax and vocabulary.

View documentation at https://nci-cidc.github.io/cidc-schemas/

Installation

To install the latest released version, run:

pip install nci-cidc-schemas

Development

Project Structure

  • cidc_schemas/ - a Python module for generating, validating, and reading manifest and assay templates.
    • schemas/ - JSON specifications defining the CIDC metadata model.
      • templates/ - schemas for generating and validating manifest and assay templates.
      • assays/ - schemas for defining assay data models.
      • artifacts/ - schemas for defining artifacts.
  • docs/ - the most recent build of the data model documentation, along with templates and scripts for re-generating the documentation.
  • template_examples/ - example populated Excel files for the template specifications in schemas/templates, along with .csv files auto-generated from those .xlsx files so that changes to them can be tracked transparently.
  • tests/ - tests for the cidc_schemas module.

Developer Setup

Install necessary dependencies.

pip install -r requirements.dev.txt

Install and configure pre-commit hooks.

pre-commit install

Updating dependencies

Use scripts/outdated_packages.py to see which packages have newer versions available and how old each new release is:

python scripts/outdated_packages.py

Only update packages whose latest version is at least 5 days old — this avoids pulling in releases that may be yanked or have undiscovered issues. The Age (days) column in the output indicates how long a release has been on PyPI.

Once you've identified packages to update, bump their versions in requirements.txt and requirements.dev.txt, then run the test suite to confirm nothing is broken.
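The 5-day rule above can be expressed as a small check. This is only a sketch of the rule, not the implementation of scripts/outdated_packages.py; the function names, the MIN_AGE_DAYS constant, and the ISO-8601 timestamp input are assumptions made for illustration.

```python
from datetime import datetime, timezone

# Threshold from the guideline above; the constant name is hypothetical.
MIN_AGE_DAYS = 5


def release_age_days(uploaded_at: str, now: datetime) -> int:
    """Return the whole days between an ISO-8601 upload timestamp and `now`."""
    uploaded = datetime.fromisoformat(uploaded_at)
    if uploaded.tzinfo is None:
        # Assumption: timestamps without an offset are treated as UTC.
        uploaded = uploaded.replace(tzinfo=timezone.utc)
    return (now - uploaded).days


def is_safe_to_update(uploaded_at: str, now: datetime) -> bool:
    """True when the release has been available for at least MIN_AGE_DAYS."""
    return release_age_days(uploaded_at, now) >= MIN_AGE_DAYS
```

Passing `now` in explicitly, as a timezone-aware datetime, keeps the check deterministic and easy to test.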

Running tests

This repository has unit tests in the tests folder. After installing dependencies, run them with:

pytest tests

Template Versioning

When modifying an existing assay or analysis template (in cidc_schemas/schemas/templates/assays/ or cidc_schemas/schemas/templates/analyses/), you must increment the "version" field in its _template.json file using Semantic Versioning (e.g., "1.0.0" -> "1.1.0").

This ensures the template version reflects that updates were made. A pre-commit hook is configured to automatically verify this version bump before allowing your commit to pass.
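To illustrate what "incrementing" means for a MAJOR.MINOR.PATCH version, here is a minimal sketch of a bump check. It is not the pre-commit hook's actual code; the function names are hypothetical, and pre-release or build suffixes are deliberately not handled.

```python
from typing import Tuple


def parse_semver(version: str) -> Tuple[int, int, int]:
    """Split a plain MAJOR.MINOR.PATCH string into a tuple of integers."""
    major, minor, patch = version.split(".")
    return int(major), int(minor), int(patch)


def is_version_bumped(old: str, new: str) -> bool:
    """True when `new` is strictly greater than `old`, compared numerically."""
    return parse_semver(new) > parse_semver(old)
```

Comparing integer tuples rather than raw strings matters: as strings, "1.10.0" sorts before "1.9.0", whereas numerically it is the later version.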

Building documentation

To build documentation from schema updates, run the following commands:

python setup.py install # install helpers from the cidc_schemas library
python docs/generate_docs.py

This will output the generated HTML documents in docs/docs. Once the updated docs are pushed and merged into master, they will be viewable at https://nci-cidc.github.io/cidc-schemas/.

Using the Command-Line Interface

This project comes with a command-line interface for validating schemas and generating/validating assay and manifest templates.

Install the CLI

Clone the repository and cd into it

git clone git@github.com:NCI-CIDC/cidc-schemas.git
cd cidc-schemas

Install the cidc_schemas package (this adds the cidc_schemas CLI to your console)

python setup.py install

Run cidc_schemas --help to see available options.

If you're making changes to the module and want those changes to be reflected in the CLI without reinstalling the cidc_schemas module every time, run

python3 -m cidc_schemas.cli [args]

Creating a new assay or analysis type

To create a new assay type, the most reliable approach is to find an existing assay and copy it.

Preferably, look at scrnaseq and copy exactly what it does. Then adjust the assay schema and template, and/or the analysis schema, for your particular assay.

Once you have made your changes and bumped the version of this repo, update api-gae. You should only need to copy what scrnaseq did in api-gae for files to show up on the portal. Make sure to bump the api-gae version, then update the api-gae version used in cloud-functions.

Finally, make sure to update the CLI tool to include the new assay.

There are a lot of gotchas and hidden parsing going on behind the scenes. Listing them all would be hard, so the practical advice is to follow an existing working template.

Be sure to regenerate the docs after creating your schema, so the new schema is added to the reference docs.

Generate templates

Create a template for a given template configuration.

cidc_schemas generate_template -m templates/manifests/pbmc_template.json -o pbmc.xlsx

Validate filled-out templates

Check that a populated template file is valid with respect to a template specification.

cidc_schemas validate_template -m templates/manifests/pbmc_template.json -x template_examples/pbmc_template.xlsx
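When several populated templates need checking, the same command can be scripted. The sketch below assumes the cidc_schemas CLI is installed and on your PATH; the helper names are hypothetical, and only the -m and -x flags shown above are used. How the CLI reports failures (exit code or printed output) is not documented here, so the raw result is returned for inspection.

```python
import subprocess
from typing import Iterable, List, Tuple


def build_validate_command(template_spec: str, xlsx_path: str) -> List[str]:
    """Build the argument list for one validate_template invocation."""
    return [
        "cidc_schemas", "validate_template",
        "-m", template_spec,
        "-x", xlsx_path,
    ]


def validate_all(pairs: Iterable[Tuple[str, str]]) -> List[subprocess.CompletedProcess]:
    """Run validate_template for each (template spec, populated .xlsx) pair."""
    results = []
    for template_spec, xlsx_path in pairs:
        command = build_validate_command(template_spec, xlsx_path)
        # capture_output keeps stdout/stderr so the caller can inspect them.
        results.append(subprocess.run(command, capture_output=True, text=True))
    return results
```

For example, validate_all([("templates/manifests/pbmc_template.json", "template_examples/pbmc_template.xlsx")]) would run the command shown above once and return its captured output.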

Validate JSON schemas

Check that a JSON schema conforms to the JSON Schema specifications.

cidc_schemas validate_schema -f shipping_core.json


