Package for biomedical data model and metadata ingress management

These details have been verified by PyPI

Maintainers

andrewelamb GiaJordan linglp loren.wolfe mialydefelice mnikolov

Development Status
- 4 - Beta
Environment
- Console
Intended Audience
- Science/Research
Programming Language
Topic
- Software Development :: Libraries :: Python Modules

Project description

Schematic

Introduction
Installation
Other Contribution Guidelines
- Update readthedocs documentation
Command Line Usage
Testing
- Updating Synapse test resources
Code Style
Contributors

Introduction

SCHEMATIC is an acronym for Schema Engine for Manifest Ingress and Curation. This Python-based infrastructure provides a schema-based metadata ingress ecosystem that is meant to streamline biomedical dataset annotation, metadata validation, and submission to a data repository for various data contributors.

Installation

Installation Requirements

Python 3.7.1 or higher

Note: You need to be a registered and certified user on synapse.org, and also have the right permissions to download the Google credentials files from Synapse.

Installation guide for data curator app

Create and activate a virtual environment within which you can install the package:

python3 -m venv .venv
source .venv/bin/activate

Note: Python 3 has built-in support for virtual environments (venv), so you no longer need to install virtualenv.
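The Python version requirement and the virtual environment step above can be checked from inside the interpreter. The following is a minimal sketch, not part of schematic itself: the function names `python_is_supported` and `in_virtual_environment` are hypothetical and chosen only for illustration. It relies on the standard library's `sys.version_info`, and on the fact that `sys.prefix` differs from `sys.base_prefix` inside an environment created by venv.

```python
import sys

# Minimum version stated in the installation requirements above.
MIN_PYTHON = (3, 7, 1)


def python_is_supported(version_info=sys.version_info) -> bool:
    """Return True if the given version is 3.7.1 or higher (hypothetical helper)."""
    return tuple(version_info[:3]) >= MIN_PYTHON


def in_virtual_environment() -> bool:
    """Return True when running inside an environment created by venv."""
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    print("Python version supported:", python_is_supported())
    print("Virtual environment active:", in_virtual_environment())
```

Running this script after `source .venv/bin/activate` is one way to confirm that the environment is active before installing the package.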

Install and update the package using pip:

python3 -m pip install schematicpy

If you run into the error "Failed building wheel for numpy", you may be able to resolve it by upgrading pip:

pip3 install --upgrade pip

Installation guide for developers/contributors

When contributing to this repository, please first discuss the change you wish to make via issue, email, or any other method with the owners of this repository before making a change.

Please note that we have a code of conduct. Please follow it in all your interactions with the project.

Development environment setup

Clone the schematic package repository.

git clone https://github.com/Sage-Bionetworks/schematic.git

Install poetry (version 1.2 or later) using either the official installer or pipx. If you have an older installation of Poetry, we recommend uninstalling it first.
Start the virtual environment by doing:

poetry shell

Install the dependencies by doing:

poetry install

This command installs the dependencies specified in poetry.lock. If this step takes a long time, go back to the Poetry installation step and check your version of Poetry. Alternatively, you could delete the lock file and regenerate it by running poetry install. (Use this method only as a last resort, because it would force other developers to change their development environments.)

Fill in the credential files. Note: If you won't interact with Synapse, you can skip this section.

There are two main configuration files that need to be edited: config.yml and .synapseConfig

Configure .synapseConfig File

Download a copy of the .synapseConfig file, open it in the editor of your choice, and edit the username and authtoken attributes under the authentication section.

Note: You can also consult the configparser documentation to see the format that .synapseConfig must follow. For instance:

[authentication]
username = ABC
authtoken = abc
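Because .synapseConfig follows the configparser format, a quick way to sanity-check an edited file is to parse it with Python's standard `configparser` module. This is a minimal sketch that only illustrates the file format; it is not how schematic or the Synapse client necessarily load the file, and the helper name `read_synapse_credentials` is hypothetical.

```python
import configparser

# Same placeholder values as the example above; replace with your own.
SAMPLE_SYNAPSE_CONFIG = """\
[authentication]
username = ABC
authtoken = abc
"""


def read_synapse_credentials(config_text: str):
    """Return (username, authtoken) from the [authentication] section.

    Hypothetical helper for illustration only.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    section = parser["authentication"]
    return section["username"], section["authtoken"]


if __name__ == "__main__":
    username, _token = read_synapse_credentials(SAMPLE_SYNAPSE_CONFIG)
    print("Configured username:", username)
```

A missing `[authentication]` section raises `KeyError`, which makes a malformed file easy to spot.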

Configure config.yml File

Description of config.yml attributes

definitions:
    synapse_config: "~/path/to/.synapseConfig"
    creds_path: "~/path/to/credentials.json"
    token_pickle: "~/path/to/token.pickle"
    service_acct_creds: "~/path/to/service_account_creds.json"

synapse:
    master_fileview: "syn23643253" # fileview of project with datasets on Synapse
    manifest_folder: "~/path/to/manifest_folder/" # manifests will be downloaded to this folder
    manifest_basename: "filename" # base name of the manifest file in the project dataset, without extension
    token_creds: "syn23643259" # synapse ID of credentials.json file
    service_acct_creds: "syn25171627" # synapse ID of service_account_creds.json file

manifest:
    title: "Patient Manifest " # title of metadata manifest file
    data_type: "Patient" # component or data type from the data model

model:
    input:
        location: "data/schema_org_schemas/example.jsonld" # path to JSON-LD data model
        file_type: "local" # only type "local" is supported currently
        validation_schema: "~/path/to/validation_schema.json" # path to custom JSON Validation Schema JSON file
        log_location: "~/path/to/log_folder/validation_schema.json" # auto-generated JSON Validation Schemas can be logged

Note: Paths can be specified relative to the config.yml file or as absolute paths.
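To make the note about relative and absolute paths concrete, here is a minimal sketch of one way such values could be resolved. It assumes that relative paths are interpreted against the directory containing config.yml, as the note states; the function `resolve_config_path` is hypothetical and is not taken from the schematic code base.

```python
from pathlib import Path


def resolve_config_path(config_file: str, value: str) -> Path:
    """Resolve a path value from config.yml (hypothetical helper).

    Absolute paths, including ones that start with "~", are returned
    as-is after expanding the home directory. Relative paths are joined
    to the directory that contains the config file.
    """
    path = Path(value).expanduser()
    if path.is_absolute():
        return path
    return Path(config_file).expanduser().parent / path


if __name__ == "__main__":
    # The config location below is an example, not a required layout.
    print(resolve_config_path("/project/config.yml",
                              "data/schema_org_schemas/example.jsonld"))
```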

Obtain Google credential Files

To obtain credentials.json and token.pickle, please run:

schematic init --config ~/path/to/config.yml

This should prompt you with a URL that will take you through Google OAuth. Your credentials.json and token.pickle will get automatically downloaded the first time you run this command.

Note: The credentials.json file is required when you are using OAuth2 to authenticate with the Google APIs.

For details about the steps involved in the OAuth2 authorization flow refer to the Credentials section in the docs/md/details document.

To obtain schematic_service_account_creds.json, please run:

schematic init --config ~/path/to/config.yml --auth service_account

Notes: Use the schematic_service_account_creds.json file for the service account mode of authentication (for Google services/APIs). Service accounts are special Google accounts that can be used by applications to access Google APIs programmatically via OAuth2.0, with the advantage being that they do not require human authorization.

Background: schematic uses Google's API to generate Google Sheets templates that users fill in to provide (meta)data. Most Google Sheets functionality can be authenticated with a service account. However, more complex Google Sheets functionality requires token-based authentication. As browser support for the flows that require token-based authentication diminishes, we hope to deprecate token-based authentication and keep only service account authentication in the future.

Development process instruction

For new features, bugs, enhancements

Pull the latest code from develop branch in the upstream repo
Checkout a new branch develop-<feature/fix-name> from the develop branch
Do development on branch develop-<feature/fix-name>
- You may need to ensure that the schematic poetry toml and lock files are compatible with your local environment
Add changed files for tracking and commit changes using best practices
Have granular commits: not "too many" file changes, and not hundreds of changed lines of code
Commits with work in progress are encouraged
- Add WIP to the beginning of the commit message for "Work In Progress" commits
Keep commit messages descriptive but less than a page long, see best practices
Push code to develop-<feature/fix-name> in upstream repo
Branch off develop-<feature/fix-name> if needed to work on multiple features associated with the same code base
After feature work is complete and before creating a PR to the develop branch in upstream
- Ensure that the code runs locally
- Test for logical correctness locally
- Wait for the git workflow to complete (e.g. tests are run) on GitHub
Create a PR from develop-<feature/fix-name> into the develop branch of the upstream repo
Request a code review on the PR
Once the code is approved, merge it into the develop branch
Delete the develop-<feature/fix-name> branch

Note: Make sure you have the latest version of the develop branch on your local machine.

Other Contribution Guidelines

Updating readthedocs documentation

cd docs
After making the relevant changes, run the make html command to regenerate the build folder.
Please contact the dev team to publish your updates.

Other helpful resources:

Update toml file and lock file

If you install external libraries by using poetry add <name of library>, please make sure that you include the pyproject.toml and poetry.lock files in your commit.

Reporting bugs or feature requests

You can use the Issues tab to create bug reports and feature requests. Providing enough detail for the developers to verify and troubleshoot your issue is paramount:

Provide a clear and descriptive title as well as a concise summary of the issue to identify the problem.
Describe the exact steps that reproduce the problem in as much detail as possible.
Describe the behavior you observed after following the steps and point out exactly what the problem with that behavior is.
Explain which behavior you expected to see instead and why.
Provide screenshots of the expected or actual behavior where applicable.

Command Line Usage

For more details on the command line interface, please see the documentation.

Testing

All code added to the client must have tests. The Python client uses pytest to run tests. The test code is located in the tests subdirectory.

You can run the test suite in the following way:

pytest -vs tests/
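As an illustration of the kind of test pytest collects from the tests subdirectory, here is a minimal sketch. The function `manifest_filename` is a hypothetical helper invented for this example and does not exist in schematic. pytest discovers functions whose names start with `test_` and reports any failing `assert`.

```python
def manifest_filename(basename: str, extension: str = "csv") -> str:
    """Build a manifest file name from a base name (hypothetical helper)."""
    return f"{basename}.{extension.lstrip('.')}"


def test_manifest_filename_adds_default_extension():
    # The base name is given without an extension, as in config.yml.
    assert manifest_filename("filename") == "filename.csv"


def test_manifest_filename_accepts_leading_dot():
    # A leading dot in the extension should not be duplicated.
    assert manifest_filename("filename", ".tsv") == "filename.tsv"
```

Saved as, for example, tests/test_manifest_filename.py (an assumed file name), these tests would be picked up by the pytest command shown above.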

Updating Synapse test resources

Duplicate the entity being updated (or folder if applicable).
Edit the duplicates (e.g. annotations, contents, name).
Update the test suite in your branch to use these duplicates, including the expected values in the test assertions.
Open a PR as per the usual process (see above).
Once the PR is merged, leave the original copies on Synapse to maintain support for feature branches that were forked from develop before your update.
- If the old copies are problematic and need to be removed immediately (e.g. contain sensitive data), proceed with the deletion and alert the other contributors that they need to merge the latest develop branch into their feature branches for their tests to work.

Code style

Please consult the Google Python style guide prior to contributing code to this project.
Be consistent and follow existing code conventions and spirit.

Contributors

Main contributors and developers:

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

schematicpy-22.10.1.tar.gz (341.3 kB)

Uploaded Oct 4, 2022 Source

Built Distribution

schematicpy-22.10.1-py3-none-any.whl (353.6 kB)

Uploaded Oct 4, 2022 Python 3

Hashes for schematicpy-22.10.1.tar.gz
Algorithm	Hash digest
SHA256	`ada345aad26b709a29b9b6048c67c6be014a0cce2c5d0435bdf8f5dea3caf32f`
MD5	`b52be5669412f855d239b5fd59294863`
BLAKE2b-256	`a278a31bf9b738ed2aef3999fadf005f71b2eaea57e6b49e17e4a3092f6b9db8`

Hashes for schematicpy-22.10.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`f37fade7cefd758e347274aae60579cc6697dfeb8c05418d35e938bb0b8dae71`
MD5	`730389cba0c554e93e6f1971723ee23a`
BLAKE2b-256	`abf7aaa5dd04bb802c48e2c288b1451adac65a11cf2563f5d1b347f7cdbb858e`
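
The published digests above can be used to check a downloaded file. The following is a minimal sketch using Python's standard `hashlib` module. The helper names `sha256_of_file` and `matches_published_hash` are hypothetical, and the script assumes the source distribution was saved in the current directory under its original file name.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for schematicpy-22.10.1.tar.gz.
EXPECTED_SDIST_SHA256 = (
    "ada345aad26b709a29b9b6048c67c6be014a0cce2c5d0435bdf8f5dea3caf32f"
)


def sha256_of_file(path, chunk_size: int = 65536) -> str:
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex: str) -> bool:
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


if __name__ == "__main__":
    archive = Path("schematicpy-22.10.1.tar.gz")  # assumed download location
    if archive.exists():
        print("Hash matches:", matches_published_hash(archive, EXPECTED_SDIST_SHA256))
    else:
        print("Archive not found:", archive)
```

Reading the file in chunks keeps memory use constant regardless of archive size.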