Metadata for User facility Template Transformations (MUTTs)
Introduction
The programs bundled in this repository automatically retrieve Biosample metadata records for studies submitted to NMDC through the NMDC Submission Portal, and convert the metadata into Excel spreadsheets that are accepted by DOE user facilities.
MUTTs User Documentation
The documentation and setup instructions in this section are for any user who wants to install the MUTTs Python package and use its transformation capabilities to convert data from the NMDC Submission Portal into an Excel spreadsheet that follows a template defined by the MUTTs JSON mapper file in use.
Prerequisites
- Python 3.12 or higher
- An NMDC user account with an API access token
To create an NMDC user account, click the 'ORCID LOGIN' button at the top right corner of the NMDC site and sign in with your ORCID credentials.
Setting up your API access token
This is required for running the examples in the Usage section below (after going through all the Installation steps).
Create a .env file in your working directory with the following environment variables:
echo "DATA_PORTAL_REFRESH_TOKEN=your_token_here" > .env
echo "SUBMISSION_PORTAL_BASE_URL=https://data.microbiomedata.org" >> .env
To get your access token:
- Visit https://data.microbiomedata.org/user
- Copy your Refresh Token
- Replace your_token_here in the .env file with your token
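Before running the tool, it can help to confirm that the .env file contains both variables and that the placeholder has been replaced. The following is a minimal sketch using only the Python standard library; the helper names `parse_env_text` and `missing_settings` are hypothetical and not part of MUTTs, and the parser assumes simple KEY=VALUE lines like the ones written by the echo commands above.

```python
from pathlib import Path

# The two variable names come from the setup instructions above.
REQUIRED_KEYS = ("DATA_PORTAL_REFRESH_TOKEN", "SUBMISSION_PORTAL_BASE_URL")


def parse_env_text(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        # Drop optional surrounding quotes around the value.
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def missing_settings(values: dict[str, str]) -> list[str]:
    """Return required keys that are absent, empty, or still the placeholder."""
    return [
        key
        for key in REQUIRED_KEYS
        if values.get(key, "") in ("", "your_token_here")
    ]


if __name__ == "__main__":
    env_path = Path(".env")
    text = env_path.read_text() if env_path.exists() else ""
    problems = missing_settings(parse_env_text(text))
    print("Missing or placeholder settings:", problems or "none")
```

This check only inspects the file contents; it does not contact the NMDC API, so a token that is present but expired would still pass.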
Installation
- Create a virtual environment (recommended)
python -m venv mutts-env
source mutts-env/bin/activate # On Windows: mutts-env\Scripts\activate
- Install the MUTTs package from PyPI
pip install mutts
- Download any of the MUTTs JSON mapper configuration files
Note: You are not required to use the predefined JSON mapper files in this repository. You can define your own custom JSON mapper files, as long as they follow a format similar to the ones defined in this repo.
Create a directory for your mapper files and download them from this repository:
mkdir input-files
cd input-files
Download the mapper files you need from the input-files directory:
- For EMSL: emsl_header.json
- For JGI Metagenome: jgi_mg_header.json or jgi_mg_header_v15.json
- For JGI Metatranscriptome: jgi_mt_header.json or jgi_mt_header_v15.json
Updating to the Latest Version
To ensure you have the latest features and bug fixes, you can upgrade the MUTTs package from PyPI:
pip install --upgrade mutts
To check your currently installed version:
pip show mutts
You can also install a specific version if needed:
pip install mutts==<version>
Usage
Run the mutts command with the required options. To see all available options, run:
mutts --help
Note: The examples below use the --submission option, which takes an NMDC Submission UUID as its value. You can find the UUID in the URL of the submission page when you open the submission in the Submission Portal.
For example:
https://data.microbiomedata.org/submission/<submission-uuid>/samples
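If you prefer to paste the whole URL rather than pick out the identifier by hand, the extraction can be sketched as follows. The function name `extract_submission_uuid` is hypothetical, the UUID in the demo is a made-up placeholder, and the sketch assumes the identifier is a standard UUID located directly after the `submission` path segment, as in the URL pattern above.

```python
import uuid
from urllib.parse import urlparse


def extract_submission_uuid(url: str) -> str:
    """Return the path segment that follows 'submission' in a portal URL."""
    segments = [s for s in urlparse(url).path.split("/") if s]
    try:
        candidate = segments[segments.index("submission") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"No submission ID found in URL: {url}") from None
    # Assumption: the ID is a standard UUID; uuid.UUID raises ValueError otherwise.
    return str(uuid.UUID(candidate))


if __name__ == "__main__":
    # Placeholder UUID for illustration only, not a real submission.
    example = (
        "https://data.microbiomedata.org/submission/"
        "123e4567-e89b-12d3-a456-426614174000/samples"
    )
    print(extract_submission_uuid(example))
```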
Example 1: Generate a JGI Metagenome spreadsheet
mutts --submission <submission-uuid> \
--unique-field samp_name \
--user-facility jgi_mg \
--mapper input-files/jgi_mg_header.json \
--output my-samples_jgi.xlsx
Example 2: Generate a JGI Metagenome v15 spreadsheet
mutts --submission <submission-uuid> \
--unique-field samp_name \
--user-facility jgi_mg \
--mapper input-files/jgi_mg_header_v15.json \
--output my-samples_jgi_v15.xlsx
Example 3: Generate an EMSL spreadsheet
mutts --submission <submission-uuid> \
--user-facility emsl \
--mapper input-files/emsl_header.json \
--header \
--unique-field samp_name \
--output my-samples_emsl.xlsx
Command Options
- -s, --submission: Your NMDC metadata submission UUID (required)
- -u, --user-facility: Target facility (required): emsl, jgi_mg, jgi_mg_lr, or jgi_mt
- -m, --mapper: Path to the JSON mapper file (required)
- -uf, --unique-field: Field to uniquely identify records (required, typically samp_name)
- -o, --output: Output Excel file path (required)
- -h, --header: Include headers in output (use for EMSL, omit for JGI)
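When generating spreadsheets for several submissions or facilities, it may be easier to assemble the command from a script. The sketch below builds the argument list from the long options documented above; `build_mutts_command` is a hypothetical helper, not part of the MUTTs package, and it only constructs the command without running it.

```python
import shlex


def build_mutts_command(
    submission: str,
    user_facility: str,
    mapper: str,
    output: str,
    unique_field: str = "samp_name",
    header: bool = False,
) -> list[str]:
    """Assemble a mutts argument list from the documented long options."""
    command = [
        "mutts",
        "--submission", submission,
        "--user-facility", user_facility,
        "--mapper", mapper,
        "--unique-field", unique_field,
        "--output", output,
    ]
    if header:
        # The options list says to pass --header for EMSL and omit it for JGI.
        command.append("--header")
    return command


if __name__ == "__main__":
    cmd = build_mutts_command(
        submission="<submission-uuid>",
        user_facility="emsl",
        mapper="input-files/emsl_header.json",
        output="my-samples_emsl.xlsx",
        header=True,
    )
    # Print a shell-quoted form for review before running anything.
    print(shlex.join(cmd))
    # To actually run it: subprocess.run(cmd, check=True)
```

Passing the arguments as a list (rather than one shell string) avoids quoting problems if a path contains spaces.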
MUTTs Developer Documentation
The documentation and setup instructions in this section are meant for developers who want to extend, improve, or build upon the current capabilities of the MUTTs software.
The software consists of two main components:
- JSON Mapper Configuration Files
  - Controls/specifies the mapping between columns from the NMDC Submission Portal and column names used in the output spreadsheets
  - Top-level keys indicate main headers in the output
  - Numbered keys add clarifying header information
  - The header keyword allows custom column names
  - The sub_port_mapping keyword specifies mappings between Submission Portal columns/slots (as dictated by the NMDC submission schema) and user facility template columns
  - Examples available in input-files/
- mutts CLI
  - Command-line application that performs the metadata conversion
  - Consumes mapper files and submission data as inputs
Software Requirements
Development Installation
- Clone this repository
git clone https://github.com/microbiomedata/metadata-for-user-facility-template-transformations.git
cd metadata-for-user-facility-template-transformations
- Install dependencies with Poetry
poetry install
This installs the mutts package in development mode and creates the mutts command-line tool.
- Set up your .env file
cp .env.example .env # if available, or create a new .env file
Add your NMDC API token and submission portal base URL:
DATA_PORTAL_REFRESH_TOKEN=your_token_here
SUBMISSION_PORTAL_BASE_URL=https://data.microbiomedata.org
Get your token from: https://data.microbiomedata.org/user
- Run the CLI in development mode
poetry run mutts --help
Creating Custom Mapper Files
To create a custom mapper for a new user facility, refer to the existing examples:
- emsl_header.json - EMSL configuration
- jgi_mg_header.json - JGI Metagenome configuration
- jgi_mt_header.json - JGI Metatranscriptome configuration
- jgi_mg_header_v15.json - JGI Metagenome v15 configuration
- jgi_mt_header_v15.json - JGI Metatranscriptome v15 configuration
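While drafting a custom mapper, it can be useful to list where the header and sub_port_mapping keywords (described under JSON Mapper Configuration Files) actually occur in the file. The sketch below walks any nested JSON structure and reports each occurrence of a keyword together with its path. It assumes only that the keywords appear as object keys somewhere in the file; the helper name `collect_keyword_values` and the sample content in the demo are hypothetical and are not taken from the real mapper files.

```python
import json


def collect_keyword_values(node, keyword: str, path: tuple = ()) -> list[tuple]:
    """Walk nested dicts/lists and return (path, value) for every `keyword` key."""
    found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == keyword:
                found.append((path, value))
            # Keep descending in case the keyword also appears deeper down.
            found.extend(collect_keyword_values(value, keyword, path + (key,)))
    elif isinstance(node, list):
        for index, item in enumerate(node):
            found.extend(collect_keyword_values(item, keyword, path + (index,)))
    return found


if __name__ == "__main__":
    # Hypothetical mapper content for illustration only; real files may differ.
    sample = json.loads(
        '{"Sample Name": {"header": "Sample Name*", "sub_port_mapping": "samp_name"}}'
    )
    for path, value in collect_keyword_values(sample, "sub_port_mapping"):
        print(" > ".join(map(str, path)), "->", value)
```

To inspect a real mapper, load it with `json.load` from a file such as input-files/emsl_header.json and pass the result to the same function.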