Metadata for User facility Template Transformations (MUTTs)

Introduction

The programs bundled in this repository automatically retrieve Biosample metadata records for studies submitted to NMDC through the NMDC Submission Portal, and convert the metadata into Excel spreadsheets that are accepted by DOE user facilities.


MUTTs User Documentation

The documentation and setup instructions in this section are for any user who wants to install the MUTTs Python package and use its transformation capabilities to convert data from the NMDC Submission Portal into an Excel spreadsheet. The spreadsheet follows the template defined by the MUTTs JSON mapper file you choose.

Prerequisites

You need an NMDC user account. To create one, click the 'ORCID LOGIN' button at the top right corner of the NMDC site and sign in with your ORCID credentials.

Setting up your API access token

This is required for running the examples in the Usage section below (after going through all the Installation steps).

Create a .env file in your working directory with the following environment variables:

echo "DATA_PORTAL_REFRESH_TOKEN=your_token_here" > .env
echo "SUBMISSION_PORTAL_BASE_URL=https://data.microbiomedata.org" >> .env

To get your access token:

  1. Visit https://data.microbiomedata.org/user
  2. Copy your Refresh Token
  3. Replace your_token_here in the .env file with your token
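
Before moving on, it can help to confirm that the .env file contains both variables and that the placeholder token has been replaced. The following is a minimal sketch of such a check, not part of MUTTs itself. The helper names read_env_file and find_env_problems are hypothetical, and the parser only handles simple KEY=VALUE lines; how MUTTs actually loads the file is not covered here.

```python
from pathlib import Path

# The two variables named in the setup instructions above.
REQUIRED_KEYS = ("DATA_PORTAL_REFRESH_TOKEN", "SUBMISSION_PORTAL_BASE_URL")


def read_env_file(path=".env"):
    """Parse simple KEY=VALUE lines, ignoring blank lines and # comments."""
    values = {}
    for raw_line in Path(path).read_text().splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip('"').strip("'")
    return values


def find_env_problems(values):
    """Return a list of problems; an empty list means the file looks usable."""
    problems = [f"{key} is missing or empty" for key in REQUIRED_KEYS if not values.get(key)]
    if values.get("DATA_PORTAL_REFRESH_TOKEN") == "your_token_here":
        problems.append("DATA_PORTAL_REFRESH_TOKEN still holds the placeholder value")
    return problems


if __name__ == "__main__":
    if Path(".env").exists():
        for problem in find_env_problems(read_env_file()):
            print(problem)
    else:
        print(".env not found in the current directory")
```

Run it from the same working directory as the .env file. It prints nothing when both variables are set and the token is no longer the placeholder.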

Installation

  1. Create a virtual environment (recommended)
python -m venv mutts-env
source mutts-env/bin/activate  # On Windows: mutts-env\Scripts\activate
  2. Install the MUTTs package from PyPI
pip install mutts
  3. Download any of the MUTTs JSON mapper configuration files

Note: You do not have to use the predefined JSON mapper files in this repository. You can define your own custom JSON mapper files, as long as they follow a format similar to the ones defined in this repo.

Create a directory for your mapper files:

mkdir input-files
cd input-files

Then download the mapper files you need from the input-files directory of this repository:

  • For EMSL: emsl_header.json
  • For JGI Metagenome: jgi_mg_header.json or jgi_mg_header_v15.json
  • For JGI Metatranscriptome: jgi_mt_header.json or jgi_mt_header_v15.json

Updating to the Latest Version

To ensure you have the latest features and bug fixes, you can upgrade the MUTTs package from PyPI:

pip install --upgrade mutts

To check your currently installed version:

pip show mutts

You can also install a specific version if needed:

pip install mutts==<version>
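
The installed version can also be read from Python, which is convenient inside scripts or notebooks. This is a small sketch that relies only on the standard library's importlib.metadata; the helper name installed_version is hypothetical.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package="mutts"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


print(installed_version() or "mutts is not installed in this environment")
```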

Usage

Run the mutts command with the required options. To list all available options:

mutts --help

Note: The examples below pass an NMDC Submission UUID to the --submission option. You can find this UUID in the URL of the submission's page when you open it in the Submission Portal.

The URL looks like this:

https://data.microbiomedata.org/submission/<submission-uuid>/samples
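
If you copy the whole URL from the browser, the UUID can be pulled out programmatically. The sketch below assumes the URL shape shown above, with the UUID as the path segment right after submission. The function name submission_id_from_url is hypothetical, and the UUID in the usage line is made up for illustration.

```python
import uuid
from urllib.parse import urlparse


def submission_id_from_url(url):
    """Return the UUID that follows the 'submission' path segment of a URL."""
    segments = [segment for segment in urlparse(url).path.split("/") if segment]
    if "submission" not in segments:
        raise ValueError(f"no 'submission' segment in {url!r}")
    index = segments.index("submission")
    if index + 1 >= len(segments):
        raise ValueError(f"nothing follows 'submission' in {url!r}")
    # uuid.UUID raises ValueError when the segment is not a well-formed UUID.
    return str(uuid.UUID(segments[index + 1]))


# Made-up UUID, used only to show the call.
example_url = "https://data.microbiomedata.org/submission/123e4567-e89b-12d3-a456-426614174000/samples"
print(submission_id_from_url(example_url))
```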

Example 1: Generate a JGI Metagenome spreadsheet

mutts --submission <submission-uuid> \
      --unique-field samp_name \
      --user-facility jgi_mg \
      --mapper input-files/jgi_mg_header.json \
      --output my-samples_jgi.xlsx

Example 2: Generate a JGI Metagenome v15 spreadsheet

mutts --submission <submission-uuid> \
      --unique-field samp_name \
      --user-facility jgi_mg \
      --mapper input-files/jgi_mg_header_v15.json \
      --output my-samples_jgi_v15.xlsx

Example 3: Generate an EMSL spreadsheet

mutts --submission <submission-uuid> \
      --user-facility emsl \
      --mapper input-files/emsl_header.json \
      --header \
      --unique-field samp_name \
      --output my-samples_emsl.xlsx

Command Options

  • -s, --submission: Your NMDC metadata submission UUID (required)
  • -u, --user-facility: Target facility (required): emsl, jgi_mg, jgi_mg_lr, or jgi_mt
  • -m, --mapper: Path to the JSON mapper file (required)
  • -uf, --unique-field: Field to uniquely identify records (required, typically samp_name)
  • -o, --output: Output Excel file path (required)
  • -h, --header: Include headers in output (use for EMSL, omit for JGI)
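
When the same submission needs spreadsheets for more than one facility, the command line can be assembled in a script. The following sketch builds argument lists from the options listed above and prints them. The function name build_mutts_command is hypothetical, and <submission-uuid> remains a placeholder you must replace.

```python
import shlex


def build_mutts_command(submission, user_facility, mapper, output,
                        unique_field="samp_name", header=False):
    """Assemble the argument list for one mutts run from the documented options."""
    command = [
        "mutts",
        "--submission", submission,
        "--user-facility", user_facility,
        "--mapper", mapper,
        "--unique-field", unique_field,
        "--output", output,
    ]
    if header:
        # Documented above as: use for EMSL, omit for JGI.
        command.append("--header")
    return command


if __name__ == "__main__":
    submission = "<submission-uuid>"  # placeholder, replace with your own UUID
    runs = [
        ("jgi_mg", "input-files/jgi_mg_header.json", "my-samples_jgi.xlsx", False),
        ("emsl", "input-files/emsl_header.json", "my-samples_emsl.xlsx", True),
    ]
    for facility, mapper, output, header in runs:
        command = build_mutts_command(submission, facility, mapper, output, header=header)
        # Printed for review; to execute instead, pass the list to
        # subprocess.run(command, check=True) from the standard library.
        print(shlex.join(command))
```

Passing the arguments as a list rather than a single string avoids shell quoting problems when paths contain spaces.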

MUTTs Developer Documentation

The documentation and setup instructions in this section are meant for developers who want to extend, improve, or build upon the current capabilities of the MUTTs software.

The software consists of two main components:

  1. JSON Mapper Configuration Files
     • Specify the mapping between columns from the NMDC Submission Portal and the column names used in the output spreadsheets
     • Top-level keys indicate main headers in the output
     • Numbered keys add clarifying header information
     • The header keyword allows custom column names
     • The sub_port_mapping keyword specifies mappings between Submission Portal columns/slots (as dictated by the NMDC submission schema) and user facility template columns
     • Examples available in input-files/
  2. mutts CLI
     • Command-line application that performs the metadata conversion
     • Consumes mapper files and submission data as inputs
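
To get a feel for a mapper file, it can be useful to list which Submission Portal slots each top-level header draws from. The sketch below walks a mapper and collects every value stored under a sub_port_mapping key. The toy mapper content is HYPOTHETICAL and invented purely for illustration; the real files in input-files/ may be nested differently, which is why the search is recursive rather than tied to one layout. The function names are also hypothetical.

```python
import json


def collect_sub_port_mappings(node, found=None):
    """Recursively gather every value stored under a 'sub_port_mapping' key."""
    if found is None:
        found = []
    if isinstance(node, dict):
        for key, value in node.items():
            if key == "sub_port_mapping":
                found.append(value)
            else:
                collect_sub_port_mappings(value, found)
    elif isinstance(node, list):
        for item in node:
            collect_sub_port_mappings(item, found)
    return found


def summarize_mapper(mapper):
    """Map each top-level key to the sub_port_mapping values found beneath it."""
    return {top_key: collect_sub_port_mappings(body) for top_key, body in mapper.items()}


# HYPOTHETICAL mapper content, not copied from any real MUTTs mapper file.
toy_mapper = json.loads("""
{
  "Sample Name": {"header": "Sample Name*", "sub_port_mapping": "samp_name"},
  "Notes": {"header": "Notes"}
}
""")

for column, slots in summarize_mapper(toy_mapper).items():
    print(column, "->", slots or "no Submission Portal mapping")
```

To inspect a real mapper, load it with json.load on an open file handle and pass the result to summarize_mapper.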

Software Requirements

Development Installation

  1. Clone this repository
git clone https://github.com/microbiomedata/metadata-for-user-facility-template-transformations.git
cd metadata-for-user-facility-template-transformations
  2. Install dependencies with Poetry
poetry install

This installs the mutts package in development mode and creates the mutts command-line tool.

  3. Set up your .env file
cp .env.example .env  # if available, or create a new .env file

Add your NMDC API token and submission portal base URL:

DATA_PORTAL_REFRESH_TOKEN=your_token_here
SUBMISSION_PORTAL_BASE_URL=https://data.microbiomedata.org

Get your token from: https://data.microbiomedata.org/user

  4. Run the CLI in development mode
poetry run mutts --help

Creating Custom Mapper Files

To create a custom mapper for a new user facility, refer to the existing examples in input-files/.
