Metadata for User facility Template Transformations

Metadata for User facility Template Transformations (MUTTs)

Introduction

The programs bundled in this repository automatically retrieve Biosample metadata records for studies submitted to NMDC through the NMDC Submission Portal, and convert the metadata into Excel spreadsheets that are accepted by DOE user facilities.


MUTTs User Documentation

The documentation and setup instructions in this section are meant for any user who would like to install the MUTTs Python package and use its transformation capabilities to convert data from the NMDC Submission Portal into an Excel spreadsheet. The spreadsheet follows a template determined by the MUTTs JSON mapper file that is used.

Prerequisites

You need an NMDC user account. To create one, click the 'ORCID LOGIN' button at the top right corner of the NMDC site and sign in with your ORCID credentials.

Setting up your API access token

This is required for running the examples in the Usage section below (after going through all the Installation steps).

Create a .env file in your working directory:

echo "DATA_PORTAL_REFRESH_TOKEN=your_token_here" > .env

To get your access token:

  1. Visit https://data.microbiomedata.org/user
  2. Copy your Refresh Token
  3. Replace your_token_here in the .env file with your token
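
Before running the CLI, it can help to confirm that the token is actually present in the .env file. The following is a minimal sketch that uses only the Python standard library. The helper name read_env_value is hypothetical and not part of MUTTs, and the parser only handles simple KEY=value lines.

```python
from pathlib import Path
from typing import Optional


def read_env_value(env_path: str, key: str) -> Optional[str]:
    """Return the value for `key` from a simple KEY=value file, or None.

    Hypothetical helper, not part of MUTTs. It skips blank lines and
    `#` comments, and strips optional surrounding quotes.
    """
    path = Path(env_path)
    if not path.is_file():
        return None
    for raw_line in path.read_text(encoding="utf-8").splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        name, _, value = line.partition("=")
        if name.strip() == key:
            return value.strip().strip("'\"")
    return None


if __name__ == "__main__":
    token = read_env_value(".env", "DATA_PORTAL_REFRESH_TOKEN")
    if not token or token == "your_token_here":
        print("DATA_PORTAL_REFRESH_TOKEN is missing or still a placeholder")
    else:
        print("DATA_PORTAL_REFRESH_TOKEN is set")
```

The check deliberately avoids printing the token itself, since it is a credential.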

Installation

  1. Create a virtual environment (recommended)
python -m venv mutts-env
source mutts-env/bin/activate  # On Windows: mutts-env\Scripts\activate
  2. Install the MUTTs package from PyPI
pip install mutts
  3. Download any of the MUTTs JSON mapper configuration files

Note: You are not required to use the predefined JSON mapper files in this repository. You can define your own custom JSON mapper files, as long as they follow a format similar to the ones defined in this repo.

Create a directory for your mapper files:

mkdir input-files
cd input-files

Download the mapper files you need from the input-files directory of this repository:

  • For EMSL: emsl_header.json
  • For JGI Metagenome: jgi_mg_header.json or jgi_mg_header_v15.json
  • For JGI Metatranscriptome: jgi_mt_header.json or jgi_mt_header_v15.json

Updating to the Latest Version

To ensure you have the latest features and bug fixes, you can upgrade the MUTTs package from PyPI:

pip install --upgrade mutts

To check your currently installed version:

pip show mutts

You can also install a specific version if needed:

pip install mutts==<version>
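
The installed version can also be read from within Python, which is convenient in scripts or notebooks. This is a minimal sketch using the standard-library importlib.metadata module. The function name installed_version is hypothetical, and it returns None when the package is not installed in the current environment.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(package: str = "mutts") -> Optional[str]:
    """Return the installed version of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    if found is None:
        print("mutts is not installed in this environment")
    else:
        print(f"mutts version: {found}")
```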

Usage

Run the mutts command with the required options. To list all available options:

mutts --help

Note: The examples below pass an NMDC Submission UUID to the --submission option. You can find this UUID in the URL of the submission's page when you open it in the Submission Portal.

The URL has the following form:

https://data.microbiomedata.org/submission/<submission-uuid>/samples
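
If you copy the full URL from the browser, the UUID can be pulled out programmatically. The sketch below assumes the path has the form /submission/<submission-uuid>/... shown above and that the identifier is a standard UUID. The function name submission_uuid_from_url is hypothetical and not part of MUTTs.

```python
import uuid
from urllib.parse import urlparse


def submission_uuid_from_url(url: str) -> str:
    """Extract the submission UUID from a Submission Portal URL.

    Assumes a path of the form /submission/<submission-uuid>/...
    Raises ValueError if the segment is missing or is not a valid UUID.
    """
    parts = [segment for segment in urlparse(url).path.split("/") if segment]
    if "submission" not in parts:
        raise ValueError(f"no 'submission' segment in URL: {url}")
    index = parts.index("submission")
    if index + 1 >= len(parts):
        raise ValueError(f"no identifier after 'submission' in URL: {url}")
    # uuid.UUID raises ValueError for strings that are not valid UUIDs.
    return str(uuid.UUID(parts[index + 1]))


if __name__ == "__main__":
    # The UUID below is a made-up placeholder, not a real submission.
    example = (
        "https://data.microbiomedata.org/submission/"
        "123e4567-e89b-12d3-a456-426614174000/samples"
    )
    print(submission_uuid_from_url(example))
```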

Example 1: Generate a JGI Metagenome spreadsheet

mutts --submission <submission-uuid> \
      --unique-field samp_name \
      --user-facility jgi_mg \
      --mapper input-files/jgi_mg_header.json \
      --output my-samples_jgi.xlsx

Example 2: Generate a JGI Metagenome v15 spreadsheet

mutts --submission <submission-uuid> \
      --unique-field samp_name \
      --user-facility jgi_mg \
      --mapper input-files/jgi_mg_header_v15.json \
      --output my-samples_jgi_v15.xlsx

Example 3: Generate an EMSL spreadsheet

mutts --submission <submission-uuid> \
      --user-facility emsl \
      --mapper input-files/emsl_header.json \
      --header \
      --unique-field samp_name \
      --output my-samples_emsl.xlsx

Command Options

  • -s, --submission: Your NMDC metadata submission UUID (required)
  • -u, --user-facility: Target facility (required): emsl, jgi_mg, jgi_mg_lr, or jgi_mt
  • -m, --mapper: Path to the JSON mapper file (required)
  • -uf, --unique-field: Field to uniquely identify records (required, typically samp_name)
  • -o, --output: Output Excel file path (required)
  • -h, --header: Include headers in output (use for EMSL, omit for JGI)
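
When generating spreadsheets for several submissions, it may be convenient to assemble the command line from Python. The sketch below builds an argument list using only the long options listed above. The function name build_mutts_command and the facility check are illustrative assumptions and not part of the MUTTs package.

```python
import shlex
from typing import List

# Facility names as listed under "Command Options".
FACILITIES = {"emsl", "jgi_mg", "jgi_mg_lr", "jgi_mt"}


def build_mutts_command(
    submission: str,
    user_facility: str,
    mapper: str,
    unique_field: str,
    output: str,
    header: bool = False,
) -> List[str]:
    """Return the mutts invocation as an argument list (hypothetical helper)."""
    if user_facility not in FACILITIES:
        raise ValueError(f"unknown user facility: {user_facility}")
    cmd = [
        "mutts",
        "--submission", submission,
        "--user-facility", user_facility,
        "--mapper", mapper,
        "--unique-field", unique_field,
        "--output", output,
    ]
    if header:
        # Per the option list: use for EMSL, omit for JGI.
        cmd.append("--header")
    return cmd


if __name__ == "__main__":
    cmd = build_mutts_command(
        "<submission-uuid>",
        "emsl",
        "input-files/emsl_header.json",
        "samp_name",
        "my-samples_emsl.xlsx",
        header=True,
    )
    # Print the shell-quoted command instead of executing it.
    print(shlex.join(cmd))
```

To actually run the command, the returned list could be passed to subprocess.run(cmd, check=True) from the directory that contains your .env file.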

MUTTs Developer Documentation

The documentation and setup instructions in this section are meant for developers who want to extend or improve the current capabilities of the MUTTs software.

The software consists of two main components:

  1. JSON Mapper Configuration Files
     • Specify the mapping between columns from the NMDC Submission Portal and column names used in the output spreadsheets
     • Top-level keys indicate main headers in the output
     • Numbered keys add clarifying header information
     • The header keyword allows custom column names
     • The sub_port_mapping keyword specifies mappings between Submission Portal columns/slots (as dictated by the NMDC submission schema) and user facility template columns
     • Examples available in input-files/
  2. mutts CLI
     • Command-line application that performs the metadata conversion
     • Consumes mapper files and submission data as inputs
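
When writing or reviewing a mapper file, a quick structural summary can show which top-level entries use the header and sub_port_mapping keywords. The sketch below is not the MUTTs loader and assumes only that a mapper is a JSON object whose top-level keys map to nested values. The function names and the sample mapper are hypothetical; consult the files in input-files/ for the real layout.

```python
import json
from typing import Any, Dict, List


def contains_key(node: Any, key: str) -> bool:
    """Return True if `key` appears anywhere inside nested dicts/lists."""
    if isinstance(node, dict):
        return key in node or any(contains_key(v, key) for v in node.values())
    if isinstance(node, list):
        return any(contains_key(v, key) for v in node)
    return False


def summarize_mapper(mapper: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Report, per top-level key, which mapper keywords appear beneath it."""
    return [
        {
            "top_level_key": top_key,
            "has_header": contains_key(entry, "header"),
            "has_sub_port_mapping": contains_key(entry, "sub_port_mapping"),
        }
        for top_key, entry in mapper.items()
    ]


def summarize_mapper_file(path: str) -> List[Dict[str, Any]]:
    """Load a mapper JSON file and summarize it."""
    with open(path, encoding="utf-8") as handle:
        return summarize_mapper(json.load(handle))


if __name__ == "__main__":
    # Hypothetical mapper content, for illustration only.
    sample = {
        "Sample Name": {"header": "Sample Name", "sub_port_mapping": "samp_name"},
        "Notes": {"1": "free text"},
    }
    for row in summarize_mapper(sample):
        print(row)
```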

Software Requirements

Development Installation

  1. Clone this repository
git clone https://github.com/microbiomedata/metadata-for-user-facility-template-transformations.git
cd metadata-for-user-facility-template-transformations
  2. Install dependencies with Poetry
poetry install

This installs the mutts package in development mode and creates the mutts command-line tool.

  3. Set up your .env file
cp .env.example .env  # if available, or create a new .env file

Add your NMDC API token:

DATA_PORTAL_REFRESH_TOKEN=your_token_here

Get your token from: https://data.microbiomedata.org/user

  4. Run the CLI in development mode
poetry run mutts --help

Creating Custom Mapper Files

To create a custom mapper for a new user facility, refer to the existing examples in input-files/.
