
Metadata for User facility Template Transformations (MUTTs)


Introduction

The programs bundled in this repository automatically retrieve Biosample metadata records for studies submitted to NMDC through the NMDC Submission Portal, and convert the metadata into Excel spreadsheets that are accepted by DOE user facilities.


MUTTs User Documentation

The documentation and setup instructions in this section are meant for any user who wants to install the MUTTs Python package and use its transformation capabilities to convert data from the NMDC Submission Portal into an Excel spreadsheet. The layout of that spreadsheet follows a user facility template defined by the MUTTs JSON mapper file you use.

Prerequisites

You need an NMDC user account. To create one, click the 'ORCID LOGIN' button at the top right corner of the NMDC site and sign in with your ORCID credentials.

Setting up your API access token

This is required for running the examples in the Usage section below (after going through all the Installation steps).

Create a .env file in your working directory:

echo "DATA_PORTAL_REFRESH_TOKEN=your_token_here" > .env

To get your access token:

  1. Visit https://data.microbiomedata.org/user
  2. Copy your Refresh Token
  3. Replace your_token_here in the .env file with your token
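If you want to confirm that the token is in place before running the examples, a small pre-flight check can read the .env file. This is a minimal sketch, not part of MUTTs: it assumes the file uses simple KEY=VALUE lines (as created by the echo command above), and the helper names read_env_file and token_is_configured are hypothetical.

```python
from __future__ import annotations

from pathlib import Path

PLACEHOLDER = "your_token_here"


def read_env_file(path: Path) -> dict[str, str]:
    """Parse simple KEY=VALUE lines from a .env file.

    Blank lines and lines starting with '#' are skipped, and quote
    characters around a value are stripped. This is NOT a full dotenv parser.
    """
    values: dict[str, str] = {}
    for raw in path.read_text(encoding="utf-8").splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip().strip("\"'")
    return values


def token_is_configured(path: Path = Path(".env")) -> bool:
    """Return True if DATA_PORTAL_REFRESH_TOKEN holds a non-placeholder value."""
    if not path.is_file():
        return False
    token = read_env_file(path).get("DATA_PORTAL_REFRESH_TOKEN", "")
    # The placeholder written by the echo command does not count as a token.
    return bool(token) and token != PLACEHOLDER


if __name__ == "__main__":
    print("Refresh token configured:", token_is_configured())
```

Run it from the same working directory as the .env file; it only inspects the file and does not contact the NMDC API.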

Installation

  1. Create a virtual environment (recommended)
python -m venv mutts-env
source mutts-env/bin/activate  # On Windows: mutts-env\Scripts\activate
  2. Install the MUTTs package from PyPI
pip install mutts
  3. Download any of the MUTTs JSON mapper configuration files

Note: You do not have to use the predefined JSON mapper files in this repository. You can define your own custom mapper files, as long as they follow the same format as the ones in this repository.

Create a directory for your mapper files and download them from this repository:

mkdir input-files
cd input-files

Download the mapper files you need from the input-files directory:

  • For EMSL: emsl_header.json
  • For JGI Metagenome: jgi_mg_header.json or jgi_mg_header_v15.json
  • For JGI Metatranscriptome: jgi_mt_header.json or jgi_mt_header_v15.json
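After installing the package and downloading mappers, a quick check can confirm that the mutts command is on your PATH and that at least one mapper exists per facility. This is a hypothetical helper, not part of MUTTs; it assumes you run it from the directory that contains input-files/ and that the file names match the list above (for the JGI facilities either version is enough).

```python
from __future__ import annotations

import shutil
from pathlib import Path

# Mapper files listed above, keyed by --user-facility value. For the JGI
# facilities either version of the mapper is sufficient.
MAPPER_FILES = {
    "emsl": ["emsl_header.json"],
    "jgi_mg": ["jgi_mg_header.json", "jgi_mg_header_v15.json"],
    "jgi_mt": ["jgi_mt_header.json", "jgi_mt_header_v15.json"],
}


def mutts_on_path() -> bool:
    """Return True if a `mutts` executable can be found on PATH."""
    return shutil.which("mutts") is not None


def available_mappers(mapper_dir: Path, facility: str) -> list[str]:
    """Return the mapper files for `facility` that exist in `mapper_dir`."""
    return [name for name in MAPPER_FILES[facility] if (mapper_dir / name).is_file()]


if __name__ == "__main__":
    print("mutts on PATH:", mutts_on_path())
    for facility in MAPPER_FILES:
        found = available_mappers(Path("input-files"), facility)
        print(f"{facility}: {', '.join(found) if found else 'no mapper file found'}")
```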

Usage

Run the mutts command with the required options. To list all available options, run:

mutts --help

Note: The examples below use the --submission option, which takes an NMDC Submission UUID as its value. You can find this UUID in the URL of the submission's page when you open it in the Submission Portal.

For example:

https://data.microbiomedata.org/submission/<submission-uuid>/samples
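If you copy the whole URL from your browser, the UUID can be pulled out programmatically. A minimal sketch using only the standard library; the function name extract_submission_uuid is hypothetical, and it assumes the URL path has the /submission/<submission-uuid>/... shape shown above.

```python
import uuid
from urllib.parse import urlparse


def extract_submission_uuid(url: str) -> str:
    """Return the submission UUID from a Submission Portal URL.

    Assumes the path contains /submission/<submission-uuid>/, as in the
    example URL above, and returns the UUID in canonical lowercase form.
    Raises ValueError if no valid UUID follows the "submission" segment.
    """
    segments = [s for s in urlparse(url).path.split("/") if s]
    try:
        candidate = segments[segments.index("submission") + 1]
    except (ValueError, IndexError):
        raise ValueError(f"no /submission/<uuid> segment in {url!r}") from None
    # uuid.UUID() itself raises ValueError for a malformed UUID string.
    return str(uuid.UUID(candidate))


if __name__ == "__main__":
    import sys
    print(extract_submission_uuid(sys.argv[1]))
```

The returned string is what you pass to mutts --submission.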

Example 1: Generate a JGI Metagenome spreadsheet

mutts --submission <submission-uuid> \
      --unique-field samp_name \
      --user-facility jgi_mg \
      --mapper input-files/jgi_mg_header.json \
      --output my-samples_jgi.xlsx

Example 2: Generate a JGI Metagenome v15 spreadsheet

mutts --submission <submission-uuid> \
      --unique-field samp_name \
      --user-facility jgi_mg \
      --mapper input-files/jgi_mg_header_v15.json \
      --output my-samples_jgi_v15.xlsx

Example 3: Generate an EMSL spreadsheet

mutts --submission <submission-uuid> \
      --user-facility emsl \
      --mapper input-files/emsl_header.json \
      --header \
      --unique-field samp_name \
      --output my-samples_emsl.xlsx
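To produce all three spreadsheets from one submission in a single step, the examples can be driven from a short script. This is a sketch, not a MUTTs feature: it uses only the options shown in the examples, calls the installed mutts command through subprocess, and the names JOBS, build_command and run_all are hypothetical. It defaults to a dry run that only prints the commands.

```python
from __future__ import annotations

import shutil
import subprocess

# (user facility, mapper file, output file, pass --header?) for Examples 1-3.
JOBS = [
    ("jgi_mg", "input-files/jgi_mg_header.json", "my-samples_jgi.xlsx", False),
    ("jgi_mg", "input-files/jgi_mg_header_v15.json", "my-samples_jgi_v15.xlsx", False),
    ("emsl", "input-files/emsl_header.json", "my-samples_emsl.xlsx", True),
]


def build_command(submission: str, facility: str, mapper: str,
                  output: str, header: bool) -> list[str]:
    """Assemble a mutts argument list from the documented long options."""
    cmd = [
        "mutts",
        "--submission", submission,
        "--user-facility", facility,
        "--mapper", mapper,
        "--unique-field", "samp_name",
        "--output", output,
    ]
    if header:
        cmd.append("--header")
    return cmd


def run_all(submission: str, dry_run: bool = True) -> list[list[str]]:
    """Print every command; execute them only when dry_run is False."""
    commands = [build_command(submission, *job) for job in JOBS]
    if not dry_run and shutil.which("mutts") is None:
        raise RuntimeError("mutts executable not found on PATH")
    for cmd in commands:
        print(" ".join(cmd))
        if not dry_run:
            # check=True stops the batch at the first failing conversion.
            subprocess.run(cmd, check=True)
    return commands


if __name__ == "__main__":
    run_all("<submission-uuid>", dry_run=True)
```

Passing the argument list directly to subprocess.run (rather than a shell string) avoids quoting problems with paths that contain spaces.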

Command Options

  • -s, --submission: Your NMDC metadata submission UUID (required)
  • -u, --user-facility: Target facility (required): emsl, jgi_mg, jgi_mg_lr, or jgi_mt
  • -m, --mapper: Path to the JSON mapper file (required)
  • -uf, --unique-field: Field to uniquely identify records (required, typically samp_name)
  • -o, --output: Output Excel file path (required)
  • -h, --header: Include headers in output (use for EMSL, omit for JGI)
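The rules in this list (five required options, a fixed set of facilities, --header for EMSL but not JGI) can be encoded as a pre-flight check for scripts that assemble mutts calls. A hypothetical sketch, not part of the CLI: the dictionary keys mirror the long option names with '-' replaced by '_', and mutts itself remains the authority on what it accepts.

```python
from __future__ import annotations

FACILITIES = {"emsl", "jgi_mg", "jgi_mg_lr", "jgi_mt"}
REQUIRED = ("submission", "user_facility", "mapper", "unique_field", "output")


def check_options(opts: dict) -> list[str]:
    """Return warnings for a set of mutts options (an empty list means OK)."""
    warnings = [f"missing required option: {key}" for key in REQUIRED if not opts.get(key)]
    facility = opts.get("user_facility")
    if facility and facility not in FACILITIES:
        warnings.append(f"unknown user facility: {facility}")
    header = bool(opts.get("header"))
    # Per the option list: use --header for EMSL, omit it for JGI.
    if facility == "emsl" and not header:
        warnings.append("EMSL output normally includes --header")
    elif facility in FACILITIES and facility.startswith("jgi") and header:
        warnings.append("JGI output normally omits --header")
    return warnings
```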

MUTTs Developer Documentation

The documentation and setup instructions in this section are meant mainly for developers who want to extend, improve, or build upon the current capabilities of the MUTTs software.

The software consists of two main components:

  1. JSON Mapper Configuration Files
  • Specifies the mapping between columns in the NMDC Submission Portal and the column names used in the output spreadsheets
  • Top-level keys indicate main headers in the output
  • Numbered keys add clarifying header information
  • The header keyword allows custom column names
  • The sub_port_mapping keyword specifies mappings between Submission Portal columns/slots (as dictated by the NMDC submission schema) and user facility template columns
  • Examples available in input-files/
  2. mutts CLI
  • Command-line application that performs the metadata conversion
  • Consumes mapper files and submission data as inputs
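To make the mapper description above more concrete, here is one possible reading of it in code. The mapper fragment is hypothetical and for illustration only (the real structure is defined by the files in input-files/ and may differ): top-level keys name the columns, an optional header keyword overrides the column name, numbered keys become extra header rows, and sub_port_mapping names the Submission Portal slot whose value fills the column. This is not the MUTTs implementation.

```python
from __future__ import annotations

import json

# Hypothetical mapper fragment for illustration only; consult the files in
# input-files/ for the real structure, which may differ.
EXAMPLE_MAPPER = json.loads("""
{
  "Sample Name": {"1": "Required", "sub_port_mapping": "samp_name"},
  "Depth": {"1": "meters", "header": "Depth (m)", "sub_port_mapping": "depth"}
}
""")


def header_rows(mapper: dict) -> list[list[str]]:
    """First row: column names; then one row per numbered key, in order."""
    names = [spec.get("header", key) for key, spec in mapper.items()]
    numbered = sorted({k for spec in mapper.values() for k in spec if k.isdigit()}, key=int)
    return [names] + [[str(spec.get(n, "")) for spec in mapper.values()] for n in numbered]


def data_row(mapper: dict, record: dict) -> list:
    """Pull each column's value from one Submission Portal record by slot name."""
    return [record.get(spec.get("sub_port_mapping"), "") for spec in mapper.values()]


if __name__ == "__main__":
    for row in header_rows(EXAMPLE_MAPPER):
        print(row)
    print(data_row(EXAMPLE_MAPPER, {"samp_name": "S1", "depth": 5}))
```

Missing slots become empty cells here; a real converter might instead report them.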

Software Requirements

Development Installation

  1. Clone this repository
git clone https://github.com/microbiomedata/metadata-for-user-facility-template-transformations.git
cd metadata-for-user-facility-template-transformations
  2. Install dependencies with Poetry
poetry install

This installs the mutts package in development mode and creates the mutts command-line tool.

  3. Set up your .env file
cp .env.example .env  # if available, or create a new .env file

Add your NMDC API token:

DATA_PORTAL_REFRESH_TOKEN=your_token_here

Get your token from: https://data.microbiomedata.org/user

  4. Run the CLI in development mode
poetry run mutts --help

Creating Custom Mapper Files

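To create a custom mapper for a new user facility, refer to the existing examples in the input-files/ directory, such as emsl_header.json, jgi_mg_header.json, and jgi_mt_header.json.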
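One way to start a custom mapper is to copy selected columns from one of the existing example files and then edit the result. The helper below is hypothetical, not part of MUTTs; it relies only on the documented fact that top-level keys are the output headers, and it copies each chosen entry unchanged so its header, numbered and sub_port_mapping keys are preserved.

```python
from __future__ import annotations

import json
from pathlib import Path


def derive_mapper(reference: Path, keep: list[str], out: Path) -> dict:
    """Copy the chosen top-level entries of an existing mapper to a new file.

    Entries are copied unchanged and written in the order given by `keep`.
    Raises KeyError if any requested column is not in the reference file.
    """
    source = json.loads(reference.read_text(encoding="utf-8"))
    unknown = [key for key in keep if key not in source]
    if unknown:
        raise KeyError(f"not found in {reference.name}: {unknown}")
    derived = {key: source[key] for key in keep}
    out.write_text(json.dumps(derived, indent=2) + "\n", encoding="utf-8")
    return derived


if __name__ == "__main__":
    reference = Path("input-files/emsl_header.json")
    # List the available columns; pass a subset of them as `keep`.
    print(list(json.loads(reference.read_text(encoding="utf-8"))))
```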



Download files


Source Distribution

mutts-1.0.3.tar.gz (31.9 kB)

Uploaded Source

Built Distribution


mutts-1.0.3-py3-none-any.whl (29.5 kB)

Uploaded Python 3

File details

Details for the file mutts-1.0.3.tar.gz.

File metadata

  • Download URL: mutts-1.0.3.tar.gz
  • Size: 31.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mutts-1.0.3.tar.gz
Algorithm Hash digest
SHA256 c5064d2b5dd13d3f10a4bd51e14937a9581dbd0d0eabafe7ff54b0bedbf0aefc
MD5 11fba9f07ca1f118d656f6a2234bb1f9
BLAKE2b-256 d149c9f91ddb442d5e43e572e14885c0f6f8b30bde98f0d989db21ca35dc1510


Provenance

The following attestation bundles were made for mutts-1.0.3.tar.gz:

Publisher: pypi-publish.yaml on microbiomedata/metadata-for-user-facility-template-transformations


File details

Details for the file mutts-1.0.3-py3-none-any.whl.

File metadata

  • Download URL: mutts-1.0.3-py3-none-any.whl
  • Size: 29.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.7

File hashes

Hashes for mutts-1.0.3-py3-none-any.whl
Algorithm Hash digest
SHA256 04f6af91586a6f766f6284d8fa8da2f0c3a9529ab40be9fffdc01c598b8eca7a
MD5 8d5a1e865b47af1b5ff15c27bbaa8d5a
BLAKE2b-256 b9c727727dff10235f217bd98de15787701ec7d17a2409c646818c44d3682da9
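To confirm that a downloaded distribution file matches the SHA256 digests published on this page, you can hash it locally. A minimal sketch using Python's standard hashlib module; the helper names sha256_of and verify are illustrative, and the digests are copied from the tables above.

```python
from __future__ import annotations

import hashlib
import sys
from pathlib import Path

# SHA256 digests published for the 1.0.3 release files.
EXPECTED_SHA256 = {
    "mutts-1.0.3.tar.gz": "c5064d2b5dd13d3f10a4bd51e14937a9581dbd0d0eabafe7ff54b0bedbf0aefc",
    "mutts-1.0.3-py3-none-any.whl": "04f6af91586a6f766f6284d8fa8da2f0c3a9529ab40be9fffdc01c598b8eca7a",
}


def sha256_of(path: Path, chunk_size: int = 65536) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True if `path` matches the published SHA256 digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected


if __name__ == "__main__":
    for arg in sys.argv[1:]:
        print(arg, "OK" if verify(Path(arg)) else "MISMATCH")
```

Alternatively, pip's hash-checking mode can enforce the digest at install time: put mutts==1.0.3 --hash=sha256:<digest> in a requirements file and run pip install --require-hashes -r requirements.txt. In that mode pip requires pinned hashes for every dependency as well.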


Provenance

The following attestation bundles were made for mutts-1.0.3-py3-none-any.whl:

Publisher: pypi-publish.yaml on microbiomedata/metadata-for-user-facility-template-transformations

