Skip to main content

A thin wrapper to retrieve sequence from JAMO at NERSC and on Dori.

Project description

jamofetch

thin wrapper on JAMO to allow sequence retrieval on Dori and at NERSC

Installation

Jamofetch requires python 3.8 or higher.

Jamofetch is available on PyPi: https://pypi.org/project/jamofetch/.

$ pip install jamofetch

Usage

See script docs/demo_script.py in the project for a sample script.

Create a JamoFetcher:

from jamofetch.jamofetch import JamoFetcher, JamoLibSeq

WAIT_INTERVAL = 10  # check if JAMO has provisioned sequence every 10 seconds
WAIT_MAX = 7200     # max wait for sequence is 7200 seconds or 2 hours

# directory where JAMO will link sequence, will be created if it doesn't exist
link_dir = '/tmp/sequence-links'

# create a fetcher
fetcher: JamoFetcher = JamoFetcher(link_dir=link_dir, wait_interval_secs=WAIT_INTERVAL, wait_max_secs=WAIT_MAX)

Note the following default configuration parameters for JamoLibSeq. Setting wait_max_secs to -1 causes the JamoFetcher instance to wait indefenitely for JAMO sequence.

class JamoFetcher():
    def __init__(self, link_dir='.', wait_interval_secs=10, wait_max_secs=-1):

Fetch sequence for a library, print path of symlink to sequence file and the real path to the file.

LIBRARY = 'NPUNN'
# Call JAMO to link sequence in the background.  Symlinks to the sequence
# files are created by JAMO in the directory specified by the
# link_dir parameter supplied to the JamoFetcher constructor.
lib_seq: JamoLibSeq = fetcher.fetch_lib_seq(LIBRARY)

printf(f"library name: {lib_seq.get_lib_name()}")
print(f"sequence symlink: {lib_seq.get_seq_path()}")
print(f"sequence real path: {lib_seq.get_real_path()}")

Check if sequence has been provided by JAMO, i.e. the symlink is not broken. Wait for sequence if it isn't ready.

if lib_seq.seq_exists():
    print("sequence ready")
else:
    # wait for JAMO
    real_path = lib_seq.get_real_path_wait()
    print(f"sequence ready at {real_path}")

Command Line Tool

Installing jamofetch with pip exposes a command line interface.

(venv) [dnscott@ln004 jamofetch]$ jamofetch  -h
usage: jamofetch [-h] [-l LIBRARY] [-d DIRECTORY] [-i INTERVAL] [-m MAX] [-w] [--logging LOGGING]

options:
  -h, --help            show this help message and exit
  -l LIBRARY, --library LIBRARY
                        library name(s) for which to retrieve sequence
  -d DIRECTORY, --directory DIRECTORY
                        directory where to link sequence, defaults to current directory. Directory will be created if it doesn't exit.
  -i INTERVAL, --interval INTERVAL
                        wait interval in seconds to check if sequence has been fetched, ignored if wait flag not set
  -m MAX, --max MAX     maximum time to wait for sequence in seconds, ignored if wait flag not set. Specify -1 to wait indefinetely.
  -w, --wait            wait for jamo to link sequence, output real path of linked sequence
  --logging LOGGING     logging level (specify DEBUG for verbose logging)
(venv) [dnscott@ln004 jamofetch]$ jamofetch -d data -l NPUNN -l NOOHG -l HOGH -w --max -1
fetching sequence:
NPUNN /global/dna/dm_archive/sdm/pacbio/00/27/47/pbio-2747.27352.bc1001_BAK8A_OA--bc1001_BAK8A_OA.ccs.fastq.gz BACKUP_COMPLETE 6391936239a7711d789a9380
NOOHG /global/dna/dm_archive/sdm/pacbio/00/26/91/pbio-2691.26653.bc1001_BAK8A_OA--bc1001_BAK8A_OA.ccs.fastq.gz RESTORED 6347dbb35bc59487d7e768d6
HOGH /global/dna/dm_archive/sdm/illumina/00/63/97/6397.2.44053.GGCTAC.fastq.gz RESTORE_IN_PROGRESS 51d52a82067c014cd6ef4f6f

sequence links:
HOGH symlink: /clusterfs/jgi/groups/dsi/homes/dnscott/git/jamofetch/data/HOGH.6397.2.44053.GGCTAC.fastq.gz
HOGH realpath: /clusterfs/jgi/scratch/dsi/aa/dm_archive/sdm/illumina/00/63/97/6397.2.44053.GGCTAC.fastq.gz
NOOHG symlink: /clusterfs/jgi/groups/dsi/homes/dnscott/git/jamofetch/data/NOOHG.pbio-2691.26653.bc1001_BAK8A_OA--bc1001_BAK8A_OA.ccs.fastq.gz
NOOHG realpath: /clusterfs/jgi/scratch/dsi/aa/dm_archive/sdm/pacbio/00/26/91/pbio-2691.26653.bc1001_BAK8A_OA--bc1001_BAK8A_OA.ccs.fastq.gz
NPUNN symlink: /clusterfs/jgi/groups/dsi/homes/dnscott/git/jamofetch/data/NPUNN.pbio-2747.27352.bc1001_BAK8A_OA--bc1001_BAK8A_OA.ccs.fastq.gz
NPUNN realpath: /clusterfs/jgi/scratch/dsi/aa/dm_archive/sdm/pacbio/00/27/47/pbio-2747.27352.bc1001_BAK8A_OA--bc1001_BAK8A_OA.ccs.fastq.gz

waiting for JAMO to provision sequence . . . .
NOOHG sequence ready
NPUNN sequence ready

Credits

Jamofetch uses Will Holtz's doejgi/jamo-dori Docker image to call JAMO on the Dori cluster.

jamofetch was created with cookiecutter and the py-pkgs-cookiecutter template.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

jamofetch-3.7.1.tar.gz (5.7 kB view hashes)

Uploaded Source

Built Distribution

jamofetch-3.7.1-py3-none-any.whl (6.6 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page