Download CRAM files from iRODS and convert them to FASTQ.

cram2fastq

A Python script that retrieves CRAM files from iRODS and converts them to FASTQ files. For internal use within the Sanger HPC environment.

Installation

pip install cram2fastq
# or
pip install git+https://github.com/clatworthylab/cram2fastq.git

Before using the tool, check that:

  1. samtools is available on your $PATH. If not:
export PATH=/nfs/team297/kt16/Softwares/samtools-1.11/bin:$PATH
  2. REF_PATH is set as well. If not:
export REF_PATH=/lustre/scratch117/core/sciops_repository/cram_cache/%2s/%2s/%s:/lustre/scratch118/core/sciops_repository/cram_cache/%2s/%2s/%s:URL=http:://refcache.dnapipelines.sanger.ac.uk::8000/%s

If you are setting this up for the first time, run the following once to make both settings persistent:

echo 'export PATH=/nfs/team297/kt16/Softwares/samtools-1.11/bin:$PATH' >> ~/.bashrc
echo 'export REF_PATH=/lustre/scratch117/core/sciops_repository/cram_cache/%2s/%2s/%s:/lustre/scratch118/core/sciops_repository/cram_cache/%2s/%2s/%s:URL=http:://refcache.dnapipelines.sanger.ac.uk::8000/%s' >> ~/.bashrc
source ~/.bashrc
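
The two checks above can also be done from Python before launching a conversion. The following is a minimal sketch, not part of cram2fastq itself: check_environment is a hypothetical helper name, and it only tests that samtools can be found and that REF_PATH is non-empty, not that the REF_PATH value is valid.

```python
import os
import shutil


def check_environment(env=None):
    """Return a list of problems; an empty list means both checks passed.

    Hypothetical helper for illustration, not part of cram2fastq.
    """
    env = os.environ if env is None else env
    problems = []
    # samtools must be discoverable through the PATH of the given environment.
    if shutil.which("samtools", path=env.get("PATH", "")) is None:
        problems.append("samtools not found on PATH")
    # REF_PATH only needs to be set and non-empty for this check.
    if not env.get("REF_PATH"):
        problems.append("REF_PATH is not set")
    return problems
```

Calling check_environment() with no arguments inspects the current shell environment; passing a dictionary makes it easy to try the check against other settings.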

Instructions

usage: cram2fastq.py [-h] [--meta META] [--study STUDY] [--outpath OUTPATH] [--bulk] [--bsub] [--DNAP] [--queue QUEUE] [--ncpu NCPU] [--mem MEM] [--dryrun]

optional arguments:
  -h, --help         show this help message and exit
  --meta META        txt/csv file containing the SANGER SAMPLE IDS as per manifest as a separate line for each sample.
  --study STUDY      Study ID. This will be the name of the output folder.
  --outpath OUTPATH  Path to the directory holding the converted files.
  --bulk             If passed, assume file is bulk data rather than 10x data.
  --bsub             If passed, submits as job to bsub.
  --DNAP             If passed, treats samples as created using semiautomated pipeline from DNAP (i.e. same ID for GEX/TCR/BCR). Output will be separated as folders.
  --queue QUEUE      bsub queue. Only works if --bsub is passed.
  --ncpu NCPU        bsub ncpu. Only works if --bsub is passed.
  --mem MEM          bsub memory. Only works if --bsub is passed.
  --dryrun           If passed, prints command rather than actually run.

After installation, a typical run looks like this:

cram2fastq.py --meta sampleids.txt --study test --outpath /path/to/folder --bulk

If you have many samples to process, add the --bsub option to submit the conversion as a job:

cram2fastq.py --meta sampleids.txt --study test --outpath /path/to/folder --bulk --bsub
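
If the command is assembled from another Python script, the same invocation can be built as an argument list. This is a sketch under assumptions: build_command is a hypothetical helper, and it uses only flags listed in the usage message above. Adding --dryrun should make the tool print the command rather than run it, as described in the help text.

```python
import shlex


def build_command(meta, study, outpath, bulk=False, bsub=False, dryrun=False):
    """Assemble a cram2fastq.py argument list (hypothetical helper)."""
    cmd = ["cram2fastq.py", "--meta", meta, "--study", study, "--outpath", outpath]
    # Boolean switches are appended only when requested.
    if bulk:
        cmd.append("--bulk")
    if bsub:
        cmd.append("--bsub")
    if dryrun:
        cmd.append("--dryrun")
    return cmd


cmd = build_command("sampleids.txt", "test", "/path/to/folder", bulk=True, dryrun=True)
# Show the command as it would be typed in a shell.
print(shlex.join(cmd))
```

The resulting list can be passed to subprocess.run(cmd, check=True) once the dry run looks right.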

sampleids.txt is a single-column .txt or .csv file of Sanger sample IDs, one per line, with no header. The IDs should correspond to the SANGER SAMPLE ID column in the manifest.

For example:

SangerSampleID00000001
SangerSampleID00000002
SangerSampleID00000003
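
A file in this format can be sanity-checked before a run. The sketch below is an illustration only: read_sample_ids is a hypothetical helper and is not how cram2fastq parses the file internally. It assumes one ID per line and, for a .csv, takes the first comma-separated field.

```python
from pathlib import Path


def read_sample_ids(meta_path):
    """Read one sample ID per line, skipping blank lines (hypothetical helper)."""
    ids = []
    for line in Path(meta_path).read_text().splitlines():
        # Keep only the first field so a one-column .csv is handled too.
        sample_id = line.split(",")[0].strip()
        if sample_id:
            ids.append(sample_id)
    return ids
```

Comparing len(ids) with len(set(ids)) is a quick way to spot duplicated IDs before submitting a large job.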

Output

When the run finishes, a folder named after the value passed to --study is created under --outpath. It contains the .fastq.gz files converted from the corresponding .cram files.
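
To confirm what was produced, the output folder can be listed from Python. This is a minimal sketch: list_fastqs is a hypothetical helper, and the recursive search is an assumption made because --DNAP output is described as being separated into folders; the exact layout is not documented here.

```python
from pathlib import Path


def list_fastqs(outpath, study):
    """Return .fastq.gz paths under <outpath>/<study>, relative to that folder."""
    study_dir = Path(outpath) / study
    # Search recursively in case files sit in per-sample or per-type subfolders.
    return sorted(
        p.relative_to(study_dir).as_posix() for p in study_dir.rglob("*.fastq.gz")
    )
```

For the example command above, list_fastqs("/path/to/folder", "test") would list the converted files for the test study.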
