Download CRAM files from iRODS and convert them to FASTQ.

cram2fastq

A Python script that retrieves CRAM files from iRODS and converts them to FASTQ files. For internal use within the Sanger HPC environment.

Installation

pip install cram2fastq
# or
pip install git+https://github.com/clatworthylab/cram2fastq.git

Before using the tool, check that:

  1. samtools is available on your $PATH. If not:
export PATH=/nfs/team297/kt16/Softwares/samtools-1.11/bin:$PATH
  2. REF_PATH is set. If not:
export REF_PATH=/lustre/scratch117/core/sciops_repository/cram_cache/%2s/%2s/%s:/lustre/scratch118/core/sciops_repository/cram_cache/%2s/%2s/%s:URL=http:://refcache.dnapipelines.sanger.ac.uk::8000/%s

If you are setting this up for the first time, add both exports to your ~/.bashrc. This only needs to be done once:

echo 'export PATH=/nfs/team297/kt16/Softwares/samtools-1.11/bin:$PATH' >> ~/.bashrc
echo 'export REF_PATH=/lustre/scratch117/core/sciops_repository/cram_cache/%2s/%2s/%s:/lustre/scratch118/core/sciops_repository/cram_cache/%2s/%2s/%s:URL=http:://refcache.dnapipelines.sanger.ac.uk::8000/%s' >> ~/.bashrc
source ~/.bashrc
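
The two prerequisites above can also be checked from Python before starting a run. The following is a minimal sketch and is not part of cram2fastq; the function name check_environment is hypothetical, and it only tests that samtools can be found and that REF_PATH is non-empty, not that the paths themselves are valid.

```python
import os
import shutil


def check_environment(env=None):
    """Return a list of problems with the samtools/REF_PATH prerequisites.

    Hypothetical helper for illustration; `env` defaults to os.environ.
    """
    env = os.environ if env is None else env
    problems = []
    # shutil.which searches the directories listed in the given PATH string.
    if shutil.which("samtools", path=env.get("PATH", "")) is None:
        problems.append("samtools not found on PATH")
    # An empty REF_PATH is treated the same as an unset one.
    if not env.get("REF_PATH"):
        problems.append("REF_PATH is not set")
    return problems


if __name__ == "__main__":
    for problem in check_environment():
        print("WARNING:", problem)
```

An empty result means both checks passed.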

Instructions

usage: cram2fastq.py [-h] [--meta META] [--study STUDY] [--outpath OUTPATH] [--bulk] [--bsub] [--DNAP] [--queue QUEUE] [--ncpu NCPU] [--mem MEM] [--dryrun]

optional arguments:
  -h, --help         show this help message and exit
  --meta META        txt/csv file containing the SANGER SAMPLE IDS as per manifest as a separate line for each sample.
  --study STUDY      Study ID. This will be the name of the output folder.
  --outpath OUTPATH  Path to the directory holding the converted files.
  --bulk             If passed, assume file is bulk data rather than 10x data.
  --bsub             If passed, submits as job to bsub.
  --DNAP             If passed, treats samples as created using semiautomated pipeline from DNAP (i.e. same ID for GEX/TCR/BCR). Output will be separated as folders.
  --queue QUEUE      bsub queue. Only works if --bsub is passed.
  --ncpu NCPU        bsub ncpu. Only works if --bsub is passed.
  --mem MEM          bsub memory. Only works if --bsub is passed.
  --dryrun           If passed, prints command rather than actually run.

After installation, a basic run looks like this:

cram2fastq.py --meta sampleids.txt --study test --outpath /path/to/folder --bulk

If you have many samples to process, add the --bsub option to submit the conversion as a job:

cram2fastq.py --meta sampleids.txt --study test --outpath /path/to/folder --bulk --bsub
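
When the tool is driven from another script, it may help to assemble the command from the documented flags and inspect it before running anything. The sketch below is one way to do that; build_command is a hypothetical helper, not part of cram2fastq, and it only covers a subset of the options listed in the usage text.

```python
import shlex


def build_command(meta, study, outpath, bulk=False, bsub=False, dryrun=False):
    """Assemble a cram2fastq.py argument list from the documented flags.

    Hypothetical helper for illustration only.
    """
    cmd = ["cram2fastq.py", "--meta", meta, "--study", study, "--outpath", outpath]
    if bulk:
        cmd.append("--bulk")
    if bsub:
        cmd.append("--bsub")
    if dryrun:
        # --dryrun makes the tool print the command instead of running it.
        cmd.append("--dryrun")
    return cmd


if __name__ == "__main__":
    cmd = build_command("sampleids.txt", "test", "/path/to/folder",
                        bulk=True, dryrun=True)
    # Show the command as it would be typed in a shell.
    print(shlex.join(cmd))
```

The returned list can be passed to subprocess.run(cmd, check=True) once the printed command looks right.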

sampleids.txt is a single-column .txt or .csv file listing the Sanger sample IDs, one per line, with no header. The IDs should match the SANGER SAMPLE ID column in the manifest.

For example:

SangerSampleID00000001
SangerSampleID00000002
SangerSampleID00000003
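
A file in this format can be written and sanity-checked with a few lines of Python. This is a sketch under the assumptions stated above (one ID per line, no header, single column); the function names are hypothetical and the checks are not necessarily the ones cram2fastq itself performs.

```python
import tempfile
from pathlib import Path


def write_sample_ids(ids, path):
    """Write one sample ID per line, with no header."""
    Path(path).write_text("\n".join(ids) + "\n")


def read_sample_ids(path):
    """Read a headerless, single-column file of sample IDs."""
    ids = []
    for line in Path(path).read_text().splitlines():
        line = line.strip()
        if not line:
            continue  # ignore blank lines
        if "," in line or "\t" in line:
            # More than one column suggests the wrong file was supplied.
            raise ValueError(f"expected a single column, got: {line!r}")
        ids.append(line)
    return ids


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        meta = Path(tmp) / "sampleids.txt"
        write_sample_ids(["SangerSampleID00000001", "SangerSampleID00000002"], meta)
        print(read_sample_ids(meta))
```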

Output

Once the run finishes, a folder named after the value passed to --study is created under --outpath, containing the .fastq.gz files converted from the corresponding .cram files.
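
To confirm that a run produced output, the study folder can be scanned for .fastq.gz files. The sketch below assumes only what is described above, namely that the files end up somewhere under <outpath>/<study>; the exact subfolder layout is not documented here, so it searches recursively. The function name list_fastqs is hypothetical.

```python
from pathlib import Path


def list_fastqs(outpath, study):
    """Return all .fastq.gz files found anywhere under <outpath>/<study>."""
    study_dir = Path(outpath) / study
    if not study_dir.is_dir():
        raise FileNotFoundError(f"no output folder at {study_dir}")
    # Recursive search, because the layout below the study folder is an unknown.
    return sorted(study_dir.rglob("*.fastq.gz"))


if __name__ == "__main__":
    for fastq in list_fastqs("/path/to/folder", "test"):
        print(fastq)
```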
