Download CRAM files from iRODS and convert them to FASTQ.
cram2fastq
A Python script to retrieve CRAM files from iRODS and convert them to FASTQ files. For internal use within the Sanger HPC environment.
Installation
pip install cram2fastq
# or
pip install git+https://github.com/clatworthylab/cram2fastq.git
Before using the tool, check that:
- samtools is available on your $PATH. If not:
export PATH=/nfs/team297/kt16/Softwares/samtools-1.11/bin:$PATH
- REF_PATH is set as well. If not:
export REF_PATH=/lustre/scratch117/core/sciops_repository/cram_cache/%2s/%2s/%s:/lustre/scratch118/core/sciops_repository/cram_cache/%2s/%2s/%s:URL=http:://refcache.dnapipelines.sanger.ac.uk::8000/%s
If setting it up for the first time, just do this once:
echo 'export PATH=/nfs/team297/kt16/Softwares/samtools-1.11/bin:$PATH' >> ~/.bashrc
echo 'export REF_PATH=/lustre/scratch117/core/sciops_repository/cram_cache/%2s/%2s/%s:/lustre/scratch118/core/sciops_repository/cram_cache/%2s/%2s/%s:URL=http:://refcache.dnapipelines.sanger.ac.uk::8000/%s' >> ~/.bashrc
source ~/.bashrc
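The two prerequisites above can also be checked from Python before launching a conversion. This is a minimal sketch, not part of cram2fastq: the function name `check_environment` is hypothetical, and it only checks that `samtools` is discoverable and that `REF_PATH` is non-empty, not that the cache locations are reachable.

```python
import os
import shutil


def check_environment(env=None):
    """Hypothetical helper: return a list of problems (empty means ready)."""
    env = os.environ if env is None else env
    problems = []
    # shutil.which searches the given PATH string for an executable file.
    if shutil.which("samtools", path=env.get("PATH", "")) is None:
        problems.append("samtools not found on PATH")
    # An unset or empty REF_PATH is treated as missing.
    if not env.get("REF_PATH"):
        problems.append("REF_PATH is not set")
    return problems
```

Calling `check_environment()` with no arguments inspects the current shell environment; an empty list means both checks passed.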
Instructions
usage: cram2fastq.py [-h] [--meta META] [--study STUDY] [--outpath OUTPATH]
                     [--bulk] [--bsub] [--DNAP] [--queue QUEUE] [--ncpu NCPU]
                     [--mem MEM] [--dryrun]

optional arguments:
  -h, --help         show this help message and exit
  --meta META        txt/csv file containing the SANGER SAMPLE IDS as per
                     manifest as a separate line for each sample.
  --study STUDY      Study ID. This will be the name of the output folder.
  --outpath OUTPATH  Path to the directory holding the converted files.
  --bulk             If passed, assume file is bulk data rather than 10x data.
  --bsub             If passed, submits as job to bsub.
  --DNAP             If passed, treats samples as created using semiautomated
                     pipeline from DNAP (i.e. same ID for GEX/TCR/BCR). Output
                     will be separated as folders.
  --queue QUEUE      bsub queue. Only works if --bsub is passed.
  --ncpu NCPU        bsub ncpu. Only works if --bsub is passed.
  --mem MEM          bsub memory. Only works if --bsub is passed.
  --dryrun           If passed, prints command rather than actually run.
After installation, it is as easy as doing:
cram2fastq.py --meta sampleids.txt --study test --outpath /path/to/folder --bulk
Adding the --bsub option submits the conversion as a job, which is useful if you have many samples to process.
cram2fastq.py --meta sampleids.txt --study test --outpath /path/to/folder --bulk --bsub
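If you drive the tool from a Python pipeline, the command lines above can be assembled programmatically. The sketch below is illustrative only: `build_command` is a hypothetical helper, and it uses just the flags listed in the usage text. Adding `--dryrun` lets you preview what would run before committing to it.

```python
import shlex


def build_command(meta, study, outpath, bulk=False, bsub=False, dryrun=False):
    """Hypothetical helper: assemble a cram2fastq.py argument list."""
    cmd = ["cram2fastq.py", "--meta", meta, "--study", study, "--outpath", outpath]
    # Boolean switches are appended only when requested.
    if bulk:
        cmd.append("--bulk")
    if bsub:
        cmd.append("--bsub")
    if dryrun:
        cmd.append("--dryrun")
    return cmd


cmd = build_command("sampleids.txt", "test", "/path/to/folder", bulk=True, dryrun=True)
# Quote each argument so the printed line can be pasted into a shell.
print(" ".join(shlex.quote(arg) for arg in cmd))
# cram2fastq.py --meta sampleids.txt --study test --outpath /path/to/folder --bulk --dryrun
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` once the dry-run output looks right.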
sampleids.txt is simply a single-column .txt or .csv file with the Sanger sample IDs (no header).
The IDs should correspond to the SANGER SAMPLE ID column in the manifest.
For example:
SangerSampleID00000001
SangerSampleID00000002
SangerSampleID00000003
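Before submitting, it can help to sanity-check the ID file against the format described above (one ID per line, no header). The following is a rough sketch under that assumption; `read_sample_ids` is a hypothetical helper, and skipping blank lines is a choice made here, not documented behaviour of cram2fastq.

```python
def read_sample_ids(path):
    """Hypothetical helper: load and sanity-check a single-column ID file."""
    ids = []
    with open(path) as handle:
        for line_number, line in enumerate(handle, start=1):
            sample_id = line.strip()
            if not sample_id:
                continue  # ignore blank lines such as a trailing newline
            # A comma or inner whitespace suggests extra columns or a
            # header row such as "SANGER SAMPLE ID".
            if "," in sample_id or len(sample_id.split()) > 1:
                raise ValueError(
                    f"line {line_number}: expected one ID, got {sample_id!r}"
                )
            ids.append(sample_id)
    return ids
```

For the example file above, `read_sample_ids("sampleids.txt")` would return the three IDs as a list of strings.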
Output
Once the run has finished, a folder named after the value you provided for --study will be created under --outpath, containing the appropriate .cram files converted to .fastq.gz files.
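To confirm that a run produced output, you can list the converted files under the study folder. This is a small sketch: `list_fastqs` is a hypothetical helper, and because the exact sub-folder layout inside the study folder is not described here, it searches recursively rather than assuming one.

```python
from pathlib import Path


def list_fastqs(outpath, study):
    """Hypothetical helper: return all .fastq.gz files under outpath/study."""
    study_dir = Path(outpath) / study
    if not study_dir.is_dir():
        raise FileNotFoundError(f"no study folder at {study_dir}")
    # rglob walks every sub-folder, so nested per-sample folders are covered.
    return sorted(study_dir.rglob("*.fastq.gz"))
```

For the earlier example, `list_fastqs("/path/to/folder", "test")` would return the paths of every `.fastq.gz` file found below `/path/to/folder/test`.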