Not your grandma's NGS analysis: software for analyzing FASTQs from REALLY libraries

ClaretBio's REALLY Library Processing Software

This software is for the basic informatic processing of sequencing data generated using ClaretBio's REALLY library prep kit with or without using unique molecular identifiers (UMIs).

Installation

This software can be installed as a Python package using the command pip install reallyrun.

Usage

The basic analysis can be run with really runsamples for standard libraries, or with really runsamples --umi for libraries with UMIs. The software takes in raw FASTQs, trims adapters, aligns to a user-specified reference transcriptome, and marks duplicates. For UMI-aware demultiplexing of REALLY libraries, please use our SRSLYumi Python package (more info at https://github.com/claretbio/SRSLYumi).

To run, this software requires an installation of conda. For speed, we recommend mamba, which is best installed from mambaforge. If you prefer to use standard conda, installation instructions can be found here.

Required Arguments

--starindex: a path to the STAR index if one exists

OR

--reference: a path to the reference genome you wish to have converted to a STAR transcriptome for alignment

--gtf: a path to the GTF of genes that corresponds to the reference you are aligning to

--refflat: a path to the refFlat of genes that correspond to the reference you are aligning to

--ribosomal: a path to the rRNA interval list that corresponds to the reference you are aligning to

--libraries or --libfile: the library IDs you would like analyzed in comma-separated format, or the path to a file with one ID per line, respectively

Optional Arguments

--fastqdir: a path to the directory containing the raw fastqs you wish to process (if not specified, defaults to current working directory)

--resultsdir: a path to the directory you would like the output to be in (if not specified, defaults to current working directory)

--indexdir: a path to the directory where you would like the STAR index to be created (if not specified, defaults to current working directory). Use one of --starindex or --indexdir, not both.
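The argument rules above can be checked before launching a long run. The following is only a sketch of one reading of those rules, not the package's own validation: the function name check_arguments and the dict-based input are hypothetical, and it covers only the constraints stated explicitly in this section.

```python
def check_arguments(args):
    """Return a list of problems for a dict of option name -> value (None if unset).

    Hypothetical pre-flight helper; it is not part of reallyrun itself.
    """
    def given(name):
        return args.get(name) is not None

    problems = []
    # An existing STAR index or a reference genome to build one from is needed.
    if not (given("starindex") or given("reference")):
        problems.append("provide --starindex or --reference")
    # --indexdir only makes sense when a new index is being created.
    if given("starindex") and given("indexdir"):
        problems.append("use one of --starindex or --indexdir, not both")
    # Library IDs come from the command line or from a file.
    if not (given("libraries") or given("libfile")):
        problems.append("provide --libraries or --libfile")
    return problems


print(check_arguments({"reference": "hg38.fa", "libraries": "lib1,lib2"}))
```

An empty list means none of the checked rules is violated; it does not guarantee that the paths exist or that the remaining arguments are complete.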

Helpful information about arguments

The library IDs provided should match the beginning of the fastq file names. For example, the library ID for the fastq files named lib1_R1.fastq.gz and lib1_R2.fastq.gz would be lib1. The IDs can be provided directly on the command line as a comma-separated list (--libraries lib1,lib2) or as a file that lists one library ID per line (--libfile libfile.txt).
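As an illustration of the naming convention, library IDs can be derived from a list of fastq file names. This is a minimal sketch that assumes files follow exactly the pattern in the example above (library ID followed by _R1.fastq.gz or _R2.fastq.gz); the helper names are hypothetical, and other naming schemes would need a different pattern.

```python
import re
from pathlib import Path

# Assumed pattern, taken from the example names lib1_R1.fastq.gz / lib1_R2.fastq.gz.
FASTQ_PATTERN = re.compile(r"^(?P<lib>.+)_R[12]\.fastq\.gz$")


def library_ids(filenames):
    """Return the sorted, de-duplicated library IDs found in fastq file names."""
    ids = set()
    for name in filenames:
        match = FASTQ_PATTERN.match(Path(name).name)
        if match:
            ids.add(match.group("lib"))
    return sorted(ids)


def libraries_argument(ids):
    """Format IDs as the comma-separated value expected by --libraries."""
    return ",".join(ids)


def write_libfile(ids, path):
    """Write one library ID per line, the layout described for --libfile."""
    Path(path).write_text("".join(f"{lib}\n" for lib in ids))


names = ["lib1_R1.fastq.gz", "lib1_R2.fastq.gz", "lib2_R1.fastq.gz", "notes.txt"]
print(libraries_argument(library_ids(names)))
```

To scan a real directory, the file names could come from something like Path(fastqdir).iterdir(); files that do not match the pattern are ignored.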

Details on GTF format can be found here

Picard's BedToIntervalList tool can be used to generate the ribosomal interval list. An example interval list can be found here.

For an example refFlat file for GRCh38, see refFlat.txt.gz at https://hgdownload.cse.ucsc.edu/goldenPath/hg38/database/.

Example Command

really runsamples --fastqdir /home/user/fastqfiles \
--resultsdir /home/user/amazing-results \
--reference /home/user/data/hg38.fa \
--gtf /home/user/data/hg38.gtf \
--refflat /home/user/data/hg38_refflat.txt \
--ribosomal /home/user/data/hg38_rrna.interval_list \
--libraries lib1,lib2,lib3
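When runs are scripted, the same command can be assembled programmatically. The sketch below only builds the argument list from the options documented above; build_command is a hypothetical helper, and placing --umi directly after runsamples simply mirrors the usage shown earlier.

```python
import shlex


def build_command(options, umi=False):
    """Assemble the argv list for `really runsamples` from option name -> value."""
    cmd = ["really", "runsamples"]
    if umi:
        cmd.append("--umi")
    for name, value in options.items():
        cmd += [f"--{name}", str(value)]
    return cmd


cmd = build_command(
    {
        "fastqdir": "/home/user/fastqfiles",
        "resultsdir": "/home/user/amazing-results",
        "reference": "/home/user/data/hg38.fa",
        "gtf": "/home/user/data/hg38.gtf",
        "refflat": "/home/user/data/hg38_refflat.txt",
        "ribosomal": "/home/user/data/hg38_rrna.interval_list",
        "libraries": "lib1,lib2,lib3",
    }
)
# Print the command for inspection; to launch it, pass the list to
# subprocess.run(cmd, check=True) in an environment where reallyrun is installed.
print(shlex.join(cmd))
```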

For reproducibility and to ensure appropriate tool versions, we use Snakemake wrappers for many of the tools in this pipeline. These are often slow to set up the first time they are used, so your first run of the software may take a long time. Don't worry, this is totally normal!

Project details



This version

0.1
