A standardized R-loop-mapping pipeline

RLPipes

RLPipes is an upstream workflow for R-loop-mapping data.

The primary outputs of the pipeline are:

  1. Coverage (.bw) tracks
  2. Peaks (.broadpeak) files
  3. Alignment (.bam) files
  4. RLSeq report (.html and .rda) files

Following RLPipes, the RLSeq R package can be used for more fine-grained downstream analysis.
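As a rough illustration of the output types listed above, the following Python sketch groups the files found under a run directory by extension. It is not part of RLPipes: the function name `collect_outputs` is hypothetical, and the directory layout RLPipes actually produces is not assumed here, so the search is simply recursive.

```python
from collections import defaultdict
from pathlib import Path

# Extensions taken from the list of primary outputs above.
OUTPUT_KINDS = {
    ".bw": "coverage",
    ".broadpeak": "peaks",
    ".bam": "alignment",
    ".html": "report",
    ".rda": "report",
}


def collect_outputs(run_dir):
    """Group files under run_dir by output kind, based only on file extension."""
    found = defaultdict(list)
    for path in sorted(Path(run_dir).rglob("*")):
        # Compare case-insensitively so ".broadPeak" also matches.
        kind = OUTPUT_KINDS.get(path.suffix.lower())
        if kind and path.is_file():
            found[kind].append(path.name)
    return dict(found)
```

Calling `collect_outputs("rlpipes_out/")` after a finished run would return a dictionary such as `{"peaks": [...], "coverage": [...]}`, containing only the kinds that were actually found.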

Install

The preferred installation method is mamba; conda also works but is slower:

mamba create -n rlpipes -c conda-forge -c bioconda rlpipes
conda activate rlpipes

Using pip

RLPipes can also be installed with pip, but its system dependencies must still be installed separately. The following steps create a conda environment with those dependencies and then install RLPipes into it:

git clone https://github.com/Bishop-Laboratory/RLPipes.git
cd RLPipes/
conda install -c conda-forge mamba -y
mamba env create -f rlpipes.yml --force
conda activate rlpipes
python -m pip install -e .

Basic Usage

To run RLPipes, you will need a samples.csv file that describes your samples. Here is an example file provided for testing purposes:

experiment,control
SRX113814,
SRX1025890,SRX1025893
SRX1025899,
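If the sample sheet is generated programmatically, a minimal Python sketch such as the one below can produce the same file content. The helper name `format_samples` is hypothetical and not part of RLPipes; the only assumptions are the two column names shown above and that a missing control is written as an empty field.

```python
import csv
import io


def format_samples(rows):
    """Return CSV text with "experiment" and "control" columns.

    rows is a list of (experiment, control) pairs; control may be None.
    """
    buffer = io.StringIO()
    writer = csv.writer(buffer, lineterminator="\n")
    writer.writerow(["experiment", "control"])
    for experiment, control in rows:
        writer.writerow([experiment, control or ""])
    return buffer.getvalue()


rows = [
    ("SRX113814", None),
    ("SRX1025890", "SRX1025893"),
    ("SRX1025899", None),
]
print(format_samples(rows), end="")
```

The returned text can then be saved, for example with `pathlib.Path("samples.csv").write_text(...)`, and passed to `RLPipes build`.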

The basic usage of RLPipes follows a three-step process: build, check, and run.

Build

RLPipes build generates a config.json file that controls the underlying snakemake workflow.

RLPipes build -m DRIP rlpipes_out/ tests/test_data/samples.csv

Output:

Success! RSeq has been initialized at the specified directory: rlpipes_out/

Run 'RLPipes check rlpipes_out/' to verify the configuration.

Check

Verifies that the run will succeed and generates a plot of the workflow jobs.

RLPipes check rlpipes_out/

Output:

Success! The DAG has been generated successfully. You can view it here: rlpipes_out/dag.png

Run 'RLPipes run rlpipes_out/' to execute the workflow.

Run

Executes the workflow rules.

RLPipes run rlpipes_out/

If multiple cores are available, they can be specified using the --threads/-t option.

RLPipes run -t 30 rlpipes_out/
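The three steps can also be chained from a script. The sketch below is one possible wrapper, not an official interface: the function names `rlpipes_commands` and `run_all` are hypothetical, and only the commands and options shown above (`build -m`, `check`, `run -t`) are used. It defaults to a dry run that prints the commands without executing them.

```python
import shlex
import subprocess


def rlpipes_commands(run_dir, samples, mode=None, threads=1):
    """Return the build, check, and run commands as argument lists."""
    build = ["RLPipes", "build"]
    if mode:
        build += ["-m", mode]
    build += [run_dir, samples]
    check = ["RLPipes", "check", run_dir]
    run = ["RLPipes", "run", "-t", str(threads), run_dir]
    return [build, check, run]


def run_all(commands, dry_run=True):
    """Print each command; execute it only when dry_run is False."""
    for command in commands:
        print(shlex.join(command))
        if not dry_run:
            # check=True stops the chain at the first failing step.
            subprocess.run(command, check=True)


run_all(rlpipes_commands("rlpipes_out/", "tests/test_data/samples.csv", mode="DRIP", threads=30))
```

Passing `dry_run=False` executes the steps in order and raises `subprocess.CalledProcessError` if one of them exits with a non-zero status.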

Usage Reference

Top-level usage:

Usage: RLPipes [OPTIONS] COMMAND [ARGS]...

  RSeq: An R-loop mapping pipeline with built-in QC.

Options:
  --version  Show the version and exit.
  --help     Show this message and exit.

Commands:
  build  Configure an RSeq workflow.
  check  Validate an RSeq workflow.
  run    Execute an RSeq workflow.
  

Build

Usage: RLPipes build [OPTIONS] RUN_DIR SAMPLES

  Configure an RLPipes workflow.

  RUN_DIR: Directory for RLPipes Execution. Will be created if it does not
  exist.

  SAMPLES: A CSV file with at least one column "experiment" that provides the
  path to either local fastq files, bam files, or public sample accessions
  (SRX or GSM). Input controls should be in the "control" column.

  If providing paired-end fastq files, enter: "exp_1.fastq~exp_2.fastq".

  The file may also include "genome" and "mode" columns. These will override
  the -g, -m, and -n options.

  "genome" (-g/--genome) is not required if providing public data accessions.

  Example #1: "RLPipes build -m DRIP outdir/ samples.csv"

  samples.csv:

      experiment
      SRX113812
      SRX113813

  Example #2: "RLPipes build outdir/ samples.csv"

  samples.csv:

      experiment, control, genome, mode
      qDRIP_siGL3_1.fq~qDRIP_siGL3_2.fq, , hg38, qDRIP
      DRIPc_3T3.fq, Input_3T3.fq, mm10, DRIPc

Options:
  -m, --mode TEXT    The type of sequencing (e.g., "DRIP"). The available
                     options are currently: DRIP, DRIPc, qDRIP, sDRIP, ssDRIP,
                     R-ChIP, RR-ChIP, RDIP, S1-DRIP, DRIVE, RNH-CnR, and MapR
  -g, --genome TEXT  UCSC genome for samples (e.g., 'hg38'). Not required if
                     providing public data accessions.
  -n, --name TEXT    Sample names for use in output report. By default,
                     inferred from inputs.
  --help             Show this message and exit.
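The "experiment" column accepts several kinds of entries (accessions, bam files, single fastq files, and paired-end fastq files joined with "~"). The Python sketch below shows one way such entries could be told apart before running `RLPipes build`. It is an illustration only and does not reproduce RLPipes's own parsing: the function name, the accession pattern, and the list of fastq suffixes are all assumptions.

```python
import re

# Assumption: accessions are "SRX" or "GSM" followed by digits.
ACCESSION = re.compile(r"^(SRX|GSM)\d+$")
# Assumption: these suffixes identify fastq files.
FASTQ_SUFFIXES = (".fastq", ".fq", ".fastq.gz", ".fq.gz")


def classify_experiment(entry):
    """Return (kind, parts) for one value of the "experiment" column."""
    entry = entry.strip()
    if ACCESSION.match(entry):
        return "accession", [entry]
    # Paired-end mates are written as "exp_1.fastq~exp_2.fastq".
    parts = entry.split("~")
    if len(parts) == 2 and all(p.endswith(FASTQ_SUFFIXES) for p in parts):
        return "paired-end fastq", parts
    if len(parts) == 1 and entry.endswith(FASTQ_SUFFIXES):
        return "single-end fastq", parts
    if len(parts) == 1 and entry.endswith(".bam"):
        return "bam", parts
    raise ValueError(f"Unrecognized experiment entry: {entry!r}")


for value in ["SRX113812", "qDRIP_siGL3_1.fq~qDRIP_siGL3_2.fq", "DRIPc_3T3.fq"]:
    print(value, "->", classify_experiment(value)[0])
```

A check like this can catch malformed rows early, but the authoritative validation remains `RLPipes build` followed by `RLPipes check`.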

Check

Usage: RLPipes check [OPTIONS] RUN_DIR

  Validate an RLPipes workflow.

  RUN_DIR: Directory configured with `RLPipes build` and ready for checking
  and execution.

Options:
  -s, --smargs TEXT      Dict of arguments passed to the snakemake python API.
                         Default: "{'use_conda': True}". Read the snakemake
                         API reference for the full list of options.
  -t, --threads INTEGER  Number of threads to use. Default: 1
  --bwamem2              Align with BWA-MEM2 instead of BWA. BWA-MEM2 needs >
                         70GB RAM available to build index, but shows > 3x
                         speed increase. Default: False.
  --macs2                Call peaks using macs2 instead of macs3
  -G, --groupby TEXT     Column(s) which identify biologically-meaningful
                         grouping(s) of samples (i.e., conditions).  Can be
                         any column name from the `samples` CSV file. If using
                         public data accessions,  it may also include "study".
                         NOTE: If --groupby is set and there are R-loop-mapping
                         and expression samples within groups,
                         expression-matched analysis will be run. This can be
                         disabled with the --noexp flag.

                         Example #1: "RLPipes check outdir/ --groupby tissue"

                             samples.csv:

                               experiment, mode, tissue
                               GSM1720615, DRIP, NT2
                               GSM1720616, DRIP, NT2
                               GSM1720619, DRIP, K562

                         Example #2: "RLPipes check outdir/ --groupby tissue"

                             samples.csv:

                               experiment, mode, tissue
                               GSM1720615, DRIP, NT2
                               GSM1720616, DRIP, NT2
                               GSM1720613, DRIPc, NT2
                               GSM1720614, DRIPc, NT2
                               GSM1720622, RNA-seq, NT2
                               GSM1720623, RNA-seq, NT2

  --noexp                If set, no expression-matched analysis will be
                         performed.
  --noreport             If set, RLSeq reports will not be generated.
  --debug                Run pipeline on subsampled number of reads (for
                         testing).
  --tsv                  Obtain config from config.tsv file instead of
                         config.json.
  --noaws                If set, prefetch from SRA tools will be used to 
                         download any public SRA data instead of AWS S3.
  --help                 Show this message and exit.
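The --smargs option takes a Python-style dict written as a string. The sketch below shows one safe way to turn such a string into a dictionary, using `ast.literal_eval` from the standard library. How RLPipes itself parses the value is not stated here, so this is an assumption for illustration, and `parse_smargs` is a hypothetical name.

```python
import ast


def parse_smargs(text="{'use_conda': True}"):
    """Parse a dict literal such as "{'use_conda': True}" without using eval()."""
    value = ast.literal_eval(text)
    if not isinstance(value, dict):
        raise ValueError("smargs must be a dict literal")
    return value


# The documented default value.
print(parse_smargs())
```

`ast.literal_eval` only accepts Python literals, so a string containing function calls or other expressions raises an error instead of being executed.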

Run

Usage: RLPipes run [OPTIONS] RUN_DIR

  Execute an RLPipes workflow.

  RUN_DIR: Directory configured with `RLPipes build` and ready for checking
  and execution.

Options:
  -s, --smargs TEXT      Dict of arguments passed to the snakemake python API.
                         Default: "{'use_conda': True}". Read the snakemake
                         API reference for the full list of options.
  -t, --threads INTEGER  Number of threads to use. Default: 1
  --bwamem2              Align with BWA-MEM2 instead of BWA. BWA-MEM2 needs >
                         70GB RAM available to build index, but shows > 3x
                         speed increase. Default: False.
  --macs2                Call peaks using macs2 instead of macs3
  -G, --groupby TEXT     Column(s) which identify biologically-meaningful
                         grouping(s) of samples (i.e., conditions).  Can be
                         any column name from the `samples` CSV file. If using
                         public data accessions,  it may also include "study".
                         NOTE: If --groupby is set and there are R-loop-mapping
                         and expression samples within groups,
                         expression-matched analysis will be run. This can be
                         disabled with the --noexp flag.

                         Example #1: "RLPipes run outdir/ --groupby tissue"

                             samples.csv:

                               experiment, mode, tissue
                               GSM1720615, DRIP, NT2
                               GSM1720616, DRIP, NT2
                               GSM1720619, DRIP, K562

                         Example #2: "RLPipes run outdir/ --groupby tissue"

                             samples.csv:

                               experiment, mode, tissue
                               GSM1720615, DRIP, NT2
                               GSM1720616, DRIP, NT2
                               GSM1720613, DRIPc, NT2
                               GSM1720614, DRIPc, NT2
                               GSM1720622, RNA-seq, NT2
                               GSM1720623, RNA-seq, NT2

  --noexp                If set, no expression-matched analysis will be
                         performed.
  --noreport             If set, RLSeq reports will not be generated.
  --debug                Run pipeline on subsampled number of reads (for
                         testing).
  --tsv                  Obtain config from config.tsv file instead of
                         config.json.
  --help                 Show this message and exit.
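To make the effect of --groupby more concrete, the following Python sketch groups the samples from Example #1 by the "tissue" column. It only illustrates the idea of grouping rows by a column value; it is not RLPipes's implementation, and `group_samples` is a hypothetical helper.

```python
import csv
import io
from collections import defaultdict

# Sample sheet from Example #1 in the --groupby help text.
SAMPLES = """experiment, mode, tissue
GSM1720615, DRIP, NT2
GSM1720616, DRIP, NT2
GSM1720619, DRIP, K562
"""


def group_samples(csv_text, column):
    """Map each value of `column` to the experiments that share it."""
    groups = defaultdict(list)
    # skipinitialspace drops the space that follows each comma.
    reader = csv.DictReader(io.StringIO(csv_text), skipinitialspace=True)
    for row in reader:
        groups[row[column]].append(row["experiment"])
    return dict(groups)


print(group_samples(SAMPLES, "tissue"))
```

Here the two NT2 samples fall into one group and the K562 sample into another, which is the kind of biologically meaningful grouping the option describes.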
