RNA virus analysis toolkit
RolyPoly
RolyPoly is an RNA virus analysis toolkit that provides a variety of commands and wrappers for external tools, covering steps from raw read processing to genome annotation. It also includes an "end-2-end" command that runs an entire pipeline.
For more detailed information, please refer to the docs.
Installation
We hope to have rolypoly available from bioconda in the near future.
In the meantime, it can be installed with the quick_setup.sh script, which will also fetch the pre-generated data rolypoly requires.
curl -O https://code.jgi.doe.gov/rolypoly/rolypoly/-/raw/main/src/setup/quick_setup.sh && \
bash quick_setup.sh
You can specify custom paths for the conda environment, the code, the databases, and the log file (in that order):
bash quick_setup.sh /path/to/conda/env /path/to/install/rolypoly_code /path/to/store/databases /path/to/logfile
By default, if no positional arguments are supplied, rolypoly is installed into the current working directory (the path quick_setup.sh is called from):
- database in ./rolypoly/data/
- code in ./rolypoly/code/
- conda environment in ./rolypoly/env/
- log file in ./RolyPoly_quick_setup.log
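After a default install, it can be useful to confirm that the three directories listed above exist. The following is a minimal sketch only; `check_rolypoly_layout` is a hypothetical helper written for this example and is not part of rolypoly or quick_setup.sh.

```shell
# Hypothetical helper (not shipped with rolypoly): report whether the default
# install layout described above exists under a base directory.
check_rolypoly_layout() {
  base="${1:-.}"   # default to the current folder, like quick_setup.sh
  missing=0
  for dir in rolypoly/data rolypoly/code rolypoly/env; do
    if [ -d "$base/$dir" ]; then
      echo "found:   $base/$dir"
    else
      echo "missing: $base/$dir"
      missing=1
    fi
  done
  # Exit status 0 only when all three directories are present.
  return "$missing"
}

# Usage: check the folder quick_setup.sh was called from.
check_rolypoly_layout . || echo "Layout incomplete; see ./RolyPoly_quick_setup.log"
```

The function only tests for the presence of the directories; it does not validate their contents.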
To install rolypoly in development mode, use:
bash quick_setup.sh /path/to/conda/env /path/to/install/rolypoly_code /path/to/store/databases /path/to/logfile TRUE
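Because the arguments are positional, it is easy to swap two paths by accident. The sketch below assembles the command from a single prefix and prints it as a dry run. The `setup_command` function and the `$HOME/rolypoly_install` prefix are assumptions made for illustration; only the argument order and the trailing `TRUE` flag come from the commands above.

```shell
# Hypothetical helper: print the quick_setup.sh command for a given prefix.
# Positional order: conda env, code, databases, log file, then optional TRUE.
# Note: this simple string form assumes the paths contain no spaces.
setup_command() {
  prefix="$1"
  dev="${2:-}"   # pass TRUE as the second argument for development mode
  cmd="bash quick_setup.sh $prefix/env $prefix/code $prefix/data $prefix/RolyPoly_quick_setup.log"
  if [ "$dev" = "TRUE" ]; then
    cmd="$cmd TRUE"
  fi
  echo "$cmd"
}

# Dry run: print the commands instead of executing them.
setup_command "$HOME/rolypoly_install"
setup_command "$HOME/rolypoly_install" TRUE
```

Review the printed command, then run it from the folder that contains quick_setup.sh.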
Usage
RolyPoly is a command-line tool with subcommands for the different stages of the RNA virus identification pipeline. For detailed help in the terminal, use rolypoly help. For more specific help, see the docs.
rolypoly [OPTIONS] COMMAND [ARGS]...
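A common first stumbling block is calling rolypoly from a shell where its conda environment is not active. The sketch below wraps the `rolypoly help` command mentioned above in a guard; `show_rolypoly_help` is a hypothetical helper name, and the exit code 127 is simply the conventional "command not found" status.

```shell
# Hypothetical wrapper: show rolypoly's built-in help if the CLI is available.
show_rolypoly_help() {
  if command -v rolypoly >/dev/null 2>&1; then
    rolypoly help
  else
    echo "rolypoly not found on PATH; activate the conda environment created by quick_setup.sh" >&2
    return 127
  fi
}

# Usage: never abort the calling script if rolypoly is missing.
show_rolypoly_help || true
```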
Project Status
Active development. Currently implemented features:
- ✅ NGS raw read filtering (host, rRNA, adapters, artefacts) and quality control report (filter-reads)
- ✅ Assembly (SPAdes, MEGAHIT and penguin) (assembly)
- ✅ Contig filtering and clustering (filter-contigs)
- ✅ Marker gene search with pyhmmer (mainly RdRps, genomad VV's or user-provided) (marker-search)
- ✅ RNA secondary structure prediction, annotation and ribozyme identification (annotate-rna)
- ✅ Nucleotide search vs known viruses (search-viruses)
- ✅ Prepare external data (prepare-external-data)
Under development:
- 🚧 Protein annotation (annotate-protein)
- 🚧 Host prediction (host-predict)
- 🚧 Genome binning and refinement (TBD)
- 🚧 Virus taxonomic classification (TBD)
- 🚧 Virus feature prediction (+/-ssRNA/dsRNA, circular/linear, mono/poly-segmented, capsid type, etc.) (TBD)
- 🚧 Cross-sample analysis (TBD)
For more details about the implementation status, roadmap, additional commands, and more, see the workflow documentation.
Dependencies
Non-Python
- SPAdes
- seqkit
- datasets
- bbmap - via bbmapy
- megahit
- mmseqs2
- plass and penguin
- diamond
- pigz
- prodigal - via pyrodigal-gv
- linearfold
- HMMER - via pyhmmer
- needletail
- infernal
- aragorn
- tRNAscan-SE
- bowtie1
- falco
Python Libraries
Databases used by rolypoly
RolyPoly will try to remind you to cite these (along with tools) based on the commands you run. For more details, see the citation_reminder.py script.
- NCBI RefSeq rRNAs - Reference RNA sequences from NCBI RefSeq
- NCBI RefSeq viruses - Reference viral sequences from NCBI RefSeq
- PFAM_A_37 - RdRp and RT profiles from Pfam-A version 37
- RVMT - RNA Virus Meta-Transcriptomes database
- SILVA_138 - High-quality ribosomal RNA database
- NeoRdRp_v2.1 - Collection of RdRp profiles
- RdRp-Scan - RdRp profile database incorporating PALMdb
- TSA_2018 - RNA virus profiles from transcriptome assemblies
- Rfam - Database of RNA families (structural/catalytic/both)
Motivation
Current workflows for RNA virus detection are functional but could be improved, especially by working from raw reads rather than from pre-existing, general-purpose assemblies. RolyPoly instead uses processes tailored specifically to RNA viruses.
Several similar tools exist, but they serve different purposes. For example:
- hecatomb (github.com/shandley/hecatomb): uses mmseqs for homology detection and is thus less sensitive than the additional HMMER-based identification used here.
- AliMarko (biorxiv.org/content/10.1101/2024.07.19.603887): uses a single-sample, assembly-only approach and does not support co- or cross-assembly of multiple samples. Additionally, AliMarko uses a small HMM profile set that is, in our opinion, partially outdated.
Reporting Issues and Contribution
RolyPoly is hosted on GitHub (issue tracking and development) and JGI's GitLab (documentation, releases, and archiving).
Please report any bugs you find on the Issues page.
Suggestions and contributions are welcome: either fork the repo and open a pull request, or contact us directly.
Authors
- Uri Neri
- Brian Bushnell
- Simon Roux
- Antônio Pedro Camargo
- Andrei Stecca Steindorff
- Clement Coclet
- David Parker
- Dimitris Karapliafis
Acknowledgments
Thanks to the DOE Joint Genome Institute for infrastructure support. Special thanks to all contributors who have offered insights and improvements.
Copyright Notice
RolyPoly (rp) Copyright (c) 2024, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.
If you have questions about your rights to use or distribute this software, please contact Berkeley Lab's Intellectual Property Office at IPO@lbl.gov.
NOTICE. This Software was developed under funding from the U.S. Department of Energy and the U.S. Government consequently retains certain rights. As such, the U.S. Government has been granted for itself and others acting on its behalf a paid-up, nonexclusive, irrevocable, worldwide license in the Software to reproduce, distribute copies to the public, prepare derivative works, and perform publicly and display publicly, and to permit others to do so.
License Agreement
GPL v3 License
RolyPoly (rp) Copyright (c) 2024, The Regents of the University of California, through Lawrence Berkeley National Laboratory (subject to receipt of any required approvals from the U.S. Dept. of Energy). All rights reserved.
This program is free software: you can redistribute it and/or modify it under the terms of the GNU General Public License as published by the Free Software Foundation, either version 3 of the License, or (at your option) any later version.
This program is distributed in the hope that it will be useful, but WITHOUT ANY WARRANTY; without even the implied warranty of MERCHANTABILITY or FITNESS FOR A PARTICULAR PURPOSE. See the GNU General Public License for more details.
You should have received a copy of the GNU General Public License along with this program. If not, see https://www.gnu.org/licenses/.