Prepare a splici transcriptome
pyroe
Introduction
Alevin-fry is a fast, accurate, and memory-frugal quantification tool for preprocessing single-cell RNA-sequencing data. Detailed information can be found in the alevin-fry pre-print and paper.
The pyroe package provides useful functions for preparing the input files required by alevin-fry, which currently consists of
- preparing the splici reference for the USA mode of alevin-fry, which exports an unspliced, a spliced, and an ambiguous molecule count for each gene within each cell.
Installation
The pyroe package can be accessed from its GitHub repository, installed via pip, or installed via bioconda. To install the pyroe package via pip, use the command:
pip install pyroe
Preparing a splici index for quantification with alevin-fry
The USA mode in alevin-fry requires a special index reference, called the splici reference. The splici reference contains the spliced transcripts plus the intronic sequences of each gene. The make_splici_txome() function builds the splici reference from a genome FASTA file and a gene annotation GTF file. Details about the splici reference can be found in Section S2 of the supplementary file of the alevin-fry paper. To run pyroe, you also need to specify the read length (read_length) of the experiment you are working on and the flank trimming length (flank_trim_length). The final flank length is computed as the difference between read_length and flank_trim_length, and a flank of that length is attached to each end of every intron to absorb reads that span intron-exon junctions.
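The flank arithmetic described above can be written out as a minimal sketch. The helper name final_flank_length and its input checks are hypothetical, chosen for illustration; they are not part of pyroe, and the numbers in the usage line are made up.

```python
def final_flank_length(read_length: int, flank_trim_length: int) -> int:
    """Return the flank length added to each intron end.

    Hypothetical helper (not pyroe's API): it only mirrors the rule
    stated in the text, flank = read_length - flank_trim_length.
    """
    if read_length <= 0:
        raise ValueError("read_length must be a positive integer")
    # Assumption: a trim length outside [0, read_length] is treated as invalid here.
    if not 0 <= flank_trim_length <= read_length:
        raise ValueError("flank_trim_length must be between 0 and read_length")
    return read_length - flank_trim_length


# Illustrative values only: a 91 bp read with a trim length of 5.
print(final_flank_length(91, 5))  # 86
```

A larger trim length gives shorter flanks, so a read must overlap the intron by more bases before it can align to the flanked intronic sequence.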
The following is an example of calling pyroe to make the splici index reference. Here the final flank length is the difference between the read length and the flank trim length, i.e., 5 - 2 = 3. pyroe also allows you to add extra spliced and unspliced sequences to the splici index, which is useful when some unannotated sequences, such as mitochondrial genes, are important for your experiment. Note: to make pyroe run faster, it is recommended to have the latest version of bedtools (Aaron R. Quinlan and Ira M. Hall, 2010) installed.
pyroe make-splici extdata/small_example_genome.fa extdata/small_example.gtf 5 splici_txome \
--flank-trim-length 2 --filename-prefix transcriptome_splici --dedup-seqs
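If you drive pyroe from a Python script, the same command line can be assembled as an argument list. This is a sketch under assumptions: build_make_splici_cmd is a hypothetical helper, not something pyroe provides, and it uses only the positional arguments and flags that appear in the command above and in the usage text.

```python
import shlex
import subprocess
from typing import List, Optional


def build_make_splici_cmd(
    genome_path: str,
    gtf_path: str,
    read_length: int,
    output_dir: str,
    flank_trim_length: Optional[int] = None,
    filename_prefix: Optional[str] = None,
    dedup_seqs: bool = False,
) -> List[str]:
    """Assemble a `pyroe make-splici` command (hypothetical helper)."""
    cmd = ["pyroe", "make-splici", genome_path, gtf_path, str(read_length), output_dir]
    if flank_trim_length is not None:
        cmd += ["--flank-trim-length", str(flank_trim_length)]
    if filename_prefix is not None:
        cmd += ["--filename-prefix", filename_prefix]
    if dedup_seqs:
        cmd.append("--dedup-seqs")
    return cmd


def run_make_splici(cmd: List[str]) -> None:
    """Run the command; raises CalledProcessError on a non-zero exit status."""
    subprocess.run(cmd, check=True)


cmd = build_make_splici_cmd(
    "extdata/small_example_genome.fa",
    "extdata/small_example.gtf",
    5,
    "splici_txome",
    flank_trim_length=2,
    filename_prefix="transcriptome_splici",
    dedup_seqs=True,
)
# Print the command for inspection; call run_make_splici(cmd) to execute it.
print(shlex.join(cmd))
```

Passing a list to subprocess.run avoids shell quoting problems when paths contain spaces.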
pyroe writes two files to the output directory you specify (output_dir):
- A FASTA file that stores the extracted splici sequences.
- A three-column transcript-name-to-gene-name file that stores the name of each transcript in the splici index reference, its corresponding gene name, and its splicing status (S for spliced and U for unspliced).
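To inspect the three-column file downstream, a small parser along the following lines may be enough. It is a sketch that assumes the file is tab-separated with no header line; the function name read_t2g_3col and the transcript and gene names in the example are hypothetical.

```python
import csv
from collections import Counter
from typing import Iterable, List, Tuple


def read_t2g_3col(lines: Iterable[str]) -> List[Tuple[str, str, str]]:
    """Parse (transcript, gene, status) records from tab-separated lines.

    Assumption: three tab-separated columns, no header, status is S or U.
    """
    rows = []
    for record in csv.reader(lines, delimiter="\t"):
        if not record:  # skip blank lines
            continue
        if len(record) != 3:
            raise ValueError(f"expected 3 columns, got {len(record)}: {record}")
        transcript, gene, status = record
        if status not in {"S", "U"}:
            raise ValueError(f"unexpected splicing status: {status!r}")
        rows.append((transcript, gene, status))
    return rows


# Made-up records for illustration; with a real file, pass an open file handle.
example = ["tx1\tgeneA\tS\n", "geneA-I\tgeneA\tU\n", "tx2\tgeneB\tS\n"]
rows = read_t2g_3col(example)
counts = Counter(status for _, _, status in rows)
print(counts["S"], counts["U"])  # 2 1
```

Counting the S and U entries is a quick sanity check that both spliced transcripts and intronic sequences made it into the reference.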
Full usage
usage: pyroe make-splici [-h] [--filename-prefix FILENAME_PREFIX]
                         [--flank-trim-length FLANK_TRIM_LENGTH]
                         [--extra-spliced EXTRA_SPLICED]
                         [--extra-unspliced EXTRA_UNSPLICED]
                         [--bt-path BT_PATH] [--dedup-seqs] [--no-bt]
                         [--no-flanking-merge]
                         genome-path gtf-path read-length output-dir
positional arguments:
  genome-path           The path to a genome fasta file.
  gtf-path              The path to a gtf file.
  read-length           The read length of the single-cell experiment
                        being processed (determines flank size).
  output-dir            The output directory where splici reference
                        files will be written.
optional arguments:
  -h, --help            show this help message and exit
  --filename-prefix FILENAME_PREFIX
                        The file name prefix of the generated output files.
  --flank-trim-length FLANK_TRIM_LENGTH
                        Determines the amount subtracted from the read length
                        to get the flank length.
  --extra-spliced EXTRA_SPLICED
                        The path to an extra spliced sequence fasta file.
  --extra-unspliced EXTRA_UNSPLICED
                        The path to an extra unspliced sequence fasta file.
  --bt-path BT_PATH     The path to bedtools v2.30.0 or greater.
  --dedup-seqs          A flag indicates whether identical sequences will be
                        deduplicated.
  --no-bt               A flag indicates whether to disable bedtools.
  --no-flanking-merge   A flag indicates whether introns will be merged after
                        adding flanking length.
The splici index
The splici index of a given species consists of the species' transcriptome, i.e., the spliced transcripts, and its intronic sequences. Within a gene, if flanked intronic sequences overlap, the overlapping sequences are collapsed into a single intronic sequence so that each base appears only once in the intronic sequences. For more detail, see Section S2 in the supplementary file of the alevin-fry manuscript.
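The collapsing step can be illustrated with a generic interval merge. This is only a sketch of the idea, not pyroe's implementation: it assumes half-open (start, end) coordinates within one gene, clips starts at 0, does not clip at the end of the sequence, and merges intervals that touch as well as those that overlap. The name flank_and_merge and the coordinates are hypothetical.

```python
from typing import List, Tuple

Interval = Tuple[int, int]  # half-open (start, end), an assumption of this sketch


def flank_and_merge(introns: List[Interval], flank: int) -> List[Interval]:
    """Extend each intron by `flank` on both sides, then merge overlaps."""
    flanked = sorted((max(0, start - flank), end + flank) for start, end in introns)
    merged: List[Interval] = []
    for start, end in flanked:
        if merged and start <= merged[-1][1]:
            # Overlaps (or touches) the previous interval: extend it.
            merged[-1] = (merged[-1][0], max(merged[-1][1], end))
        else:
            merged.append((start, end))
    return merged


# Made-up introns of one gene, flanked by 3 bases on each side.
print(flank_and_merge([(10, 20), (24, 30), (50, 60)], flank=3))
# [(7, 33), (47, 63)]
```

The first two introns are 4 bases apart, so after adding 3-base flanks they overlap and become one interval, while the third stays separate.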