TranSigner: assigning long RNA-seq reads to transcripts

TranSigner is a Python program that assigns long RNA-seq reads (ONT or PacBio) to transcripts. It takes a set of reads and an assembled transcriptome as input. The transcriptome must be available in both FASTA and GTF/GFF formats. If you only have it in GTF/GFF format, generate the FASTA file with gffread as follows:

gffread -w transcripts.fa -g genome.fa transcripts.gtf
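Because TranSigner needs the same transcriptome in both formats, it can be worth confirming that the two files describe the same set of transcripts before aligning. The following is a minimal sketch, not part of TranSigner: the function names are hypothetical, it assumes GTF-style `transcript_id "..."` attributes (GFF3 files use `ID=`/`Parent=` instead), and it assumes each FASTA header starts with the transcript ID.

```python
import re

# Matches the GTF attribute: transcript_id "SOME_ID"
_TRANSCRIPT_ID = re.compile(r'transcript_id "([^"]+)"')


def fasta_ids(lines):
    """Collect the first whitespace-delimited token of every FASTA header."""
    ids = set()
    for line in lines:
        if line.startswith(">"):
            parts = line[1:].split()
            if parts:
                ids.add(parts[0])
    return ids


def gtf_transcript_ids(lines):
    """Collect transcript_id values from the 9th column of a GTF file."""
    ids = set()
    for line in lines:
        if line.startswith("#"):
            continue
        fields = line.rstrip("\n").split("\t")
        if len(fields) < 9:
            continue
        match = _TRANSCRIPT_ID.search(fields[8])
        if match:
            ids.add(match.group(1))
    return ids


def compare_ids(fasta_lines, gtf_lines):
    """Return (IDs only in the FASTA, IDs only in the GTF)."""
    fa = fasta_ids(fasta_lines)
    gtf = gtf_transcript_ids(gtf_lines)
    return fa - gtf, gtf - fa


# Example usage with the file names from the gffread command above:
# with open("transcripts.fa") as fa, open("transcripts.gtf") as gtf:
#     only_fa, only_gtf = compare_ids(fa, gtf)
#     print(len(only_fa), "IDs only in FASTA;", len(only_gtf), "only in GTF")
```

Two empty sets mean the files agree on transcript IDs; anything else suggests the FASTA was built from a different annotation.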

Installation

You can install TranSigner with pip by running:

pip install transigner

or from source:

git clone https://github.com/haydenji0731/transigner transigner
cd transigner
python setup.py install

TranSigner also requires minimap2 and samtools. Precompiled binaries work as long as they are available in your PATH. If you run into trouble during installation, please open a GitHub issue.

Usage

TranSigner consists of three modules: align, prefilter, and em (short for expectation-maximization). You can check the arguments for each module by running:

transigner [module] -h

The align module takes a set of reads contained in a FASTQ file and aligns them to a transcriptome provided as a FASTA file:

transigner align -q reads.fastq -t transcripts.fa -d output_dir -o alignment.bam -p threads

Next, TranSigner processes the alignment results to compute compatibility scores between reads and transcripts and to filter out alignments with questionable 5' and/or 3' end positions. The recommended filter thresholds differ by read type:

transigner prefilter -a alignment.bam -t transcripts.fa -o output_dir --filter -tp -1 # noisy ONT direct RNA reads
transigner prefilter -a alignment.bam -t transcripts.fa -o output_dir --filter -tp -500 -fp -600 # ONT cDNA or PacBio IsoSeq
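When scripting the align and prefilter steps, it can help to keep the read-type-specific thresholds in one place. The sketch below only assembles the command lines shown above; the function names, the preset keys (`ont_drna`, `ont_cdna`, `pacbio`), and the default thread count are hypothetical choices for illustration, and all paths are passed through exactly as given.

```python
import shlex
import subprocess

# Thresholds copied from the two prefilter commands above.
PREFILTER_PRESETS = {
    "ont_drna": ["-tp", "-1"],                    # noisy ONT direct RNA reads
    "ont_cdna": ["-tp", "-500", "-fp", "-600"],   # ONT cDNA
    "pacbio": ["-tp", "-500", "-fp", "-600"],     # PacBio IsoSeq
}


def align_cmd(reads, transcripts, out_dir, bam="alignment.bam", threads=4):
    """Build the `transigner align` command as an argument list."""
    return ["transigner", "align", "-q", reads, "-t", transcripts,
            "-d", out_dir, "-o", bam, "-p", str(threads)]


def prefilter_cmd(bam, transcripts, out_dir, read_type):
    """Build the `transigner prefilter` command for a given read type."""
    return ["transigner", "prefilter", "-a", bam, "-t", transcripts,
            "-o", out_dir, "--filter", *PREFILTER_PRESETS[read_type]]


def run_all(commands):
    """Run each command in order, stopping at the first failure."""
    for cmd in commands:
        subprocess.run(cmd, check=True)


if __name__ == "__main__":
    commands = [
        align_cmd("reads.fastq", "transcripts.fa", "output_dir"),
        prefilter_cmd("alignment.bam", "transcripts.fa", "output_dir", "ont_cdna"),
    ]
    for cmd in commands:
        print(shlex.join(cmd))  # inspect first; call run_all(commands) to execute
```

Building argument lists rather than shell strings avoids quoting problems when file names contain spaces.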

Finally, an expectation-maximization algorithm is run as follows:

transigner em -s output_dir/scores.tsv -i output_dir/ti.pkl -o output_dir --drop --use-score
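For readers unfamiliar with expectation-maximization in this setting, the toy sketch below shows the general idea: reads are fractionally assigned to transcripts in proportion to abundance times compatibility (E-step), and abundances are then re-estimated from those fractional assignments (M-step). This is a textbook mixture-model EM written for illustration only. It is not TranSigner's implementation, and the score matrix, function name, and iteration count are all made up.

```python
def toy_em(scores, n_iter=100):
    """Toy EM over a read-by-transcript matrix of non-negative scores.

    scores[r][t] is the compatibility of read r with transcript t
    (0 means incompatible). Returns (relative_abundance, read_fractions).
    """
    n_tx = len(scores[0])
    abundance = [1.0 / n_tx] * n_tx  # start from a uniform guess
    fractions = []
    for _ in range(n_iter):
        # E-step: split each read across transcripts.
        fractions = []
        for row in scores:
            weights = [a * s for a, s in zip(abundance, row)]
            total = sum(weights)
            fractions.append(
                [w / total if total > 0 else 0.0 for w in weights]
            )
        # M-step: abundance is the share of (fractional) reads per transcript.
        counts = [sum(f[t] for f in fractions) for t in range(n_tx)]
        total_counts = sum(counts)
        if total_counts == 0:
            break  # no read is compatible with any transcript
        abundance = [c / total_counts for c in counts]
    return abundance, fractions


if __name__ == "__main__":
    # Two reads match only transcript 0; a third matches both equally.
    abundance, fractions = toy_em([[1.0, 0.0], [1.0, 0.0], [1.0, 1.0]])
    print(abundance)
    print(fractions[2])
```

In this example the ambiguous third read is pulled toward transcript 0, because the unambiguous reads make that transcript look more abundant.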

By running all three modules, you'll obtain abundances.tsv and assignments.tsv files. The column layout and an example row for each file are shown below:

# abundances.tsv
transcript_id  read_count  relative_abundance
NR_024540.1	1.7149614376942495e-28	1.6757869165652874e-21

# assignments.tsv
read_id  (transcript_id_1, read_fraction_1)  (transcript_id_2, read_fraction_2)  (transcript_id_3, read_fraction_3) ...
57ebcba2-cde0-4096-b9b9-9bdc4306cb6c    (ENST00000559163, 6.640795584395568e-07)        (ENST00000559884, 0.9507564850356304)   (ENST00000354296, 0.04924285088481117)
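Each row of assignments.tsv holds a read ID followed by a variable number of `(transcript_id, read_fraction)` pairs, so it does not load cleanly as a fixed-width table. The following is a minimal parsing sketch based only on the example row above; the function name is hypothetical, and it assumes transcript IDs contain no commas or parentheses.

```python
import re

# One "(transcript_id, fraction)" pair; the ID may not contain ',' '(' or ')'.
_PAIR = re.compile(r"\(([^,()]+),\s*([^()\s]+)\)")


def parse_assignment_line(line):
    """Return (read_id, [(transcript_id, read_fraction), ...]) for one row."""
    read_id = line.split(None, 1)[0]
    pairs = [(tx.strip(), float(frac)) for tx, frac in _PAIR.findall(line)]
    return read_id, pairs


if __name__ == "__main__":
    example = (
        "57ebcba2-cde0-4096-b9b9-9bdc4306cb6c\t"
        "(ENST00000559163, 6.640795584395568e-07)\t"
        "(ENST00000559884, 0.9507564850356304)\t"
        "(ENST00000354296, 0.04924285088481117)"
    )
    read_id, pairs = parse_assignment_line(example)
    print(read_id)
    for transcript_id, fraction in pairs:
        print(transcript_id, fraction)
```

The regular expression tolerates either tabs or spaces between pairs, which helps if the file has been copied through a tool that rewrites whitespace.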

You can also add the --push flag to obtain hard, 1-to-1 assignments between reads and transcripts.
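To illustrate the difference between fractional and hard assignments, the sketch below collapses one read's fractions to the single transcript with the largest fraction. This is an assumption about what a 1-to-1 assignment could look like, not a description of how --push is implemented; the function name is hypothetical and the values come from the example row above.

```python
def hard_assign(pairs):
    """Pick the transcript with the largest read fraction, or None if empty.

    pairs is a list of (transcript_id, read_fraction) tuples for one read.
    Ties resolve to the first transcript listed.
    """
    if not pairs:
        return None
    return max(pairs, key=lambda pair: pair[1])[0]


if __name__ == "__main__":
    soft = [
        ("ENST00000559163", 6.640795584395568e-07),
        ("ENST00000559884", 0.9507564850356304),
        ("ENST00000354296", 0.04924285088481117),
    ]
    print(hard_assign(soft))
```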

References

Ji, H. J., & Pertea, M. (2024). Enhancing transcriptome expression quantification through accurate assignment of long RNA sequencing reads with TranSigner. bioRxiv, 2024.04.13.589356. [doi:10.1101/2024.04.13.589356]

Pertea, G., & Pertea, M. (2020). GFF utilities: GffRead and GffCompare [version 2; peer review: 3 approved]. F1000Research, 9:304. [doi:10.12688/f1000research.23297.2]

Danecek, P., Bonfield, J. K., Liddle, J., Marshall, J., Ohan, V., Pollard, M. O., ... & Li, H. (2021). Twelve years of SAMtools and BCFtools. GigaScience, 10(2), giab008. [doi:10.1093/gigascience/giab008]

Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34:3094-3100. [doi:10.1093/bioinformatics/bty191]

Li, H. (2021). New strategies to improve minimap2 alignment accuracy. Bioinformatics, 37:4572-4574. [doi:10.1093/bioinformatics/btab705]

Du, L., Liu, Q., Fan, Z., Tang, J., Zhang, X., Price, M., Yue, B., & Zhao, K. (2021). Pyfastx: a robust Python package for fast random access to sequences from plain and gzipped FASTA/Q files. Briefings in Bioinformatics, 22(4):bbaa368.
