A Python implementation of tximport to transform transcript-level counts into gene-level counts
pytximport
pytximport is a Python package for fast gene count estimation based on transcript quantification files produced by pseudoalignment/quasi-mapping tools such as kallisto or salmon. pytximport is a port of the popular tximport Bioconductor R package.
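At its core, summarizing transcripts to genes means adding up transcript counts and abundances per gene and deriving a gene length. The sketch below follows the approach described for tximport (gene counts and TPM are sums over a gene's transcripts; gene length is the TPM-weighted average of transcript lengths) for a single sample using pandas. The column names and toy values are hypothetical, and this is not pytximport's actual code.

```python
import pandas as pd

# Hypothetical transcript-level quantification for a single sample
quant = pd.DataFrame(
    {
        "transcript_id": ["tx1", "tx2", "tx3"],
        "length": [1000.0, 2000.0, 500.0],  # effective transcript lengths
        "tpm": [10.0, 30.0, 60.0],
        "counts": [5.0, 30.0, 15.0],
    }
)
# Hypothetical transcript-to-gene map
tx2gene = pd.DataFrame(
    {"transcript_id": ["tx1", "tx2", "tx3"], "gene_id": ["geneA", "geneA", "geneB"]}
)

merged = quant.merge(tx2gene, on="transcript_id")
merged["tpm_x_length"] = merged["tpm"] * merged["length"]

# Counts and TPM are summed per gene
genes = merged.groupby("gene_id").agg(
    counts=("counts", "sum"),
    tpm=("tpm", "sum"),
    tpm_x_length=("tpm_x_length", "sum"),
)
# Gene length: TPM-weighted average of the gene's transcript lengths
genes["length"] = genes["tpm_x_length"] / genes["tpm"]
genes = genes.drop(columns="tpm_x_length")
print(genes)
```

A gene whose transcripts all have zero TPM would divide by zero here, so a real implementation needs a fallback length for that case; this sketch does not handle it.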
Documentation
Detailed documentation is available at: https://pytximport.readthedocs.io.
Development status
pytximport is still in development and has not yet reached version 1.0.0 under Semantic Versioning (SemVer). While it should work for most use cases and we regularly compare its outputs against the R implementation, expect breaking changes. If you encounter any problems, please open a GitHub issue. If you are a Python developer, we welcome pull requests that implement missing features, add more extensive unit tests, or fix bugs.
Motivation
The tximport package has become a mainstay in the bulk RNA sequencing community and has been used in hundreds of scientific publications. However, its accessibility has remained limited because it requires the R programming language and cannot be used from within Python scripts or from the command line. Other tools in the bulk RNA sequencing analysis stack, such as DESeq2 (in the form of PyDESeq2), decoupler and liana, all have Python versions. Additionally, pseudoalignment tools like salmon and kallisto can be installed via conda and used from the command line.
tximport is thus the missing link in many common analysis workflows. pytximport fills this gap and allows these workflows to be done entirely in Python, which is preinstalled on most development machines, and from the command line.
Installation
pip install pytximport
Quick Start
You can either use it from the command line:
pytximport -i ./sample_1.sf -i ./sample_2.sf -t salmon -m ./tx2gene_map.tsv -o ./output_counts.csv
Common options are:
- -i: The input files (repeat the flag once per file).
- -t: The input type, e.g., salmon, kallisto or tsv.
- -m: The map to match transcript ids to their gene ids. Expected column names are transcript_id and gene_id.
- -o: The output path.
- -c: The count transform to apply. Leave out for none; other options include scaled_tpm, length_scaled_tpm and dtu_scaled_tpm.
- -tx: Whether to return transcript-level counts without gene summarization.
- -id: The column name containing the transcript ids, in case it differs from the typical naming standards for the configured input file type.
- -counts: The column name containing the transcript counts, in case it differs from the typical naming standards for the configured input file type.
- -length: The column name containing the transcript lengths, in case it differs from the typical naming standards for the configured input file type.
- -tpm: The column name containing the transcript abundance, in case it differs from the typical naming standards for the configured input file type.
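The -c transforms are only named above. Based on the tximport documentation, scaled_tpm scales TPM up to each sample's library size, and length_scaled_tpm first multiplies TPM by the average feature length across samples and then scales to library size; dtu_scaled_tpm (which uses the median transcript length among a gene's isoforms) is omitted here. The NumPy sketch below assumes a features × samples layout and toy values; it illustrates the idea and is not pytximport's implementation.

```python
import numpy as np


def scaled_tpm(tpm: np.ndarray, counts: np.ndarray) -> np.ndarray:
    """Rescale each sample's TPM (one column per sample) to that sample's total count."""
    return tpm * (counts.sum(axis=0) / tpm.sum(axis=0))


def length_scaled_tpm(tpm: np.ndarray, counts: np.ndarray, length: np.ndarray) -> np.ndarray:
    """Weight TPM by each feature's length averaged over samples, then rescale to library size."""
    weighted = tpm * length.mean(axis=1, keepdims=True)
    return weighted * (counts.sum(axis=0) / weighted.sum(axis=0))


# Hypothetical matrices: 2 features (rows) x 2 samples (columns)
tpm = np.array([[10.0, 20.0], [90.0, 80.0]])
counts = np.array([[4.0, 10.0], [46.0, 40.0]])
length = np.array([[1000.0, 1200.0], [500.0, 600.0]])

print(scaled_tpm(tpm, counts))
print(length_scaled_tpm(tpm, counts, length))
```

Both transforms preserve each sample's total count, which keeps the result on a count-like scale for downstream tools while removing the dependence on transcript-length differences between samples.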
Or import the tximport function in your Python files:
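import pandas as pd
from pytximport import tximport

file_paths = ["./sample_1.sf", "./sample_2.sf"]  # one salmon quantification file per sample
# Transcript-to-gene map with the columns transcript_id and gene_id
transcript_gene_mapping = pd.read_csv("./tx2gene_map.tsv", sep="\t")
results = tximport(
    file_paths,
    "salmon",
    transcript_gene_mapping,
)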
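Both the -m option and the transcript_gene_mapping argument rely on a transcript-to-gene table with the columns transcript_id and gene_id. If your annotation uses different column names, a minimal pandas sketch like the following can rename them and write a tab-separated file. The input table and its column names (tx, gene, biotype) are hypothetical, and we assume the Python argument accepts this table as a pandas DataFrame.

```python
import pandas as pd

# Hypothetical annotation table with non-standard column names
annotation = pd.DataFrame(
    {
        "tx": ["TX1.1", "TX2.1", "TX3.2"],
        "gene": ["GENE_A", "GENE_A", "GENE_B"],
        "biotype": ["protein_coding", "protein_coding", "lncRNA"],
    }
)

# Keep only the two expected columns, renamed to transcript_id and gene_id
tx2gene = annotation.rename(columns={"tx": "transcript_id", "gene": "gene_id"})[
    ["transcript_id", "gene_id"]
]
tx2gene.to_csv("tx2gene_map.tsv", sep="\t", index=False)

# Reading the file back gives the DataFrame used as transcript_gene_mapping
transcript_gene_mapping = pd.read_csv("tx2gene_map.tsv", sep="\t")
print(transcript_gene_mapping)
```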
Citation
Please cite both the original publication as well as this Python implementation:
- Soneson, C., Love, M. I., & Robinson, M. D. (2015). Differential analyses for RNA-seq: transcript-level estimates improve gene-level inferences. F1000Research, 4:1521. doi: 10.12688/f1000research.7563.1
- Kuehl, M., & Puelles, V. (2024). pytximport: Fast gene count estimation from transcript quantification files in Python (Version 0.4.0) [Computer software]. https://github.com/complextissue/pytximport
License
The software is provided under the GNU General Public License version 3. Please consult LICENSE for further information.
Differences
When using the same configuration, outputs from pytximport generally match the outputs from tximport up to the accuracy allowed by floating-point arithmetic and small implementation differences in the packages' dependencies. If you observe larger discrepancies, please open an issue.
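One way to check this yourself is to export both count tables (for example, as CSV files with genes as rows and samples as columns) and compare them with a relative tolerance. The sketch below is hypothetical: the helper name, the tolerance defaults (taken from NumPy's np.allclose defaults) and the toy tables are assumptions, not a procedure prescribed by pytximport.

```python
import numpy as np
import pandas as pd


def outputs_match(
    py_counts: pd.DataFrame, r_counts: pd.DataFrame, rtol: float = 1e-5, atol: float = 1e-8
) -> bool:
    """Compare two genes x samples count tables up to floating-point tolerance."""
    # Both tables must cover the same genes (rows) and samples (columns)
    if set(py_counts.index) != set(r_counts.index) or set(py_counts.columns) != set(r_counts.columns):
        return False
    # Align the R table to the Python table's order before comparing values
    r_aligned = r_counts.loc[py_counts.index, py_counts.columns]
    return bool(np.allclose(py_counts.to_numpy(), r_aligned.to_numpy(), rtol=rtol, atol=atol))


# Hypothetical outputs: same values up to rounding, rows in a different order
py_counts = pd.DataFrame({"sample_1": [10.0, 5.0]}, index=["gene_a", "gene_b"])
r_counts = pd.DataFrame({"sample_1": [5.000000001, 10.0]}, index=["gene_b", "gene_a"])
print(outputs_match(py_counts, r_counts))
```

Aligning rows and columns first matters because the two packages may order genes or samples differently even when the values agree.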
While the outputs are roughly identical for the same configuration, there remain some differences between the packages:
- pytximport can be used from the command line.
- pytximport supports AnnData format outputs (set output_type to anndata), enabling seamless integration with the scverse ecosystem.
- pytximport currently does not support inferential replicates. If these are valuable to your workflow, we appreciate pull requests to add support.
- pytximport currently does not support gene-level inputs. If these are valuable to your workflow, we appreciate pull requests to add support.
- Argument order and argument defaults may differ between the implementations.
- Additional features:
  - When ignore_transcript_version is set, the transcript version is stripped not only from the quantification file but also from the provided transcript-to-gene mapping.
  - When biotype_filter is set, all transcripts that do not contain any of the provided biotypes are removed before all other steps.
  - When save_path is configured, a count matrix is saved as a .csv file.
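The first two additional features are preprocessing steps on transcript ids. The pandas sketch below shows what they amount to under two assumptions: that a transcript's biotype appears as a substring of its id (as in pipe-separated, GENCODE-style FASTA headers), and that the version is a trailing .<digits> suffix. The helper names and ids are hypothetical, and pytximport's own implementation may differ.

```python
import pandas as pd


def filter_by_biotype(quant: pd.DataFrame, biotypes: list) -> pd.DataFrame:
    """Keep transcripts whose id contains any of the given biotype strings (assumed id format)."""
    mask = quant["transcript_id"].apply(lambda tx: any(b in tx for b in biotypes))
    return quant[mask].reset_index(drop=True)


def strip_transcript_version(ids: pd.Series) -> pd.Series:
    """Drop a trailing ".<digits>" version, e.g. "TX1.4" -> "TX1"."""
    return ids.str.replace(r"\.\d+$", "", regex=True)


# Hypothetical quantification ids carrying the biotype, as in pipe-separated headers
quant = pd.DataFrame(
    {
        "transcript_id": ["TX1.1|protein_coding", "TX2.3|lncRNA", "TX3.1|protein_coding"],
        "counts": [10.0, 5.0, 2.0],
    }
)
filtered = filter_by_biotype(quant, ["protein_coding"])
print(filtered)

# Versions are stripped from both the quantification ids and the mapping ids,
# so "TX1.1" in one table and "TX1.4" in the other still match
quant_ids = strip_transcript_version(pd.Series(["TX1.1", "TX3.1"]))
map_ids = strip_transcript_version(pd.Series(["TX1.4", "TX3.2"]))
print((quant_ids == map_ids).all())
```

Stripping versions on both sides is what makes the join robust when the quantification index and the annotation were built from different annotation releases.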
Building the documentation locally
The documentation can be built locally by navigating to the docs folder and running make html.
This requires that the package itself and its development requirements have been installed in the same virtual environment and that pandoc is available, e.g., installed by running brew install pandoc on macOS.