
RNAlysis provides a modular analysis pipeline for RNA sequencing data. RNAlysis includes various methods for filtering, data visualisation, exploratory analyses, enrichment analyses and clustering.



Useful links: Documentation | Source code | Bug reports


What is RNAlysis?

RNAlysis is a Python-based modular analysis pipeline for RNA sequencing data. You can use it to normalize, filter and visualize your data, cluster genes based on their expression patterns, and perform enrichment analysis for both Gene Ontology terms and user-defined attributes.

RNAlysis allows you to perform filtering operations and analyses in any order you wish. You can save or load your progress at any point; the operations you performed on your data, and their order, will be reflected in the saved file's name.
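To make the idea of an operation history concrete, here is a minimal plain-Python sketch. The TrackedTable class, its methods and the filename suffixes are hypothetical illustrations, not RNAlysis's API or its real naming scheme. Each operation returns a new object, so earlier states stay available to save or revisit, and the suggested filename lists the operations in the order they were applied.

```python
from dataclasses import dataclass, field
from typing import Callable, Dict, List


@dataclass
class TrackedTable:
    """Hypothetical table wrapper (not the RNAlysis API) that records each operation."""
    name: str
    rows: Dict[str, float]
    history: List[str] = field(default_factory=list)

    def apply(self, label: str, func: Callable[[Dict[str, float]], Dict[str, float]]) -> "TrackedTable":
        # Return a new table and append the operation's label; the original is untouched.
        return TrackedTable(self.name, func(self.rows), self.history + [label])

    def suggested_filename(self) -> str:
        # Operation labels appear in the order they were applied.
        return "_".join([self.name, *self.history]) + ".csv"


table = TrackedTable("counts", {"geneA": 0.0, "geneB": 12.0, "geneC": 250.0})
result = (
    table.apply("removezeros", lambda r: {k: v for k, v in r.items() if v > 0})
    .apply("below100", lambda r: {k: v for k, v in r.items() if v < 100})
)
print(result.suggested_filename())  # counts_removezeros_below100.csv
print(result.rows)  # {'geneB': 12.0}
```

Returning new objects instead of modifying in place is what lets you apply the same steps in a different order and compare the results side by side.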

RNAlysis works with gene expression matrices and differential expression tables in general, and integrates in particular with Python’s HTSeq-count and R’s DESeq2.
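HTSeq-count writes one tab-separated file per sample, with a feature ID and a read count on each line, followed by summary counters whose names start with a double underscore (for example __no_feature and __ambiguous). As an illustration of what building a count matrix from such files involves, here is a minimal standard-library sketch; the function names are hypothetical, and this is not how RNAlysis itself reads these files.

```python
import io
from typing import Dict, TextIO


def read_htseq_counts(handle: TextIO) -> Dict[str, int]:
    """Parse one htseq-count output: a tab-separated feature ID and count per line."""
    counts: Dict[str, int] = {}
    for line in handle:
        line = line.strip()
        if not line:
            continue
        feature, value = line.split("\t")
        # htseq-count ends its output with summary counters such as __no_feature
        # and __ambiguous; they are not features, so they are skipped here.
        if feature.startswith("__"):
            continue
        counts[feature] = int(value)
    return counts


def merge_samples(samples: Dict[str, TextIO]) -> Dict[str, Dict[str, int]]:
    """Combine per-sample outputs into a feature -> {sample -> count} matrix."""
    per_sample = {name: read_htseq_counts(handle) for name, handle in samples.items()}
    features = sorted(set().union(*per_sample.values()))
    # A feature missing from one sample's file is given a count of 0.
    return {f: {s: counts.get(f, 0) for s, counts in per_sample.items()} for f in features}


rep1 = io.StringIO("WBGene00000001\t10\nWBGene00000002\t0\n__no_feature\t5\n__ambiguous\t1\n")
rep2 = io.StringIO("WBGene00000001\t7\nWBGene00000002\t3\n__no_feature\t2\n")
matrix = merge_samples({"rep1": rep1, "rep2": rep2})
print(matrix)
# {'WBGene00000001': {'rep1': 10, 'rep2': 7}, 'WBGene00000002': {'rep1': 0, 'rep2': 3}}
```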


What can I do with RNAlysis?

  • Filter your gene expression matrices, differential expression tables, fold change data, and tabular data in general.

  • Normalize your gene expression matrices

  • Visualise, explore and describe your sequencing data

  • Find global relationships between sample expression profiles with clustering analyses and dimensionality reduction

  • Create and share analysis pipelines

  • Perform enrichment analysis with pre-determined Gene Ontology terms, or with user-defined attributes

  • Perform enrichment analysis on a single ranked list, instead of a test set and a background set
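Single-list enrichment avoids choosing a test set by scanning every cutoff of a ranked list. The XL-mHG test named in the 2.0.0 release notes is built on the minimum hypergeometric (mHG) statistic: for each cutoff n, take the hypergeometric tail probability of seeing at least k annotated genes among the top n, and keep the smallest. The sketch below computes only that statistic, with X (minimum annotated genes above the cutoff) and L (largest cutoff considered) as in Wagner (2015). It is an illustration under those assumptions, not RNAlysis's implementation, and it does not compute the exact p-value, which needs the dynamic-programming procedure of Eden et al. (2007).

```python
from math import comb
from typing import Optional, Sequence, Tuple


def hypergeom_sf(k: int, N: int, K: int, n: int) -> float:
    """P(X >= k) for X ~ Hypergeometric(N genes, K annotated, n drawn)."""
    upper = min(K, n)
    return sum(comb(K, i) * comb(N - K, n - i) for i in range(k, upper + 1)) / comb(N, n)


def xl_mhg_statistic(labels: Sequence[bool], X: int = 1, L: Optional[int] = None) -> Tuple[float, int]:
    """Minimum hypergeometric tail probability over cutoffs of a ranked list.

    labels[i] is True if the gene at rank i (0 = top) has the attribute.
    Only cutoffs n <= L with at least X annotated genes above the cutoff count.
    Returns (statistic, best cutoff); the cutoff is 0 if no cutoff qualifies.
    """
    N, K = len(labels), sum(labels)
    L = N if L is None else L
    best, best_cutoff, k = 1.0, 0, 0
    for n in range(1, L + 1):
        k += labels[n - 1]  # annotated genes among the top n
        if k < X:
            continue
        p = hypergeom_sf(k, N, K, n)
        if p < best:
            best, best_cutoff = p, n
    return best, best_cutoff


ranked = [True, True, False, True, False, False, False, False]
stat, cutoff = xl_mhg_statistic(ranked)
print(stat, cutoff)  # statistic = 5/70 (about 0.0714) at cutoff 4
```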


How do I install it?

You can install RNAlysis via PyPI. Run the following command in your system's command line (terminal), not in the Python prompt:

pip install RNAlysis

Dependencies

All of RNAlysis’s dependencies can be installed automatically via PyPI.


Credits

How do I cite RNAlysis?

Teichman, G. (2021) RNAlysis: RNA Sequencing analysis pipeline (Python package version 2.0.0).

Development Lead

Contributors

  • Or Ganon

  • Netta Dunsky

  • Shachar Shani


This package was created with Cookiecutter and the audreyr/cookiecutter-pypackage project template.

History

2.0.0 (2021-12-05)

  • This version introduces new methods for clustering your read count matrices, including K-Means/K-Medoids clustering, Hierarchical clustering, and HDBSCAN.

  • This version introduces many new ways to perform enrichment analysis and to visualize your results, including highly customizable GO Enrichment, enrichment based on ranked lists of genes, and enrichment for non-categorical attributes.

  • This version introduces Pipelines: a quicker and more convenient way to apply a particular analysis pipeline to multiple Filter objects.

  • This version improves the performance of many functions in RNAlysis, and in particular the performance of randomization tests.

  • This version includes changes to names and signatures of some functions in the module, as elaborated below.
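The clustering methods listed above group genes whose expression profiles look alike. As a rough sketch of the simplest of them, the code below standardizes each gene's profile (so genes are compared by shape rather than magnitude, a common but not universal preprocessing choice) and runs Lloyd's k-means with a deterministic farthest-first initialization. It is not CountFilter.split_kmeans(); RNAlysis's implementation may differ in preprocessing, initialization and distance metric, and the function names here are hypothetical.

```python
from math import dist
from statistics import mean, pstdev
from typing import List, Sequence


def standardize(profile: Sequence[float]) -> List[float]:
    """Z-score one gene's profile so clustering compares shape, not magnitude."""
    m, s = mean(profile), pstdev(profile)
    return [(x - m) / s if s else 0.0 for x in profile]


def kmeans(points: Sequence[Sequence[float]], k: int, max_iter: int = 100) -> List[int]:
    """Lloyd's k-means with farthest-first initialization; one cluster index per point."""
    # Initialization: start from the first point, then repeatedly add the point
    # farthest from every centroid chosen so far (deterministic, no random seed).
    centroids = [list(points[0])]
    while len(centroids) < k:
        farthest = max(points, key=lambda p: min(dist(p, c) for c in centroids))
        centroids.append(list(farthest))
    labels: List[int] = []
    for _ in range(max_iter):
        # Assignment step: each point joins its nearest centroid (Euclidean distance).
        new_labels = [min(range(k), key=lambda c: dist(p, centroids[c])) for p in points]
        if new_labels == labels:
            break  # assignments stopped changing, so the algorithm has converged
        labels = new_labels
        # Update step: move each centroid to the mean of the points assigned to it.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels


counts = {"g1": [1, 2, 4], "g2": [10, 20, 40], "g3": [4, 2, 1], "g4": [40, 20, 10]}
labels = kmeans([standardize(v) for v in counts.values()], k=2)
print(dict(zip(counts, labels)))  # {'g1': 0, 'g2': 0, 'g3': 1, 'g4': 1}
```

Note that g1 and g2 land together despite a tenfold difference in counts, because standardization leaves only the rising pattern they share.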

Added

  • Added class Pipeline to filtering module, which applies a series of filter functions to specified Filter objects.

  • Added CountFilter.split_kmeans(), CountFilter.split_kmedoids(), CountFilter.split_hierarchical() and CountFilter.split_hdbscan(), which split your read count matrices into clusters with similar expression patterns.

  • Added class RankedSet to enrichment module, which accepts a ranked list of genes/features, and can perform single-list enrichment analysis

  • Added RankedSet.single_set_enrichment(), which can perform single-list enrichment analysis of user-defined attributes using the XL-mHG test (see Eden et al. (PLoS Comput Biol, 2007) and Wagner (PLoS One, 2015)).

  • Added FeatureSet.go_enrichment() and RankedSet.single_set_go_enrichment(), which let you compute Gene Ontology enrichment for any organism of your choice, and filter the GO annotations used according to your preferences.

  • Added FeatureSet.enrich_hypergeometric(), which can perform enrichment analysis using the Hypergeometric Test.

  • Added more visualization functions, such as CountFilter.enhanced_box_plot().

  • Added FeatureSet.change_set_name(), to give a new ‘set_name’ to a FeatureSet object.
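For orientation, here is a minimal sketch of a one-sided hypergeometric over-representation test of the kind FeatureSet.enrich_hypergeometric() is named after: given a background of N genes, K of which carry an attribute, it asks how likely it is that a random test set of the same size contains at least the observed number of annotated genes. The output keys 'obs' and 'exp' echo the column names mentioned in the 2.0.0 notes; the function itself and the keys 'log2_fold_enrichment' and 'pval' are hypothetical and not RNAlysis's API.

```python
from math import comb, log2
from typing import Dict, Set


def hypergeometric_enrichment(background: Set[str], annotated: Set[str], test_set: Set[str]) -> Dict[str, float]:
    """One-sided hypergeometric test for over-representation of an attribute in test_set."""
    # Only genes that belong to the background are counted.
    annotated = annotated & background
    test_set = test_set & background
    N, K, n = len(background), len(annotated), len(test_set)
    k = len(test_set & annotated)  # observed annotated genes in the test set
    expected = n * K / N
    # P(X >= k): sum hypergeometric probabilities from k up to the largest possible overlap.
    pval = sum(comb(K, i) * comb(N - K, n - i) for i in range(k, min(K, n) + 1)) / comb(N, n)
    if expected == 0:
        fold = float("nan")
    elif k == 0:
        fold = float("-inf")
    else:
        fold = log2(k / expected)
    return {"obs": k, "exp": expected, "log2_fold_enrichment": fold, "pval": pval}


background = {f"gene{i}" for i in range(20)}
annotated = {f"gene{i}" for i in range(5)}
test_set = {"gene0", "gene1", "gene2", "gene10"}
result = hypergeometric_enrichment(background, annotated, test_set)
print(result)  # obs=3, exp=1.0, pval = 155/4845 (about 0.032)
```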

Changed

  • FeatureSet.enrich_randomization_parallel() was deprecated. Instead, you can run your enrichment analysis with parallel computing by calling FeatureSet.enrich_randomization() with the argument ‘parallel_processing=True’. Moreover, a parallel session will now start automatically if one is not already active.

  • Made enrich_randomization() run about six times faster.

  • Filter objects can be created from any delimiter-separated file format (.csv, .tsv, .txt, etc.).

  • CountFilter.pca() can now be plotted without labeled points.

  • Filter.index_string is now sorted by the current order of indices in the Filter object, instead of by alphabetical order.

  • CountFilter.violin_plot() now accepts a y_title argument.

  • Added more optional arguments to visualization functions such as CountFilter.violin_plot() and CountFilter.clustergram().

  • Automatic filenames for Filter objects should now reflect more clearly the operations that were performed.

  • The DataFrame returned by enrich_randomization() and enrich_randomization_parallel() now contains the additional column ‘data_scale’, determined by the new optional argument ‘data_scale’.

  • The columns ‘n obs’ and ‘n exp’ in the DataFrame returned by enrich_randomization() and enrich_randomization_parallel() were renamed to ‘obs’ and ‘exp’ respectively.

  • FeatureSets no longer support in-place set operations (intersection, union, difference, symmetric difference). Instead, these functions return a new FeatureSet.

  • Filter.biotypes() now accepts the boolean parameter ‘long_format’ instead of the str parameter ‘format’.

  • Filter.biotypes() and FeatureSet.biotypes() now count features which do not appear in the Biotype Reference Table as ‘_missing_from_biotype_reference’ instead of ‘not_in_biotype_reference’.
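A randomization test, like the one enrich_randomization() is named after, estimates enrichment empirically: draw many random gene sets of the test set's size from the background and count how often they overlap the attribute at least as much as the real set does. The sketch below is serial, uses a fixed random seed for reproducibility (compare the random_seed argument in the 1.3.5 notes), and applies the common add-one correction so the p-value is never exactly 0. The function name and the 'pval' key are hypothetical, and RNAlysis's actual procedure, corrections and parallelization may differ.

```python
import random
from typing import Dict, Iterable, Optional


def randomization_enrichment(background: Iterable[str], annotated: Iterable[str], test_set: Iterable[str],
                             reps: int = 10000, random_seed: Optional[int] = None) -> Dict[str, float]:
    """Empirical enrichment p-value from random gene sets drawn out of the background."""
    pool = sorted(set(background))  # fixed order, so the same seed always gives the same draws
    annotated = set(annotated) & set(pool)
    test_set = set(test_set) & set(pool)
    n = len(test_set)
    obs = len(test_set & annotated)
    exp = n * len(annotated) / len(pool)
    rng = random.Random(random_seed)
    # Count random sets of the same size whose overlap with the attribute is at least as large.
    hits = sum(len(annotated.intersection(rng.sample(pool, n))) >= obs for _ in range(reps))
    # Add-one correction: the observed set counts as one of the draws, so p is never 0.
    pval = (hits + 1) / (reps + 1)
    return {"obs": obs, "exp": exp, "pval": pval}


background = [f"gene{i}" for i in range(20)]
annotated = [f"gene{i}" for i in range(5)]
test_set = ["gene0", "gene1", "gene2", "gene10"]
first = randomization_enrichment(background, annotated, test_set, reps=2000, random_seed=42)
second = randomization_enrichment(background, annotated, test_set, reps=2000, random_seed=42)
print(first == second)  # True: the same seed reproduces the same p-value
```

For this toy input the exact hypergeometric p-value is 155/4845 (about 0.032), which the empirical estimate approaches as reps grows.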

Fixed

  • Updated type-hinting of specific functions.

  • Filter.biotypes() and FeatureSet.biotypes() now support Biotype Reference Tables with different column names.

  • Generally improved performance of RNAlysis.

  • Fixed a bug in Filter.filter_percentile() where the value at the exact percentile specified (e.g. the median for percentile=0.5) would be removed from the Filter object.

  • Fixed a bug in enrichment.FeatureSet where creating a FeatureSet from an input string would result in an empty set.

  • Various minor bug fixes.
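The Filter.filter_percentile() fix above comes down to whether the comparison includes the boundary value. A minimal sketch, assuming the filter keeps features at or below the percentile and computes the percentile by linear interpolation (NumPy's default rule); both are assumptions for illustration, and the function names are hypothetical rather than RNAlysis's API.

```python
from typing import Dict, Iterable


def percentile_value(values: Iterable[float], q: float) -> float:
    """Percentile for q in [0, 1] using linear interpolation (NumPy's default rule)."""
    ordered = sorted(values)
    if not ordered:
        raise ValueError("cannot compute a percentile of an empty table")
    pos = q * (len(ordered) - 1)
    lo = int(pos)  # floor, since pos is never negative
    hi = min(lo + 1, len(ordered) - 1)
    return ordered[lo] + (ordered[hi] - ordered[lo]) * (pos - lo)


def filter_below_percentile(table: Dict[str, float], q: float) -> Dict[str, float]:
    """Keep features whose value is at or below the q-th percentile of the table."""
    threshold = percentile_value(table.values(), q)
    # "<=" keeps the value that sits exactly on the percentile (e.g. the median
    # when q=0.5); a strict "<" would silently drop it.
    return {feature: value for feature, value in table.items() if value <= threshold}


pvals = {"g1": 0.01, "g2": 0.2, "g3": 0.04, "g4": 0.5, "g5": 0.03}
print(filter_below_percentile(pvals, 0.5))  # {'g1': 0.01, 'g3': 0.04, 'g5': 0.03}
```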

1.3.5 (2020-05-27)

  • This version introduces minor bug fixes and a few more visualization options.

Added

  • Added Filter.filter_missing_values(), which can remove rows with NaN values in some (or all) columns.

  • Added the visualization function CountFilter.box_plot().

Changed

  • Updated docstrings and printouts of several functions.

  • Slightly improved speed and performance across the board.

  • Filter.feature_string() is now sorted alphabetically.

  • Enrichment randomization functions in the enrichment module now accept a ‘random_seed’ argument, to be able to generate consistent results over multiple sessions.

  • Enrichment randomization functions can now return the matplotlib Figure object, in addition to the results table.

Fixed

  • Fixed a DeprecationWarning raised by parsing functions from the general module.

  • Fixed bug where saving csv files on Linux systems would save the files under the wrong directory.

  • Fixed a bug where UTF-8-encoded Reference Tables would not be loaded correctly.

  • Fixed a bug where enrichment.upsetplot() and enrichment.venn_diagram() would sometimes modify the user dict input ‘objs’.

  • Fixed a bug in CountFilter.pairplot() where log2 was calculated without adding a pseudocount, so zero counts triggered a divide-by-zero error.
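The pairplot fix above relies on a standard trick: the logarithm of a zero count is undefined, so a small pseudocount is added before taking log2. A minimal sketch, assuming a pseudocount of 1 (a common choice; the value RNAlysis uses is not stated here) and a hypothetical function name:

```python
from math import log2
from typing import List, Sequence


def log2_with_pseudocount(counts: Sequence[float], pseudocount: float = 1.0) -> List[float]:
    """Return log2(count + pseudocount) so that zero counts map to a finite value."""
    return [log2(c + pseudocount) for c in counts]


try:
    log2(0)  # Python's math.log2 rejects 0 outright (NumPy returns -inf with a divide-by-zero warning)
except ValueError:
    print("log2(0) is undefined")

print(log2_with_pseudocount([0, 1, 3, 7]))  # [0.0, 1.0, 2.0, 3.0]
```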

1.3.4 (2020-04-07)

  • This version fixed a bug that prevented installation of the package.

Changed

  • Updated docstrings and printouts of several functions

Fixed

  • Fixed a bug with installation of the previous version

1.3.3 (2020-03-28)

  • First stable release on PyPI.

Added

  • Added Filter.sort(), and upgraded the functionality of Filter.filter_top_n().

  • Added UpSet plots and Venn diagrams to enrichment module.

  • User-defined biotype reference tables can now be used.

  • Filter operations now print out the result of the operation.

  • Enrichment randomization tests now also support non-WBGene indexing.

  • Filter.biotypes() and FeatureSet.biotypes() now report genes that don’t appear in the biotype reference table.

  • Filter.biotypes() can now give a long-form report with descriptive statistics of all columns, grouped by biotype.

  • Added code examples to the user guide and to the docstrings of most functions.
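Reporting features that are missing from the biotype reference table amounts to counting each feature under its biotype and giving unknown features a bucket of their own. A minimal sketch, assuming the reference table has already been loaded as a plain dict from feature ID to biotype (RNAlysis reads it from a Biotype Reference Table file instead). The bucket name follows the ‘_missing_from_biotype_reference’ label given in the 2.0.0 notes; the function name is hypothetical.

```python
from collections import Counter
from typing import Dict, Iterable

# Label for features absent from the reference table, as named in the 2.0.0 notes.
MISSING_LABEL = "_missing_from_biotype_reference"


def count_biotypes(features: Iterable[str], reference: Dict[str, str],
                   missing_label: str = MISSING_LABEL) -> Counter:
    """Count features per biotype, putting unknown features in their own bucket."""
    return Counter(reference.get(feature, missing_label) for feature in features)


reference = {"WBGene00000001": "protein_coding", "WBGene00000002": "protein_coding",
             "WBGene00000003": "piRNA"}
features = ["WBGene00000001", "WBGene00000002", "WBGene00000003", "WBGene00009999"]
result = count_biotypes(features, reference)
print(result)
# Counter({'protein_coding': 2, 'piRNA': 1, '_missing_from_biotype_reference': 1})
```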

Changed

  • Changed argument order and default values in filtering.CountFilter.from_folder().

  • Changed default title in scatter_sample_vs_sample().

  • Changed default filename in CountFilter.fold_change().

  • Settings are now saved in a .yaml format. Reading and writing of settings have been modified.

  • Changed argument name ‘deseq_highlight’ to ‘highlight’ in scatter_sample_vs_sample(). It can now accept any Filter object.

  • Updated documentation and default ‘mode’ value for FeatureSet.go_enrichment().

  • Updated the signature and function of general.load_csv() to be clearer and more predictable.

  • Changed argument names in CountFilter.from_folder().

  • Modified the names and signatures of the .csv test-file functions to make them easier to understand.

  • Renamed ‘Filter.filter_by_ref_table_attr()’ to ‘Filter.filter_by_attribute()’.

  • Renamed ‘Filter.split_by_ref_table_attr()’ to ‘Filter.split_by_attribute()’.

  • Renamed ‘Filter.norm_reads_with_size_factor()’ to ‘Filter.normalize_with_scaling_factors()’. It can now use any set of scaling factors to normalize libraries.

  • Renamed ‘Filter.norm_reads_to_rpm()’ to ‘Filter.normalize_to_rpm()’.

  • Made some functions in the general module hidden.
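The two normalization methods renamed above follow simple formulas. Reads per million (RPM) divides each count by its sample's total read count and multiplies by 10^6; scaling-factor normalization divides each sample's counts by that sample's factor, which is how DESeq2 size factors are applied. A minimal sketch on a dict-of-dicts matrix, assuming the factors are divisors; the function names are hypothetical and chosen to avoid confusion with Filter.normalize_to_rpm() and Filter.normalize_with_scaling_factors().

```python
from typing import Dict

Matrix = Dict[str, Dict[str, float]]  # feature -> sample -> read count


def rpm_normalize(matrix: Matrix) -> Matrix:
    """Reads per million: count * 10^6 / total reads of that sample."""
    samples = {s for row in matrix.values() for s in row}
    totals = {s: sum(row.get(s, 0) for row in matrix.values()) for s in samples}
    return {f: {s: c * 1e6 / totals[s] for s, c in row.items()} for f, row in matrix.items()}


def scale_normalize(matrix: Matrix, factors: Dict[str, float]) -> Matrix:
    """Divide each sample's counts by that sample's scaling factor (e.g. a DESeq2 size factor)."""
    return {f: {s: c / factors[s] for s, c in row.items()} for f, row in matrix.items()}


counts = {"g1": {"a": 10, "b": 30}, "g2": {"a": 90, "b": 270}}
print(rpm_normalize(counts))
# {'g1': {'a': 100000.0, 'b': 100000.0}, 'g2': {'a': 900000.0, 'b': 900000.0}}
print(scale_normalize(counts, {"a": 0.5, "b": 1.5}))
# {'g1': {'a': 20.0, 'b': 20.0}, 'g2': {'a': 180.0, 'b': 180.0}}
```

Sample b has three times as many reads as sample a, and both methods remove that library-size difference.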

Fixed

  • Various bug fixes

Removed

  • Removed the ‘feature_name_to_wbgene’ module from RNAlysis.

1.3.2 (2019-12-11)

  • First beta release on PyPI.



Download files

RNAlysis-2.0.0.tar.gz (source distribution, 10.4 MB)

RNAlysis-2.0.0-py3-none-any.whl (built distribution for Python 3, 1.3 MB)
