Compute genetic distance matrices from RNA-seq data.

RNA-clique

This is the repository for RNA-clique, a tool for computing pairwise genetic distances from RNA-seq data. The software accepts as input assembled transcriptomes from two or more samples and produces as its output a matrix containing pairwise distances ranging from 0 to 1.

Installation

This software is written in Python. It also requires NCBI BLAST+ and several Python libraries. Guides are provided for installation on specific systems; for other systems, see the requirements.

Installation guides

Basic usage

To run RNA-clique on your assembled transcriptomes, first make sure that your data are in a format understood by RNA-clique.

Then, run rna-clique with the directories containing your transcriptomes, an output directory, and a setting for the number of top genes to select.

rna-clique -O my_rna_clique_out -n 50000 \
           path/to/transcriptome1_dir \
           path/to/transcriptome2_dir \
           path/to/transcriptome3_dir ...

RNA-clique produces an output matrix at my_rna_clique_out/matrix.h5. To see it in a human-readable format, use export_matrix.

python -m rna_clique.export_matrix -m my_rna_clique_out/matrix.h5 

More details about the usage of RNA-clique can be found in the Command-line usage guide.

Downstream analyses

The export_matrix program prints the calculated matrix to the standard output, so you can use redirection or pipes to save the results to a file. You could then use the matrix in any downstream application capable of loading arbitrary matrices from files.
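If you would rather drive this step from Python than from the shell, the following is a minimal sketch. It relies only on the python -m rna_clique.export_matrix -m invocation shown above; the helper names export_command and save_matrix_text are hypothetical and are not part of RNA-clique.

```python
import subprocess
import sys


def export_command(matrix_path):
    """Build the export_matrix invocation for the given matrix file."""
    # sys.executable keeps the call inside the current Python environment.
    return [sys.executable, "-m", "rna_clique.export_matrix", "-m", str(matrix_path)]


def save_matrix_text(matrix_path, text_path):
    """Run export_matrix and write its standard output to text_path."""
    with open(text_path, "w") as out:
        # check=True raises CalledProcessError if export_matrix exits with an error.
        subprocess.run(export_command(matrix_path), stdout=out, check=True)
```

Calling save_matrix_text("my_rna_clique_out/matrix.h5", "distances") would then have the same effect as redirecting the command's output to a file named distances.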

For example, if you output the matrix to a file named distances, you could load the matrix in R using the following code:

dis <- as.matrix(read.table("distances", sep=" "))
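The same file can also be read from Python. The sketch below assumes the exported file is plain text with one matrix row per line, space-separated numeric values, and no header row, which is what the R call above implies; check your own output before relying on it. The function name load_distances is hypothetical, and only the standard library is used.

```python
from pathlib import Path


def load_distances(path):
    """Read a whitespace-separated numeric matrix into a list of rows."""
    rows = []
    for line in Path(path).read_text().splitlines():
        if line.strip():  # skip blank lines
            rows.append([float(value) for value in line.split()])
    # A pairwise distance matrix should be square.
    if any(len(row) != len(rows) for row in rows):
        raise ValueError("expected a square matrix")
    # RNA-clique reports distances between 0 and 1.
    if any(not 0.0 <= value <= 1.0 for row in rows for value in row):
        raise ValueError("expected all distances to lie in [0, 1]")
    return rows
```

The two checks are only sanity checks on the assumed layout; they catch a truncated file or a file that contains something other than the distance matrix.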

Using RNA-clique in Python code

You can use RNA-clique directly from your Python code. For example:

from rna_clique.rna_clique import rna_clique
from pathlib import Path

out_dir = Path("rna_clique_out")
out_dir.mkdir(exist_ok=True)
# Get the SampleSimilarity object and a dict mapping paths to their sample
# names.
sim, path_to_sample = rna_clique(
    [
        Path("path/to/transcriptome1_dir"),
        Path("path/to/transcriptome2_dir"),
        Path("path/to/transcriptome3_dir"),
    ],
    out_dir_1=out_dir / "od1",
    out_dir_2=out_dir / "od2",
    cache_dir=out_dir / "db_cache",
    output_graph=out_dir / "graph.pkl",
    output_matrix=out_dir / "matrix.h5",
    top_genes=50000,
)
print(sim.get_dissimilarity_df())

For information on finer-grained control via RNA-clique's Python API, see the API guide.

License

All code is licensed under the MIT license, which may be found in the LICENSE file at the root of this repository.

A machine-readable copyright file in Debian format may also be found in the copyright file.

Citation

If you use RNA-clique for your work, please cite "RNA-clique: a method for computing genetic distances from RNA-seq data".

@article{tapia2024rna,
  title={{RNA-clique: a method for computing genetic distances from RNA-seq data}},
  author={Tapia, Andrew C and Jaromczyk, Jerzy W and Moore, Neil and Schardl, Christopher L},
  journal={BMC Bioinformatics},
  volume={25},
  year={2024},
  publisher={BioMed Central},
  keywords={pub}
}

Additional documentation
