Reduced dimension embeddings for pathogen sequences
Cartography
Cartography is open-source software that lets scientists and epidemiologists run reduced dimension embeddings (PCA, MDS, t-SNE, and UMAP) on viral populations. This is the source code accompanying the paper Cartography, written by Sravani Nanduri and John Huddleston.
Documentation
Source Code
Bug reports
Installing the package
Install the package with pip:
pip install pathogen-embed
src.embed module
Command line interface
See the full Documentation for complete details.
The summary below lists the available options but does not describe the named and positional arguments.
Reduced dimension embeddings for pathogen sequences
usage: embed [-h] [--distance-matrix DISTANCE_MATRIX] [--separator SEPARATOR]
             [--alignment ALIGNMENT] [--cluster-data CLUSTER_DATA]
             [--cluster-threshold CLUSTER_THRESHOLD]
             [--random-seed RANDOM_SEED] [--output-dataframe OUTPUT_DATAFRAME]
             [--output-figure OUTPUT_FIGURE]
             {pca,t-sne,umap,mds} ...
Sub-commands:
pca
Principal Component Analysis
embed pca [-h] [--components COMPONENTS]
          [--explained-variance EXPLAINED_VARIANCE]
t-sne
t-distributed Stochastic Neighbor Embedding
embed t-sne [-h] [--perplexity PERPLEXITY] [--learning-rate LEARNING_RATE]
umap
Uniform Manifold Approximation and Projection
embed umap [-h] [--nearest-neighbors NEAREST_NEIGHBORS] [--min-dist MIN_DIST]
mds
Multidimensional Scaling
embed mds [-h] [--components COMPONENTS]
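The usage text above places the shared options before the sub-command name and the method-specific options after it. The following is a minimal sketch of that layout using Python's argparse. It is a hypothetical reconstruction from the usage text only, not the package's actual parser: the function name build_parser, the file names, and the choice to leave every value as a string are assumptions made for illustration.

```python
import argparse


def build_parser():
    """Hypothetical parser mirroring the usage text; not the package's source."""
    parser = argparse.ArgumentParser(
        prog="embed",
        description="Reduced dimension embeddings for pathogen sequences",
    )
    # Shared options: these come before the sub-command on the command line.
    for flag in (
        "--distance-matrix", "--separator", "--alignment", "--cluster-data",
        "--cluster-threshold", "--random-seed", "--output-dataframe",
        "--output-figure",
    ):
        parser.add_argument(flag)  # value types are unknown here, so kept as str

    # One sub-parser per embedding method, each with its own options.
    subparsers = parser.add_subparsers(dest="command", required=True)
    pca = subparsers.add_parser("pca")
    pca.add_argument("--components")
    pca.add_argument("--explained-variance")
    tsne = subparsers.add_parser("t-sne")
    tsne.add_argument("--perplexity")
    tsne.add_argument("--learning-rate")
    umap = subparsers.add_parser("umap")
    umap.add_argument("--nearest-neighbors")
    umap.add_argument("--min-dist")
    mds = subparsers.add_parser("mds")
    mds.add_argument("--components")
    return parser


# Hypothetical invocation: file names are placeholders, not files from the project.
args = build_parser().parse_args([
    "--alignment", "aligned.fasta",
    "--output-dataframe", "embedding.csv",
    "pca", "--components", "2",
])
print(args.command, args.alignment, args.components)  # prints: pca aligned.fasta 2
```

On a real command line, the same argument order would read as shared options first, then the method name, then that method's options.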
API
src.embed.get_hamming_distances(genomes)
Calculate pairwise Hamming distances between the given list of genomes and return the nonredundant array of values for use with scipy’s squareform function. Bases other than standard nucleotides (A, T, C, G) are ignored.
Parameters: genomes (list) – a list of strings corresponding to the genomes that should be compared
Returns: a list of pairwise Hamming distances in nonredundant (vector-form) order, one value per pair of genomes
Return type: list
>>> genomes = ["ATGCT", "ATGCT", "ACGCT"]
>>> get_hamming_distances(genomes)
[0, 1, 1]
>>> genomes = ["AT-GCT", "AT--CT", "AC--CT"]
>>> get_hamming_distances(genomes)
[0, 1, 1]
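The behavior described above can be illustrated with a small pure-Python sketch. This is not the package's implementation: the names hamming_distances_sketch and condensed_to_square are hypothetical, and the sketch assumes uppercase sequences of equal length and that a site is skipped whenever either genome has a character other than A, T, C, or G there. The second helper shows by hand what expanding the nonredundant vector into a square matrix looks like, which is the role scipy's squareform plays.

```python
from itertools import combinations

VALID_BASES = frozenset("ATCG")


def hamming_distances_sketch(genomes):
    """Return pairwise Hamming distances in condensed order: (0,1), (0,2), ..., (1,2), ..."""
    distances = []
    for first, second in combinations(genomes, 2):
        # Count mismatches only at sites where both characters are standard bases.
        mismatches = sum(
            1
            for a, b in zip(first, second)
            if a != b and a in VALID_BASES and b in VALID_BASES
        )
        distances.append(mismatches)
    return distances


def condensed_to_square(condensed, n):
    """Expand a condensed distance vector into an n x n symmetric matrix."""
    square = [[0] * n for _ in range(n)]
    for (i, j), distance in zip(combinations(range(n), 2), condensed):
        square[i][j] = square[j][i] = distance
    return square


genomes = ["AT-GCT", "AT--CT", "AC--CT"]
condensed = hamming_distances_sketch(genomes)
print(condensed)  # [0, 1, 1]
print(condensed_to_square(condensed, len(genomes)))  # [[0, 0, 1], [0, 0, 1], [1, 1, 0]]
```

For three genomes the vector holds three values because there are three unordered pairs; in general, n genomes yield n(n-1)/2 distances.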
Hashes for pathogen_embed-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | c492bf0fa18902e1ed2ab3f38f02e15b70b3e9bcb7dfead2dfcd16311785f658
MD5 | fc3e2235242f7e40dc20003c7a0bcb1f
BLAKE2b-256 | 0c487da103e36f4562a4b943f2793c448bb6b86481352bf89f721a44ca58bac4