MANAclust

Multi Affinity Network Association

These details have not been verified by PyPI

Project links

Homepage

License
- OSI Approved :: GNU Affero General Public License v3
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

Merged Affinity Network Association clustering

About

MANAclust helps identify disease subtypes by integrating medical records and multi-omics data.
It takes in the categorical and numeric datasets that you want to use for joint unsupervised clustering to identify subgroups.
It then performs unsupervised feature selection, using an information-theory-driven algorithm for categorical variables and looking for negative correlations among numeric variables.
Next, MANAclust calculates a normalized affinity matrix for each of the input datasets.
MANAclust then merges the affinity matrices from your datasets into a final affinity matrix, which is used for clustering.
MANAclust's clustering algorithm is a new algorithm that combines Louvain modularity and affinity propagation, balancing the strengths and weaknesses of each.
Finally, MANAclust compares the clusters against each other within each dataset, using Chi-square tests for categorical datasets and ANOVAs for numeric datasets. For globally significant features, post-hoc tests are also performed to identify the individual differences between all of the sample groups.
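
The normalize-then-merge step described above can be illustrated with a toy sketch. It is not MANAclust's actual implementation: min-max scaling as the normalization, a plain mean as the merge rule, and the function names `normalize_affinity` and `merge_affinities` are all assumptions made for illustration.

```python
def normalize_affinity(aff):
    """Min-max scale a {(subject_a, subject_b): value} dict to [0, 1].

    ASSUMPTION: min-max scaling is a stand-in for whatever normalization
    MANAclust really applies.
    """
    lo, hi = min(aff.values()), max(aff.values())
    span = hi - lo
    return {pair: (v - lo) / span if span else 0.0 for pair, v in aff.items()}


def merge_affinities(affinities):
    """Average the normalized affinities over the datasets containing each pair.

    ASSUMPTION: a plain mean is used; a pair that is absent from a dataset
    (for example because a subject is missing there) does not contribute.
    Pairs are assumed to be keyed with the subject IDs in sorted order.
    """
    normalized = [normalize_affinity(a) for a in affinities]
    all_pairs = set().union(*(n.keys() for n in normalized))
    merged = {}
    for pair in all_pairs:
        values = [n[pair] for n in normalized if pair in n]
        merged[pair] = sum(values) / len(values)
    return merged


# Toy data: "s3" only appears in the first dataset, "s4" only in the second.
clinical = {("s1", "s2"): 2.0, ("s1", "s3"): 6.0, ("s2", "s3"): 4.0}
omics = {("s1", "s2"): 30.0, ("s1", "s4"): 10.0, ("s2", "s4"): 20.0}

merged = merge_affinities([clinical, omics])
for pair in sorted(merged):
    print(pair, merged[pair])
```

Keying the affinities by subject pair rather than by matrix position is a design choice of this sketch: it lets datasets with different subjects, in different orders, be combined without any alignment step.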

How do I get set up?

python3 -m pip install manaclust

Note that you might have to use sudo for this command, depending on whether you're doing a global or local installation.

How do I use MANAclust?

You basically just need to tell MANAclust where your categorical (-cat) and numeric (-num) dataset files are. Note that these have to be tab-delimited text files.
You can also give it files that you want to use to evaluate the efficacy of the clustering. For example, if you have diagnoses that you want to compare against but don't want MANAclust to actually cluster on, you can feed them into the -test_cat argument. Similarly, any numeric values that you don't want used for clustering, but do want to check for differences in, go into -test_num.
The basic syntax is:

python3 -m mana_clust.mana_clust -cat <path_to_categorical_dataset_1.tsv> \
                                      <path_to_categorical_dataset_2.tsv> \
                                 -num <path_to_numeric_dataset_1.tsv> \
                                      <path_to_numeric_dataset_2.tsv> \
                                 -test_cat <path_to_categorical_dataset_not_used_for_clustering.tsv> \
                                 -test_num <path_to_numeric_dataset_not_used_for_clustering.tsv> \
                                 -out_dir <path_to_the_output_directory>

Included in this repository are some dummy files for you to test things out with. You can make sure everything is up and running by using these files with the syntax below:

python3 -m mana_clust.mana_clust -cat ~/Downloads/manaclust/test/categorical_data_file_0.tsv \
                                 -num ~/Downloads/manaclust/test/numeric_data_file_0.tsv \
                                      ~/Downloads/manaclust/test/numeric_data_file_0.tsv \
                                 -test_cat <path_to_categorical_dataset_not_used_for_clustering.tsv> \
                                 -test_num <path_to_numeric_dataset_not_used_for_clustering.tsv> \
                                 -out_dir ~/Downloads/manaclust/test/out/

How do I use my own datasets?

MANAclust takes in categorical and numeric datasets.
File format: both categorical and numeric datasets must be tab-delimited text files.
Categorical dataset format: As is traditional for small categorical datasets such as medical records, categorical datasets are assumed to have each row as a subject and each column as a variable.
Numeric dataset format: As is traditional for large omic datasets, the features (genes or similar) are assumed to be in rows, while the subjects are in columns.
Subjects that are missing from a dataset: That's fine! Given the reality of clinical datasets, we intentionally designed MANAclust to be able to handle missing data in every way.
Does the order of the subjects matter in the datasets?: Nope. The subjects can be in any order in each of your datasets; MANAclust will piece everything together correctly.
Missing values in categorical datasets: That's fine! As long as all missing values are marked with a single notation (e.g. "N/A"), you can pass that notation to MANAclust with the -MD argument (which stands for missing_data).
Mixed categorical/numeric datasets: These should be fed in as categorical datasets. Any variables that have float/int values will automatically be digitized into bins so that they can be treated as categorical variables. Note the drawback: because numeric variables are treated as categorical, some statistical power may be lost, although this also enables the detection of non-monotonic patterns. In theory, an entirely numeric dataset could also be given this way.
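
The two orientations are easy to mix up, so here is a small sketch that writes one toy file of each kind with Python's standard `csv` module. The subject IDs, variable names, values, and file names are invented for illustration. Putting subject IDs in the first column of the categorical file and in the header row of the numeric file, and the corner labels "subject" and "feature", are assumptions based on the layout described above.

```python
import csv
import os
import tempfile

out = tempfile.mkdtemp()

# Categorical: one row per subject, one column per variable.
cat_rows = [
    ["subject", "smoker", "diagnosis"],
    ["subj_1", "yes", "A"],
    ["subj_2", "no", "N/A"],  # missing value, to be declared with -MD
    ["subj_3", "no", "B"],
]

# Numeric: one row per feature, one column per subject.
# subj_2 is absent and the subject order differs from the file above.
num_rows = [
    ["feature", "subj_3", "subj_1"],
    ["GENE_A", 5.2, 1.1],
    ["GENE_B", 0.0, 3.4],
]


def write_tsv(path, rows):
    """Write rows to a tab-delimited text file."""
    with open(path, "w", newline="") as handle:
        writer = csv.writer(handle, delimiter="\t", lineterminator="\n")
        writer.writerows(rows)


cat_path = os.path.join(out, "categorical.tsv")
num_path = os.path.join(out, "numeric.tsv")
write_tsv(cat_path, cat_rows)
write_tsv(num_path, num_rows)
print(cat_path)
print(num_path)
```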

How do I interpret the output?

MANAclust generates an HTML file that gives you a walkthrough of all the figures and analyses that MANAclust performs. Simply open up the "MANAclust_summary.html" file and start exploring!
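
If you are not sure where the summary ended up, a short helper can look for it. This is a sketch, not part of MANAclust: the helper names are hypothetical, and because the exact location of the file inside the output directory is not stated here, the search is recursive.

```python
import webbrowser
from pathlib import Path


def find_summary(out_dir, name="MANAclust_summary.html"):
    """Return the first file called `name` anywhere under out_dir, or None."""
    matches = sorted(Path(out_dir).expanduser().rglob(name))
    return matches[0] if matches else None


def open_summary(out_dir):
    """Open the summary in the default browser; return True if it was found."""
    summary = find_summary(out_dir)
    if summary is None:
        return False
    webbrowser.open(summary.resolve().as_uri())
    return True


# The path is the output directory from the test command above.
location = find_summary("~/Downloads/manaclust/test/out/")
print(location if location else "No summary found yet; has the run finished?")
```

Calling `open_summary("~/Downloads/manaclust/test/out/")` afterwards opens the report if it exists.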

How can I dig into the data further?

MANAclust also generates a meta_ome_results.pkl file. This object stores all the nitty-gritty details of the analysis, which you can explore interactively. Just print the object and you'll see documentation of everything that might be of interest.

from mana_clust.common_functions import import_dict
from mana_clust.mana_clust import cluster_omes
from mana_clust.mana_cat import feature, categorical_ome
from mana_clust.mana_continuous import num_ome

# Load the saved results object from the output directory
meta_ome = import_dict('<path/to/meta_ome_results.pkl>')

# Printing the object shows its documentation
print(meta_ome)

To look at a specific input dataset, you can print out the objects in the meta_ome.cat_omes and meta_ome.num_omes lists:

print(meta_ome.cat_omes[0])

print(meta_ome.num_omes[0])

Who do I talk to?

Scott Tyler: scottyler89@gmail.com

Release history

- 0.7.3 (this version): Apr 12, 2024
- 0.7.2: Mar 12, 2023
- 0.7.0: Jan 20, 2022
- 0.6.8: Oct 15, 2021
- 0.6.7: Jun 7, 2021
- 0.6.6: Jan 6, 2021
- 0.6.1: Sep 10, 2020
- 0.6: Sep 2, 2020
- 0.5: Dec 12, 2019
- 0.4: Nov 18, 2019

Download files

Source Distributions

No source distribution files available for this release.

Built Distribution

MANAclust-0.7.3-py3-none-any.whl (70.8 kB)

Uploaded Apr 12, 2024 Python 3

File details

Details for the file MANAclust-0.7.3-py3-none-any.whl.

File metadata

Download URL: MANAclust-0.7.3-py3-none-any.whl
Upload date: Apr 12, 2024
Size: 70.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/5.0.0 CPython/3.11.5

File hashes

Hashes for MANAclust-0.7.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`65bd24be6e97942d4d781fbd6e5c6ee75753f8bf89c9f4f79976cafe7cdb8cfa`
MD5	`24bfe80978585895e90c879ff3689734`
BLAKE2b-256	`35f505cb6df288099e4fe4c4a235260fadabc8cc0249602155fee8738b0d3b18`
