A package to cluster metabolomics data and plot dendrograms.
- Project owner: Catherine Rawlinson (PhD candidate)
- Email: firstname.lastname@example.org
Converts an MGF file and a component list into a non-redundant list. The component-analyte list is then converted into a data matrix, and the analytes are dynamically binned and clustered.
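To give a feel for what "dynamic binning" and the data matrix might look like, here is a minimal pure-Python sketch. It is not BioDendro's implementation: the gap-based binning rule, the `tolerance` value, the function names and the example data are all assumptions made for illustration.

```python
def dynamic_bins(mz_values, tolerance=0.002):
    """Group m/z values into bins, starting a new bin at each gap > tolerance.

    Assumed rule for illustration; BioDendro's actual binning may differ.
    """
    bins = []
    for mz in sorted(mz_values):
        if bins and mz - bins[-1][-1] <= tolerance:
            bins[-1].append(mz)  # close enough to the previous value: same bin
        else:
            bins.append([mz])  # gap too large: open a new bin
    return bins


def presence_matrix(peaks_by_sample, tolerance=0.002):
    """Build a sample-by-bin 0/1 matrix from {sample: [m/z, ...]}."""
    all_mz = [mz for peaks in peaks_by_sample.values() for mz in peaks]
    bins = dynamic_bins(all_mz, tolerance)
    # Map each m/z value to the index of the bin it landed in.
    bin_of = {mz: i for i, members in enumerate(bins) for mz in members}
    matrix = {}
    for sample, peaks in peaks_by_sample.items():
        row = [0] * len(bins)
        for mz in peaks:
            row[bin_of[mz]] = 1
        matrix[sample] = row
    return bins, matrix


# Hypothetical data, not taken from the example files.
peaks_by_sample = {
    "sample_a": [100.001, 150.500, 200.000],
    "sample_b": [100.002, 150.501],
    "sample_c": [200.001, 300.250],
}
bins, matrix = presence_matrix(peaks_by_sample)
for sample, row in matrix.items():
    print(sample, row)
```

Rows of a matrix like this are what a distance metric and a hierarchical clustering step would then operate on.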
Install on Linux or Mac from bash
Installing is easiest with pip. Assuming you have python3 installed you can run the following to install.
python3 -m pip install --user biodendro

# or

git clone email@example.com:CurtinIC/BioDendro.git && cd BioDendro
python3 -m pip install --user .
The --user flag tells pip to install to a user directory rather than a system directory. Generally this will be under ~/.local on Mac and Linux. If that is the case, make sure that ~/.local/bin is on your $PATH.
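If you are unsure whether the user script directory is on your $PATH, a short check like the one below can help. This is a sketch that assumes a Linux or Mac layout where scripts live in a `bin` folder under the user base; `user_bin_on_path` is a hypothetical helper, not part of BioDendro. Note that some Mac Python builds report a user base other than ~/.local.

```python
import os
import site
from pathlib import Path


def user_bin_on_path(environ=None):
    """Return (user_bin_dir, is_on_path) for pip's --user script directory."""
    environ = os.environ if environ is None else environ
    # site.getuserbase() is the base directory used by "pip install --user".
    user_bin = Path(site.getuserbase()) / "bin"
    entries = [
        Path(p).expanduser()
        for p in environ.get("PATH", "").split(os.pathsep)
        if p
    ]
    return user_bin, user_bin in entries


user_bin, found = user_bin_on_path()
print(f"{user_bin} on PATH: {found}")
```

If this reports `False`, adding that directory to $PATH in your shell profile should make the BioDendro script visible.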
To install as root, you can omit --user, though this is generally discouraged.
sudo python3 -m pip install biodendro
To install the latest and greatest version, you can use git to install directly from the repository.
python3 -m pip install --user git+https://github.com/CurtinIC/BioDendro.git

# or

git clone firstname.lastname@example.org:CurtinIC/BioDendro.git && cd BioDendro
python3 -m pip install --user .
The BioDendro script and the Python package will now be available to use (assuming Python is configured correctly).
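One way to confirm that both pieces are visible is to look them up from Python. The sketch below uses only the standard library; `check_install` is a hypothetical helper, and it assumes the module and the script are both named `BioDendro`, as in the examples in this README.

```python
import importlib.util
import shutil


def check_install(module="BioDendro", script="BioDendro"):
    """Report whether the module is importable and where the script lives."""
    return {
        # find_spec returns None when a top-level module cannot be found.
        "module_found": importlib.util.find_spec(module) is not None,
        # shutil.which returns None when the script is not on PATH.
        "script_path": shutil.which(script),
    }


print(check_install())
```

A `script_path` of `None` usually points to a $PATH problem rather than a failed install.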
Quick Start Example - command line
The quickest way to run is using the command-line interface.
A list of options can be obtained with the --help flag.
To run the basic pipeline using the example MGF and components file do:
BioDendro --results-dir my_results_dir MSMS.mgf component_list.txt
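If you want to script the same call, for example to loop over several MGF files, you could build the command in Python and hand it to `subprocess`. This is a sketch: `biodendro_command` is a hypothetical helper, and it only uses the `--results-dir` flag and positional arguments shown above.

```python
import subprocess


def biodendro_command(mgf, components, results_dir=None):
    """Build the argument list for the BioDendro command-line script."""
    cmd = ["BioDendro"]
    if results_dir is not None:
        cmd += ["--results-dir", results_dir]
    cmd += [mgf, components]
    return cmd


cmd = biodendro_command("MSMS.mgf", "component_list.txt", results_dir="my_results_dir")
print(" ".join(cmd))

# To actually run it (requires BioDendro to be installed and the files to exist):
# subprocess.run(cmd, check=True)
```

Passing a list rather than a single string avoids shell quoting problems with file names that contain spaces.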
Quick Start Example - Python library
The pipeline is also available as a python function/library. The command above would be equivalent to the following in python.
import BioDendro

tree = BioDendro.pipeline("MSMS.mgf", "component_list.txt", results_dir="my_results_dir")
From there you could analyse the results stored in tree.
The example jupyter notebooks contain more detailed explanations of different parameters.
quick-start-example.ipynb contains basic information about running the pipelines.
longer-example.ipynb contains more detailed information about how the pipeline works, and how you can modify parameters.
Command line API
The pipeline can also be run from a bash or bash-like terminal. This is useful if you're not planning on tweaking the parameters much and just want to run the darn thing.
For these examples, we're using the ipython magic command %%bash to run the commands in bash.
You can omit the %%bash bit if you're running straight in the terminal.
To get a list of all available options, use the --help flag.

%%bash
BioDendro --help
The minimum options to run the pipeline are the MGF file and a components list.
Using the example data in the BioDendro repo we could run...
%%bash
BioDendro MSMS.mgf component_list.txt
As before, the results will be stored in a directory with the current date and the current time added to the end of it.
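If you would rather control the directory name yourself, you can build a timestamped name and pass it with --results-dir (or `results_dir` in Python). The prefix and timestamp format below are assumptions made for illustration; they are not necessarily the naming scheme BioDendro uses by default.

```python
from datetime import datetime


def timestamped_dir(prefix="results", now=None):
    """Return a directory name with the date and time appended (assumed format)."""
    now = datetime.now() if now is None else now
    return f"{prefix}_{now:%Y%m%d%H%M%S}"


# A fixed datetime is used here so the output is reproducible.
print(timestamped_dir(now=datetime(2024, 1, 31, 9, 5, 0)))  # results_20240131090500
```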
You can change the parameters by supplying additional flags. However, this will run the whole pipeline again, so if you just need to adjust the cutoff or decide to use braycurtis instead of jaccard distances, you might be better off using the Python API.
%%bash
BioDendro --scaling --cluster-method braycurtis --cutoff 0.5 MSMS.mgf component_list.txt
would be equivalent to running the following in python
import BioDendro

tree = BioDendro.pipeline("MSMS.mgf", "component_list.txt", clustering_method="braycurtis", scaling=True, cutoff=0.5)
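To see why the choice between jaccard and braycurtis can matter, the sketch below computes both distances for a pair of small vectors using their standard definitions. It is written in plain Python for illustration and is not taken from BioDendro's code; the example vectors are made up.

```python
def jaccard_distance(u, v):
    """Jaccard distance between two presence/absence vectors."""
    both = sum(1 for a, b in zip(u, v) if a and b)
    either = sum(1 for a, b in zip(u, v) if a or b)
    return 0.0 if either == 0 else 1.0 - both / either


def braycurtis_distance(u, v):
    """Bray-Curtis distance between two non-negative vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return 0.0 if den == 0 else num / den


# Hypothetical presence/absence rows for two samples.
u = [1, 1, 1, 0]
v = [1, 1, 0, 0]

print(round(jaccard_distance(u, v), 3))     # 0.333
print(round(braycurtis_distance(u, v), 3))  # 0.2
```

The same pair of rows sits at a different distance under each metric, so a fixed cutoff such as 0.5 can group samples differently depending on which one you pick.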