
Metagenomic profiling using a reference phylogeny


Important bug notice:

A bug was identified in the taxonomic mapping modules, caused by the presence of multiple NCBI taxa with the name "environmental samples". This bug has been fixed in version 1.4.0.0, but earlier versions may produce errors in the taxonomic output. Please update to at least version 1.4.0.0. Note that phylogenetic output can be re-mapped using the expam to_taxonomy command.


Updates since version 1.2

  • Now using ETE3 for interfacing with NCBI taxonomy. This fixes a bug with taxonomic conversion of phylogenetic results. This is a critical bug: results from any version earlier than expam 1.4 may contain errors with current versions of the NCBI taxonomy. Please update to expam 1.4.
  • Added ‘total’ counts column to phylogenetic and taxonomic output – combines SL and ML counts.
  • Create cumulative cutoff summary file, which combines sample outputs after accumulation and cutoff into a single table.
    • This occurs for both phylogenetic and taxonomic output. The taxonomic output also contains taxonomic metadata in the final column.
  • Employ cutoff based on total counts, not SL and ML separately.
  • Taxonomic IDs for each input sequence must now be specified manually in the third column of ‘accession_ids.csv’ after a successful database build.
  • Fixed a bug in employing cutoff in phylogenetic output.
  • Only the taxonomic name associated with the ID is reported in taxonomic sample summaries, where previously the entire lineage was reported.
  • Removed --phyla flag.
  • Fixed bug in expam tree ... --sourmash which would not check for presence of signatures file before attempting distance matrix calculation.
  • Added CountUniqueKmers.py script.
  • Removed the --cutoff flag. Automated cutoffs can now only be applied in terms of --cpm.
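
Since taxonomic IDs must now be entered by hand in the third column of accession_ids.csv, it can help to check for rows where that column was missed. The following is a minimal sketch, not part of expam: the helper name missing_taxids is hypothetical, and it assumes the file is plain comma-separated text with no quoted fields.

```shell
# Hypothetical helper (not part of expam): print the line numbers of rows
# whose third column (the taxonomic ID) is missing or blank.
# Assumes a plain comma-separated file with no quoted fields.
missing_taxids() {
    awk -F, 'NF < 3 || $3 ~ /^[ \t\r]*$/ { print NR }' "$1"
}

# Example usage; the path is a placeholder for your own database directory.
# missing_taxids /path/to/database/accession_ids.csv
```

No output means every row has something in the third column; the check does not validate that the IDs themselves are correct.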

Install

From Bioconda (Recommended)

Conda installation is recommended, and best practice is to install expam in a new environment. Some users may wish to use the ETE3 toolkit for plotting, while others may prefer the iTOL tool. Commands for both cases are given below, in that order.

With ETE3

conda create -n expam -c conda-forge -c bioconda -c etetoolkit expam ete3

Without ETE3

conda create -n expam -c conda-forge -c bioconda expam

From PyPI

Mac

You will need a local installation of HDF5. This may already be installed on your machine, but can be installed using Homebrew with the following commands.

brew install pkg-config
brew install hdf5

If you encounter any errors, check the FAQ section on GitHub for solutions.

Then upgrade pip and install expam.

python3 -m pip install --upgrade pip
python3 -m pip install expam

Linux

You may need to install or update g++ on your local machine. On Debian-based Linux distributions, you can run the following (with sudo if required).

apt update
apt-get install build-essential

Then upgrade pip and install expam.

python3 -m pip install --upgrade pip
python3 -m pip install expam

From GitHub source

To install from source, you need a local installation of Python >=3.8, as well as numpy and cython. There are some commonly encountered problems when installing on Linux, the most common of which are outlined in the FAQ section below.

You may need to install or update g++ on your local machine. On Debian-based Linux distributions, you can run the following (with sudo if required).

apt update
apt-get install build-essential

First download the source code from the GitHub repository.

git clone https://github.com/seansolari/expam.git

This can then be installed locally by executing the following command from the source code root:

cd expam
python3 -m pip install --upgrade pip
python3 -m pip install -r requirements.txt
python3 setup.py install

Documentation

View our online documentation!

https://expam.readthedocs.io/en/latest/index.html

See the Quick Start Tutorial for a guide to expam's basic usage and download links for pre-built databases.

Quick Start Tutorial


FAQ

Problems during installation

error: g++: Command not found

This is simply a matter of installing or updating the compiler.

sudo apt-get install build-essential

fatal error: Python.h: No such file or directory

This simply means you need to install/update the Python development files for version 3.

sudo apt-get install python3-dev

(Reference: Stack Overflow)


ERROR:: Could not find a local HDF5 installation (Mac)

Ensure you have HDF5 installed using Homebrew:

brew install pkg-config
brew install hdf5

If you see

You may need to explicitly state where your local HDF5 headers and library can be found by setting the ''HDF5_DIR'' environment variable or by using the ''--hdf5'' command-line option.

you will need to explicitly set the HDF5_DIR environment variable. To see where HDF5 has been installed, run

brew info hdf5

You should see a path such as /usr/local/Cellar/hdf5/VERSION... or /opt/local/Cellar/hdf5/VERSION... (i.e. ignore everything after the complete version, which consists of numbers separated by dots). Then set this environment variable with

export HDF5_DIR=/opt/local/Cellar/hdf5/VERSION

replacing this path with your output from brew info.

Now retry the installation having set this environment variable.
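
The steps above can be sketched as a short script. This is only an illustration: the helper name hdf5_dir_from_brew_line is hypothetical, and it assumes that brew info prints the installed location as a line of the form "/usr/local/Cellar/hdf5/VERSION (N files, SIZE)", which may differ between Homebrew versions.

```shell
# Hypothetical helper: keep everything up to the first space, which drops
# the "(N files, SIZE)" suffix printed after the versioned path.
hdf5_dir_from_brew_line() {
    printf '%s\n' "${1%% *}"
}

# Only attempt the lookup where Homebrew is available.
if command -v brew >/dev/null 2>&1; then
    # Assumption: the install location is the first line containing /Cellar/hdf5/.
    cellar_line="$(brew info hdf5 | grep '/Cellar/hdf5/' | head -n 1)"
    if [ -n "$cellar_line" ]; then
        export HDF5_DIR="$(hdf5_dir_from_brew_line "$cellar_line")"
        echo "HDF5_DIR=$HDF5_DIR"
    fi
fi
```

Check the printed path against the output of brew info before retrying the installation in the same shell session.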


ete3 importing errors

For instance, ImportError: cannot import name 'NodeStyle'.

The ete3 module depends on Qt for drawing, and there are two stages to getting this to work: first, Qt needs to be installed, and then Python needs to be told that Qt is installed. Follow the instructions below for your OS.

Mac

Install qt5 using brew.

brew install qt5
brew list --versions qt5

This should show you the precise version that brew installed. We now tell Python which version of Qt5 to link up with. Say we have qt@5 5.15.3 from the above command, then we would run

python3 -m pip install pyqt5==5.15

Had the output been qt@5 5.12.0, we would run

python3 -m pip install pyqt5==5.12

i.e. the first two parts of the version from brew. This should remedy the problem.
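
The version matching described above can be sketched as follows. The helper name qt_major_minor is hypothetical, and the script assumes that brew list --versions qt5 prints a single line such as "qt@5 5.15.3". It prints the pip command rather than running it, so the result can be reviewed first.

```shell
# Hypothetical helper: reduce a version such as 5.15.3 to its first two parts.
qt_major_minor() {
    printf '%s\n' "$1" | cut -d. -f1,2
}

# Only attempt the lookup where Homebrew is available.
if command -v brew >/dev/null 2>&1; then
    # Take the second field of the first line, e.g. "5.15.3" from "qt@5 5.15.3".
    qt_version="$(brew list --versions qt5 | awk '{ print $2; exit }')"
    if [ -n "$qt_version" ]; then
        echo "python3 -m pip install pyqt5==$(qt_major_minor "$qt_version")"
    fi
fi
```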

Linux

First update the local installation of Qt.

sudo apt-get install qt5-default

Now double-check which version of Qt has been installed.

dpkg -l | grep "pyqt5"

Take the first two parts of the reported version and pass them to the pip install below. For instance, for qt5 5.12.0, use 5.12. This installs the corresponding Python interface to Qt.

python3 -m pip install pyqt5==5.12

OOM Killer

In the unlikely event that the OOM killer is invoked and the program exits ungracefully, the operating system may not have cleaned up all of the shared memory resources expam used, which can lead to problematic memory leaks.

To prevent this, make prudent use of the expam_limit functionality (see documentation), and avoid using an extremely high number of processes (particularly for large databases). Between 10 and 30 processes will likely be suitable for high-memory machines.

If you suspect that OOM killer has been invoked, this can be confirmed using the following command:

dmesg -T | egrep -i 'killed process'

In the event OOM killer has been called, it is prudent to check how much shared memory is currently being used by the system.

df -h /dev/shm

If the amount of shared memory used is higher than you would expect, you can first check if there are any residual resources that need to be cleaned up.

ls -lah /dev/shm

If there are files starting with 'psm' and owned by you, these may be residual files that need to be cleaned up. Contact your systems administrator to remove these files.
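
To narrow the listing to the files described above, a filter along these lines may help. It is a sketch only: the helper name list_residual_psm is hypothetical, and it lists candidate files without removing anything.

```shell
# Hypothetical helper: list regular files named psm* that are owned by the
# current user. The directory argument defaults to /dev/shm.
list_residual_psm() {
    find "${1:-/dev/shm}" -maxdepth 1 -type f -name 'psm*' -user "$(id -u)"
}

# Only run the check on systems that expose /dev/shm.
if [ -d /dev/shm ]; then
    list_residual_psm
fi
```

Files listed here are only candidates; some may belong to other programs you are still running, so check before asking for them to be removed.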

It may also be the case that OOM killer has killed some child process, leaving the parent process sleeping (and therefore holding onto resources). You will need your system administrator's assistance to clean this up.

To check for sleeping (expam) processes, run

sudo lsof /dev/shm | grep "expam"

These sleeping processes should then be killed by running

kill -9 <PID>

Confirm that the leaked memory has been freed by running df -h /dev/shm.


Commands

A complete list of available commands can be found by using the -h/--help flags.

expam --version
expam --help
...

Bug Reports

Please raise any bug reports at https://github.com/seansolari/expam/issues accompanied by any error messages, a rough description of the database/setup and parameters used to create the database.
