Yet Another Metagenome Binner

pyYAMB

pyYAMB is an implementation of YAMB (Yet Another Metagenome Binner) in Python (>=3.8). YAMB was originally described in the preprint https://www.biorxiv.org/content/10.1101/521286.abstract; its main idea is to use t-SNE and HDBSCAN to process tetramer frequencies and coverage depth of metagenome fragments. pyYAMB strives for parallel computing wherever possible, but coverage depth extraction is currently single-threaded and slow.

pyYAMB data processing includes:

  • contig filtering and fragmenting
  • read mapping with minimap2
  • mapping file processing and coverage depth extraction with pysam
  • k-mer (by default tetramer) frequency calculation
  • dimensionality reduction with t-SNE
  • data clustering with HDBSCAN
  • writing bins to FASTA
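
The fragmenting and k-mer frequency steps listed above can be sketched in plain Python. This is only a minimal illustration under stated assumptions, not pyYAMB's implementation: the function names, the fragment size, and the minimum contig length are hypothetical, and reverse-complement k-mers are not merged.

```python
from collections import Counter
from itertools import product


def fragment(seq, size=10000, min_len=1000):
    """Split a contig into consecutive pieces of at most `size` bases.

    Contigs shorter than `min_len` are dropped (hypothetical filter).
    The last piece may be shorter than `size`.
    """
    if len(seq) < min_len:
        return []
    return [seq[i:i + size] for i in range(0, len(seq), size)]


def kmer_frequencies(seq, k=4):
    """Return relative frequencies of all 4**k k-mers over A, C, G, T.

    The result is ordered lexicographically (AAAA, AAAC, ...).
    Windows containing other characters, such as N, are ignored.
    """
    seq = seq.upper()
    kmers = ["".join(p) for p in product("ACGT", repeat=k)]
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    total = sum(counts[kmer] for kmer in kmers)
    if total == 0:
        return [0.0] * len(kmers)
    return [counts[kmer] / total for kmer in kmers]


# Toy contig: fragment it and compute one tetramer profile per fragment.
contig = "ACGTTGCA" * 300  # 2400 bases
pieces = fragment(contig, size=1000, min_len=500)
profiles = [kmer_frequencies(piece) for piece in pieces]
print(len(pieces), len(profiles[0]))
```

Each profile is a 256-element vector; combined with the coverage depth of the same fragment, it forms one row of the feature matrix passed to dimensionality reduction.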

Features planned for the distant future

  • read processing
  • metagenome assembly
  • bin QC

How to start

Warning! pyYAMB is currently in alpha testing and may be unstable. Use it at your own risk.

Installation

pyYAMB will be available on PyPI soon and in conda some time later. Until then, clone the repository and run python setup.py install, which installs pyYAMB and the Python libraries it depends on. Problems may appear with the hdbscan module and Cython; in that case, reinstall hdbscan with pip install hdbscan and run python setup.py install again.

Usage

The pyYAMB entry point is the all-in-one command pyyamb. It takes about two dozen arguments; run pyyamb -h for their descriptions.

You may start from a metagenome assembly and processed (quality-trimmed, etc.) reads, e.g.:

pyyamb -1 Sample_1.fastq.gz -2 Sample_2.fastq.gz -i assembly -o results/will/be/here
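
When pyYAMB is called from a Python workflow, the same command can be assembled and run with the standard library. The wrapper below is a hypothetical convenience sketch: the function names are invented for illustration, and only the -1, -2, -i and -o options shown above are used.

```python
import shutil
import subprocess


def build_pyyamb_command(reads_1, reads_2, assembly, out_dir):
    """Return the argument list for the paired-end invocation shown above."""
    return ["pyyamb", "-1", reads_1, "-2", reads_2, "-i", assembly, "-o", out_dir]


def run_pyyamb(reads_1, reads_2, assembly, out_dir):
    """Run pyyamb; raises CalledProcessError if it exits with a non-zero status."""
    if shutil.which("pyyamb") is None:
        raise FileNotFoundError("pyyamb is not on PATH; install pyYAMB first")
    cmd = build_pyyamb_command(reads_1, reads_2, assembly, out_dir)
    return subprocess.run(cmd, check=True)


# Only build and print the command here; call run_pyyamb() to execute it.
command = build_pyyamb_command(
    "Sample_1.fastq.gz", "Sample_2.fastq.gz", "assembly", "results/will/be/here"
)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with unusual file names.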

Results and benchmarks

pyYAMB will be tested on the CAMI dataset soon. YAMB showed binning quality comparable to that of the CONCOCT binner (see the preprint for details).

References

Van Der Maaten, L. (2014). Accelerating t-SNE using tree-based algorithms. The Journal of Machine Learning Research, 15(1), 3221-3245.

Campello, R. J., Moulavi, D., & Sander, J. (2013, April). Density-based clustering based on hierarchical density estimates. In Pacific-Asia conference on knowledge discovery and data mining (pp. 160-172). Springer, Berlin, Heidelberg.

Li, H. (2018). Minimap2: pairwise alignment for nucleotide sequences. Bioinformatics, 34(18), 3094-3100. https://doi.org/10.1093/bioinformatics/bty191

Parks, D. H., Imelfort, M., Skennerton, C. T., Hugenholtz, P., & Tyson, G. W. (2015). CheckM: assessing the quality of microbial genomes recovered from isolates, single cells, and metagenomes. Genome Research, 25(7), 1043-1055. https://doi.org/10.1101/gr.186072.114

Cock, P. J., Antao, T., Chang, J. T., Chapman, B. A., Cox, C. J., Dalke, A., ... & De Hoon, M. J. (2009). Biopython: freely available Python tools for computational molecular biology and bioinformatics. Bioinformatics, 25(11), 1422-1423. https://doi.org/10.1093/bioinformatics/btp163

Download files

Source Distribution

pyYAMB-0.1a0.linux-x86_64.tar.gz (19.7 kB)

Uploaded Source

Built Distributions

pyYAMB-0.1a0-py3.8.egg (23.3 kB)

Uploaded Egg

pyYAMB-0.1a0-py3-none-any.whl (11.9 kB)

Uploaded Python 3

File details

Details for the file pyYAMB-0.1a0.linux-x86_64.tar.gz.

File metadata

  • Download URL: pyYAMB-0.1a0.linux-x86_64.tar.gz
  • Upload date:
  • Size: 19.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.0

File hashes

Hashes for pyYAMB-0.1a0.linux-x86_64.tar.gz
Algorithm Hash digest
SHA256 a6287f199cf6e579bd6c3100359a6fd681352033246fdac6e2707a6be8f25d04
MD5 c37267a20454af2def8f940d5ef13e32
BLAKE2b-256 63a1dc92f083b9235d34b4ac7a29d661c3b3e307ba33b819073da68f36221b39

File details

Details for the file pyYAMB-0.1a0-py3.8.egg.

File metadata

  • Download URL: pyYAMB-0.1a0-py3.8.egg
  • Upload date:
  • Size: 23.3 kB
  • Tags: Egg
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.0

File hashes

Hashes for pyYAMB-0.1a0-py3.8.egg
Algorithm Hash digest
SHA256 06598b3dfced8f6ab86201cc6f2ba18a3fe73964de15fc3573d42386654b3ee8
MD5 f038cc9bf373499ede7c54b0dd2bb774
BLAKE2b-256 3e6142ac6c13b95fdf6ec4e61783ec451f6945614e25743c262eb2e9444a44e8

File details

Details for the file pyYAMB-0.1a0-py3-none-any.whl.

File metadata

  • Download URL: pyYAMB-0.1a0-py3-none-any.whl
  • Upload date:
  • Size: 11.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.8.0

File hashes

Hashes for pyYAMB-0.1a0-py3-none-any.whl
Algorithm Hash digest
SHA256 c70294ed7fdf5f390a57dd32b73e5a9b6ca6c64a579efecb740b526b81227507
MD5 cb79997a5f93ed71eb57f4933d8ce114
BLAKE2b-256 a0260ab8ce1b30b6afe2e5a21a1c6ef3ae20d6f064ff51d13f39d318239ded40
