Classification and prediction of the origin of metagenomic samples

Sourcepredict is a Python package, distributed through Conda, that classifies and predicts the origin of metagenomic samples given a reference dataset of known origins, a problem also known as source tracking. Sourcepredict approaches this problem by applying machine learning classification to dimensionally reduced datasets.

Installation

With conda (recommended)

$ conda install -c conda-forge -c maxibor sourcepredict

With pip

$ pip install sourcepredict

Example

Input

Usage

$ wget https://raw.githubusercontent.com/maxibor/sourcepredict/master/data/test/dog_test_sink_sample.csv -O dog_example.csv
$ wget https://raw.githubusercontent.com/maxibor/sourcepredict/master/data/modern_gut_microbiomes_labels.csv -O sp_labels.csv
$ wget https://raw.githubusercontent.com/maxibor/sourcepredict/master/data/modern_gut_microbiomes_sources.csv -O sp_sources.csv
$ sourcepredict -s sp_sources.csv -l sp_labels.csv dog_example.csv
Step 1: Checking for unknown proportion
  == Sample: ERR1915662 ==
	Adding unknown
	Normalizing (GMPR)
	Computing Bray-Curtis distance
	Performing MDS embedding in 2 dimensions
	KNN machine learning
	Training KNN classifier on 2 cores...
	-> Testing Accuracy: 1.0
	----------------------
	- Sample: ERR1915662
		 known:98.61%
		 unknown:1.39%
Step 2: Checking for source proportion
	Computing weighted_unifrac distance on species rank
	TSNE embedding in 2 dimensions
	KNN machine learning
	Performing 5 fold cross validation on 2 cores...
	Trained KNN classifier with 10 neighbors
	-> Testing Accuracy: 0.99
	----------------------
	- Sample: ERR1915662
		 Canis_familiaris:96.1%
		 Homo_sapiens:2.47%
		 Soil:1.43%
Sourcepredict result written to dog_test_sample.sourcepredict.csv
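
The log above names three building blocks: a Bray-Curtis distance, an MDS embedding in 2 dimensions, and a KNN classifier. The following is a minimal sketch of that general idea using SciPy and scikit-learn, not Sourcepredict's actual implementation. The toy count table, the class names source_a and source_b, the relative-abundance normalization (used here in place of GMPR), and the choice of 5 neighbors are all assumptions made for illustration.

```python
import numpy as np
from scipy.spatial.distance import pdist, squareform
from sklearn.manifold import MDS
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

rng = np.random.default_rng(0)

# Toy count table (hypothetical data): 40 samples x 30 taxa, drawn from
# two made-up source classes with different abundance profiles.
profile_a = rng.uniform(1, 50, size=30)
profile_b = rng.uniform(1, 50, size=30)
counts = np.vstack([
    rng.poisson(profile_a, size=(20, 30)),
    rng.poisson(profile_b, size=(20, 30)),
])
labels = np.array(["source_a"] * 20 + ["source_b"] * 20)

# Simple relative-abundance normalization (a stand-in, NOT GMPR).
rel = counts / counts.sum(axis=1, keepdims=True)

# Pairwise Bray-Curtis dissimilarity between all samples.
dist = squareform(pdist(rel, metric="braycurtis"))

# Embed every sample in 2 dimensions from the precomputed distances.
mds = MDS(n_components=2, dissimilarity="precomputed", random_state=0)
embedding = mds.fit_transform(dist)

# Train KNN on part of the embedded samples and test on the rest.
X_train, X_test, y_train, y_test = train_test_split(
    embedding, labels, test_size=0.25, stratify=labels, random_state=0
)
knn = KNeighborsClassifier(n_neighbors=5).fit(X_train, y_train)
accuracy = knn.score(X_test, y_test)

# Class probabilities for one held-out sample, read as source proportions.
proportions = dict(zip(knn.classes_, knn.predict_proba(X_test[:1])[0]))
print(f"Testing accuracy: {accuracy:.2f}")
print(proportions)
```

In this sketch all samples are embedded together before the train/test split, because MDS on a precomputed distance matrix has no separate transform step for unseen samples.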

Output

Sourcepredict outputs the predicted source contribution to each sink sample, and the embedding of all samples in the lower-dimensional space. See the documentation for details.

Runtime

Depending on the normalization method (-n), the embedding method (-me), the number of CPUs available for parallel processing (-t), and the data, the runtime should range from a few seconds to a few minutes per sink sample.
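
As a small illustration of the -t option, the sketch below assembles the command from the Usage section with an explicit CPU count. The helper name build_command is hypothetical, and the sketch assumes that -t accepts an integer; the accepted values for -n and -me are not listed here, so they are left at their defaults.

```python
import os
import shlex


def build_command(sink, sources, labels, threads=None):
    """Return the sourcepredict command line as a list of arguments.

    Hypothetical helper: only the -s, -l and -t flags shown in this
    README are used, and -t is assumed to take an integer CPU count.
    """
    if threads is None:
        # Fall back to every CPU the operating system reports.
        threads = os.cpu_count() or 1
    return ["sourcepredict", "-s", sources, "-l", labels, "-t", str(threads), sink]


cmd = build_command("dog_example.csv", "sp_sources.csv", "sp_labels.csv", threads=2)
print(shlex.join(cmd))
# To actually run it (requires sourcepredict to be installed):
#   import subprocess; subprocess.run(cmd, check=True)
```

Building the command as a list rather than a single string avoids shell quoting problems when file names contain spaces.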

Documentation

The documentation of Sourcepredict is available here: sourcepredict.readthedocs.io

Sourcepredict example files

Environments included in the example source file

  • Homo sapiens gut microbiome (1, 2, 3, 4, 5, 6)
  • Canis familiaris gut microbiome (1)
  • Soil microbiome (1, 2, 3)

Contributing Code, Documentation, or Feedback

If you wish to contribute to Sourcepredict, you are welcome and encouraged to do so by opening an issue or creating a pull request. All contributions will be made under the GPLv3 license. More information can be found on the contributing page.

How to cite

Sourcepredict has been published in JOSS.

@article{Borry2019Sourcepredict,
	journal = {Journal of Open Source Software},
	doi = {10.21105/joss.01540},
	issn = {2475-9066},
	number = {41},
	publisher = {The Open Journal},
	title = {Sourcepredict: Prediction of metagenomic sample sources using dimension reduction followed by machine learning classification},
	url = {http://dx.doi.org/10.21105/joss.01540},
	volume = {4},
	author = {Borry, Maxime},
	pages = {1540},
	date = {2019-09-04},
	year = {2019},
	month = {9},
	day = {4}
}
