Classification and prediction of the origin of metagenomic samples

Project description


Sourcepredict is a Python package, distributed through Conda, that classifies and predicts the origin of metagenomic samples given a reference dataset of known origins, a problem also known as source tracking. Sourcepredict solves this problem by applying machine learning classification to dimensionally reduced datasets.

Installation

With conda (recommended)

$ conda install -c conda-forge -c maxibor sourcepredict

With pip

$ pip install sourcepredict

Example

Input

Usage

$ wget https://raw.githubusercontent.com/maxibor/sourcepredict/master/data/test/dog_test_sink_sample.csv -O dog_example.csv
$ wget https://raw.githubusercontent.com/maxibor/sourcepredict/master/data/modern_gut_microbiomes_labels.csv -O sp_labels.csv
$ wget https://raw.githubusercontent.com/maxibor/sourcepredict/master/data/modern_gut_microbiomes_sources.csv -O sp_sources.csv
$ sourcepredict -s sp_sources.csv -l sp_labels.csv dog_example.csv
Step 1: Checking for unknown proportion
  == Sample: ERR1915662 ==
	Adding unknown
	Normalizing (GMPR)
	Computing Bray-Curtis distance
	Performing MDS embedding in 2 dimensions
	KNN machine learning
	Training KNN classifier on 2 cores...
	-> Testing Accuracy: 1.0
	----------------------
	- Sample: ERR1915662
		 known:98.61%
		 unknown:1.39%
Step 2: Checking for source proportion
	Computing weighted_unifrac distance on species rank
	TSNE embedding in 2 dimensions
	KNN machine learning
	Performing 5 fold cross validation on 2 cores...
	Trained KNN classifier with 10 neighbors
	-> Testing Accuracy: 0.99
	----------------------
	- Sample: ERR1915662
		 Canis_familiaris:96.1%
		 Homo_sapiens:2.47%
		 Soil:1.43%
Sourcepredict result written to dog_test_sample.sourcepredict.csv
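
The log above goes through a distance computation, an embedding, and a KNN classification. As a rough illustration of the idea only, the sketch below computes Bray-Curtis dissimilarities between one sink and a few labelled sources, then reports the share of each label among the nearest neighbours. It is not Sourcepredict's implementation: the normalization and embedding steps are skipped on purpose, the function names are hypothetical, and the counts are made up.

```python
from collections import Counter


def bray_curtis(u, v):
    """Bray-Curtis dissimilarity between two equal-length count vectors."""
    num = sum(abs(a - b) for a, b in zip(u, v))
    den = sum(a + b for a, b in zip(u, v))
    return num / den if den else 0.0


def knn_source_proportions(sink, sources, labels, k=3):
    """Share of each label among the k sources closest to the sink.

    Hypothetical helper: distances are computed on raw counts, with no
    normalization and no dimension reduction.
    """
    ranked = sorted(zip(sources, labels),
                    key=lambda pair: bray_curtis(sink, pair[0]))
    nearest = ranked[:k]
    votes = Counter(label for _, label in nearest)
    return {label: count / len(nearest) for label, count in votes.items()}


# Toy taxon counts, invented for illustration (one column per taxon).
sources = [
    [90, 5, 5], [80, 10, 10], [85, 10, 5],  # dog-like profiles
    [10, 80, 10], [5, 90, 5],               # human-like profiles
    [5, 5, 90],                             # soil-like profile
]
labels = ["Canis_familiaris"] * 3 + ["Homo_sapiens"] * 2 + ["Soil"]
sink = [70, 20, 10]

print(knn_source_proportions(sink, sources, labels, k=5))
```

With a larger `k`, more distant sources get a vote, so the reported proportions spread over more labels.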

Output

Sourcepredict outputs the predicted source contribution to each sink sample, and the embedding of all samples in the lower-dimensional space. See the documentation for details.

Runtime

Depending on the normalization method (-n), the embedding method (-me), the number of CPUs available for parallel processing (-t), and the data, the runtime should be between a few seconds and a few minutes per sink sample.
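
To compare runtimes across settings, it can help to build the command line programmatically. The sketch below is a hypothetical helper, not part of Sourcepredict. It uses only the flags mentioned in this README (-s, -l, -n, -me, -t). The values "GMPR" and "TSNE" are taken from the example log, and it is an assumption that the CLI accepts them spelled exactly this way; check `sourcepredict -h` before relying on them.

```python
import shlex


def build_sourcepredict_command(sink, sources, labels,
                                normalization=None, embedding=None,
                                threads=None):
    """Assemble a sourcepredict command as an argument list.

    Hypothetical helper: optional flags are added only when a value is
    given, so the tool's own defaults apply otherwise.
    """
    cmd = ["sourcepredict", "-s", sources, "-l", labels]
    if normalization is not None:
        cmd += ["-n", normalization]      # normalization method
    if embedding is not None:
        cmd += ["-me", embedding]         # embedding method
    if threads is not None:
        cmd += ["-t", str(threads)]       # CPUs for parallel processing
    cmd.append(sink)                      # sink file goes last, as in Usage
    return cmd


cmd = build_sourcepredict_command(
    "dog_example.csv", "sp_sources.csv", "sp_labels.csv",
    normalization="GMPR", embedding="TSNE", threads=2,
)
print(shlex.join(cmd))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)` and wrapped in `time.perf_counter()` calls to measure how long one sink sample takes with a given combination of settings.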

Documentation

The documentation of Sourcepredict is available at sourcepredict.readthedocs.io

Sourcepredict example files

Environments included in the example source file

  • Homo sapiens gut microbiome (1, 2, 3, 4, 5, 6)
  • Canis familiaris gut microbiome (1)
  • Soil microbiome (1, 2, 3)

Contributing Code, Documentation, or Feedback

If you wish to contribute to Sourcepredict, you are welcome and encouraged to do so by opening an issue or creating a pull request. All contributions will be made under the GPLv3 license. More information can be found on the contributing page.

How to cite

Sourcepredict has been published in JOSS.

@article{Borry2019Sourcepredict,
	journal = {Journal of Open Source Software},
	doi = {10.21105/joss.01540},
	issn = {2475-9066},
	number = {41},
	publisher = {The Open Journal},
	title = {Sourcepredict: Prediction of metagenomic sample sources using dimension reduction followed by machine learning classification},
	url = {http://dx.doi.org/10.21105/joss.01540},
	volume = {4},
	author = {Borry, Maxime},
	pages = {1540},
	date = {2019-09-04},
	year = {2019},
	month = {9},
	day = {4}
}
