Uses a randomForest model to predict which OTUs are present in a microbiome

Project description

OTU_predictor

OTU_predictor uses a trained RandomForestClassifier ML model to predict 'real' OTU presence from ancient metagenomic samples (although its use is not limited to ancient samples). The training dataset consists of 200 simulated populations generated with InSilicoSeq and deaminated using gargammel. Each population contains between 5 and 20 microbial species with known, variable abundance. OTU_predictor takes input files generated by centrifuge, specifically centrifugeReport.txt files.

Install package

OTU_predictor currently runs on Python 3.11. It may also run on earlier Python versions, but this has not been extensively tested. The easiest way to install OTU_predictor is with pip, using either of the following commands:

pip install OTU-predictor

pip install git+https://github.com/DrATedder/OTU_predictor.git
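
To confirm that the install succeeded, the installed version can be read from the package metadata. The following is a minimal sketch using only the Python standard library; the helper name installed_version is illustrative and is not part of OTU_predictor.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(dist_name="OTU-predictor"):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


if __name__ == "__main__":
    found = installed_version()
    print(found if found else "OTU-predictor is not installed")
```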

Basic Usage

1. Converting your data

OTU_predictor works with centrifugeReport.txt files. Before you can run the model prediction step, some minor format tweaks are required (see example output below). This can be done in the following way:

import OTU_predictor

centrifugeReport = "/path/to/your/file_centrifugeReport.txt"
OTU_predictor.convert_file(centrifugeReport)

If this step is successful, you will see a message similar to the one below:

'Data file /path/to/your/file_centrifugeReport_data.txt created'
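
When several reports need converting, the same call can be looped over a directory. This is a sketch under stated assumptions: the helper names expected_output_path and batch_convert are hypothetical, the output name (the input stem plus "_data") is inferred from the message above, and the converter is passed in as a callable so that OTU_predictor.convert_file can be supplied by the caller.

```python
from pathlib import Path


def expected_output_path(report_path):
    """Derive the converted file name: <stem>_data.txt beside the input.

    Assumption: this naming is inferred from the success message above.
    """
    report = Path(report_path)
    return report.with_name(report.stem + "_data" + report.suffix)


def batch_convert(directory, convert):
    """Call `convert` on every *_centrifugeReport.txt file in `directory`.

    `convert` is any callable taking a path string, for example
    OTU_predictor.convert_file. Returns the output paths that exist afterwards.
    """
    created = []
    for report in sorted(Path(directory).glob("*_centrifugeReport.txt")):
        convert(str(report))
        output = expected_output_path(report)
        if output.exists():  # only list files that were actually written
            created.append(output)
    return created
```

With the package installed, a call might look like batch_convert("/path/to/your", OTU_predictor.convert_file).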

The output data format should look something like this:

name	taxID	taxRank	genomeSize	numReads	numUniqueReads	abundance	genus
Bacteria	2	superkingdom	0	127	103	0.00026298841815572643	NA
Azorhizobium	6	genus	5369772	1	0	2.070774946108082e-06	Azorhizobium
Azorhizobium caulinodans	7	species	5369772	3	0	6.212324838324246e-06	Azorhizobium
Buchnera aphidicola	9	species	602805	3	1	6.212324838324246e-06	Buchnera
Cellulomonas gilvus	11	species	3526441	15	0	3.106162419162123e-05	Cellulomonas
Phenylobacterium	20	genus	4379231	1	0	2.070774946108082e-06	Phenylobacterium
Shewanella	22	genus	5140018	10	1	2.0707749461080822e-05	Shewanella
Shewanella putrefaciens	24	species	4749735	2	1	4.141549892216164e-06	Shewanella
Myxococcales	29	order	9638245	171	0	0.00035410251578448204	NA
Myxococcaceae	31	family	9636120	9	0	1.863697451497274e-05	NA
Myxococcus	32	genus	9487953	10	0	2.0707749461080822e-05	Myxococcus
Myxococcus xanthus	34	species	9139763	47	10	9.732642246707986e-05	Myxococcus
Myxococcus macrosporus	35	species	8973512	20	8	4.1415498922161644e-05	Myxococcus
Archangiaceae	39	family	10085598	11	0	2.2778524407188902e-05	NA
Stigmatella	40	genus	10260756	2	0	4.141549892216164e-06	Stigmatella
Stigmatella aurantiaca	41	species	10260756	1	0	2.070774946108082e-06	Stigmatella
Cystobacter	42	genus	0	1	0	2.070774946108082e-06	Cystobacter

Note. You will notice the addition of three new columns. These are variables used by the model during training; they must be present for the file to be valid, but they are not interpreted during the prediction step. It is also worth pointing out that the tab delimiters are replaced by comma delimiters.
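
To inspect a converted file before making predictions, it can be read with the standard csv module. This is a minimal sketch, not part of OTU_predictor: the helper names load_converted and species_rows are illustrative, and because the note mentions comma delimiters while the example above is displayed with tabs, the sketch assumes the file uses one or the other and checks the header line to decide.

```python
import csv


def load_converted(path):
    """Read a converted data file into a list of dicts keyed by column name."""
    with open(path, newline="") as handle:
        header = handle.readline()
        # Assumption: the file is either tab- or comma-delimited.
        delimiter = "\t" if "\t" in header else ","
        handle.seek(0)
        return list(csv.DictReader(handle, delimiter=delimiter))


def species_rows(rows):
    """Keep only rows whose taxRank column is 'species'."""
    return [row for row in rows if row.get("taxRank") == "species"]
```

All values are returned as strings, so numeric columns such as numReads need converting (for example with int or float) before any arithmetic.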

2. Making predictions

The make_predictions() function uses your data file and the trained model that is packaged with this distribution. You can also create your own model if you prefer. The basic steps to make predictions are as follows:

converted_data = "/path/to/your/file_centrifugeReport_data.txt"

OTU_predictor.make_predictions(converted_data)

The output will be a list (of dictionaries) similar to the one shown below:

[{'Species': 'Neisseria mucosa', 'TaxID': 488, 'Prediction': 1, 'Certainty': 0.68},
{'Species': 'Streptococcus sanguinis', 'TaxID': 1305, 'Prediction': 1, 'Certainty': 0.72},
{'Species': 'Actinomyces sp. oral taxon 414', 'TaxID': 712122, 'Prediction': 1, 'Certainty': 0.97},
{'Species': 'Olsenella sp. oral taxon 807', 'TaxID': 712411, 'Prediction': 1, 'Certainty': 0.88},
{'Species': 'Anaerolineaceae bacterium oral taxon 439', 'TaxID': 1889813, 'Prediction': 1, 'Certainty': 0.87},
{'Species': 'Desulfobulbus oralis', 'TaxID': 1986146, 'Prediction': 1, 'Certainty': 0.84}]

Note. As the output list shows, each entry gives the OTU (a species here, although it can be at any taxonomic level determined by centrifuge) and its taxID, along with a certainty score. Scores range from 0 to 1, with higher scores indicating greater certainty. Prediction: 1 indicates OTU presence in the sample. The model also determines OTU absence (Prediction: 0), but these entries are not shown.

Users should choose a certainty threshold that fits their experimental purpose.
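
Applying a certainty cut-off to the returned list takes only a few lines. The sketch below assumes the list-of-dictionaries structure shown in the example output; the helper name filter_by_certainty and the 0.8 threshold are illustrative choices, not recommendations from the package.

```python
def filter_by_certainty(predictions, threshold=0.8):
    """Return predictions with 'Certainty' >= threshold, highest first."""
    kept = [p for p in predictions if p["Certainty"] >= threshold]
    return sorted(kept, key=lambda p: p["Certainty"], reverse=True)


# Example entries copied from the output shown above.
predictions = [
    {'Species': 'Neisseria mucosa', 'TaxID': 488, 'Prediction': 1, 'Certainty': 0.68},
    {'Species': 'Streptococcus sanguinis', 'TaxID': 1305, 'Prediction': 1, 'Certainty': 0.72},
    {'Species': 'Actinomyces sp. oral taxon 414', 'TaxID': 712122, 'Prediction': 1, 'Certainty': 0.97},
    {'Species': 'Olsenella sp. oral taxon 807', 'TaxID': 712411, 'Prediction': 1, 'Certainty': 0.88},
]

# The 0.8 threshold is an arbitrary value chosen for illustration.
for entry in filter_by_certainty(predictions, threshold=0.8):
    print(entry['Species'], entry['Certainty'])
# prints:
# Actinomyces sp. oral taxon 414 0.97
# Olsenella sp. oral taxon 807 0.88
```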

Release history

1.1.0 (this version): Jun 10, 2025

1.0: Dec 9, 2023

Download files

Source Distribution

otu_predictor-1.1.0.tar.gz (948.7 kB view details)

Uploaded Jun 10, 2025 Source

Built Distribution

otu_predictor-1.1.0-py3-none-any.whl (5.9 kB view details)

Uploaded Jun 10, 2025 Python 3

File details

Details for the file otu_predictor-1.1.0.tar.gz.

File metadata

Download URL: otu_predictor-1.1.0.tar.gz
Upload date: Jun 10, 2025
Size: 948.7 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for otu_predictor-1.1.0.tar.gz
Algorithm	Hash digest
SHA256	`79934df220a0beb3f23de52542bc638f1d5767efd630c57e527083db57573dd5`
MD5	`cde2f73d4d8e0b7bd47bf1643779a67c`
BLAKE2b-256	`7a380920a15e24ddcd1a2279f76e1140f28834002eada58a1a9ecb01650696d8`

File details

Details for the file otu_predictor-1.1.0-py3-none-any.whl.

File metadata

Download URL: otu_predictor-1.1.0-py3-none-any.whl
Upload date: Jun 10, 2025
Size: 5.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.3

File hashes

Hashes for otu_predictor-1.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`e1f2d89d3c2f5857a1f9920fe81c464a6cccbfe5eef7a12e12cb5b35cc51e86c`
MD5	`2fa4208fe4cce79aa7c0598478fc94cc`
BLAKE2b-256	`dd2e39fd1069e79d9af4daad37498f16703dbf568238afb876a8576ef8d67d52`
