A package to parse, organize, calculate, and save data from metagenomic profile files

Project description

MeTEA

(Metagenomic Taxa Evaluation and Assessment) Parse, organize, calculate and save data from metagenomic profile files.

This package reads the data from profile files (with the '.profile' extension), calculates a confusion matrix for each Tax ID and each tool, creates dendrograms based on taxa levels and confusion matrix metrics, then saves the confusion matrix data as an Excel file and the dendrograms as PNG files.

To use this package, you only need the main() function of the 'Misc' class in 'precall'.

Confusion matrices have four metrics: True Positives, False Negatives, False Positives, and True Negatives. True Positives are calculated by counting the samples in which a Tax ID is predicted and is truly present (per the ground truth profile). False Negatives are calculated by counting the samples that truly contain that Tax ID (per the ground truth profile) but in which it is missing from the predicted profile. False Positives are calculated by counting the samples in which a Tax ID is predicted but is absent from the ground truth profile. True Negatives are calculated by counting the samples in which a Tax ID is neither predicted nor present in the ground truth profile. This is repeated for every Tax ID in each predicted profile. Each metric of the confusion matrix is saved to a separate sheet in the Excel file.
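
The counting rules above can be sketched in a few lines of Python. This is only an illustration, not MeTEA's own implementation: the function name confusion_counts, the layout of a profile as a mapping from sample name to a set of Tax IDs, and the sample names and Tax IDs used are all assumptions made for this example.

```python
from typing import Dict, Set


def confusion_counts(
    truth: Dict[str, Set[int]],
    predicted: Dict[str, Set[int]],
    tax_id: int,
) -> Dict[str, int]:
    """Count TP, FN, FP and TN for one Tax ID across all samples.

    Both arguments map a sample name to the set of Tax IDs reported
    for that sample (a hypothetical in-memory layout).
    """
    counts = {"tp": 0, "fn": 0, "fp": 0, "tn": 0}
    for sample in truth.keys() | predicted.keys():
        in_truth = tax_id in truth.get(sample, set())
        in_pred = tax_id in predicted.get(sample, set())
        if in_truth and in_pred:
            counts["tp"] += 1  # predicted and truly present
        elif in_truth:
            counts["fn"] += 1  # truly present but not predicted
        elif in_pred:
            counts["fp"] += 1  # predicted but not in the ground truth
        else:
            counts["tn"] += 1  # absent from both profiles
    return counts


# Example values only: three samples and two Tax IDs.
truth = {"s1": {562, 1280}, "s2": {562}, "s3": set()}
predicted = {"s1": {562}, "s2": {562, 1280}, "s3": set()}

print(confusion_counts(truth, predicted, 562))
print(confusion_counts(truth, predicted, 1280))
```

Running the same function once per Tax ID and per tool would give the values that fill the four confusion matrix sheets.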

Precision and recall are also included in the Excel file.
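
For reference, precision and recall follow from the confusion matrix counts using their standard definitions. The sketch below assumes a helper named precision_recall (hypothetical) and returns 0.0 when a denominator is zero; how MeTEA itself handles that case is not stated here.

```python
from typing import Tuple


def precision_recall(tp: int, fp: int, fn: int) -> Tuple[float, float]:
    """Return (precision, recall) from confusion matrix counts.

    precision = TP / (TP + FP), recall = TP / (TP + FN).
    A zero denominator yields 0.0, which is an assumption of this sketch.
    """
    precision = tp / (tp + fp) if (tp + fp) else 0.0
    recall = tp / (tp + fn) if (tp + fn) else 0.0
    return precision, recall


print(precision_recall(tp=3, fp=1, fn=3))
```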

Afterwards, dendrograms are created for each taxa level based on the Bray-Curtis similarity of each tool. These are saved as PNG files in the same directory as the confusion matrix Excel file.
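
As a rough illustration of the measure behind the dendrograms, the sketch below computes the Bray-Curtis dissimilarity between two tools and the corresponding similarity (one minus the dissimilarity). The function name bray_curtis, the dictionary layout (Tax ID mapped to a non-negative value), and the numbers are assumptions; which values MeTEA actually compares is not specified in this description.

```python
from typing import Dict


def bray_curtis(u: Dict[int, float], v: Dict[int, float]) -> float:
    """Bray-Curtis dissimilarity between two non-negative vectors.

    Each vector is given as a mapping from Tax ID to a value; a Tax ID
    missing from one mapping is treated as 0 for that tool.
    """
    keys = u.keys() | v.keys()
    numerator = sum(abs(u.get(k, 0.0) - v.get(k, 0.0)) for k in keys)
    denominator = sum(u.get(k, 0.0) + v.get(k, 0.0) for k in keys)
    # Two all-zero vectors are treated as identical (an assumption).
    return numerator / denominator if denominator else 0.0


# Example values only.
tool_a = {562: 3.0, 1280: 1.0}
tool_b = {562: 1.0, 1280: 1.0, 1773: 2.0}

dissimilarity = bray_curtis(tool_a, tool_b)
similarity = 1.0 - dissimilarity
print(dissimilarity, similarity)
```

A matrix of such pairwise values between tools is the kind of input a hierarchical clustering routine needs to draw a dendrogram.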

The ground truth file, output Excel file name, input directory for all profiles, and output directory are all specified when calling the main() function on an object of the ‘Misc’ class in ‘precall’. All profile files, including the ground truth, should be in the same directory. For example, if your predicted profile files and the ground truth file are in a folder called “inputs”, the ground truth file is called “ground_truth.profile”, and you want to save the output files to a folder called “outputs”, the call would be:

from MeTEA.precall import Misc
Quick = Misc()
# Arguments: ground truth file, output Excel file name, input directory, output directory
Quick.main("ground_truth.profile", "TaxaEvaluation_byTool", "C:\\Users\\user\\inputs", "C:\\Users\\user\\outputs")

The main() function takes five arguments, three of which are optional:

  • Input: the name of the ground truth profile file
  • Output: the name of the Excel file, a .xlsx with six sheets: True Positives, False Negatives, False Positives, True Negatives, Precision, and Recall of each tool
  • (optional) input directory of all profiles, including the ground truth; <Default: Directory of Package Manual>
  • (optional) the output directory; <Default: Directory of Package Manual>
  • (optional) "yes" if you want individual .csv files of each tool's confusion matrix; <Default: "no">

A spreadsheet and heat map of the top taxa based on difficulty and a metric can also be made.

Quick.get_top_taxid(3, "tp", "easy", "yes")
Quick.create_heat_map("Top_Easy-TP_taxid.xlsx")

Arguments <get_top_taxid()>:

  • Input: The number of Tax IDs to include
  • (optional) The metric to evaluate by; <Default: "tp" (True Positives)>
  • (optional) The difficulty level; <Default: "easy">
  • (optional) "yes" if you want to only include Tax IDs present in the ground truth profile, "no" if you want to include Tax IDs found in the ground truth and predicted profiles; <Default: "yes">

The output is a .xlsx file containing the Tax IDs. Its name is formatted like this: 'Top_Difficulty-METRIC_taxid.xlsx'.
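
To make the naming pattern concrete, the sketch below rebuilds the file name from the difficulty and metric arguments. The helper top_taxid_filename is hypothetical; the capitalization rules are inferred from the example file name "Top_Easy-TP_taxid.xlsx" used with create_heat_map() and may not match the package exactly.

```python
def top_taxid_filename(difficulty: str, metric: str) -> str:
    """Build a name of the form 'Top_Difficulty-METRIC_taxid.xlsx'.

    The difficulty is capitalized and the metric is upper-cased, as
    inferred from the example name 'Top_Easy-TP_taxid.xlsx'.
    """
    return f"Top_{difficulty.capitalize()}-{metric.upper()}_taxid.xlsx"


print(top_taxid_filename("easy", "tp"))
```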

Arguments <create_heat_map()>:

  • Input: The name of the spreadsheet containing the list of Tax IDs. They should be formatted into a column labeled 'Tax ID'.

The output is a heat map with a dendrogram at the top. Its name is the name of the input file with the extension replaced by '_Heat_Map.png'.
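
The naming rule for the heat map can be sketched with the standard library. The helper heat_map_filename is hypothetical and only mirrors the rule described above; it does not call MeTEA.

```python
import os


def heat_map_filename(spreadsheet_name: str) -> str:
    """Replace the spreadsheet's extension with '_Heat_Map.png'."""
    stem, _extension = os.path.splitext(spreadsheet_name)
    return stem + "_Heat_Map.png"


print(heat_map_filename("Top_Easy-TP_taxid.xlsx"))
```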

You can find a quick start example and more details on the package in the Jupyter notebook 'Package Manual'.

Download files

Source Distribution

MeTEA-0.0.4.tar.gz (27.3 kB)

Uploaded Source

Built Distribution

MeTEA-0.0.4-py3-none-any.whl (28.6 kB)

Uploaded Python 3

File details

Details for the file MeTEA-0.0.4.tar.gz.

File metadata

  • Download URL: MeTEA-0.0.4.tar.gz
  • Upload date:
  • Size: 27.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.0

File hashes

Hashes for MeTEA-0.0.4.tar.gz
Algorithm Hash digest
SHA256 6955d24b85bb550197b1241da0d64a5c2105857ee2614316343362b9fa82f44e
MD5 65b1e020b2027721b7d3a301e59c9a55
BLAKE2b-256 abab9d7d49b74a56e0c885d249dbb9e235c1187ae23cc9b3ba41860f1300a678

File details

Details for the file MeTEA-0.0.4-py3-none-any.whl.

File metadata

  • Download URL: MeTEA-0.0.4-py3-none-any.whl
  • Upload date:
  • Size: 28.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.1 CPython/3.8.0

File hashes

Hashes for MeTEA-0.0.4-py3-none-any.whl
Algorithm Hash digest
SHA256 b71a87fd465d26470137b63f6f5cef8645fb8398795a58cb39c5c2964ba62487
MD5 c8a85e0371ec73ee471ccde546650ec3
BLAKE2b-256 4968a9b839904733c966562c07586c21e1480a935d4fbe890fd031d91a86ec5a
