Your all-inclusive package for aggregating and visualizing metagenomic BLAST results.

metagenompy

Installation

$ pip install metagenompy

Usage

Summary statistics for BLAST results

After blasting your reads against a sequence database, generating summary reports using metagenompy is a blast.

import metagenompy
import pandas as pd


# load example BLAST results with columns 'qseqid' and 'staxids'
df_blast = metagenompy.load_example_dataset()
df = (df_blast.set_index('qseqid')['staxids']
              .str.split(';')
              .explode()
              .dropna()
              .reset_index()
              .rename(columns={'staxids': 'taxid'})
)

df.head()
##   qseqid    taxid
## 0  read1  1811693
## 1  read2   327160
## 2  read3      821
## 3  read4  1871047
## 4  read5    69360

# classify taxa at multiple ranks
graph = metagenompy.generate_taxonomy_network(auto_download=True)

rank_list = ['species', 'genus', 'class', 'superkingdom']
df = metagenompy.classify_dataframe(
    graph, df,
    rank_list=rank_list
)

# aggregate read matches
agg_rank = 'genus'
df_agg = metagenompy.aggregate_classifications(df, agg_rank)

df_agg.head()
##            taxid                        species           genus                class superkingdom
## qseqid
## read1    1811693  Pelotomaculum sp. PtaB.Bin104   Pelotomaculum           Clostridia     Bacteria
## read10   2488860         Erythrobacter spongiae   Erythrobacter  Alphaproteobacteria     Bacteria
## read100    78398      Pectobacterium odoriferum  Pectobacterium  Gammaproteobacteria     Bacteria
## read101  1843082           Macromonas sp. BK-30      Macromonas   Betaproteobacteria     Bacteria
## read102  2665644      Paracoccus sp. YIM 132242      Paracoccus  Alphaproteobacteria     Bacteria

# visualize outcome
metagenompy.plot_piechart(df_agg)
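
The example above uses the bundled dataset. For your own data, a table with the same two columns can be built with pandas alone. The sketch below assumes BLAST was run with a tabular output format such as `-outfmt "6 qseqid sseqid evalue staxids"`; the column list, the helper name `load_blast_table` and the inline sample rows are illustrative assumptions, not part of metagenompy.

```python
import io

import pandas as pd

# Assumed column order; it must match the fields passed to BLAST's -outfmt option.
BLAST_COLUMNS = ['qseqid', 'sseqid', 'evalue', 'staxids']


def load_blast_table(handle, columns=BLAST_COLUMNS):
    """Read tab-separated BLAST output into one row per (read, taxid) pair."""
    df_blast = pd.read_csv(handle, sep='\t', names=columns, dtype={'staxids': str})
    return (df_blast.set_index('qseqid')['staxids']
                    .str.split(';')   # a hit may list several taxids
                    .explode()        # one row per taxid
                    .dropna()         # drop hits without a taxid
                    .reset_index()
                    .rename(columns={'staxids': 'taxid'}))


# Made-up sample rows standing in for a real results file.
sample = io.StringIO(
    'read1\tsubjA\t1e-50\t1811693\n'
    'read2\tsubjB\t2e-30\t327160;821\n'
)
df = load_blast_table(sample)
print(df)
```

Passing a file path instead of the `StringIO` object should work the same way, and the resulting frame has the `qseqid` and `taxid` columns used in the classification step above.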

NCBI taxonomy as NetworkX object

At its core, metagenompy represents the NCBI taxonomy as a NetworkX graph object, so standard NetworkX algorithms work on it right out of the box.

import metagenompy
import networkx as nx


# load taxonomy
graph = metagenompy.generate_taxonomy_network(auto_download=True)

# print path from human to pineapple
for node in nx.shortest_path(graph.to_undirected(as_view=True), '9606', '4615'):
    print(node, graph.nodes[node])
## 9606 {'rank': 'species', 'authority': 'Homo sapiens Linnaeus, 1758', 'scientific_name': 'Homo sapiens', 'genbank_common_name': 'human', 'common_name': 'man'}
## 9605 {'rank': 'genus', 'authority': 'Homo Linnaeus, 1758', 'scientific_name': 'Homo', 'common_name': 'humans'}
## [..]
## 4614 {'rank': 'genus', 'authority': 'Ananas Mill., 1754', 'scientific_name': 'Ananas'}
## 4615 {'rank': 'species', 'authority': ['Ananas comosus (L.) Merr., 1917', 'Ananas lucidus Mill., 1754'], 'scientific_name': 'Ananas comosus', 'synonym': ['Ananas comosus var. comosus', 'Ananas lucidus'], 'genbank_common_name': 'pineapple'}
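
Because the taxonomy is an ordinary NetworkX graph, the same kind of query can be tried on a small hand-built tree without downloading anything. The toy graph below is an assumption for illustration: its node names are made up and its edges point from parent to child, which may differ from the graph metagenompy builds.

```python
import networkx as nx

# Hypothetical miniature taxonomy; edges point from parent to child.
toy = nx.DiGraph()
toy.add_edges_from([
    ('root', 'animals'),
    ('root', 'plants'),
    ('animals', 'human'),
    ('animals', 'cat'),
    ('plants', 'pineapple'),
])
nx.set_node_attributes(toy, {
    'root': 'no rank',
    'animals': 'kingdom',
    'plants': 'kingdom',
    'human': 'species',
    'cat': 'species',
    'pineapple': 'species',
}, name='rank')

# Path between two leaves, ignoring edge direction.
path = nx.shortest_path(toy.to_undirected(as_view=True), 'human', 'pineapple')
for node in path:
    print(node, toy.nodes[node])

# Most specific taxon shared by two leaves (relies on parent -> child edges).
shared = nx.lowest_common_ancestor(toy, 'human', 'cat')
print(shared)
```

The undirected view avoids copying the graph, which matters more for the full taxonomy than for this toy tree.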

Easy transformation and visualization of the taxonomic tree

Extract taxonomic entities of interest and visualize their relations:

import metagenompy
import matplotlib.pyplot as plt


# load and condense taxonomy to relevant ranks
graph = metagenompy.generate_taxonomy_network(auto_download=True)
metagenompy.condense_taxonomy(graph)

# highlight interesting nodes
graph_zoom = metagenompy.highlight_nodes(graph, [
    '9606',  # human
    '9685',  # cat
    '9615',  # dog
    '4615',  # pineapple
    '3747',  # strawberry
    '4113',  # potato
])

# visualize result
fig, ax = plt.subplots(figsize=(10, 10))
metagenompy.plot_network(graph_zoom, ax=ax, labels_kws=dict(font_size=10))
fig.tight_layout()
fig.savefig('taxonomy.pdf')
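
One way to picture a zoomed-in graph is as the part of the tree that connects the chosen taxa to the root. The sketch below builds such a subgraph with plain NetworkX on a made-up toy tree. It is not metagenompy's implementation of `highlight_nodes`; the helper name `spanning_subtree`, the node names and the parent-to-child edge direction are all assumptions.

```python
import networkx as nx


def spanning_subtree(tree, nodes_of_interest):
    """Return the subgraph induced by the given nodes and all their ancestors."""
    keep = set(nodes_of_interest)
    for node in nodes_of_interest:
        # nx.ancestors returns every node with a directed path to `node`.
        keep |= nx.ancestors(tree, node)
    return tree.subgraph(keep).copy()


# Hypothetical miniature taxonomy; edges point from parent to child.
toy = nx.DiGraph([
    ('root', 'animals'), ('root', 'plants'),
    ('animals', 'human'), ('animals', 'cat'), ('animals', 'dog'),
    ('plants', 'pineapple'), ('plants', 'potato'),
])

zoom = spanning_subtree(toy, ['human', 'pineapple'])
print(sorted(zoom.nodes))
```

Taxa that were not requested, such as `cat` or `potato` here, drop out, while the shared lineage is kept.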

Download files

Source Distribution

metagenompy-0.4.4.tar.gz (15.0 kB)

Built Distribution

metagenompy-0.4.4-py3-none-any.whl (13.3 kB)
