package to manipulate pangenomes produced by PanGraph

PyPangraph

This repository contains a collection of utilities to load, explore, and analyze pangenome graphs produced by PanGraph.

The package can be installed via pip (see the documentation for details):

pip install pypangraph

The examples below showcase some of the main functions in the package. More detailed information and further examples can be found in the documentation.

Loading and interacting with pangraph objects:

A pangraph object can be loaded from a json file with:

# load the library
import pypangraph as pp
# this returns a pangraph object
graph = pp.Pangraph.from_json("path/to/pangraph.json")

Pangraph objects have three main properties: blocks, paths, and nodes.

  • blocks encode alignments of homologous sequences across genomes. Each entry in the alignment is a node.
  • paths encode genomes as a list of nodes.
  • nodes connect paths and blocks.

See the documentation for more details on this data structure.
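
To make the relationship between the three elements concrete, here is a minimal toy model in plain Python. It does not use PyPangraph: the `Node` class, the small integer ids, and the `path_as_blocks` helper are all hypothetical and only mirror the description above (each node ties one block to one path, and a path is an ordered list of nodes).

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Node:
    """Hypothetical stand-in for one node: a single block occurrence in a path."""
    node_id: int
    block_id: int
    path_id: int
    strand: bool  # True = forward, False = reverse


# Toy graph (made-up ids): two paths that share block 10.
nodes = [
    Node(node_id=1, block_id=10, path_id=0, strand=True),
    Node(node_id=2, block_id=20, path_id=0, strand=True),
    Node(node_id=3, block_id=10, path_id=1, strand=False),
]
node_by_id = {n.node_id: n for n in nodes}

# A path encodes a genome as an ordered list of node ids.
paths = {0: [1, 2], 1: [3]}

# A block groups the nodes (alignment entries) that carry its block id.
blocks = {}
for n in nodes:
    blocks.setdefault(n.block_id, []).append(n.node_id)


def path_as_blocks(path_id):
    """Translate a path into (block_id, strand) pairs by looking up its nodes."""
    return [(node_by_id[i].block_id, node_by_id[i].strand) for i in paths[path_id]]
```

In this sketch, `path_as_blocks(0)` walks path 0 node by node and reports which block each node belongs to, which is the sense in which nodes connect paths and blocks.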

These elements are contained in different properties of the graph:

graph.blocks # dictionary of block ids -> block objects
graph.paths # dictionary of path ids -> path objects
graph.nodes # dataframe of nodes

They can be accessed either by iterating through the items or via their ids.

Block object

Block objects have these properties/methods:

# get a block by its id
block = graph.blocks[12252014572476775186]

block.id # unique id of the block
block.consensus() # block consensus sequence
block.depth() # n. of nodes in the block alignment
block.to_biopython_alignment() # returns a biopython alignment object

Path object

# paths can be accessed by their name
path = graph.paths['NZ_CP014647']
# or by their numerical id
path = graph.paths.list[4]

path.id # path numerical id
path.name # path name
path.nuc_len # total length of the path in nucleotides
path.circular # whether the path is circular or linear
path.nodes # list of node ids in the path

Node object

The nodes property of the pangraph is a pandas dataframe with the following columns:

graph.nodes
                    
# node_id                           block_id path_id strand  start    end
# 11484376918084368      6227233701292645975      12   True  87911  88000
# 31660532043830364     12252014572476775186       7  False  88597  91675
# 35440216894469496      5326177636996110751      12   True  12629  13292
# ...
  • node_id is the unique id of the node
  • block_id is the id of the block the node belongs to
  • path_id is the id of the path the node belongs to
  • strand is a boolean indicating whether the node occurs in forward (True) or reverse (False) orientation in the path.
  • start and end are the start and end positions of the node in the input genome.
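
As an illustration of what the strand flag means in practice, the sketch below orients a sequence according to a node's strand. It is plain Python and not part of PyPangraph: `reverse_complement` and `oriented_sequence` are hypothetical helpers, and the assumption is that a node with strand False corresponds to the reverse complement of the block sequence.

```python
# Translation table mapping each nucleotide to its complement (both cases).
_COMPLEMENT = str.maketrans("ACGTacgt", "TGCAtgca")


def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence."""
    return seq.translate(_COMPLEMENT)[::-1]


def oriented_sequence(block_seq, strand):
    """Return the sequence as it would read along the path.

    Assumption: strand=True means forward orientation (sequence unchanged),
    strand=False means reverse orientation (reverse complement).
    """
    return block_seq if strand else reverse_complement(block_seq)
```

For example, `oriented_sequence("AACG", False)` returns `"CGTT"`, while `oriented_sequence("AACG", True)` returns the input unchanged.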

Nodes can be accessed by their id:

# use a node id, e.g. one from the node_id column above
node = graph.nodes[31660532043830364]

Block statistics

The function to_blockstats_df() returns a pandas dataframe containing summary statistics on the blocks, indexed by block id:

graph.to_blockstats_df()

# block_id              count  n_strains  duplicated   core   len
# 124231456905500231       15         15       False   True  2202
# 149501466629434994        2          2       False  False   210
# 279570196774736738        4          2        True  False  1308
# ...                     ...        ...         ...    ...   ...
  • count is the number of entries in the block alignment, i.e. the total number of times the block appears in all paths.
  • n_strains is the number of unique paths in which the block appears. It can be different from count if the block is duplicated in some paths.
  • duplicated indicates whether the block appears more than once in at least one path.
  • core indicates whether the block is core, i.e. present exactly once in every path.
  • len is the length of the block consensus sequence in basepairs.
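
The column definitions above can be sketched in plain Python. This is not how to_blockstats_df() is implemented; it is a minimal illustration on made-up data, where each path is reduced to the list of block ids it traverses. The `block_stats` function, the toy `paths`, and the `consensus_len` values are all hypothetical.

```python
from collections import Counter

# Hypothetical toy input: each path is the ordered list of block ids it traverses.
paths = {
    "A": [1, 2, 3],
    "B": [1, 3, 3],
    "C": [1, 3],
}
# Made-up consensus lengths (bp) for each block.
consensus_len = {1: 2202, 2: 210, 3: 1308}


def block_stats(paths, consensus_len):
    """Compute count, n_strains, duplicated, core and len for every block."""
    per_path = {name: Counter(blocks) for name, blocks in paths.items()}
    stats = {}
    for block_id, length in consensus_len.items():
        # Number of occurrences of this block in each path (0 if absent).
        counts = [c[block_id] for c in per_path.values()]
        stats[block_id] = {
            "count": sum(counts),                       # total occurrences
            "n_strains": sum(1 for c in counts if c),   # paths containing it
            "duplicated": any(c > 1 for c in counts),   # repeated in some path
            "core": all(c == 1 for c in counts),        # exactly once everywhere
            "len": length,
        }
    return stats


stats = block_stats(paths, consensus_len)
```

Here block 1 is core (once in every path), block 2 is present in a single path, and block 3 is in every path but duplicated in path "B", so it is not core and its count (4) exceeds its n_strains (3).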

Block count matrix

The function to_blockcount_df() returns a pandas dataframe with path ids as columns and block ids as the index. Each entry is the number of times a block occurs in the corresponding path.

graph.to_blockcount_df()
# path_id               RCS48_p1  RCS49_p1  RCS64_p2  ...
# block_id
# 124231456905500231           1         1         1  ...
# 149501466629434994           0         1         0  ...
# 279570196774736738           0         0         2  ...
# ...                        ...       ...       ...  ...
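
The layout of this matrix can be reproduced on toy data with the standard library alone. The sketch below is only an illustration of the table's structure, not the package's implementation; `block_count_matrix`, the path names, and the block ids are hypothetical. It also shows one typical use of such a matrix, namely reducing it to a presence/absence table.

```python
from collections import Counter

# Hypothetical toy input: each path is the ordered list of block ids it traverses.
paths = {
    "p1": [1, 2],
    "p2": [1],
    "p3": [1, 3, 3],
}


def block_count_matrix(paths):
    """Return {block_id: {path_name: count}}, i.e. rows = blocks, columns = paths."""
    counts = {name: Counter(blocks) for name, blocks in paths.items()}
    block_ids = sorted({b for blocks in paths.values() for b in blocks})
    return {b: {name: counts[name][b] for name in paths} for b in block_ids}


matrix = block_count_matrix(paths)

# Presence/absence version of the same table: True where the count is non-zero.
presence = {
    b: {name: n > 0 for name, n in row.items()} for b, row in matrix.items()
}
```

Rows with a count above 1 in some column, such as block 3 in "p3", correspond to duplicated blocks.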
