Package to manipulate pangenome graphs produced by PanGraph

These details have not been verified by PyPI

Project links

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

PyPangraph

This repository contains a collection of utilities to load, explore, and analyze pangenome graphs produced by PanGraph.

The package can be installed with pip (see the documentation for details):

pip install pypangraph

Below are examples showcasing some of the main functions of the package. More detailed information and examples can be found in the documentation.

Loading and interacting with pangraph objects:

A pangraph object can be loaded from a json file with:

# load the library
import pypangraph as pp
# this returns a pangraph object
graph = pp.Pangraph.from_json("path/to/pangraph.json")

Pangraph objects have three main properties: blocks, paths, and nodes.

blocks encode alignments of homologous sequences across genomes. Each entry in the alignment is a node.
paths encode genomes as a list of nodes.
nodes connect paths and blocks: each node is a single occurrence of a block in a path.

See the documentation for more details on this data structure.
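To make the relationship between the three elements concrete, here is a minimal toy model. The classes `ToyNode` and `ToyPath`, the helper `block_sequence`, and the plain dict used for blocks are hypothetical and are not part of pypangraph; they only mirror the description above: a node records which block and which path it belongs to and its orientation, a path is an ordered list of node ids, and a block groups the nodes that are its alignment entries.

```python
from __future__ import annotations

from dataclasses import dataclass, field


# Hypothetical toy classes, NOT the pypangraph API: they only mirror the
# blocks / paths / nodes relationship described above.
@dataclass
class ToyNode:
    node_id: int
    block_id: int
    path_id: int
    strand: bool  # True = forward, False = reverse


@dataclass
class ToyPath:
    path_id: int
    name: str
    nodes: list[int] = field(default_factory=list)  # ordered node ids


def block_sequence(path: ToyPath, nodes: dict[int, ToyNode]) -> list[tuple[int, str]]:
    """Return the blocks visited by a path, in order, with their orientation."""
    return [
        (nodes[nid].block_id, "+" if nodes[nid].strand else "-")
        for nid in path.nodes
    ]


# Block 1 occurs in both paths: its two nodes link each path to the same block.
nodes = {
    10: ToyNode(10, block_id=1, path_id=0, strand=True),
    11: ToyNode(11, block_id=2, path_id=0, strand=False),
    20: ToyNode(20, block_id=1, path_id=1, strand=False),
}
paths = {
    0: ToyPath(0, "genome_A", [10, 11]),
    1: ToyPath(1, "genome_B", [20]),
}

# Each block collects the nodes that point to it (its alignment entries).
blocks: dict[int, list[int]] = {}
for node in nodes.values():
    blocks.setdefault(node.block_id, []).append(node.node_id)

print(block_sequence(paths[0], nodes))  # [(1, '+'), (2, '-')]
print(blocks)  # {1: [10, 20], 2: [11]}
```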

These elements are contained in different properties of the graph:

graph.blocks # dictionary of block ids -> block objects
graph.paths # dictionary of path ids -> path objects
graph.nodes # dataframe of nodes

They can be accessed either by iterating over the items or directly by their ids.

Block object

Block objects have the following properties and methods:

# get a block by its id
block = graph.blocks[12252014572476775186]

block.id # unique id of the block
block.consensus() # block consensus sequence
block.depth() # n. of nodes in the block alignment
block.to_biopython_alignment() # returns a biopython alignment object
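As an illustration of what `consensus()` and `depth()` describe, the sketch below computes a simple majority-rule consensus and the depth of a toy alignment given as a list of equal-length gapped strings, one per node. This is not pypangraph's implementation: PanGraph stores each block's consensus directly, and the helpers `majority_consensus` and `toy_depth` are hypothetical.

```python
from __future__ import annotations

from collections import Counter


def majority_consensus(alignment: list[str]) -> str:
    """Majority-rule consensus of equal-length aligned sequences.

    Hypothetical helper for illustration only. Columns where the gap
    character '-' is the most common symbol are dropped. Ties are broken
    by first occurrence in the column (Counter.most_common ordering).
    """
    if len({len(seq) for seq in alignment}) != 1:
        raise ValueError("all aligned sequences must have the same length")
    consensus = []
    for column in zip(*alignment):
        symbol, _ = Counter(column).most_common(1)[0]
        if symbol != "-":
            consensus.append(symbol)
    return "".join(consensus)


def toy_depth(alignment: list[str]) -> int:
    """Depth = number of entries (nodes) in the block alignment."""
    return len(alignment)


alignment = [
    "ACGT-A",
    "ACGTTA",
    "ACCT-A",
]
print(majority_consensus(alignment))  # ACGTA
print(toy_depth(alignment))  # 3
```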

Path object

# paths can be accessed by their name
path = graph.paths['NZ_CP014647']
# or by their numerical id
path = graph.paths.list[4]

path.id # path numerical id
path.name # path name
path.nuc_len # total length of the path in nucleotides
path.circular # whether the path is circular or linear
path.nodes # list of node ids in the path

Node object

The nodes property of the pangraph is a pandas dataframe with the following columns:

graph.nodes
                    
# node_id                           block_id path_id strand  start    end
# 11484376918084368      6227233701292645975      12   True  87911  88000
# 31660532043830364     12252014572476775186       7  False  88597  91675
# 35440216894469496      5326177636996110751      12   True  12629  13292
# ...

node_id is the unique id of the node
block_id is the id of the block the node belongs to
path_id is the id of the path the node belongs to
strand is a boolean indicating whether the node occurs in forward (True) or reverse (False) orientation in the path.
start and end are the start and end positions of the node in the input genome.
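The sketch below shows typical queries on these columns using the three rows printed above as toy input, with plain Python instead of pandas. `NodeRow`, `nodes_in_path` and `span` are hypothetical names. It assumes half-open `[start, end)` coordinates and nodes that do not wrap around the origin of a circular genome; neither convention is stated above. Sorting by `start` gives genome-coordinate order, whereas `path.nodes` already gives the order of nodes along the path.

```python
from __future__ import annotations

from typing import NamedTuple


class NodeRow(NamedTuple):
    # Toy stand-in mirroring the columns of graph.nodes (not a pypangraph class).
    node_id: int
    block_id: int
    path_id: int
    strand: bool
    start: int
    end: int


rows = [
    NodeRow(11484376918084368, 6227233701292645975, 12, True, 87911, 88000),
    NodeRow(31660532043830364, 12252014572476775186, 7, False, 88597, 91675),
    NodeRow(35440216894469496, 5326177636996110751, 12, True, 12629, 13292),
]


def nodes_in_path(rows: list[NodeRow], path_id: int) -> list[NodeRow]:
    """Nodes belonging to one path, sorted by start coordinate."""
    return sorted((r for r in rows if r.path_id == path_id), key=lambda r: r.start)


def span(row: NodeRow) -> int:
    """Genomic span of a node, ASSUMING half-open coordinates and end > start
    (no wrap-around on a circular genome)."""
    return row.end - row.start


for r in nodes_in_path(rows, 12):
    print(r.node_id, "+" if r.strand else "-", span(r))
```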

Nodes can be accessed by their id:

# node ids are the values of the node_id column (not block ids)
node = graph.nodes[11484376918084368]

Block statistics

The function to_blockstats_df() returns a pandas dataframe containing summary statistics on the blocks, indexed by block id:

graph.to_blockstats_df()

# block_id              count  n_strains  duplicated   core   len
# 124231456905500231       15         15       False   True  2202
# 149501466629434994        2          2       False  False   210
# 279570196774736738        4          2        True  False  1308
# ...                     ...        ...         ...    ...   ...

count is the number of entries in the block alignment, i.e. the total number of times the block appears in all paths.
n_strains is the number of unique paths in which the block appears. It can be different from count if the block is duplicated in some paths.
duplicated indicates whether the block is duplicated in any path.
core indicates whether the block is core, i.e. present exactly once in every path.
len is the length of the block consensus sequence in base pairs.
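The definitions above can be checked by recomputing the columns from a list of `(block_id, path_id)` pairs, one pair per node. This is a minimal sketch, not how pypangraph computes `to_blockstats_df()`; the helper `block_stats` and the toy ids are hypothetical, and the `len` column is omitted because it needs the consensus sequence.

```python
from __future__ import annotations

from collections import Counter
from collections.abc import Hashable, Iterable


def block_stats(
    node_pairs: Iterable[tuple[Hashable, Hashable]],
    all_paths: Iterable[Hashable],
) -> dict[Hashable, dict[str, object]]:
    """Recompute count, n_strains, duplicated and core from (block_id, path_id) pairs.

    Hypothetical helper following the column definitions above.
    """
    # block_id -> Counter(path_id -> number of occurrences in that path)
    per_block: dict[Hashable, Counter] = {}
    for block_id, path_id in node_pairs:
        per_block.setdefault(block_id, Counter())[path_id] += 1

    paths = set(all_paths)
    stats = {}
    for block_id, per_path in per_block.items():
        stats[block_id] = {
            "count": sum(per_path.values()),  # total occurrences in all paths
            "n_strains": len(per_path),  # distinct paths containing the block
            "duplicated": any(n > 1 for n in per_path.values()),
            # core: present exactly once in every path
            "core": set(per_path) == paths and all(n == 1 for n in per_path.values()),
        }
    return stats


# Toy input: "b2" occurs twice in path "A", "b3" occurs only in path "B".
pairs = [("b1", "A"), ("b1", "B"), ("b1", "C"), ("b2", "A"), ("b2", "A"), ("b3", "B")]
for block_id, row in block_stats(pairs, ["A", "B", "C"]).items():
    print(block_id, row)
```

Note that `count` and `n_strains` differ only for duplicated blocks, as described above.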

Block count matrix

The function to_blockcount_df() returns a pandas dataframe whose columns are path ids and whose index is block ids. Each entry is the number of times a block occurs in a given path (strain).

graph.to_blockcount_df()
# path_id               RCS48_p1  RCS49_p1  RCS64_p2  ...
# block_id
# 124231456905500231           1         1         1  ...
# 149501466629434994           0         1         0  ...
# 279570196774736738           0         0         2  ...
# ...                        ...       ...       ...  ...
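Each row of the count matrix carries the per-block statistics described in the previous section: `count` is the row sum, `n_strains` is the number of non-zero entries, and a block is core when every entry equals 1. The sketch below builds the matrix as nested dicts from toy `(block_id, path_id)` pairs and reads those statistics off each row. The helpers `block_count_matrix` and `summarize_row` are hypothetical stand-ins, not pypangraph functions, and the core check assumes `path_ids` lists every path in the graph.

```python
from __future__ import annotations

from collections import Counter


def block_count_matrix(node_pairs, path_ids):
    """Return {block_id: {path_id: occurrences}}, zero entries included.

    Hypothetical plain-dict stand-in for to_blockcount_df():
    rows are blocks, columns are paths.
    """
    node_pairs = list(node_pairs)
    occurrences = Counter(node_pairs)  # (block_id, path_id) -> count
    block_ids = dict.fromkeys(b for b, _ in node_pairs)  # unique, insertion-ordered
    return {
        b: {p: occurrences[(b, p)] for p in path_ids}  # Counter gives 0 if absent
        for b in block_ids
    }


def summarize_row(row):
    """Per-block statistics that can be read off one row of the matrix."""
    values = list(row.values())
    return {
        "count": sum(values),  # total occurrences in all paths
        "n_strains": sum(1 for v in values if v > 0),
        "duplicated": any(v > 1 for v in values),
        "core": all(v == 1 for v in values),  # assumes row covers every path
    }


paths = ["A", "B", "C"]
pairs = [("b1", "A"), ("b1", "B"), ("b1", "C"), ("b2", "B"), ("b3", "C"), ("b3", "C")]
matrix = block_count_matrix(pairs, paths)
for block_id, row in matrix.items():
    print(block_id, row, summarize_row(row))
```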

Release history

- 1.0.1 (this version): Feb 13, 2025
- 1.0.0: Feb 12, 2025
- 0.1.3: Aug 27, 2024
- 0.1.2: Aug 27, 2024

Download files

Download the file for your platform.

Source Distribution

pypangraph-1.0.1.tar.gz (18.9 kB)

Uploaded Feb 13, 2025 Source

Built Distribution

pypangraph-1.0.1-py3-none-any.whl (18.4 kB)

Uploaded Feb 13, 2025 Python 3

File details

Details for the file pypangraph-1.0.1.tar.gz.

File metadata

Download URL: pypangraph-1.0.1.tar.gz
Upload date: Feb 13, 2025
Size: 18.9 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for pypangraph-1.0.1.tar.gz
Algorithm	Hash digest
SHA256	`845e7febf61d87dd49607ce10da188f65c923c3c667d9bcba338c2a598bffbcc`
MD5	`006b2a6161ff731210181c610b8caa01`
BLAKE2b-256	`b041d235bff5c6b7c878c3d4f61984473d65695e07b9aec9a81b2f84f776f4be`
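The SHA256 digest listed above can be used to check a downloaded file. A minimal sketch using Python's standard-library `hashlib`; the helper names `sha256_of` and `matches_published_hash` are hypothetical, and the usage comment assumes the source distribution was saved in the current directory.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 published above for pypangraph-1.0.1.tar.gz
EXPECTED_SHA256 = "845e7febf61d87dd49607ce10da188f65c923c3c667d9bcba338c2a598bffbcc"


def sha256_of(path: str | Path, chunk_size: int = 1 << 16) -> str:
    """Hex SHA256 digest of a file, read in chunks to bound memory use."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: str | Path, expected: str = EXPECTED_SHA256) -> bool:
    """True if the file's SHA256 equals the expected hex digest."""
    return sha256_of(path) == expected.lower()


# Usage (hypothetical location of the downloaded file):
# print(matches_published_hash("pypangraph-1.0.1.tar.gz"))
```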

File details

Details for the file pypangraph-1.0.1-py3-none-any.whl.

File metadata

Download URL: pypangraph-1.0.1-py3-none-any.whl
Upload date: Feb 13, 2025
Size: 18.4 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for pypangraph-1.0.1-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6c9079da693b1accd9324235daf8033c68570ba4586b1a0fe9d357f67d825c28`
MD5	`69cedb0cca84c34ec0eed6828cfda292`
BLAKE2b-256	`62ef2abb4681fcac172e8cdcd5f92c3a8cb943c8579b09947ecf4cc94190ca03`