Skip to main content

Library for handling data in the GFA1 and GFA2 formats

Project description

The Graphical Fragment Assembly (GFA) are formats for the representation of sequence graphs, including assembly, variation and splicing graphs. Two versions of GFA have been defined (GFA1 and GFA2) and several sequence analysis programs have been adopting the formats as an interchange format, which allow to easily combine different sequence analysis tools.

This library implements the GFA1 and GFA2 specification described at https://github.com/GFA-spec/GFA-spec/blob/master/GFA-spec.md. It allows to create a Gfa object from a file in the GFA format or from scratch, to enumerate the graph elements (segments, links, containments, paths and header lines), to traverse the graph (by traversing all links outgoing from or incoming to a segment), to search for elements (e.g. which links connect two segments) and to manipulate the graph (e.g. to eliminate a link or a segment or to duplicate a segment distributing the read counts evenly on the copies).

The GFA format can be easily extended by users by defining own custom tags and record types. In Gfapy, it is easy to write extensions modules, which allow to define custom record types and datatypes for the parsing and validation of custom fields. The custom lines can be connected, using references, to each other and to lines of the standard record types.

Requirements

Gfapy has been written for Python 3 and tested using Python version 3.3. It does not require any additional Python packages or other software.

Installation

Gfapy is distributed as a Python package and can be installed using the Python package manager pip, as well as conda (in the Bioconda channel).

The following command installs the current stable version from the Python Packages index:

pip install gfapy

If you would like to install the current development version from Github, use the following command:

pip install -e git+https://github.com/ggonnella/gfapy.git#egg=gfapy

Alternatively it is possible to install gfapy using conda. Gfapy is included in the Bioconda (https://bioconda.github.io/) channel:

conda install -c bioconda gfapy

Usage

If you installed gfapy as described above, you can import it in your script using the conventional Python syntax:

>>> import gfapy

Documentation

The documentation, including this introduction to Gfapy, an user manual and the API documentation is hosted on the ReadTheDocs server, at the URL http://gfapy.readthedocs.io/en/latest/ and it can be downloaded as PDF from the URL https://github.com/ggonnella/gfapy/blob/master/manual/gfapy-manual.pdf.

References

Giorgio Gonnella and Stefan Kurtz “GfaPy: a flexible and extensible software library for handling sequence graphs in Python”, Bioinformatics (2017) btx398 https://doi.org/10.1093/bioinformatics/btx398

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename, size & hash SHA256 hash help File type Python version Upload date
gfapy-1.0.0-py3-none-any.whl (228.9 kB) Copy SHA256 hash SHA256 Wheel py3 Jun 28, 2017
gfapy-1.0.0.tar.gz (371.7 kB) Copy SHA256 hash SHA256 Source None Jun 28, 2017

Supported by

Elastic Elastic Search Pingdom Pingdom Monitoring Google Google BigQuery Sentry Sentry Error logging AWS AWS Cloud computing DataDog DataDog Monitoring Fastly Fastly CDN SignalFx SignalFx Supporter DigiCert DigiCert EV certificate StatusPage StatusPage Status page