Convert GFF3-formatted data to BED format

gff2bed

Overview

GFF3 and BED are common formats for storing the coordinates of genomic features such as genes. GFF3 format is more versatile, but BED format is simpler and enjoys a rich ecosystem of utilities such as bedtools. For this reason, it is often convenient to store genomic features in GFF3 format and convert them to BED format for genome arithmetic.
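The two formats also count positions differently, which is why start coordinates shift by one during conversion. GFF3 intervals are 1-based and include both endpoints, whereas BED intervals are 0-based and exclude the end position. The following is a minimal sketch of that arithmetic; the function names are hypothetical and are not part of gff2bed.

```python
def gff3_to_bed_interval(start, end):
    """Convert a 1-based, closed GFF3 interval to a 0-based, half-open BED interval."""
    return start - 1, end


def bed_to_gff3_interval(chrom_start, chrom_end):
    """Convert a 0-based, half-open BED interval back to 1-based, closed GFF3 coordinates."""
    return chrom_start + 1, chrom_end


def gff3_length(start, end):
    """Feature length in GFF3 coordinates (both endpoints are included)."""
    return end - start + 1


def bed_length(chrom_start, chrom_end):
    """Feature length in BED coordinates (the end is excluded)."""
    return chrom_end - chrom_start


bed_start, bed_end = gff3_to_bed_interval(7489, 9757)
print(bed_start, bed_end)
# The feature length is the same in both coordinate systems.
print(gff3_length(7489, 9757) == bed_length(bed_start, bed_end))
```

Only the start changes; the end value is numerically identical in both systems, because a closed 1-based end and an open 0-based end coincide.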

This module provides two convenience functions that streamline converting GFF3 data to BED format for bioinformatics analysis: parse(), which reads records from a GFF3 file, and convert(), which converts those records to BED-formatted data that can be passed on to other tools such as pybedtools.

Documentation

See full online documentation at http://salk-tm.gitlab.io/gff2bed

Installation

With conda

gff2bed is available from bioconda and can be installed with conda:

conda install -c bioconda gff2bed

With pip

gff2bed is available from PyPI and can be installed with pip:

pip install gff2bed

Tutorial

To follow this tutorial, first ensure you have the following modules installed in addition to gff2bed: urllib3, pandas, and pybedtools.

This tutorial will involve working with some files on disk, so we'll make a temporary directory for easy cleanup later.

from tempfile import TemporaryDirectory
temp_dir = TemporaryDirectory()

Next, download an example GFF3 file:

import urllib3
import shutil
import os.path
GFF3_URL = 'https://gitlab.com/salk-tm/gff2bed/-/raw/main/test/data/ColCEN_AT1G01010-20_TAIR10.gff3.gz'
GFF3_FILE = os.path.join(temp_dir.name, 'ColCEN_AT1G01010-20_TAIR10.gff3.gz')
http = urllib3.PoolManager()
with http.request('GET', GFF3_URL, preload_content=False) as r, open(GFF3_FILE, 'wb') as dest_file:
    shutil.copyfileobj(r, dest_file)

To read the GFF3 file into a pandas data frame without converting to BED, use gff2bed.parse():

import pandas as pd
import gff2bed
gff_data = pd.DataFrame(gff2bed.parse(GFF3_FILE))
gff_data.head()
      0     1      2  3                                                  4
0  Chr1  7489   9757  +  {'ID': 'AT1G01010', 'Note': 'protein_coding_ge...
1  Chr1  9786  12596  -  {'ID': 'AT1G01020', 'Note': 'protein_coding_ge...

Note: gff2bed is deliberately simple. It depends only on the Python standard library and provides only the parse() and convert() functions. In practice, you will typically use it together with other modules such as pandas or pybedtools.
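To illustrate how far the standard library alone can go, here is a minimal sketch of reading GFF3 lines into tuples shaped like the rows shown above. It is not the actual implementation of gff2bed.parse(); the function names, the default feature type, and the example lines are all assumptions made for illustration.

```python
from urllib.parse import unquote


def parse_attributes(field):
    """Split a GFF3 attributes column ('key=value;key=value') into a dict."""
    attributes = {}
    for pair in field.strip().split(';'):
        if not pair:
            continue
        key, _, value = pair.partition('=')
        # GFF3 percent-encodes reserved characters in keys and values.
        attributes[unquote(key)] = unquote(value)
    return attributes


def parse_gff3_lines(lines, feature_type='gene'):
    """Yield (seqid, start, end, strand, attributes) for features of one type."""
    for line in lines:
        # Skip directives, comments, and blank lines.
        if line.startswith('#') or not line.strip():
            continue
        (seqid, _source, ftype, start, end,
         _score, strand, _phase, attr) = line.rstrip('\n').split('\t')
        if ftype != feature_type:
            continue
        yield seqid, int(start), int(end), strand, parse_attributes(attr)


# Made-up example lines, not taken from the tutorial file.
example_lines = [
    '##gff-version 3\n',
    'Chr1\texample\tgene\t100\t250\t.\t+\t.\tID=geneA;Name=example%20gene\n',
    'Chr1\texample\tmRNA\t100\t250\t.\t+\t.\tID=geneA.1;Parent=geneA\n',
]
records = list(parse_gff3_lines(example_lines))
print(records)
```

Writing the parser as a generator means records are produced one at a time, so the result can be handed to pd.DataFrame() or to another generator without an intermediate list.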

To create a data frame of BED-formatted data, pass the stream from gff2bed.parse() to gff2bed.convert() before passing it to pd.DataFrame():

bed_data = pd.DataFrame(gff2bed.convert(gff2bed.parse(GFF3_FILE)))
bed_data.head()
      0     1      2          3  4  5
0  Chr1  7488   9757  AT1G01010  0  +
1  Chr1  9785  12596  AT1G01020  0  -
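Comparing the two tables suggests the mapping that takes place: the start coordinate drops by one, the ID attribute becomes the name column, and the score column is filled with 0. The sketch below reproduces that mapping for records shaped like the parse() output. It is inferred from the output shown here, not taken from the gff2bed source, and to_bed6 and bed_line are hypothetical names.

```python
def to_bed6(records):
    """Map (seqid, start, end, strand, attributes) tuples to six-column BED tuples."""
    for seqid, start, end, strand, attributes in records:
        # BED starts are 0-based, so subtract one; the end is unchanged.
        yield seqid, start - 1, end, attributes['ID'], 0, strand


def bed_line(record):
    """Format one BED tuple as a tab-delimited line of text."""
    return '\t'.join(str(field) for field in record)


# The two genes from the tutorial output, with attribute dicts trimmed to ID only.
genes = [
    ('Chr1', 7489, 9757, '+', {'ID': 'AT1G01010'}),
    ('Chr1', 9786, 12596, '-', {'ID': 'AT1G01020'}),
]
bed_records = list(to_bed6(genes))
for record in bed_records:
    print(bed_line(record))
```

The column order (chromosome, start, end, name, score, strand) is the standard six-column BED layout, which is what tools such as bedtools expect when strand information is needed.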

You can similarly create a BedTool with pybedtools:

from pybedtools import BedTool
bed_data = BedTool(gff2bed.convert(gff2bed.parse(GFF3_FILE))).saveas()
bed_data.head()
Chr1    7488    9757    AT1G01010       0       +
Chr1    9785    12596   AT1G01020       0       -

To complete the tutorial, clean up the temporary directory:

temp_dir.cleanup()
