Convert GFF3-formatted data to BED format
gff2bed
Overview
GFF3 and BED are common formats for storing the coordinates of genomic features such as genes. GFF3 format is more versatile, but BED format is simpler and enjoys a rich ecosystem of utilities such as bedtools. For this reason, it is often convenient to store genomic features in GFF3 format and convert them to BED format for genome arithmetic.
This module provides two convenience functions to streamline converting data from GFF3 to BED format for bioinformatics analysis: parse(), which reads data from a GFF3 file, and convert(), which converts GFF3-formatted data to BED-formatted data that can be passed on, e.g., to pybedtools.
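The heart of the conversion is a coordinate shift: GFF3 intervals are 1-based and closed, while BED intervals are 0-based and half-open, so the start decreases by one and the end stays the same. The sketch below illustrates only that arithmetic and is not the implementation of convert(). The helper name gff3_record_to_bed6 and its argument layout (modeled on the tutorial output further down) are assumptions made for illustration.

```python
def gff3_record_to_bed6(seqid, start, end, strand, attributes):
    """Return a BED6-style row for one GFF3 feature (hypothetical helper)."""
    # GFF3 is 1-based and end-inclusive; BED is 0-based and end-exclusive,
    # so only the start coordinate changes.
    bed_start = start - 1
    # Use the GFF3 ID attribute as the BED name, or '.' if it is absent.
    name = attributes.get('ID', '.')
    score = 0  # placeholder score, matching the tutorial output
    return (seqid, bed_start, end, name, score, strand)


row = gff3_record_to_bed6('Chr1', 7489, 9757, '+', {'ID': 'AT1G01010'})
print(row)
```

Because the end coordinate is exclusive in BED, the feature length is simply end minus start after conversion.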
Documentation
See full online documentation at http://salk-tm.gitlab.io/gff2bed
Installation
With conda
gff2bed is available from bioconda and can be installed with conda:
conda install -c bioconda gff2bed
With pip
gff2bed is available from PyPI and can be installed with pip:
pip install gff2bed
Tutorial
To follow this tutorial, first ensure you have the modules it imports installed in addition to gff2bed: urllib3, pandas, and pybedtools.
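One way to check the prerequisites before starting is to ask Python whether each module can be located. This is a minimal sketch using only the standard library. The helper name missing_modules is hypothetical, and the module list is an assumption inferred from the imports used later in this tutorial.

```python
from importlib.util import find_spec


def missing_modules(names):
    """Return the top-level module names that cannot be found (hypothetical helper)."""
    return [name for name in names if find_spec(name) is None]


# Assumed prerequisite list, inferred from the imports in this tutorial.
TUTORIAL_MODULES = ['gff2bed', 'urllib3', 'pandas', 'pybedtools']
missing = missing_modules(TUTORIAL_MODULES)
print('Missing modules:', ', '.join(missing) if missing else 'none')
```

Any names reported as missing can then be installed with conda or pip as described above.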
This tutorial will involve working with some files on disk, so we'll make a temporary directory for easy cleanup later.
from tempfile import TemporaryDirectory
temp_dir = TemporaryDirectory()
Next, download an example GFF3 file
import urllib3
import shutil
import os.path
GFF3_URL = 'https://gitlab.com/salk-tm/gff2bed/-/raw/main/test/data/ColCEN_AT1G01010-20_TAIR10.gff3.gz'
GFF3_FILE = os.path.join(temp_dir.name, 'ColCEN_AT1G01010-20_TAIR10.gff3.gz')
http = urllib3.PoolManager()
# Stream the response body straight to disk instead of loading it into memory
with http.request('GET', GFF3_URL, preload_content=False) as r, open(GFF3_FILE, 'wb') as dest_file:
    shutil.copyfileobj(r, dest_file)
To read the GFF3 file into a Pandas data frame without converting to BED, use gff2bed.parse()
import pandas as pd
import gff2bed
gff_data = pd.DataFrame(gff2bed.parse(GFF3_FILE))
gff_data.head()
0 1 2 3 4
0 Chr1 7489 9757 + {'ID': 'AT1G01010', 'Note': 'protein_coding_ge...
1 Chr1 9786 12596 - {'ID': 'AT1G01020', 'Note': 'protein_coding_ge...
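To make the shape of these records concrete, here is a standard-library sketch of a reader that yields tuples in the same layout as the output above: sequence ID, start, end, strand, and a dictionary of attributes. It is an illustration under stated assumptions and not the implementation of gff2bed.parse(). The function name read_gff3_features and its feature_type parameter are hypothetical, and the toy input is written for this demo (only the gene coordinates echo the tutorial output).

```python
import gzip
import os
from tempfile import TemporaryDirectory
from urllib.parse import unquote


def read_gff3_features(path, feature_type='gene'):
    """Yield (seqid, start, end, strand, attributes) for features of one type."""
    with gzip.open(path, 'rt') as gff:
        for line in gff:
            fields = line.rstrip('\n').split('\t')
            # Skip directives, comments, blank lines, and anything malformed.
            if line.startswith('#') or len(fields) != 9:
                continue
            seqid, _source, ftype, start, end, _score, strand, _phase, attr = fields
            if ftype != feature_type:
                continue
            attributes = {}
            for pair in attr.split(';'):
                if pair:
                    key, _, value = pair.partition('=')
                    # GFF3 percent-encodes reserved characters in attributes.
                    attributes[unquote(key)] = unquote(value)
            yield seqid, int(start), int(end), strand, attributes


# Toy input for the demo; the mRNA line is made up to show type filtering.
with TemporaryDirectory() as tmp:
    toy_path = os.path.join(tmp, 'toy.gff3.gz')
    with gzip.open(toy_path, 'wt') as out:
        out.write('##gff-version 3\n')
        out.write('Chr1\t.\tgene\t7489\t9757\t.\t+\t.\tID=AT1G01010\n')
        out.write('Chr1\t.\tmRNA\t7489\t9757\t.\t+\t.\tID=toy_mRNA;Parent=AT1G01010\n')
    records = list(read_gff3_features(toy_path))

print(records)
```

Note that the coordinates are still 1-based at this stage, which is why the start values differ by one from the BED output.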
Note: The implementation of gff2bed follows a philosophy of simplicity. It depends on nothing but the built-in Python libraries, and it includes nothing but the parse() and convert() functions. Typically, when applying gff2bed in practice, you will use it in conjunction with other modules such as pandas or pybedtools.
To create a data frame of BED-formatted data, pass the stream to gff2bed.convert() before passing it to pd.DataFrame()
bed_data = pd.DataFrame(gff2bed.convert(gff2bed.parse(GFF3_FILE)))
bed_data.head()
0 1 2 3 4 5
0 Chr1 7488 9757 AT1G01010 0 +
1 Chr1 9785 12596 AT1G01020 0 -
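If you want a BED file on disk, for example to hand to command-line tools, rows in this six-column layout can be written as tab-separated text. The following is a minimal standard-library sketch. The helper name write_bed is hypothetical, and the rows are copied from the output above instead of being produced by gff2bed.

```python
import csv
import io


def write_bed(rows, handle):
    """Write BED rows to an open text handle as tab-separated lines."""
    writer = csv.writer(handle, delimiter='\t', lineterminator='\n')
    for row in rows:
        writer.writerow(row)


# Rows copied from the tutorial output above.
bed_rows = [
    ('Chr1', 7488, 9757, 'AT1G01010', 0, '+'),
    ('Chr1', 9785, 12596, 'AT1G01020', 0, '-'),
]

# An in-memory buffer stands in for a file opened with open(path, 'w').
buffer = io.StringIO()
write_bed(bed_rows, buffer)
bed_text = buffer.getvalue()
print(bed_text, end='')
```

Since convert() is used above as an iterable of rows, its output could presumably be passed to such a helper directly in place of bed_rows.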
You can similarly create a BedTool with pybedtools
from pybedtools import BedTool
bed_data = BedTool(gff2bed.convert(gff2bed.parse(GFF3_FILE))).saveas()
bed_data.head()
Chr1 7488 9757 AT1G01010 0 +
Chr1 9785 12596 AT1G01020 0 -
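Once features are in BED coordinates, genome arithmetic reduces to operations on half-open intervals. The sketch below shows an intersection in plain Python to illustrate the idea. It is not how bedtools or pybedtools implement it, the helper name intersect is hypothetical, and the query window is an arbitrary region chosen for the demo.

```python
def intersect(feature, window):
    """Return the overlap of two (chrom, start, end) half-open intervals, or None."""
    chrom_a, start_a, end_a = feature
    chrom_b, start_b, end_b = window
    if chrom_a != chrom_b:
        return None
    start, end = max(start_a, start_b), min(end_a, end_b)
    # With half-open intervals, an empty overlap is exactly start >= end.
    return (chrom_a, start, end) if start < end else None


# Gene intervals copied from the BED output above.
genes = [('Chr1', 7488, 9757), ('Chr1', 9785, 12596)]
window = ('Chr1', 9000, 10000)  # hypothetical query region

overlaps = [intersect(gene, window) for gene in genes]
lengths = [end - start for _, start, end in overlaps]
print(overlaps)
print(lengths)
```

The lengths need no plus-one correction, which is one reason the half-open BED convention is convenient for this kind of arithmetic.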
To complete the tutorial, clean up the temporary directory
temp_dir.cleanup()
File details
Details for the file gff2bed-1.0.3.tar.gz.
File metadata
- Download URL: gff2bed-1.0.3.tar.gz
- Upload date:
- Size: 5.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.10.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | 8766ad23a73986688c0641c7121e2ba8b34daf483a66329d6991ceb05d76c9f3
MD5 | 6b4d97877517576952841175815d29c5
BLAKE2b-256 | 8ec317cdc391be7d13fc064d9c2f247f66da7e2d37e28d1348be0ec23daafd95
File details
Details for the file gff2bed-1.0.3-py3-none-any.whl.
File metadata
- Download URL: gff2bed-1.0.3-py3-none-any.whl
- Upload date:
- Size: 5.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.8.0 pkginfo/1.8.2 readme-renderer/32.0 requests/2.27.1 requests-toolbelt/0.9.1 urllib3/1.26.8 tqdm/4.62.3 importlib-metadata/4.10.1 keyring/23.5.0 rfc3986/2.0.0 colorama/0.4.4 CPython/3.9.10
File hashes
Algorithm | Hash digest
---|---
SHA256 | e4399efb633b64e09016367b772745b045013d36d823434457ece2f0e2b3319a
MD5 | 4ac48096d6ab894ab576ef06f2b8f017
BLAKE2b-256 | 6573a2e955eda2100d15defce1b68982c7c6dcbfbba924e2b5d6d47847653287