Convert GFF3-formatted data to BED format
gff2bed
Overview
GFF3 and BED are common formats for storing the coordinates of genomic features such as genes. GFF3 format is more versatile, but BED format is simpler and enjoys a rich ecosystem of utilities such as bedtools. For this reason, it is often convenient to store genomic features in GFF3 format and convert them to BED format for genome arithmetic.
This package provides two convenience functions to streamline converting data from GFF3 to BED format for bioinformatics analysis: parse(), which reads data from a GFF3 file, and convert(), which converts GFF3-formatted data to BED-formatted data that can be passed on to other tools, e.g. pybedtools.
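The main difference between the two formats is the coordinate convention: GFF3 intervals are 1-based and inclusive, while BED intervals are 0-based and half-open. The sketch below illustrates that shift for a single feature line. It is not how gff2bed is implemented; the helper name gff3_line_to_bed, the choice of the ID attribute as the BED name, the replacement of a missing score with 0, and the example line are all assumptions made for illustration.

```python
def gff3_line_to_bed(line):
    """Convert one tab-delimited GFF3 feature line to a BED6 tuple.

    Hypothetical helper for illustration only; not part of gff2bed.
    """
    fields = line.rstrip("\n").split("\t")
    seqid, _source, _type, start, end, score, strand = fields[:7]

    # Column 9 holds semicolon-separated key=value attributes
    attributes = dict(
        pair.split("=", 1) for pair in fields[8].split(";") if "=" in pair
    )
    name = attributes.get("ID", ".")  # assumption: use the ID attribute as the name

    # GFF3 uses "." for a missing score; BED expects a number (assumption: use 0)
    bed_score = "0" if score == "." else score

    # GFF3 start is 1-based inclusive, BED start is 0-based; the end is unchanged
    return (seqid, int(start) - 1, int(end), name, bed_score, strand)


# Illustrative feature line, not read from the example file used below
example = "Chr1\tTAIR10\tgene\t3631\t5899\t.\t+\t.\tID=AT1G01010;Name=AT1G01010"
print(gff3_line_to_bed(example))
```

Only the start coordinate changes, which is why an interval keeps the same length (end minus start in BED, end minus start plus one in GFF3) after conversion.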
Installation
Install gff2bed with pip:
pip install gff2bed
Example
import urllib3
import shutil
import pandas as pd
import pybedtools
import gff2bed
GFF3_URL = 'https://gitlab.com/salk-tm/gff2bed/-/raw/main/test/data/ColCEN_AT1G01010-20_TAIR10.gff3.gz'
# Download the example GFF3 file
http = urllib3.PoolManager()
with http.request('GET', GFF3_URL, preload_content=False) as r, \
        open('ColCEN_AT1G01010-20_TAIR10.gff3.gz', 'wb') as dest_file:
    shutil.copyfileobj(r, dest_file)
# Parse the GFF3 data into a Pandas data frame
genes_df = pd.DataFrame(gff2bed.parse('ColCEN_AT1G01010-20_TAIR10.gff3.gz'))
genes_df.head()
# Convert the parsed GFF3 data to a pybedtools BedTool and save it as a BED file
genes_bt = pybedtools.BedTool(
    gff2bed.convert(gff2bed.parse('ColCEN_AT1G01010-20_TAIR10.gff3.gz'))
).saveas('ColCEN_AT1G01010-20_TAIR10.bed')
genes_bt.head()
API