Convert GFF3-formatted data to BED format

gff2bed

Overview

GFF3 and BED are common formats for storing the coordinates of genomic features such as genes. GFF3 format is more versatile, but BED format is simpler and enjoys a rich ecosystem of utilities such as bedtools. For this reason, it is often convenient to store genomic features in GFF3 format and convert them to BED format for genome arithmetic.

This package provides two convenience functions that streamline converting GFF3 data to BED format for bioinformatics analysis: parse(), which reads data from a GFF3 file, and convert(), which converts GFF3-formatted data to BED-formatted data that can be passed on to other tools such as pybedtools.
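The core of any GFF3-to-BED conversion is a coordinate shift: GFF3 intervals are 1-based and inclusive, while BED intervals are 0-based and half-open. The sketch below illustrates that shift on a single feature line. It is not gff2bed's implementation; the helper name gff3_line_to_bed, its tag parameter, and the example line are hypothetical and exist only for illustration.

```python
from urllib.parse import unquote


def gff3_line_to_bed(line, tag="ID"):
    """Convert one GFF3 feature line to a BED6-style tuple (illustrative helper)."""
    seqid, _source, _type, start, end, score, strand, _phase, attributes = (
        line.rstrip("\n").split("\t")
    )
    # GFF3 column 9 is a semicolon-separated list of key=value pairs
    attrs = dict(
        pair.split("=", 1) for pair in attributes.split(";") if "=" in pair
    )
    # Attribute values may be percent-encoded in GFF3
    name = unquote(attrs.get(tag, "."))
    # GFF3 starts are 1-based and inclusive; BED starts are 0-based and the
    # end is exclusive, so only the start coordinate shifts by one.
    return (seqid, int(start) - 1, int(end), name, score, strand)


# Hypothetical feature line, not taken from the example data file
example = "Chr1\tsrc\tgene\t3631\t5899\t.\t+\t.\tID=geneA;Name=geneA"
print(gff3_line_to_bed(example))
```

Because only the start changes, the feature length is simply end minus start in BED coordinates, which is one reason BED is convenient for interval arithmetic.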

Installation

Install gff2bed with pip:

pip install gff2bed

Example

import urllib3
import shutil
import pandas as pd
import pybedtools
import gff2bed

GFF3_URL = 'https://gitlab.com/salk-tm/gff2bed/-/raw/main/test/data/ColCEN_AT1G01010-20_TAIR10.gff3.gz'

# Download the example GFF3 file
http = urllib3.PoolManager()
with http.request('GET', GFF3_URL, preload_content=False) as r, open('ColCEN_AT1G01010-20_TAIR10.gff3.gz', 'wb') as dest_file:
    shutil.copyfileobj(r, dest_file)

# Parse the GFF3 data into a Pandas data frame
genes_df = pd.DataFrame(gff2bed.parse('ColCEN_AT1G01010-20_TAIR10.gff3.gz'))
genes_df.head()

# Convert the parsed GFF3 data to BED format, load it into a pybedtools BedTool,
# and save it to a BED file
genes_bt = pybedtools.BedTool(
    gff2bed.convert(gff2bed.parse('ColCEN_AT1G01010-20_TAIR10.gff3.gz'))
).saveas('ColCEN_AT1G01010-20_TAIR10.bed')
genes_bt.head()
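As a small illustration of the genome arithmetic that BED coordinates simplify, the sketch below computes the overlap between two BED-style intervals in plain Python. It does not use bedtools or pybedtools; the function name overlap_length and the example intervals are hypothetical.

```python
def overlap_length(a, b):
    """Return the number of bases shared by two BED-style intervals.

    Each interval is a tuple whose first three fields are
    (chrom, start, end), with a 0-based start and an exclusive end.
    """
    chrom_a, start_a, end_a = a[:3]
    chrom_b, start_b, end_b = b[:3]
    if chrom_a != chrom_b:
        return 0
    # With half-open intervals no +1/-1 correction is needed
    return max(0, min(end_a, end_b) - max(start_a, start_b))


# Hypothetical intervals, not taken from the example data file
gene = ("Chr1", 3630, 5899, "geneA", ".", "+")
window = ("Chr1", 5000, 6000)
print(overlap_length(gene, window))
```

In practice, a tool such as bedtools handles this at scale across whole files; the point here is only that half-open coordinates make the arithmetic a plain min/max subtraction.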

API

Download files

Source Distribution

gff2bed-0.2.0.tar.gz (6.9 kB)

Uploaded Source

Built Distribution

gff2bed-0.2.0-py3-none-any.whl (4.2 kB)

Uploaded Python 3
