
Wrapper around BEDTools for bioinformatics work

Project description

Overview


The BEDTools suite of programs is widely used for genomic interval manipulation or “genome algebra”. pybedtools wraps and extends BEDTools and offers feature-level manipulations from within Python.
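Most "genome algebra" operations come down to comparing intervals. The pure-Python sketch below is for intuition only. It does not use pybedtools, and `Interval` and `without_overlaps` are hypothetical names. It shows the overlap test on BED-style coordinates (0-based start, half-open end) and the "keep features of A that overlap nothing in B" operation that `intersectBed -v` performs. The nested loop is quadratic, so it is not suitable for genome-scale data, which is what BEDTools is for.

```python
from dataclasses import dataclass
from typing import Iterable, List


@dataclass(frozen=True)
class Interval:
    """Hypothetical BED-style interval: 0-based start, half-open end."""
    chrom: str
    start: int
    end: int
    name: str = "."

    def overlaps(self, other: "Interval") -> bool:
        # Half-open intervals overlap when each one starts before the other ends.
        return (
            self.chrom == other.chrom
            and self.start < other.end
            and other.start < self.end
        )


def without_overlaps(a: Iterable[Interval], b: Iterable[Interval]) -> List[Interval]:
    """Keep features of `a` that overlap no feature of `b` (like `intersectBed -v`)."""
    b_list = list(b)
    return [x for x in a if not any(x.overlaps(y) for y in b_list)]


# Made-up example data.
genes = [Interval("chr1", 1000, 5000, "geneA"), Interval("chr2", 200, 900, "geneB")]
snps = [
    Interval("chr1", 1500, 1501, "rs1"),  # inside geneA
    Interval("chr1", 7000, 7001, "rs2"),  # intergenic
    Interval("chr2", 100, 101, "rs3"),    # intergenic
]

intergenic = without_overlaps(snps, genes)
print([s.name for s in intergenic])  # ['rs2', 'rs3']
```

Because the end coordinate is exclusive, book-ended intervals such as [0, 10) and [10, 20) do not count as overlapping.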

See full online documentation, including installation instructions, at http://daler.github.io/pybedtools/.

Why pybedtools?

Here is an example that gets the names of genes less than 5 kb away from intergenic SNPs:

from pybedtools import BedTool

snps = BedTool('snps.bed.gz')  # [1]
genes = BedTool('hg19.gff')    # [1]

intergenic_snps = snps.subtract(genes)                       # [2]
nearby = genes.closest(intergenic_snps, d=True, stream=True) # [2, 3]

for gene in nearby:                  # [4]
    if 0 <= int(gene[-1]) < 5000:    # [4] a distance of -1 means no SNP on that chromosome
        print(gene.name)             # [4]

Useful features shown here include:

  • [1] support for all BEDTools-supported formats (here, gzipped BED and GFF);

  • [2] wrapping of all BEDTools programs and arguments (here, subtract and closest, with the -d flag passed to closest);

  • [3] streaming results, like Unix pipes (here, enabled with stream=True);

  • [4] iterating over results while accessing feature data by index or by attribute (here, [-1] and .name).
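Passing d=True (the -d flag) makes closest append the distance as an extra final column, which is why the loop reads gene[-1]. The pure-Python sketch below mimics that filtering step on plain tab-separated lines, processing rows lazily with a generator in the spirit of stream=True. It assumes BEDTools reports a distance of -1 when no feature exists on the same chromosome, so only non-negative distances are kept. `genes_near` is a hypothetical helper, and the rows are made up, using short BED-like gene fields instead of the nine GFF columns.

```python
from typing import Iterable, Iterator, List


def genes_near(rows: Iterable[str], max_distance: int) -> Iterator[List[str]]:
    """Yield rows of `closest -d` output whose distance is below `max_distance`.

    Assumes each row is tab-separated with the distance as the last field,
    and that -1 marks "no feature found on this chromosome".
    """
    for row in rows:
        fields = row.rstrip("\n").split("\t")
        distance = int(fields[-1])  # same column as gene[-1] in the pybedtools example
        if 0 <= distance < max_distance:
            yield fields


# Hypothetical rows: gene fields, then SNP fields, then the distance.
rows = [
    "chr1\t1000\t5000\tgeneA\tchr1\t7000\t7001\trs2\t2000",
    "chr1\t9000\t9500\tgeneC\tchr1\t20000\t20001\trs9\t10500",
    "chrY\t100\t900\tgeneY\t.\t-1\t-1\t.\t-1",
]

for fields in genes_near(rows, 5000):
    print(fields[3])  # prints geneA
```

Without the lower bound, a gene on a chromosome with no intergenic SNPs (geneY here) would slip through the "< 5000" test.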

In contrast, here is the same analysis using shell scripting. Note that this requires knowledge of Perl, bash, and awk. The run time is identical to the pybedtools version above:

snps=snps.bed.gz
genes=hg19.gff
intergenic_snps=/tmp/intergenic_snps

snp_fields=`zcat $snps | awk '(NR == 2){print NF; exit;}'`
gene_fields=9
distance_field=$(($gene_fields + $snp_fields + 1))

intersectBed -a $snps -b $genes -v > $intergenic_snps

closestBed -a $genes -b $intergenic_snps -d \
| awk -v d="$distance_field" '($d >= 0 && $d < 5000){print $9;}' \
| perl -ne 'print "$1\n" if m/(?:ID|Name|gene_id)=([^;\n]+)/'

rm $intergenic_snps
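The two fiddliest steps in the shell version are working out which column holds the distance and pulling a gene name out of GFF column 9. The sketch below restates both in plain Python only to make the logic explicit. `distance_column` and `gene_label` are hypothetical helpers, not part of pybedtools, and the regular expression assumes GFF3-style key=value; attributes.

```python
import re
from typing import Optional

# Matches the first ID=, Name= or gene_id= key in a GFF3 attribute string.
ATTR_RE = re.compile(r"(?:ID|Name|gene_id)=([^;]+)")

GENE_FIELDS = 9  # a GFF line has nine columns


def distance_column(snp_fields: int) -> int:
    """1-based (awk-style) column holding the distance in `closestBed -d` output."""
    return GENE_FIELDS + snp_fields + 1


def gene_label(attributes: str) -> Optional[str]:
    """Return the value of the first ID/Name/gene_id attribute, or None."""
    match = ATTR_RE.search(attributes)
    return match.group(1) if match else None


print(distance_column(6))                     # 16 for a BED6 SNP file
print(gene_label("ID=gene00001;Name=EDEN"))   # gene00001
```

In Python's 0-based indexing the distance column is distance_column(n) - 1, or simply the last field, which is what pybedtools' gene[-1] exploits.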

See the shell script comparison in the docs for more details, or keep reading the full documentation at http://daler.github.io/pybedtools.


Download files


Source Distribution

pybedtools-0.10.0.tar.gz (12.6 MB)

File details

Details for the file pybedtools-0.10.0.tar.gz.

File metadata

  • Download URL: pybedtools-0.10.0.tar.gz
  • Size: 12.6 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.12.2

File hashes

Hashes for pybedtools-0.10.0.tar.gz
Algorithm    Hash digest
SHA256       1a6fbaad23b013becc741d7d5922a2df03e391bc44ff92772ffb7dd456711161
MD5          f9718c9ac32bc8c1748e1f67c4950478
BLAKE2b-256  cc90cea4197772a029e925bd5d414108b5438d621dfbb1b0cc2627529d1ec524

