Fast Parser for Ensembl-formatted GTF Files to Pandas DataFrames

Project description

mbf_gtf

Possibly the fastest Ensembl GTF parser around (reads the 1 GB human GTF in about 10 s on my system).

Usage: mbf_gtf.parse_ensembl_gtf("filename.gtf", []) -> A dict of DataFrames

The file may be compressed with gzip, in which case its name must end with ".gz".
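Since the ".gz" suffix is a requirement for compressed input, a small pre-flight check can catch a misnamed file before parsing. The following is only a sketch: `check_gtf_name` is a hypothetical helper, not part of mbf_gtf, and it relies solely on the standard gzip magic bytes.

```python
from pathlib import Path

GZIP_MAGIC = b"\x1f\x8b"  # first two bytes of every gzip stream


def check_gtf_name(path):
    """Raise ValueError if gzip content and a '.gz' suffix disagree.

    Hypothetical helper for illustration; not provided by mbf_gtf.
    """
    path = Path(path)
    with path.open("rb") as handle:
        is_gzip = handle.read(2) == GZIP_MAGIC
    has_gz_suffix = path.name.endswith(".gz")
    if is_gzip and not has_gz_suffix:
        raise ValueError(f"{path} is gzip-compressed but does not end with '.gz'")
    if has_gz_suffix and not is_gzip:
        raise ValueError(f"{path} ends with '.gz' but is not gzip-compressed")
    return path
```

The returned path can then be converted with str() and passed on to the parser.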

The second parameter may be a list of 'features' to retrieve. Requesting just a subset can greatly improve performance.
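A minimal sketch of requesting a feature subset might look like the following. `load_features` and `row_counts` are hypothetical wrappers written for this example. It assumes that "gene" and "transcript" (standard Ensembl feature types) are accepted as feature names and that the returned dict is keyed by feature name; neither is confirmed by the description above.

```python
def load_features(gtf_path, features=("gene", "transcript")):
    """Parse an Ensembl GTF and return the dict of DataFrames.

    Hypothetical wrapper; the feature names are an assumption.
    """
    # Imported lazily so this module can be loaded without mbf_gtf installed.
    import mbf_gtf

    return mbf_gtf.parse_ensembl_gtf(str(gtf_path), list(features))


def row_counts(frames):
    """Map each key of the returned dict to the number of rows in its value."""
    return {name: len(frame) for name, frame in frames.items()}
```

With mbf_gtf installed, `row_counts(load_features("filename.gtf"))` would give a quick overview of what was parsed.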

Note that this is very Ensembl-specific: it does not deal with any other GTF format, and it throws away attributes that are repeated on the sub-elements, i.e. exons have only gene_id, not gene_name, gene_version, gene_....

The resulting coordinates are Pythonic, i.e. they start at 0 (shifted by -1 from the values in the GTF).
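To illustrate what the shift means in practice, here is a small sketch of the usual conversion from 1-based inclusive GTF coordinates to a 0-based interval suitable for Python slicing. `gtf_to_zero_based` is a hypothetical function, not part of mbf_gtf. The description only states the -1 shift; treating the end as half-open (numerically unchanged) is an assumption of this sketch and should be checked against the parser's actual output.

```python
def gtf_to_zero_based(start, end):
    """Convert 1-based inclusive GTF coordinates to a 0-based half-open interval."""
    if start < 1 or end < start:
        raise ValueError("GTF coordinates must satisfy 1 <= start <= end")
    # The start shifts by -1, as the description states. Leaving the end
    # numerically unchanged (half-open) is an ASSUMPTION of this sketch.
    return start - 1, end


sequence = "ACGTACGTAC"
start0, end0 = gtf_to_zero_based(3, 6)  # GTF bases 3..6
feature = sequence[start0:end0]  # plain Python slicing now selects those bases
```

Under this convention the length of a feature is simply `end0 - start0`.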

This is part of the mbf_* family of bioinformatics libraries.

Download files

Source Distribution

mbf_gtf-0.6.0.tar.gz (10.3 kB view details)

Uploaded Source

Built Distribution

mbf_gtf-0.6.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (334.6 kB view details)

Uploaded CPython 3.8+manylinux: glibc 2.17+ x86-64

File details

Details for the file mbf_gtf-0.6.0.tar.gz.

File metadata

  • Download URL: mbf_gtf-0.6.0.tar.gz
  • Size: 10.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: maturin/1.8.3

File hashes

Hashes for mbf_gtf-0.6.0.tar.gz
Algorithm Hash digest
SHA256 6a8a08cf41583edd8db8dbc99d5113f485ca5b084f33915582099adb0648567e
MD5 a12669c480b835c4fe1054f9aa6b883b
BLAKE2b-256 31884d42232c766dbbd2e1f3d4a9915cf5060eca953d78b88ff04420923fba72

File details

Details for the file mbf_gtf-0.6.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl.

File hashes

Hashes for mbf_gtf-0.6.0-cp38-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
Algorithm Hash digest
SHA256 46af31b57fdc533371cf3bed14b36c8b11492b264212559feafedb25f30d13f0
MD5 ef2e599879133c96b613b825d819d9da
BLAKE2b-256 46c48d370fc769467205679b7375eac9257dc4432e0fe122fcfb0f53baf08b74
