
Parse GTF files with Polars

gtf-polars

Implements a memory-efficient GTF parser that stays fully lazy until .collect() is called. For more information, see the Polars Lazy API documentation.

Scripts

scripts/subset_gtf_feature.py

Filters a GTF file to keep only rows matching one or more feature types (e.g. gene, transcript, exon) and writes the result as a tab-separated GTF file.

python scripts/subset_gtf_feature.py gencode.v39.annotation.gtf \
    --feature gene transcript \
    --output subset.gtf
| Argument | Description |
| --- | --- |
| gtf_file | Path to the input GTF or gzipped GTF file |
| --feature | One or more feature types to keep |
| --output | Output path (default: subset.gtf) |

scripts/transcript_to_gene.py

Builds a transcript-to-gene mapping CSV from a GTF file by extracting transcript_id, gene_id, and gene_name from transcript rows. This is useful for downstream tools (e.g. alevin, tximport) that require a tx2gene table.

python scripts/transcript_to_gene.py isoseq.gtf --output transcript_to_gene.csv
| Argument | Description |
| --- | --- |
| gtf_file | Path to the input GTF file |
| --output | Output CSV path (default: transcript_to_gene.csv) |

Library usage

from gtf_polars import parse_gtf
import polars as pl

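# Returns a lazy query; the file is not read yet
lf = parse_gtf(
    "gencode.v39.annotation.sorted.gtf",
    attributes_to_extract=["gene_id", "gene_name"],
)

# Keep transcript rows and a few columns; .collect() executes the query
df = (
    lf.filter(pl.col("feature") == "transcript")
    .select(["seqname", "start", "end", "gene_id", "gene_name"])
    .collect()
)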

df.head()
shape: (5, 5)
┌─────────┬───────┬───────┬───────────────────┬─────────────┐
│ seqname ┆ start ┆ end   ┆ gene_id           ┆ gene_name   │
│ ---     ┆ ---   ┆ ---   ┆ ---               ┆ ---         │
│ str     ┆ i64   ┆ i64   ┆ str               ┆ str         │
╞═════════╪═══════╪═══════╪═══════════════════╪═════════════╡
│ chr1    ┆ 11869 ┆ 14409 ┆ ENSG00000223972.5 ┆ DDX11L1     │
│ chr1    ┆ 12010 ┆ 13670 ┆ ENSG00000223972.5 ┆ DDX11L1     │
│ chr1    ┆ 14404 ┆ 29570 ┆ ENSG00000227232.5 ┆ WASH7P      │
│ chr1    ┆ 17369 ┆ 17436 ┆ ENSG00000278267.1 ┆ MIR6859-1   │
│ chr1    ┆ 29554 ┆ 31097 ┆ ENSG00000243485.5 ┆ MIR1302-2HG │
└─────────┴───────┴───────┴───────────────────┴─────────────┘
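The `attributes_to_extract` argument selects keys from the ninth GTF column, which packs semicolon-separated key/value pairs such as `gene_id "ENSG00000223972.5";`. How gtf_polars parses that column is not described here; the pure-Python sketch below (hypothetical helper `parse_attributes`) only illustrates the format. It assumes no value contains a semicolon, and it handles GENCODE-style unquoted values such as `level 2;` as well as keys that repeat, such as `tag`.

```python
def parse_attributes(attribute: str) -> dict[str, list[str]]:
    """Hypothetical helper: parse one GTF attribute string into key -> values.

    Values are kept in lists because keys such as GENCODE's `tag` may repeat.
    Assumes no value contains a semicolon.
    """
    parsed: dict[str, list[str]] = {}
    for field in attribute.split(";"):
        field = field.strip()
        if not field:
            continue  # the trailing ";" leaves an empty piece
        key, _, value = field.partition(" ")
        # Quoted values ("DDX11L1") lose their quotes; unquoted ones (2) pass through.
        parsed.setdefault(key, []).append(value.strip().strip('"'))
    return parsed
```

For example, `parse_attributes('gene_id "ENSG00000223972.5"; level 2;')` returns `{"gene_id": ["ENSG00000223972.5"], "level": ["2"]}`.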

Download files

Source Distribution

gtf_polars-0.1.1.tar.gz (5.4 kB)

Uploaded Source

Built Distribution

gtf_polars-0.1.1-py3-none-any.whl (4.6 kB)

Uploaded Python 3

File details

Details for the file gtf_polars-0.1.1.tar.gz.

File metadata

  • Download URL: gtf_polars-0.1.1.tar.gz
  • Upload date:
  • Size: 5.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gtf_polars-0.1.1.tar.gz
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 67e15a2b1e2b891a74326724dca373d2131a7e3dbbd03e07ef88bac89d9afa37 |
| MD5 | 5cf494c11bca4d71654701bb8a75130b |
| BLAKE2b-256 | c819d42496215d9b17b9aad02dcb3c200aa321a6febdf5183481ceb24b417ed2 |
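The digests listed for each file let you check a download before installing it. A minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `verify` are hypothetical, and the expected value is the SHA256 published for gtf_polars-0.1.1.tar.gz.

```python
import hashlib

# SHA256 digest published for gtf_polars-0.1.1.tar.gz
EXPECTED_SDIST_SHA256 = "67e15a2b1e2b891a74326724dca373d2131a7e3dbbd03e07ef88bac89d9afa37"


def sha256_of(path: str, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA256 digest of a file, reading it in fixed-size chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns b"" at end of file.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected_hex: str) -> bool:
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_hex.lower()
```

Usage: `verify("gtf_polars-0.1.1.tar.gz", EXPECTED_SDIST_SHA256)`. Reading in chunks keeps memory use constant regardless of file size. pip can enforce the same kind of check at install time through its hash-checking mode (`--require-hashes`).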

Provenance

The following attestation bundles were made for gtf_polars-0.1.1.tar.gz:

Publisher: pypi-publish.yml on indapa/gtf-polars

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gtf_polars-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: gtf_polars-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 4.6 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for gtf_polars-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
| --- | --- |
| SHA256 | 74af1e6a7045eb4552443e3ffe0fd662ae26a4170aa5ee9b4c9618e28a46ab71 |
| MD5 | 46940e6af814a8a5764f5e5e3c6381c4 |
| BLAKE2b-256 | 322a52ceda196ed1315caa53580961003a3249f119aa8fdd78e6f544c54ad83c |

Provenance

The following attestation bundles were made for gtf_polars-0.1.1-py3-none-any.whl:

Publisher: pypi-publish.yml on indapa/gtf-polars

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.