Fast Cython-backed parsing for GTF attribute columns.
gtfreader
gtfreader is a small package for parsing GTF files and reading them into pandas DataFrames.
Install
python -m pip install -e .
Usage
from gtfreader import read_gtf, read_gtf_python
df = read_gtf("annotation.gtf")
df_python = read_gtf_python("annotation.gtf")
read_gtf(...) uses the compiled parser path when it is available. read_gtf_python(...) uses the high-level pure-Python parser path, written in the style of the current pyranges reader.
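The "compiled when available" behaviour can be pictured as an import-time fallback. The sketch below shows one common way to write such a dispatch; it is not taken from gtfreader's source. The module name `_hypothetical_fast_gtf`, its `split_fields` attribute, and both helper functions are made up for illustration.

```python
import importlib


def split_fields_python(line):
    """Pure-Python fallback: split an attribute string on ';' (illustrative only)."""
    return [field.strip() for field in line.split(";") if field.strip()]


def select_parser(compiled_module_name):
    """Return (parser, backend), preferring the compiled module if it imports."""
    try:
        module = importlib.import_module(compiled_module_name)
    except ImportError:
        # No compiled extension on this platform: use the Python path.
        return split_fields_python, "python"
    return module.split_fields, "compiled"


# "_hypothetical_fast_gtf" is a made-up name, so the fallback is selected here.
parser, backend = select_parser("_hypothetical_fast_gtf")
fields = parser('gene_id "G1"; gene_name "ABC";')
```

With a dispatch like this, callers get the same function signature from either backend, which is why comparing the output of the two readers on the same file is a reasonable sanity check.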
If you want to use the compiled low-level parser directly, pass it raw attribute strings from column 9 of the GTF before they have been expanded:
import pandas as pd
from gtfreader import find_first_data_line_index, parse_chunk_columns
skiprows = find_first_data_line_index("annotation.gtf")
attribute_lines = pd.read_csv(
    "annotation.gtf",
    sep="\t",
    header=None,
    usecols=[8],
    names=["Attribute"],
    skiprows=skiprows,
)["Attribute"].tolist()
compiled_columns = parse_chunk_columns(attribute_lines)
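To make "expanded" concrete: each raw attribute string is a run of `key "value";` pairs, and expanding it means turning those pairs into one column per key. The following is a minimal pure-Python sketch of that idea, not the compiled parser itself. The function name `expand_attributes` and the dict-of-lists return shape are assumptions made for illustration; the actual return type of parse_chunk_columns is not described here.

```python
import re

# A key, whitespace, then either a double-quoted value or a bare token.
_ATTRIBUTE = re.compile(r'\s*([^\s";]+)\s+(?:"([^"]*)"|([^\s";]+))\s*;?')


def expand_attributes(lines):
    """Map raw attribute strings to {key: [value per line]}, None where a key is absent."""
    lines = list(lines)
    columns = {}
    for row, line in enumerate(lines):
        for match in _ATTRIBUTE.finditer(line):
            key = match.group(1)
            quoted, bare = match.group(2), match.group(3)
            value = quoted if quoted is not None else bare
            column = columns.setdefault(key, [None] * len(lines))
            # Assumption: keep the first value when a key repeats within a line.
            if column[row] is None:
                column[row] = value
    return columns


example_lines = [
    'gene_id "G1"; gene_name "ABC";',
    'gene_id "G2"; exon_number 3;',
]
columns = expand_attributes(example_lines)
```

A dict shaped like this can be passed straight to `pd.DataFrame(columns)`. All values stay strings in this sketch; converting fields such as exon_number to integers would be a separate step.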
Build
python -m build
Test
python -m pytest -q
Download files
Source Distribution
gtfreader-0.1.0.tar.gz (111.7 kB)
File details
Details for the file gtfreader-0.1.0.tar.gz.
File metadata
- Download URL: gtfreader-0.1.0.tar.gz
- Upload date:
- Size: 111.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.2
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 701d2308fdc2d66384f4a3c7000284d8dbcd3efb2975100482b8eeade4c645d5 |
| MD5 | dc74dd91e84d53668cdba5cb61771eb6 |
| BLAKE2b-256 | b826010b61e71fd1f9773f89c0ac4092480c9d83c5aa288978eb497047d51535 |