A fast gtf/gff parser.
Project description
gxf is a fast GTF/GFF parser based on pandas.
GFF/GTF file format
The GFF (General Feature Format) format consists of one line per feature, each containing 9 columns of data, plus optional track definition lines. The following documentation is based on the Version 2 specifications.
The GTF (General Transfer Format) is identical to GFF version 2.
Fields must be tab-separated. Also, all but the final field in each feature line must contain a value; "empty" columns should be denoted with a '.'
- chr_id: name of the chromosome or scaffold; chromosome names can be given with or without the 'chr' prefix. Important note: the seqname must be one used within Ensembl, i.e. a standard chromosome name or an Ensembl identifier such as a scaffold ID, without any additional content such as species or assembly. See the example GFF output below.
- source: name of the program that generated this feature, or the data source (database or project name).
- type: feature type name, e.g. Gene, Variation, Similarity.
- start: start position* of the feature, with sequence numbering starting at 1.
- end: end position* of the feature, with sequence numbering starting at 1.
- score: a floating point value.
- strand: defined as + (forward) or - (reverse).
- phase: one of '0', '1' or '2'. '0' indicates that the first base of the feature is the first base of a codon, '1' that the second base is the first base of a codon, and so on.
- attributes: a semicolon-separated list of tag-value pairs, providing additional information about each feature.
* Both the start and end positions are included. For example, setting start-end to 1-2 describes two bases, the first and second base in the sequence.
Note that where the attributes contain identifiers that link the features together into a larger structure, these will be used by Ensembl to display the features as joined blocks.
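To make the nine-column layout concrete, here is a minimal sketch in plain Python, independent of gxf, that splits one feature line into the fields listed above. The names `parse_feature_line`, `parse_attributes` and `feature_length`, the sample line, and the handling of both `tag=value` and `tag "value"` attribute styles are assumptions for illustration only.

```python
FIELDS = ["chr_id", "source", "type", "start", "end",
          "score", "strand", "phase", "attributes"]


def parse_attributes(text):
    """Split a semicolon-separated attribute string into a dict."""
    result = {}
    for item in text.split(";"):
        item = item.strip()
        if not item:
            continue
        # Accept both `tag=value` and `tag "value"` styles (an assumption).
        if "=" in item:
            tag, value = item.split("=", 1)
        else:
            tag, _, value = item.partition(" ")
        result[tag.strip()] = value.strip().strip('"')
    return result


def parse_feature_line(line):
    """Parse one tab-separated feature line into a dict of the nine fields."""
    columns = line.rstrip("\n").split("\t")
    if len(columns) != len(FIELDS):
        raise ValueError(f"expected 9 columns, got {len(columns)}")
    record = dict(zip(FIELDS, columns))
    record["start"] = int(record["start"])
    record["end"] = int(record["end"])
    # A '.' marks an empty column.
    record["score"] = None if record["score"] == "." else float(record["score"])
    record["phase"] = None if record["phase"] == "." else int(record["phase"])
    record["attributes"] = parse_attributes(record["attributes"])
    return record


def feature_length(record):
    """Start and end are both included, so the length is end - start + 1."""
    return record["end"] - record["start"] + 1


# Hypothetical sample line: a two-base gene on chromosome 1.
line = "1\texample\tgene\t1\t2\t.\t+\t.\tID=gene1; Name=demo"
record = parse_feature_line(line)
print(record["type"], feature_length(record), record["attributes"])
```

Because both coordinates are inclusive, a feature spanning 1-2 has length 2, which is why `feature_length` adds one to the difference.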
Usage
Query all lines whose type is 'gene'
from gxf import GXF
filename = 'test.gff'
gff = GXF(filename)
gff.filter(type='gene')
Multi-condition query
from gxf import GXF
filename = 'test.gff'
gff = GXF(filename)
gff.filter(type='gene', strand=1)
You can query not only equality, but also inequality.
The query name is field_name + __ + oper, where oper is one of ge, le, eq, ne, gt, lt.
Query features with start >= 200
from gxf import GXF
filename = 'test.gff'
gff = GXF(filename)
gff.filter(start__ge=200)
Query features with end < 100
from gxf import GXF
filename = 'test.gff'
gff = GXF(filename)
gff.filter(end__lt=100)
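The field_name + __ + oper naming rule can be illustrated with a small sketch. This is not gxf's actual implementation, only one way such keyword names could be resolved using Python's standard `operator` module. `split_query` and `matches` are hypothetical helpers, and rows are modeled as plain dicts rather than pandas rows.

```python
import operator

# The six operator suffixes, mapped to the standard comparison functions.
OPERATORS = {
    "ge": operator.ge,
    "le": operator.le,
    "eq": operator.eq,
    "ne": operator.ne,
    "gt": operator.gt,
    "lt": operator.lt,
}


def split_query(name):
    """Split 'start__ge' into ('start', operator.ge); bare names mean equality."""
    field, sep, oper = name.rpartition("__")
    if sep and oper in OPERATORS:
        return field, OPERATORS[oper]
    return name, operator.eq


def matches(row, **conditions):
    """Return True if the row satisfies every condition."""
    for name, expected in conditions.items():
        field, compare = split_query(name)
        if not compare(row[field], expected):
            return False
    return True


# Hypothetical rows standing in for parsed features.
rows = [
    {"type": "gene", "start": 50, "end": 90},
    {"type": "gene", "start": 250, "end": 400},
    {"type": "exon", "start": 300, "end": 350},
]
selected = [row for row in rows if matches(row, type="gene", start__ge=200)]
print(selected)
```

Treating a name without a recognized suffix as an equality test keeps plain queries such as type='gene' and suffixed ones such as start__ge=200 on the same code path.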
Preprocessing data
You can subclass GXF and override methods to preprocess or post-process field values.
The method name format is before/after + _handle_ + field_name, e.g. after_handle_attributes, and each method takes one argument.
from gxf import GXF
filename = 'test.gff'
class MyGXF(GXF):
    def before_handle_type(self, x):
        return x.lower()

    def after_handle_type(self, x):
        return x.upper()

gff = MyGXF(filename)
gff.filter(type='gene')
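For readers curious how such hooks might be wired up, the following is a minimal sketch of the pattern, not gxf's actual code. The `HookedParser` class, its `handle` and `convert` methods, and the default integer conversion are hypothetical; only the before_handle_/after_handle_ naming comes from the description above.

```python
class HookedParser:
    """Hypothetical base class that looks up per-field hooks by name."""

    def convert(self, field_name, value):
        # Stand-in for a parser's own conversion of a raw value.
        return int(value) if field_name in ("start", "end") else value

    def handle(self, field_name, value):
        # Run before_handle_<field> on the raw value, if the subclass defines it.
        before = getattr(self, f"before_handle_{field_name}", None)
        if before is not None:
            value = before(value)
        value = self.convert(field_name, value)
        # Run after_handle_<field> on the converted value, if the subclass defines it.
        after = getattr(self, f"after_handle_{field_name}", None)
        if after is not None:
            value = after(value)
        return value


class UpperTypeParser(HookedParser):
    def before_handle_type(self, x):
        return x.strip()

    def after_handle_type(self, x):
        return x.upper()


parser = UpperTypeParser()
print(parser.handle("type", " gene "))  # both hooks run for the type field
print(parser.handle("start", "200"))    # no hooks defined, only the conversion runs
```

Looking hooks up with `getattr` means a subclass only has to define the methods it needs; fields without hooks fall through to the default conversion.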