Skip to main content

Quickly parse genes and exons from a RefGene file

Project description

<h1 align="center">RefGene-Parser</h1>

<h3 align="center">Installation</h3>

Clone the respository and then install locally:

```bash
$ git clone https://github.com/clintval/refgene-parser.git
$ pip install refgene-parser
```

<h3 align="center">Dependencies</h3>

- None

<h3 align="center">Tutorial</h3>

Iterate over the records in a RefGene file:

```python

from refgene_parser import RefGene

refgene = RefGene('mm10.refGene.txt.gz')

for i, gene in enumerate(refgene):
if i >= 10: break
print(gene)
```

```python
Gene("chr2", 11705292, 11733985, "+", name="Il15ra", id="NM_001271498")
Gene("chr7", 142434976, 142440396, "+", name="Syt8", id="NM_001285857")
Gene("chr1", 78424744, 78488897, "-", name="Farsb", id="NM_011811")
Gene("chr11", 62574485, 62600305, "+", name="Trpv2", id="NM_011706")
Gene("chr12", 100199434, 100209824, "+", name="Calm1", id="NM_009790")
Gene("chr5", 30933142, 30945480, "-", name="Cgref1", id="NM_026770")
Gene("chr4", 142084297, 142088101, "+", name="Tmem51os1", id="NR_027137")
Gene("chr10", 77257772, 77259223, "-", name="Gm10941", id="NR_026944")
Gene("chr10", 77706586, 77706986, "+", name="Gm10272", id="NR_026831")
Gene("chr7", 100549116, 100607996, "-", name="Mrpl48", id="NR_003559")
Gene("chr7", 7212995, 7278289, "-", name="Vmn2r29", id="NR_003555")
```


<h5 align="center">Exact match for a gene symbol name</h5>

> Will return the first record matching with `name == Kras`

```python
Kras = refgene.gene_by_name('Kras')
print(Kras)
print(Kras.sam_interval)
print(Kras.num_exons)
```

```python
Gene("chr6", 145216698, 145250231, "-", name="Kras", id="NM_021284")
'chr6:145216698-145250231'
5
```


<h5 align="center">Exact match for a gene ID</h5>

> Will return the first record matching `id == NM_009085`

```python
NM_009085 = refgene.gene_by_id('NM_009085')
```

```python
Gene("chr17", 46243919, 46248045, "-", name="Polr1c", id="NM_009085")
```


<h5 align="center">Fuzzy match for a gene symbol name</h5>

```python
list(refgene.genes_by_name_pattern('^.ras$'))
```

```python
[Gene("chr3", 103058284, 103067914, "+", name="Nras", id="NM_010937"),
Gene("chr7", 45018006, 45021644, "+", name="Rras", id="NM_009101"),
Gene("chrX", 7924275, 7928607, "-", name="Eras", id="NM_181548"),
Gene("chr7", 141189933, 141194004, "-", name="Hras", id="NM_008284"),
Gene("chr6", 145216698, 145250231, "-", name="Kras", id="NM_021284"),
Gene("chr9", 99385419, 99436712, "-", name="Mras", id="NM_008624"),
Gene("chr7", 141189933, 141194004, "-", name="Hras", id="NM_001130444"),
Gene("chr7", 141189933, 141194004, "-", name="Hras", id="NM_001130443")]
```


<h5 align="center">Fuzzy match for a gene ID</h5>

```python
list(refgene.genes_by_id_pattern('NR_00355*'))
```

```python

[Gene("chr7", 100549116, 100607996, "-", name="Mrpl48", id="NR_003559"),
Gene("chr7", 7212995, 7278289, "-", name="Vmn2r29", id="NR_003555"),
Gene("chr15", 82636749, 82642045, "-", name="Cyp2d13", id="NR_003552"),
Gene("chr3", 92373914, 92375229, "+", name="Sprr2g", id="NR_003548"),
Gene("chr13", 104173720, 104178466, "-", name="Trappc13", id="NR_003546"),
Gene("chr5", 30950065, 30955095, "+", name="Abhd1", id="NR_003522"),
Gene("chr17", 3064317, 3084183, "-", name="Pisd-ps2", id="NR_003519"),
Gene("chrUn_JH584304", 52673, 59689, "-", name="Pisd-ps3", id="NR_003518"),
Gene("chr11", 3124020, 3131944, "+", name="Pisd-ps1", id="NR_003517"),
Gene("chr16", 97536080, 97560901, "+", name="Mx2", id="NR_003508"),
Gene("chr5", 120812634, 120824160, "+", name="Oas1b", id="NR_003507"),
Gene("chr5", 10865028, 10870808, "+", name="Gm6455", id="NR_003596"),
Gene("chr19", 5842295, 5845480, "-", name="Neat1", id="NR_003513"),
Gene("chr15", 62217540, 62219451, "-", name="H2afy3", id="NR_003523"),
Gene("chrX_GL456233_random", 268798, 270075, "+", name="Zf12", id="NR_003547"),
Gene("chr16", 97447034, 97462906, "-", name="Mx1", id="NR_003520"),
Gene("chr11", 88964665, 88966917, "-", name="Gm15698", id="NR_003564"),
Gene("chr13", 12614064, 12650395, "-", name="Gpr137b-ps", id="NR_003568")]
```

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

refgene_parser-0.0.1.tar.gz (4.5 kB view details)

Uploaded Source

File details

Details for the file refgene_parser-0.0.1.tar.gz.

File metadata

File hashes

Hashes for refgene_parser-0.0.1.tar.gz
Algorithm Hash digest
SHA256 1ef7c7fa146b04d9a54403df5a93f1b0499bab673b6ff888c2aa1a4ed3b69b8b
MD5 8406926c82a5527ff15cd0af92caf1f5
BLAKE2b-256 1c9107a366e7eca7aedd160a456c6b12a26a987d223700191da062c39c3cceb8

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page