Pandas ExtensionDtypes and ExtensionArray for working with genomics data
Quickstart
Variant objects hold information about a particular variant:
from pandas_genomics.scalars import Variant
variant = Variant('12', 112161652, id='rs12462', ref='A', alt=['C', 'T'])
print(variant)
rs12462[chr=12;pos=112161652;ref=A;alt=C,T]
Each variant should have a unique ID, and a random ID is generated if one is not specified.
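The fallback to a random ID can be pictured with a small standard-library sketch. The `SketchVariant` class below is a hypothetical stand-in, not the pandas_genomics `Variant` class, and the use of a UUID for the random ID is an assumption; the library may generate its IDs differently.

```python
import uuid
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class SketchVariant:
    """Hypothetical stand-in for a variant record (not the library class)."""
    chromosome: str
    position: int
    id: Optional[str] = None
    ref: str = "N"
    alt: List[str] = field(default_factory=list)

    def __post_init__(self):
        # Assumption: use a random UUID when the caller gives no ID
        if self.id is None:
            self.id = str(uuid.uuid4())

    def __str__(self):
        # Mirrors the printed format shown in the Quickstart output
        alts = ",".join(self.alt)
        return f"{self.id}[chr={self.chromosome};pos={self.position};ref={self.ref};alt={alts}]"


named = SketchVariant("12", 112161652, id="rs12462", ref="A", alt=["C", "T"])
anonymous = SketchVariant("12", 112161652, ref="A", alt=["C", "T"])

print(named)         # explicit ID is kept
print(anonymous.id)  # random ID, different on every run
```

Two variants created without an ID end up with different IDs, which is what keeps them distinguishable later on.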
Genotype objects are associated with a particular Variant:
gt = variant.make_genotype("A", "C")
print(gt)
A/C
The GenotypeArray stores genotypes with an associated variant and has useful methods and properties:
from pandas_genomics.scalars import Variant
from pandas_genomics.arrays import GenotypeArray
variant = Variant('12', 112161652, id='rs12462', ref='A', alt=['C'])
gt_array = GenotypeArray([variant.make_genotype_from_str(s) for s in ["C/C", "A/C", "A/A"]])
print(gt_array)
<GenotypeArray>
[Genotype(variant=rs12462[chr=12;pos=112161652;ref=A;alt=C], allele1=1, allele2=1),
Genotype(variant=rs12462[chr=12;pos=112161652;ref=A;alt=C], allele1=0, allele2=1),
Genotype(variant=rs12462[chr=12;pos=112161652;ref=A;alt=C], allele1=0, allele2=0)]
Length: 3, dtype: genotype[12; 112161652; rs12462; A; C]
print(gt_array.astype(str))
['C/C' 'A/C' 'A/A']
print(gt_array.encode_dominant())
<IntegerArray>
[1.0, 1.0, 0.0]
Length: 3, dtype: float
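The output above shows each genotype stored as two allele indices (0 for the reference allele, 1 for the first alternate allele) and a dominant encoding of 1 whenever an alternate allele is present. The sketch below reproduces that logic in plain Python for a biallelic variant. The helper names `allele_indices` and `dominant` are hypothetical and are not part of pandas_genomics; this illustrates the idea rather than the library's implementation.

```python
from typing import List, Tuple


def allele_indices(genotype: str, ref: str, alt: List[str]) -> Tuple[int, int]:
    """Map a genotype string such as "A/C" to allele indices (0 = ref)."""
    alleles = [ref] + alt  # index 0 is the reference allele
    allele1, allele2 = genotype.split("/")
    return alleles.index(allele1), alleles.index(allele2)


def dominant(indices: Tuple[int, int]) -> int:
    """Return 1 if either allele is an alternate allele, otherwise 0."""
    return 1 if any(i > 0 for i in indices) else 0


genotypes = ["C/C", "A/C", "A/A"]
indexed = [allele_indices(g, ref="A", alt=["C"]) for g in genotypes]

print(indexed)                         # [(1, 1), (0, 1), (0, 0)]
print([dominant(i) for i in indexed])  # [1, 1, 0]
```

The sketch returns plain integers, whereas the array method shown above returns a pandas array.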
There are also genomics accessors for Series and DataFrame:
import pandas as pd
print(pd.Series(gt_array).genomics.encode_codominant())
0 Hom
1 Het
2 Ref
Name: rs12462_C, dtype: category
Categories (3, object): ['Ref' < 'Het' < 'Hom']
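The codominant encoding above labels each genotype by how many alternate alleles it carries, with the categories ordered Ref < Het < Hom. A minimal plain-Python sketch of that mapping follows, assuming a biallelic variant and allele indices where 0 is the reference allele. The `codominant` helper and `CATEGORIES` list are hypothetical names, not the library's API.

```python
# Ordered from fewest to most alternate alleles, matching Ref < Het < Hom
CATEGORIES = ["Ref", "Het", "Hom"]


def codominant(allele1: int, allele2: int) -> str:
    """Label a genotype by its number of alternate alleles (biallelic case)."""
    alt_count = int(allele1 > 0) + int(allele2 > 0)
    return CATEGORIES[alt_count]


# Allele indices for C/C, A/C and A/A at a variant with ref=A, alt=C
indexed = [(1, 1), (0, 1), (0, 0)]
labels = [codominant(a1, a2) for a1, a2 in indexed]

print(labels)                                # ['Hom', 'Het', 'Ref']
print(sorted(labels, key=CATEGORIES.index))  # ['Ref', 'Het', 'Hom']
```

Sorting by position in `CATEGORIES` stands in for the ordering that an ordered categorical dtype provides.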
Download files
Source Distribution
pandas_genomics-0.12.1.tar.gz (34.2 kB)
Built Distribution
pandas_genomics-0.12.1-py3-none-any.whl (41.9 kB)
File details
Details for the file pandas_genomics-0.12.1.tar.gz.
File metadata
- Download URL: pandas_genomics-0.12.1.tar.gz
- Upload date:
- Size: 34.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.7.12 Darwin/21.3.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 5ae7c160edc15590039770e22bf439383f67ce829958898bb62afacb5711ce34
MD5 | 33d68e1fdd83f9c8c6f9ed98d9f9564b
BLAKE2b-256 | 72f2fead449c0609df1d016655823521bc20e0ca7679159c13431060cf08dd1b
File details
Details for the file pandas_genomics-0.12.1-py3-none-any.whl.
File metadata
- Download URL: pandas_genomics-0.12.1-py3-none-any.whl
- Upload date:
- Size: 41.9 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/1.2.2 CPython/3.7.12 Darwin/21.3.0
File hashes
Algorithm | Hash digest
---|---
SHA256 | 884b36694335b8d1b988abb2927cf74e0fca693713233bafca3a413f1ef8c550
MD5 | 436af9966fdd2e769cfd367b887a2fb8
BLAKE2b-256 | d9e56b96ca59876d82ce2d8b944f5905d6e7ab91d040820537cb58d146494fe7