Pandas ExtensionDtypes and ExtensionArray for working with genomics data

Quickstart

Variant objects hold information about a particular variant:

from pandas_genomics.scalars import Variant
variant = Variant('12', 112161652, id='rs12462', ref='A', alt=['C', 'T'])
print(variant)
rs12462[chr=12;pos=112161652;ref=A;alt=C,T]

Each variant should have a unique ID, and a random ID is generated if one is not specified.

Genotype objects are associated with a particular Variant:

gt = variant.make_genotype("A", "C")
print(gt)
A/C

The GenotypeArray stores genotypes with an associated variant and has useful methods and properties:

from pandas_genomics.scalars import Variant
from pandas_genomics.arrays import GenotypeArray
variant = Variant('12', 112161652, id='rs12462', ref='A', alt=['C'])
gt_array = GenotypeArray([variant.make_genotype_from_str(s) for s in ["C/C", "A/C", "A/A"]])
print(gt_array)
<GenotypeArray>
[Genotype(variant=rs12462[chr=12;pos=112161652;ref=A;alt=C], allele1=1, allele2=1),
Genotype(variant=rs12462[chr=12;pos=112161652;ref=A;alt=C], allele1=0, allele2=1),
Genotype(variant=rs12462[chr=12;pos=112161652;ref=A;alt=C], allele1=0, allele2=0)]
Length: 3, dtype: genotype[12; 112161652; rs12462; A; C]
print(gt_array.astype(str))
['C/C' 'A/C' 'A/A']
print(gt_array.encode_dominant())
<IntegerArray>
[1.0, 1.0, 0.0]
Length: 3, dtype: float
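
In the output above, allele1 and allele2 appear to be indices into the variant's allele list (0 for the reference allele, 1 for the first alternate), and the dominant encoding is 1 whenever at least one alternate allele is present. The following is a minimal pure-Python sketch of that logic under those assumptions. It is not the library's implementation, and `allele_indices` and `dominant` are hypothetical helper names introduced only for illustration.

```python
from typing import List, Tuple

# Illustrative only: these helpers are hypothetical, not part of pandas_genomics.
REF = "A"
ALTS = ["C"]
ALLELES: List[str] = [REF] + ALTS  # index 0 is the reference allele


def allele_indices(genotype: str) -> Tuple[int, int]:
    """Map a genotype string such as 'A/C' to a pair of allele indices."""
    first, second = genotype.split("/")
    return ALLELES.index(first), ALLELES.index(second)


def dominant(genotype: str) -> int:
    """Return 1 if either allele is an alternate allele, otherwise 0."""
    return int(any(idx > 0 for idx in allele_indices(genotype)))


genotypes = ["C/C", "A/C", "A/A"]
indices = [allele_indices(g) for g in genotypes]
encoded = [dominant(g) for g in genotypes]
print(indices)  # [(1, 1), (0, 1), (0, 0)]
print(encoded)  # [1, 1, 0]
```

The index pairs match the allele1/allele2 values printed for the GenotypeArray, and the encoded values match the dominant encoding shown above.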

There are also genomics accessors for Series and DataFrame:

import pandas as pd
print(pd.Series(gt_array).genomics.encode_codominant())
0    Hom
1    Het
2    Ref
Name: rs12462_C, dtype: category
Categories (3, object): ['Ref' < 'Het' < 'Hom']
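
The codominant result is an ordered categorical with Ref < Het < Hom. As a rough sketch of how such an encoding can be built with plain pandas, assuming a biallelic variant (one reference and one alternate allele), the example below counts alternate alleles and maps the count to a label. The helper `codominant_label` is hypothetical and is not the accessor's implementation.

```python
import pandas as pd

# Illustrative only: hypothetical helper, assumes a biallelic variant.
CATEGORIES = ["Ref", "Het", "Hom"]


def codominant_label(genotype: str, ref: str = "A") -> str:
    """Label a genotype by its number of alternate alleles (0, 1, or 2)."""
    n_alt = sum(allele != ref for allele in genotype.split("/"))
    return CATEGORIES[n_alt]


labels = [codominant_label(g) for g in ["C/C", "A/C", "A/A"]]

# An ordered Categorical records that Ref < Het < Hom.
encoded = pd.Series(
    pd.Categorical(labels, categories=CATEGORIES, ordered=True),
    name="rs12462_C",
)
print(encoded)
```

Because the categories are ordered, comparisons such as `encoded > "Ref"` select every genotype that carries at least one alternate allele.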
