Pandas ExtensionDtypes and ExtensionArray for working with genomics data

Quickstart

Variant objects hold information about a particular variant:

from pandas_genomics.scalars import Variant
variant = Variant('12', 112161652, id='rs12462', ref='A', alt=['C', 'T'])
print(variant)
rs12462[chr=12;pos=112161652;ref=A;alt=C,T]

Each variant should have a unique ID, and a random ID is generated if one is not specified.

Genotype objects are associated with a particular Variant:

gt = variant.make_genotype("A", "C")
print(gt)
A/C

The GenotypeArray stores genotypes with an associated variant and has useful methods and properties:

from pandas_genomics.scalars import Variant
from pandas_genomics.arrays import GenotypeArray
variant = Variant('12', 112161652, id='rs12462', ref='A', alt=['C'])
gt_array = GenotypeArray([variant.make_genotype_from_str(s) for s in ["C/C", "A/C", "A/A"]])
print(gt_array)
<GenotypeArray>
[Genotype(variant=rs12462[chr=12;pos=112161652;ref=A;alt=C], allele1=1, allele2=1),
Genotype(variant=rs12462[chr=12;pos=112161652;ref=A;alt=C], allele1=0, allele2=1),
Genotype(variant=rs12462[chr=12;pos=112161652;ref=A;alt=C], allele1=0, allele2=0)]
Length: 3, dtype: genotype[12; 112161652; rs12462; A; C]
print(gt_array.astype(str))
    ['C/C' 'A/C' 'A/A']
print(gt_array.encode_dominant())
    <IntegerArray>
    [1.0, 1.0, 0.0]
    Length: 3, dtype: float
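
The allele indexes and the dominant encoding printed above can be reproduced in plain Python to see what they mean. The following is a minimal sketch, not the library's implementation: the helper names allele_indexes and dominant_code are hypothetical, and it assumes that index 0 is the reference allele and that any non-reference allele counts toward the dominant code.

```python
# Illustrative only: plain-Python stand-ins, not pandas_genomics internals.

def allele_indexes(genotype, ref, alts):
    """Map a genotype string such as "A/C" to allele indexes (0 = ref, 1+ = alt)."""
    alleles = [ref] + list(alts)
    return tuple(alleles.index(allele) for allele in genotype.split("/"))


def dominant_code(indexes):
    """Return 1 if at least one allele is an alternate allele, else 0 (assumed rule)."""
    return int(any(i > 0 for i in indexes))


genotypes = ["C/C", "A/C", "A/A"]
indexes = [allele_indexes(g, ref="A", alts=["C"]) for g in genotypes]
codes = [dominant_code(ix) for ix in indexes]

print(indexes)  # [(1, 1), (0, 1), (0, 0)]
print(codes)    # [1, 1, 0]
```

The index pairs match the allele1 and allele2 values in the GenotypeArray repr, and the codes match the values returned by encode_dominant for the same three genotypes.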

There are also genomics accessors for Series and DataFrame:

import pandas as pd
print(pd.Series(gt_array).genomics.encode_codominant())
    0    Hom
    1    Het
    2    Ref
    Name: rs12462_C, dtype: category
    Categories (3, object): ['Ref' < 'Het' < 'Hom']
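
The codominant result is an ordered categorical, which is what makes comparisons between the labels meaningful. A minimal sketch of that idea using only pandas follows; the helper codominant_label is hypothetical and the labelling rule (count of non-reference alleles) is an assumption inferred from the output above, not taken from the library's source.

```python
import pandas as pd

# Illustrative only: rebuilds the printed result by hand, without pandas_genomics.

def codominant_label(allele1, allele2):
    """Label a genotype by how many of its two alleles are non-reference."""
    alt_count = int(allele1 > 0) + int(allele2 > 0)
    return ("Ref", "Het", "Hom")[alt_count]


# Allele index pairs for C/C, A/C and A/A when ref is A and alt is C
pairs = [(1, 1), (0, 1), (0, 0)]
labels = [codominant_label(a1, a2) for a1, a2 in pairs]

encoded = pd.Series(
    pd.Categorical(labels, categories=["Ref", "Het", "Hom"], ordered=True),
    name="rs12462_C",
)
print(encoded)

# Ordered categories support comparisons, e.g. selecting carriers of the alt allele
carriers = encoded > "Ref"
print(carriers.tolist())
```

Because the categories are ordered as Ref < Het < Hom, a comparison such as encoded > "Ref" selects every sample carrying at least one alternate allele.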
