Pipeline components to extract features from variants.

Project description

Bearclaw

Components to help extract features from variants, to be used as part of a pipeline.

Installation

pip3 install bearclaw
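
To confirm that the install succeeded, the package version can be queried from Python. This is a minimal sketch using only the standard library (Python 3.8+); the helper name `installed_version` is hypothetical and not part of bearclaw.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if it is absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed bearclaw version, or None when the package is not installed.
print(installed_version("bearclaw"))
```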

Usage

from pandas import DataFrame

from bearclaw.preprocessing import VariantDataGenerator
from bearclaw.transforms import spectrum


# Dataframe containing locations of VCF files and labels.
dataframe = DataFrame({
    "vcf": [
        "src/test/resources/GRCh37/sample1.vcf",
        "src/test/resources/GRCh37/sample2.vcf",
    ],
    "class": [1, 0],
})
# Transform VCF files into features using `spectrum`, which counts the number of variants by flanking context.
dg = VariantDataGenerator(transform=spectrum)

# Convert dataframe to label `y` and features `X_spectrum` using `spectrum`.
X_spectrum, y = dg.flow_from_dataframe(dataframe, x_col="vcf")
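
To give an intuition for what "counting variants by flanking context" means, here is a small standalone sketch. It is not bearclaw's implementation: the function `count_by_flanking_context`, the toy reference string, and the `L[ref>alt]R` key format are all assumptions made for illustration, and details such as strand normalisation or how `spectrum` reads VCF files are not covered.

```python
from collections import Counter


def count_by_flanking_context(reference, variants):
    """Count single-nucleotide variants keyed by their neighbouring bases.

    `reference` is a plain sequence string and `variants` is an iterable of
    (zero_based_position, alt_base) tuples. Both are hypothetical inputs,
    chosen for illustration rather than taken from bearclaw.
    """
    counts = Counter()
    for pos, alt in variants:
        # Skip variants that lack a neighbouring base on either side.
        if pos < 1 or pos >= len(reference) - 1:
            continue
        left, ref, right = reference[pos - 1], reference[pos], reference[pos + 1]
        counts[f"{left}[{ref}>{alt}]{right}"] += 1
    return counts


# Toy data: two C>T variants share the same A_G context.
reference = "ACGTACGTAC"
variants = [(1, "T"), (5, "T"), (2, "A"), (0, "G")]
spectrum_counts = count_by_flanking_context(reference, variants)
print(spectrum_counts)
```

Each distinct context becomes one feature column, so a sample is summarised by a fixed-length vector of counts regardless of how many variants it contains.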

Reference documentation

https://hylkedonker.gitlab.io/bearclaw/

License

The code in this repository is licensed under the MIT License.

Download files

Source Distribution

bearclaw-0.0.4.tar.gz (182.4 kB)

Uploaded Source

Built Distribution

bearclaw-0.0.4-py3-none-any.whl (168.3 kB)

Uploaded Python 3
