Pipeline components to extract features from variants.
Bearclaw
Components to help extract features from variants, to be used as part of a pipeline.
Installation
pip3 install bearclaw
Usage
from pandas import DataFrame

from bearclaw.preprocessing import VariantDataGenerator
from bearclaw.transforms import spectrum

# Dataframe containing locations of VCF files and labels.
dataframe = DataFrame({
    "vcf": [
        "src/test/resources/GRCh37/sample1.vcf",
        "src/test/resources/GRCh37/sample2.vcf",
    ],
    "class": [1, 0],
})
# Transform VCF files into features using `spectrum`, which counts the number of variants by flanking context.
dg = VariantDataGenerator(transform=spectrum)
# Convert dataframe to label `y` and features `X_spectrum` using `spectrum`.
X_spectrum, y = dg.flow_from_dataframe(dataframe, x_col="vcf")
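To give a feel for what "counting variants by flanking context" means, here is a minimal, library-free sketch. It is not bearclaw's implementation: the helper names `context_label` and `count_by_context`, the `A[C>T]G` label format, and the toy input tuples are all assumptions made for illustration, and the real `spectrum` transform may normalise strands or label contexts differently.

```python
from collections import Counter


def context_label(left, ref, alt, right):
    """Build a label such as 'A[C>T]G' for a single-base substitution.

    Hypothetical helper: the label format is an assumption, not bearclaw's.
    """
    return f"{left}[{ref}>{alt}]{right}"


def count_by_context(variants):
    """Count substitutions per flanking-context label."""
    return Counter(context_label(*variant) for variant in variants)


# Toy input: (5' neighbour, reference base, alternate base, 3' neighbour).
variants = [
    ("A", "C", "T", "G"),
    ("A", "C", "T", "G"),
    ("T", "C", "A", "A"),
]
counts = count_by_context(variants)
```

Each distinct context becomes one feature, so a sample is summarised by a fixed set of counts rather than by its raw list of variants.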
Reference documentation
https://hylkedonker.gitlab.io/bearclaw/
License
The code in this repository is licensed under the MIT License.