
Distribution-based feature extraction


DBFE: Distribution-Based Feature Extractor


DBFE is a Python library with feature extraction methods that facilitate classifier learning from distributions of genomic variants.
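The core idea can be illustrated with a minimal sketch. Each sample carries a variable-length list of values (for example, deletion lengths), but a classifier needs a fixed-length vector. Given a set of breakpoints, one option is to use the fraction of a sample's values that falls into each interval. The helper `bin_proportions` and the breakpoints below are hypothetical, and the sketch assumes the breakpoints are already known. dbfe chooses breakpoints itself, and its exact features may differ.

```python
import numpy as np


def bin_proportions(values, breakpoints):
    """Map a variable-length list of values to a fixed-length vector.

    Hypothetical helper (not part of dbfe): entry i is the fraction of
    values in the i-th interval defined by the sorted breakpoints.
    """
    values = np.asarray(values, dtype=float)
    breakpoints = np.asarray(breakpoints, dtype=float)
    # Bin index 0..len(breakpoints); a value equal to a breakpoint goes to the right-hand bin
    idx = np.searchsorted(breakpoints, values, side="right")
    counts = np.bincount(idx, minlength=len(breakpoints) + 1)
    # Guard against samples with no variants (returns all zeros)
    return counts / max(len(values), 1)


# Two samples with different numbers of deletions -> same feature length
samples = [[120, 5_000, 80_000], [300, 450, 2_000_000, 9_000]]
breakpoints = [1_000, 100_000]  # hypothetical cut points, giving 3 bins
features = np.vstack([bin_proportions(s, breakpoints) for s in samples])
# first row: 1/3, 2/3, 0; second row: 0.5, 0.25, 0.25
print(features)
```

The vector length depends only on the number of breakpoints, not on how many variants a sample has. That is what allows samples with different variant counts to be fed to a standard classifier.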

Installing dbfe

To install dbfe, run:

pip install dbfe

Quickstart

import pandas as pd

from sklearn.metrics import roc_auc_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression

import dbfe

# sample data (relative paths; adjust to where the example data is stored)
stat_vals = pd.read_csv("../experiments/data/ovarian/ovarian_cnv.csv.gz", index_col='SAMPLEID')
# keep deletions only
stat_vals = stat_vals.loc[stat_vals.SVCLASS == "DEL", :]
# one row per sample, holding the list of that sample's deletion lengths
stat_vals = stat_vals.groupby(stat_vals.index)['LEN'].apply(list).to_frame()

# binary labels: 1 for "RES", 0 otherwise
labels = pd.read_csv("../experiments/data/ovarian/labels.tsv", sep='\t', index_col=0)
labels = (labels == "RES").astype(int)
stat_df = stat_vals.join(labels.CLASS_LABEL, how='inner')

# splitting into training and testing data
X = stat_df.loc[:, "LEN"]
y = stat_df.loc[:, "CLASS_LABEL"]
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=23, stratify=y)

# DBFE in a classification pipeline
extractor = dbfe.DistributionBasedFeatureExtractor(breakpoint_type='supervised', n_bins='auto', cv=10)
pipe = make_pipeline(extractor, StandardScaler(), LogisticRegression())
pipe.fit(X_train, y_train)

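# plot the training data together with the breakpoints found by the extractor
extractor.plot_data_with_breaks(X_train, y_train, plot_type='kde')

# evaluate on the held-out test set
y_prob = pipe.predict_proba(X_test)
print("AUC on test data: {:.3}".format(roc_auc_score(y_test, y_prob[:, 1])))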
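Because the extractor is the first step of a scikit-learn pipeline, the pipeline fits it on the training split and only calls its transform step on the test split. The sketch below mimics that fit/transform pattern on a small synthetic table shaped like the quickstart input (one list of deletion lengths per sample). All values are made up. `QuantileBinner` is hypothetical: it places breakpoints at quantiles of the pooled training values. This unsupervised rule is a stand-in for illustration and is not how dbfe selects breakpoints with `breakpoint_type='supervised'`.

```python
import numpy as np
import pandas as pd


class QuantileBinner:
    """Hypothetical fit/transform transformer (not part of dbfe).

    fit() pools every training value and places breakpoints at quantiles;
    transform() returns, per sample, the fraction of its values in each bin.
    """

    def __init__(self, n_bins=4):
        self.n_bins = n_bins

    def fit(self, X, y=None):
        pooled = np.concatenate([np.asarray(v, dtype=float) for v in X])
        # Inner quantile levels, e.g. 0.25, 0.5, 0.75 for n_bins=4
        levels = np.linspace(0, 1, self.n_bins + 1)[1:-1]
        # np.unique sorts and drops duplicate cut points caused by ties
        self.breakpoints_ = np.unique(np.quantile(pooled, levels))
        return self

    def transform(self, X):
        n_out = len(self.breakpoints_) + 1
        rows = []
        for v in X:
            v = np.asarray(v, dtype=float)
            # Values equal to a breakpoint go to the right-hand bin
            idx = np.searchsorted(self.breakpoints_, v, side="right")
            counts = np.bincount(idx, minlength=n_out)
            rows.append(counts / max(len(v), 1))  # all zeros for empty samples
        return np.vstack(rows)

    def fit_transform(self, X, y=None):
        return self.fit(X, y).transform(X)


# Synthetic per-variant table in the quickstart's long format (made-up values)
variants = pd.DataFrame(
    {
        "SAMPLEID": ["s1", "s1", "s1", "s2", "s2", "s3", "s4"],
        "SVCLASS": ["DEL", "DEL", "DUP", "DEL", "DEL", "DEL", "DEL"],
        "LEN": [1200, 56000, 800, 300, 4500, 70000, 2500],
    }
).set_index("SAMPLEID")

# Same reshaping as the quickstart: one list of deletion lengths per sample
dels = variants.loc[variants.SVCLASS == "DEL", :]
X = dels.groupby(dels.index)["LEN"].apply(list)

# Breakpoints are learned from the "training" samples only, then reused on "test"
X_train, X_test = X.loc[["s1", "s2"]], X.loc[["s3", "s4"]]
binner = QuantileBinner(n_bins=2).fit(X_train)
print(binner.breakpoints_)  # median of the pooled training lengths
print(binner.transform(X_test))
```

In scikit-learn, inheriting from `BaseEstimator` and `TransformerMixin` would add `get_params`/`set_params`, which are needed for cloning (e.g. in cross-validation). The class is kept dependency-free here for brevity.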

License

  • This project is released under the permissive New BSD open-source license (LICENSE-BSD3.txt) and may be used commercially. There is no warranty, not even for merchantability or fitness for a particular purpose.
  • In addition, you may use, copy, modify and redistribute all artistic creative works (figures and images) included in this distribution under the directory according to the terms and conditions of the Creative Commons Attribution 4.0 International License. See the file LICENSE-CC-BY.txt for details. (Computer-generated graphics such as the plots produced by seaborn/matplotlib fall under the BSD license mentioned above).

Citing

If you use dbfe as part of your workflow in a scientific publication, please consider citing the associated paper:

  • Piernik, M. et al. (2022) DBFE: Distribution-based feature extraction from copy number and structural variants in whole-genome data.


Download files

Download the file for your platform.

Source Distribution

dbfe-0.2.2.tar.gz (13.6 kB)

Built Distribution

dbfe-0.2.2-py3-none-any.whl (14.9 kB)

File details

Details for the file dbfe-0.2.2.tar.gz.

File metadata

  • Download URL: dbfe-0.2.2.tar.gz
  • Size: 13.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.7.9

File hashes

Hashes for dbfe-0.2.2.tar.gz
Algorithm Hash digest
SHA256 0eedef6b0975e58726352f5db1d54fef8cc9e6bf20a0cb2999fa1b44f9cfccb9
MD5 fc30218c3e7b13f01b7fc721b36d9149
BLAKE2b-256 26faf08da1511511c3cea5c1e2fa6117f7e8f87dc7d1195c17b62cb465947ebc
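To check a downloaded archive against the SHA256 digest listed above, the file can be hashed in chunks with Python's standard `hashlib` module. This is a minimal sketch: the helper names are illustrative, and the path assumes the archive was saved in the current directory.

```python
import hashlib
from pathlib import Path

# SHA256 digest of dbfe-0.2.2.tar.gz, copied from the table above
EXPECTED_SHA256 = "0eedef6b0975e58726352f5db1d54fef8cc9e6bf20a0cb2999fa1b44f9cfccb9"


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected=EXPECTED_SHA256):
    """True if the file's SHA256 digest matches the expected hex string."""
    return sha256_of(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("dbfe-0.2.2.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if verify(archive) else "MISMATCH")
    else:
        print(f"{archive} not found")
```

pip can perform the same check automatically in hash-checking mode (`--require-hashes`, with a `--hash=sha256:...` entry for the package in a requirements file).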


File details

Details for the file dbfe-0.2.2-py3-none-any.whl.

File metadata

  • Download URL: dbfe-0.2.2-py3-none-any.whl
  • Size: 14.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.10.0 pkginfo/1.8.2 requests/2.27.1 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.7.9

File hashes

Hashes for dbfe-0.2.2-py3-none-any.whl
Algorithm Hash digest
SHA256 f15ac97838595bdb6e9cf6fdb479e4e079ce105533054ab00a20040a23b3f4f1
MD5 1f57c812769bf64299a3eb720da662e9
BLAKE2b-256 3f0c57d0fcf9b6f79e9e5effa44051ecbf8094de5eb63f04e544ab3ba462b826

