Missing value imputation with the local least square algorithm in Python

pyLLS

A Python library for missing gene value imputation using the local least square algorithm

Local Least Square (LLS) is an algorithm that is particularly effective at imputing missing values.
We developed pyLLS by implementing LLS in Python.
pyLLS offers more options and is significantly faster than the LLS implementation in R.
pyLLS is released under the MIT License, which permits free and commercial use by industry as well as the research community.
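
To make the idea concrete, here is a minimal NumPy sketch of local least square imputation for a single gene. It is not the pyLLS implementation: the function name lls_impute_gene, its arguments, the use of absolute Pearson correlation to pick probes, and the intercept term are all assumptions made for illustration.

```python
import numpy as np


def lls_impute_gene(ref_probes, ref_gene, target_probes, k=5):
    """Estimate one missing gene in the target samples (illustrative sketch).

    ref_probes    : (n_probes, n_ref_samples) probe genes in the reference
    ref_gene      : (n_ref_samples,) the missing gene as measured in the reference
    target_probes : (n_probes, n_target_samples) the same probes in the target
    """
    # Rank probes by absolute Pearson correlation with the missing gene
    # (assumed selection rule) and keep the k most similar ones.
    corr = np.array([np.corrcoef(p, ref_gene)[0, 1] for p in ref_probes])
    top = np.argsort(-np.abs(corr))[:k]

    # Fit least squares in the reference: ref_gene ~ intercept + selected probes.
    X = np.column_stack([np.ones(ref_probes.shape[1]), ref_probes[top].T])
    coef, *_ = np.linalg.lstsq(X, ref_gene, rcond=None)

    # Apply the fitted coefficients to the same probes in the target samples.
    Xt = np.column_stack([np.ones(target_probes.shape[1]), target_probes[top].T])
    return Xt @ coef


# Synthetic data: the "missing" gene is an exact linear mix of probes 0 and 1.
rng = np.random.default_rng(0)
ref_probes = rng.normal(size=(20, 200))
target_probes = rng.normal(size=(20, 8))
ref_gene = 2.0 * ref_probes[0] - 1.5 * ref_probes[1] + 0.5
truth = 2.0 * target_probes[0] - 1.5 * target_probes[1] + 0.5

estimate = lls_impute_gene(ref_probes, ref_gene, target_probes, k=5)
print(np.abs(estimate - truth).max())
```

Because the synthetic gene is an exact linear combination of two probes, the printed error should be close to zero. Real expression data is noisy, so the quality of the estimate depends on how many probes are used and how they are selected.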

Please report bugs to Sejin Oh at <sejin.oh@theragenbio.com> or at https://github.com/Theragen-Bio/pyLLS/issues.

Installation

You can install pyLLS from PyPI with:

pip install pyLLS

Example

If you run 'impute_missing_gene()' without any data,
it will return its description.

>>> import pyLLS
>>> pyLLS.impute_missing_gene()

            This function estimates missing values of the specified target probes.
            # parameters
            ref (pd.DataFrame): reference data. gene x sample (n x p) DataFrame.
            target (pd.DataFrame) : Target table containing missing values. gene x sample (i x k) DataFrame.
            metric (str) : ['correlation'(default),'L1','L2']
                           Similarity metric to prioritize the probes for each target.
            maxK : maximum number of probe to be used in missing value estimation.
            useKneedle : It determines whether Kneedle algorithm should be used (True) or not (False).
                         If useKneedle==False, then maxK probes will be used to estimate missing values.
            verbose : If True, progress is reported. Otherwise, no progress is reported.
            n_jobs : Use all threads ('all') or speicified number of threads (int)
            addK = Intenger that added to Kneedle's K to prevent underfitting.
                   This will use K+addK probes to estimate missing values of a gene. (default=1)
            return_probes = if true, 'target-table and mgcp' will be returned else 'target' will be returned.
            # Return
            * target : table with estimated values of missing genes that are not present in original target table.
            matrix shape will be (n x k).
            * mgcp : missing gene correlative probes. If useKneedle == True, mgcp will have R2-square column.
            # tutorial
            <------omit-------->

Parameters

ref (pd.DataFrame) : Reference data. gene x sample (n x p) DataFrame.
target (pd.DataFrame) : Target table containing missing values. gene x sample (i x k) DataFrame.
metric (str) : One of 'correlation' (default), 'L1', or 'L2'.
               Similarity metric used to prioritize the probes for each target.
maxK : Maximum number of probes to be used in missing value estimation.
useKneedle : Determines whether the Kneedle algorithm is used (True) or not (False).
             If useKneedle == False, maxK probes are used to estimate missing values.
verbose : If True, progress is reported. Otherwise, no progress is reported.
n_jobs : Use all threads ('all') or a specified number of threads (int).
addK : Integer added to Kneedle's K to prevent underfitting.
       K + addK probes are used to estimate the missing values of a gene (default=1).
return_probes : If True, both the target table and mgcp are returned; otherwise only the target table is returned.
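
The way metric and maxK interact can be sketched as follows. This is an illustration only: rank_probes is a hypothetical helper, not part of pyLLS, and treating 'correlation' as absolute Pearson correlation and 'L1'/'L2' as Manhattan/Euclidean distances is an assumption about what the option names mean.

```python
import numpy as np


def rank_probes(probes, gene, metric="correlation"):
    """Return probe indices ordered from most to least similar (hypothetical helper)."""
    if metric == "correlation":
        # Higher absolute Pearson correlation means more similar (assumed).
        score = -np.abs([np.corrcoef(p, gene)[0, 1] for p in probes])
    elif metric == "L1":
        # Smaller Manhattan distance means more similar.
        score = np.abs(probes - gene).sum(axis=1)
    elif metric == "L2":
        # Smaller Euclidean distance means more similar.
        score = np.sqrt(((probes - gene) ** 2).sum(axis=1))
    else:
        raise ValueError("metric must be 'correlation', 'L1', or 'L2'")
    return np.argsort(score)


gene = np.array([1.0, 2.0, 3.0, 4.0])
probes = np.array([
    [1.0, 2.0, 3.0, 5.0],      # close in value, strongly correlated
    [40.0, 30.0, 20.0, 10.0],  # far in value, perfectly anti-correlated
    [1.0, 3.0, 2.0, 4.0],      # fairly close, moderately correlated
])

by_corr = rank_probes(probes, gene, "correlation").tolist()
by_l1 = rank_probes(probes, gene, "L1").tolist()
by_l2 = rank_probes(probes, gene, "L2").tolist()

# With a fixed cap (the role maxK plays when useKneedle is False),
# only the first max_k ranked probes would be used.
max_k = 2
print(by_corr[:max_k], by_l1[:max_k], by_l2[:max_k])
```

In this toy example the anti-correlated probe ranks first under the correlation score but last under both distance metrics, which shows why the choice of metric can change which probes are used.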

Returns

* target : Table with estimated values for the missing genes, i.e. genes that are not present in the original target table.
           The matrix shape will be (n x k).
* mgcp : Missing gene correlative probes. If useKneedle == True, mgcp will have an R-squared (R2) column.

Tutorial

You can run the following tutorial code.

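import random

import numpy as np
import pandas as pd
import pyLLS

# Reference table: 100 genes x 10 samples, filled with a random permutation of 0..999.
tmp = pd.DataFrame(np.array(random.sample(range(1000), 1000)).reshape(100, 10))
tmp.index = ['g' + str(i) for i in tmp.index]
tmp.columns = ['s' + str(i) for i in tmp.columns]
# Target table: the first 90 genes and the first 5 samples, so 10 genes are missing.
tmp2 = tmp.iloc[:90, :5]
# Estimate the 10 missing genes for the target samples.
tmp3 = pyLLS.impute_missing_gene(ref=tmp, target=tmp2)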
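
Before running the imputation, it can help to confirm which genes are actually missing from the target. The sketch below uses only pandas and NumPy and rebuilds tables with the same shapes as the tutorial. The variable names ref, target, missing_genes, and shared_genes are illustrative, and a seeded NumPy generator replaces random.sample so the data is reproducible.

```python
import numpy as np
import pandas as pd

# Same layout as the tutorial: 100 genes x 10 samples in the reference.
rng = np.random.default_rng(0)
ref = pd.DataFrame(rng.permutation(1000).reshape(100, 10))
ref.index = ["g" + str(i) for i in ref.index]
ref.columns = ["s" + str(i) for i in ref.columns]

# Target keeps the first 90 genes and the first 5 samples.
target = ref.iloc[:90, :5]

# Genes present in the reference but absent from the target are the ones to impute.
missing_genes = ref.index.difference(target.index)
# Genes present in both tables are the candidate probes.
shared_genes = ref.index.intersection(target.index)

print(len(missing_genes), len(shared_genes))
```

Here 10 genes (g90 to g99) are missing and 90 are shared. Given the (n x k) shape stated under Returns, the imputed table for this input would be expected to have 100 rows and 5 columns.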

For a more detailed tutorial,
please refer to the notebook.
