Missing value imputation with the local least squares algorithm in Python

Project description

pyLLS

A Python library for imputing missing gene values using the local least squares algorithm

The Local Least Squares (LLS) algorithm is particularly effective at imputing missing values.
pyLLS is an implementation of LLS in Python.
It offers more options and is significantly faster than the LLS implementation in R.
pyLLS is released under the MIT License, which permits free use, including commercial use, by industry as well as the research community.
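
To make the idea behind LLS concrete, here is a minimal NumPy sketch of imputing one gene that is absent from a target table. It is not pyLLS's internal code: the function name `lls_impute_gene`, the use of absolute Pearson correlation to pick neighbors, and the intercept term are all assumptions made for illustration.

```python
import numpy as np

def lls_impute_gene(ref_known, ref_missing, target_known, k=5):
    """Estimate one gene that is absent from the target (hypothetical helper).

    ref_known:    (n_genes, n_ref_samples) reference values of genes shared with the target
    ref_missing:  (n_ref_samples,) reference values of the gene absent from the target
    target_known: (n_genes, n_target_samples) target values of the shared genes
    """
    # Rank the shared genes by absolute Pearson correlation with the missing gene.
    centered = ref_known - ref_known.mean(axis=1, keepdims=True)
    y = ref_missing - ref_missing.mean()
    corr = centered @ y / (np.linalg.norm(centered, axis=1) * np.linalg.norm(y))
    neighbors = np.argsort(-np.abs(corr))[:k]

    # Least-squares fit on the reference samples (intercept column is an assumption).
    A = np.column_stack([ref_known[neighbors].T, np.ones(ref_known.shape[1])])
    coef, *_ = np.linalg.lstsq(A, ref_missing, rcond=None)

    # Apply the fitted coefficients to the same neighbor genes in the target samples.
    B = np.column_stack([target_known[neighbors].T, np.ones(target_known.shape[1])])
    return B @ coef
```

The fit is "local" because only the k most similar genes enter the regression, rather than every gene in the reference.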

Please report bugs to Sejin Oh at <sejin.oh@theragenbio.com> or at https://github.com/Theragen-Bio/pyLLS/issues.

Installation

You can install pyLLS from PyPI with:

pip install pyLLS

Example

If you call 'impute_missing_gene()' without any arguments,
it returns its description.

>>> import pyLLS
>>> pyLLS.impute_missing_gene()

            This function estimates missing values of the specified target probes.
            # parameters
            ref (pd.DataFrame): reference data. gene x sample (n x p) DataFrame.
            target (pd.DataFrame) : Target table containing missing values. gene x sample (i x k) DataFrame.
            metric (str) : ['correlation'(default),'L1','L2']
                           Similarity metric to prioritize the probes for each target.
            maxK : maximum number of probe to be used in missing value estimation.
            useKneedle : It determines whether Kneedle algorithm should be used (True) or not (False).
                         If useKneedle==False, then maxK probes will be used to estimate missing values.
            verbose : If True, progress is reported. Otherwise, no progress is reported.
            n_jobs : Use all threads ('all') or speicified number of threads (int)
            addK = Intenger that added to Kneedle's K to prevent underfitting.
                   This will use K+addK probes to estimate missing values of a gene. (default=1)
            return_probes = if true, 'target-table and mgcp' will be returned else 'target' will be returned.
            # Return
            * target : table with estimated values of missing genes that are not present in original target table.
            matrix shape will be (n x k).
            * mgcp : missing gene correlative probes. If useKneedle == True, mgcp will have R2-square column.
            # tutorial
            <------omit-------->

Parameters

ref (pd.DataFrame) : Reference data. gene x sample (n x p) DataFrame.
target (pd.DataFrame) : Target table containing missing values. gene x sample (i x k) DataFrame.
metric (str) : One of ['correlation' (default), 'L1', 'L2'].
               Similarity metric used to prioritize the probes for each target.
maxK : Maximum number of probes to be used in missing value estimation.
useKneedle : Whether the Kneedle algorithm should be used (True) or not (False).
             If useKneedle == False, maxK probes are used to estimate missing values.
verbose : If True, progress is reported. Otherwise, no progress is reported.
n_jobs : Use all threads ('all') or a specified number of threads (int).
addK : Integer added to Kneedle's K to prevent underfitting.
       K+addK probes are used to estimate the missing values of a gene. (default=1)
return_probes : If True, both the target table and mgcp are returned; otherwise only the target table is returned.
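
The metric and maxK parameters can be pictured as a ranking step: score every candidate probe against the gene to be estimated, then keep the best maxK. The sketch below is only an illustration of that idea; `rank_probes` is a hypothetical helper, and the use of absolute correlation and plain L1/L2 distances is an assumption, not a description of how pyLLS computes similarity internally.

```python
import numpy as np

def rank_probes(candidates, gene, metric="correlation", max_k=10):
    """Return indices of the max_k candidate rows most similar to `gene` (hypothetical helper)."""
    if metric == "correlation":
        # Higher absolute Pearson correlation means more similar, so negate for ascending sort.
        c = candidates - candidates.mean(axis=1, keepdims=True)
        g = gene - gene.mean()
        score = -np.abs(c @ g / (np.linalg.norm(c, axis=1) * np.linalg.norm(g)))
    elif metric == "L1":
        # Manhattan distance: smaller means more similar.
        score = np.abs(candidates - gene).sum(axis=1)
    elif metric == "L2":
        # Euclidean distance: smaller means more similar.
        score = np.sqrt(((candidates - gene) ** 2).sum(axis=1))
    else:
        raise ValueError(f"unknown metric: {metric}")
    return np.argsort(score, kind="stable")[:max_k]

candidates = np.array([[10.0, 20.0, 30.0, 41.0],
                       [4.0, 1.0, 3.0, 2.0],
                       [1.0, 2.0, 3.0, 5.0]])
gene = np.array([1.0, 2.0, 3.0, 4.0])
by_corr = rank_probes(candidates, gene, "correlation", max_k=3)
by_l1 = rank_probes(candidates, gene, "L1", max_k=2)
```

Correlation ignores scale, so the first row ranks highest there even though it is far from the gene in absolute terms, while the distance metrics prefer the third row.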

Returns

* target : Table with estimated values for the genes that are missing from the original target table.
            The matrix shape is (n x k).
* mgcp : Missing gene correlative probes. If useKneedle == True, mgcp includes an R-squared (R2) column.
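
The interplay of useKneedle, addK, and the R-squared column can be sketched as follows: fit the gene on the first K ranked probes for K = 1..maxK, record R-squared for each K, find the knee of that curve, and then add addK. This is a simplified stand-in, assuming probes are already sorted by similarity; it keeps only the core Kneedle step (the point farthest above the diagonal of the normalized curve) and omits the smoothing and sensitivity threshold of the full algorithm. The names `r2_curve`, `knee_index`, and `choose_k` are hypothetical and are not part of pyLLS.

```python
import numpy as np

def r2_curve(X, y):
    """R-squared of a least-squares fit of y on the first K columns of X, for K = 1..X.shape[1]."""
    total = ((y - y.mean()) ** 2).sum()
    scores = []
    for k in range(1, X.shape[1] + 1):
        A = np.column_stack([X[:, :k], np.ones(len(y))])  # first K probes plus intercept
        coef, *_ = np.linalg.lstsq(A, y, rcond=None)
        resid = ((y - A @ coef) ** 2).sum()
        scores.append(1.0 - resid / total)
    return np.array(scores)

def knee_index(values):
    """Simplified Kneedle step for an increasing, concave curve."""
    x = np.linspace(0.0, 1.0, len(values))
    span = values.max() - values.min()
    y = (values - values.min()) / span if span > 0 else np.zeros_like(values)
    return int(np.argmax(y - x))  # index farthest above the diagonal

def choose_k(X, y, add_k=1):
    """Pick K at the knee of the R-squared curve, then add add_k (capped at the probe count)."""
    scores = r2_curve(X, y)
    k = knee_index(scores) + 1  # convert 0-based index to a probe count
    return min(k + add_k, X.shape[1]), scores
```

Adding a small addK trades a slightly larger model for protection against a knee that is detected one step too early.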

Tutorial

You can run the following tutorial code.

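import random

import numpy as np
import pandas as pd
import pyLLS

# Reference table: 100 genes x 10 samples, filled with a random permutation of 0..999.
tmp = pd.DataFrame(np.array(random.sample(range(1000), 1000)).reshape(100, 10))
tmp.index = ['g' + str(i) for i in tmp.index]
tmp.columns = ['s' + str(i) for i in tmp.columns]
# Target table: the first 90 genes and first 5 samples, so genes g90..g99 are missing.
tmp2 = tmp.iloc[:90, :5]
# Estimate the missing genes of the target from the reference.
tmp3 = pyLLS.impute_missing_gene(ref=tmp, target=tmp2)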
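
Before imputing, it can help to confirm which genes the target lacks relative to the reference. The following pandas-only sketch rebuilds the tutorial tables (with a seeded NumPy generator instead of `random`, which is a substitution made here for reproducibility) and lists the missing genes. It does not call pyLLS; the variable names are illustrative.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)
ref = pd.DataFrame(
    rng.permutation(1000).reshape(100, 10),
    index=[f"g{i}" for i in range(100)],
    columns=[f"s{i}" for i in range(10)],
)
target = ref.iloc[:90, :5]

# Genes present in the reference but absent from the target, in reference order.
missing_genes = ref.index[~ref.index.isin(target.index)]
print(len(missing_genes), list(missing_genes))
```

Going by the Returns section, the imputed table would be expected to have n x k shape, which here would be 100 genes by 5 samples.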

For a more sophisticated tutorial,
please refer to the notebook.

Project details


This version

0.5
