Uses mutual information to filter SNPs and an accelerated gradient method to solve nonconvex sparse learning problems on large genetic data stored in plink1 bed/bim/fam files. Multiprocessing is now available.

Project description

MI_AG

MI_AG uses mutual information to filter SNPs and an accelerated gradient method to solve nonconvex sparse learning problems on large genetic data stored in plink1 bed/bim/fam files. The corresponding paper is coming soon.

The available functions are:

  • continuous_filter calculates the mutual information between a continuous outcome and a biallelic SNP using FFT. Missing data are accepted and are removed before the calculation. The arguments are:

    • bed_file, bim_file, fam_file are the locations of the plink1 files;
    • outcome, outcome_iid are the outcome values and the iids for the outcome. In genetic data, the order of the SNP iids and the order of the outcome iids often do not match. The SNP iids can be obtained from the plink1 files, but the outcome iids must be declared separately. outcome_iid should be a list of strings or a one-dimensional numpy string array.
    • a_min, a_max are the minimum and maximum of the continuous outcome, used to evaluate the support; N=500 is the default grid size for the FFT.
  • binary_filter works similarly, except that a_min, a_max, and N are not available, since a binary outcome needs no grid.

  • continuous_filter_parallel and binary_filter_parallel are the multiprocessing versions of the above two functions; chunck_size=60000 can be used to declare the chunk size.

  • UAG_LM_SCAD_MCP, UAG_logistic_SCAD_MCP: these functions find a local minimizer for the SCAD/MCP penalized linear models/logistic models. The arguments are:

    • design_matrix: the design matrix input; should be a two-dimensional numpy array;
    • outcome: the outcome; should be a one-dimensional numpy array, continuous for the linear model and binary for the logistic model;
    • beta_0: starting value; optional. If not declared, it will be calculated based on the Gauss-Markov theory estimators of $\beta$;
    • tol: tolerance parameter; convergence is measured by the uniform norm of the difference between two successive iterations;
    • maxit: maximum number of iterations allowed;
    • _lambda: the value of $\lambda$;
    • penalty: could be "SCAD" or "MCP";
    • a=3.7, gamma=2: a for SCAD and gamma for MCP; it is recommended that a be set to $3.7$;
    • L_convex: the L-smoothness constant for the convex component; if not declared, the function calculates it itself;
    • add_intercept_column: boolean; whether the function should add an intercept column.

  • solution_path_LM, solution_path_logistic: calculate the solution path for linear/logistic models; the only difference from the functions above is that lambda_ is now a one-dimensional numpy array of the values of $\lambda$ to be used.

  • UAG_LM_SCAD_MCP_strongrule, UAG_logistic_SCAD_MCP_strongrule work just like UAG_LM_SCAD_MCP, UAG_logistic_SCAD_MCP, except that they use the strong rule to filter out many covariates before carrying out the optimization step. The same holds for solution_path_LM_strongrule and solution_path_logistic_strongrule. The strong rule increases the computational speed dramatically, because the optimization then runs on far fewer covariates.

  • SNP_UAG_LM_SCAD_MCP and SNP_UAG_logistic_SCAD_MCP work similarly to UAG_LM_SCAD_MCP and UAG_logistic_SCAD_MCP, and SNP_solution_path_LM and SNP_solution_path_logistic work similarly to solution_path_LM and solution_path_logistic, except that they take plink1 files as input, which makes them more memory-efficient.
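
To illustrate the idea behind binary_filter and the role of outcome_iid, here is a minimal sketch in plain numpy. It is not the package's implementation: the helper names align_outcome and discrete_mutual_information are hypothetical, the SNP is assumed to be coded 0/1/2 with NaN for missing values, and the mutual information is the simple plug-in estimate in nats.

```python
import numpy as np

def align_outcome(fam_iid, outcome, outcome_iid):
    """Reorder outcome to follow fam_iid; samples without an outcome become NaN."""
    lookup = {str(i): float(v) for i, v in zip(outcome_iid, outcome)}
    return np.array([lookup.get(str(i), np.nan) for i in fam_iid])

def discrete_mutual_information(snp, outcome):
    """Plug-in mutual information (nats) between two discrete vectors."""
    snp = np.asarray(snp, dtype=float)
    outcome = np.asarray(outcome, dtype=float)
    keep = ~np.isnan(snp) & ~np.isnan(outcome)  # drop pairs with missing data
    snp, outcome = snp[keep], outcome[keep]
    mi = 0.0
    for g in np.unique(snp):
        p_g = np.mean(snp == g)
        for y in np.unique(outcome):
            p_y = np.mean(outcome == y)
            p_gy = np.mean((snp == g) & (outcome == y))
            if p_gy > 0:  # empty cells contribute nothing
                mi += p_gy * np.log(p_gy / (p_g * p_y))
    return mi

# Hypothetical toy data: iids in the fam file and in the outcome file differ in order.
fam_iid = ["s1", "s2", "s3", "s4", "s5"]
snp = [0, 0, 2, 2, 1]
outcome_iid = ["s3", "s1", "s4", "s2"]  # s5 has no outcome
outcome = [1, 0, 1, 0]

aligned = align_outcome(fam_iid, outcome, outcome_iid)
mi_dependent = discrete_mutual_information(snp, aligned)
mi_independent = discrete_mutual_information([0, 1, 0, 1], [0, 0, 1, 1])
```

In the toy data the genotype determines the outcome exactly, so the estimate equals the entropy of a balanced binary variable; in the second call the two vectors are independent and the estimate is zero.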
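
For readers unfamiliar with the two penalties selected by the penalty argument, the sketch below evaluates the standard SCAD and MCP penalty functions elementwise. The function names scad_penalty and mcp_penalty are hypothetical and are not part of the package; the defaults a=3.7 and gamma=2 mirror the ones listed above.

```python
import numpy as np

def scad_penalty(beta, lam, a=3.7):
    """Standard SCAD penalty, evaluated elementwise (requires a > 2)."""
    b = np.abs(np.asarray(beta, dtype=float))
    small = lam * b                                              # |b| <= lam
    middle = (2 * a * lam * b - b**2 - lam**2) / (2 * (a - 1))   # lam < |b| <= a*lam
    large = lam**2 * (a + 1) / 2                                 # |b| > a*lam
    return np.where(b <= lam, small, np.where(b <= a * lam, middle, large))

def mcp_penalty(beta, lam, gamma=2.0):
    """Standard MCP penalty, evaluated elementwise (requires gamma > 1)."""
    b = np.abs(np.asarray(beta, dtype=float))
    inner = lam * b - b**2 / (2 * gamma)   # |b| <= gamma*lam
    outer = gamma * lam**2 / 2             # |b| > gamma*lam
    return np.where(b <= gamma * lam, inner, outer)

grid = np.array([0.0, 0.5, 1.0, 2.0, 10.0])
scad_values = scad_penalty(grid, lam=1.0)
mcp_values = mcp_penalty(grid, lam=1.0)
```

Both penalties behave like the lasso penalty near zero and become constant for large coefficients, which is what makes the resulting optimization problem nonconvex.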
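
The L_convex argument refers to the L-smoothness constant of the convex part of the objective. As a hedged illustration, the sketch below computes that constant under the assumption that the convex part is the averaged loss, that is, $\frac{1}{2n}\|y - X\beta\|^2$ for the linear model and the mean negative log-likelihood for the logistic model; the package may scale its loss differently. The function names are hypothetical.

```python
import numpy as np

def smoothness_constants(X):
    """Return (L for the linear model, L for the logistic model) under 1/n-scaled losses."""
    n = X.shape[0]
    top_eig = np.linalg.eigvalsh(X.T @ X)[-1]  # eigvalsh sorts eigenvalues ascending
    # Linear model: Hessian is X'X / n. Logistic: Hessian is X'WX / n with W <= 1/4.
    return top_eig / n, top_eig / (4 * n)

def lm_gradient(X, y, beta):
    """Gradient of (1 / (2n)) * ||y - X beta||^2."""
    return X.T @ (X @ beta - y) / X.shape[0]

rng = np.random.default_rng(0)
X = rng.normal(size=(200, 5))
y = rng.normal(size=200)
L_lm, L_logistic = smoothness_constants(X)

# Check the defining inequality ||grad(b1) - grad(b2)|| <= L * ||b1 - b2||.
b1, b2 = rng.normal(size=5), rng.normal(size=5)
lhs = np.linalg.norm(lm_gradient(X, y, b1) - lm_gradient(X, y, b2))
rhs = L_lm * np.linalg.norm(b1 - b2)
```

A gradient method with step size 1/L is guaranteed not to overshoot on the convex component, which is why this constant matters for the optimizer.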
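
The screening step used by the _strongrule functions can be illustrated with the sequential strong rule as it is usually stated for the lasso with loss $\frac{1}{2n}\|y - X\beta\|^2$. This is an assumption made for illustration: the exact rule the package applies to SCAD/MCP may differ, and strong_rule_keep is a hypothetical name.

```python
import numpy as np

def strong_rule_keep(X, y, lam, lam_prev, beta_prev=None):
    """Boolean mask of covariates that survive the sequential strong rule at lam.

    beta_prev is the solution at lam_prev; None means the all-zero solution.
    """
    n = X.shape[0]
    residual = y if beta_prev is None else y - X @ beta_prev
    score = np.abs(X.T @ residual) / n
    return score >= 2 * lam - lam_prev  # covariates below the threshold are discarded

# Hypothetical toy design: each covariate touches one observation.
X = np.array([[1.0, 0.0, 0.0],
              [0.0, 1.0, 0.0],
              [0.0, 0.0, 1.0],
              [0.0, 0.0, 0.0]])
y = np.array([4.0, 2.0, 0.4, 0.0])

lam_max = np.max(np.abs(X.T @ y)) / X.shape[0]  # smallest lambda with an all-zero lasso solution
keep = strong_rule_keep(X, y, lam=0.7, lam_prev=lam_max)
```

The rule is a heuristic, so in the lasso setting it is normally paired with a check of the optimality conditions on the discarded covariates after the fit.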


Download files

Source Distribution

MI_AG-0.9.1.tar.gz (51.4 kB view details)

Uploaded Source

Built Distribution

MI_AG-0.9.1-py3-none-any.whl (38.5 kB view details)

Uploaded Python 3

File details

Details for the file MI_AG-0.9.1.tar.gz.

File metadata

  • Download URL: MI_AG-0.9.1.tar.gz
  • Upload date:
  • Size: 51.4 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for MI_AG-0.9.1.tar.gz
Algorithm Hash digest
SHA256 d8efe659f645059bb3c5be0210deebe2a1a4f689bee0197546d9aca7d7ac6c44
MD5 581b69f2f9ddd40501c48cfda35acbbe
BLAKE2b-256 df4242adae0e2b4b703d241b18982ee47d20272efa915bf1a0d99bc64ae351b7

File details

Details for the file MI_AG-0.9.1-py3-none-any.whl.

File metadata

  • Download URL: MI_AG-0.9.1-py3-none-any.whl
  • Upload date:
  • Size: 38.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.1 CPython/3.9.12

File hashes

Hashes for MI_AG-0.9.1-py3-none-any.whl
Algorithm Hash digest
SHA256 787779345cdc7e8874c3912da199db1f0dec6badedae7b3f4b46da4f2161bf67
MD5 4d66a027bba51556ea4f3aa63aeafbbe
BLAKE2b-256 d6be503e37de4e2848688816311bf8ce5f3c666e17a5d60d6347d26ae3001d92
