
A simulation framework for supervised learning data. The functionality is designed to give the user as many degrees of freedom as possible, so that the simulation can be fitted to the research purpose at hand. Furthermore, feature importances of the simulation can be created on a local and a global level. This is particularly interesting, for instance, for benchmarking feature selection algorithms.

Project description

Simulating Supervised Learning Data

With xypy.Xy() you can conveniently simulate supervised learning data, e.g. for regression and classification problems. The simulation can be tailored closely, since the user has many degrees of freedom: the functional shape of the nonlinearity is user-defined, interactions can be formed, and (co)variances can be altered. For a more detailed motivation you can visit our blog. I have adapted this package from my R version, which you can check out here.

Usage

You can check out details about the package on TestPyPI or on GitHub.

You can conveniently install the package via PyPI with the following command.

pip install xypy
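To confirm that the installation worked, a small check like the following can help. It is only a sketch using the standard library's importlib.metadata; the helper name installed_version is hypothetical and not part of xypy.

```python
from importlib import metadata


def installed_version(package: str):
    """Return the installed version string of `package`, or None if it is missing."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    version = installed_version("xypy")
    if version is None:
        print("xypy is not installed in this environment")
    else:
        print(f"xypy {version} is installed")
```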

There is an example test script on my GitHub, which will get you started with the simulation.

Simulate data

You can simulate regression and classification data with interactions and a user-specified non-linearity. With the stn argument you can alter the signal-to-noise ratio of your simulation. I strongly encourage you to read this blog post, where I've analyzed OLS coefficients under different signal-to-noise ratios.

# load the library
from xypy import Xy

# simulate regression data
my_sim = Xy(n=1000,
            numvars=[10, 10],
            catvars=[3, 2],
            noisevars=50,
            stn=100.0)

# get a glimpse of the simulation
my_sim

# plot the true underlying effects
my_sim.plot()

# extract the data
X, y = my_sim.data

# extract the true underlying model weights
my_sim.coef_
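
To make the role of stn more concrete, here is a minimal, dependency-free sketch of a data-generating process with a nonlinear term, an interaction, and noise scaled to a target signal-to-noise ratio. It does not use xypy and is not its implementation: the helper names (signal, simulate, empirical_stn) are hypothetical, and defining the ratio as signal variance divided by noise variance is an assumption, since the definition xypy uses is not stated here.

```python
import random
import statistics


def signal(x1: float, x2: float) -> float:
    """Hypothetical true model: linear term, squared term, and an interaction."""
    return 2.0 * x1 + x2 ** 2 + x1 * x2


def simulate(n: int, stn: float, seed: int = 0):
    """Draw n observations whose noise variance is var(signal) / stn.

    Assumption: signal-to-noise ratio = var(signal) / var(noise).
    """
    rng = random.Random(seed)
    X = [(rng.gauss(0, 1), rng.gauss(0, 1)) for _ in range(n)]
    f = [signal(x1, x2) for x1, x2 in X]
    noise_sd = (statistics.pvariance(f) / stn) ** 0.5
    y = [fi + rng.gauss(0, noise_sd) for fi in f]
    return X, f, y


def empirical_stn(f, y) -> float:
    """Ratio of signal variance to the variance of the realised noise."""
    noise = [yi - fi for fi, yi in zip(f, y)]
    return statistics.pvariance(f) / statistics.pvariance(noise)


if __name__ == "__main__":
    for target in (0.5, 5.0, 100.0):
        _, f, y = simulate(n=5000, stn=target)
        print(f"target stn={target}: empirical stn={empirical_stn(f, y):.2f}")
```

Under this assumption, a low ratio buries the true effects in noise, while a high one (such as the stn = 100.0 used above) leaves them almost undisturbed.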

Feature Selection

You can extract feature importances from your simulation, for instance to benchmark feature selection algorithms. You can read up on a small benchmark I did with this feature on our blog. The same analysis can easily be performed in Python as well.

# Feature Importance 
my_sim.varimp()
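
Because a simulation knows which features truly matter, the output of a feature selection algorithm can be scored against that ground truth. The sketch below shows one possible way to do this without xypy. The inputs true_weights and selected are hypothetical (a mapping from feature name to true weight, and the set of features some algorithm picked), and treating a feature as relevant when its true weight is nonzero is an assumption. The return format of varimp() is not documented here, so mapping its output to these inputs is left open.

```python
def selection_scores(true_weights: dict, selected: set) -> dict:
    """Precision and recall of a selected feature set against known weights.

    Assumption: a feature is relevant if its true weight is nonzero.
    """
    relevant = {name for name, w in true_weights.items() if w != 0}
    hits = relevant & selected
    precision = len(hits) / len(selected) if selected else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return {"precision": precision, "recall": recall}


if __name__ == "__main__":
    # Hypothetical ground truth: two informative features, two noise features.
    true_weights = {"x1": 1.5, "x2": -0.7, "noise1": 0.0, "noise2": 0.0}
    # Hypothetical output of some feature selection algorithm.
    selected = {"x1", "noise1"}
    print(selection_scores(true_weights, selected))
```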

Feel free to contact me with input and ideas.
