Ensemble dataset generator for tabular data prediction and modeling projects.

EnsembleSet

EnsembleSet generates dataset ensembles by applying a randomized sequence of feature engineering methods to a randomized subset of input features.
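The idea can be illustrated with a minimal, standard-library-only sketch. This is not EnsembleSet's implementation: the three transformations, the make_ensemble() helper, and the choice to overwrite features in place are all assumptions made for illustration, since the description does not list the actual feature engineering methods.

```python
import math
import random

# Hypothetical stand-ins for feature engineering methods; the methods the
# library actually applies are not listed in this description.
METHODS = {
    "square": lambda x: x * x,
    "log1p_abs": lambda x: math.log1p(abs(x)),
    "negate": lambda x: -x,
}


def make_ensemble(rows, n_datasets, n_features, n_steps, seed=0):
    """Return n_datasets transformed copies of rows (a list of dicts)."""
    rng = random.Random(seed)
    feature_names = sorted(rows[0])
    ensemble = []
    for _ in range(n_datasets):
        # Copy the input so every dataset starts from the original values.
        dataset = [dict(row) for row in rows]
        for _ in range(n_steps):
            # Each step picks a random feature subset and a random method.
            chosen = rng.sample(feature_names, n_features)
            method = METHODS[rng.choice(sorted(METHODS))]
            for row in dataset:
                for feature in chosen:
                    row[feature] = method(row[feature])
        ensemble.append(dataset)
    return ensemble


rows = [{"a": 1.0, "b": 2.0, "c": 3.0}, {"a": 4.0, "b": 5.0, "c": 6.0}]
ensemble = make_ensemble(rows, n_datasets=3, n_features=2, n_steps=4)
print(len(ensemble), len(ensemble[0]), sorted(ensemble[0][0]))
# prints: 3 2 ['a', 'b', 'c']
```

Each generated dataset has the same rows and column names as the input but different values, which is what lets a separate model be trained on each member of the ensemble.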

1. Installation

Install the pre-release alpha from PyPI with:

pip install ensembleset
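To confirm which version ended up in the active environment, one option is the standard library's importlib.metadata. The installed_version() helper below is a hypothetical convenience wrapper, not part of EnsembleSet.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package_name):
    """Return the installed version string, or None if the package is absent."""
    try:
        return version(package_name)
    except PackageNotFoundError:
        return None


found = installed_version("ensembleset")
print(found if found is not None else "ensembleset is not installed")
```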

2. Usage

See the example usage notebook.

Initialize an EnsembleSet class instance, passing in the label name and a training DataFrame. Optionally, include a test DataFrame and/or a list of any string features. Then call make_datasets() to generate an EnsembleSet, specifying:

  1. The number of individual datasets to generate.
  2. The number of features to randomly select for each feature engineering step.
  3. The number of feature engineering steps to run.
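
import ensembleset.dataset as ds

# train_df and test_df are DataFrames you have already loaded
data_ensemble = ds.DataSet(
    label='label_column_name',
    train_data=train_df,
    test_data=test_df,
    string_features=['string_feature_column_names']
)

# Generate 10 datasets, each built from 5 feature engineering steps
# applied to 7 randomly selected features per step
data_ensemble.make_datasets(
    n_datasets=10,
    n_features=7,
    n_steps=5
)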

By default, generated datasets will be saved to HDF5 in data/dataset.h5 using the following structure:

dataset.h5
├── train
│   ├── labels
│   ├── 1
│   ├── .
│   ├── .
│   ├── .
│   └── n
│
└── test
    ├── labels
    ├── 1
    ├── .
    ├── .
    ├── .
    └── n
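
The layout above maps onto HDF5 key paths such as train/labels and train/1. The sketch below lists the keys implied by the tree and shows one way the file might be read back. Both helpers are hypothetical, and load_split() assumes the entries are plain arrays readable with the third-party h5py package; if the file was written through pandas, pandas.read_hdf() with the same keys would be the tool to try instead.

```python
def expected_keys(n_datasets, include_test=True):
    """List the key paths implied by the tree, numbering datasets 1..n."""
    splits = ["train", "test"] if include_test else ["train"]
    keys = []
    for split in splits:
        keys.append(f"{split}/labels")
        keys.extend(f"{split}/{i}" for i in range(1, n_datasets + 1))
    return keys


def load_split(path, split, n_datasets):
    """Read the labels and numbered datasets for one split.

    Assumption: each key holds a plain HDF5 dataset, not a pandas table.
    """
    import h5py  # third-party; imported here so expected_keys() works without it

    with h5py.File(path, "r") as hdf:
        labels = hdf[f"{split}/labels"][:]
        datasets = [hdf[f"{split}/{i}"][:] for i in range(1, n_datasets + 1)]
    return labels, datasets


print(expected_keys(2))
# prints: ['train/labels', 'train/1', 'train/2', 'test/labels', 'test/1', 'test/2']
```

With the default output path, a call such as load_split('data/dataset.h5', 'train', 10) would return the training labels plus a list of ten arrays, provided the assumption about the storage format holds.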
