Dataset generator utility for data science projects

Ensembleset

Ensembleset generates dataset ensembles by applying a randomized sequence of feature engineering methods to a randomized subset of input features.

1. Installation

Install the pre-release alpha from PyPI with:

pip install ensembleset

2. Usage

See the example usage notebook.

Initialize a DataSet instance, passing in the label name and a training DataFrame. Optionally, include a test DataFrame and/or a list of any string features. Then call make_datasets() to generate an ensembleset, specifying:

  1. The number of individual datasets to generate.
  2. The number of features to randomly select for each feature engineering step.
  3. The number of feature engineering steps to run.
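
import ensembleset.dataset as ds

# train_df and test_df are DataFrames loaded elsewhere
data_ensemble=ds.DataSet(
    label='label_column_name',
    train_data=train_df,
    test_data=test_df,
    string_features=['string_feature_column_names']
)

# Generate 10 datasets, each built from 5 feature engineering steps
# that draw 7 randomly selected features per step
data_ensemble.make_datasets(
    n_datasets=10,
    n_features=7,
    n_steps=5
)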
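
To make the roles of the three arguments concrete, here is a minimal standard-library sketch of the general idea: several datasets, each produced by a few randomly chosen steps applied to a random subset of columns. It is not Ensembleset's implementation. The functions `square`, `negate`, `center`, `make_one_dataset` and `make_ensemble`, and the way new columns are named, are all hypothetical and only illustrate how `n_datasets`, `n_features` and `n_steps` could interact.

```python
import random

# Hypothetical stand-ins for feature engineering methods; the methods the
# library actually applies are not listed in this README.
def square(values):
    return [v ** 2 for v in values]

def negate(values):
    return [-v for v in values]

def center(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

METHODS = {'square': square, 'negate': negate, 'center': center}


def make_one_dataset(columns, n_features, n_steps, rng):
    """Run n_steps random methods, each on a random subset of n_features columns.

    `columns` maps a column name to a list of floats.
    """
    result = {name: list(values) for name, values in columns.items()}
    for _ in range(n_steps):
        method_name = rng.choice(sorted(METHODS))
        # Never try to sample more columns than currently exist.
        k = min(n_features, len(result))
        chosen = rng.sample(sorted(result), k)
        for name in chosen:
            # Store the engineered column under a new name, keeping the input.
            result[f'{name}_{method_name}'] = METHODS[method_name](result[name])
    return result


def make_ensemble(columns, n_datasets, n_features, n_steps, seed=0):
    """Build n_datasets independently randomized datasets from the same input."""
    rng = random.Random(seed)
    return [
        make_one_dataset(columns, n_features, n_steps, rng)
        for _ in range(n_datasets)
    ]


features = {'a': [1.0, 2.0, 3.0], 'b': [4.0, 5.0, 6.0], 'c': [7.0, 8.0, 9.0]}
ensemble = make_ensemble(features, n_datasets=3, n_features=2, n_steps=2)
for dataset in ensemble:
    print(sorted(dataset))
```

Because each dataset draws its own methods and columns, the members of the ensemble generally differ from one another, which is the point of generating an ensemble rather than a single engineered dataset.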

By default, generated datasets will be saved to HDF5 in data/dataset.h5 using the following structure:

dataset.h5
├── train
│   ├── labels
│   ├── 1
│   ├── .
│   ├── .
│   ├── .
│   └── n
│
└── test
    ├── labels
    ├── 1
    ├── .
    ├── .
    ├── .
    └── n
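
One way to confirm that a generated file matches this layout is to list its contents. The sketch below assumes the separate h5py package is installed; this README does not say which HDF5 library Ensembleset uses to write the file. The helper name `list_hdf5_tree` is hypothetical, and the path is the default location mentioned above.

```python
import os

import h5py


def list_hdf5_tree(path):
    """Return the path of every group and dataset stored in an HDF5 file."""
    names = []
    with h5py.File(path, 'r') as hdf:
        # visit() calls the function once per object name and keeps going
        # for as long as the function returns None, which list.append does.
        hdf.visit(names.append)
    return names


# Default output location from this README; skipped if no file exists yet.
default_path = 'data/dataset.h5'
if os.path.exists(default_path):
    for name in list_hdf5_tree(default_path):
        print(name)
```

For a file with the structure shown above, the listing should include entries such as train/labels and train/1.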
