instrumentum

General utilities for data science projects.

The goal of this repository is to consolidate functionality that is typically not found in other packages and that can simplify some steps of a data science project.

The classes in instrumentum typically inherit from sklearn, which makes them easier to work with and lets them reuse code that has been extensively battle-tested. Classes use parallelism whenever possible.

  1. Feature Generation
  2. Model Tuning
  3. Feature Selection
  4. Dashboards & Plots

Feature Generation

The Interactions class offers an easy way to create combinations of existing features. It is a lightweight class that can be extended with minimal effort.

This simple example showcases how the class can be used with a small DataFrame. The degree indicates how many columns are combined at a time (careful: the number of generated features grows combinatorially with the degree and the number of columns).

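import logging
import numpy as np
import pandas as pd

# Interactions must be imported from instrumentum first
arr = np.array([[5, 2, 3], [5, 2, 3], [1, 2, 3]])
arr = pd.DataFrame(arr, columns=["a", "b", "c"])

# Create sum and product interactions for degree (2, 3)
interactions = Interactions(operations=["sum", "prod"], degree=(2, 3), verbose=logging.DEBUG)
interactions.fit(arr)

# Transformed features, labeled with the generated names
pd.DataFrame(interactions.transform(arr), columns=interactions.get_feature_names_out())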

Depending on the verbosity level, the log output can be very detailed.
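To make the idea of degree more concrete, here is a minimal sketch in plain Python that combines the same three columns by hand. It is not the Interactions implementation: it assumes that degree=(2, 3) means "every combination of 2 up to 3 columns", and the helper names (make_interactions, n_features) and the feature naming scheme are hypothetical.

```python
from itertools import combinations
from math import comb, prod

# Same toy data as in the example above, stored column-wise (no pandas needed).
columns = {"a": [5, 5, 1], "b": [2, 2, 2], "c": [3, 3, 3]}

# Row-wise reducers; the keys mirror the operation names used above.
operations = {"sum": sum, "prod": prod}


def make_interactions(columns, operations, degree):
    """Combine columns row by row for every combination size in degree (inclusive)."""
    low, high = degree
    features = {}
    for size in range(low, high + 1):
        for names in combinations(columns, size):
            rows = list(zip(*(columns[name] for name in names)))
            for op_name, op in operations.items():
                # Hypothetical naming scheme, e.g. "a_b_sum".
                features["_".join(names) + "_" + op_name] = [op(row) for row in rows]
    return features


def n_features(n_columns, n_operations, degree):
    """Count the generated features: one per (combination, operation) pair."""
    low, high = degree
    return n_operations * sum(comb(n_columns, k) for k in range(low, high + 1))


features = make_interactions(columns, operations, degree=(2, 3))
print(features["a_b_sum"])        # [7, 7, 3]
print(features["a_b_c_prod"])     # [30, 30, 6]
print(n_features(3, 2, (2, 3)))   # 8
print(n_features(20, 2, (2, 3)))  # 2660
```

Under these assumptions, three columns produce only 8 new features, while 20 columns already produce 2660, which is why the degree should be chosen with care.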

Model Tuning

The OptunaSearchCV class implements a sklearn wrapper for the great Optuna library. It provides a set of predefined parameter distributions that can be easily extended. This example uses the dispatcher, optuna_param_disp, to fetch the search space for a decision tree; each entry is keyed by the name of the corresponding sklearn class.

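from sklearn.model_selection import RepeatedStratifiedKFold
from sklearn.tree import DecisionTreeClassifier

# OptunaSearchCV and optuna_param_disp must be imported from instrumentum first;
# X_train and y_train are assumed to be an existing training set.

# Look up the predefined search space by the estimator's class name
search_function = optuna_param_disp[DecisionTreeClassifier.__name__]
cv = RepeatedStratifiedKFold(n_splits=5, n_repeats=2)

search = OptunaSearchCV(
    estimator=DecisionTreeClassifier(),
    scoring="roc_auc",
    cv=cv,
    search_space=search_function,
    n_iter=5,
)
search.fit(X_train, y_train)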

Depending on the verbosity, the output reports the details of the search.
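The dispatcher pattern used above can be sketched in a few lines. This is only an illustration, not the instrumentum code: it assumes each dispatcher entry is a function that receives an Optuna-style trial and returns a dictionary of parameters. DecisionTreeClassifier is replaced by an empty stand-in class (only its name matters here), and RandomTrial is a hypothetical substitute for an Optuna trial that mimics its suggest_int and suggest_categorical methods.

```python
import random


class DecisionTreeClassifier:
    """Stand-in for the sklearn class; only its __name__ is used here."""


class RandomTrial:
    """Hypothetical stand-in for an Optuna trial that samples uniformly."""

    def __init__(self, seed=0):
        self._rng = random.Random(seed)

    def suggest_int(self, name, low, high):
        # Both bounds are inclusive, as in Optuna.
        return self._rng.randint(low, high)

    def suggest_categorical(self, name, choices):
        return self._rng.choice(choices)


def decision_tree_space(trial):
    """Illustrative search space; the ranges are made up for this sketch."""
    return {
        "max_depth": trial.suggest_int("max_depth", 2, 10),
        "criterion": trial.suggest_categorical("criterion", ["gini", "entropy"]),
    }


# The dispatcher maps an estimator's class name to its search-space function.
param_dispatcher = {DecisionTreeClassifier.__name__: decision_tree_space}

search_function = param_dispatcher[DecisionTreeClassifier.__name__]
params = search_function(RandomTrial(seed=0))
print(params)
```

Keying the dictionary by class name means a new estimator can be supported by registering one more function, without touching the search class itself.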

Usage

  • TODO

Contributing

Interested in contributing? Check out the contributing guidelines. Please note that this project is released with a Code of Conduct. By contributing to this project, you agree to abide by its terms.

License

instrumentum was created by Federico Montanana. It is licensed under the terms of the MIT license.

Credits

instrumentum uses:
