Tools for the DataCamp Creating Robust Python Projects course

The datacamprojects python package

Skip the boilerplate of scikit-learn machine learning examples.

Installation

pip install datacamprojects

Usage

In a shell, run datacamprojects with no arguments to fit a logistic regression model on the digits dataset.

This produces a 10 x 10 confusion matrix (one row and one column per digit class) with the accuracy score at the top.
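
For comparison, the default run corresponds roughly to the plain scikit-learn code below. This is only a sketch of the boilerplate the package is meant to hide, not its actual implementation; the split settings, `random_state`, and `max_iter` values are assumptions, and the plotting step is left out.

```python
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split

# Load the digit images as a feature matrix and a label vector.
X, y = load_digits(return_X_y=True)

# Hold out part of the data for evaluation (split settings are assumed).
x_train, x_test, y_train, y_test = train_test_split(X, y, random_state=0)

# max_iter is raised so the solver has room to converge on unscaled pixels.
model = LogisticRegression(max_iter=5000)
model.fit(x_train, y_train)
predictions = model.predict(x_test)

# One row and one column per digit class (0-9).
confmat = confusion_matrix(y_test, predictions, labels=list(range(10)))
accuracy = accuracy_score(y_test, predictions)
print(confmat.shape, round(accuracy, 3))
```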

You can also pass arguments to datacamprojects at the command line.

For example,

datacamprojects -dataset diabetes -model linear_model.Lasso
# Or
datacamprojects -d diabetes -m linear_model.Lasso

fits a linear regression with lasso (L1) regularization on the diabetes dataset.
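
The two spellings of each option can be modelled with the standard argparse module. The sketch below is hypothetical and is not taken from the package source: the `build_parser` helper and the default model string are assumptions, with the defaults inferred from the no-argument behaviour described earlier.

```python
import argparse


def build_parser():
    """Build a parser that accepts the flags shown in the examples above."""
    parser = argparse.ArgumentParser(prog="datacamprojects")
    # Long and short spellings both use a single dash, as in the examples.
    parser.add_argument("-dataset", "-d", dest="dataset", default="digits",
                        help="name of a built-in scikit-learn dataset")
    # Assumed default: logistic regression, matching the no-argument run.
    parser.add_argument("-model", "-m", dest="model",
                        default="linear_model.LogisticRegression",
                        help="model given as submodule.ClassName")
    return parser


# Parse an explicit argument list instead of sys.argv for demonstration.
args = build_parser().parse_args(["-d", "diabetes", "-m", "linear_model.Lasso"])
print(args.dataset, args.model)
```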

The dataset argument can be any of the following built-in scikit-learn datasets:

  • Regression
    • boston (removed in scikit-learn 1.2, so it requires an older release)
    • diabetes
  • Classification
    • digits
    • iris
    • wine
    • breast_cancer
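
Each of these names matches a `load_<name>` function in `sklearn.datasets`, so a lookup like the one below is one way to turn the argument into data. The `DATASET_TASKS` table and both helper functions are hypothetical names for illustration and may differ from how the package does it.

```python
from importlib import import_module

# Task type for each supported dataset name, following the list above.
DATASET_TASKS = {
    "boston": "regression",
    "diabetes": "regression",
    "digits": "classification",
    "iris": "classification",
    "wine": "classification",
    "breast_cancer": "classification",
}


def loader_name(name):
    """Return the name of the sklearn.datasets loader for a dataset."""
    if name not in DATASET_TASKS:
        raise ValueError(f"unknown dataset: {name!r}")
    return f"load_{name}"


def load_dataset(name):
    """Import sklearn.datasets lazily and call the matching loader."""
    datasets = import_module("sklearn.datasets")
    return getattr(datasets, loader_name(name))()


print(loader_name("breast_cancer"), DATASET_TASKS["breast_cancer"])
```

Knowing the task type up front also makes it possible to reject a classifier paired with a regression dataset before any fitting happens.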

The model argument combines a scikit-learn submodule and a class name, separated by a dot. The first part is the submodule, e.g.

  • linear_model
  • naive_bayes
  • ensemble
  • svm

while the second part is the class imported from that submodule, e.g.

  • LinearRegression
  • GaussianNB
  • RandomForestRegressor
  • SVC
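
A string such as `linear_model.Lasso` can be resolved to a class with the standard importlib module. The `resolve_model` helper below is a hypothetical sketch, not the package's own code; the `package` parameter is an assumption added so the same function can be tried against any importable package.

```python
from importlib import import_module


def resolve_model(spec, package="sklearn"):
    """Resolve 'submodule.ClassName' to the class object it names."""
    # Split on the last dot: left side is the submodule, right side the class.
    submodule, _, class_name = spec.rpartition(".")
    if not submodule or not class_name:
        raise ValueError(f"expected 'submodule.ClassName', got {spec!r}")
    module = import_module(f"{package}.{submodule}")
    return getattr(module, class_name)
```

With scikit-learn installed, `resolve_model("linear_model.Lasso")` returns the `sklearn.linear_model.Lasso` class, which can then be instantiated and fitted as usual.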

In Python, each step of the workflow becomes a single function call:

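from sklearn.metrics import confusion_matrix, accuracy_score
import datacamprojects as dcp

# Load the digits dataset and split it into training and test sets.
dataset = dcp.get_data('digits')
x_train, x_test, y_train, y_test = dcp.split_data(dataset)

# Look up the estimator by scikit-learn submodule and class name.
model = dcp.get_model(model_type='ensemble',
                      model_name='RandomForestClassifier')

# Fit on the training set, save the fitted model, and predict on the test set.
fit = model.fit(x_train, y_train)
dcp.pickle_model(filename='digits_rf.pickle', model=fit)
predictions = fit.predict(x_test)

# Score the predictions against the held-out labels.
confmat = confusion_matrix(y_true=y_test, y_pred=predictions)
accuracy = accuracy_score(y_true=y_test, y_pred=predictions)

# Save the confusion matrix plot, annotated with the accuracy score.
dcp.confusion_matrix_plot(cm=confmat,
                          acc=accuracy,
                          filename='digits_rf.png')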

Or run a whole pipeline with one function:

import datacamprojects as dcp

dcp.classification(dataset='digits',
                   model_type='ensemble',
                   model_name='RandomForestClassifier',
                   pickle_name='digits_rf.pickle',
                   plot_name='digits_rf.png')

For inspiration, look at the example pipeline in the pipeline folder of the datacamprojects repo.
