Tools for the DataCamp Creating Robust Python Projects course
The datacamprojects Python package lets you skip the boilerplate of scikit-learn machine learning examples.
Installation
pip install datacamprojects
Usage
In a shell environment, you can run datacamprojects with no arguments to perform a logistic regression on the digits dataset.
This produces a 10 x 10 confusion matrix (one row and one column per digit, 0 to 9) with the accuracy score at the top.
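For reference, the default run corresponds roughly to the plain scikit-learn code below. This is a sketch, not the package's implementation: the train/test split settings, the random_state, and the max_iter value are assumptions, and the plotting step is left out.

```python
# Plain scikit-learn sketch of what the default `datacamprojects` run computes.
# Split settings and hyperparameters are assumptions, not the package's values.
from sklearn.datasets import load_digits
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, confusion_matrix
from sklearn.model_selection import train_test_split


def default_run(random_state=0):
    """Fit logistic regression on digits; return (confusion matrix, accuracy)."""
    x, y = load_digits(return_X_y=True)
    # Stratify so every digit class appears in the test set
    x_train, x_test, y_train, y_test = train_test_split(
        x, y, random_state=random_state, stratify=y
    )
    # Raise max_iter so the lbfgs solver can converge on unscaled pixel values
    model = LogisticRegression(max_iter=5000)
    model.fit(x_train, y_train)
    predictions = model.predict(x_test)
    cm = confusion_matrix(y_test, predictions)
    acc = accuracy_score(y_test, predictions)
    return cm, acc


if __name__ == "__main__":
    cm, acc = default_run()
    print(cm.shape)  # (10, 10): one row and column per digit class
    print(f"Accuracy: {acc:.3f}")
```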
You can also pass arguments to datacamprojects at the command line. For example,
datacamprojects -dataset diabetes -model linear_model.Lasso
# Or
datacamprojects -d diabetes -m linear_model.Lasso
Either form runs a linear regression with lasso (L1) regularization on the diabetes dataset.
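A comparable plain scikit-learn sketch of the Lasso example follows. The README does not say which metric the tool reports for regression models (a confusion matrix only applies to classification), so R² here is an assumption, as are the split and alpha settings.

```python
# Plain scikit-learn sketch of a Lasso (L1-regularized) regression on diabetes.
# The metric (R^2), split, and alpha are assumptions for illustration.
from sklearn.datasets import load_diabetes
from sklearn.linear_model import Lasso
from sklearn.metrics import r2_score
from sklearn.model_selection import train_test_split

x, y = load_diabetes(return_X_y=True)
x_train, x_test, y_train, y_test = train_test_split(x, y, random_state=0)

# alpha sets the strength of the L1 penalty; 1.0 is scikit-learn's default
model = Lasso(alpha=1.0).fit(x_train, y_train)
r2 = r2_score(y_test, model.predict(x_test))

# The L1 penalty can drive coefficients exactly to zero (feature selection)
n_zero = int((model.coef_ == 0).sum())
print(f"R^2 on held-out data: {r2:.3f}")
print(f"{n_zero} of {model.coef_.size} coefficients are exactly zero")
```

Raising alpha zeroes out more coefficients; lowering it toward 0 approaches ordinary least squares.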
The dataset argument can be any of the following built-in scikit-learn datasets:
- Regression: boston, diabetes
- Classification: digits, iris, wine, breast_cancer
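One plausible way to map these names to data is to look up scikit-learn's load_<name> functions. The helper below, load_builtin, is hypothetical and not part of datacamprojects. Note that load_boston was removed in scikit-learn 1.2, so the boston option is likely to fail on recent scikit-learn versions; the sketch turns a missing loader into a ValueError.

```python
import sklearn.datasets


def load_builtin(name):
    """Return (X, y) for a built-in scikit-learn dataset such as 'iris'.

    Hypothetical helper: looks up sklearn.datasets.load_<name> by name.
    """
    try:
        loader = getattr(sklearn.datasets, f"load_{name}")
    except (AttributeError, ImportError):
        # Recent scikit-learn raises ImportError for the removed load_boston
        raise ValueError(f"no built-in scikit-learn loader for {name!r}") from None
    return loader(return_X_y=True)


x, y = load_builtin("iris")
print(x.shape, y.shape)  # (150, 4) (150,)
```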
The model argument names a scikit-learn estimator in the form submodule.ClassName.
The first part is the submodule, e.g. linear_model, naive_bayes, ensemble, or svm, while the second is the class that is actually imported, e.g. LinearRegression, GaussianNB, RandomForestRegressor, or SVC.
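A spec such as linear_model.Lasso can be turned into an estimator with importlib. resolve_model is a hypothetical helper sketching the idea; how datacamprojects actually parses the argument is not documented here.

```python
import importlib


def resolve_model(spec, **params):
    """Instantiate a scikit-learn estimator from a 'submodule.ClassName' string.

    Hypothetical helper, e.g. resolve_model("linear_model.Lasso", alpha=0.5).
    """
    submodule, _, class_name = spec.partition(".")
    if not class_name:
        raise ValueError(f"expected 'submodule.ClassName', got {spec!r}")
    # "linear_model" -> the sklearn.linear_model module
    module = importlib.import_module(f"sklearn.{submodule}")
    estimator_class = getattr(module, class_name)
    return estimator_class(**params)


model = resolve_model("linear_model.Lasso", alpha=0.5)
print(type(model).__name__, model.alpha)  # Lasso 0.5
```

Splitting on the first dot keeps the submodule and class name independent, so any estimator importable from a sklearn submodule works without a hard-coded lookup table.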
In Python, the package reduces each step of a workflow to a single function call:
from sklearn.metrics import confusion_matrix, accuracy_score
import datacamprojects as dcp
# Load a built-in dataset by name and split it into training and test sets
dataset = dcp.get_data('digits')
x_train, x_test, y_train, y_test = dcp.split_data(dataset)
# Build sklearn.ensemble.RandomForestClassifier and fit it
model = dcp.get_model(model_type='ensemble',
                      model_name='RandomForestClassifier')
fit = model.fit(x_train, y_train)
# Save the fitted model to disk
dcp.pickle_model(filename='digits_rf.pickle', model=fit)
# Evaluate on the held-out test set
predictions = fit.predict(x_test)
confmat = confusion_matrix(y_true=y_test, y_pred=predictions)
accuracy = accuracy_score(y_true=y_test, y_pred=predictions)
# Plot the confusion matrix with the accuracy score and save it as a PNG
dcp.confusion_matrix_plot(cm=confmat,
                          acc=accuracy,
                          filename='digits_rf.png')
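The confusion_matrix_plot call presumably renders the matrix as an image with the accuracy in the title. A minimal matplotlib sketch of such a plot follows; plot_confusion_matrix, the color map, and the title wording are assumptions, not the package's code.

```python
from pathlib import Path

import matplotlib
matplotlib.use("Agg")  # render straight to files; no display needed
import matplotlib.pyplot as plt
import numpy as np


def plot_confusion_matrix(cm, acc, filename):
    """Save `cm` as an annotated heatmap titled with the accuracy (hypothetical helper)."""
    cm = np.asarray(cm)
    fig, ax = plt.subplots(figsize=(6, 6))
    ax.imshow(cm, cmap="Blues")
    # Write the count into every cell: rows are true labels, columns predictions
    for i in range(cm.shape[0]):
        for j in range(cm.shape[1]):
            ax.text(j, i, str(cm[i, j]), ha="center", va="center")
    ax.set_xlabel("Predicted label")
    ax.set_ylabel("True label")
    ax.set_title(f"Accuracy Score: {acc:.3f}")
    fig.savefig(filename, bbox_inches="tight")
    plt.close(fig)  # free the figure's memory
    return Path(filename)


out = plot_confusion_matrix([[5, 1], [0, 4]], acc=0.9, filename="example_cm.png")
print(out, out.exists())  # example_cm.png True
```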
Or run a whole pipeline with one function:
import datacamprojects as dcp
dcp.classification(dataset='digits',
                   model_type='ensemble',
                   model_name='RandomForestClassifier',
                   pickle_name='digits_rf.pickle',
                   plot_name='digits_rf.png')
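The pipeline also writes the fitted model to pickle_name. Assuming that file is a standard pickle (the .pickle extension suggests so, but the README does not say), it can be reloaded and reused without refitting. The sketch below writes its own digits_rf.pickle so that it runs on its own; n_estimators and random_state are illustrative choices. Only unpickle files you trust, since loading a pickle can execute arbitrary code.

```python
import pickle

from sklearn.datasets import load_digits
from sklearn.ensemble import RandomForestClassifier

x, y = load_digits(return_X_y=True)
model = RandomForestClassifier(n_estimators=50, random_state=0).fit(x, y)

# Write the fitted model, standing in for the pipeline's pickle_name output
with open("digits_rf.pickle", "wb") as f:
    pickle.dump(model, f)

# Later, e.g. in another script: reload the model and predict without refitting
with open("digits_rf.pickle", "rb") as f:
    restored = pickle.load(f)

print((restored.predict(x) == model.predict(x)).all())  # True
```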
For inspiration, look at the example pipeline in the pipeline folder of the datacamprojects repo.
Hashes for datacamprojects-0.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | ca96954f9dcc32567294a848bd8f199f83c1ed702fd39617efed1672bcdbdba3
MD5 | b563cf96ea8a76a725fddd13ac59674e
BLAKE2b-256 | 5b1a4d723cd837a214a13396f5fcb545b07bf3b3532ac3828d1e6ecdadef52c7
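To check a downloaded wheel against the SHA256 digest listed for version 0.0.1, a short hashlib sketch follows (pip hash <file> reports the same kind of digest from the command line). It assumes the wheel sits in the current directory under its published file name and skips the check if it does not.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        # Read fixed-size chunks until f.read returns b"" at end of file
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


EXPECTED = "ca96954f9dcc32567294a848bd8f199f83c1ed702fd39617efed1672bcdbdba3"
wheel = Path("datacamprojects-0.0.1-py3-none-any.whl")  # assumed location
if wheel.exists():
    print("OK" if sha256_of(wheel) == EXPECTED else "MISMATCH")
```

Reading in chunks keeps memory use constant regardless of the file size.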