
SciKit-Learn Laboratory makes it easier to run machine learning experiments with scikit-learn.



This Python package provides command-line utilities that make it easier to run machine learning experiments with scikit-learn. One of the project's primary goals is to let you run scikit-learn experiments without writing any code beyond what you used to generate or extract the features.

Installation

You can install using either pip or conda. See details here.

Requirements

Command-line Interface

The main utility we provide is run_experiment, which runs a series of learners on datasets specified in a configuration file like this one:

[General]
experiment_name = Titanic_Evaluate_Tuned
# valid tasks: cross_validate, evaluate, predict, train
task = evaluate

[Input]
# these directories could also be absolute paths
# (and must be if you're not running things in local mode)
train_directory = train
test_directory = dev
# Can specify multiple sets of feature files that are merged together automatically
featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
# List of scikit-learn learners to use
learners = ["RandomForestClassifier", "DecisionTreeClassifier", "SVC", "MultinomialNB"]
# Column in CSV containing labels to predict
label_col = Survived
# Column in CSV containing instance IDs (if any)
id_col = PassengerId

[Tuning]
# Should we tune parameters of all learners by searching provided parameter grids?
grid_search = true
# Function to maximize when performing grid search
objectives = ['accuracy']

[Output]
# Also compute the area under the ROC curve as an additional metric
metrics = ['roc_auc']
# The following can also be absolute paths
logs = output
results = output
predictions = output
probability = true
models = output
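
To make the structure of the configuration above more concrete, here is a minimal sketch that reads an abridged copy of it with Python's standard configparser module. This is not SKLL's own parser: the function name summarize_config and the n_combinations field are hypothetical, and the sketch assumes that each learner is paired with each featureset.

```python
import ast
import configparser

# Abridged copy of the example configuration shown above.
CONFIG_TEXT = """
[General]
experiment_name = Titanic_Evaluate_Tuned
task = evaluate

[Input]
train_directory = train
test_directory = dev
featuresets = [["family.csv", "misc.csv", "socioeconomic.csv", "vitals.csv"]]
learners = ["RandomForestClassifier", "DecisionTreeClassifier", "SVC", "MultinomialNB"]
label_col = Survived
id_col = PassengerId

[Tuning]
grid_search = true
objectives = ['accuracy']

[Output]
metrics = ['roc_auc']
"""


def summarize_config(text):
    """Parse the INI text and decode its list-valued fields (hypothetical helper)."""
    parser = configparser.ConfigParser()
    parser.read_string(text)

    # List-valued fields are written as Python-style literals in the file.
    featuresets = ast.literal_eval(parser["Input"]["featuresets"])
    learners = ast.literal_eval(parser["Input"]["learners"])

    return {
        "task": parser["General"]["task"],
        "featuresets": featuresets,
        "learners": learners,
        "grid_search": parser.getboolean("Tuning", "grid_search"),
        "objectives": ast.literal_eval(parser["Tuning"]["objectives"]),
        # Assumption: one run per (featureset, learner) pair.
        "n_combinations": len(featuresets) * len(learners),
    }


summary = summarize_config(CONFIG_TEXT)
print(summary["task"], summary["n_combinations"])
```

The single featureset here merges four CSV files, so under the stated assumption the four learners give four featureset/learner combinations.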

For more information about getting started with run_experiment, please check out our tutorial, or our config file specs.

You can also follow this interactive Jupyter tutorial.

We also provide utilities for:

Python API

If you just want to avoid writing a lot of boilerplate learning code, you can use our simple Python API, which also supports pandas DataFrames. The main entry points to the API are the Learner and Reader classes. For more details on our API, see the documentation.
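
As a rough illustration of how the Learner and Reader classes fit together, the sketch below trains one classifier on a single feature file and evaluates it on another. The file paths, the wrapper function train_and_evaluate, and the choice to skip grid search are assumptions made for this example. The SKLL calls are written from memory of the API documentation and should be checked against the documentation for your installed version.

```python
def train_and_evaluate(train_path, test_path, learner_name="RandomForestClassifier"):
    """Train one learner and return its accuracy on the test file (hypothetical helper)."""
    # Imported lazily so the module can be loaded even when SKLL is not installed.
    from skll.data import Reader
    from skll.learner import Learner

    # Read each file into a FeatureSet; the column names follow the config above.
    train_fs = Reader.for_path(
        train_path, label_col="Survived", id_col="PassengerId"
    ).read()
    test_fs = Reader.for_path(
        test_path, label_col="Survived", id_col="PassengerId"
    ).read()

    # Train without hyperparameter tuning to keep the sketch short.
    learner = Learner(learner_name)
    learner.train(train_fs, grid_search=False)

    # Assumption: evaluate() returns a tuple whose second item is the accuracy.
    results = learner.evaluate(test_fs)
    return results[1]


if __name__ == "__main__":
    # Assumed paths, modeled on the train and dev directories in the config above.
    print(train_and_evaluate("train/vitals.csv", "dev/vitals.csv"))
```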

While our API can be broadly useful, it should be noted that the command-line utilities are intended as the primary way of using SKLL. The API is just a nice side-effect of our developing the utilities.

A Note on Pronunciation


SciKit-Learn Laboratory (SKLL) is pronounced “skull”: that’s where the learning happens.

Talks

  • Simpler Machine Learning with SKLL 1.0, Dan Blanchard, PyData NYC 2014 (video | slides)

  • Simpler Machine Learning with SKLL, Dan Blanchard, PyData NYC 2013 (video | slides)

Citing

If you are using SKLL in your work, you can cite it as follows: “We used scikit-learn (Pedregosa et al., 2011) via the SKLL toolkit (https://github.com/EducationalTestingService/skll).”

Books

SKLL is featured in Data Science at the Command Line by Jeroen Janssens.

Changelog

See GitHub releases.

Contribute

Thank you for your interest in contributing to SKLL! See CONTRIBUTING.md for instructions on how to get started.
