
SciKit-Learn Laboratory makes it easier to run machine learning experiments with scikit-learn.

Project description


This Python package provides utilities to make it easier to run machine learning experiments with scikit-learn.

Command-line Interface

run_experiment is a command-line utility for running a series of learners on datasets specified in a configuration file. For more information about using run_experiment (including a quick example), see the documentation.
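
To give a rough feel for what "datasets specified in a configuration file" can mean, here is a minimal sketch that parses an INI-style configuration with Python's standard configparser module. It is not SKLL's own parser. The section name `Input`, the option names `featuresets` and `learners`, the placeholder values, and the helper `load_experiment_settings` are all hypothetical and chosen only for illustration; `tsv_label` is taken from the v0.9.5 changelog entry, but its placement in this section is an assumption. Consult the SKLL documentation for the real field names.

```python
import configparser
import json

# Hypothetical configuration text; only tsv_label is named in the changelog.
EXAMPLE_CFG = """
[Input]
featuresets = [["features_a", "features_b"]]
learners = ["learner_a", "learner_b"]
tsv_label = y
"""


def load_experiment_settings(text):
    """Parse an INI-style experiment description into plain Python values."""
    parser = configparser.ConfigParser()
    parser.read_string(text)
    section = parser["Input"]
    return {
        # List-valued options are written as JSON so they can be decoded safely.
        "featuresets": json.loads(section["featuresets"]),
        "learners": json.loads(section["learners"]),
        # Fall back to 'y' when no label column name is given.
        "tsv_label": section.get("tsv_label", "y"),
    }


settings = load_experiment_settings(EXAMPLE_CFG)
print(settings)
```

Each inner list in `featuresets` stands for a group of feature files that would be merged into one feature set, which is how the changelog describes feature set merging.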

Python API

If you just want to avoid writing a lot of boilerplate learning code, you can use our simple Python API. The main entry points are the load_examples function and the Learner class. For more details on how to train, test, cross-validate, and run grid search on a variety of scikit-learn models, see the documentation.
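
load_examples reads feature files, and the v0.9.5 changelog entry notes that TSV files should name the class label column 'y' (or set tsv_label). The sketch below shows what such a file might look like, using only the standard csv module. It is not SKLL's reader or writer: the helper names `write_tsv_features` and `read_tsv_features`, the column order (label first), the feature names, and filling missing features with 0 are all assumptions made for illustration.

```python
import csv
import io


def write_tsv_features(rows, stream, label_col="y"):
    """Write (label, feature_dict) pairs as a TSV with a header row."""
    # Collect every feature name seen, so all rows share the same columns.
    feature_names = sorted({name for _, feats in rows for name in feats})
    writer = csv.writer(stream, delimiter="\t", lineterminator="\n")
    writer.writerow([label_col] + feature_names)
    for label, feats in rows:
        # Features absent from an example are written as 0 (an assumption).
        writer.writerow([label] + [feats.get(name, 0) for name in feature_names])


def read_tsv_features(stream, label_col="y"):
    """Read the TSV back into (label, feature_dict) pairs with float values."""
    examples = []
    for record in csv.DictReader(stream, delimiter="\t"):
        label = record.pop(label_col)
        examples.append((label, {k: float(v) for k, v in record.items()}))
    return examples


rows = [
    ("setosa", {"petal_len": 1.4, "petal_wid": 0.2}),
    ("virginica", {"petal_len": 6.0}),
]
buf = io.StringIO()
write_tsv_features(rows, buf)
header = buf.getvalue().splitlines()[0]
buf.seek(0)
examples = read_tsv_features(buf)
print(header)
print(examples)
```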

A Note on Pronunciation

SciKit-Learn Laboratory (SKLL) is pronounced “skull”: that’s where the learning happens.

Requirements

Changelog

  • v0.9.13

    • Added skll.data.write_feature_file (also available as skll.write_feature_file) to simplify outputting .jsonlines, .megam, and .tsv files.

    • Added more unit tests for handling .megam and .tsv files.

    • Fixed a bug that caused a crash when using gridmap.

    • grid_search_jobs now sets both n_jobs and pre_dispatch for GridSearchCV under the hood. This prevents a potential memory issue when dealing with large datasets and learners that cannot handle sparse data.

    • Changed logging format when using run_experiment to be a little more readable.

  • v0.9.12

    • Fixed a serious issue where merging feature sets was not working correctly. All experiments conducted using feature set merging (i.e., where you specified a list of feature files and had them merged into one set for training/testing) should be considered invalid. In general, results obtained this way were previously poor and should now be much better.

    • Added more verbose regression output including descriptive statistics and Pearson correlation.

  • v0.9.11

    • Fixed all known remaining compatibility issues with Python 3.

    • Fixed a bug in skll.metrics.kappa that raised an exception if the full range of ratings was not seen in both y_true and y_pred. Also added a unit test to prevent future regressions.

    • Added missing configuration file that would cause a unit test to fail.

    • Slightly refactored skll.Learner._create_estimator to make it a lot simpler to add new learners/estimators in the future.

    • Fixed a bug in handling of sparse matrices that would cause a crash if the number of features in the training and the test set were not the same. Also added a corresponding unit test to prevent future regressions.

    • We now require the backported configparser module for Python 2.7 to make maintaining compatibility with both 2.x and 3.x a lot easier.

  • v0.9.10

    • Fixed bug introduced in v0.9.9 that broke “predict” mode.

  • v0.9.9

    • Automatically generate a result summary file with all results for an experiment in one TSV.

    • Fixed bug where printing predictions to file would cause a crash with some learners.

    • Run unit tests for Python 3.3 as well as 2.7.

    • More unit tests for increased coverage.

  • v0.9.8

    • Fixed crash due to trying to print name of grid objective which is now a str and not a function.

    • Added --version option to shell scripts.

  • v0.9.7

    • Can now use any objective function scikit-learn supports for tuning (i.e., any valid argument for scoring when instantiating GridSearchCV) in addition to those we define.

    • Removed ml_metrics dependency and we now support custom weights for kappa (through the API only so far).

    • Requires scikit-learn 0.14+.

    • accuracy, quadratic_weighted_kappa, unweighted_kappa, f1_score_micro, and f1_score_macro functions are no longer available under skll.metrics. The accuracy and f1 score ones are no longer needed because we just use the built-in ones. As for quadratic_weighted_kappa and unweighted_kappa, they’ve been superseded by the kappa function that takes a weights argument.

    • Fixed issue where you couldn’t write prediction files if you were classifying using numeric classes.

  • v0.9.6

    • Fixed issue with setup.py importing from package when trying to install it (for real this time).

  • v0.9.5

    • You can now include feature files that don’t have class labels in your featuresets. At least one feature file has to have a label though, because we only support supervised learning so far.

    • Important: If you’re using TSV files in your experiments, you should either name the class label column ‘y’ or use the new tsv_label option in your configuration file to specify the name of the label column. This was necessary to support feature files without labels.

    • Fixed an issue with how version number was being imported in setup.py that would prevent installation if you didn’t already have the prereqs installed on your machine.

    • Made random seeds smaller to fix crash on 32-bit machines. This means that experiments run with previous versions of skll will yield slightly different results if you re-run them with v0.9.5+.

    • Added megam_to_csv for converting .megam files to CSV/TSV files.

    • Fixed a potential rounding problem with csv_to_megam that could slightly change feature values in conversion process.

    • Cleaned up test_skll.py a little bit.

    • Updated documentation to include missing fields that can be specified in config files.

  • v0.9.4

    • Documentation fixes

    • Added requirements.txt to manifest to fix broken PyPI release tarball.

  • v0.9.3

    • Fixed bug with merging feature sets that used to cause a crash.

    • If you’re running scikit-learn 0.14+, we use their StandardScaler, since the bug fix we include in FixedStandardScaler is in there.

    • Unit tests all pass again.

    • Many small changes related to using Travis (which do not affect users).

  • v0.9.2

    • Fixed example.cfg path issue. Updated some documentation.

    • Made path in make_example_iris_data.py consistent with the updated one in example.cfg

  • v0.9.1

    • Fixed bug where classification experiments would raise an error about class labels not being floats

    • Updated documentation to include quick example for run_experiment.
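
The changelog mentions a kappa function that takes a weights argument (v0.9.7) and a fix for the case where the full range of ratings is not seen in both y_true and y_pred (v0.9.11). The following is a minimal pure-Python sketch of weighted Cohen's kappa for integer ratings that illustrates both ideas. It is not SKLL's implementation: the function name `weighted_kappa` and the accepted weight values (None, "linear", "quadratic") are assumptions made for this example.

```python
from collections import Counter


def weighted_kappa(y_true, y_pred, weights=None):
    """Cohen's kappa for integer ratings, optionally with disagreement weights."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("y_true and y_pred must be non-empty and equally long")

    # Take the rating range from the union of both lists, so a rating that
    # appears in only one of them does not cause an error.
    lo = min(min(y_true), min(y_pred))
    hi = max(max(y_true), max(y_pred))
    num_ratings = hi - lo + 1
    n = len(y_true)

    observed = Counter((t - lo, p - lo) for t, p in zip(y_true, y_pred))
    true_hist = Counter(t - lo for t in y_true)
    pred_hist = Counter(p - lo for p in y_pred)

    def disagreement(i, j):
        if weights is None:
            return 0.0 if i == j else 1.0
        if weights == "linear":
            return float(abs(i - j))
        if weights == "quadratic":
            return float((i - j) ** 2)
        raise ValueError("unknown weights: %r" % (weights,))

    observed_cost = 0.0
    expected_cost = 0.0
    for i in range(num_ratings):
        for j in range(num_ratings):
            w = disagreement(i, j)
            observed_cost += w * observed[(i, j)] / n
            expected_cost += w * true_hist[i] * pred_hist[j] / (n * n)

    # If no disagreement is possible by chance, treat agreement as perfect.
    if expected_cost == 0:
        return 1.0
    return 1.0 - observed_cost / expected_cost


# Rating 4 never appears in the predictions, yet the score is still defined.
score = weighted_kappa([1, 2, 3, 4], [1, 2, 3, 3], weights="quadratic")
print(score)
```

With weights set to None the function reduces to unweighted kappa, which is one way a single function with a weights argument can stand in for separate weighted and unweighted variants.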

