SciKit-Learn Laboratory is a Python package that provides utilities to make it easier to run machine learning experiments with scikit-learn.
run_experiment is a command-line utility for running a series of learners on datasets specified in a configuration file. For more information about using run_experiment (including a quick example), see the run_experiment documentation.
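A configuration file for run_experiment might look roughly like the sketch below. The experiment_name and task fields are required (as noted in the release notes below); the remaining section and field names are illustrative placeholders, so check the run_experiment documentation for the exact keys your version expects.

```ini
[General]
experiment_name = iris_example
task = cross_validate

[Input]
; Paths, featureset names, and learner list below are hypothetical examples.
train_location = train/
featuresets = [["iris_features"]]
learners = ["LogisticRegression", "RandomForestClassifier"]

[Tuning]
grid_search = true
objective = f1_score_micro

[Output]
results = output/
```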
If you just want to avoid writing a lot of boilerplate learning code, you can use our simple Python API. The main way you’ll want to use the API is through the load_examples function and the Learner class. For more details on how to train, test, cross-validate, and run grid search on a variety of scikit-learn models, see the documentation.
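As a schematic sketch of that workflow (not runnable as-is: the file paths are placeholders, SKLL must be installed, and exact signatures may differ between versions, so consult the API documentation):

```python
from skll import Learner
from skll.data import load_examples

# Load training and test examples from .jsonlines, .megam, or .tsv files
# (the paths here are placeholders).
train_examples = load_examples('train/example.jsonlines')
test_examples = load_examples('test/example.jsonlines')

# model_type is a required argument to the Learner constructor
# (see the release notes below).
learner = Learner('LogisticRegression')
learner.train(train_examples)

# Evaluate on held-out data, or cross-validate on the training set instead.
results = learner.evaluate(test_examples)
```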
A Note on Pronunciation
SciKit-Learn Laboratory (SKLL) is pronounced “skull”: that’s where the learning happens.
- Updated generate_predictions to use latest API.
- Switched to using multiprocessing-compatible logging. This should fix some intermittent deadlocks.
- Switched to using miniconda for installing Python on Travis-CI.
- Fixed crash when modelpath is blank and task is not cross_validate.
- Fixed crash with convert_examples when given a generator.
- Refactored skll.data’s private _*_dict_iter functions to be classes to reduce code duplication.
- Fixed crash with SVR on Python 3 from kernel type being a byte string.
- Fixed crash with DecisionTreeRegressor due to an invalid criterion being set.
- Fixed setup.py issue where requirements weren’t being installed via pip.
- Added SKLL version number and Pearson correlation to result summary files.
- No longer crash if a result summary file doesn’t exist, and instead just print an error message.
- Tweak handling of logging under the hood to make sure logging settings are applied to all loggers.
- Fixed crash with GradientBoostingRegressor and MultinomialNB from typo in previous release.
- Fixed crash related to loading feature sets that contain files that are unlabelled.
- Fixed crash related to loading .megam files with unlabelled examples.
- Added new versions of the kappa metrics that do not penalize adjacent ratings. For example, when building the weights matrix, 1 and 2 are considered equal, whereas 1 and 3 have a difference of 1.
- Cleaned up a bit of the Sphinx documentation.
- Each module now has its own separate logger, which should make logging messages more informative.
- Made handling of non-convertible IDs when using ids_to_floats uniform across data file types. All will now raise a ValueError when faced with a string that does not look like a float.
- Now raise an error when duplicate feature names are encountered in .megam files.
- No longer set compute_importances for learners based on decision trees, since that is no longer necessary as of scikit-learn 0.14.
- Added support for DecisionTreeRegressor and RandomForestRegressor.
- Fixed issue #60 with filtering examples via the cv_folds_location file.
- Added unit tests for Grid Map mode on Travis using the scripts described by this gist.
- Switched from using mix-in to decorator for handling rescaled versions of regressors. Code’s a lot simpler now.
- Added support for suppressing “Loading…” messages to most functions in the experiments module.
- Refactored convert_examples to simply call new version of load_examples that can take a list of example dictionaries in addition to filenames.
- Made all unit tests much less verbose.
- Fixed an obscure issue related to loading examples when SKLL_MAX_CONCURRENT_PROCESSES is set to 1.
- Added warning when configuration files contain settings that are invalid.
- Fixed a crash caused by job_results not being defined in grid mode.
- Cleaned up a lot of things related to unit tests and their discovery.
- Added unit tests to manifest so that people who install this through pip could run the tests themselves if they wanted.
- Now raise an exception when using ids_to_floats with non-numeric IDs.
- Fixed a number of inconsistencies with cv_folds_location and ids_to_floats (including GH issue #57).
- Fixed unit tests for cv_folds_location and ids_to_floats so that they actually test the right things now.
- Fixed crash when using cv_folds_location with ids_to_floats.
- Will now skip IDs that are missing from cv_folds/grid_search_folds dicts and print a warning instead of crashing.
- Added additional kappa unit tests to help detect/prevent future issues.
- API change: model_type is no longer a keyword argument to Learner constructor, and is now required. This was done to help prevent unexpected issues from defaulting to LogisticRegression.
- No longer keep extra temporary config files around when running run_experiment in ablation mode.
- Fixed crash with kappa when given two sets of ratings that are both missing an intermediate value (e.g., [1, 2, 4]).
- Added summarize_results script for creating a nice summary TSV file from a list of JSON results files.
- Summary files for ablation studies now have an extra column that says which feature was removed.
- Added initial version of skll_convert script for converting between .jsonlines, .megam, and .tsv data file formats.
- Fixed bug in _megam_dict_iter where labels for instances with all zero features were being incorrectly set to None.
- Fixed bug in _tsv_dict_iter where features with zero values were being retained with values set as ‘0’ instead of being removed completely. This caused DictVectorizer to create extra features, so results may change a little bit if you were using .tsv files.
- Fixed crash with predict and train_only modes when running on the grid.
- No longer use process pools to load files if SKLL_MAX_CONCURRENT_PROCESSES is 1.
- Added more informative error message when trying to load a file without any features.
- Made processes non-daemonic to fix pool.map issue with running multiple configuration files at the same time with run_experiment.
- run_experiment can now take multiple configuration files.
- Fixed issue where model parameters and scores were missing in evaluate mode.
- Added skll.data.convert_examples function to convert a list of dictionaries to an ExamplesTuple.
- Added a new optional field to configuration file, ids_to_floats, to help save memory if you have a massive number of instances with numeric IDs.
- Replaced use_dense_features and scale_features options with feature_scaling. See the run_experiment documentation for details.
- Fixed summary output for ablation experiments. Previously summary files would not include all results.
- Added ablation unit tests.
- Fixed issue with generating PDF documentation.
- Added two new required fields to the configuration file format under the General heading: experiment_name and task. See the run_experiment documentation for details.
- Fixed an issue where the “loading…” message was never being printed when loading data files.
- Fixed a bug where keyword arguments were being ignored for metrics when calculating final scores for a tuned model. This means that previously reported results may be wrong for tuning metrics that use keyword arguments: f1_score_micro, f1_score_macro, linear_weighted_kappa, and quadratic_weighted_kappa.
- Now try to convert IDs to floats if they look like floats, to save memory with very large files.
- kappa now supports negative ratings.
- Fixed a crash when specifying grid_search_jobs and pre-specified folds.
- Hotfix to fix issue where grid_search_jobs setting was being overridden by grid_search_folds.
- Added skll.data.write_feature_file (also available as skll.write_feature_file) to simplify outputting .jsonlines, .megam, and .tsv files.
- Added more unit tests for handling .megam and .tsv files.
- Fixed a bug that caused a crash when using gridmap.
- grid_search_jobs now sets both n_jobs and pre_dispatch for GridSearchCV under the hood. This prevents a potential memory issue when dealing with large datasets and learners that cannot handle sparse data.
- Changed logging format when using run_experiment to be a little more readable.
- Fixed serious issue where merging feature sets was not working correctly. All experiments conducted using feature set merging (i.e., where you specified a list of feature files and had them merged into one set for training/testing) should be considered invalid. In general, your results should previously have been poor and now should be much better.
- Added more verbose regression output including descriptive statistics and Pearson correlation.
- Fixed all known remaining compatibility issues with Python 3.
- Fixed bug in skll.metrics.kappa which would raise an exception if full range of ratings was not seen in both y_true and y_pred. Also added a unit test to prevent future regressions.
- Added missing configuration file that would cause a unit test to fail.
- Slightly refactored skll.Learner._create_estimator to make it a lot simpler to add new learners/estimators in the future.
- Fixed a bug in handling of sparse matrices that would cause a crash if the number of features in the training and the test set were not the same. Also added a corresponding unit test to prevent future regressions.
- We now require the backported configparser module for Python 2.7 to make maintaining compatibility with both 2.x and 3.x a lot easier.
- Fixed bug introduced in v0.9.9 that broke predict mode.
- Automatically generate a result summary file with all results for the experiment in one TSV file.
- Fixed bug where printing predictions to file would cause a crash with some learners.
- Run unit tests for Python 3.3 as well as 2.7.
- More unit tests for increased coverage.
- Fixed crash due to trying to print name of grid objective which is now a str and not a function.
- Added --version option to shell scripts.
- Can now use any objective function scikit-learn supports for tuning (i.e., any valid argument for scorer when instantiating GridSearchCV) in addition to those we define.
- Removed ml_metrics dependency and we now support custom weights for kappa (through the API only so far).
- Requires scikit-learn 0.14+.
- accuracy, quadratic_weighted_kappa, unweighted_kappa, f1_score_micro, and f1_score_macro functions are no longer available under skll.metrics. The accuracy and f1 score ones are no longer needed because we just use the built-in ones. As for quadratic_weighted_kappa and unweighted_kappa, they’ve been superseded by the kappa function that takes a weights argument.
- Fixed issue where you couldn’t write prediction files if you were classifying using numeric classes.
- Fixed issue with setup.py importing from the package when trying to install it (for real this time).
- You can now include feature files that don’t have class labels in your featuresets. At least one feature file has to have a label though, because we only support supervised learning so far.
- Important: If you’re using TSV files in your experiments, you should either name the class label column ‘y’ or use the new tsv_label option in your configuration file to specify the name of the label column. This was necessary to support feature files without labels.
- Fixed an issue with how version number was being imported in setup.py that would prevent installation if you didn’t already have the prereqs installed on your machine.
- Made random seeds smaller to fix crash on 32-bit machines. This means that experiments run with previous versions of skll will yield slightly different results if you re-run them with v0.9.5+.
- Added megam_to_csv for converting .megam files to CSV/TSV files.
- Fixed a potential rounding problem with csv_to_megam that could slightly change feature values in the conversion process.
- Cleaned up test_skll.py a little bit.
- Updated documentation to include missing fields that can be specified in config files.
- Documentation fixes
- Added requirements.txt to manifest to fix broken PyPI release tarball.
- Fixed bug with merging feature sets that used to cause a crash.
- If you’re running scikit-learn 0.14+, we use their StandardScaler, since the bug fix we include in FixedStandardScaler is in there.
- Unit tests all pass again.
- Lots of little things related to using Travis (which do not affect users).
- Fixed example.cfg path issue. Updated some documentation.
- Made path in make_example_iris_data.py consistent with the updated one in example.cfg
- Fixed bug where classification experiments would raise an error about class labels not being floats.
- Updated documentation to include quick example for run_experiment.
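The adjacent-ratings kappa weighting described in the release notes above can be illustrated with a minimal sketch. This is a reimplementation of the linear variant for illustration only, not SKLL's actual code:

```python
def adjacent_linear_weights(n_ratings):
    """Build a linear kappa weight matrix that does not penalize
    adjacent ratings: weight[i][j] = max(|i - j| - 1, 0), so ratings
    one step apart count as full agreement, and the penalty grows
    linearly beyond that."""
    return [[max(abs(i - j) - 1, 0) for j in range(n_ratings)]
            for i in range(n_ratings)]

weights = adjacent_linear_weights(4)
# ratings 1 and 2 (indices 0 and 1) are treated as equal
# ratings 1 and 3 (indices 0 and 2) differ by 1
```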
Source distribution: skll-0.18.1.tar.gz (95.8 kB)