
9 data sets for semi-supervised learning

This Python module provides functions to load the 9 benchmark data sets published in the book "Semi-Supervised Learning".

The data sets were converted from the MATLAB files available on Olivier Chapelle's web page.

Detailed description of the data

Each data set comes with 10 or 12 different splits, and users can choose how many labeled points their training gets to see.
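To illustrate what a split means in practice, here is a minimal sketch, assuming a split is represented as a list of indices of the points whose labels the learner may see. That representation and the helper name `mask_labels` are hypothetical and not part of this module's API. Every label outside the split is hidden. Because -1 can be a genuine class label (for example in tasks coded as -1/+1), the sketch marks unlabeled points with `None` rather than -1.

```python
from typing import Hashable, List, Optional, Sequence


def mask_labels(y: Sequence[Hashable], labeled_idx: Sequence[int]) -> List[Optional[Hashable]]:
    """Hypothetical helper: keep only the labels of the points listed in labeled_idx.

    Every other entry is replaced by None to mark it as unlabeled.
    Assumes a split is given as a sequence of integer indices into y.
    """
    n = len(y)
    keep = set(labeled_idx)
    # Reject indices that do not point into y.
    bad = sorted(i for i in keep if not 0 <= i < n)
    if bad:
        raise IndexError(f"split contains out-of-range indices: {bad}")
    return [label if i in keep else None for i, label in enumerate(y)]


# Toy example: 6 points, of which a split exposes only 2 labels.
y = [1, -1, 1, 1, -1, -1]
split = [0, 4]
print(mask_labels(y, split))  # [1, None, None, None, -1, None]
```

The learner is then trained on the masked vector, while the full label vector is kept aside for evaluation.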

Labels are provided for all points (for benchmarking), but the benchmarks suggest using only a fixed number of labels (10 or 100 for most sets).
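Since every point carries a true label, the labels withheld from training can be used for scoring. One common approach, and the one assumed in this sketch, is to measure the error only on the points that were not in the labeled part of a split and then average over all splits. The function names and the index-list representation of a split are hypothetical, not this module's API.

```python
from statistics import mean
from typing import Hashable, Sequence


def unlabeled_error(
    y_true: Sequence[Hashable],
    y_pred: Sequence[Hashable],
    labeled_idx: Sequence[int],
) -> float:
    """Fraction of misclassified points among those NOT in labeled_idx."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    keep = set(labeled_idx)
    test_idx = [i for i in range(len(y_true)) if i not in keep]
    if not test_idx:
        raise ValueError("no unlabeled points to evaluate on")
    wrong = sum(1 for i in test_idx if y_true[i] != y_pred[i])
    return wrong / len(test_idx)


def mean_error_over_splits(
    y_true: Sequence[Hashable],
    predictions: Sequence[Sequence[Hashable]],
    splits: Sequence[Sequence[int]],
) -> float:
    """Average unlabeled_error over several splits; predictions[j] belongs to splits[j]."""
    if len(predictions) != len(splits):
        raise ValueError("need one prediction vector per split")
    return mean(unlabeled_error(y_true, p, s) for p, s in zip(predictions, splits))


# Toy example with two splits of two labeled points each.
y_true = [1, -1, 1, 1, -1, -1]
splits = [[0, 4], [1, 2]]
predictions = [[1, -1, 1, -1, -1, -1], [1, -1, 1, 1, -1, -1]]
print(mean_error_over_splits(y_true, predictions, splits))  # 0.125
```

In the toy example, the first split misclassifies 1 of its 4 unlabeled points (error 0.25) and the second misclassifies none, so the mean is 0.125.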

Full details about the benchmarks are provided in chapter 23 of the book (online here:

This code

  • This code is (c) Oliver Obst and has been released under the MIT License (see the LICENSE file).

  • If you use these data sets in your research, you can cite the SSL book:

      editor    = {O. Chapelle and B. Sch{\"o}lkopf and A. Zien},
      title     = {Semi-Supervised Learning},
      publisher = {MIT Press},
      year      = 2006,
      url       = {},
      address   = {Cambridge, MA}


Download files

Download the file for your platform.

Source Distribution

sslbookdata-0.1.tar.gz (32.5 MB)

Uploaded source

Built Distribution

sslbookdata-0.1-py3-none-any.whl (32.5 MB)

Uploaded py3
