9 data sets for semi-supervised learning

Project description


This Python module provides functions to load the 9 data sets published in the book "Semi-Supervised Learning".

The data sets were converted from the MATLAB files available on Olivier Chapelle's web page.

Detailed description of the data

Each data set comes with 10 or 12 different splits, and users can choose how many labeled points their training procedure gets to see.

Labels are provided for all points (for benchmarking), but the benchmarks suggest using a fixed number of labels (10 or 100 for most sets).
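
To illustrate what a split with a fixed number of labels means in practice, here is a minimal sketch in plain Python. It does not use this module's loading functions; the names `apply_split`, `unlabeled_error` and `UNLABELED`, the synthetic labels, and the choice to measure error only on the hidden points are all assumptions made for the example.

```python
import random

# Marker for a hidden label (an assumption of this sketch, not part of the data format).
UNLABELED = None


def apply_split(labels, labeled_idx):
    """Hide every label except those at the positions listed in labeled_idx."""
    keep = set(labeled_idx)
    return [lab if i in keep else UNLABELED for i, lab in enumerate(labels)]


def unlabeled_error(true_labels, predictions, labeled_idx):
    """Error rate computed only over the points whose labels were hidden."""
    keep = set(labeled_idx)
    hidden = [i for i in range(len(true_labels)) if i not in keep]
    wrong = sum(true_labels[i] != predictions[i] for i in hidden)
    return wrong / len(hidden)


# Synthetic stand-in for one data set: binary labels for 1500 points.
rng = random.Random(0)
labels = [rng.choice([-1, 1]) for _ in range(1500)]

# A hypothetical split that reveals 10 labels to the learner.
labeled_idx = rng.sample(range(len(labels)), 10)
train_labels = apply_split(labels, labeled_idx)

# Trivial baseline: predict +1 everywhere, then score it on the hidden points.
predictions = [1] * len(labels)
print("labeled points:", sum(lab is not UNLABELED for lab in train_labels))
print("error on unlabeled points:", unlabeled_error(labels, predictions, labeled_idx))
```

A real experiment would repeat this over every split of a data set and average the resulting error rates.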

Full details about the benchmarks are provided in chapter 23 of the book (online here:

This code

  • This code is (c) Oliver Obst and released under the MIT License (see the LICENSE file).

  • If you use these data sets in your research, you can cite the SSL book:

      editor =	  {O. Chapelle and B. Sch{\"o}lkopf and A. Zien},
      title = 	  {Semi-Supervised Learning},
      publisher = {MIT Press},
      year = 	  2006,
      url =       {},
      address =	  {Cambridge, MA}

Download files

Files for sslbookdata, version 0.1

  • sslbookdata-0.1-py3-none-any.whl (32.5 MB): wheel, Python version py3

  • sslbookdata-0.1.tar.gz (32.5 MB): source distribution