Packaged data modules for multiview learning benchmarks

Multiview Data

  • Experimental package to give easy access to key toy and simulated datasets from the (deep) multiview learning literature
  • Feedback and contributions are welcome

Getting Started

Datasets are imported and built with the following syntax:

import os
from multiviewdata.torchdatasets import XRMB

# root is the directory used for the data; download=True requests a download
my_dataset = XRMB(root=os.getcwd(), download=True)

Each dataset item is a dictionary with a somewhat standardised layout:

my_dataset[0]['index'] # returns the index of the batch element
my_dataset[0]['views'] # returns a tuple/list of each view

Individual datasets may provide additional keys such as "label", "partial", and "userid". For more information, check the docs for each dataset.
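The item layout described above can be illustrated without downloading anything. The following is a minimal sketch, not part of the package: `ToyMultiviewDataset` and `summarise_item` are hypothetical names, and the toy class only mimics the "index", "views", and optional "label" keys mentioned above.

```python
from typing import Any, Dict, Sequence


class ToyMultiviewDataset:
    """Hypothetical stand-in that mimics the item layout described above."""

    def __init__(self, view_a: Sequence[Any], view_b: Sequence[Any], labels: Sequence[int]):
        assert len(view_a) == len(view_b) == len(labels)
        self.view_a, self.view_b, self.labels = view_a, view_b, labels

    def __len__(self) -> int:
        return len(self.view_a)

    def __getitem__(self, i: int) -> Dict[str, Any]:
        # "index" and "views" are the standard keys; "label" is an optional extra
        return {"index": i, "views": (self.view_a[i], self.view_b[i]), "label": self.labels[i]}


def summarise_item(item: Dict[str, Any]) -> Dict[str, Any]:
    """Read the standard keys and report which optional keys are present."""
    optional = [k for k in ("label", "partial", "userid") if k in item]
    return {"index": item["index"], "n_views": len(item["views"]), "optional_keys": optional}


toy = ToyMultiviewDataset([[0.1, 0.2], [0.3, 0.4]], [[1.0], [2.0]], [0, 1])
print(summarise_item(toy[1]))
```

Checking for optional keys before reading them keeps the same code usable across datasets that expose different extras.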

Roadmap

  • option to convert torch datasets to dictionaries of numpy arrays to allow for batch methods
  • additional datasets
  • standardised plotting functions for each dataset?
  • benchmarks?
  • tensorflow versions?
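The first roadmap item (converting datasets to dictionaries of numpy arrays) is not implemented in the package as far as this page states. One possible shape for it is sketched below; `to_numpy_dict` is a hypothetical helper, and it assumes every item has "index" and "views" keys, that each view can be passed to `np.asarray`, and that a given view has the same shape in every item.

```python
from typing import Any, Dict, List, Sequence

import numpy as np


def to_numpy_dict(dataset: Sequence[Dict[str, Any]]) -> Dict[str, Any]:
    """Stack all items of a dict-style dataset into arrays (hypothetical helper)."""
    n_views = len(dataset[0]["views"])
    indices: List[int] = []
    views: List[List[np.ndarray]] = [[] for _ in range(n_views)]
    # Index explicitly so that anything with __len__ and __getitem__ works
    for i in range(len(dataset)):
        item = dataset[i]
        indices.append(int(item["index"]))
        for v, view in enumerate(item["views"]):
            views[v].append(np.asarray(view))
    # One array per view, with samples along the first axis
    return {"index": np.asarray(indices), "views": [np.stack(v) for v in views]}


# Toy items in the documented layout: two views with 2 and 1 features
items = [{"index": i, "views": ([float(i), float(i) + 1.0], [float(i) * 2.0])} for i in range(4)]
arrays = to_numpy_dict(items)
print(arrays["views"][0].shape, arrays["views"][1].shape)
```

Holding each view as a single array with samples along the first axis is the layout that batch (non-iterative) methods typically expect.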

Sources

XRMB

https://home.ttic.edu/~klivescu/XRMB_data/full/README

This directory contains data based on the University of Wisconsin X-ray Microbeam Database (referred to here as XRMB).

The original XRMB manual can be found here: http://www.haskins.yale.edu/staff/gafos_downloads/ubdbman.pdf

We acknowledge John Westbury for providing the original data and for permitting this post-processed version to be redistributed. The original data collection was supported (in part) by research grant number R01 DC 00820 from the National Institute of Deafness and Other Communicative Disorders, U.S. National Institutes of Health.

The post-processed data provided here was produced as part of work supported in part by NSF grant IIS-1321015.

Some of the original XRMB articulatory data was missing due to issues such as pellet tracking errors. The data has been reconstructed using the algorithm described in this paper:

Wang, Arora, and Livescu, "Reconstruction of articulatory measurements with smoothed low-rank matrix completion," SLT 2014. http://ttic.edu/livescu/papers/wang_SLT2014.pdf

The data provided here has been used for multi-view acoustic feature learning in this paper:

Wang, Arora, Livescu, and Bilmes, "Unsupervised learning of acoustic features via deep canonical correlation analysis," ICASSP 2015. http://ttic.edu/livescu/papers/wang_ICASSP2015.pdf

If you use this version of the data, please cite the papers above.

WIW

https://github.com/rotmanguy/DPCCA MIT License

Cars3d

https://github.com/llvqi/multiview_and_self-supervision Apache License 2.0

MNIST

https://github.com/bcdutton/AdversarialCanonicalCorrelationAnalysis Unlicensed

MFeat

Twitter

https://github.com/abenton/wgcca MIT License

CUB Image-Caption

https://github.com/iffsid/mmvae GNU General Public License v3.0

We use the Caltech-UCSD Birds (CUB) dataset, with the bird images and their captions serving as two modalities.

MNIST-SVHN Dataset

https://github.com/iffsid/mmvae GNU General Public License v3.0

We construct a dataset of pairs of MNIST and SVHN such that each pair depicts the same digit class. Each instance of a digit class in either dataset is randomly paired with 20 instances of the same digit class from the other dataset.
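The class-matched pairing described above can be sketched as follows. This is an illustrative sketch only, not the code from the mmvae repository: `pair_by_class` and its parameters are hypothetical. For each class it shuffles the two index lists and zips them together, repeating this 20 times, so an instance gets 20 partners whenever its class has no more instances than in the other dataset.

```python
import random
from collections import Counter, defaultdict
from typing import Dict, List, Sequence, Tuple


def pair_by_class(
    labels_a: Sequence[int],
    labels_b: Sequence[int],
    pairs_per_instance: int = 20,
    seed: int = 0,
) -> List[Tuple[int, int]]:
    """Return (index_in_a, index_in_b) pairs that share a class label."""
    rng = random.Random(seed)
    by_class_a: Dict[int, List[int]] = defaultdict(list)
    by_class_b: Dict[int, List[int]] = defaultdict(list)
    for i, y in enumerate(labels_a):
        by_class_a[y].append(i)
    for j, y in enumerate(labels_b):
        by_class_b[y].append(j)

    pairs: List[Tuple[int, int]] = []
    for y in sorted(set(by_class_a) & set(by_class_b)):
        idx_a, idx_b = list(by_class_a[y]), list(by_class_b[y])
        for _ in range(pairs_per_instance):
            rng.shuffle(idx_a)
            rng.shuffle(idx_b)
            # zip stops at the shorter list, so surplus instances sit this round out
            pairs.extend(zip(idx_a, idx_b))
    return pairs


labels_a = [0, 0, 1, 1]  # stand-in for MNIST labels
labels_b = [1, 0, 0, 1]  # stand-in for SVHN labels
pairs = pair_by_class(labels_a, labels_b)
print(len(pairs), Counter(i for i, _ in pairs))
```

Pairs may repeat when classes are small, as in this toy example; with thousands of instances per digit class, repeats become unlikely.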
