A Python package for multi-modal learning with incomplete data

Overview

iMML is a Python package that provides a robust toolset for integrating, processing, and analyzing incomplete multi-modal datasets to support a wide range of machine learning tasks. Given a dataset of N samples described by K modalities, iMML handles missing data across classification, clustering, data retrieval, imputation and amputation, feature selection, feature extraction, and data exploration, enabling efficient analysis of partially observed samples.

Overview of iMML for multi-modal learning with incomplete data.
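To make the setting concrete, here is a minimal sketch, assuming (as in the usage examples below) that a dataset is a Python list of K arrays sharing the same N rows, and that a sample missing a whole modality is marked by an all-NaN row in that modality. The NaN-row convention is an assumption here, and the helper `observed_mask` is hypothetical, not part of iMML.

```python
import numpy as np

def observed_mask(Xs):
    """Hypothetical helper: (N, K) boolean matrix, True where sample i has modality k.

    Assumes every array in Xs has the same number of rows (samples) and that
    a missing modality is encoded as a row made entirely of NaN.
    """
    n_samples = Xs[0].shape[0]
    if any(X.shape[0] != n_samples for X in Xs):
        raise ValueError("all modalities must have the same number of samples")
    # A sample is observed in a modality unless its entire row is NaN.
    return np.column_stack([~np.isnan(X).all(axis=1) for X in Xs])

rng = np.random.default_rng(0)
# N = 6 samples, K = 3 modalities with different numbers of features.
Xs = [rng.random((6, d)) for d in (4, 3, 5)]
Xs[1][[0, 2]] = np.nan  # samples 0 and 2 lack modality 1
Xs[2][4] = np.nan       # sample 4 lacks modality 2

mask = observed_mask(Xs)
complete = mask.all(axis=1)
print(mask.sum(axis=0))  # observed samples per modality: [6 4 5]
print(complete.sum())    # samples observed in every modality: 3
```

A complete-case analysis would keep only 3 of the 6 samples in this toy example; methods designed for incomplete data can make use of all of them.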

Background

Multi-modal learning, where diverse data types are integrated and analyzed together, has emerged as a critical field in artificial intelligence. Multi-modal machine learning models that effectively integrate multiple data modalities generally outperform their uni-modal counterparts by leveraging more comprehensive and complementary information. However, most algorithms in this field assume fully observed data, an assumption that is often unrealistic in real-world scenarios.

Motivation

Learning from incomplete multi-modal data has grown considerably in recent years. Despite this progress, several limitations persist. The landscape of available methods is fragmented, largely because of the diversity of use cases and data modalities, which complicates both their application and benchmarking. Systematic use and comparison of current methods are often hindered by practical obstacles, such as incompatible input data formats and conflicting software dependencies. As a result, researchers and practitioners frequently struggle to choose a suitable method and spend considerable effort reconciling codebases rather than addressing the core scientific questions. This suggests that the community currently lacks robust, standardized tools for handling incomplete multi-modal data.

Key features

To address this gap, we have developed iMML, a Python package designed for multi-modal learning with incomplete data. The key features of this package are:

  • Comprehensive toolkit: iMML offers a broad set of tools for integrating, processing, and analyzing incomplete multi-modal datasets, exposed through a single, user-friendly interface to facilitate adoption by a wide community of users. The package includes extensive tests to ensure robustness, and thorough documentation helps end users apply its functionality effectively.
  • Accessible: iMML makes the tools readily available to the Python community, simplifying their usage, comparison, and benchmarking, and thereby addresses the current lack of resources and standardized methods for handling incomplete multi-modal data.
  • Extensible: iMML provides a common framework where researchers can contribute and integrate new approaches, serving as a community platform for hosting new algorithms and methods.

Installation

Run the following command to install the most recent release of iMML using pip:

pip install imml

Or if you prefer uv, use:

uv pip install imml

Some features of iMML rely on optional dependencies. To enable these additional features, ensure you install the required packages as described in our documentation: https://imml.readthedocs.io/stable/main/installation.html.

Usage

This package provides a user-friendly interface for applying its algorithms to user-provided data. iMML is designed to be compatible with widely used machine learning and data analysis tools, such as Pandas, NumPy, Scikit-learn, and Lightning AI, allowing researchers to apply machine learning models with minimal programming effort. Moreover, it can be integrated into Scikit-learn pipelines for data preprocessing and modeling.
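Scikit-learn compatibility rests on the estimator convention: `fit` learns from the data and returns the estimator itself, while `transform`, `fit_transform`, or `fit_predict` produce outputs. The usage examples below (`MOFA().fit_transform(Xs)`, `NEMO().fit_predict(Xs)`) follow this pattern. The following is a minimal sketch of that convention applied to a list of modalities; the `ConcatenateModalities` class is hypothetical and not part of iMML.

```python
import numpy as np

class ConcatenateModalities:
    """Hypothetical transformer (not part of iMML) illustrating the
    scikit-learn fit/transform convention on a list of modalities."""

    def fit(self, Xs, y=None):
        # Learn only the per-modality widths; return self so calls can chain.
        self.n_features_ = [X.shape[1] for X in Xs]
        return self

    def transform(self, Xs):
        if [X.shape[1] for X in Xs] != self.n_features_:
            raise ValueError("modality widths differ from those seen in fit")
        # Stack modalities side by side into one (N, sum of widths) matrix.
        return np.hstack(Xs)

    def fit_transform(self, Xs, y=None):
        return self.fit(Xs, y).transform(Xs)

Xs = [np.random.random((10, 5)) for _ in range(3)]
X = ConcatenateModalities().fit_transform(Xs)
print(X.shape)  # (10, 15)
```

Because an object like this exposes `fit` and `transform`, it can serve as a step in a `sklearn.pipeline.Pipeline` ahead of a standard single-matrix estimator. The iMML documentation describes the exact interface of its own transformers.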

For this demonstration, we generate a random dataset, Xs, to simulate a multi-modal scenario:

import numpy as np
# Three modalities, each a (10 samples x 5 features) array; replace with your own multi-modal dataset.
Xs = [np.random.random((10, 5)) for _ in range(3)]
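Real modalities often come from separate tables covering overlapping but different sets of samples. This is a minimal sketch of aligning them with pandas, assuming each table is indexed by a sample ID and that a missing modality should appear as an all-NaN row. Both the NaN-row convention and the table names and IDs are assumptions made for illustration.

```python
import numpy as np
import pandas as pd

# Two hypothetical modalities measured on overlapping sets of samples.
rna = pd.DataFrame(np.arange(6.0).reshape(3, 2), index=["s1", "s2", "s3"],
                   columns=["g1", "g2"])
methyl = pd.DataFrame(np.arange(9.0).reshape(3, 3), index=["s2", "s3", "s4"],
                      columns=["m1", "m2", "m3"])
modalities = [rna, methyl]

# Union of sample IDs, sorted so that every modality shares one row order.
samples = sorted(set().union(*(df.index for df in modalities)))

# Assumption: missing modalities are encoded as NaN rows; reindex inserts
# an all-NaN row for each sample that a modality does not cover.
Xs = [df.reindex(samples) for df in modalities]
print([X.shape for X in Xs])  # [(4, 2), (4, 3)]
```

Whether to pass the aligned DataFrames directly or their `.to_numpy()` arrays depends on the estimator; check the documentation of the method you use.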

You can use any other complete or incomplete multi-modal dataset. Once you have your dataset ready, you can leverage the iMML library for a wide range of machine learning tasks, such as:

  • Decompose a multi-modal dataset using MOFA to capture joint information.
from imml.decomposition import MOFA
transformed_Xs = MOFA().fit_transform(Xs)
  • Cluster samples from a multi-modal dataset using NEMO to find hidden groups.
from imml.cluster import NEMO
labels = NEMO().fit_predict(Xs)
  • Simulate incomplete multi-modal datasets for evaluation and testing purposes using Amputer.
from imml.ampute import Amputer
transformed_Xs = Amputer(p=0.8).fit_transform(Xs)
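Amputation removes entire modalities for some samples, so methods can be evaluated against data whose ground truth is known. The sketch below illustrates the idea with plain NumPy. `ampute_modalities` is a hypothetical helper, not iMML's `Amputer`, and its reading of `p` as the fraction of samples made incomplete is an assumption, not a statement of how `Amputer(p=...)` is defined.

```python
import numpy as np

def ampute_modalities(Xs, p, seed=None):
    """Hypothetical helper: blank out whole modalities for a fraction p of samples.

    Assumption: a missing modality is an all-NaN row. Every amputed sample loses
    at least one modality but keeps at least one, so none becomes fully unobserved.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_mods = Xs[0].shape[0], len(Xs)
    if n_mods < 2:
        raise ValueError("need at least two modalities to ampute")
    Xs = [np.array(X, dtype=float) for X in Xs]  # copy; NaN needs a float dtype
    n_amputed = int(round(p * n_samples))
    for i in rng.choice(n_samples, size=n_amputed, replace=False):
        # Remove between 1 and K-1 modalities from sample i.
        n_drop = rng.integers(1, n_mods)
        for k in rng.choice(n_mods, size=n_drop, replace=False):
            Xs[k][i] = np.nan
    return Xs

Xs = [np.random.random((10, 5)) for _ in range(3)]
amputed = ampute_modalities(Xs, p=0.8, seed=0)
observed = np.column_stack([~np.isnan(X).all(axis=1) for X in amputed])
print(observed.all(axis=1).sum())  # complete samples left: 2
```

Keeping at least one modality per sample avoids creating rows with no information at all, which most incomplete-data methods cannot use.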

Free software

iMML is free software; you can redistribute it and/or modify it under the terms of the BSD 3-Clause License.

Contribute

We welcome practitioners, researchers, and the open-source community to contribute to the iMML project, thereby helping us extend and refine the library. Such a community-wide effort will make iMML more versatile, sustainable, powerful, and accessible to the machine learning community across many domains.

Project roadmap

Our vision is to establish iMML as a leading, reliable library for multi-modal learning in both research and applied settings. To that end, our priorities are to broaden algorithmic coverage, improve performance and scalability, strengthen interoperability, and grow a healthy contributor community.

Download files

Source Distribution

imml-0.1.0.tar.gz (303.3 kB)

Uploaded Source

Built Distribution

imml-0.1.0-py3-none-any.whl (408.7 kB)

Uploaded Python 3

File details

Details for the file imml-0.1.0.tar.gz.

File metadata

  • Download URL: imml-0.1.0.tar.gz
  • Upload date:
  • Size: 303.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.22

File hashes

Hashes for imml-0.1.0.tar.gz
Algorithm Hash digest
SHA256 e14cadb3d472359f4cc117b4ee07b5b8b556a613313c02cacfdc0d9c99083f73
MD5 676b9ef994eb744cf45577c65a7052ec
BLAKE2b-256 7320f744600f907655809416ba47bcf3cf8c368d1f222293c89095455b6e3003


File details

Details for the file imml-0.1.0-py3-none-any.whl.

File metadata

  • Download URL: imml-0.1.0-py3-none-any.whl
  • Upload date:
  • Size: 408.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: uv/0.8.22

File hashes

Hashes for imml-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 783284a9e2f0c5853bcd657e85e81fb4dff1e93f67fd107f3e6f7a95db25bf98
MD5 9c6281998ecfa7667f70e052424fba4f
BLAKE2b-256 17af32a43e4086185ffc361f97296a97e2c954451e60e3029b17d52e3fe01b0e
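The SHA256 digests above can be used to check that a downloaded file is intact. Here is a minimal sketch using Python's standard `hashlib`; the helper names `sha256_of` and `verify` are illustrative, and the path is wherever you saved the download.

```python
import hashlib

def sha256_of(path, chunk_size=1 << 16):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read fixed-size chunks until EOF so large files need little memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

def verify(path, expected_sha256):
    """True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.lower()

# Example with the published digest of the wheel listed above:
# verify("imml-0.1.0-py3-none-any.whl",
#        "783284a9e2f0c5853bcd657e85e81fb4dff1e93f67fd107f3e6f7a95db25bf98")
```

pip can enforce the same check at install time through its hash-checking mode (`--require-hashes` with `--hash=sha256:...` entries in a requirements file).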
