A Python package for multi-modal learning with incomplete data
Overview
iMML is a Python package that provides a robust toolset for integrating, processing, and analyzing incomplete multi-modal datasets in support of a wide range of machine learning tasks. Given a dataset of N samples and K modalities, iMML handles missing data for classification, clustering, data retrieval, imputation and amputation, feature selection, feature extraction, and data exploration, which enables efficient analysis of partially observed samples.
Overview of iMML for multi-modal learning with incomplete data.
Background
Multi-modal learning, where diverse data types are integrated and analyzed together, has emerged as a critical field in artificial intelligence. Multi-modal machine learning models that effectively integrate multiple data modalities generally outperform their uni-modal counterparts by leveraging more comprehensive and complementary information. However, most algorithms in this field assume fully observed data, an assumption that is often unrealistic in real-world scenarios.
Motivation
Learning from incomplete multi-modal data has grown considerably in recent years. Despite this progress, several limitations persist. The landscape of available methods is fragmented, largely due to the diversity of use cases and data modalities, which complicates both their application and benchmarking. Systematic use and comparison of current methods are often hindered by practical obstacles, such as incompatible input data formats and conflicting software dependencies. As a result, researchers and practitioners frequently struggle to choose a practical method and invest considerable effort in reconciling codebases rather than addressing the core scientific questions. This suggests that the community currently lacks robust and standardized tools to effectively handle incomplete multi-modal data.
Key features
To address this gap, we have developed iMML, a Python package designed for multi-modal learning with incomplete data. The key features of this package are:
- Comprehensive toolkit: iMML offers a broad set of tools for integrating, processing, and analyzing incomplete multi-modal datasets, exposed through a single, user-friendly interface to facilitate adoption by a wide community of users. The package includes extensive technical testing to ensure robustness, and thorough documentation enables end-users to apply its functionality effectively.
- Accessible: iMML makes the tools readily available to the Python community, simplifying their usage, comparison, and benchmarking, and thereby addresses the current lack of resources and standardized methods for handling incomplete multi-modal data.
- Extensible: iMML provides a common framework where researchers can contribute and integrate new approaches, serving as a community platform for hosting new algorithms and methods.
Installation
Run the following command to install the most recent release of iMML using pip:

```bash
pip install imml
```

Or if you prefer uv, use:

```bash
uv pip install imml
```
Some features of iMML rely on optional dependencies. To enable these additional features, ensure you install the required packages as described in our documentation: https://imml.readthedocs.io/stable/main/installation.html.
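After installing, a quick check from Python can confirm that the package is visible to the current interpreter. The sketch below relies only on the standard library's `importlib.metadata`. The helper name `installed_version` is hypothetical and not part of iMML, and the distribution name `imml` is taken from the pip command above.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional


def installed_version(dist_name: str) -> Optional[str]:
    """Return the installed version of a distribution, or None if absent."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


found = installed_version("imml")
if found is None:
    print("imml is not installed in this environment")
else:
    print(f"imml {found} is installed")
```

This only reports the core package. It does not check whether any optional dependencies are present.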
Usage
This package provides a user-friendly interface to apply its algorithms to user-provided data. iMML was designed to be compatible with widely used machine learning and data analysis tools, such as Pandas, NumPy, Scikit-learn, and Lightning AI, allowing researchers to apply machine learning models with minimal programming effort. Moreover, it can be easily integrated into Scikit-learn pipelines for data preprocessing and modeling.
For this demonstration, we generate a random dataset, called `Xs`, to simulate a multi-modal scenario:
```python
import numpy as np
Xs = [np.random.random((10,5)) for i in range(3)] # or your multi-modal dataset
```
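To experiment with incomplete data by hand, one option is to mark a missing modality by filling that sample's row with NaN. This is an assumed convention used here for illustration only, so check the iMML documentation for the encoding the library actually expects. The sketch uses only NumPy, and the names `Xs_incomplete`, `missing`, and `observed` are hypothetical.

```python
import numpy as np

rng = np.random.default_rng(0)

# Three modalities, ten samples each, five features per modality.
Xs = [rng.random((10, 5)) for _ in range(3)]

# Work on copies so the complete dataset stays untouched.
Xs_incomplete = [X.copy() for X in Xs]

# Hypothetical missingness pattern: (sample index, modality index) pairs.
missing = [(0, 1), (3, 0), (3, 2), (7, 1)]
for sample, modality in missing:
    # Assumed convention: an all-NaN row means the modality was not observed.
    Xs_incomplete[modality][sample, :] = np.nan

# Boolean matrix of shape (n_samples, n_modalities): True where observed.
observed = np.column_stack(
    [~np.isnan(X).all(axis=1) for X in Xs_incomplete]
)

print(observed.sum(axis=0))        # observed samples per modality: [9 8 9]
print(observed.all(axis=1).sum())  # samples with every modality present: 7
```

An explicit `observed` matrix like this makes it easy to see how many samples are fully observed before choosing a method.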
You can use any other complete or incomplete multi-modal dataset. Once you have your dataset ready, you can leverage the iMML library for a wide range of machine learning tasks, such as:
- Decompose a multi-modal dataset using `MOFA` to capture joint information.

```python
from imml.decomposition import MOFA
transformed_Xs = MOFA().fit_transform(Xs)
```
- Cluster samples from a multi-modal dataset using `NEMO` to find hidden groups.

```python
from imml.cluster import NEMO
labels = NEMO().fit_predict(Xs)
```
- Simulate incomplete multi-modal datasets for evaluation and testing purposes using `Amputer`.

```python
from imml.ampute import Amputer
transformed_Xs = Amputer(p=0.8).fit_transform(Xs)
```
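For intuition about what amputation involves, the following NumPy sketch removes whole modalities from a random subset of samples. It is not iMML's implementation. The function `ampute_randomly` is hypothetical, and it assumes that `p` is the fraction of samples made incomplete, that missing modalities are encoded as all-NaN rows, and that every affected sample keeps at least one modality.

```python
import numpy as np


def ampute_randomly(Xs, p, seed=None):
    """Return copies of the modalities with some modality rows set to NaN.

    Assumptions (for illustration only): `p` is the fraction of samples that
    become incomplete, and every incomplete sample keeps at least one modality.
    """
    rng = np.random.default_rng(seed)
    n_samples, n_mods = Xs[0].shape[0], len(Xs)
    if n_mods < 2:
        raise ValueError("need at least two modalities to ampute")
    out = [np.array(X, dtype=float, copy=True) for X in Xs]

    # Choose which samples become incomplete.
    n_incomplete = int(round(p * n_samples))
    chosen = rng.choice(n_samples, size=n_incomplete, replace=False)

    for i in chosen:
        # Drop between 1 and n_mods - 1 modalities, so one always remains.
        n_drop = rng.integers(1, n_mods)
        for m in rng.choice(n_mods, size=n_drop, replace=False):
            out[m][i, :] = np.nan
    return out


Xs = [np.random.default_rng(k).random((10, 5)) for k in range(3)]
amputed = ampute_randomly(Xs, p=0.8, seed=0)

observed = np.column_stack([~np.isnan(X).all(axis=1) for X in amputed])
print("incomplete samples:", int((~observed.all(axis=1)).sum()))
print("samples with no modality left:", int((~observed.any(axis=1)).sum()))
```

With ten samples and `p=0.8`, eight samples lose at least one modality and none lose all of them. Other missingness mechanisms (for example, patterns that depend on the data) would need a different sampling step.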
Free software
iMML is free software; you can redistribute it and/or modify it under the terms of the BSD 3-Clause License.
Contribute
We welcome practitioners, researchers, and the open-source community to contribute to the iMML project and, in doing so, help us extend and refine the library. Such a community-wide effort will make iMML more versatile, sustainable, powerful, and accessible to the machine learning community across many domains.
Project roadmap
Our vision is to establish iMML as a leading and reliable library for multi-modal learning across research and applied settings. Our priorities are therefore to broaden algorithmic coverage, improve performance and scalability, strengthen interoperability, and grow a healthy contributor community.
File details
Details for the file imml-0.1.0.tar.gz.
File metadata
- Download URL: imml-0.1.0.tar.gz
- Upload date:
- Size: 303.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.22
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | e14cadb3d472359f4cc117b4ee07b5b8b556a613313c02cacfdc0d9c99083f73 |
| MD5 | 676b9ef994eb744cf45577c65a7052ec |
| BLAKE2b-256 | 7320f744600f907655809416ba47bcf3cf8c368d1f222293c89095455b6e3003 |
File details
Details for the file imml-0.1.0-py3-none-any.whl.
File metadata
- Download URL: imml-0.1.0-py3-none-any.whl
- Upload date:
- Size: 408.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.8.22
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 783284a9e2f0c5853bcd657e85e81fb4dff1e93f67fd107f3e6f7a95db25bf98 |
| MD5 | 9c6281998ecfa7667f70e052424fba4f |
| BLAKE2b-256 | 17af32a43e4086185ffc361f97296a97e2c954451e60e3029b17d52e3fe01b0e |