Use fast FFT-based mutual information screening for large datasets. Works well on MRI brain imaging data. Developed by Kai Yang, [GPG Public key Fingerprint: B9F863A56220DBD56B91C3E835022A1A5941D810](https://keys.openpgp.org/vks/v1/by-fingerprint/B9F863A56220DBD56B91C3E835022A1A5941D810)
fastHDMI - Fast High-Dimensional Mutual Information Estimation
Kai Yang
Contact: <kai.yang2 "at" mail.mcgill.ca>
GPG Public Key Fingerprint: B9F863A56220DBD56B91C3E835022A1A5941D810
Fast mutual information estimation for high-dimensional data. See the paper: fastHDMI: Fast Mutual Information Estimation for High-Dimensional Data.
Installation
pip install fastHDMI
Usage
Basic MI Estimation
import fastHDMI
import numpy as np
# Generate sample data
x = np.random.randn(1000)
y = x + 0.5 * np.random.randn(1000)
# Estimate MI between continuous variables
mi = fastHDMI.MI_continuous_continuous(x, y, bw_multiplier=1.0)
print(f"MI: {mi}")
# MI between binary and continuous
binary_y = (y > 0).astype(int)
mi_binary = fastHDMI.MI_binary_continuous(binary_y, x, bw_multiplier=1.0)
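The sample data above is bivariate Gaussian, for which MI has a closed form: I(X;Y) = -0.5 * ln(1 - rho^2) nats. The sketch below computes that reference value with the standard library only, so an estimate can be compared against it. The helper name `gaussian_mi` is hypothetical and not part of fastHDMI, and whether fastHDMI reports MI in nats is an assumption worth verifying.

```python
import math


def gaussian_mi(rho: float) -> float:
    """Closed-form MI (in nats) of a bivariate Gaussian with correlation rho."""
    if not -1.0 < rho < 1.0:
        raise ValueError("rho must lie strictly between -1 and 1")
    return -0.5 * math.log(1.0 - rho ** 2)


# For y = x + 0.5 * noise with x, noise ~ N(0, 1):
# Var(y) = 1 + 0.25 = 1.25 and Cov(x, y) = 1, so rho = 1 / sqrt(1.25).
noise_scale = 0.5
rho = 1.0 / math.sqrt(1.0 + noise_scale ** 2)
reference_mi = gaussian_mi(rho)
print(f"rho = {rho:.4f}, reference MI = {reference_mi:.4f} nats")
# prints: rho = 0.8944, reference MI = 0.8047 nats
```

A kernel-based estimate on 1000 samples will not match this value exactly; it should only land in the same neighborhood.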
Feature Screening
# Screen features against continuous outcome
X = np.random.randn(100, 50)  # 100 samples, 50 features
y = X[:, 0] + 0.5 * X[:, 1] + np.random.randn(100) * 0.1
# Single-threaded screening
mi_scores = fastHDMI.continuous_screening_array(X, y)
# Parallel screening (faster for many features)
mi_scores_parallel = fastHDMI.continuous_screening_array_parallel(X, y, core_num=4)
# Indices of the 10 highest-scoring features, best first
top_features = np.argsort(mi_scores)[::-1][:10]
print(f"Top 10 features: {top_features}")
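To make concrete what a screening pass computes, here is a dependency-free sketch that scores each feature against the outcome and ranks the results. It uses a simple equal-width histogram plug-in estimator, which is a stand-in and not fastHDMI's FFT-based method; `histogram_mi` and the bin count are hypothetical choices made for illustration.

```python
import math
import random
from collections import Counter


def histogram_mi(x, y, bins=6):
    """Plug-in MI estimate (nats) from an equal-width 2D histogram."""
    def bin_index(values):
        lo, hi = min(values), max(values)
        width = (hi - lo) / bins or 1.0  # guard against constant columns
        return [min(int((v - lo) / width), bins - 1) for v in values]

    bx, by = bin_index(x), bin_index(y)
    n = len(x)
    joint = Counter(zip(bx, by))
    px, py = Counter(bx), Counter(by)
    # Sum p(a, b) * log(p(a, b) / (p(a) * p(b))) over occupied cells.
    return sum(
        (c / n) * math.log(c * n / (px[a] * py[b]))
        for (a, b), c in joint.items()
    )


rng = random.Random(0)
n_samples, n_features = 1000, 6
X = [[rng.gauss(0, 1) for _ in range(n_features)] for _ in range(n_samples)]
# Outcome depends on features 0 and 1 only, mirroring the example above.
y = [row[0] + 0.5 * row[1] + 0.1 * rng.gauss(0, 1) for row in X]

# Score every column against y, then rank columns from best to worst.
scores = [histogram_mi([row[j] for row in X], y) for j in range(n_features)]
ranked = sorted(range(n_features), key=lambda j: scores[j], reverse=True)
print("Features ranked by MI:", ranked)
```

Histogram estimates are biased upward for independent features, so the noise columns score slightly above zero; the ranking, not the absolute value, is what screening relies on.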
CSV File Screening
# Screen features from a CSV file (outcome in the first column)
mi_scores = fastHDMI.continuous_screening_csv_parallel(
    "data.csv",
    core_num=4
)
# Using sklearn MI estimation
mi_scores_sk = fastHDMI.continuous_skMI_screening_csv_parallel(
    "data.csv",
    n_neighbors=3,
    core_num=4
)
# Pearson correlation screening
correlations = fastHDMI.Pearson_screening_csv_parallel(
    "data.csv",
    core_num=4
)
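The CSV functions above expect the outcome in the first column. The sketch below writes a small file in that layout with the standard library, then reads the first line back to confirm the column order. The header row and the column names are assumptions for illustration; check how fastHDMI treats headers before relying on this layout.

```python
import csv
import os
import random
import tempfile

rng = random.Random(0)
n_samples, n_features = 20, 3
# Assumed layout: a header row, outcome first, then the feature columns.
header = ["outcome"] + [f"feature_{j}" for j in range(n_features)]

rows = []
for _ in range(n_samples):
    features = [rng.gauss(0, 1) for _ in range(n_features)]
    outcome = features[0] + 0.1 * rng.gauss(0, 1)
    rows.append([outcome] + features)  # outcome goes in the first column

csv_path = os.path.join(tempfile.mkdtemp(), "data.csv")
with open(csv_path, "w", newline="") as f:
    writer = csv.writer(f)
    writer.writerow(header)
    writer.writerows(rows)

# Read the first line back to confirm the column order.
with open(csv_path, newline="") as f:
    first_line = next(csv.reader(f))
print(first_line)
# prints: ['outcome', 'feature_0', 'feature_1', 'feature_2']
```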
DataFrame Screening
import pandas as pd
# Create DataFrame with outcome as first column
df = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(50)])
df.insert(0, 'outcome', y)
# Screen using DataFrame
mi_scores = fastHDMI.continuous_screening_dataframe(df)
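Screening scores are easier to read when paired with column names. The sketch below does that pairing in plain Python, assuming the returned scores hold one entry per feature column, in column order, with the outcome column excluded; that ordering is an assumption about the return value, and `top_named_features` is a hypothetical helper, not part of fastHDMI.

```python
def top_named_features(columns, scores, outcome="outcome", k=3):
    """Pair feature names with scores and return the k best, highest first."""
    feature_names = [c for c in columns if c != outcome]
    if len(feature_names) != len(scores):
        raise ValueError("expected one score per feature column")
    ranked = sorted(zip(feature_names, scores), key=lambda p: p[1], reverse=True)
    return ranked[:k]


columns = ["outcome", "feature_0", "feature_1", "feature_2", "feature_3"]
# Made-up scores for illustration only; not output from fastHDMI.
example_scores = [0.82, 0.11, 0.01, 0.02]
top = top_named_features(columns, example_scores, k=2)
print(top)
# prints: [('feature_0', 0.82), ('feature_1', 0.11)]
```

With the DataFrame from the example above, the call would look like `top_named_features(list(df.columns), list(mi_scores), k=10)`, under the same ordering assumption.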
Package Information
ABIDE Data Analysis
- Data: (pre-processed) ABIDE data
- Notebook: /paper/ABIDE_data_analysis/ABIDE_analysis.ipynb - generates scripts for fastHDMI analysis
- Execution: Run scripts on server (e.g., Compute Canada), then rerun notebook with results (.npy files) to generate plots
Computational Resources
- Resource logs: seff-[jobID].out files show job resource usage
- Compute Canada docs: docs.alliancecan.ca/wiki/Running_jobs
File details
Details for the file fasthdmi-1.25.26.tar.gz.
File metadata
- Download URL: fasthdmi-1.25.26.tar.gz
- Upload date:
- Size: 239.2 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c6f3afbc1e3b1ac8e87e1a9f4afef2c11d73bde3e97f68ac55ccf399713562b5 |
| MD5 | d2f95b9bb18bd71dc45320354f5fab35 |
| BLAKE2b-256 | 46fe655cdc69ae67f499dc385f9439f49841ad7a09634707bce8b11f86c5c33a |
File details
Details for the file fasthdmi-1.25.26-cp312-cp312-macosx_11_0_arm64.whl.
File metadata
- Download URL: fasthdmi-1.25.26-cp312-cp312-macosx_11_0_arm64.whl
- Upload date:
- Size: 345.7 kB
- Tags: CPython 3.12, macOS 11.0+ ARM64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.12.9
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ea0dd52ea0da614cb6835e49138fd6a56e145e85cb87d6d40fc35a2a2fd21adf |
| MD5 | ffc6339f35f69ef65482d0fa37cf10d7 |
| BLAKE2b-256 | 46ef302f962afb4e6afbb14c06f86a5a165df91886127e29786b8e2484cd996f |