Use fast FFT-based mutual information screening for large datasets. Works well on MRI brain imaging data. Developed by Kai Yang, [GPG Public key Fingerprint: B9F863A56220DBD56B91C3E835022A1A5941D810](https://keys.openpgp.org/vks/v1/by-fingerprint/B9F863A56220DBD56B91C3E835022A1A5941D810)

fastHDMI - Fast High-Dimensional Mutual Information Estimation

Kai Yang

Contact: <kai.yang2 "at" mail.mcgill.ca>

GPG Public Key Fingerprint: B9F863A56220DBD56B91C3E835022A1A5941D810

fastHDMI provides fast mutual information (MI) estimation for high-dimensional data. See the paper "fastHDMI: Fast Mutual Information Estimation for High-Dimensional Data".
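To illustrate the general idea behind FFT-accelerated MI estimation, here is a minimal NumPy sketch. It is not the package's implementation: the function name fft_kde_mi, the grid size, and the n^(-1/6) bandwidth rule are assumptions chosen for illustration. The sample is binned on a regular grid, smoothed with a Gaussian kernel through the convolution theorem, and the plug-in MI formula is applied to the smoothed cell probabilities.

```python
import numpy as np

def fft_kde_mi(x, y, grid=128, bw_multiplier=1.0):
    """Plug-in MI estimate (in nats) from an FFT-smoothed 2-D histogram.

    Illustrative sketch only; not the fastHDMI implementation.
    """
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n = x.size
    # Standardize so that a single bandwidth suits both axes.
    x = (x - x.mean()) / x.std()
    y = (y - y.mean()) / y.std()
    h = bw_multiplier * n ** (-1.0 / 6.0)  # assumed Scott-type rate for 2-D data

    # Bin the sample on a regular grid padded by a few bandwidths.
    lo = min(x.min(), y.min()) - 4.0 * h
    hi = max(x.max(), y.max()) + 4.0 * h
    edges = np.linspace(lo, hi, grid + 1)
    step = edges[1] - edges[0]
    counts, _, _ = np.histogram2d(x, y, bins=[edges, edges])

    # Gaussian kernel sampled at grid offsets, stored in wrap-around order.
    size = 2 * grid  # zero padding keeps the circular convolution from wrapping
    offsets = np.fft.fftfreq(size, d=1.0 / size) * step
    k1 = np.exp(-0.5 * (offsets / h) ** 2)
    k1 /= k1.sum()
    kernel = np.outer(k1, k1)

    # Convolution theorem: multiply spectra instead of summing kernels per point.
    spectrum = np.fft.rfft2(counts, s=(size, size)) * np.fft.rfft2(kernel)
    smooth = np.fft.irfft2(spectrum, s=(size, size))[:grid, :grid]

    # Normalize to cell probabilities, then apply the plug-in MI formula.
    p = np.clip(smooth, 0.0, None)
    p /= p.sum()
    px = p.sum(axis=1, keepdims=True)
    py = p.sum(axis=0, keepdims=True)
    mask = p > 1e-12
    return float(np.sum(p[mask] * np.log(p[mask] / (px * py)[mask])))

rng = np.random.default_rng(0)
x = rng.standard_normal(2000)
y = x + 0.5 * rng.standard_normal(2000)  # dependent on x
z = rng.standard_normal(2000)            # independent of x
mi_dep = fft_kde_mi(x, y)
mi_ind = fft_kde_mi(x, z)
print(f"dependent: {mi_dep:.3f} nats, independent: {mi_ind:.3f} nats")
```

The smoothing cost depends on the grid size rather than on the number of samples, which is what makes this style of estimator attractive when many features must be screened. Kernel smoothing tends to pull the estimate below the true value for strongly dependent pairs, so the scores are best read as a ranking signal.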

Installation

pip install fastHDMI

Usage

Basic MI Estimation

import fastHDMI
import numpy as np

# Generate sample data
x = np.random.randn(1000)
y = x + 0.5 * np.random.randn(1000)

# Estimate MI between continuous variables
mi = fastHDMI.MI_continuous_continuous(x, y, bw_multiplier=1.0)
print(f"MI: {mi}")

# MI between binary and continuous
binary_y = (y > 0).astype(int)
mi_binary = fastHDMI.MI_binary_continuous(binary_y, x, bw_multiplier=1.0)
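As a sanity check for the continuous example above, the simulated pair is bivariate Gaussian, so its true MI has the closed form -0.5 * log(1 - rho^2). The sketch below computes that reference value. It uses only NumPy and makes no fastHDMI calls; whether fastHDMI reports MI in nats is an assumption to verify against the package documentation.

```python
import numpy as np

def gaussian_mi(rho):
    """True MI (in nats) of a bivariate Gaussian with correlation rho."""
    return -0.5 * np.log(1.0 - rho ** 2)

# y = x + noise_sd * e with x, e ~ N(0, 1) gives rho = 1 / sqrt(1 + noise_sd^2).
noise_sd = 0.5
rho = 1.0 / np.sqrt(1.0 + noise_sd ** 2)
reference_mi = gaussian_mi(rho)
print(f"rho = {rho:.4f}, reference MI = {reference_mi:.4f} nats")
# prints: rho = 0.8944, reference MI = 0.8047 nats
```

A kernel-based estimate from 1000 samples will not match this value exactly, but it should land in the same general range.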

Feature Screening

# Screen features against continuous outcome
X = np.random.randn(100, 50)  # 100 samples, 50 features
y = X[:, 0] + 0.5 * X[:, 1] + np.random.randn(100) * 0.1

# Single-threaded screening
mi_scores = fastHDMI.continuous_screening_array(X, y)

# Parallel screening (faster for many features)
mi_scores_parallel = fastHDMI.continuous_screening_array_parallel(X, y, core_num=4)

# Find the top 10 features, most informative first
top_features = np.argsort(mi_scores)[::-1][:10]
print(f"Top 10 features: {top_features}")
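For comparison, a plain NumPy baseline can apply the same screen-then-rank pattern with absolute Pearson correlation as the score. This is a sketch under stated assumptions: it does not call fastHDMI, and it uses 500 samples (a choice made here, not taken from the example above) so that the two informative features stand out clearly from the noise features.

```python
import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 500, 50
X = rng.standard_normal((n_samples, n_features))
y = X[:, 0] + 0.5 * X[:, 1] + 0.1 * rng.standard_normal(n_samples)

# Pearson correlation of every column with y, computed in one pass.
Xc = X - X.mean(axis=0)
yc = y - y.mean()
corr = (Xc.T @ yc) / (np.linalg.norm(Xc, axis=0) * np.linalg.norm(yc))

# Rank by absolute correlation, strongest first, and keep the top k.
scores = np.abs(corr)
k = 10
top_k = np.argsort(scores)[::-1][:k]
print(f"Top {k} features by |Pearson r|: {top_k}")
```

Pearson correlation only detects linear association, so the two rankings can differ when a feature affects the outcome nonlinearly; that difference is the usual motivation for MI-based screening.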

CSV File Screening

# Screen features from CSV file (outcome in first column)
mi_scores = fastHDMI.continuous_screening_csv_parallel(
    "data.csv",
    core_num=4
)

# Using sklearn MI estimation
mi_scores_sk = fastHDMI.continuous_skMI_screening_csv_parallel(
    "data.csv", 
    n_neighbors=3,
    core_num=4
)

# Pearson correlation screening
correlations = fastHDMI.Pearson_screening_csv_parallel(
    "data.csv",
    core_num=4
)
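The CSV functions above expect the outcome in the first column. The sketch below writes a small file in that layout with NumPy so the calls can be tried on known data. The header row and the column names are assumptions made for illustration; check the package documentation for whether a header is expected.

```python
import os
import tempfile

import numpy as np

rng = np.random.default_rng(0)
n_samples, n_features = 100, 5
X = rng.standard_normal((n_samples, n_features))
y = X[:, 0] + 0.1 * rng.standard_normal(n_samples)

# Outcome first, then one column per feature.
table = np.column_stack([y, X])
header = ",".join(["outcome"] + [f"feature_{i}" for i in range(n_features)])

# Written to a temporary directory; the column names are hypothetical.
csv_path = os.path.join(tempfile.mkdtemp(), "data.csv")
np.savetxt(csv_path, table, delimiter=",", header=header, comments="")
print(f"Wrote {table.shape[0]} rows x {table.shape[1]} columns to {csv_path}")
```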

DataFrame Screening

import pandas as pd

# Create DataFrame with outcome as first column
df = pd.DataFrame(X, columns=[f'feature_{i}' for i in range(50)])
df.insert(0, 'outcome', y)

# Screen using DataFrame
mi_scores = fastHDMI.continuous_screening_dataframe(df)

Package Information

ABIDE Data Analysis

Computational Resources

Download files

Source Distribution

fasthdmi-1.25.26.tar.gz (239.2 kB view details)

Uploaded Source

Built Distribution

fasthdmi-1.25.26-cp312-cp312-macosx_11_0_arm64.whl (345.7 kB view details)

Uploaded CPython 3.12, macOS 11.0+ ARM64

File details

Details for the file fasthdmi-1.25.26.tar.gz.

File metadata

  • Download URL: fasthdmi-1.25.26.tar.gz
  • Upload date:
  • Size: 239.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for fasthdmi-1.25.26.tar.gz
Algorithm Hash digest
SHA256 c6f3afbc1e3b1ac8e87e1a9f4afef2c11d73bde3e97f68ac55ccf399713562b5
MD5 d2f95b9bb18bd71dc45320354f5fab35
BLAKE2b-256 46fe655cdc69ae67f499dc385f9439f49841ad7a09634707bce8b11f86c5c33a
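
A downloaded archive can be checked against the SHA256 digest listed above with Python's standard hashlib module. This is a minimal sketch: the helper names and the local file path are hypothetical, and the expected digest is copied from the table above.

```python
import hashlib

def sha256_of_file(path, chunk_size=1 << 20):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()

# Digest published for fasthdmi-1.25.26.tar.gz.
EXPECTED_SHA256 = "c6f3afbc1e3b1ac8e87e1a9f4afef2c11d73bde3e97f68ac55ccf399713562b5"

def matches_published_digest(path, expected=EXPECTED_SHA256):
    """True if the file at `path` has the expected SHA256 digest."""
    return sha256_of_file(path) == expected.lower()

# Example (hypothetical local path):
# print(matches_published_digest("fasthdmi-1.25.26.tar.gz"))
```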

File details

Details for the file fasthdmi-1.25.26-cp312-cp312-macosx_11_0_arm64.whl.

File hashes

Hashes for fasthdmi-1.25.26-cp312-cp312-macosx_11_0_arm64.whl
Algorithm Hash digest
SHA256 ea0dd52ea0da614cb6835e49138fd6a56e145e85cb87d6d40fc35a2a2fd21adf
MD5 ffc6339f35f69ef65482d0fa37cf10d7
BLAKE2b-256 46ef302f962afb4e6afbb14c06f86a5a165df91886127e29786b8e2484cd996f
