
Pure-Python port of Bioconductor Mfuzz — soft clustering of time-series gene-expression data by fuzzy c-means.


py-mfuzz

Pure-Python port of the Bioconductor package Mfuzz — soft clustering of time-series gene-expression data by fuzzy c-means (Futschik & Carlisle, J. Bioinform. Comput. Biol. 2005; Kumar & Futschik, Bioinformation 2007).

pymfuzz reproduces the full computational and visualisation API of Mfuzz without any R dependency; it needs only numpy, scipy, pandas, matplotlib and anndata. The fuzzy c-means core is a faithful numpy port of the C routine behind e1071's cmeans(), which is the function Mfuzz's mfuzz() wraps.
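The alternating updates at the heart of fuzzy c-means can be sketched in a few lines of numpy. This is a minimal textbook version, assuming Euclidean distance, centres initialised from randomly chosen genes (or passed in explicitly) and a simple stopping rule on centre movement. It is not pymfuzz's port of e1071, whose initialisation, convergence test and error bookkeeping differ; the name `fuzzy_cmeans` and its `init` parameter are hypothetical.

```python
import numpy as np


def fuzzy_cmeans(X, c, m, max_iter=300, tol=1e-6, init=None, seed=None):
    """Textbook fuzzy c-means on the rows of X (genes x timepoints).

    Hypothetical sketch, not pymfuzz's port of e1071::cmeans.
    Returns (centres, U): centres is c x T, U is n x c with rows summing
    to 1 (U comes from the last iteration's distances).
    """
    X = np.asarray(X, dtype=float)
    if init is None:
        # start from c distinct, randomly chosen genes
        rng = np.random.default_rng(seed)
        centres = X[rng.choice(X.shape[0], size=c, replace=False)].copy()
    else:
        centres = np.asarray(init, dtype=float).copy()
    for _ in range(max_iter):
        # squared Euclidean distance of every gene to every centre (n x c)
        d2 = ((X[:, None, :] - centres[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, np.finfo(float).eps)  # guard against zero distance
        # u_ik is proportional to d_ik^(-2/(m-1)); dividing each row by its
        # smallest distance first keeps the powers in (0, 1] when m is near 1
        w = (d2 / d2.min(axis=1, keepdims=True)) ** (-1.0 / (m - 1.0))
        U = w / w.sum(axis=1, keepdims=True)
        # new centres are means of the genes weighted by u_ik^m
        Um = U ** m
        new_centres = (Um.T @ X) / Um.sum(axis=0)[:, None]
        shift = np.abs(new_centres - centres).max()
        centres = new_centres
        if shift < tol:
            break
    return centres, U
```

Each gene's membership row sums to 1. As m approaches 1 the memberships approach a hard, k-means-like partition, which is why a fuzzifier as small as the yeast one (m ≈ 1.15) yields sharp clusters.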

Why

Mfuzz operates on a Bioconductor ExpressionSet. pymfuzz instead accepts a plain genes × timepoints numpy.ndarray, pandas.DataFrame or anndata.AnnData and returns numpy arrays, pandas objects and dataclasses, so it drops into Python single-cell and bulk expression pipelines without a conversion step.
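A hypothetical sketch of the kind of input coercion this implies, assuming only numpy and pandas: whatever comes in is reduced to a float matrix plus gene and timepoint labels. The name `as_genes_by_time` is illustrative only (pymfuzz's own entry point in the API table is `as_expression_matrix`, whose behaviour is not reproduced here), and AnnData input is left out of the sketch.

```python
import numpy as np
import pandas as pd


def as_genes_by_time(data):
    """Hypothetical helper: coerce a genes x timepoints table to
    (float matrix, gene labels, timepoint labels)."""
    if isinstance(data, pd.DataFrame):
        # keep the DataFrame's own row and column labels
        return data.to_numpy(dtype=float), list(data.index), list(data.columns)
    arr = np.asarray(data, dtype=float)
    if arr.ndim != 2:
        raise ValueError("expected a 2-D genes x timepoints matrix")
    # plain arrays carry no labels, so generate placeholders
    genes = [f"gene{i}" for i in range(arr.shape[0])]
    times = list(range(arr.shape[1]))
    return arr, genes, times
```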

Install

```bash
pip install pymfuzz
```

From source:

```bash
pip install -e .
```

Quick start

```python
import pymfuzz as mf

# 1. load a genes x timepoints time-course (Mfuzz's data(yeast))
data = mf.load_yeast()

# 2. preprocessing
data = mf.filter_NA(data, thres=0.25)   # drop genes with more than 25% missing values
data = mf.fill_NA(data, mode="knn")     # impute remaining NAs from the k nearest-neighbour genes
data = mf.standardise(data)             # per-gene z-score

# 3. estimate the fuzzifier and cluster
m  = mf.mestimate(data)                 # Schwämmle & Jensen (2010)
cl = mf.mfuzz(data, c=16, m=m, random_state=0)

# 4. extract core genes and plot
cores = mf.acore(data, cl, min_acore=0.5)
fig   = mf.mfuzz_plot(data, cl, mfrow=(4, 4))
```
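What steps 2 and 3 compute can be approximated in plain numpy. This sketch assumes NaN marks a missing value, treats `thres` as the fraction of missing timepoints above which a gene is dropped, fills gaps with the gene's own mean (a simpler stand-in for the `knn` mode used above) and, like R's `sd()`, uses the sample standard deviation. `mestimate` evaluates the Schwämmle & Jensen (2010) formula with N genes and D timepoints. The names mirror pymfuzz's functions, but these are independent illustrations, not its implementation.

```python
import numpy as np


def filter_na(X, thres=0.25):
    """Drop genes (rows) whose fraction of NaNs exceeds thres."""
    frac = np.isnan(X).mean(axis=1)
    return X[frac <= thres]


def fill_na_mean(X):
    """Replace each NaN with the mean of the observed values in its row."""
    X = X.copy()
    row_mean = np.nanmean(X, axis=1)
    rows, cols = np.where(np.isnan(X))
    X[rows, cols] = row_mean[rows]
    return X


def standardise(X):
    """Per-gene z-score: mean 0, sample SD 1 (ddof=1, as R's sd())."""
    mu = X.mean(axis=1, keepdims=True)
    sd = X.std(axis=1, ddof=1, keepdims=True)
    return (X - mu) / sd


def mestimate(X):
    """Schwämmle & Jensen (2010) fuzzifier for N genes in D dimensions."""
    N, D = X.shape
    return (1 + (1418 / N + 22.05) * D ** -2.0
            + (12.33 / N + 0.243) * D ** (-0.0406 * np.log(N) - 0.1134))
```

For a 3000 × 17 matrix the formula gives m ≈ 1.15, consistent with the yeast fuzzifier quoted under R parity; it depends only on the matrix dimensions, which is why mestimate is deterministic.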

API

| Group | Functions |
|---|---|
| Data structures | ExpressionMatrix, as_expression_matrix, FClust, KMeansResult, AcoreCluster, PartcoefResult |
| Preprocessing | standardise, standardise2, filter_NA, fill_NA, filter_std |
| Clustering | mestimate, mfuzz, cmeans |
| Diagnostics | acore, Dmin, cselection, partcoef, overlap |
| Hard clustering | kmeans2 |
| Plotting | mfuzz_plot, mfuzz_plot2, kmeans2_plot, overlap_plot |
| Datasets | load_yeast, make_synthetic_timecourse |
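Two of the diagnostics can be illustrated directly on a membership matrix U (genes × clusters, rows summing to 1). This sketch assumes that an alpha-core keeps, for every cluster, the genes whose membership reaches `min_acore` (so a gene can belong to more than one core when `min_acore` ≤ 0.5), and it uses Bezdek's partition coefficient with its usual normalisation. pymfuzz's own `acore` and `partcoef` may return different structures; only the underlying quantities are shown.

```python
import numpy as np


def acore(U, min_acore=0.5):
    """For each cluster k, indices of genes with membership >= min_acore."""
    return [np.flatnonzero(U[:, k] >= min_acore) for k in range(U.shape[1])]


def partition_coefficient(U):
    """Bezdek's F = (1/n) * sum_i sum_k u_ik^2 and its normalised form
    (c*F - 1)/(c - 1): 0 for uniform memberships, 1 for a hard partition."""
    n, c = U.shape
    F = (U ** 2).sum() / n
    return F, (c * F - 1) / (c - 1)
```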

Mapping to the R package

| Mfuzz (R) | pymfuzz (Python) |
|---|---|
| standardise / standardise2 | standardise / standardise2 |
| mestimate | mestimate |
| mfuzz (e1071::cmeans) | mfuzz / cmeans |
| acore | acore |
| Dmin, cselection, partcoef | Dmin, cselection, partcoef |
| filter.NA, fill.NA, filter.std | filter_NA, fill_NA, filter_std |
| overlap, overlap.plot | overlap, overlap_plot |
| mfuzz.plot, mfuzz.plot2 | mfuzz_plot, mfuzz_plot2 |
| kmeans2, kmeans2.plot | kmeans2, kmeans2_plot |

R parity

Validated against Mfuzz 2.66.0 / e1071 1.7.17 on the bundled yeast cell-cycle time-course (data(yeast)):

| Routine | Agreement vs R |
|---|---|
| standardise | identical to machine precision (rel. diff ≈ 1e-15) |
| mestimate | identical to machine precision (rel. diff ≈ 1e-15) |
| fill_NA(knn) | identical to machine precision (max abs diff ≈ 1e-15) |
| mfuzz | membership Pearson r = 1.0, centres r = 1.0, hard-assignment ARI ≈ 0.99 |
| Dmin | curve Pearson r = 1.0 |

standardise, mestimate and fill_NA are deterministic and match R to machine precision. Fuzzy c-means starts from a random initialisation, and Python and R use different random number generators, so a bit-exact match is not expected; the tests instead assert clustering agreement (membership correlation after Hungarian matching of clusters, and the Adjusted Rand Index of the hard assignments). Because the yeast fuzzifier (m ≈ 1.15) is close to 1, fuzzy c-means produces sharp, nearly hard memberships and converges slowly; both implementations therefore keep the best of several converged restarts so that they reach the same optimum.
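The agreement check described above can be sketched with scipy: clusters from two runs are aligned by solving the linear assignment problem on their membership-correlation matrix with `scipy.optimize.linear_sum_assignment`, and hard labels are compared with the Adjusted Rand Index. This is an illustration under those assumptions, not the code in `tests/`; `match_clusters` and `adjusted_rand_index` are hypothetical names, and the ARI is written out by hand on the assumption that scikit-learn should not be required.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.special import comb


def match_clusters(U_a, U_b):
    """Reorder the columns of U_a to best match U_b (both genes x clusters).

    Returns the reordered U_a and, for each cluster of U_b, the Pearson
    correlation with its matched cluster in U_a.
    """
    c = U_b.shape[1]
    # corr[i, j]: correlation of cluster i in U_a with cluster j in U_b
    corr = np.corrcoef(U_a.T, U_b.T)[:c, c:]
    # maximise total correlation = minimise its negative
    rows, cols = linear_sum_assignment(-corr)
    perm = np.empty(c, dtype=int)
    perm[cols] = rows  # perm[j] is the U_a cluster matched to U_b cluster j
    return U_a[:, perm], corr[perm, np.arange(c)]


def adjusted_rand_index(a, b):
    """Hubert-Arabie adjusted Rand index between two hard labelings."""
    _, ai = np.unique(a, return_inverse=True)
    _, bi = np.unique(b, return_inverse=True)
    # contingency table of label co-occurrences
    table = np.zeros((ai.max() + 1, bi.max() + 1), dtype=np.int64)
    np.add.at(table, (ai, bi), 1)
    sum_comb = comb(table, 2).sum()
    sum_a = comb(table.sum(axis=1), 2).sum()
    sum_b = comb(table.sum(axis=0), 2).sum()
    expected = sum_a * sum_b / comb(len(a), 2)
    max_index = (sum_a + sum_b) / 2
    return (sum_comb - expected) / (max_index - expected)
```

Matching is needed because cluster indices are arbitrary: two runs that find identical clusters can still number them differently. The ARI is invariant to relabelling, so it can be computed on the raw hard assignments.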

Run the parity tests (needs the CMAP R environment):

```bash
python -m pytest tests/ -q
```

License

GPL-2 — the same license as the original Bioconductor Mfuzz package. See LICENSE.

Citation

If you use pymfuzz, please cite the original Mfuzz papers:

  • L. Kumar, M. Futschik (2007). Mfuzz: a software package for soft clustering of microarray data. Bioinformation 2(1):5–7.
  • M. Futschik, B. Carlisle (2005). Noise-robust soft clustering of gene expression time-course data. J. Bioinform. Comput. Biol. 3(4):965–988.


Download files


Source Distribution

pymfuzz-0.1.2.tar.gz (47.9 kB)

Uploaded Source

Built Distribution


pymfuzz-0.1.2-py3-none-any.whl (42.4 kB)

Uploaded Python 3

File details

Details for the file pymfuzz-0.1.2.tar.gz.

File metadata

  • Download URL: pymfuzz-0.1.2.tar.gz
  • Size: 47.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pymfuzz-0.1.2.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 581dbc9032514ab7ffd18ae7cc8515bb4592c9ca33707e83cc2b5113f7929d89 |
| MD5 | 404c418e9fda85134b51ffb2d2cb6f2e |
| BLAKE2b-256 | de35c975e07dd11752a4d9a926b47b5ff39603fbd5318c77456cf2e7d6c8777a |


Provenance

The following attestation bundles were made for pymfuzz-0.1.2.tar.gz:

Publisher: publish.yml on omicverse/py-mfuzz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file pymfuzz-0.1.2-py3-none-any.whl.

File metadata

  • Download URL: pymfuzz-0.1.2-py3-none-any.whl
  • Size: 42.4 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pymfuzz-0.1.2-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | 21af663d38c43254e65d3290ff9b12070dca4e4df83935bb89051dbd02b4a43c |
| MD5 | f63b99e5bedfd7e98f335e05af847817 |
| BLAKE2b-256 | f29da5dfb0d30808c52c637f53239d98eea7205098bed78a69faf03a037e14f7 |


Provenance

The following attestation bundles were made for pymfuzz-0.1.2-py3-none-any.whl:

Publisher: publish.yml on omicverse/py-mfuzz

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
