py-mfuzz

Pure-Python port of the Bioconductor package Mfuzz — soft clustering of time-series gene-expression data by fuzzy c-means (Futschik & Carlisle, J. Bioinform. Comput. Biol. 2005; Kumar & Futschik, Bioinformation 2007).

pymfuzz reproduces the full computational and visualisation API of Mfuzz without any R dependency; it needs only numpy, scipy, pandas, matplotlib and anndata. The fuzzy c-means core is a faithful numpy port of the C routine behind e1071's cmeans, which is the clustering routine that Mfuzz's mfuzz() wraps.
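To make concrete what that core computes, here is a minimal numpy sketch of fuzzy c-means with Euclidean distance. It alternates the textbook membership update u_ij = 1 / Σ_k (d_ij / d_ik)^(2/(m−1)) and the centre update v_j = Σ_i u_ij^m x_i / Σ_i u_ij^m until the objective Σ_ij u_ij^m d_ij² stops changing. It is an illustration, not the code shipped in pymfuzz or e1071: the name `fuzzy_cmeans`, its parameters, the random-gene initialisation and the relative-tolerance stopping rule are all assumptions of this sketch.

```python
import numpy as np


def fuzzy_cmeans(x, c, m, init=None, max_iter=100, tol=1e-8, seed=0):
    """Fuzzy c-means with Euclidean distance (illustrative sketch only).

    x    : (n_genes, n_timepoints) array, one row per gene
    c    : number of clusters
    m    : fuzzifier, must be > 1
    init : optional (c, n_timepoints) starting centres; otherwise c random genes
    Returns (centers, u): centres (c, n_timepoints), memberships (n_genes, c).
    """
    x = np.asarray(x, dtype=float)
    if init is None:
        # Assumed initialisation: c distinct genes chosen at random
        rng = np.random.default_rng(seed)
        init = x[rng.choice(len(x), size=c, replace=False)]
    centers = np.array(init, dtype=float)
    prev_obj = np.inf
    for _ in range(max_iter):
        # Squared Euclidean distance from every gene to every centre: (n, c)
        d2 = ((x[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        d2 = np.fmax(d2, 1e-12)  # avoid division by zero for a gene on a centre
        # Membership update u_ij = 1 / sum_k (d_ij / d_ik)^(2/(m-1)).
        # Dividing by each row's minimum first keeps the powers finite when m ~ 1.
        ratio = d2 / d2.min(axis=1, keepdims=True)
        inv = ratio ** (-1.0 / (m - 1.0))
        u = inv / inv.sum(axis=1, keepdims=True)
        # Objective J = sum_ij u_ij^m d_ij^2 at the current centres
        w = u ** m
        obj = float((w * d2).sum())
        # Centre update: mean of the genes weighted by u_ij^m
        centers = (w.T @ x) / w.sum(axis=0)[:, None]
        # Stop once the objective no longer changes (relative tolerance)
        if abs(prev_obj - obj) <= tol * (abs(obj) + tol):
            break
        prev_obj = obj
    return centers, u
```

For example, `centers, u = fuzzy_cmeans(x, c=16, m=1.15, seed=0)` on a standardised genes × timepoints array returns one membership row per gene, each summing to 1. Dividing each gene's distances by its smallest distance before raising them to the power −1/(m−1) keeps the arithmetic finite when m is close to 1, where that exponent becomes large.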

Why

Mfuzz operates on a Bioconductor ExpressionSet. pymfuzz instead accepts a plain genes × timepoints numpy.ndarray, pandas.DataFrame or anndata.AnnData and returns numpy arrays, pandas objects and dataclasses, so it drops straight into Python single-cell and bulk-expression pipelines.

Install

```shell
pip install pymfuzz
```

From source, run from the root of a local checkout of the repository:

```shell
pip install -e .
```

Quick start

```python
import pymfuzz as mf

# 1. load a genes x timepoints time course (Mfuzz's data(yeast))
data = mf.load_yeast()

# 2. preprocessing
data = mf.filter_NA(data, thres=0.25)   # drop genes with more than 25% missing values
data = mf.fill_NA(data, mode="knn")     # impute the remaining NAs from nearest-neighbour genes
data = mf.standardise(data)             # per-gene z-score (mean 0, sd 1)

# 3. estimate the fuzzifier and cluster
m  = mf.mestimate(data)                 # Schwämmle & Jensen (2010)
cl = mf.mfuzz(data, c=16, m=m, random_state=0)

# 4. extract core genes and plot
cores = mf.acore(data, cl, min_acore=0.5)
fig   = mf.mfuzz_plot(data, cl, mfrow=(4, 4))
```
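The preprocessing calls in step 2 can be approximated in plain numpy as below. This is a sketch under the assumption that pymfuzz follows Mfuzz's documented semantics: genes whose fraction of missing values exceeds `thres` are dropped, and standardisation gives each gene mean 0 and standard deviation 1, using the n−1 denominator of R's `sd()`. The helpers `filter_na` and `standardise_rows` are hypothetical and not part of the pymfuzz API; the kNN imputation step is left out.

```python
import numpy as np


def filter_na(x, thres=0.25):
    """Hypothetical helper: drop genes (rows) whose NaN fraction exceeds thres.

    Returns the filtered array and the boolean mask of kept rows.
    """
    x = np.asarray(x, dtype=float)
    keep = np.isnan(x).mean(axis=1) <= thres
    return x[keep], keep


def standardise_rows(x):
    """Hypothetical helper: per-gene z-score, ignoring NaNs.

    ddof=1 mirrors the n-1 denominator of R's sd(), so each row ends up
    with mean 0 and sample standard deviation 1.
    """
    x = np.asarray(x, dtype=float)
    mu = np.nanmean(x, axis=1, keepdims=True)
    sd = np.nanstd(x, axis=1, ddof=1, keepdims=True)
    return (x - mu) / sd
```

Genes with a constant profile have zero standard deviation and would come out as NaN or ±inf here; `filter_std`, listed in the API table, is the routine intended for removing such low-variance genes beforehand.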

API

| Group | Functions |
|---|---|
| Data structures | `ExpressionMatrix`, `as_expression_matrix`, `FClust`, `KMeansResult`, `AcoreCluster`, `PartcoefResult` |
| Preprocessing | `standardise`, `standardise2`, `filter_NA`, `fill_NA`, `filter_std` |
| Clustering | `mestimate`, `mfuzz`, `cmeans` |
| Diagnostics | `acore`, `Dmin`, `cselection`, `partcoef`, `overlap` |
| Hard clustering | `kmeans2` |
| Plotting | `mfuzz_plot`, `mfuzz_plot2`, `kmeans2_plot`, `overlap_plot` |
| Datasets | `load_yeast`, `make_synthetic_timecourse` |

Mapping to the R package

| Mfuzz (R) | pymfuzz (Python) |
|---|---|
| `standardise`, `standardise2` | `standardise`, `standardise2` |
| `mestimate` | `mestimate` |
| `mfuzz` (`e1071::cmeans`) | `mfuzz`, `cmeans` |
| `acore` | `acore` |
| `Dmin`, `cselection`, `partcoef` | `Dmin`, `cselection`, `partcoef` |
| `filter.NA`, `fill.NA`, `filter.std` | `filter_NA`, `fill_NA`, `filter_std` |
| `overlap`, `overlap.plot` | `overlap`, `overlap_plot` |
| `mfuzz.plot`, `mfuzz.plot2` | `mfuzz_plot`, `mfuzz_plot2` |
| `kmeans2`, `kmeans2.plot` | `kmeans2`, `kmeans2_plot` |

R parity

Validated against Mfuzz 2.66.0 / e1071 1.7.17 on the bundled yeast cell-cycle time-course (data(yeast)):

| Routine | Agreement with R |
|---|---|
| `standardise` | machine precision (relative difference ≈ 1e-15) |
| `mestimate` | machine precision (relative difference ≈ 1e-15) |
| `fill_NA` (knn) | machine precision (max. absolute difference ≈ 1e-15) |
| `mfuzz` | membership Pearson r = 1.0, centres r = 1.0, hard-assignment ARI ≈ 0.99 |
| `Dmin` curve | Pearson r = 1.0 |
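`mestimate` can agree with R to machine precision because the Schwämmle & Jensen (2010) estimate is a closed-form function of the data shape alone. The sketch below transcribes the expression as it appears in the R source of Mfuzz's `mestimate`, with N the number of genes and D the number of time points; `mestimate_sj` is a hypothetical name, and the assumption is that pymfuzz evaluates the same formula.

```python
import math


def mestimate_sj(n_genes, n_timepoints):
    """Hypothetical helper: Schwämmle & Jensen (2010) estimate of the fuzzifier m.

    m = 1 + (1418/N + 22.05) * D^-2 + (12.33/N + 0.243) * D^(-0.0406 ln N - 0.1134)
    with N = number of genes and D = number of time points.
    """
    N, D = n_genes, n_timepoints
    return (1.0
            + (1418.0 / N + 22.05) * D ** -2
            + (12.33 / N + 0.243) * D ** (-0.0406 * math.log(N) - 0.1134))
```

For example, N = 3000 genes over D = 17 time points gives m ≈ 1.149, which is the sharp, close-to-1 regime discussed for the yeast data. Both terms shrink as D grows, so longer time courses push the estimate towards 1.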

standardise, mestimate and fill_NA are deterministic and match R to machine precision. Fuzzy c-means starts from a random initialisation, and R and Python use different random-number generators, so a bit-exact match is not expected; the tests instead assert clustering agreement: membership correlation after Hungarian matching of the clusters, and the Adjusted Rand Index of the hard assignments. Because the yeast fuzzifier (m ≈ 1.15) is close to 1, fuzzy c-means is sharp and slow to converge, so both the R and the Python runs keep the best of several converged restarts in order to reach the same optimum.
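The agreement check described above can be sketched as follows, assuming two membership matrices of shape (n_genes, c), one from each run. Cluster numbering is arbitrary, so clusters are first paired by solving an assignment problem on the cluster-by-cluster correlation matrix with SciPy's `linear_sum_assignment`, which solves the same problem as the Hungarian algorithm. The ARI needs no matching because it is invariant to relabelling. `match_clusters` and `adjusted_rand_index` are hypothetical helpers, not the project's test code.

```python
import numpy as np
from scipy.optimize import linear_sum_assignment
from scipy.special import comb


def match_clusters(u_a, u_b):
    """Hypothetical helper: pair the clusters of two (n_genes, c) membership matrices.

    Builds the c x c Pearson correlation matrix between cluster columns and
    picks the one-to-one pairing with the largest total correlation.
    Returns (perm, corr): column perm[j] of u_b pairs with column j of u_a,
    and corr[j] is the correlation of that matched pair.
    """
    c = u_a.shape[1]
    cross = np.corrcoef(u_a.T, u_b.T)[:c, c:]  # rows: clusters of a, cols: clusters of b
    rows, cols = linear_sum_assignment(cross, maximize=True)
    order = np.argsort(rows)
    return cols[order], cross[rows[order], cols[order]]


def adjusted_rand_index(labels_a, labels_b):
    """Hypothetical helper: Adjusted Rand Index between two hard partitions."""
    _, a = np.unique(labels_a, return_inverse=True)
    _, b = np.unique(labels_b, return_inverse=True)
    a, b = a.ravel(), b.ravel()
    # Contingency table: table[i, j] = genes in cluster i of a and cluster j of b
    table = np.zeros((a.max() + 1, b.max() + 1))
    np.add.at(table, (a, b), 1)
    pairs_cells = comb(table, 2).sum()
    pairs_rows = comb(table.sum(axis=1), 2).sum()
    pairs_cols = comb(table.sum(axis=0), 2).sum()
    expected = pairs_rows * pairs_cols / comb(len(a), 2)
    max_index = 0.5 * (pairs_rows + pairs_cols)
    return (pairs_cells - expected) / (max_index - expected)
```

Hard assignments for the ARI would come from `u.argmax(axis=1)`. `linear_sum_assignment` rejects a matrix containing NaN, which `np.corrcoef` produces when a cluster's membership column is constant, so a real check would need to guard against that case.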

Run the parity tests (needs the CMAP R environment):

```shell
python -m pytest tests/ -q
```

License

GPL-2 — the same license as the original Bioconductor Mfuzz package. See LICENSE.

Citation

If you use pymfuzz, please cite the original Mfuzz papers:

  • L. Kumar, M. Futschik (2007). Mfuzz: a software package for soft clustering of microarray data. Bioinformation 2(1):5–7.
  • M. Futschik, B. Carlisle (2005). Noise-robust soft clustering of gene expression time-course data. J. Bioinform. Comput. Biol. 3(4):965–988.


Download files

Source Distribution

pymfuzz-0.1.1.tar.gz (47.7 kB)

Uploaded Source

Built Distribution

pymfuzz-0.1.1-py3-none-any.whl (42.2 kB)

Uploaded Python 3

File details

Details for the file pymfuzz-0.1.1.tar.gz.

File metadata

  • Download URL: pymfuzz-0.1.1.tar.gz
  • Upload date:
  • Size: 47.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pymfuzz-0.1.1.tar.gz
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3dd5165d7c0b4fd2f94c3b4b935420409bd34de618b7fbd6f72141169d65d6e2 |
| MD5 | a053d64c807b65f7f9445432df11f841 |
| BLAKE2b-256 | 0af4b11c85f072d342b8903c55641c30432a1f7bcd61b2661a3219459e6fe698 |

Provenance

The following attestation bundles were made for pymfuzz-0.1.1.tar.gz:

Publisher: publish.yml on omicverse/py-mfuzz

File details

Details for the file pymfuzz-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: pymfuzz-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 42.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: twine/6.1.0 CPython/3.13.12

File hashes

Hashes for pymfuzz-0.1.1-py3-none-any.whl
| Algorithm | Hash digest |
|---|---|
| SHA256 | b405862b91adb51674522eb7e5f744db83e261e256c166b706d97cf6c80eb3b5 |
| MD5 | f65cba5946379d0e93e6a1328a4d5430 |
| BLAKE2b-256 | 85ee779d144ef31b126f42a97bec4910d67b861b17a73696f7560829d15bce7a |

Provenance

The following attestation bundles were made for pymfuzz-0.1.1-py3-none-any.whl:

Publisher: publish.yml on omicverse/py-mfuzz
