Python package for handling and analyzing Mutation Annotation Format (MAF) files.


pymaftools

pymaftools is a Python package for handling and analyzing MAF (Mutation Annotation Format) files and multi-omics cancer genomics data. It provides classes for data manipulation, statistical analysis, machine learning, and visualization.

pymaftools overview

pymaftools provides a unified workflow for multi-omics cancer genomics, from data loading and filtering through statistical analysis and machine learning to publication-ready visualization.

Multi-omics cohort structure

Multiple omics layers (SNV, CNV, expression, etc.) are integrated into a unified Cohort structure.
Each layer shares the same samples but may have different numbers of features.
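To make the shared-sample constraint concrete, here is a minimal sketch in plain Python. `MiniCohort` and its methods are hypothetical illustrations, not the pymaftools `Cohort` API. Each layer is a dict mapping feature to per-sample values, and adding a layer fails unless its samples match the cohort's.

```python
from dataclasses import dataclass, field


@dataclass
class MiniCohort:
    """Hypothetical container: every omics layer must cover the same samples."""

    sample_ids: list[str]
    layers: dict[str, dict[str, dict[str, float]]] = field(default_factory=dict)

    def add_layer(self, name: str, table: dict[str, dict[str, float]]) -> None:
        # table maps feature -> {sample_id: value}; feature sets may differ per layer
        expected = set(self.sample_ids)
        for feature, values in table.items():
            if set(values) != expected:
                raise ValueError(f"{name}/{feature}: samples do not match the cohort")
        self.layers[name] = table

    def shape(self, name: str) -> tuple[int, int]:
        # (number of features, number of samples) for one layer
        return len(self.layers[name]), len(self.sample_ids)


cohort = MiniCohort(sample_ids=["S1", "S2"])
cohort.add_layer("snv", {"TP53": {"S1": 1, "S2": 0}, "EGFR": {"S1": 0, "S2": 1}})
cohort.add_layer("cnv", {"MYC": {"S1": 2, "S2": -1}})
print(cohort.shape("snv"), cohort.shape("cnv"))
```

The two layers report different feature counts but the same sample count, which is the property the Cohort structure is built around.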

Features

Core Data Structures

MAF — Load, parse, filter, and merge MAF files
PivotTable — Gene/feature x sample matrix with synchronized metadata, frequency calculation, statistical testing, and filtering
Cohort — Multi-omics container linking multiple PivotTables with shared sample metadata
CopyNumberVariationTable — Read GISTIC arm-level and gene-level results
ExpressionTable — Gene expression data with clustering support
SignatureTable — COSMIC mutational signature data
CancerCellFractionTable — Cancer cell fraction (CCF) data from PyClone
SmallVariationTable — Specialized PivotTable for SNV/INDEL data
SimilarityMatrix — Pairwise similarity analysis (Jaccard, cosine, etc.)
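The gene × sample matrix behind PivotTable can be illustrated with a few MAF records. This standard-library sketch uses three standard MAF columns (`Hugo_Symbol`, `Tumor_Sample_Barcode`, `Variant_Classification`). The helper names `to_pivot` and `jaccard`, and collapsing several hits in one cell to "Multi_Hit" (a common oncoplot convention), are assumptions for illustration, not pymaftools internals. The last lines compute a pairwise Jaccard similarity between samples, the kind of measure SimilarityMatrix reports.

```python
from collections import defaultdict
from itertools import combinations

# Minimal MAF-like records using standard MAF column names
maf_rows = [
    {"Hugo_Symbol": "TP53", "Tumor_Sample_Barcode": "S1", "Variant_Classification": "Missense_Mutation"},
    {"Hugo_Symbol": "TP53", "Tumor_Sample_Barcode": "S1", "Variant_Classification": "Nonsense_Mutation"},
    {"Hugo_Symbol": "EGFR", "Tumor_Sample_Barcode": "S2", "Variant_Classification": "Missense_Mutation"},
    {"Hugo_Symbol": "TP53", "Tumor_Sample_Barcode": "S3", "Variant_Classification": "Frame_Shift_Del"},
    {"Hugo_Symbol": "KRAS", "Tumor_Sample_Barcode": "S3", "Variant_Classification": "Missense_Mutation"},
]


def to_pivot(rows):
    """Return (genes, samples, cells) where cells[gene][sample] is a label or None.

    Only samples that appear in `rows` are included in this sketch.
    """
    hits = defaultdict(list)
    for r in rows:
        hits[(r["Hugo_Symbol"], r["Tumor_Sample_Barcode"])].append(r["Variant_Classification"])
    genes = sorted({g for g, _ in hits})
    samples = sorted({s for _, s in hits})
    cells = {g: {s: None for s in samples} for g in genes}
    for (g, s), classes in hits.items():
        # Several mutations of one gene in one sample collapse to "Multi_Hit"
        cells[g][s] = classes[0] if len(classes) == 1 else "Multi_Hit"
    return genes, samples, cells


def jaccard(a, b):
    """Jaccard index |A & B| / |A | B| (0.0 when both sets are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


genes, samples, cells = to_pivot(maf_rows)
mutated = {s: {g for g in genes if cells[g][s] is not None} for s in samples}
similarity = {(x, y): jaccard(mutated[x], mutated[y]) for x, y in combinations(samples, 2)}
```

Here S1 and S3 share TP53 out of two distinct mutated genes, so their Jaccard index is 0.5.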

Filtering & Statistical Analysis

filter_by_freq — Filter features by mutation frequency
filter_by_variance — Filter by variance or median absolute deviation (MAD)
filter_by_statistical_test — Filter by statistical test (t-test, Mann-Whitney, Kruskal-Wallis, ANOVA) with FDR correction
Chi-squared / Fisher's exact test — Association testing between features and groups
TMB calculation — Tumor mutation burden per sample

Visualization

OncoPlot — Mutation landscape heatmaps with frequency bars, sample metadata, and legends
LollipopPlot — Protein mutation positions with domain annotation
PivotTablePlot — PCA, boxplots with statistical annotations, heatmaps (via pt.plot)
ModelPlot — Model performance visualizations
MethodsPlot — 3D methodology demonstration plots
ColorManager / FontManager — Customizable color and font management

Machine Learning

OmicsStackingModel — Multi-omics stacking classifier with feature importance
Model utilities — Evaluation, cross-validation, RFECV feature selection, importance heatmaps

Utilities

PCA_CCA — Dimensionality reduction utilities
Gene set tools — Read GMT files, fetch MSigDB gene sets
Gene info — NCBI gene ID lookup
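GMT is the tab-delimited gene-set format used by MSigDB and GSEA: each line holds a set name, a description (often a URL), and then the member genes. A standard-library reader is enough to show the structure. `read_gmt_lines` is a hypothetical helper, not the pymaftools function.

```python
import io


def read_gmt_lines(handle) -> dict[str, list[str]]:
    """Parse GMT text into {gene_set_name: [genes]}, skipping blank or short lines."""
    gene_sets: dict[str, list[str]] = {}
    for line in handle:
        fields = line.rstrip("\r\n").split("\t")
        if len(fields) < 3:
            continue  # need at least name, description, and one gene
        name, _description, *genes = fields
        gene_sets[name] = [g for g in genes if g]  # drop empty trailing fields
    return gene_sets


gmt_text = (
    "SET_A\thttp://example.org/a\tTP53\tEGFR\tKRAS\n"
    "\n"
    "SET_B\tna\tMYC\tCDKN2A\t\n"
)
gene_sets = read_gmt_lines(io.StringIO(gmt_text))
```

Passing an open file handle (`open("sets.gmt")`) instead of `io.StringIO` reads a real GMT file the same way.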

Requirements

Python 3.10+ with the following dependencies:

pandas (>2.0), numpy, matplotlib, seaborn, scipy
networkx, scikit-learn, statsmodels, statannotations
requests, beautifulsoup4, tqdm, tables (HDF5)

All dependencies are automatically installed.

Installation

Using uv (recommended)

uv pip install pymaftools

Using pip

pip install pymaftools

From GitHub (latest development version)

uv pip install git+https://github.com/xu62u4u6/pymaftools.git
# or
pip install git+https://github.com/xu62u4u6/pymaftools.git

Usage

Getting Started

from pymaftools import *

# Load and merge MAF files
maf1 = MAF.read_maf("case1.maf")
maf2 = MAF.read_maf("case2.maf")
merged = MAF.merge_mafs([maf1, maf2])

# Filter to nonsynonymous mutations and convert to pivot table
pt = merged.filter_maf(MAF.nonsynonymous_types).to_pivot_table()

# Process pivot table
pt = (pt
    .add_freq()
    .sort_features(by="freq")
    .sort_samples_by_mutations()
    .calculate_TMB(capture_size=50)
)

# Create oncoplot
oncoplot = (OncoPlot(pt.head(50))
    .set_config(figsize=(15, 10), width_ratios=[20, 2, 2])
    .mutation_heatmap()
    .plot_freq()
    .plot_bar()
    .save("oncoplot.png", dpi=300)
)
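The `add_freq` and `calculate_TMB(capture_size=50)` steps rest on simple arithmetic: a feature's frequency is the fraction of samples in which it is altered, and TMB is the number of (here, nonsynonymous) mutations per megabase of captured sequence. This sketch assumes `capture_size` is given in megabases, which matches the usual TMB definition but is not confirmed by this README, and it uses made-up counts.

```python
# Binary mutation matrix: gene -> {sample: 1 if mutated else 0}
matrix = {
    "TP53": {"S1": 1, "S2": 1, "S3": 0, "S4": 1},
    "EGFR": {"S1": 0, "S2": 1, "S3": 0, "S4": 0},
}
# Nonsynonymous mutation counts per sample (all genes, not only those shown)
mutation_counts = {"S1": 120, "S2": 35, "S3": 0, "S4": 500}
CAPTURE_SIZE_MB = 50  # assumed unit: megabases of targeted sequence


def feature_freq(row: dict[str, int]) -> float:
    """Fraction of samples in which the feature is altered."""
    return sum(1 for v in row.values() if v) / len(row)


def tmb(count: int, capture_size_mb: float) -> float:
    """Tumor mutation burden in mutations per megabase."""
    return count / capture_size_mb


freqs = {gene: feature_freq(row) for gene, row in matrix.items()}
tmbs = {s: tmb(n, CAPTURE_SIZE_MB) for s, n in mutation_counts.items()}
# Genes ordered by frequency, highest first (the idea behind sort_features(by="freq"))
ordered_genes = sorted(freqs, key=freqs.get, reverse=True)
```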

Advanced Filtering

# Filter by variance (keep top 25% most variable features)
filtered = pt.filter_by_variance(quantile=0.75, method="var")

# Filter by statistical test with FDR correction
filtered = pt.filter_by_statistical_test(
    group_col="subtype", method="kruskal", alpha=0.05
)
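Both filters above have simple cores. The sketch keeps features whose variance (or MAD) is at or above a given quantile of all features' scores, and applies Benjamini-Hochberg adjustment to per-feature p-values such as Kruskal-Wallis results. It illustrates the techniques rather than reproducing pymaftools: whether the library's cutoff is inclusive, and which quantile rule it uses, are assumptions here.

```python
import statistics


def quantile(values, q):
    """Linear-interpolation quantile (the same rule as NumPy's default method)."""
    xs = sorted(values)
    pos = (len(xs) - 1) * q
    lo = int(pos)
    hi = min(lo + 1, len(xs) - 1)
    return xs[lo] + (xs[hi] - xs[lo]) * (pos - lo)


def mad(values):
    """Median absolute deviation around the median (no consistency scaling)."""
    med = statistics.median(values)
    return statistics.median([abs(v - med) for v in values])


def filter_by_dispersion(table, q=0.75, method="var"):
    """Keep features whose sample variance (or MAD) is >= the q-th quantile."""
    score = statistics.variance if method == "var" else mad
    scores = {f: score(vals) for f, vals in table.items()}
    cutoff = quantile(list(scores.values()), q)
    return {f: vals for f, vals in table.items() if scores[f] >= cutoff}


def benjamini_hochberg(pvals):
    """Return BH-adjusted p-values in the original order."""
    m = len(pvals)
    order = sorted(range(m), key=lambda i: pvals[i])
    adjusted = [0.0] * m
    running_min = 1.0
    # Walk from the largest p-value down, keeping adjusted values monotone
    for rank in range(m, 0, -1):
        i = order[rank - 1]
        running_min = min(running_min, pvals[i] * m / rank)
        adjusted[i] = running_min
    return adjusted


expression = {
    "g1": [1, 1, 1, 1],
    "g2": [0, 5, 0, 5],
    "g3": [1, 2, 3, 4],
    "g4": [0, 10, 0, 10],
}
top_variable = filter_by_dispersion(expression, q=0.75, method="var")  # keeps g4

# e.g. per-feature Kruskal-Wallis p-values, adjusted and thresholded at alpha = 0.05
pvals = {"g1": 0.01, "g2": 0.04, "g3": 0.03, "g4": 0.20}
qvals = dict(zip(pvals, benjamini_hochberg(list(pvals.values()))))
significant = [f for f, q in qvals.items() if q < 0.05]  # ["g1"]
```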

Mutation Oncoplot with Sample Metadata

import pandas as pd
from pymaftools import *

# Load and process data
LUAD_maf = MAF.read_csv("data/WES/LUAD_all_case_maf.csv")
LUSC_maf = MAF.read_csv("data/WES/LUSC_all_case_maf.csv")
all_case_maf = MAF.merge_mafs([LUAD_maf, LUSC_maf])

# Filter and convert to table
table = (all_case_maf
    .filter_maf(all_case_maf.nonsynonymous_types)
    .to_pivot_table()
)

# Load sample metadata and derive case_ID / sample_type from the sample IDs
# (assumes sample IDs look like "<case_ID>_<sample_type>")
all_sample_metadata = pd.read_csv("data/all_sample_metadata.csv")
table.sample_metadata[["case_ID", "sample_type"]] = (
    table.columns.to_series().str.rsplit("_", n=1, expand=True)
)
table.sample_metadata = pd.merge(
    table.sample_metadata.reset_index(), all_sample_metadata, on="case_ID"
).set_index("sample_ID")

# Add group frequencies
table = table.add_freq(
    groups={"LUAD": table.subset(samples=table.sample_metadata.subtype == "LUAD"),
            "ASC": table.subset(samples=table.sample_metadata.subtype == "ASC"),
            "LUSC": table.subset(samples=table.sample_metadata.subtype == "LUSC")}
)

# Filter and sort
freq = 0.1
table = (table.filter_by_freq(freq)
    .sort_features(by="freq")
    .sort_samples_by_group(group_col="subtype",
                           group_order=["LUAD", "ASC", "LUSC"], top=10)
)

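# Set up colors and create the oncoplot
# (`cm` is assumed to be a pymaftools ColorManager with colormaps
#  registered under the categorical column names below)
categorical_columns = ["subtype", "sex", "smoke"]
cmap_dict = {key: cm.get_cmap(key, alpha=0.7) for key in categorical_columns}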

oncoplot = (OncoPlot(table)
    .set_config(categorical_columns=categorical_columns,
                figsize=(30, 14),
                width_ratios=[25, 3, 0, 2])
    .mutation_heatmap()
    .plot_freq(freq_columns=["freq", "LUAD_freq", "ASC_freq", "LUSC_freq"])
    .plot_bar()
    .plot_categorical_metadata(cmap_dict=cmap_dict)
    .plot_all_legends()
    .save("mutation_oncoplot.tiff", dpi=300)
)
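`sort_samples_by_group` orders samples by group and, within each group, by their mutation pattern. A common way to get the oncoplot "waterfall" look is to sort samples lexicographically by mutation status across genes taken in frequency order. The sketch below does that in plain Python; the helper names are hypothetical, and treating `top` as the number of leading genes used as sort keys is an assumption about what the pymaftools parameter controls.

```python
def waterfall_order(matrix, samples, genes, top=None):
    """Sort samples by mutation status of genes (genes already in frequency order).

    matrix[gene][sample] is truthy when the gene is mutated in that sample.
    Only the first `top` genes act as sort keys (an assumption in this sketch).
    """
    keys = genes if top is None else genes[:top]
    # Mutated (-1) sorts before wild type (0), gene by gene; ties keep input order
    return sorted(samples, key=lambda s: tuple(-int(bool(matrix[g][s])) for g in keys))


def sort_by_group(matrix, genes, sample_groups, group_order, top=None):
    """Concatenate per-group waterfall orderings following group_order."""
    ordered = []
    for group in group_order:
        members = [s for s, grp in sample_groups.items() if grp == group]
        ordered.extend(waterfall_order(matrix, members, genes, top))
    return ordered


genes = ["TP53", "EGFR", "KRAS"]  # already sorted by overall frequency
matrix = {
    "TP53": {"A1": 0, "A2": 1, "A3": 1, "L1": 1, "L2": 0},
    "EGFR": {"A1": 1, "A2": 0, "A3": 1, "L1": 0, "L2": 0},
    "KRAS": {"A1": 0, "A2": 1, "A3": 0, "L1": 0, "L2": 1},
}
sample_groups = {"A1": "LUAD", "A2": "LUAD", "A3": "LUAD", "L1": "LUSC", "L2": "LUSC"}
order = sort_by_group(matrix, genes, sample_groups, ["LUAD", "LUSC"], top=10)
```

Within LUAD, A3 comes first because it carries both TP53 and EGFR, the two most frequent genes.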

Numeric CNV Oncoplot

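# CNV_gene_cosmic: a gene-level CNV table (for example, a CopyNumberVariationTable)
# carrying the same sample metadata columns; `cm` is used as in the previous example
categorical_columns = ["subtype", "sex", "smoke"]
cmap_dict = {key: cm.get_cmap(key, alpha=0.7) for key in categorical_columns}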

oncoplot = (OncoPlot(CNV_gene_cosmic)
    .set_config(categorical_columns=categorical_columns,
                figsize=(30, 10),
                width_ratios=[25, 1, 0, 3])
    .numeric_heatmap(yticklabels=False, cmap="coolwarm", vmin=-2, vmax=2)
    .plot_bar()
    .plot_categorical_metadata(cmap_dict=cmap_dict)
    .plot_all_legends()
    .save("cnv_oncoplot.tiff", dpi=600)
)
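With `vmin=-2, vmax=2`, the colour scale spans the integer range of GISTIC2 thresholded gene-level calls (-2 deep deletion, -1 shallow deletion, 0 neutral, 1 gain, 2 amplification). If a table holds continuous values instead, anything outside that range saturates at the ends of the `coolwarm` map. The sketch shows clipping continuous values into the plotted range and labelling thresholded calls; the helper names and label strings are illustrative, not pymaftools identifiers.

```python
CNV_LABELS = {
    -2: "Deep deletion",
    -1: "Shallow deletion",
    0: "Neutral",
    1: "Gain",
    2: "Amplification",
}


def clip(value: float, vmin: float = -2.0, vmax: float = 2.0) -> float:
    """Clamp a value to the colour range used by the heatmap."""
    return max(vmin, min(vmax, value))


def label_call(call: int) -> str:
    """Map a thresholded GISTIC-style call to a readable label."""
    if call not in CNV_LABELS:
        raise ValueError(f"unexpected thresholded call: {call}")
    return CNV_LABELS[call]


gene_calls = {"MYC": 2, "CDKN2A": -2, "EGFR": 1, "TP53": -1, "KRAS": 0}
labels = {gene: label_call(c) for gene, c in gene_calls.items()}
clipped = [clip(v) for v in [-3.4, -0.5, 0.0, 1.2, 5.0]]
```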

Lollipop Plot

maf = MAF.read_csv("path/to/your_maf.csv")  # replace with the path to your MAF file
gene = "EGFR"
AA_length, mutations_data = maf.get_protein_info(gene)
domains_data, refseq_ID = MAF.get_domain_info(gene, AA_length)

plot = LollipopPlot(
    protein_name=gene,
    protein_length=AA_length,
    domains=domains_data,
    mutations=mutations_data
)
plot.plot()
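A lollipop plot needs, for each protein position, the number of mutations hitting it. MAF files usually carry the protein change in the `HGVSp_Short` column (e.g. `p.L858R`). The sketch extracts the first position with a regular expression and counts occurrences; it is an assumption about the preprocessing, not the implementation of `get_protein_info`, and it skips entries without a parsable position (empty strings or `p.?`).

```python
import re
from collections import Counter

# First amino-acid position after "p." and an optional reference residue
POSITION_RE = re.compile(r"^p\.[A-Z*]?(\d+)")


def mutation_positions(hgvsp_values):
    """Count mutations per protein position from HGVSp_Short strings."""
    counts = Counter()
    for value in hgvsp_values:
        match = POSITION_RE.match(value or "")
        if match:  # skip empty or unparsable entries such as "p.?"
            counts[int(match.group(1))] += 1
    return counts


egfr_changes = ["p.L858R", "p.L858R", "p.T790M", "p.E746_A750del", "p.?", "", None]
counts = mutation_positions(egfr_changes)
protein_length = 1210  # EGFR canonical isoform (UniProt P00533)
in_range = {pos: n for pos, n in counts.items() if 1 <= pos <= protein_length}
```

For an in-frame deletion such as `p.E746_A750del`, only the start position is used; a real plot may prefer to draw the whole span.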

Multi-Omics with Cohort

cohort = Cohort(sample_IDs=sample_list)
cohort.add_table("mutations", mutation_pt)
cohort.add_table("cnv", cnv_table)
cohort.add_table("expression", expr_table)
cohort.add_sample_metadata(clinical_df)

# Save/load
cohort.to_sqlite("cohort.db")
cohort = Cohort.read_sqlite("cohort.db")
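One simple way to persist several gene × sample tables in a single SQLite file is a long (tidy) layout with one row per (layer, feature, sample, value). The sketch uses Python's built-in `sqlite3`; the schema and function names are hypothetical and are not the format written by `Cohort.to_sqlite`.

```python
import sqlite3


def save_tables(conn, tables):
    """Store {layer: {feature: {sample: value}}} in long format."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS omics "
        "(layer TEXT, feature TEXT, sample TEXT, value REAL, "
        "PRIMARY KEY (layer, feature, sample))"
    )
    rows = [
        (layer, feature, sample, value)
        for layer, table in tables.items()
        for feature, values in table.items()
        for sample, value in values.items()
    ]
    conn.executemany("INSERT OR REPLACE INTO omics VALUES (?, ?, ?, ?)", rows)
    conn.commit()


def load_table(conn, layer):
    """Rebuild one layer as {feature: {sample: value}}."""
    table = {}
    for feature, sample, value in conn.execute(
        "SELECT feature, sample, value FROM omics WHERE layer = ? ORDER BY feature, sample",
        (layer,),
    ):
        table.setdefault(feature, {})[sample] = value
    return table


conn = sqlite3.connect(":memory:")  # use a file path such as "cohort_sketch.db" to persist
save_tables(conn, {
    "cnv": {"MYC": {"S1": 2, "S2": -1}},
    "expression": {"EGFR": {"S1": 5.3, "S2": 1.1}},
})
restored_cnv = load_table(conn, "cnv")
restored_expr = load_table(conn, "expression")
conn.close()
```

The long layout lets layers with different feature sets share one table, at the cost of a pivot step when loading.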

Machine Learning

from pymaftools import OmicsStackingModel
from pymaftools.model.modelUtils import evaluate_model, cross_validate_importance

model = OmicsStackingModel()
model.fit(cohort, labels)
preds = model.predict(cohort)
importance = model.get_omics_feature_importance()

# X_test / y_test: held-out features and labels; X / y: the full data set
metrics = evaluate_model(model, X_test, y_test)
results = cross_validate_importance(model, X, y, n_seeds=10)
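Repeating a fit over several random seeds (as `n_seeds=10` suggests) yields one importance value per feature per seed. Summarising these as a mean and standard deviation per feature is one common way to judge stability; the sketch does that with the standard library on made-up values. The input shape (one dict per seed) is an assumption, not the documented return type of `cross_validate_importance`.

```python
import statistics


def summarize_importance(per_seed):
    """per_seed: list of {feature: importance}, one dict per random seed.

    Returns (feature, mean, stdev) tuples sorted by mean importance, descending.
    A feature missing from a seed's dict counts as 0.0 for that seed.
    """
    features = sorted({f for run in per_seed for f in run})
    summary = []
    for f in features:
        values = [run.get(f, 0.0) for run in per_seed]
        sd = statistics.stdev(values) if len(values) > 1 else 0.0
        summary.append((f, statistics.mean(values), sd))
    return sorted(summary, key=lambda row: row[1], reverse=True)


runs = [
    {"TP53": 0.30, "EGFR": 0.50, "MYC": 0.20},
    {"TP53": 0.40, "EGFR": 0.45, "MYC": 0.15},
    {"TP53": 0.35, "EGFR": 0.55},
]
importance_summary = summarize_importance(runs)
```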

FAQ

1. How to adjust font sizes in OncoPlot?

oncoplot = OncoPlot(pivot_table, ytick_fontsize=12)
oncoplot.mutation_heatmap(ytick_fontsize=10)
oncoplot.numeric_heatmap(ytick_fontsize=8)
oncoplot.plot_freq(annot_fontsize=10)

2. How to customize color mappings?

from pymaftools import ColorManager

color_manager = ColorManager()
color_manager.register_cmap("custom_mutations", {
    "Missense_Mutation": "#FF6B6B",
    "Nonsense_Mutation": "#4ECDC4",
    "Frame_Shift_Del": "#45B7D1"
})

mutation_cmap = color_manager.get_cmap("custom_mutations")
oncoplot.mutation_heatmap(cmap_dict=mutation_cmap)
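When registering a custom palette, mutation types left out of the mapping still need a colour. The sketch validates hex codes and falls back to a neutral grey for unmapped variant classes. `resolve_colors` and the fallback colour are assumptions for illustration; the function does not call ColorManager.

```python
import re

HEX_COLOR = re.compile(r"^#(?:[0-9A-Fa-f]{3}|[0-9A-Fa-f]{6})$")


def resolve_colors(categories, palette, fallback="#BDBDBD"):
    """Map each category to a colour from `palette`, or `fallback` if absent.

    Raises ValueError if any palette entry is not a #RGB or #RRGGBB hex code.
    """
    bad = {k: v for k, v in palette.items() if not HEX_COLOR.match(v)}
    if bad:
        raise ValueError(f"invalid hex colours: {bad}")
    return {c: palette.get(c, fallback) for c in categories}


palette = {
    "Missense_Mutation": "#FF6B6B",
    "Nonsense_Mutation": "#4ECDC4",
    "Frame_Shift_Del": "#45B7D1",
}
observed = ["Missense_Mutation", "Frame_Shift_Ins", "Frame_Shift_Del"]
colors = resolve_colors(observed, palette)
```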

3. How to save and load analysis results?

# SQLite format (PivotTable and Cohort)
pivot_table.to_sqlite("results.db")
loaded = PivotTable.read_sqlite("results.db")

cohort.to_sqlite("cohort.db")
loaded = Cohort.read_sqlite("cohort.db")

# Save figures
oncoplot.save("oncoplot.png", dpi=300)

Development and Testing

# Install with test dependencies (quotes stop shells such as zsh from expanding the brackets)
pip install -e ".[test]"

# Run tests
make test              # All tests
make test-core         # Core functionality
make test-plot         # Plotting tests
make test-fast         # Exclude slow tests
make test-coverage     # With coverage report

Test Categories

Core tests (tests/core/): PivotTable, MAF, Cohort
Plot tests (tests/plot/): All visualizations
Model tests (tests/model/): ML components
Integration tests (@pytest.mark.integration): End-to-end workflows

CI

Tests run on GitHub Actions for Python 3.10-3.12 (stable) and 3.13-3.14 (experimental).

License

MIT License - see the LICENSE file for details.

Author

xu62u4u6

Release history

0.4.1 (this version): May 31, 2026
0.4.0: Mar 6, 2026
0.3.0: Oct 15, 2025
0.2.2: Apr 16, 2025
0.2.1 (yanked): Apr 16, 2025
0.2 (yanked): Apr 15, 2025
0.1: Nov 29, 2024

Download files

Source distribution: pymaftools-0.4.1.tar.gz (4.0 MB), uploaded May 31, 2026 via twine/5.1.1 CPython/3.12.3 (not via Trusted Publishing)

SHA256: `fb08c78c292e33b92f40211769baea3baf8f8fce3be41b2622090174687e0816`
MD5: `c2ceaafd7f4e7bbd9b5686c6e1934041`
BLAKE2b-256: `1184bdc68cd735f55a58dfec7d8c8e2b15de8817489a36d0ec20cd574ef33e21`

Built distribution: pymaftools-0.4.1-py3-none-any.whl (2.4 MB, Python 3), uploaded May 31, 2026 via twine/5.1.1 CPython/3.12.3 (not via Trusted Publishing)

SHA256: `98c29cb6e5828751a846452d7368dc2d78ed4925c421073d1db2961cfd202661`
MD5: `6f07bfb0ba0319835d77db23a64602c6`
BLAKE2b-256: `df1ded489b677a7fb81250d1a6d4cc392b45a363b06be6b618aeb796eb02f4d3`
