
TSUMUGI: Phenotype-driven gene network identifier

Translations: 日本語 | 한국어 | 简体中文 | 繁體中文 | हिन्दी | Bahasa Indonesia | Tiếng Việt | Español | Français | Deutsch | Português

TSUMUGI (Trait-driven Surveillance for Mutation-based Gene module Identification) is a web tool that uses knockout (KO) mouse phenotype data from the International Mouse Phenotyping Consortium (IMPC) to extract and visualize gene modules based on phenotypic similarity.

TSUMUGI (紡ぎ) comes from the idea of “weaving together gene groups that form phenotypes.”

This web app is available to everyone online👇️

🔗https://larc-tsukuba.github.io/tsumugi/

📖 How to Use TSUMUGI

TSUMUGI supports three kinds of input.

1. Phenotype

Enter a phenotype of interest to search for genes whose KO mice have similar overall phenotype profiles.
Phenotype names follow the Mammalian Phenotype Ontology (MPO).

👉 Phenotype list

2. Gene

Specify one gene to search for other genes whose KO mice show similar phenotypes.
Gene symbols follow MGI.

👉 Gene list

3. Gene List

Paste multiple genes (one per line). This extracts phenotypically similar genes among the genes in the list.

[!CAUTION]
If no similar genes are found, this alert appears: No similar phenotypes were found among the entered genes.
If more than 200 similar genes are found, this alert appears: Too many genes submitted. Please limit the number to 200 or fewer.

📥 Download raw data

TSUMUGI publishes gzipped JSONL files.

genewise_phenotype_annotations.jsonl.gz

  • Gene symbol (e.g., "1110059G10Rik")
  • Marker accession ID (e.g., "MGI:1913452")
  • Phenotype term name/ID (e.g., "fused joints", "MP:0000137")
  • Effect size (e.g., 0.0, 1.324)
  • Significance flag (true/false)
  • Zygosity ("Homo", "Hetero", "Hemi")
  • Life stage ("Embryo", "Early", "Interval", "Late")
  • Sexual dimorphism ("None", "Male", "Female")
  • Disease annotation (e.g., [] or "Premature Ovarian Failure 18")

Example:

{"life_stage": "Early", "marker_symbol": "1110059G10Rik", "marker_accession_id": "MGI:1913452", "effect_size": 0.0, "mp_term_name": "fused joints", "disease_annotation": [], "significant": false, "zygosity": "Homo", "sexual_dimorphism": "None", "mp_term_id": "MP:0000137"}
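The file can be streamed with Python's standard library alone. The following is a minimal sketch, not part of TSUMUGI itself: the helper names `read_jsonl_gz` and `significant_terms_by_gene` are hypothetical, and the field names are taken from the example record above.

```python
import gzip
import json
from collections import defaultdict
from typing import Dict, Iterable, Iterator, Set


def read_jsonl_gz(path: str) -> Iterator[dict]:
    """Yield one dict per non-empty line of a gzipped JSONL file."""
    with gzip.open(path, "rt", encoding="utf-8") as handle:
        for line in handle:
            line = line.strip()
            if line:
                yield json.loads(line)


def significant_terms_by_gene(records: Iterable[dict]) -> Dict[str, Set[str]]:
    """Map each gene symbol to the MP term IDs whose significance flag is true."""
    terms = defaultdict(set)
    for rec in records:
        if rec["significant"]:
            terms[rec["marker_symbol"]].add(rec["mp_term_id"])
    return dict(terms)
```

For example, `significant_terms_by_gene(read_jsonl_gz("genewise_phenotype_annotations.jsonl.gz"))` reads the file line by line, so the whole table never has to be held in memory as raw text.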

pairwise_similarity_annotations.jsonl.gz

  • Gene pair (gene1_symbol, gene2_symbol)
  • phenotype_shared_annotations (per-phenotype metadata: life stage, zygosity, sexual dimorphism)
  • phenotype_similarity_score (Resnik-based Phenodigm score, 0–100)

Example:

{"gene1_symbol": "1110059G10Rik", "gene2_symbol": "Cog6", "phenotype_shared_annotations": {"vertebral transformation": {"zygosity": "Homo", "life_stage": "Early", "sexual_dimorphism": "Male"}}, "phenotype_similarity_score": 42}
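Because each record names both genes of a pair, finding the partners of one gene means checking both symbol fields. The sketch below is illustrative only: `partners_of` is a hypothetical helper, and it works on any iterable of dicts shaped like the example record above.

```python
from typing import Iterable, List, Tuple


def partners_of(gene: str, pairs: Iterable[dict], min_score: float = 0) -> List[Tuple[str, float]]:
    """Return (partner symbol, score) tuples for one gene, highest score first."""
    hits = []
    for pair in pairs:
        g1, g2 = pair["gene1_symbol"], pair["gene2_symbol"]
        score = pair["phenotype_similarity_score"]
        if gene in (g1, g2) and score >= min_score:
            # The partner is whichever symbol is not the query gene.
            hits.append((g2 if g1 == gene else g1, score))
    return sorted(hits, key=lambda item: item[1], reverse=True)
```

With the example record, querying `"Cog6"` returns `1110059G10Rik` with its score of 42.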

🌐 Network

After you submit a query, the page switches to the network view and draws the network automatically.

[!IMPORTANT]
Gene pairs with 3 or more shared abnormal phenotypes and phenotypic similarity > 0.0 are visualized.
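The same rule can be applied to downloaded pairwise records. This is a sketch under one assumption: each key of `phenotype_shared_annotations` counts as one shared abnormal phenotype. The function name `is_visualized` is hypothetical.

```python
def is_visualized(pair: dict, min_shared: int = 3) -> bool:
    """True when a pair has enough shared phenotypes and a positive similarity score."""
    n_shared = len(pair["phenotype_shared_annotations"])
    return n_shared >= min_shared and pair["phenotype_similarity_score"] > 0.0
```

Under this reading, a pair with a single shared phenotype is not drawn, whatever its score.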

Network panel

Nodes represent genes. Click to see the list of abnormal phenotypes observed in that KO mouse; drag to rearrange positions.
Edges show shared phenotypes; click to view details.

Control panel

Adjust network display from the left panel.

Filter by phenotypic similarity

The Phenotypes similarity slider filters edges by their Resnik-based Phenodigm score.

For how we compute similarity, see: 👉 🔍 How We Calculate Phenotypically Similar Genes

Filter by phenotype severity

The Phenotype severity slider filters nodes by effect size (the severity of the phenotype in KO mice). Higher values mean a stronger effect.

The slider is hidden for binary phenotypes (e.g., abnormal embryo development; see the list of binary phenotypes) and for single-gene input.

Specify genotype

Choose the genotype in which phenotypes appear:

  • Homo: homozygous
  • Hetero: heterozygous
  • Hemi: hemizygous

Specify sex

Extract sex-specific phenotypes:

  • Female
  • Male

Specify life stage

Filter by life stage in which phenotypes appear:

  • Embryo
  • Early (0–16 weeks)
  • Interval (17–48 weeks)
  • Late (49+ weeks)

Markup panel

Highlight: Human Disease

Highlight genes linked to human disease (IMPC Disease Models Portal data).

Search: Specific Gene

Search gene names within the network.

Layout & Display

Adjust layout, font size, edge width, and node repulsion (Cose layout).

Export

Export the current network as PNG/CSV/GraphML.
CSV includes connected-component (module) IDs and phenotype lists per gene; GraphML is Cytoscape-compatible.
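A connected component is a set of genes that can all reach one another through edges. The following union-find sketch shows one way such module IDs could be derived from an edge list; it is not TSUMUGI's own code, and the numbering scheme (IDs assigned in sorted gene order, starting at 1) is an assumption made for illustration.

```python
from typing import Dict, Iterable, Tuple


def connected_components(edges: Iterable[Tuple[str, str]]) -> Dict[str, int]:
    """Assign a module ID to every gene that appears in an edge list."""
    parent: Dict[str, str] = {}

    def find(node: str) -> str:
        parent.setdefault(node, node)
        while parent[node] != node:
            parent[node] = parent[parent[node]]  # path halving keeps trees shallow
            node = parent[node]
        return node

    # Merge the two components joined by each edge.
    for gene_a, gene_b in edges:
        root_a, root_b = find(gene_a), find(gene_b)
        if root_a != root_b:
            parent[root_b] = root_a

    # Give every component a stable integer label.
    labels: Dict[str, int] = {}
    module_ids: Dict[str, int] = {}
    for gene in sorted(parent):
        root = find(gene)
        labels.setdefault(root, len(labels) + 1)
        module_ids[gene] = labels[root]
    return module_ids
```

Genes that share a module ID belong to the same connected sub-network; genes with no edges never appear in the result.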

🛠 Command-Line Edition

This release adds a CLI so you can download the latest IMPC updates yourself, rerun TSUMUGI, and apply finer filters and output options.

  • Recompute with IMPC statistical-results-ALL.csv.gz (optionally mp.obo, impc_phenodigm.csv).
  • Filter by presence/absence of MP terms.
  • Filter by gene list (comma-separated or text file).
  • Outputs: GraphML (tsumugi build-graphml), offline webapp bundle (tsumugi build-webapp).

Available commands

  • tsumugi run: Recompute the network from IMPC data
  • tsumugi mp --include/--exclude: Filter pairs that contain / do not show an MP term
  • tsumugi n-phenos --pairwise/--genewise (--min/--max): Filter by phenotype counts (pairwise or per gene)
  • tsumugi genes --keep/--drop: Keep/drop by gene list (comma-separated or text file)
  • tsumugi life-stage --keep/--drop: Filter by life stage (Embryo/Early/Interval/Late)
  • tsumugi sex --keep/--drop: Filter by sex (Male/Female/None)
  • tsumugi zygosity --keep/--drop: Filter by zygosity (Homo/Hetero/Hemi)
  • tsumugi build-graphml: Generate GraphML (Cytoscape, etc.)
  • tsumugi build-webapp: Generate TSUMUGI webapp assets (local HTML/CSS/JS)

Installation

BioConda:

conda install -c conda-forge -c bioconda tsumugi

PyPI:

pip install tsumugi

You are ready if tsumugi --version prints the version.

Common usage (per command)

1. Recompute from IMPC data (tsumugi run)

If --mp_obo is omitted, TSUMUGI uses the bundled mp.obo (data-version: releases/2025-08-27).
If --impc_phenodigm is omitted, TSUMUGI uses the bundled file fetched on 2025-10-01 from the IMPC Disease Models Portal.

tsumugi run \
  --output_dir ./tsumugi-output \
  --statistical_results ./statistical-results-ALL.csv.gz \
  --threads 8

Outputs: ./tsumugi-output contains genewise annotations (genewise_phenotype_annotations.jsonl.gz), pairwise similarity data (pairwise_similarity_annotations.jsonl.gz), and visualization assets (TSUMUGI-webapp).

[!IMPORTANT]
The TSUMUGI-webapp directory includes OS-specific launch scripts; double-click to open the local web app:

  • Windows: open_webapp_windows.bat
  • macOS: open_webapp_mac.command
  • Linux: open_webapp_linux.sh

2. Filter by MP term (tsumugi mp --include/--exclude)

# Keep pairs that include MP:0001146
tsumugi mp --include MP:0001146 \
  --in pairwise_similarity_annotations.jsonl.gz \
  > pairwise_filtered.jsonl

# Keep pairs whose genes were measured for MP:0001146 but did not show it
tsumugi mp --exclude MP:0001146 \
  --genewise genewise_phenotype_annotations.jsonl.gz \
  --in pairwise_similarity_annotations.jsonl.gz \
  > pairwise_filtered.jsonl

3. Filter by phenotype counts (tsumugi n-phenos)

  • Shared phenotypes per pair:
tsumugi n-phenos --pairwise --min 3 --max 20 \
  --in pairwise_similarity_annotations.jsonl.gz \
  > pairwise_min3_max20.jsonl
  • Phenotypes per gene (genewise required):
tsumugi n-phenos --genewise --min 5 --max 50 \
  --genewise genewise_phenotype_annotations.jsonl.gz \
  --in pairwise_similarity_annotations.jsonl.gz \
  > genewise_min5_max50.jsonl

--min or --max alone is fine.

4. Filter by gene list (tsumugi genes --keep/--drop)

tsumugi genes --keep genes.txt \
  --in pairwise_similarity_annotations.jsonl.gz \
  > pairwise_keep_genes.jsonl

tsumugi genes --drop geneA,geneB \
  --in pairwise_similarity_annotations.jsonl.gz \
  > pairwise_drop_genes.jsonl

5. Filter by life stage (tsumugi life-stage --keep/--drop)

tsumugi life-stage --keep Early \
  --in pairwise_similarity_annotations.jsonl.gz \
  > pairwise_lifestage_early.jsonl

6. Filter by sex (tsumugi sex --keep/--drop)

tsumugi sex --drop Male \
  --in pairwise_similarity_annotations.jsonl.gz \
  > pairwise_no_male.jsonl

7. Filter by zygosity (tsumugi zygosity --keep/--drop)

tsumugi zygosity --keep Homo \
  --in pairwise_similarity_annotations.jsonl.gz \
  > pairwise_homo.jsonl

8. Export GraphML / webapp

tsumugi build-graphml \
  --in pairwise_similarity_annotations.jsonl.gz \
  --genewise genewise_phenotype_annotations.jsonl.gz \
  > network.graphml

tsumugi build-webapp \
  --in pairwise_similarity_annotations.jsonl.gz \
  --genewise genewise_phenotype_annotations.jsonl.gz \
  --output_dir ./webapp_output

CLI supports STDIN/STDOUT, so you can chain commands:
zcat ... | tsumugi mp ... | tsumugi genes ... > out.jsonl

🔍 How We Calculate Phenotypically Similar Genes

Data source

IMPC Release-23.0 statistical-results-ALL.csv.gz
Columns: Data fields

Preprocessing

Extract gene–phenotype pairs with KO mouse P-value (p_value, female_ko_effect_p_value, or male_ko_effect_p_value) ≤ 0.0001.

  • Annotate genotype-specific phenotypes: homo, hetero, hemi
  • Annotate sex-specific phenotypes: female, male
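The threshold step can be sketched in a few lines of Python. This is an illustration rather than TSUMUGI's implementation: it reads "or" as "any of the three columns", treats empty or non-numeric cells as missing, and uses hypothetical helper names. Only the three P-value column names come from the text above.

```python
import csv
import gzip
from typing import Iterator, Optional

P_VALUE_COLUMNS = ("p_value", "female_ko_effect_p_value", "male_ko_effect_p_value")
THRESHOLD = 1e-4


def _to_float(cell: Optional[str]) -> Optional[float]:
    """Parse a CSV cell; return None for missing or non-numeric text."""
    try:
        return float(cell)
    except (TypeError, ValueError):
        return None


def passes_threshold(row: dict, threshold: float = THRESHOLD) -> bool:
    """True if any of the three P-value columns is at or below the threshold."""
    values = (_to_float(row.get(col)) for col in P_VALUE_COLUMNS)
    return any(v is not None and v <= threshold for v in values)


def significant_rows(path: str) -> Iterator[dict]:
    """Stream rows of a gzipped CSV that pass the P-value threshold."""
    with gzip.open(path, "rt", encoding="utf-8", newline="") as handle:
        for row in csv.DictReader(handle):
            if passes_threshold(row):
                yield row
```

Streaming with `csv.DictReader` avoids loading the full statistical-results file into memory at once.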

Phenotypic similarity

TSUMUGI computes Resnik similarity between MP terms and rescales pairwise gene scores to Phenodigm (0–100).

  1. Build the MP ontology and compute Information Content (IC):
    IC(term) = -log((|Descendants(term)| + 1) / |All MP terms|)
  2. Resnik(t1, t2) = IC of the most informative common ancestor (MICA); if no common ancestor, similarity = 0.
  3. For each gene pair, create a matrix of significant MP terms and weight each Resnik score by metadata match (zygosity / life stage / sex) with factors 1.0 / 0.75 / 0.5 / 0.25. Take row/column maxima to obtain the actual max and mean similarity observed.
  4. Derive theoretical max and mean from IC values of the terms, then normalize:
    Phenodigm = 100 * 0.5 * ( actual_max / theoretical_max + actual_mean / theoretical_mean )
    If a theoretical denominator is 0, set that term to 0. The resulting 0–100 score feeds the downloadable tables and the Phenotypes similarity slider.
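The steps above can be traced on a toy ontology. The sketch below is not TSUMUGI's code and rests on several labeled assumptions: the five-term ontology is hypothetical; the weights 1.0 / 0.75 / 0.5 / 0.25 are mapped to 3 / 2 / 1 / 0 matching metadata fields; the actual mean is taken over all row and column maxima; and the theoretical max and mean are the max and mean IC of the terms annotated to the two genes.

```python
import math

# Hypothetical toy ontology: term -> list of parent terms.
PARENTS = {"root": [], "A": ["root"], "B": ["root"], "A1": ["A"], "A2": ["A"]}

# Assumed mapping: number of matching metadata fields -> weight.
MATCH_WEIGHT = {3: 1.0, 2: 0.75, 1: 0.5, 0: 0.25}


def ancestors(term):
    """Return the term together with all of its ancestors."""
    seen, stack = {term}, [term]
    while stack:
        for parent in PARENTS[stack.pop()]:
            if parent not in seen:
                seen.add(parent)
                stack.append(parent)
    return seen


def information_content(term):
    """IC = log(N / (descendants + 1)), equal to -log((descendants + 1) / N)."""
    n_desc = sum(1 for t in PARENTS if t != term and term in ancestors(t))
    return math.log(len(PARENTS) / (n_desc + 1))


def resnik(t1, t2):
    """IC of the most informative common ancestor; 0 if there is none."""
    common = ancestors(t1) & ancestors(t2)
    return max((information_content(t) for t in common), default=0.0)


def phenodigm(annots1, annots2):
    """Score two non-empty lists of (term, (zygosity, life_stage, sex)) on a 0-100 scale."""
    matrix = [
        [resnik(t1, t2) * MATCH_WEIGHT[sum(a == b for a, b in zip(m1, m2))]
         for t2, m2 in annots2]
        for t1, m1 in annots1
    ]
    # Best match for every term of gene 1 (rows) and of gene 2 (columns).
    best = [max(row) for row in matrix] + [max(col) for col in zip(*matrix)]
    actual_max, actual_mean = max(best), sum(best) / len(best)
    ics = [information_content(t) for t, _ in annots1 + annots2]
    theo_max, theo_mean = max(ics), sum(ics) / len(ics)
    # A zero theoretical denominator contributes 0, as described above.
    max_ratio = actual_max / theo_max if theo_max else 0.0
    mean_ratio = actual_mean / theo_mean if theo_mean else 0.0
    return 100 * 0.5 * (max_ratio + mean_ratio)
```

In this toy setting, two genes annotated with the same single term and identical metadata score 100, and two genes whose only common ancestor is the root score 0. Changing one metadata field (for example zygosity) lowers the weight to 0.75 and the score with it.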

✉️ Contact
