Bayesian Event-Based Model for Disease Subtype and Stage Inference

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

`pysubebm`

Installation

pip install bebms

Alternatively, clone this repository and, from its root, install it in editable mode:

pip install -e .
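
To confirm that the installation succeeded, you can query the installed version from Python. This is a minimal sketch using only the standard library; the helper name `installed_version` is made up for illustration and is not part of bebms.

```python
from importlib import metadata
from typing import Optional


def installed_version(package: str) -> Optional[str]:
    """Return the installed version of `package`, or None if it is not installed."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


# Prints a version string if bebms is installed, otherwise None.
print(installed_version("bebms"))
```

If this prints None, the package is not visible to the interpreter you are running, which usually means pip installed it into a different environment.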

Generate synthetic data

For quick testing, sample datasets in the format bebms expects are available in bebms/data/samples.

To generate synthetic data yourself, clone this repository and run the following at its root:

bash gen.sh

The generated data will be found at bebms/test/my_data as .csv files.

The generation parameters are preset in bebms/data/params.json. Edit that file to change them.

You can also change parameters in config.toml to adjust what data to generate.
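
If you prefer to change params.json from a script rather than by hand, something like the following could work. It is only a sketch: it assumes the file holds a JSON object at the top level, the helper `update_params` is hypothetical, and the key in the commented usage line is a placeholder, since the actual keys are not listed here.

```python
import json
from pathlib import Path


def update_params(path, overrides):
    """Load a JSON object from `path`, apply top-level overrides, and write it back."""
    path = Path(path)
    params = json.loads(path.read_text(encoding="utf-8"))
    # Refuse to add keys that the file does not already define,
    # so that a typo does not silently create a new parameter.
    unknown = [key for key in overrides if key not in params]
    if unknown:
        raise KeyError(f"keys not present in {path.name}: {unknown}")
    params.update(overrides)
    path.write_text(json.dumps(params, indent=2), encoding="utf-8")
    return params


# Hypothetical usage; "some_key" stands in for a real key from params.json:
# update_params("bebms/data/params.json", {"some_key": 1.5})
```

Rejecting unknown keys is a deliberate choice here: it keeps the file limited to parameters the generator already knows about.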

Run `bebms` algorithm

After cloning this repository and generating synthetic data, run the following at the root to run bebms:

bash test.sh

You can check bebms/test/test.py to learn how to use the run_bebms function.

The results will be saved in the folder of bebms/test/algo_results.

Compare with SuStaIn

You can also compare the results of bebms with those of SuStaIn.

First, install the packages required by SuStaIn:

pip install git+https://github.com/noxtoby/awkde
pip install git+https://github.com/hongtaoh/ucl_kde_ebm
pip install git+https://github.com/hongtaoh/pySuStaIn

Then, at the root of this repository, run

bash test_sustain.sh

You can check details at bebms/test/test_sustain.py.

The results will be saved in the folder of bebms/test/sustain_results.

Save comparison results

You can save the results of bebms along with those of SuStaIn by running at the root:

python3 save_csv.py

The results will be found at the root as all_results.csv.

Use your own data

You can use your own data, as long as it follows the same format as the files in bebms/data/samples.
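
One quick way to check the format is to compare the column names of your file against one of the sample files. The sketch below uses only the standard library and assumes both files are CSVs with a header row; `compare_columns` is a hypothetical helper, not part of bebms, and matching column names does not guarantee that the values themselves are valid.

```python
import csv


def read_header(csv_path):
    """Return the first row of a CSV file as a list of column names."""
    with open(csv_path, newline="", encoding="utf-8") as f:
        return next(csv.reader(f), [])


def compare_columns(sample_csv, own_csv):
    """Return (missing, extra) column names of own_csv relative to sample_csv."""
    sample_cols = read_header(sample_csv)
    own_cols = read_header(own_csv)
    missing = [c for c in sample_cols if c not in own_cols]
    extra = [c for c in own_cols if c not in sample_cols]
    return missing, extra


# Hypothetical usage; replace both paths with real files:
# missing, extra = compare_columns("path/to/a/sample.csv", "path/to/your/data.csv")
# print("missing:", missing, "extra:", extra)
```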

Find the optimal number of subtypes

After you have your own data, the first step is to find the optimal number of subtypes.

import pandas as pd 
import numpy as np 
import matplotlib.pyplot as plt 
from bebms.cross_validate import cross_validatation
import bebms.utils as utils

data_file = 'path/to/your/data.csv'

cvic_scores, optimal_n = cross_validatation(
    data_file=data_file,
    iterations=10000, # how many MCMC iterations to run. 
    n_shuffle=2, # how many biomarkers to shuffle in each subtype; recommend 2.
    n_subtype_shuffle=2, # how many subtypes to shuffle; recommend 2.
    burn_in=200, 
    prior_n=1, # strength of the prior belief in the prior estimate of the mean (μ); default 1
    prior_v=1, # prior degrees of freedom, i.e., certainty of the prior estimate of the variance (σ²); default 1
    max_n_subtypes=6, # the max number of subtypes
    N_FOLDS=5, # K-fold validation. Choose K here. 
    seed=42, # random seed. 
    with_labels=True # whether to assume the knowledge of diagnosis labels, i.e., healthy or not. 
)

# to get the optimal number of subtypes
ml_n_subtypes = utils.choose_optimal_subtypes(cvic_scores)
print(ml_n_subtypes)

# Summarize results
df_cvic = pd.DataFrame({
    "n_subtypes": np.arange(1, 7),
    "CVIC": cvic_scores
})
print(df_cvic)

# Plot CVIC curve
plt.figure(figsize=(6,4))
plt.plot(df_cvic["n_subtypes"], df_cvic["CVIC"], marker="o")
plt.xlabel("Number of subtypes")
plt.ylabel("CVIC (lower is better)")
plt.title("Cross-validated model selection (BPEBM-S)")
plt.grid(True)
plt.show()
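
Since a lower CVIC is better, the simplest selection rule is to take the number of subtypes with the lowest score. The sketch below shows that rule only; it assumes `cvic_scores[i]` belongs to `i + 1` subtypes (as in the table above), and `utils.choose_optimal_subtypes` may apply a different criterion. The scores in the example are made up for illustration.

```python
def lowest_cvic_n_subtypes(cvic_scores):
    """Return the 1-based number of subtypes whose CVIC is lowest."""
    if len(cvic_scores) == 0:
        raise ValueError("cvic_scores is empty")
    best_index = min(range(len(cvic_scores)), key=lambda i: cvic_scores[i])
    return best_index + 1  # index 0 corresponds to one subtype


# Made-up scores for 1 to 6 subtypes, for illustration only.
example_scores = [1520.3, 1498.7, 1475.2, 1480.9, 1491.4, 1503.0]
print(lowest_cvic_n_subtypes(example_scores))  # prints 3
```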

Run BEBMS

After you know the optimal number of subtypes, you can start running bebms on your dataset.

Because MCMC results depend on the random seed, it is best to try several seeds and keep the one that leads to the highest data log likelihood:

import pandas as pd 
import numpy as np 
from bebms import cross_validatation, run_bebms
from collections import defaultdict, Counter

data_file = 'path/to/your/data.csv'

dic = defaultdict(float)
for _ in range(10): # try 10 random seeds; modify the number as you wish. 
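    x = int(np.random.randint(1, 2**32 - 1, dtype=np.int64)) # explicit int64 avoids an out-of-bounds error where the default integer is 32-bit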
    results = run_bebms(
        data_file= data_file,
        n_subtypes=3, # that is the optimal number of subtypes you identified above
        output_dir='bebms_results',
        n_iter=20000, # number of MCMC iterations.
        n_shuffle=2, 
        n_subtype_shuffle=2,
        burn_in=200,
        thinning=1,
        seed = x, 
        obtain_results=True, # to get the results
        save_results=False, # nothing needs to be saved here; we only need the data log likelihood
        with_labels=True, # we assume the knowledge of diagnosis labels
        save_plots=False # we do not save plots
    )
    dic[x] = results['max_log_likelihood']

# By checking dic, you can know which random seed led to the highest data log likelihood

# Finally, you can run bebms to get the results. 
seed = 12345 # Suppose that is the optimal seed you identified above

results, all_orders, all_loglikes, best_order_matrix, biomarker_names, ml_stage, ml_subtype = run_bebms(
        data_file= data_file,
        n_subtypes=3,
        output_dir='bebms_results', # where results will be saved into
        n_iter=20000,
        n_shuffle=2,
        n_subtype_shuffle=2,
        burn_in=200,
        thinning=1,
        seed = seed,
        obtain_results=True,
        save_results=True, # Now we need to save results
        with_labels=True,
        save_plots=True # Now we need to save the result plots.
    )
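
Rather than reading the best seed off `dic` by eye, you can select it programmatically. This is a small sketch assuming a dictionary that maps each seed to its maximum log likelihood, like `dic` above; the helper name `best_seed` and the example values are made up for illustration.

```python
def best_seed(loglike_by_seed):
    """Return the seed whose log likelihood is highest."""
    if not loglike_by_seed:
        raise ValueError("no seeds were evaluated")
    return max(loglike_by_seed, key=loglike_by_seed.get)


# Made-up log likelihoods for three seeds, for illustration only.
example = {101: -5230.4, 202: -5198.1, 303: -5217.9}
print(best_seed(example))  # prints 202
```

Log likelihoods are negative here, so the "highest" value is the one closest to zero.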

Changelogs

2025-08-21 (V 0.0.3)
- Did the generate_data.py.
2025-08-22 (V 0.0.5)
- Did the mh.py
- Corrected the conjugate_priors implementation.
2025-08-23 (V 0.1.2)
- Improved functions in utils.py.
2025-08-29 (V 0.1.3)
- Didn't change much.
2025-08-30 (V 0.1.8)
- Optimized compute_likelihood_and_posteriors such that we only calculate healthy participants' ln likelihood once every time.
- Made sure subtype assignment accuracy does not apply to healthy participants at all.
- Fixed a major bug in data generation. The very low subtype assignment might be due to this error.
- Included both subtype accuracy in run.py.
2025-08-31 (V 0.2.5)
- Rescaled event times and disease stages for exp7-9 such that max(event_times) = max_stage - 1 and max(disease_stages) = max_stage.
- Changed the experiments and some of the implementation.
- Forcing max(event_times) = max_stage -1, but not forcing disease stages.
2025-09-01 (V 0.2.9)
- REMOVED THE Forcing max(event_times) = max_stage -1
- Modified the run.py.
2025-09-02 (V 0.3.3.1)
- Redid the staging and subtyping.
- Integrated with labels and not.
2025-09-04 (V 0.3.3.2)
- Made sure that, in staging with labels, the new_order indices start from 1 instead of 0. This is because participant stages now start from 0.
2025-09-06 (V 0.3.5.6)
- Added the plot function back.
2025-09-08 (V 0.3.5.8)
- Added ml_subtype in output results.
- Added all_logs to the output returned in run.py.
2025-09-21 (V 0.3.9)
- Removed iteration >= burn_in when updating best_*.
2025-11-03 (V 0.4.1)
- Changed the package name to bebms.
- Edited README.
2025-11-06 (V 0.4.3)
- Updated README.
- Allowed keep_all_cols=True when generating synthetic data. Will use the long format in that situation.

Release history

0.6.0

Jan 31, 2026

0.5.0

Jan 27, 2026

0.4.8

Jan 21, 2026

0.4.5

Nov 7, 2025

0.4.4

Nov 7, 2025

This version

0.4.3

Nov 6, 2025

0.4.1

Nov 3, 2025

0.4.0

Nov 3, 2025

Download files

Source Distribution

bebms-0.4.3.tar.gz (89.0 kB view details)

Uploaded Nov 6, 2025 Source

Built Distribution

bebms-0.4.3-py3-none-any.whl (127.8 kB view details)

Uploaded Nov 6, 2025 Python 3

File details

Details for the file bebms-0.4.3.tar.gz.

File metadata

Download URL: bebms-0.4.3.tar.gz
Upload date: Nov 6, 2025
Size: 89.0 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.8.19

File hashes

Hashes for bebms-0.4.3.tar.gz
Algorithm	Hash digest
SHA256	`7f83ddc930bcbb8feac2bd4f5300095f3c251235c30ea7d945bf3dee34101b6e`
MD5	`56365f9a1f263ed9bde387b5c6fdd134`
BLAKE2b-256	`1057a27db377296e1b2647f620e1018a6cc78016989b3205b1a9522abc7b5164`

File details

Details for the file bebms-0.4.3-py3-none-any.whl.

File metadata

Download URL: bebms-0.4.3-py3-none-any.whl
Upload date: Nov 6, 2025
Size: 127.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.8.19

File hashes

Hashes for bebms-0.4.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`046bfc8d3eac1e21ab7fbf4e048d5ea543ccdda35961bb7ba4f1d8807fdb6979`
MD5	`1faefce5c9d8aa27107c0f01e3578f8c`
BLAKE2b-256	`ebc258d1a2e9797ecc2ffb1dff969eb35fccef778d80c433fc5933316121dbcc`
