Hospitalisation-risk predictor for elderly patients — an open, honestly-evaluated clinical decision-support research prototype.

Praevius — Hospitalization Risk Predictor for Elderly Patients

A machine learning tool that helps health professionals identify elderly patients at higher risk of hospitalization — up to 1 and 3 years in advance.

Language / Idioma: 🇧🇷 Português | 🇬🇧 English (you are here)

🤗 The trained 1-year model is published on Hugging Face · ▶️ Try the interactive demo (Gradio Space — no install, no data stored).

What is this tool?

Praevius is an open-source clinical decision-support tool. It analyses routine clinical data collected from elderly patients — things like their walking speed, number of medications, cognitive scores, and history of falls — and produces a hospitalization risk score: a number between 0% and 100% that estimates how likely a patient is to need hospitalization within the next 1 year or 3 years.

Think of it as an early-warning system. The goal is not to replace clinical judgment, but to give clinicians an extra layer of information — a systematic way to flag patients who may need more attention before a crisis happens.

Who is this for?

This tool is designed for health professionals who work with elderly patients: geriatricians, general practitioners, nurses, physiotherapists, and care coordinators. You do not need any background in data science or machine learning to use it.

If you are a researcher or developer, the codebase is fully open and documented — contributions are very welcome (see Contributing).

Why does this matter?

Hospitalizations in the elderly are often preventable. Studies show that many hospitalization events are preceded by a gradual decline in physical function, cognitive capacity, and social engagement — signals that are often present in clinical records but are hard to synthesize manually when managing a large caseload.

This tool automates that synthesis. By processing multiple clinical variables at once and comparing a patient's profile against patterns learned from historical data, it can surface patients who are drifting toward higher risk — before the situation becomes urgent.

How does it work? (No technical background required)

Here is a plain-language explanation of the process:

1. Learning from historical data

The tool was trained on records from elderly patients, each of which included dozens of clinical measurements plus a record of whether that patient was hospitalized within 1 or 3 years. The algorithm studied these records (currently 117 visits from 30 unique patients — see Limitations) and learned which combinations of factors tend to appear before a hospitalization.

This is called "supervised machine learning" — we showed the algorithm many examples with known outcomes, and it identified the patterns that distinguish high-risk from low-risk profiles.

2. Scoring new patients

When you provide the tool with a new patient's data, it compares that patient's clinical profile against the patterns it learned, and outputs a probability score. A score of 0.78 (78%) would mean: "based on patients with similar profiles in our training data, there is approximately a 78% chance this patient will be hospitalized within the next year."

Important: with the current amount of training data, these probabilities are not yet calibrated — the exact percentage must not be read literally (this was measured; see Limitations). The tool therefore reports risk bands (low / moderate / high), and the band — not the exact number — is the reliable reading.
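To illustrate what reporting a band instead of a percentage might look like, here is a minimal sketch. The function name `risk_band` and the cut-offs 0.33 and 0.66 are hypothetical placeholders chosen for illustration only; the thresholds Praevius actually uses are not stated here.

```python
def risk_band(probability: float, low_cut: float = 0.33, high_cut: float = 0.66) -> str:
    """Map a raw model probability to a coarse risk band.

    low_cut and high_cut are HYPOTHETICAL thresholds, not Praevius's real ones.
    """
    if not 0.0 <= probability <= 1.0:
        raise ValueError("probability must be between 0 and 1")
    if probability < low_cut:
        return "low"
    if probability < high_cut:
        return "moderate"
    return "high"


# The band, not the raw number, is what would be shown to the clinician.
for p in (0.12, 0.50, 0.78):
    print(f"raw score {p:.2f} -> band: {risk_band(p)}")
```

Two patients scored 0.70 and 0.78 land in the same band here, which reflects the point above: differences of a few percentage points carry no reliable meaning while the probabilities are uncalibrated.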

3. Explaining the score

The tool also explains why a patient received a particular score — which factors raised the risk and which ones lowered it. This is shown as a chart called a SHAP explanation (more on this in the Understanding Your Results section). You can point to specific clinical findings that drove the score and discuss them with your team.

4. What the tool does not do

It does not diagnose conditions
It does not prescribe treatment
It does not replace clinical assessment
It cannot tell you why a patient will be hospitalized — only that the pattern of their data resembles patients who historically were

Current status

This project is under active development. Here is where we are:

Phase	Description	Status
Phase 1	Foundation — robust model training and evaluation	✅ Complete
Phase 2	Data strategy — augment dataset with public health data	⏳ Planned
Phase 3	Clinical interface — Streamlit web app for direct use	✅ Complete
Phase 4	Open source packaging — installable, citable, CI/CD	🔨 In progress

Phase 1 checklist:

Core ML pipeline (data loading, feature engineering, model training, evaluation)
Resolve repository merge conflicts
Fix target binarization — hospitalization columns stored counts (0–3); converted to binary (0 = never, 1 = at least once)
Fix data leakage — switched from random row-level split to patient-level split; all visits from one patient now stay in the same set
Patient-level cross-validation — StratifiedGroupKFold ensures all visits from the same patient stay together; all 30 patients contribute to evaluation; k chosen automatically based on dataset size
SMOTE for class imbalance — applied inside each CV training fold only; never on test data; auto-disabled when minority class ≥ 40%
Remove Decision Tree — AUC near 0.5 on small datasets; unstable and not suitable for clinical use
Two-phase training pipeline — Phase A: patient-level cross-validation (honest evaluation, source of the committed charts) → Phase B: final model retrained on all data and saved as a Pipeline
predict.py — interactive script to run a risk prediction for a single new patient; outputs a risk summary chart + SHAP explanation
SHAP explanations per patient — waterfall chart showing which clinical factors increased or decreased each individual patient's risk score
Hyperparameter tuning with nested cross-validation — RandomizedSearchCV with patient-level CV selects hyperparameters; a nested (outer) CV then evaluates the tuned model on patients the search never saw. Honest tuned 1-year AUC: 0.739 ± 0.277 (3-year: 0.462 — below chance, do not use)
3-year model limitations documented — clearly communicated in Limitations section and performance table
End-to-end Pipeline (training = prediction) — all preprocessing (imputation, feature engineering, encoding, scaling) and the model are saved together as a single scikit-learn Pipeline. Preprocessing is now fitted only on training data inside every CV fold (removing preprocessing leakage), and predict.py applies exactly the same transformations as training — fixing a bug where hand-rolled preprocessing produced invalid scores
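Two ideas from the checklist above, keeping every visit of a patient in the same fold and skipping oversampling when the minority class is already at least 40%, can be sketched with the standard library alone. This is a simplified stand-in, not the project's code: the round-robin fold assignment below does no stratification (unlike StratifiedGroupKFold), and the function names and toy patient IDs are hypothetical.

```python
from collections import defaultdict


def patient_level_folds(patient_ids, n_folds):
    """Assign each row to a fold so that all rows of one patient share a fold.

    Simplified: patients are dealt round-robin, with no stratification.
    """
    unique_patients = sorted(set(patient_ids))
    fold_of_patient = {pid: i % n_folds for i, pid in enumerate(unique_patients)}
    folds = defaultdict(list)
    for row_index, pid in enumerate(patient_ids):
        folds[fold_of_patient[pid]].append(row_index)
    return [folds[k] for k in range(n_folds)]


def needs_oversampling(labels, threshold=0.40):
    """Return True when the minority class share is below the threshold."""
    positives = sum(labels)
    minority_share = min(positives, len(labels) - positives) / len(labels)
    return minority_share < threshold


# Toy data: 8 visits from 4 hypothetical patients.
visit_patient = [101, 101, 102, 102, 103, 103, 104, 104]
visit_label = [0, 1, 1, 1, 0, 0, 0, 1]

for k, test_rows in enumerate(patient_level_folds(visit_patient, n_folds=2)):
    train_rows = [i for i in range(len(visit_patient)) if i not in test_rows]
    test_patients = {visit_patient[i] for i in test_rows}
    train_patients = {visit_patient[i] for i in train_rows}
    # No patient may appear on both sides of the split.
    assert test_patients.isdisjoint(train_patients)
    train_labels = [visit_label[i] for i in train_rows]
    print(f"fold {k}: test patients {sorted(test_patients)}, "
          f"oversample training fold: {needs_oversampling(train_labels)}")
```

The oversampling decision is taken on the training rows of each fold only, so the test patients never influence it.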

Getting started

What you need

Python 3.10 or higher (download here)
About 5 minutes for setup
A dataset is needed only if you want to retrain the model — Praevius ships with pre-trained models, so you can score patients right after installing.

Step 1 — Install Praevius

git clone https://github.com/Zanarino/praevius.git
cd praevius
pip install ".[app]"

This installs the praevius package, pulls every dependency, and creates two terminal commands: praevius (scoring) and praevius-app (the clinical interface). pip install . alone installs the core tool and the praevius command; the [app] extra adds Streamlit and the web interface. (Once the package is published, pip install praevius will work directly.)

If you are not familiar with git, you can instead click the green Code button on GitHub, select Download ZIP, unzip the file, and run the pip command from inside the extracted folder. If you see errors, make sure your Python version is 3.10 or higher (python --version).

Step 2 — Score a patient

Try it immediately with a fictional example patient — no dataset required, since the pre-trained models are bundled:

praevius --example

Run praevius with no arguments for an interactive prompt that asks for the patient's clinical values; anything you leave blank is filled with a typical training value. You get the 1-year risk band, the inter-model agreement indicator, a SHAP explanation of the factors, and charts saved to an outputs/ folder.

Step 3 — Open the clinical interface (optional)

praevius-app

Your browser opens a bilingual (Portuguese/English) form where you enter the patient's data — fields you don't have are filled automatically with typical training values. After acknowledging the research-prototype notice, you get the 1-year risk band, an indicator of how much the three models agree on this patient, the 3-year panel (currently "in development" — no score is shown, by design), and a SHAP chart explaining which factors drove the result. A one-page PDF report of the assessment (generated in memory) can be downloaded for the patient's record. The interface runs entirely on your machine: no patient data ever leaves it or is stored.

Step 4 — Retrain on your own data (advanced, optional)

Praevius ships with pre-trained models, so this is only needed to rebuild them on your own dataset. Place your data at raw_data/Virtual_Patient_Models_Dataset.csv (see The Dataset for the expected format), then run:

python -m praevius.predictive_model

This loads and cleans the data, evaluates the models via honest cross-validation, retrains them on all data, and saves the pipelines and the model card (model_card.json) into the package so they ship with it. Performance charts and reports are written to the outputs/ folder. It takes around 5 minutes on a standard laptop.

Understanding your results

After running the model, you will find the following files in the outputs/ folder. Here is what each one means:

`cv_summary_1year.csv` and `cv_summary_3years.csv` ← Start here

These are the most important files. They show the real model performance, measured honestly through cross-validation — a method that uses all 30 patients for evaluation, not just a small subset.

Column	What it means
`roc_auc_mean`	Mean AUC across all folds — this is the primary performance metric
`roc_auc_std`	Standard deviation — shows how stable performance is across different patient groups
`pr_auc_mean`	Mean Precision-Recall AUC — complements ROC-AUC, especially useful for imbalanced classes
`recall_mean`	Of the patients who were actually hospitalised, what proportion did the model correctly identify?
`f1_mean`	Harmonic mean of precision and recall at the decision threshold; complements AUC, which does not depend on a threshold
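For readers who want to see how recall and F1 relate to raw counts, a minimal sketch follows. The counts are invented for illustration and the function name is hypothetical; the project computes these metrics per fold and then averages them.

```python
def precision_recall_f1(true_positives, false_positives, false_negatives):
    """Compute precision, recall and F1 from confusion-matrix counts."""
    flagged = true_positives + false_positives
    hospitalised = true_positives + false_negatives
    precision = true_positives / flagged if flagged else 0.0
    recall = true_positives / hospitalised if hospitalised else 0.0
    total = precision + recall
    f1 = 2 * precision * recall / total if total else 0.0
    return precision, recall, f1


# Invented example: 5 patients were hospitalised; the model flagged 4 patients,
# 3 of them correctly.
precision, recall, f1 = precision_recall_f1(
    true_positives=3, false_positives=1, false_negatives=2
)
print(f"precision={precision:.2f} recall={recall:.2f} f1={f1:.2f}")
```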

`cv_fold_results_1year.csv` and `cv_fold_results_3years.csv`

These files show the results for each individual fold — useful for diagnosing which patient groups the model consistently gets right or wrong. High variability across folds indicates we need more data.

`nested_cv_tuned_1year.csv` and `nested_cv_tuned_3years.csv`

These files contain the nested cross-validation results for the hyperparameter-tuned Gradient Boosting: for each outer fold, the column inner_best_auc is the search's internal selection score and outer_test_auc is the honest evaluation on patients the search never saw. Use the mean of outer_test_auc when discussing tuned-model performance.
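A minimal sketch of summarising such a file with the standard library is shown below. The column names inner_best_auc and outer_test_auc come from the description above; the `fold` column and all the numbers are invented for illustration and are not the project's results.

```python
import csv
import io
import statistics

# Invented stand-in for the contents of a nested-CV results file.
SAMPLE = """fold,inner_best_auc,outer_test_auc
1,0.80,0.70
2,0.78,0.55
3,0.82,0.90
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
inner = [float(r["inner_best_auc"]) for r in rows]
outer = [float(r["outer_test_auc"]) for r in rows]

mean_inner = statistics.mean(inner)
mean_outer = statistics.mean(outer)
# Sample standard deviation; the project may use a different convention.
std_outer = statistics.stdev(outer)

print(f"selection score (optimistic): {mean_inner:.3f}")
print(f"honest outer estimate:        {mean_outer:.3f} +/- {std_outer:.3f}")
```

To read a real file, replace `io.StringIO(SAMPLE)` with an opened file handle. The gap between the two means is the optimism of the search's own score.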

`calibration_1year.csv` / `calibration_curve_1year.png` (and 3-year equivalents)

The calibration evaluation: Brier score and expected calibration error (ECE) for the deployed model and for two recalibrated variants (sigmoid and isotonic), measured on out-of-fold predictions, plus the no-information baseline (base_rate_brier) and the resulting display_decision (bands_only or percentage) that the interfaces obey. The PNG is the reliability diagram — the closer a curve is to the diagonal, the more literally its percentages can be read.

What is ROC-AUC?

ROC-AUC (Area Under the Receiver Operating Characteristic Curve) is a standard way to measure how good a predictive model is. It tells you how well the model separates high-risk patients from low-risk ones — across all possible risk thresholds.

Think of it like this: if you randomly pick one patient who was hospitalised and one who was not, the AUC is the probability that the model will correctly rank the hospitalised one as higher-risk.

In clinical terms, AUC summarises the sensitivity/specificity trade-off of a diagnostic test across all possible cut-offs, rather than at a single one. A model with AUC 0.82 would correctly rank 82% of randomly chosen hospitalised/non-hospitalised patient pairs.
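The pairwise reading of AUC can be checked directly on a handful of patients. The sketch below uses invented scores and a hypothetical function name; it counts, over every hospitalised/non-hospitalised pair, how often the hospitalised patient received the higher score (ties count as half).

```python
from itertools import product


def pairwise_auc(scores, labels):
    """AUC as the share of (hospitalised, not hospitalised) pairs ranked correctly."""
    positives = [s for s, y in zip(scores, labels) if y == 1]
    negatives = [s for s, y in zip(scores, labels) if y == 0]
    if not positives or not negatives:
        raise ValueError("need at least one patient of each class")
    wins = 0.0
    for pos_score, neg_score in product(positives, negatives):
        if pos_score > neg_score:
            wins += 1.0
        elif pos_score == neg_score:
            wins += 0.5
    return wins / (len(positives) * len(negatives))


# Invented example: 2 hospitalised patients (label 1) and 4 who were not (label 0).
scores = [0.9, 0.4, 0.7, 0.2, 0.5, 0.1]
labels = [1, 1, 0, 0, 0, 0]
auc = pairwise_auc(scores, labels)
print(auc)  # 6 of the 8 pairs are ranked correctly -> 0.75
```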

AUC value	What it means
1.00	Perfect — never makes a mistake
0.90–0.99	Excellent discrimination
0.80–0.89	Very good discrimination
0.70–0.79	Good discrimination
0.60–0.69	Fair — use with caution
0.50	No better than chance (equivalent to a coin flip)
< 0.50	Performing worse than chance — something is wrong

Current model performance:

⚠️ How to read these numbers

All evaluation is by patient-level cross-validation — every patient is scored by a model that never saw them, and the mean ± std across folds is the honest estimate of real-world performance. There is no separate train/test split; the charts further down are drawn from the same honest cross-validation (ROC) or the final model (feature importance).

The previous inflated AUC of 0.816 (before the Phase 1 fixes) had two causes: row-level splitting, which leaked visits from the same patient across the training and test sets, and hospitalization counts being treated as categories instead of a binary target. Lower, honest numbers are better than higher, misleading ones.

Current cross-validated performance (30 patients, 5-fold, mean ± std):

Model	1-year ROC-AUC	3-year ROC-AUC
Logistic Regression	0.389 ± 0.171	0.476 ± 0.251
Random Forest	0.541 ± 0.256	0.502 ± 0.203
Gradient Boosting (default)	0.659 ± 0.197	0.548 ± 0.206
Gradient Boosting (tuned, nested CV) ¹	0.739 ± 0.277	0.462 ± 0.159

¹ The "tuned" row comes from nested cross-validation: the hyperparameter search runs inside each outer training fold, and the chosen model is evaluated on outer-fold patients the search never saw. This is the honest estimate of the tuning procedure — unlike the search's own best score (0.776 for 1 year, saved in best_params_*.csv), which is a selection score and optimistically biased by construction.

Two honest observations from the nested CV: the 1-year result has very high fold-to-fold variance (one fold scored 0.32 while the others scored 0.61–1.00 — see nested_cv_tuned_1year.csv), and the 3-year tuned model performs below chance (0.462) — tuning does not generalise at this horizon, reinforcing that the 3-year score must not be used.

These numbers are slightly lower than previously reported because a subtle form of preprocessing leakage was removed: imputation, feature engineering and scaling are now fitted only on the training data of each fold, instead of on the full dataset.

Probability calibration (decides what the interfaces display):

Calibration — whether a "70%" score really means a 70% chance — was measured with patient-level cross-validation, comparing the deployed model against Platt (sigmoid) and isotonic recalibration. Result for the 1-year champion: Brier score 0.160, worse than the no-information baseline of 0.153 (always predicting the prevalence), with an expected calibration error of 0.123; recalibration does not fix this at the current sample size. Decision, recorded in the model card and obeyed by all interfaces: display risk bands (low / moderate / high) with the percentage de-emphasised.
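To make the two calibration metrics concrete, the sketch below computes a Brier score, the no-information baseline, and an expected calibration error on invented data. The function names, the equal-width binning with 5 bins, and all numbers are assumptions for illustration; the project's own binning choices are not stated here.

```python
def brier_score(probs, outcomes):
    """Mean squared difference between predicted probability and outcome (0 or 1)."""
    return sum((p - y) ** 2 for p, y in zip(probs, outcomes)) / len(probs)


def base_rate_brier(outcomes):
    """Brier score of always predicting the prevalence (the no-information baseline)."""
    prevalence = sum(outcomes) / len(outcomes)
    return brier_score([prevalence] * len(outcomes), outcomes)


def expected_calibration_error(probs, outcomes, n_bins=5):
    """Weighted mean gap between predicted and observed rates over equal-width bins."""
    bins = [[] for _ in range(n_bins)]
    for p, y in zip(probs, outcomes):
        bins[min(int(p * n_bins), n_bins - 1)].append((p, y))
    ece = 0.0
    for members in bins:
        if members:
            mean_predicted = sum(p for p, _ in members) / len(members)
            observed_rate = sum(y for _, y in members) / len(members)
            ece += len(members) / len(probs) * abs(mean_predicted - observed_rate)
    return ece


# Invented out-of-fold predictions for 10 patients, 2 of whom were hospitalised.
outcomes = [1, 0, 0, 0, 0, 1, 0, 0, 0, 0]
probs = [0.2, 0.7, 0.1, 0.1, 0.1, 0.9, 0.8, 0.1, 0.1, 0.1]

model_brier = brier_score(probs, outcomes)
baseline_brier = base_rate_brier(outcomes)
ece = expected_calibration_error(probs, outcomes)
print(f"model Brier {model_brier:.3f} vs baseline {baseline_brier:.3f}, ECE {ece:.3f}")
```

For a prevalence p, the baseline equals p × (1 − p), so a model whose Brier score is above that value is doing worse than quoting the prevalence for everyone. That is the situation this section reports for the 1-year model.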

Reliability diagram for the 1-year model

In the reliability diagram above, the closer a curve sits to the diagonal, the more literally its percentages can be read. (The full numbers are regenerated in outputs/calibration_1year.csv when you run training.)

Why is the variance so high (± 0.16 to ± 0.28 in the table above)? With only ~6 patients per test fold, one difficult patient can swing the AUC by 0.2 or more. This is expected and honest: it means the model's estimates are not yet stable enough for clinical decisions. The variance should shrink as more patients are added to the dataset. Our cross-validation code already scales automatically: it selects leave-one-out (LOO) for fewer than 20 patients, 5-fold for 30–99, and 10-fold for 100–299.
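The arithmetic behind that swing can be sketched as follows. AUC in a test fold is a fraction of correctly ranked pairs, so with few patients it can only move in coarse steps. The split of 2 hospitalised and 4 non-hospitalised patients in a fold of 6 is an assumption for illustration, and the function names are hypothetical.

```python
def auc_step(n_pos, n_neg):
    """Smallest possible AUC change: one hospitalised/non-hospitalised pair flips order."""
    return 1.0 / (n_pos * n_neg)


def max_swing_one_patient(n_pos, n_neg, patient_is_positive):
    """AUC change when one patient goes from ranked correctly against every
    patient of the opposite class to ranked wrongly against all of them."""
    pairs_involving_patient = n_neg if patient_is_positive else n_pos
    return pairs_involving_patient / (n_pos * n_neg)


# Assumed fold: 6 patients, 2 hospitalised and 4 not.
print(auc_step(2, 4))                      # 0.125 per flipped pair
print(max_swing_one_patient(2, 4, True))   # 0.5 for one hospitalised patient
print(max_swing_one_patient(2, 4, False))  # 0.25 for one non-hospitalised patient
```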

Do not use either model for clinical decisions at this stage. This is a research prototype under active development.

ROC curve — cross-validated (honest)

Cross-validated ROC curve for the 1-year horizon

The ROC curve plots True Positive Rate (how many high-risk patients we correctly catch) against False Positive Rate (how many low-risk patients we incorrectly flag) at every possible risk threshold. This curve is honest: it is built from the pooled out-of-fold cross-validation predictions, so every point comes from a model scoring a patient it never trained on.

A curve that hugs the top-left corner = excellent model
A curve that follows the diagonal dotted line = model is no better than chance
The AUC is the area under the curve — larger area means better performance
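As a rough sketch of how such a curve is built from pooled predictions, the code below sweeps a threshold over a few invented scores, records the (False Positive Rate, True Positive Rate) point at each step, and integrates the area with the trapezoid rule. Function names and data are hypothetical; the project's plotting code may differ.

```python
def roc_points(scores, labels):
    """Return (FPR, TPR) points, starting at (0, 0), for every distinct threshold."""
    n_pos = sum(labels)
    n_neg = len(labels) - n_pos
    points = [(0.0, 0.0)]
    for threshold in sorted(set(scores), reverse=True):
        flagged = [y for s, y in zip(scores, labels) if s >= threshold]
        true_pos = sum(flagged)
        false_pos = len(flagged) - true_pos
        points.append((false_pos / n_neg, true_pos / n_pos))
    return points


def trapezoid_area(points):
    """Area under a piecewise-linear curve given as ordered (x, y) points."""
    area = 0.0
    for (x0, y0), (x1, y1) in zip(points, points[1:]):
        area += (x1 - x0) * (y0 + y1) / 2
    return area


# Invented out-of-fold scores: 2 hospitalised patients (1) and 3 who were not (0).
scores = [0.8, 0.6, 0.55, 0.3, 0.2]
labels = [1, 0, 1, 0, 0]

curve = roc_points(scores, labels)
print(curve)
print(trapezoid_area(curve))  # 5 of 6 pairs are ranked correctly, so about 0.833
```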

Feature importance — final model

Feature importance for the 1-year final model

This chart shows which clinical variables had the most influence on the final model (the champion Gradient Boosting pipeline trained on all patients). The longer the bar, the more important that variable was.

This is clinically useful for two reasons:

Sanity check: The top variables should make clinical sense. If something unexpected appears at the top (like patient ID), it signals a data problem.
Clinical insight: The chart may confirm or reveal which factors in your patient population most strongly predict hospitalization.

Variables you are likely to see at the top: frailty status (Fried criteria), number of comorbidities, gait speed, number of medications, and MMSE score.

The dataset

Format

The model expects a CSV file at raw_data/Virtual_Patient_Models_Dataset.csv with one row per patient visit. The key variables expected are listed below. All names are case-sensitive.

Variable	Type	Description
`part_id`	Integer	Patient identifier
`age`	Integer	Age in years
`gender`	String	Patient gender
`fried`	String	Frailty status: `Non frail`, `Pre-frail`, or `Frail`
`katz_index`	Integer	Katz Index of Independence in Activities of Daily Living (0–6)
`iadl_grade`	Integer	Instrumental Activities of Daily Living score
`gait_speed_4m`	Float	Gait speed over 4 metres (m/s)
`raise_chair_time`	Float	Time to rise from chair 5 times (seconds)
`falls_one_year`	Integer	Number of falls in the past year
`comorbidities_count`	Integer	Total number of comorbidities
`medication_count`	Integer	Number of medications
`mmse_total_score`	Integer	Mini-Mental State Examination score (0–30)
`depression_total_score`	Integer	Depression scale score
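`hospitalization_one_year`	Integer	Target: hospitalizations within 1 year; raw counts (0–3) are converted to binary (0 = never, 1 = at least once)
`hospitalization_three_years`	Integer	Target: hospitalizations within 3 years; raw counts (0–3) are converted to binary (0 = never, 1 = at least once)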

Values coded as 999 are treated as missing and handled automatically.
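The two cleaning rules described in this document, 999 as a missing-value code and hospitalization counts reduced to a binary target, can be sketched with the standard library as below. The column names come from the table above; the sample rows, the `clean_row` helper and the choice of `None` for missing values are assumptions for illustration, not the project's loader.

```python
import csv
import io

MISSING_CODE = "999"
TARGET = "hospitalization_one_year"

# Invented rows in the expected layout (only a few columns shown).
SAMPLE_CSV = """part_id,age,gait_speed_4m,hospitalization_one_year
1,78,0.8,0
1,79,999,2
2,83,0.6,1
"""


def clean_row(row):
    """Replace the 999 code with None and binarise the 1-year target."""
    cleaned = {key: (None if value.strip() == MISSING_CODE else value)
               for key, value in row.items()}
    if cleaned[TARGET] is not None:
        # Any count of 1 or more becomes 1 (hospitalised at least once).
        cleaned[TARGET] = 1 if int(cleaned[TARGET]) >= 1 else 0
    return cleaned


rows = [clean_row(r) for r in csv.DictReader(io.StringIO(SAMPLE_CSV))]
for r in rows:
    print(r)
```

Missing values are left as `None` here; in the project they are imputed later by the Pipeline, using statistics learned from the training data only.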

Privacy

Patient data must never be committed to this repository. The raw_data/ folder is excluded by .gitignore. When contributing, always verify that your commit does not contain real patient records.

In practice, this policy means:

The exploratory notebook (exploratory_analysis_dataset.ipynb) is committed with its outputs cleared, so no data rows appear in the repository. If you run it locally, clear the outputs before committing (Cell → All Output → Clear in Jupyter, or jupyter nbconvert --clear-output --inplace <notebook>).
The trained pipeline files in models/ contain only aggregate statistics of the training data (per-column medians, modes, means and standard deviations used for imputation and scaling) — never individual patient records.
The outputs/ folder (training charts, CSVs, reports) is git-ignored and regenerated locally by python -m praevius.predictive_model. Only a few curated illustrative charts are committed, under docs/img/; these are either aggregate (cross-validated ROC, feature importance, calibration) or based on a fictional example patient — never on a real record.

Sample / synthetic data

A synthetic dataset generated to match the statistical properties of real data (without containing any real patient records) will be provided in Phase 2 of this project. This will allow anyone to run and test the tool without access to clinical data.

Limitations

Being transparent about what this tool cannot do is as important as explaining what it can do.

1. Small training dataset
The current model was trained on 117 records (visits) from 30 unique patients. This is a small sample by machine learning standards. The models may not generalise well to patient populations that differ from the training group, and performance estimates have wide uncertainty margins, as shown by the high standard deviations (± 0.16 to ± 0.28) in cross-validation.

2. The 3-year model is not yet reliable
The 3-year hospitalisation prediction model achieves a mean cross-validated AUC of approximately 0.55, marginally above random chance, with high fold-to-fold variance (± 0.21). Worse, the hyperparameter-tuned version evaluated by nested cross-validation scores below chance (0.462) at this horizon. Predicting events 3 years ahead from this sample size is not yet feasible. We include the model for completeness and future development, but do not use the 3-year score for clinical decisions at this stage.

3. Overfitting is visible
The small dataset makes the models prone to overfitting: in cross-validation, individual folds swing widely (one 1-year fold scores 0.32 while others reach 1.00). This wide fold-to-fold variance is consistent with a model that memorises rather than generalises, and it is amplified by test folds of only ~6 patients. It is expected to shrink as more patients are added.

4. Correlation, not causation
The model finds statistical patterns. It cannot tell you why a patient is at high risk, only that their data profile resembles patients who were hospitalized. Always interpret the score in the context of your full clinical assessment.

5. Population specificity
Models trained on one population may not perform equally well on another. Before relying on this tool in a new clinical setting, validate its performance on your own data.

6. This is a decision-support tool, not a decision-making tool
Risk scores should inform, not replace, clinical judgment. A patient scored at 30% may still warrant intervention based on factors not captured in the data. A patient scored at 80% may have circumstances that make hospitalization unlikely.

7. The probabilities are not calibrated
A formal calibration evaluation (patient-level cross-validation; see outputs/calibration_1year.csv) showed that the exact percentages cannot be read literally: the deployed model's Brier score is worse than a no-information baseline, and recalibration (Platt/isotonic) does not fix this at the current sample size. All interfaces therefore display risk bands (low / moderate / high) and de-emphasise the percentage. This is expected to improve as more patients are added (Phase 2).

Project roadmap

Phase 1 — Foundation (complete)
├── ✅ Core ML pipeline
├── ✅ Merge conflict resolution
├── ✅ Fix target binarization (counts → binary)
├── ✅ Fix data leakage (patient-level split)
├── ✅ Patient-level cross-validation (StratifiedGroupKFold, auto k-selection)
├── ✅ SMOTE for class imbalance (inside CV folds only)
├── ✅ Remove Decision Tree (AUC ~0.5 on small N)
├── ✅ Two-phase pipeline (CV eval → final all-data model)
├── ✅ Single-patient prediction script (predict.py)
├── ✅ SHAP explanation per patient
├── ✅ Hyperparameter tuning (honest tuned 1-year AUC via nested CV: 0.739 ± 0.277)
├── ✅ Nested cross-validation — honest evaluation of the tuning procedure
├── ✅ 3-year model limitations clearly documented
└── ✅ End-to-end Pipeline — training and prediction share identical preprocessing

Phase 2 — Data strategy
├── ⬜ Augment with public datasets (NHANES, ELSA-Brasil, SHARE, InCHIANTI)
├── ⬜ Synthetic data generation (CTGAN) for open distribution
└── ⬜ Federated learning design for multi-institution contribution

Phase 3 — Clinical interface
├── ✅ Shared scoring engine, model card and inter-model agreement indicator
├── ✅ Probability calibration assessed — decision: interfaces show risk bands
├── ✅ Streamlit web application (local-only; disclaimer gate; no data stored)
├── ✅ Single-patient risk assessment form (blank fields imputed by the Pipeline)
├── ✅ SHAP explanation chart per patient
├── ✅ PDF report generation (in-memory, downloadable — nothing written to disk)
└── ✅ Portuguese / English bilingual interface

Phase 4 — Open source packaging
├── ✅ Installable Python package (`pip install`) with `praevius` / `praevius-app` commands
├── ✅ Automated tests (pytest + GitHub Actions CI)
├── ✅ CONTRIBUTING.md + CODE_OF_CONDUCT.md
├── ✅ GPL v3 license
├── ✅ CITATION.cff for academic citation
├── ✅ Ethics statement (ETHICS.md) + design rationale (docs/technical-decisions.md)
├── ✅ CHANGELOG.md
├── ✅ 1-year model published to Hugging Face
└── ⬜ Publish to PyPI · tag a citable release (Zenodo DOI)

Contributing

Contributions are welcome from both clinicians and developers.

If you are a clinician:

Share feedback on whether the outputs make clinical sense
Report variables that are commonly collected in your setting but missing from the model
Help validate the tool on new patient populations

If you are a developer or data scientist:

See the Phase 1–4 roadmap above for what needs to be built
Open an issue to discuss your idea before opening a pull request
Follow the existing code style and document everything for a non-technical audience

Getting started:

git clone https://github.com/Zanarino/praevius.git
cd praevius
pip install -e ".[dev]"   # editable install + test/build tooling
python -m pytest tests/   # run the test suite

See the CONTRIBUTING.md guide for detailed contribution instructions, and please follow our Code of Conduct. For the rationale behind the modelling choices (patient-level CV, the leakage fixes, why probabilities are shown as bands), see docs/technical-decisions.md. Notable changes are tracked in the CHANGELOG.

Ethics statement

This tool is designed to support clinical decision-making, not to automate it. We believe that:

Risk scores must always be explained, not just reported
Clinicians must retain full authority over care decisions
Patient data must be handled in accordance with applicable privacy law (LGPD in Brazil, GDPR in Europe, HIPAA in the US)
Model limitations must be communicated clearly and honestly to all users
The tool should never be used as a basis to withhold care from a patient

The full, bilingual ethics statement — covering intended use, data governance, fairness, accountability and our honesty commitments — is in ETHICS.md.

License

This project is licensed under the GNU General Public License v3.0 (GPL v3). You are free to use, modify, and distribute it, including for commercial purposes, but any derivative work you distribute must also be licensed under GPL v3 with its source code made available. This keeps the tool free and open for the clinical community.

See the LICENSE file for details.

Citation

If you use this tool in research or clinical work, please cite it as:

Zanarino, R. (2026). Praevius: Hospitalization Risk Predictor for Elderly Patients.
GitHub. https://github.com/Zanarino/praevius

A CITATION.cff file is included in the repository for automated citation (GitHub's "Cite this repository" button).

Author

Rafael Zanarino
Data Science for Healthcare | 2026

Built to improve the care of elderly patients.

Disclaimer: This tool is intended for research and clinical decision-support purposes only. It is not a certified medical device and must not be used as a sole basis for clinical decisions. Always consult applicable regulations before deploying in a clinical environment.

Version 0.1.0, released Jun 26, 2026
