Private Bi-LSTM Event Log Synthesizer (PBLES)

Overview

PBLES (Private Bi-LSTM Event Log Synthesizer) is a tool for generating process-oriented synthetic healthcare data. It addresses privacy concerns in healthcare data sharing by integrating differential privacy techniques. This makes it easier for researchers to share synthetic data with stakeholders, facilitating AI and process mining research in healthcare. However, legal compliance, such as adherence to GDPR or similar regulations, must be confirmed before sharing data, even if strong differential privacy guarantees are used.

Features

  • Process-Oriented Data Generation: Handles the complexity of healthcare data processes.
  • Multiple Perspectives: Considers various perspectives of healthcare data, not just control-flow.
  • Differential Privacy: Ensures privacy by incorporating differential privacy techniques.

Installation

To install PBLES, first clone the repository:

git clone https://github.com/martinkuhn94/PBLES.git

Then, install the required dependencies:

pip install -r requirements.txt

Usage

Training the Model

In this example, the stacked layers are configured with 32, 16, and 8 LSTM units, respectively, and an embedding dimension of 16. The model trains for 3 epochs with a batch size of 16. The number of clusters for numerical attributes is set to 10 and, to speed up training, only the top 50% quantile of traces by length is considered. The noise multiplier is set to 0.0, which means that the model is trained without differential privacy. To train the model with differential privacy, set the noise multiplier to a value greater than 0.0. The epsilon value can be retrieved after training the model.

import pm4py
from PBLES.event_log_dp_lstm import EventLogDpLstm

# Read Event Log
path = "Sepsis_Cases_Event_Log.xes"
event_log = pm4py.read_xes(path)

# Train Model
pbles_model = EventLogDpLstm(lstm_units=32, embedding_output_dims=16, epochs=3, batch_size=16,
                               max_clusters=10, trace_quantile=0.5, noise_multiplier=0.0)

pbles_model.fit(event_log)
pbles_model.save_model("models/DP_Bi_LSTM_e=inf_Sepsis_Cases_Event_Log_test")

# Print Epsilon to verify Privacy Guarantees
print(pbles_model.epsilon)
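
To illustrate why a noise multiplier of 0.0 corresponds to training without differential privacy, the following is a minimal sketch of the generic DP-SGD gradient step (clip each per-example gradient, sum, add Gaussian noise scaled by the noise multiplier, then average). It is not taken from the PBLES source; the function names and the l2_norm_clip parameter are assumptions made for this illustration, and the gradients are toy values.

```python
import math
import random


def clip_gradient(grad, l2_norm_clip):
    """Scale a per-example gradient so its L2 norm is at most l2_norm_clip."""
    norm = math.sqrt(sum(g * g for g in grad))
    scale = min(1.0, l2_norm_clip / norm) if norm > 0 else 1.0
    return [g * scale for g in grad]


def dp_average_gradient(per_example_grads, l2_norm_clip, noise_multiplier, rng):
    """Clip each gradient, sum them, add Gaussian noise, and average over the batch."""
    dim = len(per_example_grads[0])
    clipped = [clip_gradient(g, l2_norm_clip) for g in per_example_grads]
    summed = [sum(g[i] for g in clipped) for i in range(dim)]
    # The noise standard deviation is noise_multiplier * l2_norm_clip,
    # so a multiplier of 0.0 adds no noise at all.
    stddev = noise_multiplier * l2_norm_clip
    noisy = [s + rng.gauss(0.0, stddev) for s in summed]
    return [n / len(per_example_grads) for n in noisy]


rng = random.Random(0)
grads = [[3.0, 4.0], [0.3, 0.4]]  # hypothetical per-example gradients

no_noise = dp_average_gradient(grads, l2_norm_clip=1.0, noise_multiplier=0.0, rng=rng)
with_noise = dp_average_gradient(grads, l2_norm_clip=1.0, noise_multiplier=1.1, rng=rng)
print("noise_multiplier=0.0:", no_noise)
print("noise_multiplier=1.1:", with_noise)
```

With the multiplier at 0.0 the result is just the average of the clipped gradients, so no privacy guarantee can be derived from it. A larger multiplier adds more noise, which generally gives a smaller epsilon at the cost of model utility.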

Sampling Event Logs

To sample synthetic event logs from a trained model, use the following example. The sample size is set to 160, and the batch size is set to 16. The synthetic event log is saved as an XES file. Pretrained models can be found in the "models" folder.

import pm4py
from PBLES.event_log_dp_lstm import EventLogDpLstm

# Load Model
pbles_model = EventLogDpLstm()
pbles_model.load("models/DP_Bi_LSTM_e=inf_Sepsis_Case")

# Sample
event_log = pbles_model.sample(sample_size=160, batch_size=16)
event_log_xes = pm4py.convert_to_event_log(event_log)

# Save as XES File
xes_filename = "Synthetic_Sepsis_Case_Event_Log.xes"
pm4py.write_xes(event_log_xes, xes_filename)

# Save as XLSX File for quick inspection
df = pm4py.convert_to_dataframe(event_log_xes)
df['time:timestamp'] = df['time:timestamp'].astype(str)
df.to_excel("Synthetic_Sepsis_Case_Event_Log.xlsx", index=False)
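
After sampling, a quick sanity check of the synthetic log can help before sharing it. The sketch below is not part of PBLES; the helper names are hypothetical. It summarizes trace lengths and activity frequencies from plain records such as those returned by df.to_dict("records"), assuming the standard XES column names case:concept:name and concept:name. The records shown are made-up toy data.

```python
from collections import Counter
from statistics import mean


def trace_length_summary(records, case_key="case:concept:name"):
    """Count events per case and return (number of traces, mean length, max length)."""
    lengths = Counter(r[case_key] for r in records)
    return len(lengths), mean(lengths.values()), max(lengths.values())


def activity_frequencies(records, activity_key="concept:name"):
    """Return the relative frequency of each activity across all events."""
    counts = Counter(r[activity_key] for r in records)
    total = sum(counts.values())
    return {activity: n / total for activity, n in counts.items()}


# Toy records standing in for df.to_dict("records")
records = [
    {"case:concept:name": "A", "concept:name": "ER Registration"},
    {"case:concept:name": "A", "concept:name": "ER Triage"},
    {"case:concept:name": "A", "concept:name": "Release A"},
    {"case:concept:name": "B", "concept:name": "ER Registration"},
]

summary = trace_length_summary(records)
freqs = activity_frequencies(records)
print("traces, mean length, max length:", summary)
print("activity frequencies:", freqs)
```

Running the same two functions on the original and the synthetic log gives a rough side-by-side comparison of the control-flow perspective.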

Future Work

Future work will focus on enhancing the algorithm and making it available on PyPI.

Contribution

We welcome contributions from the community. If you have any suggestions or issues, please create a GitHub issue or a pull request.

License

This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
