Private Bi-LSTM Event Log Synthesizer (PBLES)
Overview
PBLES (Private Bi-LSTM Event Log Synthesizer) is a tool designed to generate process-oriented synthetic healthcare data. It addresses privacy concerns in healthcare data sharing by integrating differential privacy techniques. This can make it easier for researchers to share synthetic data with stakeholders, facilitating AI and process mining research in healthcare. However, legal compliance, such as adherence to the GDPR or similar regulations, must be confirmed before sharing data, even if strong differential privacy guarantees are used.
Features
- Process-Oriented Data Generation: Handles the complexity of healthcare data processes.
- Multiple Perspectives: Considers various perspectives of healthcare data, not just control-flow.
- Differential Privacy: Protects individual records by incorporating differential privacy techniques into model training.
Installation
To install PBLES, first clone the repository:
git clone https://github.com/martinkuhn94/PBLES.git
Then, change into the repository directory and install the required dependencies:
cd PBLES
pip install -r requirements.txt
Usage
Training the Model
In this example, the stacked layers are configured with 32, 16 and 8 LSTM units respectively, and the embedding dimension is 16. The model trains for 3 epochs with a batch size of 16. The number of clusters for numerical attributes is set to 10, and, to speed up training, only the top 50% quantile of traces by length is considered. The noise multiplier is set to 0.0, which means that the model is trained without differential privacy. To train the model with differential privacy, set the noise multiplier to a value greater than 0.0. After training, the resulting epsilon value can be retrieved via pbles_model.epsilon.
import pm4py
from PBLES.event_log_dp_lstm import EventLogDpLstm
# Read Event Log
path = "Sepsis_Cases_Event_Log.xes"
event_log = pm4py.read_xes(path)
# Train Model
pbles_model = EventLogDpLstm(lstm_units=32, embedding_output_dims=16, epochs=3, batch_size=16,
                             max_clusters=10, trace_quantile=0.5, noise_multiplier=0.0)
pbles_model.fit(event_log)
pbles_model.save_model("models/DP_Bi_LSTM_e=inf_Sepsis_Cases_Event_Log_test")
# Print Epsilon to verify Privacy Guarantees
print(pbles_model.epsilon)
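Training with a noise multiplier is commonly done with DP-SGD: each example's gradient is clipped to a maximum L2 norm, the clipped gradients are summed, and Gaussian noise with standard deviation noise_multiplier * l2_norm_clip is added before averaging. The NumPy sketch below shows that generic aggregation step only; it is not PBLES's own code, whether PBLES uses exactly this procedure is an assumption, and the helper name dp_aggregate and its l2_norm_clip parameter are hypothetical.

```python
import numpy as np


def dp_aggregate(per_example_grads, l2_norm_clip, noise_multiplier, rng=None):
    """Generic DP-SGD gradient aggregation (hypothetical helper, not part of PBLES).

    per_example_grads has shape (batch_size, num_params). Each row is clipped
    to L2 norm <= l2_norm_clip, the rows are summed, Gaussian noise with
    std noise_multiplier * l2_norm_clip is added, and the sum is averaged.
    """
    rng = np.random.default_rng() if rng is None else rng
    grads = np.asarray(per_example_grads, dtype=float)
    # Per-example L2 norms; scale each gradient down so its norm is at most the clip bound.
    norms = np.linalg.norm(grads, axis=1, keepdims=True)
    scale = np.minimum(1.0, l2_norm_clip / np.maximum(norms, 1e-12))
    clipped = grads * scale
    # Noise is calibrated to the clipping bound (the sensitivity of the sum).
    noise = rng.normal(0.0, noise_multiplier * l2_norm_clip, size=grads.shape[1])
    return (clipped.sum(axis=0) + noise) / grads.shape[0]


# With noise_multiplier=0.0, as in the training example above, no noise is added,
# so only clipping changes the gradient and no finite privacy guarantee results.
grads = np.array([[3.0, 4.0], [0.3, 0.4]])
print(dp_aggregate(grads, l2_norm_clip=1.0, noise_multiplier=0.0))
```

Larger noise multipliers lead to a smaller (stronger) epsilon for the same number of training steps, at the cost of noisier updates.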
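The max_clusters setting suggests that numerical attributes are discretized into a limited number of clusters before training. As a hedged illustration only, the sketch below maps a numeric attribute to at most max_clusters cluster ids with a simple 1-D k-means; the helper cluster_numeric is hypothetical, and the clustering method PBLES actually uses may differ.

```python
import numpy as np


def cluster_numeric(values, max_clusters=10, iterations=50):
    """Map numeric values to discrete cluster ids with 1-D k-means (hypothetical helper).

    Returns (labels, centers). At most max_clusters clusters are used,
    fewer if the attribute has fewer distinct values.
    """
    x = np.asarray(values, dtype=float)
    k = min(max_clusters, len(np.unique(x)))
    # Initialise the centres at evenly spaced quantiles of the data.
    centers = np.quantile(x, np.linspace(0.0, 1.0, k))
    for _ in range(iterations):
        # Assign each value to its nearest centre.
        labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
        # Move each centre to the mean of its values (keep it if the cluster is empty).
        new_centers = np.array([
            x[labels == j].mean() if np.any(labels == j) else centers[j]
            for j in range(k)
        ])
        if np.allclose(new_centers, centers):
            break
        centers = new_centers
    # Final assignment against the final centres.
    labels = np.argmin(np.abs(x[:, None] - centers[None, :]), axis=1)
    return labels, centers


labels, centers = cluster_numeric([1.0, 1.2, 0.9, 10.0, 10.5, 9.8], max_clusters=2)
print(labels, centers)
```

After generation, each cluster id could be mapped back to a representative value such as its centre.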
Sampling Event Logs
To sample synthetic event logs from a trained model, the following example can be used. The sample size is set to 160 and the batch size to 16. The synthetic event log is saved as an XES file and, for quick inspection, as an XLSX file. Pretrained models can be found in the "models" folder.
import pm4py
from PBLES.event_log_dp_lstm import EventLogDpLstm
# Load Model
pbles_model = EventLogDpLstm()
pbles_model.load("models/DP_Bi_LSTM_e=inf_Sepsis_Case")
# Sample
event_log = pbles_model.sample(sample_size=160, batch_size=16)
event_log_xes = pm4py.convert_to_event_log(event_log)
# Save as XES File
xes_filename = "Synthetic_Sepsis_Case_Event_Log.xes"
pm4py.write_xes(event_log_xes, xes_filename)
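# Save as XLSX file for quick inspection (requires an Excel writer such as openpyxl)
df = pm4py.convert_to_dataframe(event_log_xes)
# Excel cannot store timezone-aware datetimes, so timestamps are written as strings
df['time:timestamp'] = df['time:timestamp'].astype(str)
df.to_excel("Synthetic_Sepsis_Case_Event_Log.xlsx", index=False)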
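With sample_size=160 and batch_size=16, generation can be thought of as ceil(160 / 16) = 10 batches of traces, each built token by token until an end token is drawn. The sketch below illustrates that batched, autoregressive sampling loop under strong assumptions: the trained network is replaced by a hypothetical toy transition table (TRANSITIONS) conditioned only on the previous activity, the activity names are illustrative, and sample_trace and sample_log are hypothetical helpers rather than PBLES's API.

```python
import math
import random

# Hypothetical stand-in for the trained model: next-activity probabilities
# given only the previous activity. The real model conditions on more context.
TRANSITIONS = {
    "<START>": {"ER Registration": 1.0},
    "ER Registration": {"ER Triage": 0.9, "<END>": 0.1},
    "ER Triage": {"Leucocytes": 0.6, "Release A": 0.2, "<END>": 0.2},
    "Leucocytes": {"Release A": 0.5, "<END>": 0.5},
    "Release A": {"<END>": 1.0},
}


def sample_trace(rng, max_len=50):
    """Draw one trace token by token until <END> or max_len is reached."""
    trace, current = [], "<START>"
    while len(trace) < max_len:
        probs = TRANSITIONS[current]
        current = rng.choices(list(probs), weights=list(probs.values()))[0]
        if current == "<END>":
            break
        trace.append(current)
    return trace


def sample_log(sample_size=160, batch_size=16, seed=0):
    """Generate sample_size traces in ceil(sample_size / batch_size) batches."""
    rng = random.Random(seed)
    log = []
    for _ in range(math.ceil(sample_size / batch_size)):
        # The last batch may be smaller if sample_size is not a multiple of batch_size.
        n = min(batch_size, sample_size - len(log))
        log.extend(sample_trace(rng) for _ in range(n))
    return log


synthetic = sample_log(sample_size=160, batch_size=16)
print(len(synthetic), synthetic[0])
```

In a neural model, the batch size mainly determines how many prefixes are passed through the network at once; here the batches are only a loop structure.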
Future Work
Future work will focus on enhancing the algorithm and making it available on PyPI.
Contribution
We welcome contributions from the community. If you have any suggestions or issues, please create a GitHub issue or a pull request.
License
This project is licensed under the GPL-3.0 License - see the LICENSE file for details.
Download files
File details
Details for the file PBLES-0.0.1.tar.gz.
File metadata
- Download URL: PBLES-0.0.1.tar.gz
- Upload date:
- Size: 25.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | eb8312829a8a7dd9a5ed3324fe1e3d7dbb8493ea4211a063826bcbb1eb63fd11 |
| MD5 | bc48bb2632e9055e85c64615ad185369 |
| BLAKE2b-256 | 146f104c4353c0d614d0e73a88e153845e2c4ba97356dd5ed2ca79fe0362bff6 |
File details
Details for the file PBLES-0.0.1-py3-none-any.whl.
File metadata
- Download URL: PBLES-0.0.1-py3-none-any.whl
- Upload date:
- Size: 27.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.11.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 236e64d7b209bd49020269a707c737de6d77dd4a28ecff2c3b7d3e5309bdd104 |
| MD5 | e21cdd216a895c547a430b3a79756b75 |
| BLAKE2b-256 | e41b2ab0d71c35184216808d205cbb943ff839b3205595a03f7bb75da7bdf4c3 |