A lightweight and extensible Python package for managing data, tailored for researchers working with structured data.
📦 dwrappr
A lightweight and extensible Python package for managing data, tailored for researchers working with structured data. In addition to general data management features, the package introduces a data structure specifically optimized for ML research. This common format enables researchers to efficiently test new algorithms and methods, streamlining collaboration and ensuring consistency in data management across projects.
🧩 Features
- 🗃️ Consistent dataset object structure for handling structured data in ML use cases
- 🔄 Support for building a file-based internal dataset collaboration platform for researchers
- 🧰 General utilities for managing data, such as saving and loading
🚀 Quickstart
To run the quickstart examples and get an overview of dwrappr's functionality, have a look at IEEE_examples.
Additional functionalities are showcased in:
- loading_dataset_from_file.py: Shows how to load a dataset from an existing dataset file
- scanning_folder_for_datasets.py: Shows how to scan a folder for available datasets
- dataset_functionalities.py: Shows some of the main functionalities of the DataSet class.
👀 Functionality Insights
Scan folder for datasets
DATASET_FOLDER = "./data/datasets/"
available_datasets = DataSet.get_available_datasets_in_folder(
    DATASET_FOLDER
)
available_datasets.T
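To make the idea of a file-based dataset folder concrete, here is a minimal sketch that lists dataset files using only the standard library. It is not dwrappr's implementation of get_available_datasets_in_folder; the helper name list_dataset_files and the returned fields are hypothetical, and the only assumption taken from the examples here is that datasets are stored as .joblib files.

```python
from pathlib import Path
import tempfile


def list_dataset_files(folder, suffix=".joblib"):
    """Return name, path and size for every file in `folder` with `suffix`.

    Hypothetical helper for illustration only, not part of dwrappr.
    """
    entries = []
    for path in sorted(Path(folder).glob(f"*{suffix}")):
        entries.append(
            {
                "name": path.stem,
                "path": str(path),
                "size_bytes": path.stat().st_size,
            }
        )
    return entries


# Demo on a temporary folder with one dataset file and one unrelated file.
with tempfile.TemporaryDirectory() as tmp:
    (Path(tmp) / "a_ds.joblib").write_bytes(b"demo")
    (Path(tmp) / "notes.txt").write_text("not a dataset")
    found = list_dataset_files(tmp)

print(found)
```

The real method appears to return a tabular overview (the example above transposes it with .T), so its output will contain more than this sketch does.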
Loading specific dataset
DATASET_FILEPATH = "./data/datasets/manufacturing_process_ds.joblib"
ds = DataSet.load(DATASET_FILEPATH)
Generating dataset from raw data
import pandas as pd

RAW_DATA_FILEPATH = "./data/raw_data.csv"
# load raw data into a pandas.DataFrame
df = pd.read_csv(RAW_DATA_FILEPATH)
"""
<some manual dataset preprocessing steps
like dropping missing values and changing dtypes>
"""
# define metadata
meta = DataSetMeta(
    name="example_dataset",
    synthetic_data=True,
    time_series=False,
    feature_names=["feature"],
    target_names=["target"]
)
# generate DataSet
ds = DataSet.from_dataframe(
    df=df,
    meta=meta
)
# save dataset
ds.save("./data/example_dataset.joblib", drop_meta_json=True)
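The manual preprocessing step in the example above is left as a placeholder. As a rough illustration, dropping missing values and changing dtypes with plain pandas could look like the following sketch. The inline CSV and the column names feature and target are made up for the demo and only mirror the metadata used above.

```python
import io

import pandas as pd

# Made-up raw data with one missing target value (illustration only).
raw_csv = io.StringIO("feature,target\n1,0.5\n2,\n3,1.5\n")
df = pd.read_csv(raw_csv)

# Drop rows that contain missing values and renumber the index.
df = df.dropna().reset_index(drop=True)

# Make the column dtypes explicit before building a dataset from the frame.
df = df.astype({"feature": "int64", "target": "float64"})

print(df.dtypes)
```

Which steps are actually needed depends on the raw data; the sketch only shows the two operations named in the placeholder.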
Split dataset
(train/test-split)
import numpy as np
import pandas as pd

n_instances = 100
# Create the 'product_id' feature with 7 different categorical values
product_ids = np.random.choice(['1001', '2002', '3003', '4004', '5005', '6006', '7007'], size=n_instances)
# Generate two additional numeric features
feature_1 = np.random.rand(n_instances) * 100 # Random numbers between 0 and 100
feature_2 = np.random.rand(n_instances) * 50 # Random numbers between 0 and 50
# Generate a numeric target
target = feature_1 * 0.5 + feature_2 * 0.3 + np.random.randn(n_instances) * 5 # Adding some noise
# Create a DataFrame
df = pd.DataFrame({
    'product_id': product_ids,
    'feature_1': feature_1,
    'feature_2': feature_2,
    'target': target
})
ds = DataSet.from_dataframe(
    df=df,
    meta=DataSetMeta(
        name="example_dataset",
        synthetic_data=True,
        time_series=False,
        feature_names=["product_id", "feature_1", "feature_2"],
        target_names=["target"]
    )
)
train_ds, test_ds = ds.split_dataset(
    first_ds_size=0.5,
    shuffle=True,
    group_by_features=["product_id"]
)
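The group_by_features argument suggests a group-aware split, in which all rows sharing a product_id end up on the same side. The source does not state this, so treat it as an assumption. The sketch below shows that general technique in plain Python; the function group_split and its parameters are hypothetical and are not dwrappr's implementation.

```python
import random
from collections import defaultdict


def group_split(groups, first_size=0.5, seed=0):
    """Split row indices so that no group label appears on both sides.

    Hypothetical helper for illustration only, not part of dwrappr.
    """
    # Collect the row indices that belong to each group label.
    rows_by_group = defaultdict(list)
    for idx, label in enumerate(groups):
        rows_by_group[label].append(idx)

    # Shuffle the group labels (not the individual rows) reproducibly.
    labels = sorted(rows_by_group)
    random.Random(seed).shuffle(labels)

    # Fill the first split group by group until it reaches the target size.
    target = first_size * len(groups)
    first, second = [], []
    for label in labels:
        if len(first) < target:
            first.extend(rows_by_group[label])
        else:
            second.extend(rows_by_group[label])
    return sorted(first), sorted(second)


product_ids = ["1001", "2002", "3003", "1001", "2002", "3003", "4004", "4004"]
train_idx, test_idx = group_split(product_ids, first_size=0.5, seed=42)

train_groups = {product_ids[i] for i in train_idx}
test_groups = {product_ids[i] for i in test_idx}
print(train_groups, test_groups)
```

Because whole groups are assigned at once, the first split can exceed the requested fraction when group sizes are uneven. Keeping groups intact prevents rows from the same product from appearing in both training and test data.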
📄 Help
See Documentation for details.
🛠️ Package Installation
- full version:
pip install dwrappr
- light version (excluding sklearn library):
pip install dwrappr[light]
(keep the package updated with pip install dwrappr --upgrade)
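To check which version is currently installed before upgrading, the standard library's importlib.metadata can be queried. This is a small sketch; the helper name installed_version is made up for illustration.

```python
from importlib.metadata import PackageNotFoundError, version


def installed_version(package):
    """Return the installed version string of `package`, or None if absent."""
    try:
        return version(package)
    except PackageNotFoundError:
        return None


# Prints the installed dwrappr version, or None if it is not installed.
print(installed_version("dwrappr"))
```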
🔧 Maintainer
This project is maintained by Nils
File details
Details for the file dwrappr-1.0.13.tar.gz.
File metadata
- Download URL: dwrappr-1.0.13.tar.gz
- Upload date:
- Size: 20.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 80027061b349d967367f379277bee68853c22ab86daa541d8da8565ec6e20f32 |
| MD5 | 3537566608a3fa00ee3451dd8417edf9 |
| BLAKE2b-256 | 75e7a8cdbe440cc052eebbf735dc6197b0bbca9cd5c351ef31e07d8c4b36f782 |
File details
Details for the file dwrappr-1.0.13-py3-none-any.whl.
File metadata
- Download URL: dwrappr-1.0.13-py3-none-any.whl
- Upload date:
- Size: 20.5 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.13
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f779ff7553e50584519aa951564850e1c4adcf27e2545d3c4399b74262596b40 |
| MD5 | 16ab7cb54818262c27257014b32ddef2 |
| BLAKE2b-256 | e2695508029bc0d82ead1e7d114462a713fa78f6bc11cf17684cdf03d7a3f184 |