📦 dwrappr

License: MIT

A lightweight, extensible Python package for managing data, tailored to researchers who work with structured data. Beyond general data management features, the package introduces a common dataset structure designed for ML research. This shared format lets researchers test new algorithms and methods efficiently, simplifies collaboration, and keeps data management consistent across projects.

🧩 Features

  • 🗃️ Consistent dataset object structure for handling structured data in ML use cases
  • 🔄 Support for building a file-based internal dataset collaboration platform for researchers
  • 🧰 General data management utilities, such as saving and loading datasets
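To make the idea of a consistent dataset object concrete, here is a minimal sketch of a container that keeps the rows together with the roles of their columns. `MiniDataSet` and its `x`/`y` properties are hypothetical names chosen for illustration; they are not dwrappr's `DataSet` API.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Any


# Hypothetical illustration of a "consistent dataset object";
# this is NOT dwrappr's DataSet class or API.
@dataclass
class MiniDataSet:
    name: str
    rows: list[dict[str, Any]]
    feature_names: list[str]
    target_names: list[str]

    def __post_init__(self) -> None:
        # Check once, at construction, that every declared column exists in every row.
        required = self.feature_names + self.target_names
        for i, row in enumerate(self.rows):
            missing = [col for col in required if col not in row]
            if missing:
                raise ValueError(f"row {i} is missing columns: {missing}")

    @property
    def x(self) -> list[list[Any]]:
        # Feature values per row, in the declared column order.
        return [[row[col] for col in self.feature_names] for row in self.rows]

    @property
    def y(self) -> list[list[Any]]:
        # Target values per row, in the declared column order.
        return [[row[col] for col in self.target_names] for row in self.rows]
```

Validating the declared feature and target columns once, when the object is built, lets downstream code rely on them without re-checking.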

🚀 Quickstart

To run the quickstart examples and get an overview of dwrappr's functionality, see IEEE_examples.

Additional features are demonstrated in:

  • loading_dataset_from_file.py: shows how to load a dataset from an existing dataset file
  • scanning_folder_for_datasets.py: shows how to scan a folder for available datasets
  • dataset_functionalities.py: shows some of the main features of the DataSet class

👀 Functionality Insights

Scan a folder for datasets

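from dwrappr import DataSet  # import path assumed; check the package documentation

DATASET_FOLDER = "./data/datasets/"
# Collect an overview of the datasets stored in the folder
available_datasets = DataSet.get_available_datasets_in_folder(DATASET_FOLDER)
# .T transposes the returned overview for easier reading
available_datasets.T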
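How `get_available_datasets_in_folder` discovers datasets is not documented here. As a rough, stdlib-only sketch, a scan could list the dataset files in the folder. The helper name `find_dataset_files` and the assumption that datasets are stored as single `.joblib` files (taken from the file name in the loading example) are hypothetical.

```python
from __future__ import annotations

from pathlib import Path


def find_dataset_files(folder: str | Path, suffix: str = ".joblib") -> list[Path]:
    """Hypothetical helper: return the dataset files directly inside `folder`.

    Sketch only, assuming one file per dataset with the given suffix;
    this is not dwrappr's implementation.
    """
    root = Path(folder)
    if not root.is_dir():
        raise NotADirectoryError(f"not a directory: {root}")
    # Sorting makes the result independent of file-system ordering.
    return sorted(p for p in root.iterdir() if p.is_file() and p.suffix == suffix)
```

A real scanner would also read each dataset's metadata to build an overview table; that step is omitted here.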

Load a specific dataset

from dwrappr import DataSet  # import path assumed; check the package documentation

DATASET_FILEPATH = "./data/datasets/manufacturing_process_ds.joblib"
ds = DataSet.load(DATASET_FILEPATH)
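The dataset files here use the `.joblib` extension, and joblib serializes Python objects with pickle. The save/load round trip can therefore be sketched with the standard library's `pickle` module, used below as a stand-in for joblib. `save_object` and `load_object` are hypothetical helpers, not `DataSet.save` or `DataSet.load`.

```python
from __future__ import annotations

import pickle
from pathlib import Path
from typing import Any


# Sketch only: pickle stands in for joblib, and these helpers are hypothetical,
# not dwrappr's DataSet.save / DataSet.load.
def save_object(obj: Any, path: str | Path) -> None:
    path = Path(path)
    path.parent.mkdir(parents=True, exist_ok=True)  # create missing folders such as ./data/datasets/
    with path.open("wb") as fh:
        pickle.dump(obj, fh)


def load_object(path: str | Path) -> Any:
    # Unpickling can run arbitrary code: only load files from trusted sources.
    with Path(path).open("rb") as fh:
        return pickle.load(fh)
```

Because unpickling can execute arbitrary code, load `.joblib` or pickle files only from sources you trust, such as your group's own dataset folder.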

Generate a dataset from raw data

import pandas as pd
from dwrappr import DataSet, DataSetMeta  # import path assumed; check the package documentation

RAW_DATA_FILEPATH = "./data/raw_data.csv"
# Load the raw data into a pandas.DataFrame
df = pd.read_csv(RAW_DATA_FILEPATH)
# <manual preprocessing steps, e.g. dropping missing values and changing dtypes>

# Define the metadata; feature_names and target_names refer to columns of df
meta = DataSetMeta(
    name="example_dataset",
    synthetic_data=True,
    time_series=False,
    feature_names=["feature"],
    target_names=["target"],
)
# Build the DataSet from the DataFrame and its metadata
ds = DataSet.from_dataframe(
    df=df,
    meta=meta,
)
# Save the dataset
ds.save("./data/example_dataset.joblib", drop_meta_json=True)
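Because the metadata names the feature and target columns, it is worth checking them against the DataFrame before building the dataset. The sketch below uses a hypothetical `MetaSketch` dataclass whose fields mirror the `DataSetMeta` keyword arguments above; it is not the library's class. What `drop_meta_json=True` does is not described here. Assuming its name refers to writing the metadata to a separate JSON file, the hypothetical `write_meta_sidecar` shows that idea.

```python
from __future__ import annotations

import json
from dataclasses import asdict, dataclass, field
from pathlib import Path


# Hypothetical stand-in for DataSetMeta: field names mirror the keyword
# arguments in the example above, but this is NOT the library's class.
@dataclass
class MetaSketch:
    name: str
    synthetic_data: bool
    time_series: bool
    feature_names: list[str] = field(default_factory=list)
    target_names: list[str] = field(default_factory=list)

    def check_columns(self, columns: list[str]) -> None:
        # Fail early if the metadata refers to columns the data does not have.
        missing = [c for c in self.feature_names + self.target_names if c not in columns]
        if missing:
            raise KeyError(f"columns not found in data: {missing}")


def write_meta_sidecar(meta: MetaSketch, dataset_path: str | Path) -> Path:
    # Assumption: the metadata lives in a JSON file next to the dataset file,
    # e.g. example_dataset.joblib -> example_dataset.json
    sidecar = Path(dataset_path).with_suffix(".json")
    sidecar.write_text(json.dumps(asdict(meta), indent=2))
    return sidecar
```

A plain-text JSON file can be read by other tools without loading the binary dataset file, which is one reason to keep metadata separate.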

Split dataset (train/test split)

import numpy as np
import pandas as pd
from dwrappr import DataSet, DataSetMeta  # import path assumed; check the package documentation

n_instances = 100
# Create the categorical 'product_id' feature with 7 possible values
product_ids = np.random.choice(['1001', '2002', '3003', '4004', '5005', '6006', '7007'], size=n_instances)
# Generate two additional numeric features
feature_1 = np.random.rand(n_instances) * 100  # Random numbers in [0, 100)
feature_2 = np.random.rand(n_instances) * 50   # Random numbers in [0, 50)
# Generate a numeric target as a noisy linear combination of the features
target = feature_1 * 0.5 + feature_2 * 0.3 + np.random.randn(n_instances) * 5  # Gaussian noise, std 5
# Create a DataFrame
df = pd.DataFrame({
    'product_id': product_ids,
    'feature_1': feature_1,
    'feature_2': feature_2,
    'target': target
})
ds = DataSet.from_dataframe(
    df=df,
    meta=DataSetMeta(
        name="example_dataset",
        synthetic_data=True,
        time_series=False,
        feature_names=["product_id", "feature_1", "feature_2"],
        target_names=["target"]
    )
)
# Shuffle and split 50/50; group_by_features presumably keeps all rows
# with the same product_id in the same split
train_ds, test_ds = ds.split_dataset(
    first_ds_size=0.5,
    shuffle=True,
    group_by_features=["product_id"]
)
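The `group_by_features=["product_id"]` argument suggests a grouped split, in which all rows sharing a `product_id` land on the same side, so the test set contains only products the model has not seen. Below is a stdlib-only sketch of that technique, under the assumption that dwrappr behaves this way; `group_split` is a hypothetical helper working on plain dicts rather than a `DataSet`.

```python
from __future__ import annotations

import random
from collections import defaultdict
from collections.abc import Hashable
from typing import Any


def group_split(
    rows: list[dict[str, Any]],
    group_key: str,
    first_size: float = 0.5,
    seed: int | None = None,
) -> tuple[list[dict[str, Any]], list[dict[str, Any]]]:
    """Split rows in two so that no value of `group_key` appears on both sides.

    Hypothetical sketch of a grouped split; not dwrappr's split_dataset.
    """
    # Collect rows by their group value (e.g. one bucket per product_id).
    groups: dict[Hashable, list[dict[str, Any]]] = defaultdict(list)
    for row in rows:
        groups[row[group_key]].append(row)

    # Shuffle whole groups, not individual rows.
    keys = list(groups)
    random.Random(seed).shuffle(keys)

    target = first_size * len(rows)
    first: list[dict[str, Any]] = []
    second: list[dict[str, Any]] = []
    for key in keys:
        # Add whole groups to the first split until it holds at least `target` rows.
        (first if len(first) < target else second).extend(groups[key])
    return first, second
```

Since groups are never divided, the first split only approximates `first_size`: it keeps taking whole groups until it reaches at least that share of the rows.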

📄 Help

See Documentation for details.

🛠️ Package Installation

  • full version: pip install dwrappr
  • light version (without the scikit-learn dependency): pip install dwrappr[light]

To update an existing installation: pip install --upgrade dwrappr
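Code that should run under both the full and the light install can check at runtime whether scikit-learn is available. A minimal stdlib sketch follows; how dwrappr itself handles the missing dependency is not described here, and `is_installed` is a hypothetical helper name.

```python
import importlib.util


def is_installed(package: str) -> bool:
    # find_spec locates a top-level package without importing it
    # and returns None if it cannot be found.
    return importlib.util.find_spec(package) is not None


# scikit-learn is imported under the name "sklearn".
SKLEARN_AVAILABLE = is_installed("sklearn")
```

Guarding optional imports with a flag like `SKLEARN_AVAILABLE` lets a light install fail with a clear message only when a scikit-learn-dependent feature is actually used.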

🔧 Maintainer

This project is maintained by Nils.

Download files


Source Distribution

dwrappr-1.0.11.tar.gz (20.8 kB)

Uploaded Source

Built Distribution


dwrappr-1.0.11-py3-none-any.whl (20.5 kB)

Uploaded Python 3

File details

Details for the file dwrappr-1.0.11.tar.gz.

File metadata

  • Download URL: dwrappr-1.0.11.tar.gz
  • Upload date:
  • Size: 20.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for dwrappr-1.0.11.tar.gz
Algorithm     Hash digest
SHA256        f2eceed6a0aaf98257e8f29a9ee0020588def289c694e1c1920a22014b8a2300
MD5           7275d12c2261fc10a8604c619cda62c2
BLAKE2b-256   cff3f47c1dd8e4f00d8b2aad2c44a01ec575ca27879e2f6e0bdc4e76dc8c569c
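To check a downloaded file against the published SHA256 digest, hash it locally and compare. A stdlib sketch follows; `sha256_of` and `matches_published_digest` are hypothetical helper names, and the expected value is the SHA256 digest listed for dwrappr-1.0.11.tar.gz.

```python
from __future__ import annotations

import hashlib
from pathlib import Path

# SHA256 digest published for dwrappr-1.0.11.tar.gz.
EXPECTED_SDIST_SHA256 = "f2eceed6a0aaf98257e8f29a9ee0020588def289c694e1c1920a22014b8a2300"


def sha256_of(path: str | Path, chunk_size: int = 1 << 20) -> str:
    # Hash the file in 1 MiB chunks so large files need not fit in memory.
    digest = hashlib.sha256()
    with Path(path).open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path: str | Path, expected: str = EXPECTED_SDIST_SHA256) -> bool:
    # hexdigest() is lowercase, so normalize the expected value before comparing.
    return sha256_of(path) == expected.lower()
```

pip can enforce the same check during installation: in a requirements file, a line such as `dwrappr==1.0.11 --hash=sha256:<digest>` switches pip into hash-checking mode, and it rejects any download whose digest does not match.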


File details

Details for the file dwrappr-1.0.11-py3-none-any.whl.

File metadata

  • Download URL: dwrappr-1.0.11-py3-none-any.whl
  • Upload date:
  • Size: 20.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.13

File hashes

Hashes for dwrappr-1.0.11-py3-none-any.whl
Algorithm     Hash digest
SHA256        0176112db0b4b12c5878cfc8482b760a563ef7ba1661c4277c66afe2c9bec571
MD5           da6a8989b763c36832265dfd145e1f6c
BLAKE2b-256   0bb9e6315b167077d7d77e3014566d855cb07349ecc706e3af9e3b595b4f88dd

