A reusable workflow for data engineering pipelines
DataVolt: Enterprise Data Pipeline Framework
Introduction
DataVolt is an enterprise-grade framework for building and maintaining scalable data engineering pipelines. It provides a comprehensive suite of tools for data ingestion, transformation, and processing, enabling organizations to standardize their data operations and speed up development cycles.
Core Capabilities
DataVolt delivers three primary value propositions:
- Pipeline Standardization: Unified interfaces for data ingestion, transformation, and export operations
- Operational Efficiency: Automated workflow orchestration and preprocessing capabilities
- Enterprise Integration: Native support for cloud storage, SQL databases, and machine learning frameworks
Technical Architecture
DataVolt/
├── loaders/ # Data Ingestion Layer
│ ├── __init__.py
│ ├── csv_loader.py # CSV Processing Engine
│ ├── sql_loader.py # SQL Database Connector
│ ├── s3_loader.py # Cloud Storage Interface
│ └── custom_loader.py # Extensibility Framework
├── preprocess/ # Data Transformation Layer
│ ├── __init__.py
│ ├── cleaning.py # Data Cleansing Engine
│ ├── encoding.py # Feature Encoding Module
│ ├── scaling.py # Normalization Framework
│ ├── feature_engineering.py # Feature Generation Engine
│ └── pipeline.py # Pipeline Orchestrator
└── ext/ # Extension Layer
├── logger.py # Logging Framework
└── custom_step.py # Custom Pipeline Interface
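The loaders/ layer is built for extension via custom_loader.py: every loader exposes the same interface, so pipelines don't care where data comes from. A minimal sketch of that plug-in pattern (class and method names here are illustrative assumptions, not DataVolt's documented API):

```python
from abc import ABC, abstractmethod

class BaseLoader(ABC):
    """Common interface every loader implements (hypothetical sketch)."""

    @abstractmethod
    def load(self):
        """Return the dataset as a list of records."""

class InMemoryLoader(BaseLoader):
    """Example custom loader that serves records already held in memory."""

    def __init__(self, records):
        self.records = records

    def load(self):
        # A real loader would read from a file, database, or S3 bucket here.
        return list(self.records)

loader = InMemoryLoader([{"id": 1}, {"id": 2}])
print(loader.load())  # → [{'id': 1}, {'id': 2}]
```

Because each loader implements the same load() method, downstream pipeline code stays identical regardless of where the data originates.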
Installation
Install via pip:
pip install datavolt
For improved dependency management:
uv pip install datavolt
Implementation Guide
Data Ingestion
from datavolt.loaders.csv_loader import CSVLoader
# Initialize data ingestion pipeline
loader = CSVLoader(file_path="data.csv")
dataset = loader.load()
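Under the hood, a CSV loader of this kind typically wraps the standard library's csv module, parsing each row into a record. A self-contained sketch of the presumed behavior (an illustration, not DataVolt's actual implementation):

```python
import csv
import io

def load_csv(text_stream):
    """Parse CSV content into a list of dict records, one per row."""
    return list(csv.DictReader(text_stream))

# io.StringIO stands in for an open file handle.
sample = io.StringIO("name,score\nada,90\ngrace,95\n")
records = load_csv(sample)
print(records)  # → [{'name': 'ada', 'score': '90'}, {'name': 'grace', 'score': '95'}]
```

Note that csv.DictReader yields all values as strings; numeric conversion would be a job for the transformation layer.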
Data Transformation
from datavolt.preprocess.pipeline import PreprocessingPipeline
from datavolt.preprocess.scaling import StandardScaler
from datavolt.preprocess.encoding import OneHotEncoder
# Configure transformation pipeline
pipeline = PreprocessingPipeline([
StandardScaler(),
OneHotEncoder()
])
# Execute transformations
processed_dataset = pipeline.run(dataset)
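A pipeline of this shape is essentially a left fold: each step transforms the dataset and hands the result to the next step. A minimal sketch of that orchestration pattern (names are assumptions, not DataVolt's code):

```python
class Pipeline:
    """Run a sequence of transformation steps in order (illustrative sketch)."""

    def __init__(self, steps):
        self.steps = steps

    def run(self, dataset):
        for step in self.steps:
            dataset = step(dataset)  # each step returns a new dataset
        return dataset

# Two toy steps: double a value, then tag each record.
scale = lambda rows: [{**r, "value": r["value"] * 2} for r in rows]
tag = lambda rows: [{**r, "tagged": True} for r in rows]

result = Pipeline([scale, tag]).run([{"value": 3}])
print(result)  # → [{'value': 6, 'tagged': True}]
```

Because steps share one contract (dataset in, dataset out), scalers, encoders, and custom steps can be mixed freely and reordered without code changes.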
Model Integration
from datavolt.model.trainer import ModelTrainer
from datavolt.model.evaluator import Evaluator
from datavolt.model.model_export import ModelExporter
# Initialize model training
trainer = ModelTrainer(
model="random_forest",
parameters={"n_estimators": 100}
)
# Train and evaluate (labels, test_data, test_labels are your training targets and held-out split)
model = trainer.train(processed_dataset, labels)
metrics = Evaluator().evaluate(model, test_data, test_labels)
# Export for production
ModelExporter().save(model, "models/random_forest.pkl")
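The .pkl extension suggests pickle-based serialization. A hedged round-trip sketch of that approach (assuming pickle; DataVolt's ModelExporter may use a different mechanism):

```python
import os
import pickle
import tempfile

def save_model(model, path):
    """Serialize a trained model to disk with pickle."""
    with open(path, "wb") as fh:
        pickle.dump(model, fh)

def load_model(path):
    """Restore a previously saved model."""
    with open(path, "rb") as fh:
        return pickle.load(fh)

# A plain dict stands in for a real trained model object.
model = {"kind": "random_forest", "n_estimators": 100}
path = os.path.join(tempfile.mkdtemp(), "random_forest.pkl")
save_model(model, path)
restored = load_model(path)
print(restored == model)  # → True
```

Only unpickle files you trust: pickle can execute arbitrary code during loading, which matters when models move between environments.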
Enterprise Applications
DataVolt is designed for organizations requiring:
- Standardized data preprocessing workflows
- Scalable machine learning pipelines
- Reproducible feature engineering processes
- Integration with existing data infrastructure
Contributing
We welcome contributions from the community. Please follow these steps:
- Fork the repository
- Create a feature branch (git checkout -b feature/enhancement)
- Commit changes (git commit -am 'Add enhancement')
- Push to branch (git push origin feature/enhancement)
- Open a Pull Request
License
DataVolt is distributed under the MIT License. See LICENSE for details.
Support
- Documentation: DataVolt Docs
- Issue Tracking: GitHub Issues
- Professional Support: Contact allanw.mk@gmail.com
Performance Benchmark Report
Generated on: 2025-01-21 12:15:12
Runs per loader: 3
Loader: CSVLoader
- Time taken: 0.06 seconds
- Memory used: 3.02 MB
- CPU usage: 75.2%
- Throughput: 167,002 records/second
- Data size: 10,000 records
Performance Metrics:
- Memory efficiency: 3,307.49 records/MB
- Processing speed: 0.01 ms/record
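The derived figures follow directly from the raw measurements: throughput is records divided by elapsed time, and memory efficiency is records divided by memory used. A quick check (small gaps versus the reported 167,002 records/second and 3,307.49 records/MB come from timer and memory precision beyond the rounded figures shown):

```python
records = 10_000
elapsed_s = 0.06   # rounded value from the report
memory_mb = 3.02

throughput = records / elapsed_s            # records per second
efficiency = records / memory_mb            # records per MB
ms_per_record = elapsed_s / records * 1e3   # milliseconds per record

print(round(throughput))        # → 166667
print(round(efficiency, 2))     # → 3311.26
print(round(ms_per_record, 3))  # → 0.006
```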
DataVolt: Empowering Data Engineering Excellence
File details
Details for the file datavolt-0.0.1.tar.gz.
File metadata
- Download URL: datavolt-0.0.1.tar.gz
- Upload date:
- Size: 24.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | ddd4714b49b9d2176055aee53786cf599411179e53c103f6dbb100d917061c38 |
| MD5 | e1ec5b6e1b3c51d33361041e9b7c4b6d |
| BLAKE2b-256 | b002f177f790609aa19c65ca257bccc7a1d381fb807db72cffc76fe4c2fd7119 |
File details
Details for the file DataVolt-0.0.1-py3-none-any.whl.
File metadata
- Download URL: DataVolt-0.0.1-py3-none-any.whl
- Upload date:
- Size: 29.4 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.0.1 CPython/3.10.11
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 13540f362cda9eb9d0aede6503eb453743c22a6153ccbdbed4acf176fdf43f0e |
| MD5 | 4344ac917780a752e103be0d10861c7a |
| BLAKE2b-256 | e17e42db131afe3a4e2ebefef13a5195bb4dbc215c901c844b9234bfa7226743 |