A high-performance Python library for blazing-fast data analysis

PyTurbo: High-Performance Data Analysis Library

PyTurbo is a Python library for accelerated data analysis on both CPU and GPU. It relies on vectorized operations and parallel processing, and in the benchmarks below it runs between 1.7x and 120x faster than the equivalent pandas operations.

🚀 Key Features

  • High Performance: Up to 120x speedup for complex calculations
  • GPU Acceleration: Seamless integration with RAPIDS cuDF for GPU-powered computing
  • Automatic Optimization: Smart fallback to CPU when GPU is unavailable
  • Pandas Compatible: Familiar pandas-like API for easy adoption
  • Memory Efficient: Optimized memory usage for large datasets

📊 Benchmark Results

Operation         PyTurbo   Pandas    Speedup
Complex Scoring   0.16s     19.15s    120x
Rolling Ops       3.49s     36.15s    10x
Filtering         0.06s     0.11s     1.7x
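
The Speedup column is the pandas time divided by the PyTurbo time. As a small worked check, the snippet below recomputes that ratio from the timings in the table. The `speedup` helper is a name introduced here for illustration and is not part of PyTurbo.

```python
# Timings in seconds, copied from the benchmark table: (PyTurbo, pandas).
timings = {
    "Complex Scoring": (0.16, 19.15),
    "Rolling Ops": (3.49, 36.15),
    "Filtering": (0.06, 0.11),
}


def speedup(pyturbo_seconds: float, pandas_seconds: float) -> float:
    """Return how many times faster the PyTurbo run was than the pandas run."""
    return pandas_seconds / pyturbo_seconds


for name, (pyturbo_s, pandas_s) in timings.items():
    print(f"{name}: {speedup(pyturbo_s, pandas_s):.1f}x")
```

The recomputed ratios are close to the published ones rather than identical, because the timings are shown to only two decimal places and the Speedup column is rounded.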

🛠 Installation

Basic Installation (CPU Only)

pip install pyturbo-analytics

GPU-Accelerated Installation

For GPU support, you need the NVIDIA CUDA Toolkit and RAPIDS cuDF:

  1. Install the CUDA Toolkit (11.x recommended):

    # Visit https://developer.nvidia.com/cuda-downloads
    
  2. Install RAPIDS cuDF:

    pip install --extra-index-url=https://pypi.nvidia.com cudf-cu11
    
  3. Install PyTurbo with GPU support:

    pip install "pyturbo-analytics[gpu]"
    

🎯 Quick Start

import numpy as np
import pandas as pd
import pyturbo as pt


def complex_calculation(value):
    """Placeholder for your own per-value function."""
    return np.sqrt(abs(value)) * 2.0


# Create a TurboFrame from a pandas DataFrame
df = pd.read_csv('large_dataset.csv')
tf = pt.TurboFrame(df)

# Automatic GPU acceleration if available
tf = tf.gpu()  # Falls back to CPU if GPU unavailable

# Complex calculations (up to 120x faster in the benchmark above)
scores = tf['value'].apply(complex_calculation)

# Rolling 75th percentile over a 1000-row window (about 10x faster)
rolling_stats = tf['value'].rolling(window=1000).apply(lambda x: np.percentile(x, 75))
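
To make clear what the rolling call computes, here is a plain-Python reference for a trailing-window 75th percentile. It is a sketch for understanding only, not how PyTurbo implements the operation; `rolling_percentile_75` is a hypothetical name, and the window must hold at least two values.

```python
from statistics import quantiles


def rolling_percentile_75(values, window):
    """75th percentile over a trailing window; None until the window is full."""
    out = []
    for i in range(len(values)):
        if i + 1 < window:
            out.append(None)  # pandas would report NaN for these positions
            continue
        current = values[i + 1 - window : i + 1]
        # method="inclusive" applies the same linear interpolation that
        # np.percentile uses by default; index 2 is the third quartile.
        out.append(quantiles(current, n=4, method="inclusive")[2])
    return out


print(rolling_percentile_75([1, 2, 3, 4, 5, 6], window=4))
# [None, None, None, 3.25, 4.25, 5.25]
```

This version re-sorts every window, so its cost grows with the window size. That per-window work is the kind of overhead an optimized rolling implementation aims to avoid.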

🔍 Example: Vehicle Analysis

import numpy as np
import pandas as pd
import pyturbo as pt

# Load data
df = pd.read_csv('vehicle_data.csv')
tf = pt.TurboFrame(df)

# Complex vehicle scoring (120x faster than pandas)
scores = pt.complex_vehicle_score_vectorized(tf)

# Efficient rolling calculations (10x faster)
rolling_scores = tf['score'].rolling(1000).apply(
    lambda x: np.percentile(x, 75), 
    engine='numpy'
)

# Group analysis with automatic optimization
stats = tf.groupby('category').agg({
    'speed': ['mean', 'std'],
    'score': ['mean', 'max']
})

🌟 Advanced Features

GPU Acceleration

# Check GPU availability
tf = pt.TurboFrame(df)
print(f"GPU Available: {tf.gpu_available}")

# Enable GPU processing
tf = tf.gpu()  # Automatic fallback to CPU if needed
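
The fallback behaviour described above can be pictured as a check for whether the GPU library is importable. The sketch below shows one common way to write such a check with the standard library. It is an assumption about the general pattern, not PyTurbo's actual code, and `module_available` and `choose_device` are hypothetical names.

```python
import importlib.util


def module_available(name: str) -> bool:
    """Return True if a top-level module called `name` can be imported."""
    return importlib.util.find_spec(name) is not None


def choose_device(gpu_module: str = "cudf") -> str:
    """Return "gpu" when the GPU library is importable, otherwise "cpu"."""
    return "gpu" if module_available(gpu_module) else "cpu"


print(choose_device())  # "gpu" where cuDF is installed, "cpu" elsewhere
```

An importable cuDF does not guarantee that a working GPU is present, so a real implementation would likely also need to handle errors raised when the device is first used.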

Parallel Processing

# Automatic parallel processing for CPU operations
# (complex_function stands for your own function)
result = tf.parallel_apply(complex_function, num_workers=4)
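
For readers unfamiliar with the worker-pool pattern that a `num_workers` argument suggests, here is a minimal standalone sketch using the standard library. It is not PyTurbo's implementation. `apply_in_parallel` and `square` are hypothetical names, and the sketch uses threads to stay short even though CPU-bound work would usually call for processes.

```python
from concurrent.futures import ThreadPoolExecutor


def apply_in_parallel(func, items, num_workers=4):
    """Apply func to each item with a pool of workers, keeping input order."""
    with ThreadPoolExecutor(max_workers=num_workers) as pool:
        # pool.map yields results in the order of the inputs.
        return list(pool.map(func, items))


def square(x):
    return x * x


print(apply_in_parallel(square, range(8), num_workers=4))
# [0, 1, 4, 9, 16, 25, 36, 49]
```

For pure-Python CPU-bound functions, replacing `ThreadPoolExecutor` with `ProcessPoolExecutor` lets the work run on several cores, at the cost of pickling the function and its inputs.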

Memory Optimization

# Efficient chunked processing for large datasets
chunks = tf.chunk_dataframe(num_chunks=4)
results = [chunk.process() for chunk in chunks]
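
The idea behind chunked processing is to split the data into contiguous pieces and handle one piece at a time, so that only part of the dataset needs to be worked on at once. The sketch below shows the splitting step on a plain list. `split_into_chunks` is a hypothetical helper, not the `chunk_dataframe` method itself.

```python
def split_into_chunks(rows, num_chunks):
    """Split a sequence into num_chunks contiguous, nearly equal parts."""
    size, extra = divmod(len(rows), num_chunks)
    chunks, start = [], 0
    for i in range(num_chunks):
        # The first `extra` chunks take one additional row each.
        end = start + size + (1 if i < extra else 0)
        chunks.append(rows[start:end])
        start = end
    return chunks


chunks = split_into_chunks(list(range(10)), num_chunks=4)
print(chunks)  # [[0, 1, 2], [3, 4, 5], [6, 7], [8, 9]]

# Process each chunk independently, then combine the partial results.
partial_sums = [sum(chunk) for chunk in chunks]
print(sum(partial_sums))  # 45
```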

📚 Documentation

For detailed documentation, see the PyTurbo Documentation.

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

📝 License

PyTurbo is released under the MIT License. See LICENSE for details.

🙏 Acknowledgments

Special thanks to the RAPIDS team for their amazing GPU-accelerated data science tools.

Download files

Source Distribution

pyturbo_analytics-0.1.1.tar.gz (12.6 kB)

Uploaded Source

Built Distribution

pyturbo_analytics-0.1.1-py3-none-any.whl (11.9 kB)

Uploaded Python 3

File details

Details for the file pyturbo_analytics-0.1.1.tar.gz.

File metadata

  • Download URL: pyturbo_analytics-0.1.1.tar.gz
  • Size: 12.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.9

File hashes

Hashes for pyturbo_analytics-0.1.1.tar.gz
Algorithm     Hash digest
SHA256        811ece9422b4709b96dba61957ced94c8eec0306314e9224df6135d420ce06ef
MD5           91ea71e676cbe0bf84d348cc4351495a
BLAKE2b-256   aa8545c7e1485194890e5f51fe6fc6ffa32441eb8b9a68500452eb3e6cd9ada4

File details

Details for the file pyturbo_analytics-0.1.1-py3-none-any.whl.

File hashes

Hashes for pyturbo_analytics-0.1.1-py3-none-any.whl
Algorithm     Hash digest
SHA256        7b08c91a2156a915bfde4f880c2e70455ff1ec013c9582e4f6a76c496c6b7f54
MD5           0171b39906f8dfad8dd30cd960f87a4f
BLAKE2b-256   c893112815e86c80a3a507512dfdf8c44b4ece35ff563db0edcfaa495ec54197
