A Python package for Altastata data processing and machine learning integration

Altastata Python Package v0.1.17

A powerful Python package for data processing and machine learning integration with Altastata.

Installation

pip install altastata

Features

  • Seamless integration with PyTorch and TensorFlow
  • Advanced data processing capabilities
  • Java integration through Py4J with optimized memory management
  • Support for large-scale data operations
  • Improved garbage collection and memory optimization
  • Enhanced error handling for cloud operations
  • Optimized file reading with direct attribute access
  • Comprehensive AWS IAM permission management
  • Confidential Computing Support: Deploy on Google Cloud Platform with AMD SEV security
  • Robust file operation status tracking

Quick Start

from altastata import AltaStataFunctions, AltaStataPyTorchDataset, AltaStataTensorFlowDataset
from altastata.altastata_tensorflow_dataset import register_altastata_functions_for_tensorflow
from altastata.altastata_pytorch_dataset import register_altastata_functions_for_pytorch

# Configuration parameters
user_properties = """#My Properties
#Sun Jan 05 12:10:23 EST 2025
AWSSecretKey=*****
AWSAccessKeyId=*****
myuser=bob123
accounttype=amazon-s3-secure
................................................................
region=us-east-1"""

private_key = """-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: DES-EDE3,F26EBECE6DDAEC52

poe21ejZGZQ0GOe+EJjDdJpNvJcq/Yig9aYXY2rCGyxXLGVFeYJFg7z6gMCjIpSd
................................................................
wV5BUmp5CEmbeB4r/+BlFttRZBLBXT1sq80YyQIVLumq0Livao9mOg==
-----END RSA PRIVATE KEY-----"""

# Create an instance of AltaStataFunctions
altastata_functions = AltaStataFunctions.from_credentials(user_properties, private_key)
altastata_functions.set_password("my_password")

# Register the altastata functions for PyTorch or TensorFlow as a custom dataset
register_altastata_functions_for_pytorch(altastata_functions, "bob123_rsa")
register_altastata_functions_for_tensorflow(altastata_functions, "bob123_rsa")

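# For PyTorch applications.
# root_dir, pattern, and transform are placeholders: define them for your
# own storage path, file pattern, and transform before running this.
torch_dataset = AltaStataPyTorchDataset(
    "bob123_rsa",
    root_dir=root_dir,
    file_pattern=pattern,
    transform=transform
)

# For TensorFlow applications.
# preprocess_fn is likewise a placeholder for your own preprocessing function.
tensorflow_dataset = AltaStataTensorFlowDataset(
    "bob123_rsa",  # Same identifier passed to the register_* calls above
    root_dir=root_dir,
    file_pattern=pattern,
    preprocess_fn=preprocess_fn
)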
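
The user_properties string in the Quick Start is laid out as comment lines starting with # followed by key=value pairs. Before handing such a string to AltaStataFunctions.from_credentials, it can help to check that no key was dropped while copying. The sketch below is not part of the altastata package: parse_properties, missing_keys, and EXPECTED_KEYS are hypothetical helpers, and the list of expected keys is only an assumption taken from the keys visible in the example above.

```python
def parse_properties(text: str) -> dict:
    """Parse key=value lines, skipping blanks, '#' comments, and lines without '='."""
    props = {}
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        props[key.strip()] = value.strip()
    return props


# Assumption: these are simply the keys shown in the Quick Start example,
# not an official list of required settings.
EXPECTED_KEYS = ("AWSAccessKeyId", "AWSSecretKey", "myuser", "accounttype", "region")


def missing_keys(text: str, expected=EXPECTED_KEYS) -> list:
    """Return the expected keys that are absent or empty in the properties text."""
    props = parse_properties(text)
    return [key for key in expected if not props.get(key)]


sample = """#My Properties
AWSSecretKey=*****
AWSAccessKeyId=*****
myuser=bob123
accounttype=amazon-s3-secure
region=us-east-1"""

print(missing_keys(sample))  # prints []
```

An empty list means every expected key has a value. The check only looks at the text, so it says nothing about whether the credentials themselves are valid.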

Version Information

Current Version: 0.1.17

This version includes:

  • Rebuilt altastata-hadoop-all.jar with latest improvements
  • Enhanced error handling in delete_files operations
  • Simplified _read_file method for better performance
  • Updated AWS account configurations
  • Improved memory management and garbage collection
  • Comprehensive status tracking for cloud operations

Docker Support

The package is available as a multi-architecture Docker image that works natively on both AMD64 and ARM64 platforms:

# Pull multi-architecture image (automatically selects correct architecture)
docker pull ghcr.io/sergevil/altastata/jupyter-datascience:latest

# Or use docker-compose
docker-compose -f docker-compose-ghcr.yml up -d

Platform Support:

  • Apple Silicon Macs: Native ARM64 performance
  • Intel Macs: Native AMD64 performance
  • GCP Confidential GKE: Native AMD64 performance
  • Other platforms: Automatic architecture selection
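
Docker normally picks the matching image variant on its own, so nothing extra is needed for the pull above. For scripts that want to log or pin the architecture, the sketch below maps the local machine type to a Docker platform string. The mapping table and the docker_platform helper are illustrative assumptions, not part of the altastata package.

```python
import platform

# Common values of platform.machine() mapped to Docker platform strings.
_DOCKER_PLATFORMS = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "arm64": "linux/arm64",
    "aarch64": "linux/arm64",
}


def docker_platform(machine=None):
    """Return the Docker platform string for a machine type, or None if unknown."""
    name = (machine or platform.machine()).lower()
    return _DOCKER_PLATFORMS.get(name)


print(docker_platform())          # platform of the current machine, or None
print(docker_platform("arm64"))   # linux/arm64
```

The returned value can be passed to the --platform option of docker pull if a specific architecture has to be forced.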

Confidential Computing Deployment

Deploy Altastata in a secure, confidential computing environment on Google Cloud Platform:

# Navigate to confidential GKE setup
cd confidential-gke

# Deploy confidential cluster with AMD SEV security
./setup-cluster.sh

# Access Jupyter Lab at the provided URL

# Delete the cluster when not in use to save costs
# (this removes the cluster; it does not pause it)
gcloud container clusters delete altastata-confidential-cluster --zone=us-central1-a

Features:

  • Hardware-level security with AMD SEV encryption
  • Memory encryption during data processing
  • Multi-cloud storage support (GCP, AWS, Azure)
  • Cost optimization with easy stop/start commands
  • Multi-architecture support for both AMD64 and ARM64 platforms

See confidential-gke/README.md for detailed setup instructions.
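
Because the teardown command deletes the cluster, it may be worth wrapping it so that a script prints the command before running it. The sketch below is an assumption-laden illustration, not part of the repository: delete_cluster_command and delete_cluster are hypothetical helpers, and the default cluster name and zone are copied from the command shown above.

```python
import shlex
import subprocess


def delete_cluster_command(cluster="altastata-confidential-cluster",
                           zone="us-central1-a"):
    """Build the argv for the gcloud delete command shown in this README."""
    return ["gcloud", "container", "clusters", "delete", cluster, f"--zone={zone}"]


def delete_cluster(dry_run=True, **kwargs):
    """Print the command in dry-run mode; otherwise run it and raise on failure."""
    argv = delete_cluster_command(**kwargs)
    if dry_run:
        print("would run:", shlex.join(argv))
        return None
    return subprocess.run(argv, check=True)


delete_cluster()  # dry run: only prints the command
```

Calling delete_cluster(dry_run=False) runs the real command, which requires an installed and authenticated gcloud CLI.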

Recent Improvements

  • Multi-Architecture Support: Docker images now work natively on both AMD64 and ARM64 platforms
  • Error Handling: Enhanced delete_files method with detailed error reporting
  • Performance: Optimized file reading operations
  • Compatibility: Updated AWS IAM configurations for better permission management
  • Documentation: Consistent version numbering across all components

This project is licensed under the MIT License - see the LICENSE file for details.
