A Python package for Altastata data processing and machine learning integration
Altastata Python Package v0.1.18
A powerful Python package for data processing and machine learning integration with Altastata.
Installation
pip install altastata
Features
- Seamless integration with PyTorch and TensorFlow
- fsspec filesystem interface for standard Python file operations
- Real-time Event Notifications: Listen for file share, delete, and create events
- Advanced data processing capabilities
- Java integration through Py4J with optimized memory management
- Support for large-scale data operations
- Improved garbage collection and memory optimization
- Enhanced error handling for cloud operations
- Optimized file reading with direct attribute access
- Comprehensive AWS IAM permission management
- Confidential Computing Support: Deploy on Google Cloud Platform with AMD SEV security
- Robust file operation status tracking
Quick Start
from altastata import AltaStataFunctions, AltaStataPyTorchDataset, AltaStataTensorFlowDataset
from altastata.altastata_tensorflow_dataset import register_altastata_functions_for_tensorflow
from altastata.altastata_pytorch_dataset import register_altastata_functions_for_pytorch
# Configuration parameters
user_properties = """#My Properties
#Sun Jan 05 12:10:23 EST 2025
AWSSecretKey=*****
AWSAccessKeyId=*****
myuser=bob123
accounttype=amazon-s3-secure
................................................................
region=us-east-1"""
private_key = """-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: DES-EDE3,F26EBECE6DDAEC52
poe21ejZGZQ0GOe+EJjDdJpNvJcq/Yig9aYXY2rCGyxXLGVFeYJFg7z6gMCjIpSd
................................................................
wV5BUmp5CEmbeB4r/+BlFttRZBLBXT1sq80YyQIVLumq0Livao9mOg==
-----END RSA PRIVATE KEY-----"""
# Create an instance of AltaStataFunctions
altastata_functions = AltaStataFunctions.from_credentials(user_properties, private_key)
altastata_functions.set_password("my_password")
# Register the altastata functions for PyTorch or TensorFlow as a custom dataset
register_altastata_functions_for_pytorch(altastata_functions, "bob123_rsa")
register_altastata_functions_for_tensorflow(altastata_functions, "bob123_rsa")
# For PyTorch application use
# root_dir, pattern, transform, and preprocess_fn are supplied by your application
torch_dataset = AltaStataPyTorchDataset(
    "bob123_rsa",
    root_dir=root_dir,
    file_pattern=pattern,
    transform=transform
)
# For TensorFlow application use
tensorflow_dataset = AltaStataTensorFlowDataset(
    "bob123_rsa",  # Using AltaStata account for testing
    root_dir=root_dir,
    file_pattern=pattern,
    preprocess_fn=preprocess_fn
)
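The Quick Start passes the password to set_password as a string literal. A minimal sketch, assuming you would rather keep it out of source code: read it from an environment variable and fall back to an interactive prompt. The helper name get_password and the variable name ALTASTATA_PASSWORD are hypothetical and not part of the altastata package.

```python
import getpass
import os


def get_password(env=None, prompt=getpass.getpass):
    """Return the account password from the environment, or prompt for it.

    ALTASTATA_PASSWORD is a hypothetical variable name chosen for this sketch.
    """
    env = os.environ if env is None else env
    password = env.get("ALTASTATA_PASSWORD")
    if password:
        return password
    # No variable set: ask interactively without echoing the input.
    return prompt("AltaStata password: ")
```

With this in place, the call from the Quick Start could become altastata_functions.set_password(get_password()).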
fsspec Integration
from altastata import AltaStataFunctions
from altastata.fsspec import create_filesystem
# Create AltaStata connection
altastata_functions = AltaStataFunctions.from_account_dir('/path/to/account')
altastata_functions.set_password("your_password")
# Create fsspec filesystem
fs = create_filesystem(altastata_functions, "my_account")
# Use standard file operations
files = fs.ls("Public/")
with fs.open("Public/Documents/file.txt", "r") as f:
    content = f.read()
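Because the filesystem object exposes the usual ls and open calls, the listing and reading steps can be wrapped in a small helper. The sketch below assumes the AltaStata filesystem accepts detail=False in ls, as standard fsspec filesystems do. read_text_files and FakeFS are hypothetical names. FakeFS is only an in-memory stand-in so the example runs without an account.

```python
import io


def read_text_files(fs, directory, suffix=".txt"):
    """Return {path: text} for every file under `directory` ending in `suffix`."""
    contents = {}
    # detail=False asks fsspec-style filesystems for plain path strings.
    for path in fs.ls(directory, detail=False):
        if path.endswith(suffix):
            with fs.open(path, "r") as f:
                contents[path] = f.read()
    return contents


class FakeFS:
    """Hypothetical in-memory stand-in so the sketch runs without an account."""

    def __init__(self, files):
        self._files = files

    def ls(self, path, detail=False):
        return sorted(p for p in self._files if p.startswith(path))

    def open(self, path, mode="r"):
        return io.StringIO(self._files[path])


demo_fs = FakeFS({
    "Public/Documents/a.txt": "alpha",
    "Public/Documents/b.csv": "1,2",
})
texts = read_text_files(demo_fs, "Public/Documents/")
print(texts)
```

In real use, the fs returned by create_filesystem would take the place of demo_fs.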
Event Listener
Get real-time notifications when file operations occur:
from altastata import AltaStataFunctions
# Event handler
def event_handler(event_name, data):
    print(f"📢 Event: {event_name}, Data: {data}")
    if event_name == "SHARE":
        print("File was shared!")
    elif event_name == "DELETE":
        print("File was deleted!")
# Initialize with callback server
altastata = AltaStataFunctions.from_account_dir(
    '/path/to/account',
    enable_callback_server=True,
    callback_server_port=25334
)
altastata.set_password("your_password")
# Register listener
listener = altastata.add_event_listener(event_handler)
# Events will now be delivered in real-time!
# See event-listener-example/ for complete demos
Perfect for:
- Audit logging and compliance
- Real-time sync and backup
- Security monitoring
- RAG vector store updates
- Workflow automation
See event-listener-example/ for complete documentation and working examples.
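For use cases such as audit logging, a single handler can route events by name and record each one. The following is a minimal sketch that keeps the (event_name, data) signature shown above. make_dispatcher is a hypothetical helper, and the string payloads are placeholders because the actual shape of data is not described here.

```python
import json
import time


def make_dispatcher(handlers, audit_log=None):
    """Build an (event_name, data) callback that routes events by name."""

    def event_handler(event_name, data):
        if audit_log is not None:
            # One JSON object per event keeps the log easy to parse later.
            audit_log.append(json.dumps(
                {"ts": time.time(), "event": event_name, "data": str(data)}
            ))
        handler = handlers.get(event_name)
        if handler is None:
            return False  # no handler registered: ignore instead of raising
        handler(data)
        return True

    return event_handler


shared, log = [], []
handler = make_dispatcher({"SHARE": shared.append}, audit_log=log)
handler("SHARE", "Public/Documents/file.txt")   # placeholder payload
handler("DELETE", "Public/Documents/old.txt")   # logged, but no handler registered
```

The returned callback could then be passed to add_event_listener in place of a hand-written if/elif chain.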
LangChain Integration
Use Altastata as a document source for LangChain applications:
from langchain.document_loaders import DirectoryLoader
from altastata.fsspec import create_filesystem
from altastata import AltaStataFunctions
# Create AltaStata connection
altastata_functions = AltaStataFunctions.from_account_dir('/path/to/account')
altastata_functions.set_password("your_password")
# Create fsspec filesystem
fs = create_filesystem(altastata_functions, "my_account")
# Use with LangChain document loaders
loader = DirectoryLoader("Public/Documents/", filesystem=fs)
documents = loader.load()
# Use with vector stores
from langchain.vectorstores import FAISS
from langchain.embeddings import OpenAIEmbeddings
vectorstore = FAISS.from_documents(documents, OpenAIEmbeddings())
Perfect for:
- RAG (Retrieval-Augmented Generation) applications
- Document processing pipelines
- Knowledge base construction
- Multi-modal AI applications
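In RAG pipelines, long documents are commonly split into overlapping chunks before they are embedded. A minimal, library-free sketch of that step follows. chunk_text is a hypothetical helper with character-based sizes, not part of altastata or LangChain. LangChain ships its own text splitters that would normally be used instead.

```python
def chunk_text(text, chunk_size=500, overlap=50):
    """Split `text` into chunks of `chunk_size` characters that overlap by `overlap`."""
    if chunk_size <= 0 or not 0 <= overlap < chunk_size:
        raise ValueError("need chunk_size > 0 and 0 <= overlap < chunk_size")
    step = chunk_size - overlap
    # Each chunk starts `step` characters after the previous one.
    return [text[i:i + chunk_size] for i in range(0, len(text), step)]


chunks = chunk_text("abcdefghij", chunk_size=4, overlap=1)
```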
Version Information
Current Version: 0.1.18
This version includes:
- Event Listener Support: Real-time notifications for file operations (share, delete, create)
- fsspec Integration: Standard Python filesystem interface for seamless file operations
- LangChain Integration: Native support for LangChain document loaders and vector stores
- Rebuilt altastata-hadoop-all.jar with latest improvements
- Enhanced error handling in delete_files operations
- Simplified _read_file method for better performance
- Updated AWS account configurations
- Improved memory management and garbage collection
- Comprehensive status tracking for cloud operations
Docker Support
The package is available as a multi-architecture Docker image that works natively on both AMD64 and ARM64 platforms:
# Pull multi-architecture image (automatically selects correct architecture)
docker pull ghcr.io/sergevil/altastata/jupyter-datascience:latest
# Or use docker-compose
docker-compose -f docker-compose-ghcr.yml up -d
Platform Support:
- Apple Silicon Macs: Native ARM64 performance
- Intel Macs: Native AMD64 performance
- GCP Confidential GKE: Native AMD64 performance
- Other platforms: Automatic architecture selection
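Docker normally selects the matching image variant automatically. If a script needs to know which variant applies to the current host, the mapping can be sketched as below. docker_platform is a hypothetical helper, and the table covers only the two architectures named above.

```python
import platform

# Common values reported by platform.machine() for each architecture.
_ARCH_TO_DOCKER = {
    "x86_64": "linux/amd64",
    "amd64": "linux/amd64",
    "arm64": "linux/arm64",
    "aarch64": "linux/arm64",
}


def docker_platform(machine=None):
    """Map a machine name to a Docker platform string, or None if unknown."""
    machine = (machine or platform.machine()).lower()
    return _ARCH_TO_DOCKER.get(machine)
```

The result can be passed to docker pull via its --platform option when a specific variant is wanted.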
Confidential Computing Deployment
Deploy Altastata in a secure, confidential computing environment on Google Cloud Platform:
# Navigate to confidential GKE setup
cd confidential-gke
# Deploy confidential cluster with AMD SEV security
./setup-cluster.sh
# Access Jupyter Lab at the provided URL
# Delete the cluster when not in use (saves costs)
gcloud container clusters delete altastata-confidential-cluster --zone=us-central1-a
Features:
- Hardware-level security with AMD SEV encryption
- Memory encryption during data processing
- Multi-cloud storage support (GCP, AWS, Azure)
- Cost optimization with easy stop/start commands
- Multi-architecture support for both AMD64 and ARM64 platforms
See confidential-gke/README.md for detailed setup instructions.
Recent Improvements
- Event Listener System: Real-time notifications for file share, delete, and create events via Py4J callbacks
- fsspec Integration: Standard Python filesystem interface for seamless file operations with any Python library
- LangChain Support: Native integration with LangChain document loaders and vector stores for RAG applications
- Multi-Architecture Support: Docker images now work natively on both AMD64 and ARM64 platforms
- Error Handling: Enhanced delete_files method with detailed error reporting
- Performance: Optimized file reading operations
- Compatibility: Updated AWS IAM configurations for better permission management
- Documentation: Consistent version numbering across all components
This project is licensed under the MIT License - see the LICENSE file for details.