Official Python SDK for Helix Connect Data Marketplace
Helix Connect Python SDK
Official Python SDK for Helix Connect Data Marketplace - a secure, scalable platform for exchanging datasets between producers and consumers.
Features
- Consumer API: Download and subscribe to datasets
- Producer API: Upload and manage datasets (includes all consumer features)
- Admin API: Platform management (includes all producer + consumer features)
- Secure: AWS SigV4 authentication + AES-256-GCM envelope encryption
- Efficient: Compress-then-encrypt pipeline with ~90% space savings on text formats such as JSON and CSV
- Progress Tracking: Real-time upload/download progress callbacks
- Notifications: SQS-based dataset update notifications with long-polling
- Type-Safe: Full type hints with mypy support
Installation
pip install helix-connect
Development Installation
git clone https://github.com/helix-tools/helix-connect-sdk-python.git
cd helix-connect-sdk-python
pip install -e ".[dev]"
Prerequisites
- Python 3.8 or higher
- AWS credentials (provided during customer onboarding)
- Helix Connect customer ID (UUID format)
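Because the customer ID is expected in UUID format, it can be worth checking the value locally before constructing a client. The helper below is a minimal sketch using only the standard library; the function name `is_valid_customer_id` is hypothetical and not part of the SDK, and the SDK may apply its own validation.

```python
import uuid


def is_valid_customer_id(value: str) -> bool:
    """Return True if value is a UUID in canonical hyphenated form."""
    try:
        parsed = uuid.UUID(value)
    except (ValueError, AttributeError, TypeError):
        return False
    # uuid.UUID also accepts un-hyphenated or braced input; require canonical text.
    return str(parsed) == value.lower()


print(is_valid_customer_id("a1b2c3d4-e5f6-7890-abcd-ef1234567890"))
print(is_valid_customer_id("your-customer-id"))
```

Rejecting a placeholder such as "your-customer-id" early gives a clearer error than a failed API call later.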
Quick Start
Consumer: Download Datasets
from helix_connect import HelixConsumer

# Initialize consumer
consumer = HelixConsumer(
    aws_access_key_id="your-access-key",
    aws_secret_access_key="your-secret-key",
    customer_id="your-customer-id",
    api_endpoint="https://api.helix-connect.com"  # optional
)

# List available datasets
datasets = consumer.list_datasets()
for ds in datasets:
    print(f"{ds['name']}: {ds['description']}")

# Download a dataset
consumer.download_dataset(
    dataset_id="a1b2c3d4-e5f6-7890-abcd-ef1234567890",
    output_path="./data/my_dataset.csv"
)

# Subscribe to dataset updates
consumer.subscribe_to_dataset(dataset_id="...")

# Poll for notifications (long-polling with auto-download)
notifications = consumer.poll_notifications(
    max_messages=10,
    wait_time=20,  # seconds
    auto_download=True,
    output_dir="./downloads"
)
Producer: Upload Datasets
from helix_connect import HelixProducer

# Initialize producer (inherits all consumer capabilities)
producer = HelixProducer(
    aws_access_key_id="your-access-key",
    aws_secret_access_key="your-secret-key",
    customer_id="your-customer-id"
)

# Upload a dataset with progress tracking
def progress_callback(bytes_transferred, total_bytes):
    # Guard against a zero total so an empty file cannot raise ZeroDivisionError
    percent = (bytes_transferred / total_bytes) * 100 if total_bytes else 0.0
    print(f"Progress: {percent:.1f}%")

producer.upload_dataset(
    file_path="./data/my_dataset.csv",
    dataset_name="my-awesome-dataset",
    description="Q4 2024 sales data",
    data_freshness="daily",
    progress_callback=progress_callback
)

# Update existing dataset
producer.update_dataset(
    dataset_id="...",
    file_path="./data/updated_dataset.csv"
)

# List your uploaded datasets
my_datasets = producer.list_my_datasets()
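A progress callback that prints on every invocation can flood the console during a large transfer. The sketch below builds a throttled callback with the same `(bytes_transferred, total_bytes)` signature used above. The factory name `make_progress_printer`, the `min_interval` parameter, and the injectable `clock` are illustrative assumptions, not part of the SDK; the returned message exists only to make the behaviour easy to check, and the SDK presumably ignores it.

```python
import time
from typing import Callable, Optional


def make_progress_printer(
    min_interval: float = 0.5,
    clock: Callable[[], float] = time.monotonic,
) -> Callable[[int, int], Optional[str]]:
    """Build a callback that reports at most once per min_interval seconds."""
    last_report = [float("-inf")]  # mutable cell shared with the closure

    def callback(bytes_transferred: int, total_bytes: int) -> Optional[str]:
        now = clock()
        finished = total_bytes > 0 and bytes_transferred >= total_bytes
        if not finished and now - last_report[0] < min_interval:
            return None  # too soon since the last report
        last_report[0] = now
        if total_bytes > 0:
            message = f"Progress: {bytes_transferred / total_bytes * 100:.1f}%"
        else:
            message = f"Progress: {bytes_transferred} bytes"
        print(message)
        return message

    return callback
```

The final update is always reported, so a transfer never appears to stall just short of completion.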
Admin: Platform Management
from helix_connect import HelixAdmin

# Initialize admin (inherits producer + consumer capabilities)
admin = HelixAdmin(
    aws_access_key_id="admin-access-key",
    aws_secret_access_key="admin-secret-key",
    customer_id="admin-customer-id"
)

# Create new customer
customer = admin.create_customer(
    customer_name="Acme Corp",
    contact_email="data@acme.com"
)

# List all customers
customers = admin.list_customers()

# Get platform statistics
stats = admin.get_platform_stats()
print(f"Total datasets: {stats['total_datasets']}")
print(f"Total customers: {stats['total_customers']}")
Admin: JWT Token Generation
Generate schema-compliant JWT tokens for testing, development, or service-to-service communication:
from helix_connect import HelixAdmin

admin = HelixAdmin(
    aws_access_key_id="admin-access-key",
    aws_secret_access_key="admin-secret-key",
    customer_id="admin-customer-id"
)

# Generate a user token
token = admin.generate_token(
    sub="user@example.com",
    customer_id="company-123",
    email="user@example.com",
    customer_type="consumer",  # "producer", "consumer", or "both"
    tier="starter",
)

# Generate an admin token (convenience method)
admin_token = admin.generate_admin_token(
    sub="admin@helix.tools",
    customer_id="company-admin",
    email="admin@helix.tools",
    customer_type="both",
)

# Token with custom expiry and all claims
token = admin.generate_token(
    sub="user@example.com",
    customer_id="company-123",
    email="user@example.com",
    customer_type="producer",
    role="user",
    tier="enterprise",
    login_method="oauth",
    expiry_minutes=120,
)
JWT Secret Resolution Order:
- Explicit secret argument
- HELIX_JWT_SECRET environment variable
- SSM Parameter Store (/{env}/customers/{customer_id}/jwt_secret)
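The resolution order above can be sketched as a simple fallback chain. This is an illustration of the lookup order only, not the SDK's implementation: `resolve_jwt_secret` and the `fetch_from_ssm` callable are hypothetical names, and the SSM lookup is injected as a callable so the example runs without AWS access.

```python
import os
from typing import Callable, Mapping, Optional


def resolve_jwt_secret(
    secret: Optional[str] = None,
    environ: Mapping[str, str] = os.environ,
    fetch_from_ssm: Optional[Callable[[], Optional[str]]] = None,
) -> str:
    """Return the first secret found: explicit argument, env var, then SSM."""
    if secret:
        return secret
    env_secret = environ.get("HELIX_JWT_SECRET")
    if env_secret:
        return env_secret
    if fetch_from_ssm is not None:
        # Hypothetical hook standing in for the SSM Parameter Store lookup.
        ssm_secret = fetch_from_ssm()
        if ssm_secret:
            return ssm_secret
    raise LookupError("no JWT secret found in argument, environment, or SSM")
```

Raising when every source is empty avoids signing tokens with an empty key by accident.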
Token Claims:
- Required: sub, customer_id, email, customer_type, role, iss, iat, exp
- Optional: tier, authenticated_at, login_method, nbf
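When inspecting a decoded token payload in tests, the claim lists above translate directly into a pair of checks. The sketch below works on a plain dictionary and does not decode or verify a JWT; the helper names `missing_claims` and `unknown_claims` are hypothetical and not part of the SDK.

```python
from typing import Any, List, Mapping

REQUIRED_CLAIMS = (
    "sub", "customer_id", "email", "customer_type", "role", "iss", "iat", "exp",
)
OPTIONAL_CLAIMS = ("tier", "authenticated_at", "login_method", "nbf")


def missing_claims(payload: Mapping[str, Any]) -> List[str]:
    """List required claims that are absent from the payload."""
    return [name for name in REQUIRED_CLAIMS if name not in payload]


def unknown_claims(payload: Mapping[str, Any]) -> List[str]:
    """List payload keys that are neither required nor optional claims."""
    known = set(REQUIRED_CLAIMS) | set(OPTIONAL_CLAIMS)
    return sorted(name for name in payload if name not in known)
```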
Architecture
Class Hierarchy
HelixConsumer (base class)
    ↓
HelixProducer (adds upload capabilities)
    ↓
HelixAdmin (adds platform management)
Each class inherits all capabilities from its parent, so:
- Producers can also consume data
- Admins can produce and consume data
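The inheritance chain can be pictured with three stand-in classes. These are deliberately not the real SDK classes: the names `Consumer`, `Producer`, `Admin` and the `capabilities` method are hypothetical, chosen only to show how each level extends its parent.

```python
from typing import List


class Consumer:
    """Stand-in for the base class: download and subscribe."""

    def capabilities(self) -> List[str]:
        return ["download", "subscribe"]


class Producer(Consumer):
    """Adds upload on top of everything a consumer can do."""

    def capabilities(self) -> List[str]:
        return super().capabilities() + ["upload"]


class Admin(Producer):
    """Adds platform management on top of producer and consumer features."""

    def capabilities(self) -> List[str]:
        return super().capabilities() + ["manage_platform"]


print(Admin().capabilities())
```

Code written against the consumer interface therefore accepts a producer or admin instance unchanged.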
Security & Encryption
The SDK implements a compress-then-encrypt pipeline with envelope encryption:
- Compression: Gzip compression (configurable levels 1-9)
- Envelope Encryption:
  - Generates a random 256-bit AES key
  - Encrypts data with AES-256-GCM
  - Encrypts the AES key with AWS KMS
  - Packages as: [key_len][encrypted_key][iv][tag][encrypted_data]
This approach:
- Supports files of any size: only the AES key is sent to KMS, so the KMS 4 KB limit on direct encryption does not apply
- Achieves ~90% space savings through compression
- Provides authenticated encryption with GCM
- Uses AWS KMS for secure key management
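To make the package layout `[key_len][encrypted_key][iv][tag][encrypted_data]` concrete, the sketch below packs and unpacks such a frame with the standard library. It performs no encryption, and the field widths are assumptions the source does not state: a 4-byte big-endian `key_len`, a 12-byte IV, and a 16-byte GCM tag. The SDK's actual wire format may differ.

```python
import struct
from typing import NamedTuple

KEY_LEN_FORMAT = ">I"  # assumption: 4-byte big-endian unsigned length
KEY_LEN_SIZE = struct.calcsize(KEY_LEN_FORMAT)
IV_SIZE = 12   # assumption: common AES-GCM nonce size
TAG_SIZE = 16  # assumption: full-length GCM authentication tag


class Envelope(NamedTuple):
    encrypted_key: bytes   # AES key as wrapped by KMS
    iv: bytes
    tag: bytes
    encrypted_data: bytes  # AES-256-GCM ciphertext of the compressed payload


def pack_envelope(envelope: Envelope) -> bytes:
    """Serialize the fields in the documented order."""
    if len(envelope.iv) != IV_SIZE or len(envelope.tag) != TAG_SIZE:
        raise ValueError("unexpected iv or tag length")
    header = struct.pack(KEY_LEN_FORMAT, len(envelope.encrypted_key))
    return header + envelope.encrypted_key + envelope.iv + envelope.tag + envelope.encrypted_data


def unpack_envelope(blob: bytes) -> Envelope:
    """Split a serialized frame back into its fields."""
    if len(blob) < KEY_LEN_SIZE:
        raise ValueError("blob too short for key length header")
    (key_len,) = struct.unpack_from(KEY_LEN_FORMAT, blob, 0)
    key_end = KEY_LEN_SIZE + key_len
    iv_end = key_end + IV_SIZE
    tag_end = iv_end + TAG_SIZE
    if len(blob) < tag_end:
        raise ValueError("blob truncated before end of tag")
    return Envelope(
        blob[KEY_LEN_SIZE:key_end],
        blob[key_end:iv_end],
        blob[iv_end:tag_end],
        blob[tag_end:],
    )
```

The length prefix is needed because the KMS-wrapped key is variable-length, while the IV and tag have fixed sizes.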
Network Configuration
- API Timeouts: 10s connect, 30s read (configurable)
- Download Timeouts: 10s connect, unlimited read (for large files)
- Credential Validation: Fail-fast with STS on initialization
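The two timeout profiles above map naturally onto `(connect, read)` pairs, which is the tuple form accepted by the `timeout` argument of the requests library, where a read value of `None` disables the read timeout. The constant and function names below are hypothetical; how the SDK stores or exposes these values is not documented here.

```python
from typing import Optional, Tuple

Timeout = Tuple[float, Optional[float]]

# (connect seconds, read seconds); None means no read timeout.
API_TIMEOUT: Timeout = (10.0, 30.0)
DOWNLOAD_TIMEOUT: Timeout = (10.0, None)


def timeout_for(is_download: bool) -> Timeout:
    """Pick the timeout pair for an API call or a large file download."""
    return DOWNLOAD_TIMEOUT if is_download else API_TIMEOUT


print(timeout_for(is_download=False))
print(timeout_for(is_download=True))
```

Keeping a connect timeout on downloads still lets an unreachable host fail quickly, while the unbounded read allows long transfers to finish.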
Examples
See the examples/ directory for comprehensive usage examples:
- consumer_example.py - Download, subscribe, poll notifications
- producer_example.py - Upload, update, manage datasets
- admin_example.py - Platform management (internal use)
Testing
# Run all tests
pytest
# Run with coverage
pytest --cov=helix_connect --cov-report=html
# Run specific test suite
pytest tests/test_encryption_compression.py -v
# Run standalone pipeline test
python tests/test_pipeline_standalone.py
Test Results
The SDK includes comprehensive tests for the encryption/compression pipeline:
✓ test_compress_data - 90.9% compression on JSON data
✓ test_envelope_encryption_decryption - AES-256-GCM envelope format
✓ test_full_pipeline_compress_then_encrypt - End-to-end verification
✓ test_wrong_order_encrypt_then_compress - Proves old order was broken
✓ 10 tests total, all passing
Configuration
Environment Variables
# Required
export AWS_ACCESS_KEY_ID="your-access-key"
export AWS_SECRET_ACCESS_KEY="your-secret-key"
export HELIX_CUSTOMER_ID="your-customer-id"
# Optional
export HELIX_API_ENDPOINT="https://api-go.helix.tools"
export HELIX_COMPRESSION_LEVEL="6" # 1-9, default: 6
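If an application reads these variables itself before constructing a client, a small loader can validate them in one place. This is a sketch under assumptions: `HelixSettings` and `load_settings` are hypothetical names, and whether the SDK reads these variables automatically is not stated here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class HelixSettings:
    aws_access_key_id: str
    aws_secret_access_key: str
    customer_id: str
    api_endpoint: Optional[str]
    compression_level: int


def load_settings(environ: Mapping[str, str] = os.environ) -> HelixSettings:
    """Read the documented variables, failing early on missing or bad values."""
    required = ("AWS_ACCESS_KEY_ID", "AWS_SECRET_ACCESS_KEY", "HELIX_CUSTOMER_ID")
    missing = [name for name in required if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    level = int(environ.get("HELIX_COMPRESSION_LEVEL", "6"))  # documented default
    if not 1 <= level <= 9:
        raise ValueError("HELIX_COMPRESSION_LEVEL must be between 1 and 9")
    return HelixSettings(
        aws_access_key_id=environ["AWS_ACCESS_KEY_ID"],
        aws_secret_access_key=environ["AWS_SECRET_ACCESS_KEY"],
        customer_id=environ["HELIX_CUSTOMER_ID"],
        api_endpoint=environ.get("HELIX_API_ENDPOINT"),
        compression_level=level,
    )
```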
Programmatic Configuration
consumer = HelixConsumer(
    aws_access_key_id="...",
    aws_secret_access_key="...",
    customer_id="...",
    api_endpoint="https://api-go.helix.tools",
    region="us-east-1",
    compression_level=6  # 1=fastest, 9=best compression
)
Security Best Practices
- Never commit credentials to version control
- Use environment variables or AWS Secrets Manager
- Rotate credentials regularly
- Use IAM roles when running on AWS infrastructure
- Validate data integrity after downloads
- Monitor CloudWatch logs for anomalies
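One way to validate integrity after a download is to compare a SHA-256 digest of the file against a value obtained from the producer. The sketch below assumes the expected digest is shared out of band; the source does not say how checksums are exchanged, and the helper names `sha256_of_file` and `matches_expected` are hypothetical.

```python
import hashlib
import os
import tempfile


def sha256_of_file(path: str, chunk_size: int = 1024 * 1024) -> str:
    """Hash a file in fixed-size chunks so large downloads never sit fully in memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path: str, expected_hex: str) -> bool:
    """Compare the file's digest with an expected hex string, ignoring case."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# Demo on a throwaway file standing in for a downloaded dataset.
with tempfile.NamedTemporaryFile(delete=False) as tmp:
    tmp.write(b"example dataset contents")
    demo_path = tmp.name
try:
    expected = hashlib.sha256(b"example dataset contents").hexdigest()
    print(matches_expected(demo_path, expected))
finally:
    os.remove(demo_path)
```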
Error Handling
The SDK provides specific exceptions for different error scenarios:
from helix_connect.exceptions import (
    AuthenticationError,
    PermissionDeniedError,
    DatasetNotFoundError,
    RateLimitError,
    UploadError,
    DownloadError,
    HelixError  # Base exception
)

try:
    consumer.download_dataset(dataset_id="...", output_path="...")
except AuthenticationError:
    print("Invalid AWS credentials")
except PermissionDeniedError:
    print("No access to this dataset - subscribe first")
except DatasetNotFoundError:
    print("Dataset doesn't exist")
except RateLimitError as e:
    print(f"Rate limit exceeded - retry after {e.retry_after}s")
except HelixError as e:
    print(f"General error: {e}")
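Since `RateLimitError` carries a `retry_after` value, a caller can wait and retry instead of only printing a message. The sketch below defines a local stand-in exception so it runs without the SDK installed; in real code the exception would be imported from `helix_connect.exceptions`. The wrapper name `call_with_retry` and its parameters are hypothetical.

```python
import time
from typing import Callable, TypeVar

T = TypeVar("T")


class RateLimitError(Exception):
    """Local stand-in mirroring the retry_after attribute shown above."""

    def __init__(self, retry_after: float):
        super().__init__(f"rate limited, retry after {retry_after}s")
        self.retry_after = retry_after


def call_with_retry(
    operation: Callable[[], T],
    max_attempts: int = 3,
    sleep: Callable[[float], None] = time.sleep,
) -> T:
    """Run operation, sleeping for retry_after between rate-limited attempts."""
    for attempt in range(1, max_attempts + 1):
        try:
            return operation()
        except RateLimitError as error:
            if attempt == max_attempts:
                raise  # out of attempts: surface the last error
            sleep(error.retry_after)
    raise ValueError("max_attempts must be at least 1")
```

A call such as `call_with_retry(lambda: consumer.download_dataset(dataset_id="...", output_path="..."))` would then honour the server's hint, while other exceptions propagate unchanged.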
Performance
Compression Benchmarks
Based on real-world testing with JSON, CSV, and XML data:
| Data Type | Original Size | Compressed | Savings |
|---|---|---|---|
| JSON (user data) | 92 KB | 8 KB | 90.9% |
| CSV (sales data) | 150 KB | 18 KB | 88.0% |
| XML (config) | 45 KB | 6 KB | 86.7% |
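Note: Encrypting before compressing (the order used by the old, broken code) resulted in ~0% compression, because ciphertext is effectively random and has no patterns for gzip to exploit.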
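The effect of ordering can be reproduced with the standard library alone. In the sketch below, random bytes from `os.urandom` stand in for ciphertext, which is an approximation chosen so that no crypto dependency is needed; the sample records are made up for illustration and are not the data behind the benchmark table.

```python
import gzip
import json
import os

# Repetitive, text-like payload similar in spirit to a JSON dataset.
records = [{"id": i, "name": f"user-{i}", "active": i % 2 == 0} for i in range(2000)]
plaintext = json.dumps(records).encode("utf-8")

# Correct order: compress the plaintext (encryption would follow).
compressed_plaintext = gzip.compress(plaintext, compresslevel=6)

# Wrong order: compress data that already looks random, like ciphertext.
pseudo_ciphertext = os.urandom(len(plaintext))
compressed_ciphertext = gzip.compress(pseudo_ciphertext, compresslevel=6)


def savings(original: bytes, compressed: bytes) -> float:
    """Fraction of space saved; negative when compression made things larger."""
    return 1 - len(compressed) / len(original)


print(f"compress first: {savings(plaintext, compressed_plaintext):.1%} saved")
print(f"encrypt first:  {savings(pseudo_ciphertext, compressed_ciphertext):.1%} saved")
```

Exact percentages depend on the data, but the random input typically comes out slightly larger than it went in.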
Network Performance
- Chunked uploads: 8MB chunks for large files
- Parallel downloads: Multi-threaded for multiple datasets
- Progress callbacks: Real-time feedback without performance impact
- Connection pooling: Reuses HTTP connections for efficiency
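Chunked transfer amounts to reading a file in fixed-size pieces rather than loading it whole. The generator below is a minimal sketch of that idea using the 8 MB chunk size mentioned above; `iter_chunks` is a hypothetical name, and how the SDK actually splits and sends chunks is not shown in this document.

```python
import io
from typing import BinaryIO, Iterator

CHUNK_SIZE = 8 * 1024 * 1024  # 8 MB, as listed above


def iter_chunks(stream: BinaryIO, chunk_size: int = CHUNK_SIZE) -> Iterator[bytes]:
    """Yield successive chunks from a binary stream until it is exhausted."""
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    while True:
        chunk = stream.read(chunk_size)
        if not chunk:
            break
        yield chunk


# Small demo with an in-memory stream and a tiny chunk size.
for piece in iter_chunks(io.BytesIO(b"abcdefghij"), chunk_size=4):
    print(piece)
```

Memory use stays bounded by the chunk size regardless of file size, and each yielded chunk is a natural point to invoke a progress callback.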
Development
Build & Validate
# Build package
python -m build
# Run build script (includes validation)
./scripts/build.sh
# Lint code
flake8 helix_connect/
black helix_connect/
mypy helix_connect/
Project Structure
helix-connect-sdk-python/
├── helix_connect/           # SDK source code
│   ├── __init__.py          # Package exports
│   ├── consumer.py          # Consumer API
│   ├── producer.py          # Producer API
│   ├── admin.py             # Admin API
│   └── exceptions.py        # Custom exceptions
├── tests/                   # Test suite
│   ├── test_encryption_compression.py
│   └── test_pipeline_standalone.py
├── examples/                # Usage examples
│   ├── consumer_example.py
│   ├── producer_example.py
│   └── admin_example.py
├── scripts/                 # Build scripts
│   └── build.sh
├── pyproject.toml           # Package configuration
└── README.md                # This file
Contributing
We welcome contributions! Please:
- Fork the repository
- Create a feature branch (git checkout -b feature/amazing-feature)
- Run tests (pytest)
- Commit changes (git commit -m 'Add amazing feature')
- Push to branch (git push origin feature/amazing-feature)
- Open a Pull Request
Code Standards
- Style: Follow PEP 8 (enforced by black)
- Types: Include type hints for all functions
- Tests: Maintain >80% coverage
- Docs: Update docstrings for public APIs
License
This project is licensed under the MIT License - see the LICENSE file for details.
Links
- Homepage: https://helix-connect.com
- Documentation: https://docs.helix-connect.com
- GitHub: https://github.com/helix-tools/helix-connect-sdk-python
- PyPI: https://pypi.org/project/helix-connect/
- Support: contact@helix.tools
Changelog
v1.0.0 (2024-10-14)
Features
- Initial release with Consumer, Producer, and Admin APIs
- AES-256-GCM envelope encryption for unlimited file sizes
- Compress-then-encrypt pipeline with ~90% space savings
- Real-time progress tracking for uploads/downloads
- SQS-based dataset update notifications
- Long-polling support with auto-download
- Comprehensive test suite (10 tests, all passing)
Improvements
- Network timeouts (API: 30s, Downloads: unlimited)
- Credential validation on initialization (fail-fast)
- Proper exception handling throughout
- Type hints for all public APIs
Bug Fixes
- Fixed KMS 4KB limit with envelope encryption
- Fixed compress-then-encrypt order (was reversed)
- Removed all emojis (encoding issues)
- Fixed bare except clauses
Support
For questions, issues, or feature requests:
- GitHub Issues: https://github.com/helix-tools/helix-connect-sdk-python/issues
- Email: contact@helix.tools
- Documentation: https://docs.helix-connect.com
Acknowledgments
Built with:
- boto3 - AWS SDK for Python
- cryptography - Cryptographic recipes and primitives
- requests - HTTP library
Made with ❤️ by the Helix Tools team