Skip to main content

A Python package for Altastata data processing and machine learning integration

Project description

Altastata Python Package

Secure, encrypted cloud storage for Python — with fsspec, PyTorch/TensorFlow, LangChain, Databricks, Snowflake, boto3/S3, gRPC, and a bundled Web UI (AltaStata Console).

pip install altastata

What you get

Capability How
Encrypted files in S3, Azure, IBM COS, etc. AltaStataFunctions
Standard Python file APIs fsspec (create_filesystem)
ML datasets AltaStataPyTorchDataset, AltaStataTensorFlowDataset
LangChain / RAG document loading fsspec + DirectoryLoader / TextLoader
Databricks / Apache Spark AltaStata Hadoop FS (altastata-hadoop JAR)
Snowflake external stages S3 Gateway (port 9876) or Snowpark Python + fsspec
S3 tools (boto3, aws CLI, s3fs) S3-compatible API on port 9876
gRPC API (Python transport="grpc", JS clients) altastata-services JVM on port 9877
Real-time share/delete events gRPC EventsService, Py4J callback, or Web UI
Web UI — Finder-style file manager in the browser http://127.0.0.1:9877 (same JVM as gRPC)

Configure your account

Two equivalent ways to connect from Python:

1. Account folder on disk (typical)

Each user keeps a directory under ~/.altastata/accounts/<display-name>/:

amazon.rsa.bob123/
  altastata-myorg-bob123.user.properties   # from your admin
  private.key                              # RSA (password-encrypted PEM)
  public.key
Account type Key files Password
RSA private.key, public.key Yes
PQC kyber_private.key, dilithium_private.key, … Yes
HPCS hpcs-privkey.blob, public.key, hpcs.marker No
HSM *user.properties only No
from altastata import AltaStataFunctions

f = AltaStataFunctions.from_account_dir(
    "/path/to/.altastata/accounts/amazon.rsa.bob123",
    transport="grpc",
    password="your_password",
)

2. Inline credentials (user_properties + private_key)

Pass the same text you would have in files — useful for notebooks, secrets managers, or CI:

from altastata import AltaStataFunctions

user_properties = """#My Properties
#Sun Jan 05 12:10:23 EST 2025
AWSSecretKey=*****
AWSAccessKeyId=*****
myuser=bob123
accounttype=amazon-s3-secure
acccontainer-prefix=altastata-myorg-
region=us-east-1
metadata-encryption=RSA"""

private_key = """-----BEGIN RSA PRIVATE KEY-----
Proc-Type: 4,ENCRYPTED
DEK-Info: DES-EDE3,F26EBECE6DDAEC52

... encrypted PEM body ...
-----END RSA PRIVATE KEY-----"""

altastata_functions = AltaStataFunctions.from_credentials(user_properties, private_key)
altastata_functions.set_password("my_password")

# Or with gRPC transport:
altastata_functions = AltaStataFunctions.from_credentials(
    user_properties,
    private_key,
    transport="grpc",
    password="my_password",
)

Your org admin creates *user.properties after you send them public.key (RSA/PQC/HPCS).


Quick start (gRPC — recommended)

transport="grpc" auto-starts the bundled Java gateway (Web UI + gRPC + S3).

from altastata import AltaStataFunctions

# RSA / PQC
f = AltaStataFunctions.from_account_dir(
    "/path/to/.altastata/accounts/amazon.rsa.bob123",
    transport="grpc",
    password="your_password",
)

# HPCS / HSM — empty password
f = AltaStataFunctions.from_account_dir(
    "/path/to/.altastata/accounts/amazon.rsa.hpcs.myuser",
    transport="grpc",
    password="",
)

print(f.list_cloud_versions("Public/", True))

Ports

One bundled Java process (altastata-grpc-server / altastata-services) listens on:

Port Service
9877 gRPC (file ops, auth, events) + Web UI static files
9876 S3-compatible REST API
25333 Py4J (legacy in-process bridge to Python)

HPCS in Docker / Jupyter

Mount a populated grep11client.yaml (e.g. /etc/ep11client/grep11client.yaml) and hpcs-privkey.blob. See containers/jupyter/README-Docker.md.


Legacy Py4J transport (default)

from altastata import AltaStataFunctions

f = AltaStataFunctions.from_account_dir("/path/to/account")
f.set_password("your_password")

fsspec

from altastata import AltaStataFunctions
from altastata.fsspec import create_filesystem

f = AltaStataFunctions.from_account_dir("/path/to/account", transport="grpc", password="secret")
fs = create_filesystem(f, "my_account")

with fs.open("Public/readme.txt", "r") as fh:
    print(fh.read())

Works with pandas, dask, and other fsspec consumers.


LangChain, Databricks, Snowflake

LangChain / RAG

Load encrypted documents without copying them to local disk:

from altastata import AltaStataFunctions
from altastata.fsspec import create_filesystem
from langchain_core.documents import Document

f = AltaStataFunctions.from_account_dir("/path/to/account", transport="grpc", password="secret")
fs = create_filesystem(f, "my_account")

with fs.open("Public/docs/policy.txt", "r") as fh:
    docs = [Document(page_content=fh.read(), metadata={"source": "Public/docs/policy.txt"})]

TextLoader, DirectoryLoader, and other LangChain loaders work via the altastata:// fsspec protocol once the filesystem is registered — see examples/fsspec-example/ and full RAG pipelines in examples/rag-example/.

Databricks / Apache Spark

Use the AltaStata Hadoop filesystem implementation so Spark jobs read encrypted paths on cluster storage (altastata://… or configured Hadoop URI). Deploy the altastata-hadoop shadow JAR on Databricks / Spark clusters.

Snowflake

  • External stage via S3: point Snowflake at the bundled S3 Gateway (http://host:9876) as an S3-compatible endpoint for encrypted objects in your backing bucket.
  • Snowpark Python: use fsspec / create_filesystem in Snowpark notebooks to read AltaStata paths with the same account credentials.

S3-compatible API (boto3, aws CLI, s3fs)

f = AltaStataFunctions.from_account_dir("/path/to/account", transport="grpc", password="secret")

s3 = f.boto3_s3()   # pip install boto3
s3.put_object(Bucket="altastata-bucket", Key="hello.txt", Body=b"hi")

f.install_aws_env()   # AWS_* for !aws s3 ls in Jupyter

PyTorch & TensorFlow

from altastata import AltaStataFunctions, AltaStataPyTorchDataset
from altastata.altastata_pytorch_dataset import register_altastata_functions_for_pytorch

f = AltaStataFunctions.from_account_dir("/path/to/account", transport="grpc", password="secret")
register_altastata_functions_for_pytorch(f, "my_account")
dataset = AltaStataPyTorchDataset("my_account", root_dir="Public/", file_pattern="*.jpg")

See examples/pytorch-example/ and examples/tensorflow-example/.


Event notifications

def on_event(name, data):
    print(name, data)

f = AltaStataFunctions.from_account_dir("/path/to/account", enable_callback_server=True)
f.set_password("secret")
f.add_event_listener(on_event)

With gRPC / Web UI, SHARE and DELETE events also appear in the browser and via EventsService.Watch.

See examples/event-listener-example/.


Docker Jupyter (optional)

cd containers/jupyter
docker compose -f docker-compose.yml -f docker-compose-ghcr.yml up -d

Images: ghcr.io/sergevil/altastata/jupyter-datascience-{arm64,amd64}:latest


Web UI (AltaStata Console)

The wheel ships a browser file manager. Start the gateway:

altastata-grpc-server
# same as: python -m altastata.grpc_server

Open http://127.0.0.1:9877 — Miller-column browser, upload/download, share, generate keys, and live refresh on SHARE/DELETE events.

Sign in: Settings → Choose account folderSign in

Account type Password in Settings
RSA / PQC Your account password
HPCS / HSM Leave blank

Set ALTASTATA_WEB_UI_DIR= (empty) to disable the UI and run gRPC-only.


More documentation

Questions?

Email contact@altastata.com.

License

MIT License — see LICENSE.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

altastata-0.1.45.tar.gz (137.8 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

altastata-0.1.45-py3-none-any.whl (137.8 MB view details)

Uploaded Python 3

File details

Details for the file altastata-0.1.45.tar.gz.

File metadata

  • Download URL: altastata-0.1.45.tar.gz
  • Upload date:
  • Size: 137.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for altastata-0.1.45.tar.gz
Algorithm Hash digest
SHA256 4829f3fcbb3de7c1933b7c82b8c7d2822c3c892a91162cecf548aa115b8af4b6
MD5 30b9cc8e3a5fe0dae5a572cee1e6c34e
BLAKE2b-256 4f6e2c1ef83073027c4c1814019bf3991436d3f844c0e5d51c19dbe0300142d7

See more details on using hashes here.

File details

Details for the file altastata-0.1.45-py3-none-any.whl.

File metadata

  • Download URL: altastata-0.1.45-py3-none-any.whl
  • Upload date:
  • Size: 137.8 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.13.2

File hashes

Hashes for altastata-0.1.45-py3-none-any.whl
Algorithm Hash digest
SHA256 9855334a5c1f221039dfced2eb653deaf48a381e2b06abeb5efd90bf2388d4a1
MD5 9ac5eaaf01bab0039d6540c08409136c
BLAKE2b-256 d428676849a98a6ae49dd51f5ee7d7b9454dd6ddf4fa30c66a0a7927acbcd4b4

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page