A Python package for running Spark code on different AWS compute services

These details have not been verified by PyPI

Project description

SageMakerStudioDataEngineeringSessions

SageMaker Unified Studio Data Engineering Sessions

This package depends on the SageMaker Unified Studio environment. If you are using SageMaker Unified Studio, see the AWS documentation for guidance.

This package contains functionality that lets SageMaker Unified Studio connect to various AWS compute services, including EMR, EMR Serverless, Glue, and Redshift.

It uses IPython magics and AWS DataZone Connections to provide the following features.

Features

Connect to remote compute
Execute Spark code in remote compute in Python/Scala
Execute SQL queries in remote compute
Send local variables to remote compute

How to set up

If you are using SageMaker Unified Studio, you can skip this part; SageMaker Unified Studio already sets up the package.

This package contains various Jupyter Magics to achieve its functionality.

To load these magics, make sure an IPython config file has been generated. If not, run ipython profile create; this should generate a file at ~/.ipython/profile_default/ipython_config.py.

Then add the following line at the end of that config file:

c.InteractiveShellApp.extensions.extend(['sagemaker_studio_dataengineering_sessions.sagemaker_connection_magic'])

Once that is done, restart the IPython kernel and run %help to see a list of supported magics.
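
The config edit above can also be scripted. The following is a minimal sketch, not part of this package: the helper name ensure_extension is hypothetical, and it assumes the config file already exists (for example after running ipython profile create). It appends the extension line only when the module name is not already present, so running it twice is harmless.

```python
from pathlib import Path

# Extension module name taken from the setup instructions above.
EXTENSION = "sagemaker_studio_dataengineering_sessions.sagemaker_connection_magic"


def ensure_extension(config_path: Path, extension: str = EXTENSION) -> bool:
    """Append the extension line to an existing IPython config file.

    Hypothetical helper. Returns True if the file was modified and
    False if the extension name was already present in the file.
    Raises FileNotFoundError if the config file has not been generated.
    """
    text = config_path.read_text()
    if extension in text:
        return False
    # Make sure the appended statement starts on its own line.
    if text and not text.endswith("\n"):
        text += "\n"
    line = f"c.InteractiveShellApp.extensions.extend(['{extension}'])"
    config_path.write_text(text + line + "\n")
    return True
```

A typical call would be ensure_extension(Path.home() / ".ipython" / "profile_default" / "ipython_config.py"), followed by a kernel restart.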

Interactive vs background session

This package uses the SM_INPUT_NOTEBOOK_NAME environment variable to determine whether execution is through an interactive or a background session. See the sagemaker_studio_dataengineering_sessions/sagemaker_database_session_manager/redshift/redshift_session.py file for usage.
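
As an illustration of this kind of check, here is a minimal sketch. It is not the package's implementation: the function name is hypothetical, and it assumes that a non-empty SM_INPUT_NOTEBOOK_NAME indicates a background session, which the text above does not state. Consult redshift_session.py for the actual logic.

```python
import os
from typing import Mapping, Optional

# Variable name taken from the text above.
ENV_VAR = "SM_INPUT_NOTEBOOK_NAME"


def is_background_session(env: Optional[Mapping[str, str]] = None) -> bool:
    """Guess whether the current execution is a background session.

    Assumption (not confirmed by the source): the variable is set to a
    non-empty value for background runs and is absent or empty otherwise.
    """
    env = os.environ if env is None else env
    return bool(env.get(ENV_VAR))
```

Passing the environment as a parameter keeps the check easy to exercise without modifying os.environ.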

Examples

To connect to remote compute, a DataZone Connection is required. You can create one via the CreateConnection API. The examples below assume an existing connection called project.spark.

Supported Connection Type:

IAM
SPARK
REDSHIFT
ATHENA

Connect to remote compute and Execute Spark Code in Python

The following example connects to an AWS Glue interactive session through the project.spark connection and runs the Spark code in Glue.

%%pyspark project.spark

import sys
import boto3
from awsglue.utils import getResolvedOptions
from pyspark.context import SparkContext
from pyspark.sql import SparkSession

# Resolve the named arguments from the command line
args = getResolvedOptions(sys.argv, ["redshift_url", "redshift_iam_role", "redshift_tempdir", "redshift_jdbc_iam_url"])
print(args)

# Reuse the existing Spark context, or create one if none exists
sc = SparkContext.getOrCreate()
spark = SparkSession(sc)

# Load the sample dataset from the example bucket in the current region
df = spark.read.csv(f"s3://sagemaker-example-files-prod-{boto3.session.Session().region_name}/datasets/tabular/dirty-titanic/", header=True)
df.show(5, truncate=False)
df.printSchema()

# Register the DataFrame as a temp view so it can be queried with SQL
df.createOrReplaceTempView("df_sql_tempview")

Execute Spark Code in Scala

The following example connects to an AWS Glue interactive session and runs Spark code written in Scala.

%%scalaspark project.spark
val dfScala = spark.sql("SELECT count(0) FROM df_sql_tempview")
dfScala.show()

Execute SQL query in remote compute

The following example uses the project.redshift connection to run a SQL query in the remote compute.

%%sql project.redshift
select current_user

Some other helpful magics

%help - list available magics and related information

%send_to_remote - send local variable to remote compute

%%configure - configure spark application config in remote compute

Project details

Release history

Version	Release date
1.3.21	May 19, 2026
1.3.20	May 5, 2026
1.3.19	Apr 22, 2026
1.3.18	Apr 1, 2026
1.3.17	Mar 30, 2026
1.3.16 (this version)	Mar 25, 2026
1.3.14	Mar 17, 2026
1.3.13	Feb 16, 2026
1.3.12	Jan 12, 2026
1.3.11	Dec 15, 2025
1.3.9	Nov 26, 2025
1.3.7	Nov 3, 2025
1.3.6	Nov 2, 2025
1.3.5	Oct 29, 2025
1.3.4	Oct 20, 2025
1.2.6	Oct 17, 2025
1.2.5	Oct 1, 2025
1.2.4	Aug 16, 2025
1.2.3	Aug 16, 2025
1.2.2	Aug 13, 2025
1.2.1	Jul 26, 2025
1.2.0	Jul 23, 2025
1.1.8	Oct 17, 2025
1.1.7	Sep 12, 2025
1.1.5	Jul 26, 2025
1.1.4	Jul 17, 2025
1.1.2	Jun 26, 2025
1.1.1	Jun 23, 2025
1.1.0	Jun 23, 2025
1.0.13	May 30, 2025
1.0.12	May 19, 2025
1.0.11	May 10, 2025
1.0.10	Apr 18, 2025
1.0.9	Apr 16, 2025
1.0.7	Apr 12, 2025
1.0.6	Mar 6, 2025
1.0.4	Feb 22, 2025
1.0.3	Feb 20, 2025
1.0.2	Feb 19, 2025
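
To see which of these releases is installed in a given environment, the standard library can be queried. This is a minimal sketch: the helper name is hypothetical, and it assumes the distribution name matches the project name shown on this page.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional

# Distribution name as shown on this page (assumed to match the installed metadata).
DIST_NAME = "sagemaker-studio-dataengineering-sessions"


def installed_version(dist_name: str = DIST_NAME) -> Optional[str]:
    """Return the installed version string, or None if the package is not installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None
```

Comparing the result against the release history shows whether the environment is on the release described here.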

Download files

Source Distribution

sagemaker_studio_dataengineering_sessions-1.3.16.tar.gz (343.3 kB)

Uploaded Mar 25, 2026 Source

Built Distribution

sagemaker_studio_dataengineering_sessions-1.3.16-py3-none-any.whl (410.9 kB)

Uploaded Mar 25, 2026 Python 3

File details

Details for the file sagemaker_studio_dataengineering_sessions-1.3.16.tar.gz.

File metadata

Download URL: sagemaker_studio_dataengineering_sessions-1.3.16.tar.gz
Upload date: Mar 25, 2026
Size: 343.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for sagemaker_studio_dataengineering_sessions-1.3.16.tar.gz
Algorithm	Hash digest
SHA256	`c5dc8ee5e7969a57ebcbfbbbf1739a55fc853f400ed5e19b8358f0daa0cec6cc`
MD5	`f8547a2016809405aaad0750e63a0a09`
BLAKE2b-256	`937df8d3afcf0a807075a6caf1ecfb3b232a468404aa89bf3c1a24321f628bf1`

File details

Details for the file sagemaker_studio_dataengineering_sessions-1.3.16-py3-none-any.whl.

File metadata

Download URL: sagemaker_studio_dataengineering_sessions-1.3.16-py3-none-any.whl
Upload date: Mar 25, 2026
Size: 410.9 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.10.12

File hashes

Hashes for sagemaker_studio_dataengineering_sessions-1.3.16-py3-none-any.whl
Algorithm	Hash digest
SHA256	`68b4ba3df8350d524d38c7fd53614259476fbcb936858028e7a2d395bdbc4955`
MD5	`a34a05f2265658c78e5eb7c0fc84feb6`
BLAKE2b-256	`d1bfeb00d333307ccba89e534a0b5ac01edb4c12ebc05cd3efae9b400e2db113`
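
The published digests can be used to check a downloaded file before installing it. The following is a minimal sketch using only the standard library. The function names are hypothetical, and the expected values are the SHA256 digests copied from the tables above.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the "File hashes" tables above.
EXPECTED_SHA256 = {
    "sagemaker_studio_dataengineering_sessions-1.3.16.tar.gz":
        "c5dc8ee5e7969a57ebcbfbbbf1739a55fc853f400ed5e19b8358f0daa0cec6cc",
    "sagemaker_studio_dataengineering_sessions-1.3.16-py3-none-any.whl":
        "68b4ba3df8350d524d38c7fd53614259476fbcb936858028e7a2d395bdbc4955",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path: Path) -> bool:
    """Return True only if the file name is known and its digest matches."""
    expected = EXPECTED_SHA256.get(path.name)
    return expected is not None and sha256_of(path) == expected
```

For example, matches_published_hash(Path("sagemaker_studio_dataengineering_sessions-1.3.16.tar.gz")) should return True for an intact download of the source distribution.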

sagemaker-studio-dataengineering-sessions 1.3.16
