SageMaker Studio Analytics Extension
This is a notebook extension provided by the AWS SageMaker Studio team to integrate with analytics resources. Currently, it supports connecting a SageMaker Studio notebook to a Spark (EMR) cluster through the SparkMagic library.
Usage
Before you use the magic command to connect a Studio notebook to EMR, make sure that SageMaker Studio has network connectivity to the Spark cluster (the Livy service). Refer to this AWS blog for how to set up SageMaker Studio and an EMR cluster.
Register the magic command:
%load_ext sagemaker_studio_analytics_extension.magics
Show help content:
Docstring:
::
  %sm_analytics [--auth-type AUTH_TYPE] [--cluster-id CLUSTER_ID]
                    [--language LANGUAGE]
                    [--assumable-role-arn ASSUMABLE_ROLE_ARN]
                    [--emr-execution-role-arn EMR_EXECUTION_ROLE_ARN]
                    [--secret SECRET]
                    [--verify-certificate VERIFY_CERTIFICATE]
                    [command [command ...]]

positional arguments:
  command               Command to execute. The command consists of a service
                        name followed by a ' ' followed by an operation.
                        Supported services are ['emr'] and supported
                        operations are ['connect']. For example a valid
                        command is 'emr connect'.

optional arguments:
  --auth-type AUTH_TYPE
                        The authentication type to be used. Supported
                        authentication types are {'Basic_Access', 'Kerberos',
                        'None'}.
  --cluster-id CLUSTER_ID
                        The cluster id to connect to.
  --language LANGUAGE   Language to use. The supported languages for IPython
                        kernel(s) are {'python', 'scala'}. This is a required
                        argument for IPython kernels, but not for magic
                        kernels such as PySpark or SparkScala.
  --assumable-role-arn ASSUMABLE_ROLE_ARN
                        The IAM role to assume when connecting to a cluster in
                        a different AWS account. This argument is not required
                        when connecting to a cluster in the same AWS account.
  --emr-execution-role-arn EMR_EXECUTION_ROLE_ARN
                        The IAM role passed to EMR to set up EMR job security
                        context. This argument is optional and used when IAM
                        Passthrough feature is enabled for EMR.
  --secret SECRET       The AWS Secrets Manager SecretID.
  --verify-certificate VERIFY_CERTIFICATE
                        Determine if SSL certificate should be verified when
                        using HTTPS to connect to EMR. Supported values are
                        ['True', 'False', 'PathToCert']. If a path-to-cert-file
                        is provided, the certificate verification will be
                        done with the certificate in the provided file
                        path. Note that the default
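The three accepted shapes of --verify-certificate ('True', 'False', or a path to a certificate file) can be illustrated with a small Python sketch. The function name parse_verify_certificate is hypothetical and is not part of the extension. The sketch assumes that matching is case-insensitive and that the result is either a boolean or a path string, which is the shape the `verify` parameter of the `requests` library accepts. Whether the extension handles the value this way internally is an assumption.

```python
def parse_verify_certificate(value):
    """Map a --verify-certificate string to True, False, or a file path.

    Hypothetical helper for illustration only; not the extension's code.
    """
    cleaned = value.strip()
    lowered = cleaned.lower()  # assumption: 'true'/'TRUE' are treated like 'True'
    if lowered == "true":
        return True   # verify against the default trust store
    if lowered == "false":
        return False  # skip certificate verification
    return cleaned    # anything else is treated as a path to a certificate file


print(parse_verify_certificate("True"))
print(parse_verify_certificate("/path/to/cert.pem"))
```

The path "/path/to/cert.pem" is a placeholder, not a location the extension expects.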
Examples
- Connect Studio notebook using IPython Kernel to EMR cluster protected by Kerberos.
%sm_analytics emr connect --cluster-id j-1JIIZS02SEVCS --auth-type Kerberos --language python
- Connect Studio notebook using IPython Kernel to an EMR cluster protected by HTTP Basic Auth and create a Scala-based session.
%sm_analytics emr connect --cluster-id j-1KHIOQZAQUF5P --auth-type Basic_Access --language scala
- Connect Studio notebook using IPython Kernel to EMR cluster directly without Livy authentication.
%sm_analytics emr connect --cluster-id j-1KHIOQZAQUF5P --auth-type None --language python
- Connect Studio notebook using PySpark or Spark (Scala) Kernel to an EMR cluster protected by HTTP Basic Auth. The --language argument is omitted because it is not required for these kernels.
%sm_analytics emr connect --cluster-id j-1KHIOQZAQUF5P --auth-type Basic_Access
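The examples above all follow the same pattern, so the argument string can also be assembled programmatically. The following is a minimal Python sketch, not part of the extension: build_connect_args and the two constant sets are hypothetical names, and the allowed values are copied from the help text above. It only builds and validates the string; it does not contact EMR.

```python
import shlex

# Values taken from the help text; the names themselves are illustrative.
SUPPORTED_AUTH_TYPES = {"Basic_Access", "Kerberos", "None"}
SUPPORTED_LANGUAGES = {"python", "scala"}


def build_connect_args(cluster_id, auth_type, language=None):
    """Return the argument string that follows %sm_analytics.

    Hypothetical helper for illustration only.
    """
    if auth_type not in SUPPORTED_AUTH_TYPES:
        raise ValueError(f"unsupported auth type: {auth_type!r}")
    if language is not None and language not in SUPPORTED_LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    parts = ["emr", "connect", "--cluster-id", cluster_id, "--auth-type", auth_type]
    if language is not None:
        # Needed for IPython kernels; omit for PySpark or Spark (Scala) kernels.
        parts += ["--language", language]
    return shlex.join(parts)


line = build_connect_args("j-1KHIOQZAQUF5P", "Basic_Access", language="scala")
print("%sm_analytics " + line)
```

In a notebook where the extension is already loaded, the resulting string could presumably be passed to IPython's get_ipython().run_line_magic("sm_analytics", line) instead of being typed by hand.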
License
This library is licensed under the Apache 2.0 License. See the LICENSE file.