SageMaker Studio Analytics Extension

This notebook extension, provided by the AWS SageMaker Studio team, integrates Studio notebooks with analytics resources. It currently supports connecting a SageMaker Studio notebook to a Spark (EMR) cluster through the SparkMagic library.

Usage

Before using the magic command to connect a Studio notebook to EMR, make sure SageMaker Studio has network connectivity to the Spark cluster's Livy service. You can refer to this AWS blog for how to set up SageMaker Studio and an EMR cluster.
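
One way to sanity-check connectivity from the notebook is to open a TCP connection to the Livy endpoint. The sketch below is a minimal check, assuming Livy listens on its default port 8998. `can_reach` is a hypothetical helper, not part of this extension, and the host name in the comment is a placeholder.

```python
import socket


def can_reach(host: str, port: int = 8998, timeout: float = 3.0) -> bool:
    """Return True if a TCP connection to host:port succeeds within timeout."""
    try:
        # create_connection resolves the host and attempts a TCP handshake.
        with socket.create_connection((host, port), timeout=timeout):
            return True
    except OSError:
        # Covers refused connections, timeouts, and DNS resolution failures.
        return False


# Placeholder host: replace with the private DNS name of your EMR primary node.
# print(can_reach("ip-10-0-0-1.ec2.internal"))
```

A successful connection only shows that the port is reachable from Studio; it does not confirm that Livy is healthy or that authentication will succeed.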

Register the magic command:

%load_ext sagemaker_studio_analytics_extension.magics

Show help content:

%sm_analytics?

Docstring:
::

  %sm_analytics [--auth-type AUTH_TYPE] [--cluster-id CLUSTER_ID]
                    [--language LANGUAGE]
                    [command [command ...]]

positional arguments:
  command               Command to execute. The command consists of a service
                        name followed by a ' ' followed by an operation.
                        Supported services are {'emr'} and supported
                        operations are {'connect'}. For example a valid
                        command is 'emr connect'.

optional arguments:
  --auth-type AUTH_TYPE
                        The authentication type to be used. Supported
                        authentication types are {'Kerberos', 'None',
                        'Basic_Access'}.
  --cluster-id CLUSTER_ID
                        The cluster id to connect to.
  --language LANGUAGE   Language to use. The supported languages for IPython
                        kernel(s) are {'scala', 'python'}. This is a required
                        argument for IPython kernels, but not for magic
                        kernels such as PySpark or SparkScala.
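
To see how a command line maps onto the arguments described above, the following sketch mirrors the documented usage with Python's `argparse`. It is an illustration only, not the extension's actual parser; the choice lists are copied from the help text, and strict, case-sensitive validation is an assumption.

```python
import argparse

# Values copied from the help text above.
AUTH_TYPES = ("Kerberos", "None", "Basic_Access")
LANGUAGES = ("scala", "python")


def build_parser() -> argparse.ArgumentParser:
    """Build a stand-in parser with the same arguments as the documented usage."""
    parser = argparse.ArgumentParser(prog="sm_analytics")
    parser.add_argument("--auth-type", choices=AUTH_TYPES)
    parser.add_argument("--cluster-id")
    parser.add_argument("--language", choices=LANGUAGES)
    # The positional command is a service name followed by an operation.
    parser.add_argument("command", nargs="*")
    return parser


args = build_parser().parse_args(
    "emr connect --cluster-id j-1JIIZS02SEVCS --auth-type Kerberos --language python".split()
)
print(args.command, args.auth_type, args.cluster_id, args.language)
```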

Examples

  1. Connect a Studio notebook using the IPython kernel to an EMR cluster protected by Kerberos.
     %sm_analytics emr connect --cluster-id j-1JIIZS02SEVCS --auth-type Kerberos --language python
  2. Connect a Studio notebook using the IPython kernel to an EMR cluster protected by HTTP Basic Auth and create a Scala-based session.
     %sm_analytics emr connect --cluster-id j-1KHIOQZAQUF5P --auth-type Basic_Access --language scala
  3. Connect a Studio notebook using the IPython kernel to an EMR cluster directly, without Livy authentication.
     %sm_analytics emr connect --cluster-id j-1KHIOQZAQUF5P --auth-type None --language python
  4. Connect a Studio notebook using the PySpark or Spark (Scala) kernel to an EMR cluster protected by HTTP Basic Auth.
     %sm_analytics emr connect --cluster-id j-1KHIOQZAQUF5P --auth-type Basic_Access
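
When the cluster ID or authentication type comes from a variable, the argument string can be assembled in Python and passed to IPython's `run_line_magic`. This is a minimal sketch: `emr_connect_args` is a hypothetical helper, not part of the extension, and validating values up front against the documented choices is an assumption about what is useful.

```python
from typing import Optional

# Values copied from the help text of %sm_analytics.
AUTH_TYPES = {"Kerberos", "None", "Basic_Access"}
LANGUAGES = {"scala", "python"}


def emr_connect_args(cluster_id: str, auth_type: str,
                     language: Optional[str] = None) -> str:
    """Build the argument string for '%sm_analytics emr connect'."""
    if auth_type not in AUTH_TYPES:
        raise ValueError(f"unsupported auth type: {auth_type!r}")
    if language is not None and language not in LANGUAGES:
        raise ValueError(f"unsupported language: {language!r}")
    parts = ["emr", "connect", "--cluster-id", cluster_id, "--auth-type", auth_type]
    # --language is only needed for IPython kernels, so it stays optional here.
    if language is not None:
        parts += ["--language", language]
    return " ".join(parts)


line = emr_connect_args("j-1KHIOQZAQUF5P", "Basic_Access", language="scala")

# Inside a notebook, after the extension has been loaded with %load_ext:
# get_ipython().run_line_magic("sm_analytics", line)
```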

License

This library is licensed under the Apache 2.0 License. See the LICENSE file.
