
Python command-line tool to manage the configuration of SparkMagic kernels on SageMaker Studio

Project description

SageMaker SparkMagic Library


This is a CLI tool that generates the SparkMagic and Kerberos configuration required to connect to an EMR cluster. In particular, it generates the following two files:

  1. SparkMagic config: This config file contains the information needed to connect SparkMagic kernels running on Studio to the Livy application running on EMR. The CLI obtains EMR cluster details, such as the IP address, by describing the EMR cluster.

  2. krb5.conf: If the EMR cluster uses a Kerberos security configuration, the library also generates the krb5.conf file needed for user authentication on Studio.
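To make the first file concrete: a SparkMagic config is JSON that, among other settings, tells each kernel which Livy endpoint to use. The sketch below builds a minimal config of that shape. It assumes SparkMagic's kernel_python_credentials / kernel_scala_credentials layout and Livy's default port 8998; build_minimal_config and the host name are hypothetical, and the file this CLI actually writes may contain more settings.

```python
import json


def build_minimal_config(master_host: str, kerberos: bool = False, livy_port: int = 8998) -> dict:
    """Hypothetical helper: a minimal SparkMagic-style config pointing at Livy on the EMR master.

    Assumes SparkMagic's kernel_*_credentials layout and Livy's default port (8998).
    """
    credentials = {
        "username": "",
        "password": "",
        "url": f"http://{master_host}:{livy_port}",
        # SparkMagic auth types include "None", "Kerberos" and "Basic_Access"
        "auth": "Kerberos" if kerberos else "None",
    }
    # Each kernel gets its own copy so later edits to one do not leak into the other
    return {
        "kernel_python_credentials": dict(credentials),
        "kernel_scala_credentials": dict(credentials),
    }


if __name__ == "__main__":
    print(json.dumps(build_minimal_config("ip-10-0-0-1.ec2.internal"), indent=2))
```

For a Kerberos cluster the only difference in this sketch is the auth value; the krb5.conf file carries the realm and KDC details separately.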

Usage

This CLI tool comes pre-installed on the Studio SparkMagic image and can be used from any notebook created from that image.

Connecting to a non-Kerberos cluster:

In a notebook cell, execute the following commands:

%local

!sm-sparkmagic connect --cluster-id "j-xxxxxxxxx"

Sample output:

Successfully read emr cluster(j-xxxxxxxx) details
SparkMagic config file is written to location /etc/sparkmagic/config.json
Completed setting up configuration files for SparkMagic to connect to EMR cluster j-xxxxxxxx


Please complete following steps to complete the connection
1. Restart kernel to complete your setup. This is required so SparkMagic can pickup generated configuration
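After restarting the kernel, you can confirm which endpoint SparkMagic will use by reading the generated file. This is a minimal sketch, assuming the file follows SparkMagic's kernel_python_credentials layout; livy_url_from_config is a hypothetical helper, and /etc/sparkmagic/config.json is the location reported in the sample output.

```python
import json
from pathlib import Path

# Location reported by the sm-sparkmagic sample output
DEFAULT_CONFIG = Path("/etc/sparkmagic/config.json")


def livy_url_from_config(path: Path = DEFAULT_CONFIG) -> str:
    """Hypothetical helper: return the Livy URL the Python kernel will connect to.

    Assumes SparkMagic's kernel_python_credentials layout; raises KeyError if it is absent.
    """
    config = json.loads(Path(path).read_text())
    return config["kernel_python_credentials"]["url"]


if __name__ == "__main__":
    if DEFAULT_CONFIG.exists():
        print(livy_url_from_config())
    else:
        print(f"{DEFAULT_CONFIG} not found; run sm-sparkmagic connect first")
```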

Connecting to a Kerberos cluster:

The steps are the same as for a non-Kerberos cluster, except that you also pass the Kerberos user name with --user-name:

!sm-sparkmagic connect --cluster-id "j-xxxxxxxx" --user-name "ec2-user"

Sample output:

Please follow below steps to complete the setup:
1. Please open image terminal and run 'kinit ec2-user'(user_name: ec2-user) to get kerberos ticket
2. Restart kernel to complete your setup. This is required so SparkMagic can pickup generated configuration
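After running kinit in the image terminal, you can check that a ticket was obtained before restarting the kernel. The sketch below assumes the image ships a klist that supports -s (MIT and Heimdal both do), which prints nothing and exits with status 0 only when a valid ticket cache exists. has_valid_ticket is a hypothetical helper, not part of this CLI.

```python
import shutil
import subprocess


def has_valid_ticket() -> bool:
    """Hypothetical helper: True if `klist -s` reports a usable Kerberos ticket cache.

    Returns False when klist is not installed on the image.
    """
    if shutil.which("klist") is None:
        return False
    # -s: run silently and signal the result only through the exit status
    result = subprocess.run(["klist", "-s"], check=False)
    return result.returncode == 0


if __name__ == "__main__":
    print("ticket OK" if has_valid_ticket() else "no valid ticket; run kinit first")
```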

Connecting to an EMR cluster in another account

To set up the configuration for an EMR cluster in another account, pass the ARN of a role in the cluster's account with --role-arn:

%local

!sm-sparkmagic connect --cluster-id "j-xxxxx" --role-arn "arn:aws:iam::222222222222:role/role-on-emr-cluster-account"
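The --role-arn value is an IAM role ARN of the form arn:aws:iam::<account-id>:role/<role-name>, where the Region field is empty because IAM is global. The sketch below splits such an ARN so you can confirm it names a role in the cluster's account before running the command; parse_role_arn is a hypothetical helper and does not come from this library.

```python
from typing import Tuple


def parse_role_arn(arn: str) -> Tuple[str, str]:
    """Hypothetical helper: split an IAM role ARN into (account_id, role_name).

    Expects arn:<partition>:iam::<12-digit account>:role/<optional path/>name.
    """
    parts = arn.split(":", 5)
    if len(parts) != 6 or parts[0] != "arn" or parts[2] != "iam" or parts[3] != "":
        raise ValueError(f"not an IAM ARN: {arn!r}")
    account_id, resource = parts[4], parts[5]
    if not (account_id.isdigit() and len(account_id) == 12):
        raise ValueError(f"bad account id in {arn!r}")
    if not resource.startswith("role/"):
        raise ValueError(f"not a role ARN: {arn!r}")
    # Role paths are allowed (role/path/to/name); the name is the last segment
    role_name = resource.rsplit("/", 1)[1]
    return account_id, role_name


if __name__ == "__main__":
    print(parse_role_arn("arn:aws:iam::222222222222:role/role-on-emr-cluster-account"))
```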

FAQ

  • Can I connect to multiple clusters at the same time?
    • You can only connect to one cluster at a time, because the tool generates the configuration for a single cluster. To connect to a different cluster, re-run the command with a different cluster ID.
  • Can I use this CLI on a non-SparkMagic image in Studio?
    • This CLI comes pre-installed only on the SparkMagic image, but you can install it on any other image if needed.
  • Can I use this library on SageMaker notebook instances?
    • It is not pre-installed on notebook instances either, but you can install it and try it there. You may have to relocate the SparkMagic config file.

Installing

Install the CLI using pip.

pip install sagemaker-studio-sparkmagic-lib

The following additional permissions are required on the role so that it can describe the cluster:

{
    "Version": "2012-10-17",
    "Statement": [
        {
            "Effect": "Allow",
            "Action": [
                "elasticmapreduce:DescribeCluster",
                "elasticmapreduce:DescribeSecurityConfiguration",
                "elasticmapreduce:ListInstances"
            ],
            "Resource": "arn:aws:elasticmapreduce:*:*:cluster/*"
        }
    ]
}
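To check whether an existing policy document already grants the three actions above, a simplified sketch follows. missing_actions is a hypothetical helper; it only matches action names exactly (case-insensitively, as IAM does) plus the elasticmapreduce:* and * wildcards, and it ignores Deny statements, Resource and Condition, so it is a quick sanity check rather than a full IAM policy evaluation.

```python
import json
from typing import Iterable, Set

# The three actions this README lists as required for describing the cluster
REQUIRED_ACTIONS = {
    "elasticmapreduce:DescribeCluster",
    "elasticmapreduce:DescribeSecurityConfiguration",
    "elasticmapreduce:ListInstances",
}


def missing_actions(policy: dict, required: Iterable[str] = REQUIRED_ACTIONS) -> Set[str]:
    """Hypothetical helper: required actions not allowed by any Allow statement."""
    statements = policy.get("Statement", [])
    if isinstance(statements, dict):  # a single statement may be given as an object
        statements = [statements]
    allowed = set()
    for stmt in statements:
        if stmt.get("Effect") != "Allow":
            continue
        actions = stmt.get("Action", [])
        if isinstance(actions, str):
            actions = [actions]
        allowed.update(a.lower() for a in actions)
    if "*" in allowed or "elasticmapreduce:*" in allowed:
        return set()
    return {a for a in required if a.lower() not in allowed}


if __name__ == "__main__":
    policy = json.loads('{"Statement": [{"Effect": "Allow", "Action": ["elasticmapreduce:DescribeCluster"]}]}')
    print(sorted(missing_actions(policy)))
```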

Development

  • Check out the repository and install it locally:
make install
  • To test locally, start a python3 REPL and run the following Python code:
import sagemaker_studio_sparkmagic_lib.sparkmagic as sm
# Override the output paths so local tests write under /tmp, and skip the kernel restart
sm.connect_to_emr_cluster(cluster_id="j-xxx", user_name="ec2-user", krb_file_override_path="/tmp/krb5.conf",
                          spark_magic_override_path="/tmp/config.json", restart_kernel=False)
  • To test on Studio, create a source tarball and install it on Studio or in your custom image:
python setup.py sdist

Security

See CONTRIBUTING for more information.

License

This library is licensed under the MIT-0 License. See the LICENSE file.



Download files


Source Distribution

sagemaker_studio_sparkmagic_lib-0.1.3.tar.gz (13.0 kB)

Built Distribution

sagemaker_studio_sparkmagic_lib-0.1.3-py3-none-any.whl (15.5 kB)

File details

Details for the file sagemaker_studio_sparkmagic_lib-0.1.3.tar.gz.

File metadata

  • Download URL: sagemaker_studio_sparkmagic_lib-0.1.3.tar.gz
  • Upload date:
  • Size: 13.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/3.10.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.7.9

File hashes

Hashes for sagemaker_studio_sparkmagic_lib-0.1.3.tar.gz
Algorithm Hash digest
SHA256 66326bfda1bd63b158dd90bd236c8feb6edeedb1c61a71895f2ee84d84225dd2
MD5 20bfbcb3537495eae813bfbdcb666239
BLAKE2b-256 f30745984502c1417f19f8502f01b610044f9e2891eeb32182232a48d7f7ee1a


File details

Details for the file sagemaker_studio_sparkmagic_lib-0.1.3-py3-none-any.whl.

File metadata

  • Download URL: sagemaker_studio_sparkmagic_lib-0.1.3-py3-none-any.whl
  • Upload date:
  • Size: 15.5 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/3.10.1 pkginfo/1.7.1 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.61.2 CPython/3.7.9

File hashes

Hashes for sagemaker_studio_sparkmagic_lib-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 9a4ed07040ea8a6d758b544e9c41ae96a4045bf04a42bab92d7458c62b2141bf
MD5 092a1e7c85b3a795ad69898ccb93931f
BLAKE2b-256 9a39b32263ec419c0200b95e9bfd2a075c56912b5ab8b719bdd4723a4040fdac

