
Azure Data Lake management magics for Jupyter Notebook


Azure Data Service Notebook (Alpha)

Azure Data Service Notebook is a set of extensions for working with Azure data services (e.g. Azure Data Lake, HDInsight, Cosmos DB, Azure SQL and Azure Data Warehouse) from Jupyter Notebook.

WARNING: This SDK/CLI is at a very early stage of development. It can and will change in backwards-incompatible ways.

Latest Version: 0.0.1a0

Features

Azure Data Service Notebook currently provides a set of Jupyter magic functions for accessing Azure Data Lake. The available magics are listed in the table below; the syntax of each command is described in the Reference section.

| Command | Function |
|---|---|
| %adl login | Line magic* to log in to Azure Data Lake. |
| %adl listaccounts | Line magic to list the Azure Data Lake Analytics accounts for the current user. |
| %adl listjobs | Line magic to list the Azure Data Lake jobs for a given account. |
| %%adl submitjob | Cell magic* to submit a U-SQL job to Azure Data Lake. |
| %adl viewjob | Line magic to view detailed job information. |
| %adl liststoreaccounts | Line magic to list the Azure Data Lake Store accounts. |
| %adl liststorefolders | Line magic to list the folders under a given directory. |
| %adl liststorefiles | Line magic to list the files under a given directory. |
| %adl sample | Line magic to sample a given file, returning the results as a Pandas DataFrame. |
| %adl logout | Line magic to log out. |

*See Magic Functions in the IPython documentation for detailed definitions of line magics and cell magics.

Installation

  • Download and install Python 3.6+.
  • Install Jupyter: pip install jupyter
  • Install the adlmagics extension: pip install --no-cache-dir adlmagics
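After installation, a quick way to confirm that the package is visible to the Python interpreter your notebook kernel uses is to ask the import system for it. This is a minimal sketch that assumes the import name is `adlmagics` (the same as the pip package name); the helper `is_installed` is hypothetical and not part of the package.

```python
# Minimal sketch: check whether a module can be imported in the current
# environment. ASSUMPTION: the import name of the package is "adlmagics",
# matching the pip package name used above.
import importlib.util


def is_installed(module_name: str) -> bool:
    """Return True if the top-level module `module_name` is on the import path."""
    return importlib.util.find_spec(module_name) is not None


if __name__ == "__main__":
    if is_installed("adlmagics"):
        print("adlmagics is installed")
    else:
        print("adlmagics not found; try: pip install --no-cache-dir adlmagics")
```

Run this in the same kernel as the notebook, since Jupyter may use a different interpreter than your shell. IPython extensions are normally loaded with `%load_ext <module>`; `%load_ext adlmagics` is an assumption here, so check adlmagics_demo.ipynb for the exact command.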

Examples

  • adlmagics_demo.ipynb: a demo of adlmagics functions for Azure Data Lake job control and data exploration.
  • usql_samples.ipynb: sample code for common U-SQL scenarios, e.g. querying a TSV file, creating a database, populating a table, querying a table and creating a rowset in a script.

Feedback

Reference

%adl login

Line magic to log in to the Azure Data Lake service.

%adl login --tenant <tenant>

Input Parameters

| Name | Type | Example | Description |
|---|---|---|---|
| tenant | string, required | microsoft.onmicrosoft.com | The value of this argument can be either an .onmicrosoft.com domain or the Azure object ID of the tenant. |
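Because `--tenant` accepts two different forms, it can help to check locally which form a value takes before logging in. The sketch below is a hypothetical helper (`tenant_kind` is not part of adlmagics) that recognizes only the two forms documented above: an object ID, which is a GUID, and an .onmicrosoft.com domain.

```python
# Hypothetical helper (not part of adlmagics): classify a --tenant value as one
# of the two documented forms. Only these two forms are recognized here.
import uuid

_SUFFIX = ".onmicrosoft.com"


def tenant_kind(tenant: str) -> str:
    """Return "object_id" or "domain"; raise ValueError for anything else."""
    value = tenant.strip()
    try:
        uuid.UUID(value)  # raises ValueError if the value is not a GUID
        return "object_id"
    except ValueError:
        pass
    if value.lower().endswith(_SUFFIX) and len(value) > len(_SUFFIX):
        return "domain"
    raise ValueError(f"unrecognized tenant value: {tenant!r}")


print(tenant_kind("microsoft.onmicrosoft.com"))  # domain
```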

%adl listaccounts

Line magic to enumerate the Azure Data Lake Analytics accounts for the current user. The account list is returned as a Pandas DataFrame, so you can call Pandas functions on it directly.

%adl listaccounts --page_index <page index> --page_account_number <results per page>

Input Parameters

| Name | Type | Example | Description |
|---|---|---|---|
| page_index | int, required | 0 | The result page number. Must be 0 or greater; the default value is 0. |
| page_account_number | int, required | 10 | The number of results per page. |
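The two paging parameters work together: `page_account_number` sets the page size and `page_index` selects which page to return. The sketch below illustrates that arithmetic on a plain list. It assumes `page_index` is zero-based (the documented default and example are both 0) and that the page size must be positive; `get_page` is a hypothetical helper, not adlmagics code.

```python
# Minimal sketch of page-based pagination. ASSUMPTIONS: page_index is
# zero-based and page_account_number (the page size) must be positive.
from typing import List, Sequence, TypeVar

T = TypeVar("T")


def get_page(items: Sequence[T], page_index: int, page_account_number: int) -> List[T]:
    """Return the items that fall on page `page_index`."""
    if page_index < 0:
        raise ValueError("page_index must be 0 or greater")
    if page_account_number <= 0:
        raise ValueError("page_account_number must be greater than 0")
    start = page_index * page_account_number
    return list(items[start:start + page_account_number])


accounts = [f"account{i}" for i in range(25)]
print(get_page(accounts, 0, 10))  # account0 .. account9
print(get_page(accounts, 2, 10))  # account20 .. account24 (a partial last page)
```

A page past the end simply comes back empty, which is a convenient stopping condition when walking through all pages.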

%adl listjobs

Line magic to enumerate the Azure Data Lake jobs for a given account. The job list is returned as a Pandas DataFrame, so you can call Pandas functions on it directly.

%adl listjobs --account <azure data lake analytics account> --page_index <page index> --page_account_number <results per page>

Input Parameters

| Name | Type | Example | Description |
|---|---|---|---|
| account | string, required | telemetryadla | The Azure Data Lake Analytics account to list jobs from. |
| page_index | int, required | 0 | The result page number. Must be 0 or greater; the default value is 0. |
| page_account_number | int, required | 10 | The number of results per page. |
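When the account name or paging values come from Python variables, the magic's argument line can be assembled in code and run through IPython's `run_line_magic`. The sketch below is a hypothetical helper (`build_adl_line` is not part of adlmagics); it shell-quotes each value, assuming the magic splits its arguments shell-style.

```python
# Hypothetical helper (not part of adlmagics): build the argument string for an
# %adl magic from keyword arguments. ASSUMPTION: the magic parses its argument
# line shell-style, so values containing spaces are quoted with shlex.quote.
import shlex


def build_adl_line(command: str, **options) -> str:
    """Return e.g. 'listjobs --account telemetryadla --page_index 0'."""
    parts = [command]
    for name, value in options.items():
        if value is None:
            continue  # leave unset options out of the line
        parts.append(f"--{name}")
        parts.append(shlex.quote(str(value)))
    return " ".join(parts)


line = build_adl_line("listjobs", account="telemetryadla",
                      page_index=0, page_account_number=10)
print(line)
# Inside a notebook, the line could then be run with:
# get_ipython().run_line_magic("adl", line)
```

For the cell magic, IPython's `get_ipython().run_cell_magic("adl", line, cell)` takes the argument line and the cell body (here, the U-SQL script) as separate strings.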

%%adl submitjob

Cell magic to submit a U-SQL job to an Azure Data Lake Analytics account. The options go on the magic line; the U-SQL script forms the body of the cell.

%%adl submitjob --account <azure data lake analytics account> --name <job name> --parallelism <degree of parallelism> --priority <priority> --runtime <runtime version>
<U-SQL script>

Input Parameters

| Name | Type | Example | Description |
|---|---|---|---|
| account | string, required | telemetryadla | The Azure Data Lake Analytics account to execute job operations on. |
| name | string, required | myscript | The friendly name of the job to submit. |
| parallelism | int | 5 | The degree of parallelism used for this job. Must be greater than 0; if set to a value less than 0, it defaults to 1. |
| priority | int | 1000 | The priority value for the job. Lower numbers have higher priority. Must be greater than 0; the default is 1000. |
| runtime | string | default | The runtime version of the Data Lake Analytics engine to use for the type of job being run. |
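The rules in the table can be applied before a job is submitted. The sketch below is a hypothetical helper (`normalize_job_options` is not part of adlmagics) that encodes them: a negative parallelism falls back to 1, priority defaults to 1000 and must be positive. The table does not say what happens when parallelism is exactly 0; treating it as 1 is an assumption. It also shows that "lower numbers have higher priority" means ordering jobs from most to least important is an ascending sort.

```python
# Hypothetical helper (not part of adlmagics): apply the documented rules for
# the optional %%adl submitjob parameters.
# ASSUMPTION: a parallelism of exactly 0 is also treated as 1.
from typing import Optional

DEFAULT_PRIORITY = 1000  # documented default job priority


def normalize_job_options(parallelism: Optional[int] = None,
                          priority: Optional[int] = None) -> dict:
    options = {}
    if parallelism is not None:
        options["parallelism"] = parallelism if parallelism > 0 else 1
    effective_priority = DEFAULT_PRIORITY if priority is None else priority
    if effective_priority <= 0:
        raise ValueError("priority must be greater than 0")
    options["priority"] = effective_priority
    return options


# Lower number = higher priority, so sort ascending to list the most
# important jobs first.
jobs = [("nightly", 1000), ("urgent", 1), ("report", 500)]
print([name for name, prio in sorted(jobs, key=lambda job: job[1])])
# ['urgent', 'report', 'nightly']
```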

%adl viewjob

Line magic to view detailed job info.

%adl viewjob --account <azure data lake analytics account> --job_id <job GUID to be viewed>

Input Parameters

| Name | Type | Example | Description |
|---|---|---|---|
| account | string, required | telemetryadla | The Azure Data Lake Analytics account to execute job operations on. |
| job_id | GUID, required | 36a62f78-1881-1935-8a6a-9e37b497582d | The job identifier, which uniquely identifies the job across all jobs submitted to the service. |
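Since `--job_id` is a GUID, a mistyped or truncated ID can be caught locally before calling the service. This minimal sketch uses Python's standard `uuid` module; `normalize_job_id` is a hypothetical helper, not part of adlmagics, and it also converts the ID to the canonical lower-case, hyphenated form.

```python
# Hypothetical helper (not part of adlmagics): validate a --job_id value and
# return it in canonical lower-case, hyphenated GUID form.
import uuid


def normalize_job_id(job_id: str) -> str:
    """Raise ValueError if `job_id` is not a well-formed GUID."""
    return str(uuid.UUID(job_id.strip()))


print(normalize_job_id("36A62F78-1881-1935-8A6A-9E37B497582D"))
# 36a62f78-1881-1935-8a6a-9e37b497582d
```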

%adl liststoreaccounts

Line magic to list the Azure Data Lake Store accounts.

%adl liststoreaccounts

%adl liststorefolders

Line magic to list the folders under a given directory.

%adl liststorefolders --account <azure data lake store account> --folder_path <folder path>

Input Parameters

| Name | Type | Example | Description |
|---|---|---|---|
| account | string, required | telemetryadls | The name of the Data Lake Store account. |
| folder_path | string, required | root/data | The directory path under the Data Lake Store account. |

%adl liststorefiles

Line magic to list the files under a given directory.

%adl liststorefiles --account <azure data lake store account> --folder_path <folder path>

Input Parameters

| Name | Type | Example | Description |
|---|---|---|---|
| account | string, required | telemetryadls | The name of the Data Lake Store account. |
| folder_path | string, required | root/data | The directory path under the Data Lake Store account. |

%adl sample

Line magic to sample a given file, returning the results as a Pandas DataFrame.

%adl sample --account <azure data lake store account> --file_path <file path> --file_type <file type> --encoding <encoding> --row_number <number of rows>

Input Parameters

| Name | Type | Example | Description |
|---|---|---|---|
| account | string, required | telemetryadls | The name of the Data Lake Store account. |
| file_path | string, required | root/data/sample.tsv | The path of the file to sample data from. |
| file_type | string | tsv | The type of the file to sample from. |
| encoding | string | UTF-8 | The character encoding of the file. |
| row_number | int | 10 | The number of rows to read from the file. |
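To see how `file_type`, `encoding` and `row_number` interact, here is a local analogue that reads a delimited file from disk rather than from Data Lake Store. It is a sketch, not how adlmagics implements sampling: it takes the first `row_number` rows, which may differ from how the magic selects rows, and the mapping from `file_type` to a delimiter is an assumption. `sample_file` is a hypothetical helper.

```python
# Local analogue of %adl sample (NOT the adlmagics implementation): read at
# most `row_number` rows from a delimited text file with a given encoding.
# ASSUMPTION: file_type "tsv" means tab-delimited and "csv" comma-delimited.
import csv
from itertools import islice
from typing import List

DELIMITERS = {"tsv": "\t", "csv": ","}  # hypothetical file_type mapping


def sample_file(file_path: str, file_type: str = "tsv",
                encoding: str = "utf-8", row_number: int = 10) -> List[List[str]]:
    """Return up to `row_number` parsed rows from the start of the file."""
    delimiter = DELIMITERS[file_type.lower()]
    with open(file_path, newline="", encoding=encoding) as handle:
        reader = csv.reader(handle, delimiter=delimiter)
        # islice stops pulling rows once row_number rows have been parsed.
        return list(islice(reader, row_number))
```

If pandas is installed, rows whose first entry is a header can be turned into a DataFrame similar to what the magic returns with `pandas.DataFrame(rows[1:], columns=rows[0])`.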

%adl logout

Line magic to log out.

%adl logout

Contributing

This project welcomes contributions and suggestions. Most contributions require you to agree to a Contributor License Agreement (CLA) declaring that you have the right to, and actually do, grant us the rights to use your contribution. For details, visit https://cla.microsoft.com.

When you submit a pull request, a CLA-bot will automatically determine whether you need to provide a CLA and decorate the PR appropriately (e.g., label, comment). Simply follow the instructions provided by the bot. You will only need to do this once across all repos using our CLA.

This project has adopted the Microsoft Open Source Code of Conduct. For more information see the Code of Conduct FAQ or contact opencode@microsoft.com with any additional questions or comments.

Download files


Source Distribution

adlmagics-0.0.1a2.tar.gz (12.7 kB)

Built Distribution

adlmagics-0.0.1a2-py3-none-any.whl (23.5 kB)

File details

Details for the file adlmagics-0.0.1a2.tar.gz.

File metadata

  • Download URL: adlmagics-0.0.1a2.tar.gz
  • Size: 12.7 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for adlmagics-0.0.1a2.tar.gz

| Algorithm | Hash digest |
|---|---|
| SHA256 | 5ce794a7acda0c795411c9237609dd77378528dea75b63c85a98ebf155f1bdea |
| MD5 | 0823c620381681310548b9e6c0624793 |
| BLAKE2b-256 | 7a2a80a22b988f103079bd047ffe2eb156a2356a85e94032bb847d89cd7244ef |


File details

Details for the file adlmagics-0.0.1a2-py3-none-any.whl.

File hashes

Hashes for adlmagics-0.0.1a2-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | eb23dce01e37c42322ac1d8e0557df008c33eac5e5df4dac17956f9d7c7c8db7 |
| MD5 | 12193ddda9e48613296d174765d6b769 |
| BLAKE2b-256 | 586969f0e784f9e2e48a505ab974d032d784beb98cc6aeddb2c72c97c6845331 |

