Python wrapper for YARN Applications

The knit library provides a Python interface to Scala for interacting with the YARN resource manager.

View the documentation for knit.

Overview

knit lets you use Python in conjunction with YARN, the most common resource manager for Hadoop systems. It provides the following high-level entry points:

  • CondaCreator, a way to create zipped conda environments, so that they can be uploaded to HDFS and extracted for use in YARN containers
  • YARNAPI, an interface to the YARN resource manager for retrieving application and container statuses and logs, and for killing running jobs
  • Knit, a YARN application runner, which creates an instance of a Scala-based YARN client and launches an application on YARN, which in turn runs commands in YARN containers
  • DaskYARNCluster, which launches a Dask distributed cluster on YARN, with one worker process per container
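
To make the "zipped environment" idea from the first entry point concrete, here is a minimal sketch using only the Python standard library. It is not knit's CondaCreator and does not call conda; the function names zip_environment and archive_members are hypothetical, and the sketch assumes the environment already exists as a directory on local disk.

```python
import os
import shutil
import zipfile


def zip_environment(env_dir, out_dir):
    """Zip env_dir into out_dir/<env name>.zip and return the archive path.

    Illustrative stand-in only; this is NOT knit's CondaCreator.
    """
    env_dir = os.path.abspath(env_dir)
    env_name = os.path.basename(env_dir)
    base = os.path.join(out_dir, env_name)
    # root_dir/base_dir keep the top-level folder name inside the archive,
    # so extracting it elsewhere yields "<env name>/bin/python" and so on.
    return shutil.make_archive(
        base, "zip", root_dir=os.path.dirname(env_dir), base_dir=env_name
    )


def archive_members(archive_path):
    """Return the file names stored in the archive."""
    with zipfile.ZipFile(archive_path) as zf:
        return zf.namelist()
```

Keeping the environment name as the top-level folder inside the archive means the extracted layout is predictable wherever the archive is unpacked.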

The intent is to use knit from a cluster edge-node, i.e., with YARN configuration and the CLI available locally.
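
A quick way to check whether a machine looks like an edge node is sketched below. It assumes the YARN CLI is an executable named yarn on the PATH and that the configuration directory is advertised through the standard Hadoop environment variables YARN_CONF_DIR or HADOOP_CONF_DIR; whether knit itself consults these variables is not stated here, and edge_node_report is a hypothetical helper, not part of knit.

```python
import os
import shutil


def edge_node_report(environ=None, which=shutil.which):
    """Report whether YARN tooling looks reachable from this machine.

    Hypothetical helper for illustration; not part of knit.
    """
    environ = os.environ if environ is None else environ
    # Prefer the YARN-specific variable, fall back to the general Hadoop one.
    conf_dir = environ.get("YARN_CONF_DIR") or environ.get("HADOOP_CONF_DIR")
    yarn_cli = which("yarn")  # path to the `yarn` executable, or None
    return {
        "yarn_cli": yarn_cli,
        "conf_dir": conf_dir,
        "looks_like_edge_node": bool(yarn_cli) and bool(conf_dir),
    }
```

Calling edge_node_report() with no arguments inspects the current process environment; the two parameters exist only so the check can be exercised without a real cluster.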

Quickstart

Install from conda-forge

> conda install -c conda-forge knit

or with pip

> pip install knit

If installing from source, you must first build the Java library (requires Java and Maven)

> python setup.py install mvn

To run an arbitrary command on the YARN cluster

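import knit
k = knit.Knit()
k.start('env')  # submit the shell command `env`; the application takes some time to run
k.logs()        # fetch the logs once the application has produced output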
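
Rather than waiting a fixed amount of time, the "wait some time" step can be replaced with polling. The helper below is a generic sketch, not part of knit: wait_until is a hypothetical name, and the check callable is whatever status test you supply for your application.

```python
import time


def wait_until(check, timeout=60.0, interval=2.0,
               sleep=time.sleep, clock=time.monotonic):
    """Call check() repeatedly until it returns a truthy value.

    Returns that value, or raises TimeoutError once `timeout` seconds
    have elapsed. Hypothetical helper; not part of knit.
    """
    deadline = clock() + timeout
    while True:
        result = check()
        if result:
            return result
        if clock() >= deadline:
            raise TimeoutError("condition not met within %.1f s" % timeout)
        sleep(interval)
```

For example, check could wrap whichever status call your knit version exposes and return True once the application has finished; k.logs() would then be called after wait_until returns.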

To start a dask cluster on YARN

import dask_yarn
cluster = dask_yarn.DaskYARNCluster()
cluster.start(nworkers=4, memory=1024, cpus=2)

Download files

Files for knit, version 0.2.4

  • knit-0.2.4-py2.py3-none-any.whl (23.2 MB), wheel, Python version py2.py3
  • knit-0.2.4.tar.gz (23.2 MB), source