
Asynchronous parallel SSH library

Project description

Asynchronous parallel SSH client library.

Run SSH commands on many servers, from hundreds to hundreds of thousands, asynchronously and with minimal system load on the client host.


Installation

pip install parallel-ssh

Usage Example

See the documentation on Read the Docs for more complete examples.

Run ls on two remote hosts in parallel with sudo.

from pprint import pprint
from pssh.pssh_client import ParallelSSHClient

hosts = ['myhost1', 'myhost2']
client = ParallelSSHClient(hosts)

output = client.run_command('ls -ltrh /tmp/', sudo=True)
pprint(output)
Output:
{'myhost1':
      host=myhost1
      cmd=<Greenlet>
      channel=<channel>
      stdout=<generator>
      stderr=<generator>
      stdin=<channel>
      exception=None
 'myhost2':
      <..>
}

Standard output buffers are available in the output object. Iterating on them yields output as it becomes available. Iteration ends only when the command has finished, though it may be interrupted and resumed at any point.

Host output attributes are available in the host output object, for example output['myhost1'].stdout.

for host in output:
    for line in output[host].stdout:
        pprint("Host %s - output: %s" % (host, line))
Output:
Host myhost1 - output: drwxr-xr-x  6 xxx xxx 4.0K Jan  1 00:00 xxx
Host myhost1 - output: <..>
Host myhost2 - output: drwxr-xr-x  6 xxx xxx 4.0K Jan  1 00:00 xxx
Host myhost2 - output: <..>

Exit codes become available once output is iterated on to completion or client.join(output) is called.

for host in output:
    print(output[host].exit_code)
Output:
0
0

The client’s join function can be used to block and wait for all parallel commands to finish:

client.join(output)

Similarly, output and exit codes are available after client.join is called:

output = client.run_command('exit 0')

# Block and gather exit codes. Output is updated in-place
client.join(output)
# list() is needed on Python 3, where dict.values() returns a view
host_output = list(output.values())[0]
pprint(host_output.exit_code)

# Output is available
for line in host_output.stdout:
    pprint(line)
Output:
0
<..stdout..>
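
Once exit codes are available, a small helper can pick out the hosts where a command failed. The sketch below is illustrative only: failed_hosts is a hypothetical helper, not part of the library, and it assumes exit_code is None until the command has finished. A stand-in object is used in place of real host output so the snippet runs without any remote hosts.

```python
from collections import namedtuple


def failed_hosts(output):
    """Return {host: exit_code} for hosts whose command exited non-zero.

    `output` is expected to look like the result of client.run_command
    after client.join(output): a mapping of host name to an object with
    an `exit_code` attribute. Hosts whose exit code is still None
    (assumed to mean "not finished yet") are skipped.
    """
    return {
        host: host_output.exit_code
        for host, host_output in output.items()
        if host_output.exit_code is not None and host_output.exit_code != 0
    }


# Stand-in for real host output objects, for illustration only.
FakeHostOutput = namedtuple("FakeHostOutput", ["exit_code"])
sample_output = {
    "myhost1": FakeHostOutput(exit_code=0),
    "myhost2": FakeHostOutput(exit_code=2),
    "myhost3": FakeHostOutput(exit_code=None),
}
failures = failed_hosts(sample_output)
```

With a real client, the same helper would be called as failed_hosts(output) after client.join(output).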

There is also a built-in host logger that can be enabled to log output from remote hosts. The helper function pssh.utils.enable_host_logger enables host logging to stdout, for example:

import pssh.utils
pssh.utils.enable_host_logger()
client.join(client.run_command('uname'), consume_output=True)
Output:
[localhost]       Linux

Design And Goals

ParallelSSH’s design goal and motivation is to provide a library for running asynchronous SSH commands in parallel with little to no load induced on the client system. The intended usage is completely programmatic and non-interactive.

To meet these goals, API-driven solutions are preferred first and foremost. This frees the developer to drive the library by any method desired, be that environment variables, CI-driven tasks, command line tools, existing OpenSSH or new configuration files, from within an application, and so on.

Scaling

Some guidelines on scaling ParallelSSH client and pool sizes.

In general, long-lived commands with little or no output gathering will scale better. In these use cases, pool sizes in the multiple thousands have been used successfully with little CPU overhead in the single process running them.

Conversely, many short-lived commands with output gathering will not scale as well. In this use case, smaller pool sizes in the hundreds are likely to perform better with regard to CPU overhead in the event loop. Multiple Python processes, each with its own event loop, may be used to scale this use case further as CPU overhead allows.

Gathering is highlighted here as output generation does not affect scaling. Only when output is gathered either over multiple still running commands, or while more commands are being triggered, is overhead increased.
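
One way to use multiple Python processes, each with its own event loop, is to split the host list into chunks and hand each chunk to a worker process. The following is a minimal sketch under stated assumptions, not a recommended implementation: chunk_hosts, run_on_chunk and run_everywhere are hypothetical helper names, the pool size of 100 is an arbitrary illustration, and the library is imported inside the worker on the assumption that each process should set up gevent for itself.

```python
from multiprocessing import Pool


def chunk_hosts(hosts, num_chunks):
    """Split hosts into at most num_chunks roughly equal lists."""
    if not hosts:
        return []
    num_chunks = max(1, min(num_chunks, len(hosts)))
    # Striding keeps chunk sizes within one host of each other.
    return [hosts[i::num_chunks] for i in range(num_chunks)]


def run_on_chunk(args):
    """Worker: runs one chunk of hosts in its own process."""
    hosts, command, pool_size = args
    # Imported here so that each worker process has its own event loop.
    from pssh.pssh_client import ParallelSSHClient
    client = ParallelSSHClient(hosts, pool_size=pool_size)
    output = client.run_command(command)
    client.join(output)
    # Return plain data only; host output objects stay in the worker.
    return {host: output[host].exit_code for host in output}


def run_everywhere(hosts, command, processes=4, pool_size=100):
    """Run command on all hosts, spread over several processes."""
    chunks = chunk_hosts(hosts, processes)
    if not chunks:
        return {}
    pool = Pool(processes=len(chunks))
    try:
        results = pool.map(
            run_on_chunk, [(chunk, command, pool_size) for chunk in chunks])
    finally:
        pool.close()
        pool.join()
    exit_codes = {}
    for result in results:
        exit_codes.update(result)
    return exit_codes
```

A call such as run_everywhere(hosts, 'uname') would then return a mapping of host name to exit code. Only exit codes are passed back to the parent process here, which keeps output gathering out of the picture, in line with the scaling notes above.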

Technical Details

To understand why this is, consider that in co-operative multitasking, which this project uses via the gevent library, a co-routine (greenlet) needs to yield the event loop to allow others to execute - co-operation. When one co-routine is constantly grabbing the event loop in order to gather output, or when co-routines are constantly trying to start new short-lived commands, it causes overhead for other co-routines that also want to use the event loop.

This manifests itself as increased CPU usage in the process running the event loop and reduced performance with regard to scaling improvements from increasing pool size.

At the other end of the spectrum, long-lived remote commands that generate no output only need the event loop at the start, when they are establishing connections, and at the end, when they have finished and exit codes need to be gathered. This results in practically zero CPU overhead at any time other than the start or end of command execution.

Output generation is done remotely and has no effect on the event loop until output is gathered - output buffers are iterated on. Only at that point does the event loop need to be held.
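
The effect described above can be illustrated with a toy model. The sketch below does not use gevent and is not how the library is implemented; it uses plain Python generators and a round-robin loop, where each resumption of a task stands for one turn of the event loop. All names in it (silent_command, chatty_command, run_loop) are made up for the illustration.

```python
from collections import deque


def silent_command(name):
    """Long-lived command with no output gathering."""
    yield "%s: connection established" % name
    yield "%s: exit code gathered" % name


def chatty_command(name, lines):
    """Command whose output is gathered line by line."""
    yield "%s: connection established" % name
    for number in range(lines):
        # Every gathered line needs another turn of the loop.
        yield "%s: gathered line %d" % (name, number)
    yield "%s: exit code gathered" % name


def run_loop(tasks):
    """Round-robin over tasks, counting loop turns used by each one."""
    turns = {}
    queue = deque(tasks.items())
    while queue:
        name, task = queue.popleft()
        try:
            next(task)  # the task runs until it yields the loop again
        except StopIteration:
            continue  # task finished, drop it from the queue
        turns[name] = turns.get(name, 0) + 1
        queue.append((name, task))
    return turns


turns = run_loop({
    "silent": silent_command("silent"),
    "chatty": chatty_command("chatty", lines=100),
})
# turns == {"silent": 2, "chatty": 102}
```

In this model the silent command needs the loop twice regardless of how long it runs, while the command with gathered output needs it once per line, which is the contention the section describes.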

SFTP/SCP

SFTP is supported natively (SCP version 2); no scp binary is required.

For example to copy a local file to remote hosts in parallel:

from pssh import ParallelSSHClient, utils
from gevent import joinall

utils.enable_logger(utils.logger)
hosts = ['myhost1', 'myhost2']
client = ParallelSSHClient(hosts)
greenlets = client.copy_file('../test', 'test_dir/test')
joinall(greenlets, raise_error=True)
Output:
Copied local file ../test to remote destination myhost1:test_dir/test
Copied local file ../test to remote destination myhost2:test_dir/test

Similarly, the copy_remote_file function copies remote files to local ones, with each local file name suffixed with the host’s name.

Directory recursion is supported in both cases via the recurse parameter - defaults to off.
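
A copy in the other direction might look like the sketch below. It rests on several assumptions that should be checked against the SFTP documentation: that copy_remote_file takes the remote path first and the local path second, that it returns greenlets in the same way copy_file does, and that the host name is appended to the local path with an underscore separator. fetch_from_hosts and expected_local_paths are hypothetical helper names.

```python
def expected_local_paths(hosts, local_file, separator="_"):
    """Local path expected per host, assuming '<local_file>_<host>' naming."""
    return {host: local_file + separator + host for host in hosts}


def fetch_from_hosts(hosts, remote_file, local_file, recurse=False):
    """Copy remote_file from every host to a per-host local file."""
    # Imported here so the helper above can be used without the library.
    from gevent import joinall
    from pssh.pssh_client import ParallelSSHClient

    client = ParallelSSHClient(hosts)
    # recurse=True would copy directories recursively; it defaults to off.
    greenlets = client.copy_remote_file(remote_file, local_file,
                                        recurse=recurse)
    joinall(greenlets, raise_error=True)
    return expected_local_paths(hosts, local_file)
```

For example, fetch_from_hosts(['myhost1', 'myhost2'], 'test_dir/test', 'copied_test') would, under these assumptions, leave local files named copied_test_myhost1 and copied_test_myhost2.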

See SFTP documentation for more examples.

Frequently asked questions

Q:

Why should I use this library and not, for example, fabric?

A:

In short, the tools are intended for different use cases.

ParallelSSH is a parallel SSH client library that scales well over hundreds to hundreds of thousands of hosts - per Design And Goals - a use case that is very common in cloud platform and virtual machine automation. It is best used where that is the use case at hand.

Fabric and tools like it on the other hand are not well suited to such use cases, for many reasons, performance and differing design goals in particular. The similarity is only that these tools also make use of SSH to run commands.

In other words, ParallelSSH is well suited to be the SSH client that tools like Fabric, Ansible and others use to run their commands, rather than a direct replacement for them.

By focusing on providing a well defined, lightweight library - the actual code is a few hundred lines - ParallelSSH is far better suited to "run this command on X number of hosts" tasks. Frameworks like Fabric, Capistrano and others are overkill for such tasks and, unsurprisingly, as it is not what they are for, are ill-suited to them and do not perform particularly well.

Fabric and tools like it are high level deployment frameworks, as opposed to general purpose libraries. They are for building deployment tasks to perform on hosts matching a role, with task chaining and a DSL-like syntax, and are primarily intended for command line use, for which the framework is a good fit - very far removed from an SSH client library.

Fabric in particular is a port of Capistrano from Ruby to Python. Its design goals are to provide a faithful port of Capistrano, with its tasks and roles framework, to Python, with the interactive command line being the intended usage.

Furthermore, Fabric’s use as a library is non-standard and in many cases just plain broken. It currently stands at over 7,000 lines of code, most of which lacks tests.

In addition, Fabric’s parallel command implementation uses a combination of both threads and processes, with extremely high CPU usage and system load even when running against a single-digit number of hosts.

Q:

Is Windows supported?

A:

The library installs and works on Windows, though it is not formally supported as unit tests are currently POSIX based.

Pip versions >= 8.0 are required for binary package installation of gevent on Windows, a dependency of ParallelSSH.

Though ParallelSSH is pure Python code and will run on any platform that has a working Python interpreter, its gevent dependency and certain dependencies of paramiko contain native code, which either needs a binary package to be provided for the platform or needs to be built from source. Binary packages for gevent are provided for OSX, Linux and Windows platforms as of this writing.

Q:

Is there a user’s group for feedback and discussion about ParallelSSH?

A:

There is a public ParallelSSH Google group set up for this purpose - both posting and viewing are open to the public.
