Asynchronous parallel SSH library
Asynchronous parallel SSH client library.
Run SSH commands on many servers, from hundreds to hundreds of thousands, asynchronously and with minimal system load on the client host.
Native code based client with extremely high performance - based on the libssh2 C library.
Installation
pip install parallel-ssh
Usage Example
See the documentation on Read the Docs for more complete examples.
Run uname on two remote hosts in parallel.
from pssh.clients import ParallelSSHClient
hosts = ['localhost', 'localhost']
client = ParallelSSHClient(hosts)
output = client.run_command('uname', return_list=True)
for host_output in output:
    for line in host_output.stdout:
        print(line)
- Output:
Linux
Linux
Native client
Starting from version 1.2.0, the default client in parallel-ssh has changed to the native client which offers much greater performance and reduced overhead.
The new default client is based on libssh2 via the ssh2-python extension library and supports non-blocking mode natively. Binary wheel packages with libssh2 included are provided for Linux, OSX and Windows platforms and all supported Python versions.
See this post for a performance comparison of the available clients.
The paramiko based client under pssh.clients.miko and the old pssh.pssh_client imports will be removed on the release of 2.0.0.
See documentation for a feature comparison of the two clients.
Native Code Client Features
Highest performance and least overhead of any Python SSH library
Thread safe - makes use of native threads for CPU bound calls like authentication
Natively non-blocking utilising libssh2 via ssh2-python - no monkey patching of the Python standard library
Significantly reduced overhead in CPU and memory usage
Exit codes
Once either standard output is iterated on to completion, or client.join(output, consume_output=True) is called, exit codes become available in host output.
Iteration ends only when remote command has completed, though it may be interrupted and resumed at any point.
HostOutput.exit_code is a dynamic property and will return None when exit code is not ready, meaning command has not finished, or channel is unavailable due to error.
Once all output has been gathered exit codes become available even without calling join.
output = client.run_command('uname', return_list=True)
for host_out in output:
    for line in host_out.stdout:
        print(line)
    print(host_out.exit_code)
- Output:
Linux
0
Linux
0
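Because exit_code may be None while a command is still running or its channel is unavailable, callers usually need to handle three cases. The following is a minimal sketch that reads only the host and exit_code attributes of host output objects. The partition_by_exit_code helper and the SimpleNamespace stand-ins are hypothetical and are used here so the example runs without any SSH servers.

```python
from types import SimpleNamespace


def partition_by_exit_code(output):
    """Split host output objects into succeeded, failed and unfinished hosts.

    Only the ``host`` and ``exit_code`` attributes are read, so any object
    shaped like the client's host output can be passed in.
    """
    succeeded, failed, unfinished = [], [], []
    for host_out in output:
        if host_out.exit_code is None:
            # Not ready: command still running or channel unavailable
            unfinished.append(host_out.host)
        elif host_out.exit_code == 0:
            succeeded.append(host_out.host)
        else:
            failed.append((host_out.host, host_out.exit_code))
    return succeeded, failed, unfinished


# Hypothetical stand-ins for real host output, for illustration only
fake_output = [
    SimpleNamespace(host='host1', exit_code=0),
    SimpleNamespace(host='host2', exit_code=127),
    SimpleNamespace(host='host3', exit_code=None),
]
succeeded, failed, unfinished = partition_by_exit_code(fake_output)
print(succeeded, failed, unfinished)
```

With a real client, the output of client.run_command can be passed to the helper once standard output has been consumed.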
Waiting for Completion
The client’s join function can be used to wait for all commands in output object to finish.
After join returns, commands have finished and all output can be read without blocking.
client.join(output)
for host_out in output:
    for line in host_out.stdout:
        print(line)
    print(host_out.exit_code)
Similarly, exit codes are available after client.join(output, consume_output=True).
consume_output flag must be set to get exit codes when not reading from stdout. Future releases aim to remove the need for consume_output to be set.
output = client.run_command('uname')
# Wait for commands to complete and consume output so can get exit codes
client.join(output, consume_output=True)
for host_out in output:
    print(host_out.exit_code)
- Output:
0
0
Built in Host Output Logger
There is also a built in host logger that can be enabled to log output from remote hosts for both stdout and stderr. The helper function pssh.utils.enable_host_logger will enable host logging to stdout.
To log output without having to iterate over output generators, the consume_output flag must be enabled - for example:
from pssh.utils import enable_host_logger
enable_host_logger()
output = client.run_command('uname')
client.join(output, consume_output=True)
- Output:
[localhost] Linux
SCP
SCP is supported - native clients only - and provides the best performance for file copying.
Unlike with the SFTP functionality, remote files that already exist are not overwritten and an exception is raised instead.
Note that enabling recursion with SCP requires server SFTP support for creating remote directories.
To copy a local file to remote hosts in parallel with SCP:
from pssh.clients import ParallelSSHClient
from gevent import joinall
hosts = ['myhost1', 'myhost2']
client = ParallelSSHClient(hosts)
cmds = client.scp_send('../test', 'test_dir/test')
joinall(cmds, raise_error=True)
See also documentation for SCP recv.
SFTP
SFTP is supported natively. In the case of the deprecated paramiko clients, several bugs exist with SFTP performance and behaviour - avoid if at all possible.
To copy a local file to remote hosts in parallel:
from pssh.clients import ParallelSSHClient
from pssh.utils import enable_logger, logger
from gevent import joinall
enable_logger(logger)
hosts = ['myhost1', 'myhost2']
client = ParallelSSHClient(hosts)
cmds = client.copy_file('../test', 'test_dir/test')
joinall(cmds, raise_error=True)
- Output:
Copied local file ../test to remote destination myhost1:test_dir/test
Copied local file ../test to remote destination myhost2:test_dir/test
There is similar capability to copy remote files to local ones suffixed with the host’s name with the copy_remote_file function.
Directory recursion is supported in both cases via the recurse parameter - defaults to off.
See SFTP documentation for more examples.
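When copy_remote_file is used, each host's copy lands in its own local file, so it can help to compute the expected local paths up front. A minimal sketch follows. The expected_local_files helper is hypothetical, and the underscore separator is an assumption rather than something stated here, so check the SFTP documentation for the installed version.

```python
def expected_local_files(local_file, hosts, separator='_'):
    """Map each host to the local path its copy is expected to be written to.

    ASSUMPTION: the host name is appended to the local file name with
    ``separator``. The default of an underscore is a guess for illustration.
    """
    return {host: separator.join([local_file, host]) for host in hosts}


paths = expected_local_files('remote_copy', ['myhost1', 'myhost2'])
for host, path in paths.items():
    print(host, path)
```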
Design And Goals
parallel-ssh’s design goal and motivation is to provide a library for running non-blocking asynchronous SSH commands in parallel while inducing little to no load on the system. The intended usage is completely programmatic and non-interactive.
To meet these goals, API driven solutions are preferred first and foremost. This frees up developers to drive the library via any method desired, be that environment variables, CI driven tasks, command line tools, existing OpenSSH or new configuration files, from within an application, and so on.
Comparison With Alternatives
There are not many alternatives for SSH libraries in Python. Of the few that do exist, here is how they compare with parallel-ssh.
As always, it is best to use a tool that is suited to the task at hand. parallel-ssh is a library for programmatic and non-interactive use - see Design And Goals. If requirements do not match what it provides, it is best not used. The same applies to the tools described below.
Paramiko
The default SSH client library in the parallel-ssh <= 1.6.x series.
Pure Python code, while having native extensions as dependencies, with poor performance and numerous bugs compared to both OpenSSH binaries and the libssh2 based native clients in parallel-ssh 1.2.x and above. Recent versions have regressed in performance and have blocker issues.
It does not support non-blocking mode, so to make it non-blocking monkey patching must be used which affects all other uses of the Python standard library. However, some functionality like Kerberos (GSS-API) authentication is not currently provided by other libraries.
asyncssh
Python 3 only client library built on the asyncio framework. License (EPL) is not compatible with GPL, BSD or other open source licenses and combined works cannot be distributed.
Therefore unsuitable for use in many projects, including parallel-ssh.
Fabric
Port of Capistrano from Ruby to Python. Intended for command line use and is heavily systems administration oriented rather than a non-interactive library. Same maintainer as Paramiko.
Uses Paramiko and suffers from the same limitations. Moreover, it uses threads for parallelisation while not being thread safe, and exhibits very poor performance and extremely high CPU usage even for a limited number of hosts - 1 to 10 - with scaling limited to one core.
Library API is non-standard, poorly documented and with numerous issues as API use is not intended.
Ansible
A configuration management and automation tool that makes use of SSH remote commands. Uses, in parts, both Paramiko and OpenSSH binaries.
Similarly to Fabric, uses threads for parallelisation and suffers from the poor scaling that this model offers.
See The State of Python SSH Libraries for what to expect from scaling SSH with threads, as compared to non-blocking I/O with parallel-ssh.
Again similar to Fabric, its intended and documented use is interactive via command line rather than library API based. It may, however, be an option if Ansible is already being used for automation purposes with existing playbooks, the number of hosts is small, and when the use case is interactive via command line.
parallel-ssh is, on the other hand, a suitable option for Ansible as an SSH client that would improve its parallel SSH performance significantly.
ssh2-python
Wrapper to the libssh2 C library. Used by parallel-ssh as of 1.2.0 and is by the same author.
Does not do parallelisation out of the box but can be made parallel via Python’s threading library relatively easily. As it is a wrapper to a native library that releases Python’s GIL, it can scale to multiple cores.
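As a rough illustration of the threading approach described above, the sketch below runs one blocking ssh2-python session per host on a thread pool. The ssh_exec and run_on_hosts names are hypothetical, and SSH agent authentication as the current user is an assumption. ssh2-python is imported lazily inside ssh_exec so the module itself loads without it.

```python
from concurrent.futures import ThreadPoolExecutor


def ssh_exec(host, command='uname', port=22, user=None):
    """Run one command on one host with ssh2-python in blocking mode."""
    # Imported here so this module can be loaded without ssh2-python
    import getpass
    import socket
    from ssh2.session import Session

    sock = socket.create_connection((host, port))
    try:
        session = Session()
        session.handshake(sock)
        # ASSUMPTION: an SSH agent holds a key the server accepts
        session.agent_auth(user or getpass.getuser())
        channel = session.open_session()
        channel.execute(command)
        chunks = []
        size, data = channel.read()
        while size > 0:
            chunks.append(data)
            size, data = channel.read()
        channel.close()
        return b''.join(chunks).decode('utf-8'), channel.get_exit_status()
    finally:
        sock.close()


def run_on_hosts(hosts, run_one=ssh_exec, max_workers=10):
    """Call run_one(host) for every host on a pool of native threads.

    Returns a dict of host -> result, so duplicate host names collapse
    into a single entry.
    """
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        return dict(zip(hosts, pool.map(run_one, hosts)))
```

Calling run_on_hosts(['myhost1', 'myhost2']) would return a dict of host name to (output, exit status). Passing a different run_one callable allows the pool logic to be exercised without any servers.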
parallel-ssh uses ssh2-python in its native non-blocking mode with event loop and co-operative sockets provided by gevent for an extremely high performance library without the side-effects of monkey patching - see benchmarks.
In addition, parallel-ssh uses native threads to offload CPU blocked tasks like authentication in order to scale to multiple cores while still remaining non-blocking for network I/O.
pssh.clients.native.SSHClient is a single host natively non-blocking client for users that do not need parallel capabilities but still want a non-blocking client with native code performance.
Out of all the available Python SSH libraries, libssh2 and ssh2-python have been shown, see benchmarks above, to perform the best with the least resource utilisation and, ironically for a native code extension, the fewest dependencies: only the libssh2 C library and its dependencies, which are included in binary wheels.
However, it lacks support for some SSH features present elsewhere like GSS-API and certificate authentication.
Scaling
Some guidelines on scaling parallel-ssh and pool size numbers.
In general, long lived commands with little or no output gathering will scale better. Pool sizes in the multiple thousands have been used successfully with little CPU overhead in the single thread running them in these use cases.
Conversely, many short lived commands with output gathering will not scale as well. In this use case, smaller pool sizes in the hundreds are likely to perform better with regards to CPU overhead in the event loop.
Multiple Python native threads, each of which can get its own event loop, may be used to scale this use case further as number of CPU cores allows. Note that parallel-ssh imports must be done within the target function of the newly started thread for it to receive its own event loop. gevent.get_hub() may be used to confirm that the worker thread event loop differs from the main thread.
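A minimal sketch of the multi-threaded approach follows, assuming hosts are simply split between threads. The chunk_hosts, worker and run_in_threads names are hypothetical. The parallel-ssh import is placed inside the thread target, as required for each thread to receive its own event loop.

```python
import threading


def chunk_hosts(hosts, num_chunks):
    """Split hosts into at most num_chunks lists, skipping empty ones."""
    chunks = [hosts[i::num_chunks] for i in range(num_chunks)]
    return [chunk for chunk in chunks if chunk]


def worker(hosts, command, results):
    """Thread target: run command on this thread's share of the hosts."""
    # Imported inside the target function so that this thread receives its
    # own event loop. gevent.get_hub() could be called here and compared
    # with the main thread's hub to confirm that they differ.
    from pssh.clients import ParallelSSHClient

    client = ParallelSSHClient(hosts)
    output = client.run_command(command, return_list=True)
    client.join(output, consume_output=True)
    thread_name = threading.current_thread().name
    for host_out in output:
        results.append((thread_name, host_out.host, host_out.exit_code))


def run_in_threads(hosts, command, num_threads=2):
    """Run command on all hosts using one client per native thread."""
    results = []  # list.append is safe to call from several threads in CPython
    threads = [
        threading.Thread(target=worker, args=(chunk, command, results))
        for chunk in chunk_hosts(hosts, num_threads)
    ]
    for thread in threads:
        thread.start()
    for thread in threads:
        thread.join()
    return results
```

For example, run_in_threads(['myhost1', 'myhost2', 'myhost3'], 'uname') would use two threads, each with its own client and event loop.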
Gathering is highlighted here as output generation does not affect scaling. Only when output is gathered either over multiple still running commands, or while more commands are being triggered, is overhead increased.
Technical Details
To understand why this is, consider that in co-operative multi tasking, which is being used in this project via the gevent library, a co-routine (greenlet) needs to yield the event loop to allow others to execute - co-operation. When one co-routine is constantly grabbing the event loop in order to gather output, or when co-routines are constantly trying to start new short-lived commands, it causes contention with other co-routines that also want to use the event loop.
This manifests itself as increased CPU usage in the process running the event loop and reduced performance with regards to scaling improvements from increasing pool size.
On the other end of the spectrum, long lived remote commands that generate no output only need the event loop at the start, when they are establishing connections, and at the end, when they are finished and need to gather exit codes, which results in practically zero CPU overhead at any time other than start or end of command execution.
Output generation is done remotely and has no effect on the event loop until output is gathered - output buffers are iterated on. Only at that point does the event loop need to be held.
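The contention described above can be illustrated with a toy round-robin scheduler built on plain Python generators. This is an analogy only and not how gevent is implemented. The long_lived, output_gatherer and run_loop names are hypothetical, and each yield stands for a co-routine giving the event loop back.

```python
from collections import Counter, deque


def long_lived(name):
    """Quiet command: needs the loop only to connect and to get exit code."""
    yield name + ': connect'
    yield name + ': exit code'


def output_gatherer(name, lines):
    """Command whose output is gathered: needs the loop for every read."""
    yield name + ': connect'
    for i in range(lines):
        yield '%s: read line %d' % (name, i)
    yield name + ': exit code'


def run_loop(tasks):
    """Run each task until its next yield, then move it to the back."""
    queue = deque(tasks.items())
    turns = Counter()
    while queue:
        name, task = queue.popleft()
        try:
            next(task)
        except StopIteration:
            continue  # task finished, drop it from the loop
        turns[name] += 1
        queue.append((name, task))
    return turns


turns = run_loop({
    'quiet': long_lived('quiet'),
    'chatty': output_gatherer('chatty', 5),
})
print(dict(turns))  # {'quiet': 2, 'chatty': 7}
```

The quiet task takes two turns of the loop regardless of how long the remote command runs, while the gathering task takes one extra turn per line read. That is the overhead which grows with pool size.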
User’s group
There is a public ParallelSSH Google group set up for this purpose - both posting and viewing are open to the public.