python-grid5000 is a Python package wrapping the Grid’5000 REST API. You can use it as a library in your Python project, or explore the Grid’5000 resources interactively using the embedded shell.

Warning

The code is currently being developed heavily. Jump to the contributing section if you want to be involved.

1 Thanks

The core code is borrowed from python-gitlab with small adaptations to conform to the Grid’5000 API models (with an ’s’!).

2 Contributing

  • To contribute, you can drop me an email or open an issue for a bug report or feature request.
  • There are many areas where this can be improved; some of them are listed here:

3 Comparison with …

  • RESTfully: It consumes REST APIs following the HATEOAS principles, which allows the client to fully discover the available resources and actions. Most of the G5K API follows these principles but some parts, for instance the Storage API, don’t. Thus RESTfully isn’t compatible with all the features offered by the Grid’5000 API. It’s a Ruby library. python-grid5000 borrows its friendly syntax for resource browsing, but in Python.
  • Execo: Written in Python. Its api module gathers many utility functions leveraging the Grid’5000 API. Resources aren’t exposed through a friendly syntax; instead, functions for some classical operations are exposed (mainly getters). It has a convenient way of caching the reference API. python-grid5000 is a resource-oriented wrapper around the Grid’5000 API that seeks 100% coverage.
  • Raw requests: requests is the reference HTTP library in Python. It is good for prototyping but low-level. python-grid5000 encapsulates this library.

4 Installation and examples

  • Please refer to https://api.grid5000.fr/doc/4.0/reference/spec.html for the complete specification.
  • All the examples are exported in the examples subdirectory so you can easily test and adapt them.
  • The configuration is read from a configuration file located in the home directory (it should be compatible with the RESTfully one). It can be created as follows:
  • When accessing the API from outside of Grid’5000 (e.g. your local workstation), you need to specify the following configuration file:
echo '
username: MYLOGIN
password: MYPASSWORD
' > ~/.python-grid5000.yaml
  • When accessing the API from a Grid’5000 frontend, providing the username and password is optional. Nevertheless, you’ll need to deal with SSL verification by specifying the path to the certificate to use:
echo '
verify_ssl: /etc/ssl/certs/ca-certificates.crt
' > ~/.python-grid5000.yaml
  • Using a virtualenv is recommended (Python 3.5+ is required):
virtualenv -p python3 venv
source venv/bin/activate
pip install python-grid5000
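
The configuration files shown above can also be written from Python. The following is a minimal sketch, not part of python-grid5000: `write_config` is a hypothetical helper, and creating the file with mode 0600 is a precaution assumed here because the file holds a plain-text password. Only the `username`, `password` and `verify_ssl` keys come from the text above.

```python
import json
import os
import tempfile


def write_config(path, username=None, password=None, verify_ssl=None):
    """Write a YAML configuration file (hypothetical helper)."""
    settings = [("username", username),
                ("password", password),
                ("verify_ssl", verify_ssl)]
    # json.dumps yields double-quoted strings, which are valid YAML scalars
    lines = ["%s: %s" % (key, json.dumps(value))
             for key, value in settings if value is not None]
    # Mode 0o600 applies when the file is created (assumed precaution)
    fd = os.open(path, os.O_WRONLY | os.O_CREAT | os.O_TRUNC, 0o600)
    with os.fdopen(fd, "w") as handle:
        handle.write("\n".join(lines) + "\n")
    return path


# Write to a temporary directory here so that no real file is overwritten
demo_path = write_config(
    os.path.join(tempfile.mkdtemp(), ".python-grid5000.yaml"),
    username="MYLOGIN", password="MYPASSWORD")
with open(demo_path) as handle:
    print(handle.read())
```

To write the real file, pass a path under your home directory instead of the temporary one.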

4.1 Grid’5000 shell

If you call grid5000 on the command line you should land in an IPython shell. Before starting, the file $HOME/.python-grid5000.yaml will be loaded.

$ grid5000

Python 3.6.5 (default, Jun 17 2018, 21:32:15)
Type 'copyright', 'credits' or 'license' for more information
IPython 7.3.0 -- An enhanced Interactive Python. Type '?' for help.

In [1]: gk.sites.list()
Out[1]:
[<Site uid:grenoble>,
 <Site uid:lille>,
 <Site uid:luxembourg>,
 <Site uid:lyon>,
 <Site uid:nancy>,
 <Site uid:nantes>,
 <Site uid:rennes>,
 <Site uid:sophia>]

In [2]: # gk is your entry point

4.2 Reference API

4.2.1 Get node information

import logging
import os

from grid5000 import Grid5000


logging.basicConfig(level=logging.DEBUG)

conf_file = os.path.join(os.environ.get("HOME"), ".python-grid5000.yaml")
gk = Grid5000.from_yaml(conf_file)

node_info = gk.sites["nancy"].clusters["grisou"].nodes["grisou-1"]
print("grisou-1 has {threads} threads and has {ram} bytes of RAM".format(
    threads=node_info.architecture["nb_threads"],
    ram=node_info.main_memory["ram_size"]))
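
The `ram_size` value printed above is a raw byte count. A small formatting helper makes it easier to read. This is only a sketch: `human_size` is a hypothetical name, and the sample value is made up for illustration rather than taken from the API.

```python
def human_size(num_bytes):
    """Format a byte count using binary units (KiB, MiB, GiB, ...)."""
    size = float(num_bytes)
    for unit in ("B", "KiB", "MiB", "GiB", "TiB"):
        # Stop at the first unit that fits, or at the largest one
        if size < 1024 or unit == "TiB":
            return "%.1f %s" % (size, unit)
        size /= 1024


# Illustrative value only: 128 GiB expressed in bytes
print(human_size(128 * 1024 ** 3))
```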

4.2.2 Get Versions of resources

import logging
import os

from grid5000 import Grid5000


logging.basicConfig(level=logging.DEBUG)

conf_file = os.path.join(os.environ.get("HOME"), ".python-grid5000.yaml")
gk = Grid5000.from_yaml(conf_file)

root_versions = gk.root.get().versions.list()
print(root_versions)

rennes = gk.sites["rennes"]
site_versions = rennes.versions.list()
print(site_versions)

cluster = rennes.clusters["paravance"]
cluster_versions = cluster.versions.list()
print(cluster_versions)

node_versions = cluster.nodes["paravance-1"].versions.list()
print(node_versions)

4.3 Monitoring API

4.3.1 Get Statuses of resources

import logging
import os

from grid5000 import Grid5000


logging.basicConfig(level=logging.DEBUG)

conf_file = os.path.join(os.environ.get("HOME"), ".python-grid5000.yaml")
gk = Grid5000.from_yaml(conf_file)

rennes = gk.sites["rennes"]
site_statuses = rennes.status.list()
print(site_statuses)

cluster = rennes.clusters["paravance"]
cluster_statuses = cluster.status.list()
print(cluster_statuses)

4.4 Job API

4.4.1 Job filtering

import logging
import os

from grid5000 import Grid5000


logging.basicConfig(level=logging.DEBUG)

conf_file = os.path.join(os.environ.get("HOME"), ".python-grid5000.yaml")
gk = Grid5000.from_yaml(conf_file)

# state=running will be placed in the query params
running_jobs = gk.sites["rennes"].jobs.list(state="running")
print(running_jobs)

# get a specific job by its uid
job = gk.sites["rennes"].jobs.get("424242")
print(job)
# or using the bracket notation
job = gk.sites["rennes"].jobs["424242"]
print(job)
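
As the comment above notes, keyword arguments given to `list()` end up as query parameters. The sketch below shows what such a query string looks like, using only the standard library. The base URL and path are assumptions for illustration; the source does not show the exact URL that python-grid5000 builds.

```python
from urllib.parse import urlencode

# Assumed base URL and path, for illustration only
base = "https://api.grid5000.fr/stable/sites/rennes/jobs"
filters = {"state": "running", "name": "pyg5k"}

# Each key/value pair becomes one key=value entry in the query string
url = "%s?%s" % (base, urlencode(filters))
print(url)
```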

4.4.2 Submit a job

import logging
import os
import time

from grid5000 import Grid5000


logging.basicConfig(level=logging.DEBUG)

conf_file = os.path.join(os.environ.get("HOME"), ".python-grid5000.yaml")
gk = Grid5000.from_yaml(conf_file)

# This is equivalent to gk.sites.get("rennes")
site = gk.sites["rennes"]

job = site.jobs.create({"name": "pyg5k",
                        "command": "sleep 3600"})

while job.state != "running":
    job.refresh()
    print("Waiting for the job [%s] to be running" % job.uid)
    time.sleep(10)

print(job)
print("Assigned nodes : %s" % job.assigned_nodes)
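
The loop above waits indefinitely, which is a problem if the job ends in an error state before it ever runs. A hedged sketch of a bounded wait follows. `wait_for_state` and `FakeJob` are hypothetical names: `FakeJob` only imitates the `state` attribute and `refresh()` method used above so that the example runs without touching the API, and the set of failure states is an assumption.

```python
import time


def wait_for_state(resource, target, failed=("error", "terminated"),
                   timeout=600, interval=10, sleep=time.sleep):
    """Refresh `resource` until its state equals `target`.

    Raises RuntimeError on a failure state and TimeoutError once the
    deadline has passed. `resource` needs `state` and `refresh()`.
    """
    deadline = time.monotonic() + timeout
    while resource.state != target:
        if resource.state in failed:
            raise RuntimeError("unexpected state: %s" % resource.state)
        if time.monotonic() > deadline:
            raise TimeoutError("still %s after %ss" % (resource.state, timeout))
        sleep(interval)
        resource.refresh()
    return resource


class FakeJob:
    """Stand-in for a job object: walks through a list of states."""

    def __init__(self, states):
        self._states = list(states)
        self.state = self._states.pop(0)

    def refresh(self):
        if self._states:
            self.state = self._states.pop(0)


# The sleep function is replaced so that the demo returns immediately
job = wait_for_state(FakeJob(["waiting", "launching", "running"]),
                     "running", sleep=lambda seconds: None)
print(job.state)
```

With a real job object, the call would simply be `wait_for_state(job, "running")`.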

4.5 Deployment API

4.5.1 Deploy an environment

import logging
import os
import time

from grid5000 import Grid5000


logging.basicConfig(level=logging.DEBUG)

conf_file = os.path.join(os.environ.get("HOME"), ".python-grid5000.yaml")
gk = Grid5000.from_yaml(conf_file)

# This is equivalent to gk.sites.get("rennes")
site = gk.sites["rennes"]

job = site.jobs.create({"name": "pyg5k",
                        "command": "sleep 3600",
                        "types": ["deploy"]})

while job.state != "running":
    job.refresh()
    print("Waiting for the job [%s] to be running" % job.uid)
    time.sleep(10)

print("Assigned nodes : %s" % job.assigned_nodes)

deployment = site.deployments.create({"nodes": job.assigned_nodes,
                                      "environment": "debian9-x64-min"})
# To get SSH access to your nodes you can pass your public key
#
# from pathlib import Path
#
# key_path = Path.home().joinpath(".ssh", "id_rsa.pub")
#
# deployment = site.deployments.create({"nodes": job.assigned_nodes,
#                                       "environment": "debian9-x64-min",
#                                       "key": key_path.read_text()})

while deployment.status != "terminated":
    deployment.refresh()
    print("Waiting for the deployment [%s] to be finished" % deployment.uid)
    time.sleep(10)

print(deployment.result)

4.6 Storage API

4.6.1 Get Storage accesses

import logging
import os

from grid5000 import Grid5000


logging.basicConfig(level=logging.DEBUG)

conf_file = os.path.join(os.environ.get("HOME"), ".python-grid5000.yaml")
gk = Grid5000.from_yaml(conf_file)

print(gk.sites["rennes"].storage["msimonin"].access.list())

4.6.2 Set storage accesses (e.g. for VMs)

from netaddr import IPNetwork
import logging
import os
import time

from grid5000 import Grid5000


logging.basicConfig(level=logging.DEBUG)

conf_file = os.path.join(os.environ.get("HOME"), ".python-grid5000.yaml")
gk = Grid5000.from_yaml(conf_file)
site = gk.sites["rennes"]

job = site.jobs.create({"name": "pyg5k",
                        "command": "sleep 3600",
                        "resources": "slash_22=1+nodes=1"})

while job.state != "running":
    job.refresh()
    print("Waiting for the job [%s] to be running" % job.uid)
    time.sleep(5)

subnet = job.resources_by_type['subnets'][0]
ip_network = [str(ip) for ip in IPNetwork(subnet)]

# create access for all IPs in the subnet
access = site.storage["msimonin"].access.create({"ipv4": ip_network,
                                                  "termination": {"job": job.uid,
                                                                  "site": site.uid}})

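The list comprehension above expands the reserved subnet into one string per address, so a /22 yields 1024 entries. The same expansion can be sketched with the standard-library `ipaddress` module instead of `netaddr`. The subnet value below is a made-up placeholder, not one returned by the API.

```python
import ipaddress

# Placeholder subnet for illustration
subnet = "10.158.0.0/22"

# Iterating over the network yields every address it contains
ip_network = [str(ip) for ip in ipaddress.ip_network(subnet)]

print(len(ip_network))
print(ip_network[0], ip_network[-1])
```
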
4.7 Vlan API

4.7.1 Get vlan(s)

import logging
import os

from grid5000 import Grid5000


logging.basicConfig(level=logging.DEBUG)

conf_file = os.path.join(os.environ.get("HOME"), ".python-grid5000.yaml")
gk = Grid5000.from_yaml(conf_file)

site = gk.sites["rennes"]

# Get all vlans
vlans = site.vlans.list()
print(vlans)

# Get a specific vlan
vlan = site.vlans.get("4")
print(vlan)

vlan = site.vlans["4"]
print(vlan)

# Get vlan of some nodes
print(site.vlansnodes.submit({"nodes": ["paravance-1.rennes.grid5000.fr", "paravance-2.rennes.grid5000.fr"]}))


# Get nodes in vlan
print(site.vlans["4"].nodes.list())

4.7.2 Set nodes in vlan

  • Putting primary interface in a vlan

    import logging
    import os
    import time
    
    from grid5000 import Grid5000
    
    
    logging.basicConfig(level=logging.DEBUG)
    
    conf_file = os.path.join(os.environ.get("HOME"), ".python-grid5000.yaml")
    gk = Grid5000.from_yaml(conf_file)
    site = gk.sites["rennes"]
    
    job = site.jobs.create({"name": "pyg5k",
                            "command": "sleep 3600",
                            "resources": "{type='kavlan'}/vlan=1+nodes=1",
                            "types": ["deploy"]})
    
    while job.state != "running":
        job.refresh()
        print("Waiting for the job [%s] to be running" % job.uid)
        time.sleep(5)
    
    deployment = site.deployments.create({"nodes": job.assigned_nodes,
                                          "environment": "debian9-x64-min",
                                          "vlan": job.resources_by_type["vlans"][0]})
    
    while deployment.status != "terminated":
        deployment.refresh()
        print("Waiting for the deployment [%s] to be finished" % deployment.uid)
        time.sleep(10)
    
    print(deployment.result)
    
  • Putting the secondary interface in a vlan

    import logging
    import os
    import time
    
    from grid5000 import Grid5000
    
    
    logging.basicConfig(level=logging.DEBUG)
    
    
    def _to_network_address(host, interface):
        """Translate a host to a network address
        e.g:
        paranoia-20.rennes.grid5000.fr -> paranoia-20-eth2.rennes.grid5000.fr
        """
        splitted = host.split('.')
        splitted[0] = splitted[0] + "-" + interface
    
        return ".".join(splitted)
    
    
    conf_file = os.path.join(os.environ.get("HOME"), ".python-grid5000.yaml")
    gk = Grid5000.from_yaml(conf_file)
    
    site = gk.sites["rennes"]
    
    job = site.jobs.create({"name": "pyg5k",
                            "command": "sleep 3600",
                            "resources": "{type='kavlan'}/vlan=1+{cluster='paranoia'}nodes=1",
                            "types": ["deploy"]
    })
    
    while job.state != "running":
        job.refresh()
        print("Waiting for the job [%s] to be running" % job.uid)
        time.sleep(5)
    
    vlanid = job.resources_by_type["vlans"][0]
    
    # we hard code the interface but this can be discovered in the node info
    # TODO: write the code here to discover
    nodes = [_to_network_address(n, "eth2") for n in job.assigned_nodes]
    print(nodes)
    
    # set in vlan
    site.vlans[vlanid].submit({"nodes": nodes})
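
The host-name translation can be tried in isolation, without a reservation. The sketch below repeats the helper from the example above under a hypothetical public name, `to_network_address`, and applies it to made-up host names. The interface name is still hard-coded by the caller.

```python
def to_network_address(host, interface):
    """Insert the interface name after the short host name."""
    parts = host.split(".")
    parts[0] = parts[0] + "-" + interface
    return ".".join(parts)


# Made-up host names, shaped like the entries of job.assigned_nodes
hosts = ["paranoia-20.rennes.grid5000.fr", "paranoia-21.rennes.grid5000.fr"]
print([to_network_address(h, "eth2") for h in hosts])
```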
    

4.8 Metrics API

4.8.1 Get the timeseries corresponding to a job

Credits to lturpin.

import logging
import os

from grid5000 import Grid5000


logging.basicConfig(level=logging.DEBUG)


def get_job_consumption(job_id, gk, site):
    metrics = gk.sites[site].metrics
    job = gk.sites[site].jobs[job_id]
    # nodes as list : "cluster-number.site.grid5000.fr"
    nodes_dom = job.assigned_nodes
    # nodes as list : "cluster-number"
    nodes = map(lambda node_dom: node_dom.split('.')[0], nodes_dom)
    # nodes as string : "cluster-number,cluster-number,..."
    nodes_str = ','.join(nodes)

    start = job.started_at
    end = job.stopped_at
    kwargs = {
        "only": nodes_str,
        "resolution": 1,
        "from": start,
        "to": end
    }
    timeseries = metrics["power"].timeseries.list(**kwargs)
    return timeseries


conf_file = os.path.join(os.environ.get("HOME"), ".python-grid5000.yaml")
gk = Grid5000.from_yaml(conf_file)

timeseries = get_job_consumption("1092446", gk, "lyon")
print(timeseries)

4.8.2 Get some timeseries (and plot them)

For this example you’ll need matplotlib, seaborn and pandas.

import logging
import os

from grid5000 import Grid5000

import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
import time

logging.basicConfig(level=logging.DEBUG)

conf_file = os.path.join(os.environ.get("HOME"), ".python-grid5000.yaml")
gk = Grid5000.from_yaml(conf_file)

metrics = gk.sites["lyon"].metrics
print("--- available metrics")
print(metrics.list())

print("---- power metric")
print(metrics["power"])

print("----- a timeserie")
now = time.time()
kwargs = {
    "only": "nova-1,nova-2,nova-3",
    "resolution": 1,
    "from": int(now - 600),
    "to": int(now)
}
timeseries = metrics["power"].timeseries.list(**kwargs)

# let's visualize this
df = pd.DataFrame()
for timeserie in timeseries:
    print(timeserie)
    timestamp = timeserie.timestamps
    value = timeserie.values
    measurement = timeserie.uid
    df = pd.concat([df, pd.DataFrame({
        "timestamp": timestamp,
        "value": value,
        "measurement": [measurement]*len(timestamp)
    })])

sns.relplot(data=df,
            x="timestamp",
            y="value",
            hue="measurement",
            kind="line")
plt.show()
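
A power time series can be reduced to a single energy figure by integrating it over time. The sketch below applies the trapezoidal rule to plain lists shaped like the `timestamps` and `values` attributes used above. It assumes timestamps in seconds and values in watts, which the source does not state, and `energy_joules` is a hypothetical helper.

```python
def energy_joules(timestamps, values):
    """Integrate power samples over time with the trapezoidal rule."""
    if len(timestamps) != len(values):
        raise ValueError("timestamps and values must have the same length")
    total = 0.0
    for i in range(1, len(timestamps)):
        dt = timestamps[i] - timestamps[i - 1]
        # Area of the trapezoid between two consecutive samples
        total += dt * (values[i] + values[i - 1]) / 2.0
    return total


# Made-up samples: constant 100 W for 3 s, then a ramp to 200 W over 1 s
print(energy_joules([0, 1, 2, 3, 4], [100, 100, 100, 100, 200]))
```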

4.9 More snippets

4.9.1 Site of a cluster

import logging
import os

from grid5000 import Grid5000


logging.basicConfig(level=logging.DEBUG)

clusters = ["dahu", "parasilo", "chetemi"]

conf_file = os.path.join(os.environ.get("HOME"), ".python-grid5000.yaml")
gk = Grid5000.from_yaml(conf_file)

sites = gk.sites.list()
matches = []
for site in sites:
    candidates = site.clusters.list()
    matching = [c.uid for c in candidates if c.uid in clusters]
    # a site may host several of the requested clusters
    for uid in matching:
        matches.append((site, uid))
        clusters.remove(uid)
print("We found the following matches %s" % matches)
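
The matching logic above can be exercised without the API by feeding it a plain mapping. In this sketch `find_sites` is a hypothetical helper, and the site-to-cluster mapping is sample data for illustration, not an authoritative inventory.

```python
def find_sites(clusters_by_site, wanted):
    """Return {cluster: site} for each wanted cluster found in the mapping."""
    remaining = set(wanted)
    found = {}
    for site, clusters in clusters_by_site.items():
        for cluster in clusters:
            if cluster in remaining:
                found[cluster] = site
                remaining.discard(cluster)
    return found


# Sample data only
sample = {
    "grenoble": ["dahu", "yeti"],
    "rennes": ["parasilo", "paravance"],
    "lille": ["chetemi", "chifflet"],
}
print(find_sites(sample, ["dahu", "parasilo", "chetemi"]))
```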

4.9.2 Get all jobs with a given name on all the sites

import logging
import os

from grid5000 import Grid5000


logging.basicConfig(level=logging.DEBUG)

NAME = "pyg5k"

conf_file = os.path.join(os.environ.get("HOME"), ".python-grid5000.yaml")
gk = Grid5000.from_yaml(conf_file)

sites = [gk.sites["rennes"], gk.sites["nancy"], gk.sites["grenoble"]]

# create some jobs
jobs = []
for site in sites:
    job = site.jobs.create({"name": NAME,
                            "command": "sleep 3600"})
    jobs.append(job)

_jobs = []
for site in sites:
    _jobs.append((site.uid, site.jobs.list(name=NAME,
                                           state="waiting,launching,running")))

print("We found %s" % _jobs)

# deleting the jobs
for job in jobs:
    job.delete()

4.9.3 Caching API responses

The Grid’5000 reference API is static, so requests against it can be sped up considerably by caching. Currently python-grid5000 doesn’t do caching out of the box; it leaves that to the consuming application. There are many ways to implement a cache. Among them, the LRU cache (https://docs.python.org/3/library/functools.html#functools.lru_cache) provides an in-memory cache but gives you little control over the cached entries. The ring library (https://ring-cache.readthedocs.io/en/stable/) implements different cache backends (notably cross-process caches) and gives you control over the cached objects. The following example uses ring with a diskcache backend:

import logging
import threading
import os

import diskcache
from grid5000 import Grid5000
import ring


_api_lock = threading.Lock()
# Keep track of the api client
_api_client = None

storage = diskcache.Cache('cachedir')

def get_api_client():
    """Gets the reference to the API client (singleton)."""
    global _api_client
    with _api_lock:
        if not _api_client:
            conf_file = os.path.join(os.environ.get("HOME"),
                                     ".python-grid5000.yaml")
            _api_client = Grid5000.from_yaml(conf_file)

        return _api_client


@ring.disk(storage)
def get_sites_obj():
    """Get all the sites."""
    gk = get_api_client()
    return gk.sites.list()


@ring.disk(storage)
def get_all_clusters_obj():
    """Get all the clusters."""
    sites = get_sites_obj()
    clusters = []
    for site in sites:
        # should we cache the list as well?
        clusters.extend(site.clusters.list())
    return clusters


if __name__ == "__main__":
    logging.basicConfig(level=logging.DEBUG)
    clusters = get_all_clusters_obj()
    print(clusters)
    print("Value currently stored in the cache")
    print(get_all_clusters_obj.get())
    print("Calling again the function is now faster")
    clusters = get_all_clusters_obj()
    print(clusters)
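
For comparison, here is a sketch of the in-memory `functools.lru_cache` approach mentioned above. `fetch_sites` is a hypothetical stand-in for an API call such as `gk.sites.list()`: it returns fixed data and counts its invocations so that the effect of the cache is visible offline. The cache lives only as long as the process, and the available controls are limited to `cache_info()` and `cache_clear()`.

```python
import functools

# Counts how many times the (simulated) API is really called
calls = {"count": 0}


@functools.lru_cache(maxsize=None)
def fetch_sites():
    """Hypothetical stand-in for a call to the reference API."""
    calls["count"] += 1
    return ("grenoble", "lille", "rennes")


fetch_sites()
fetch_sites()  # served from the cache, the counter does not move
print(calls["count"])
print(fetch_sites.cache_info())

# Dropping the whole cache forces a new call
fetch_sites.cache_clear()
fetch_sites()
print(calls["count"])
```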

4.9.4 Using Grid’5000 client certificates

python-grid5000 can also be used as a trusted client with a Grid’5000 internal certificate. In this mode, users can pass the g5k_user argument to most calls to specify which user the API call should be made as. When g5k_user is not specified, API calls are made as the anonymous user, whose access is limited to the Grid’5000 reference API. In this mode python-grid5000 does not store any login information, so g5k_user must be provided explicitly on every call that requires one.

import logging

from grid5000 import Grid5000

logging.basicConfig(level=logging.DEBUG)

gk = Grid5000(
   uri="https://api-ext.grid5000.fr/stable/",
   sslcert="/path/to/ssl/certfile.cert",
   sslkey="/path/to/ssl/keyfile.key"
   )

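gk.sites.list()

# The original snippet does not define the site; "rennes" is assumed here,
# as in the other examples.
site = gk.sites["rennes"]

job = site.jobs.create({"name": "pyg5k",
                        "command": "sleep 3600"},
                       g5k_user="auser1")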


# Since the 'anonymous' user cannot inspect jobs, the following call raises
# grid5000.exceptions.Grid5000AuthenticationError: 401 Unauthorized
job.refresh()

# Both of the following calls work, since any user can request information on any job.
job.refresh(g5k_user='auser1')
job.refresh(g5k_user='auser2')

# Some operations can only be performed by the job's creator.
# The following call raises
# grid5000.exceptions.Grid5000DeleteError: 403 Unauthorized
job.delete(g5k_user='auser2')

# This call works as expected
job.delete(g5k_user='auser1')

Files for python-grid5000, version 0.3.0: python_grid5000-0.3.0-py3-none-any.whl (21.5 kB, wheel, py3)
