
xpublish-host

A collection of tools and standards for deploying xpublish instances.

Why?

With ~50 netCDF-based datasets to be published through xpublish, Axiom needed a standard way to configure each of these deployments. We could have created a single repository defining every individual xpublish deployment, we could have created individual repositories for each dataset, or we could have done something in between. Instead, we abstracted out the parts common to each deployment and put them here in xpublish-host. This prevents the re-implementation of things like authentication (TBD), logging, and metrics, and allows data engineers to focus on the data rather than the deployment.

Goals

  • Standardize the configuration of an xpublish deployment (plugins, ports, cache, dask clusters, datasets, etc.) using config files and environmental variables, not python code.
  • Standardize monitoring and metrics of an xpublish deployment.
  • Provide a pre-built Docker image to run an opinionated xpublish deployment.

Ideas

xpublish-host makes no assumptions about the datasets you want to publish through xpublish; it only requires the path to an importable python function that returns the object you want passed in as an argument to xpublish.Rest. This will allow xpublish-host to support dataset types other than xarray.Dataset in the future, such as Parquet files.
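The loader concept can be illustrated with a short sketch. This is not xpublish-host's actual resolution code; it only shows how a dotted import path like the one you provide in the configuration could be turned into a callable (resolve_loader is a hypothetical helper):

```python
from importlib import import_module

def resolve_loader(path):
    """Resolve a dotted "package.module.function" path to its callable.

    Hypothetical helper: xpublish-host's real resolution logic may differ.
    """
    module_path, _, attr = path.rpartition(".")
    return getattr(import_module(module_path), attr)

# Any importable function works; here we resolve a stdlib function
# in place of a real dataset loader.
loads = resolve_loader("json.loads")
print(loads('{"id": "dataset_id"}'))  # {'id': 'dataset_id'}
```

Whatever the resolved function returns is handed to xpublish.Rest as-is, which is what keeps xpublish-host agnostic about dataset types.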

We maintain an xpublish-inventory repository that defines YAML configurations and python functions for each xpublish dataset we want to publish. Those YAML configurations and python functions are installed as a library into the xpublish-host container on deployment. There are better ways to do this (auto-discovery), but you have to start somewhere.

Installation

Most users will not need to install xpublish_host directly as a library but instead will use the Docker image to deploy an xpublish instance. If you want to use the xpublish_host tools and config objects directly in python code, you can of course install it:

If you are a conda user:

conda install --channel conda-forge xpublish_host

or, if you are a pip user

pip install xpublish_host

Usage

Configuration

The configuration is managed using Pydantic BaseSettings and GoodConf for loading configuration from files.

The xpublish_host configuration can be set in a few ways:

  • Environmental variables - prefixed with XPUB_, they map directly to the pydantic settings classes;
  • Environment files - load environmental variables from a file. Set XPUB_ENV_FILES to control the location of this file. See the Pydantic docs for more information;
  • Configuration files (JSON and YAML) - GoodConf-based configuration files. When using the xpublish_host.config.serve helper, this file can be set by defining XPUB_CONFIG_FILE;
  • Python arguments (API only) - when using xpublish-host as a library, you can use the args/kwargs of each configuration object to control your xpublish instance.
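As a sketch of the environment-variable convention (not xpublish-host's actual parsing, which Pydantic handles internally), XPUB_-prefixed variables map onto setting names roughly like this; the specific variable names used below are illustrative assumptions:

```python
PREFIX = "XPUB_"

def collect_settings(environ):
    """Strip the XPUB_ prefix and lowercase the remainder to get the
    pydantic field name (e.g. XPUB_PUBLISH_PORT -> publish_port).

    Sketch only: pydantic BaseSettings does this internally, along with
    type coercion and nested-model handling that is omitted here.
    """
    return {
        key[len(PREFIX):].lower(): value
        for key, value in environ.items()
        if key.startswith(PREFIX)
    }

env = {"XPUB_PUBLISH_HOST": "0.0.0.0", "XPUB_PUBLISH_PORT": "9000", "PATH": "/usr/bin"}
print(collect_settings(env))  # {'publish_host': '0.0.0.0', 'publish_port': '9000'}
```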

There are three Settings classes:

  • PluginConfig - configure xpublish plugins,
  • DatasetConfig - configure the datasets available to xpublish,
  • RestConfig - configure how the xpublish instance is run, including the PluginConfig and DatasetConfig.

The best way to get familiar with the available configuration options (until the documentation catches up) is to look at the actual configuration classes in xpublish_host/config.py and the tests in tests/test_config.py.

A full-featured configuration, including the defaults for each field, is as follows.

# These are passed into the `xpublish.Rest.serve` method to control how the
# server is run
publish_host: "0.0.0.0"
publish_port: 9000
log_level: debug

# Dask cluster configuration. Currently uses a LocalCluster.
# The keyword arguments are passed directly into `dask.distributed.LocalCluster`.
# Omitting cluster_config or setting it to null will load the defaults.
# Setting cluster_config to an empty dict will avoid using a dask cluster.
cluster_config:
  processes: true
  n_workers: 8
  threads_per_worker: 1
  memory_limit: 4GiB
  host: "0.0.0.0"
  scheduler_port: 0  # random port
  dashboard_address: 0.0.0.0:0  # random port
  worker_dashboard_address: 0.0.0.0:0  # random port

# Should xpublish discover and load plugins?
plugins_load_defaults: true

# Define any additional plugins. This is where you can override
# default plugins. These will replace any auto-discovered plugins.
# The keys here (pc1) are not important and are not used internally
plugins_config:
  pc1:
    module: xpublish.plugins.included.zarr.ZarrPlugin
    kwargs:
      dataset_router_prefix: /zarr

# Keyword arguments to pass into `xpublish.Rest` as app_kws
# i.e. xpublish.Rest(..., app_kws=app_config)
app_config:
  docs_url: /api
  openapi_url: /api.json

# Keyword arguments to pass into `xpublish.Rest` as cache_kws
# i.e. xpublish.Rest(..., cache_kws=cache_config)
cache_config:
  available_bytes: 1e11

# Define all of the datasets to load into the xpublish instance.
# The keys here (dc1) are not important and are not used internally
datasets_config:
  dc1:
    # The ID is used as the "key" of the dataset in `xpublish.Rest`
    # i.e. xpublish.Rest({ [dataset.id]: [loader_function_return] })
    id: dataset_id
    title: Dataset Title
    description: Dataset Description
    # Path to an importable python function that returns the dataset you want
    # to pass into `xpublish.Rest`
    loader: [python module path]
    # Arguments passed into the `loader` function
    args:
      - [loader arg1]
      - [loader arg2]
    # Keyword arguments passed into the `loader` function. See the `examples`
    # directory for more details on how this can be used.
    kwargs:
      t_axis: 'time'
      y_axis: 'lat'
      x_axis: 'lon'
      open_kwargs:
        parallel: false
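Assuming the direct XPUB_ prefix mapping described above (the exact variable names here are an assumption, not taken verbatim from the xpublish-host docs), the top-level scalar options could equivalently be set through an environment file:

```
# .env -- hypothetical variable names following the XPUB_ prefix convention
XPUB_PUBLISH_HOST=0.0.0.0
XPUB_PUBLISH_PORT=9000
XPUB_LOG_LEVEL=debug
```

Nested options such as plugins_config and datasets_config are easier to express in the YAML configuration file.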

API

To deploy an xpublish instance while pulling settings from a YAML file and environmental variables, use the serve function. This is what the Docker image uses under the hood.

import os

from xpublish_host.config import serve

# Pass a config file path directly
serve('config.yaml')

# ... or point to an environment file
os.environ['XPUB_ENV_FILES'] = '/home/user/.env'
serve()

# ... or point to a config file through the environment
os.environ['XPUB_CONFIG_FILE'] = 'config.yaml'
serve()

You can also use the RestConfig and DatasetConfig objects directly to serve datasets:

RestConfig

from xpublish_host.config import DatasetConfig, RestConfig

dc = DatasetConfig(
    id='id',
    title='title',
    description='description',
    loader='[python function path]',
)

rc = RestConfig(datasets_config={'ds': dc})
rc.load('[config_file]')  # optionally load a configuration file
rest = rc.setup()  # This returns an `xpublish.Rest` instance
rest.serve(
    host='0.0.0.0',
    port=9000,
    log_level='debug',
)

DatasetConfig

from xpublish_host.config import DatasetConfig

dc = DatasetConfig(
    id='id',
    title='title',
    description='description',
    loader='[python function path]',
)

# Keyword arguments are passed into RestConfig and can include all of the
# top level configuration options.
dc.serve()

CLI

There is a CLI command you can use to run an xpublish server and optionally pass in the path to a configuration file:

# Pass in a config file
python xpublish_host/config.py -c xpublish_host/examples/example.yaml

# Use ENV variables
XPUB_CONFIG_FILE=xpublish_host/examples/example.yaml python xpublish_host/config.py

Either way, xpublish will be running on port 9000 with two datasets: simple and kwargs. You can access the instance at http://[host]:9000/datasets/.

Docker

The Docker image by default loads a configuration file from /xpd/config.yaml and an environmental variable file from /xpd/.env. You can change the location of those files by setting the env variables XPUB_CONFIG_FILE and XPUB_ENV_FILES respectively.

docker build -t xpublish-host .

# Using default config path
docker run --rm -p 9000:9000 -v "$(pwd)/xpublish_host/examples/example.yaml:/xpd/config.yaml" xpublish-host

# Using ENV variables
docker run --rm -p 9000:9000 -e "XPUB_CONFIG_FILE=/xpd/xpublish_host/examples/example.yaml" xpublish-host

Either way, xpublish will be running on port 9000 with two datasets: simple and kwargs. You can access the instance at http://[host]:9000/datasets/.
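For longer-lived deployments, the same container can be described declaratively. The docker-compose service below is a hypothetical sketch equivalent to the first docker run command above (the service name and file layout are assumptions, not part of xpublish-host):

```
services:
  xpublish:
    image: xpublish-host
    ports:
      - "9000:9000"
    volumes:
      - ./xpublish_host/examples/example.yaml:/xpd/config.yaml
```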
