A collection of tools to standardize xpublish hosting
xpublish-host
A collection of tools and standards for deploying xpublish instances.
Why?
With ~50 netCDF-based datasets to be published through xpublish, Axiom needed a standard way to configure each of these deployments. We could have created a single repository that defined every individual xpublish deployment, we could have created individual repositories for each dataset, or we could have done something in the middle. We decided to abstract out the parts common to each deployment and put them here in xpublish-host. This prevents the re-implementation of things like authentication (tbd), logging, and metrics, and allows data engineers to focus on the data and not the deployment.
Goals
- Standardize the configuration of an xpublish deployment (plugins, ports, cache, dask clusters, datasets, etc.) using config files and environmental variables, not python code.
- Standardize monitoring and metrics of an xpublish deployment.
- Provide a pre-built Docker image to run an opinionated xpublish deployment.
Ideas
xpublish-host makes no assumptions about the datasets you want to publish through xpublish and only requires the path to an importable python function that returns the object you want to be passed in as an argument to xpublish.Rest. This will allow xpublish-host to support datasets in addition to xarray.Dataset in the future, such as Parquet files.
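To make the idea of an importable loader path concrete, here is a minimal sketch of how a dotted path could be resolved to a callable. The helper name resolve_loader and the package.module.function path format are assumptions for illustration only; the import logic xpublish-host actually uses may differ.

```python
import importlib
from typing import Any, Callable


def resolve_loader(path: str) -> Callable[..., Any]:
    """Resolve a dotted path such as 'package.module.function' to a callable.

    Hypothetical helper for illustration; not part of xpublish-host.
    """
    module_path, _, func_name = path.rpartition('.')
    if not module_path:
        raise ValueError(f'Expected a dotted path, got {path!r}')
    module = importlib.import_module(module_path)
    loader = getattr(module, func_name)
    if not callable(loader):
        raise TypeError(f'{path!r} is not callable')
    return loader


# Stand-in example using a standard library function. A real loader would
# return the object to publish, such as an xarray.Dataset.
dumps = resolve_loader('json.dumps')
print(dumps({'dataset': 'example'}))
```

Because only the path is stored in configuration, the function behind it can return any object that xpublish.Rest accepts.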
We maintain an xpublish-inventory repository that defines YAML configurations and python functions for each xpublish dataset we want to publish. Those YAML configurations and python functions are installed as a library into the xpublish-host container on deployment. There are better ways to do this (auto-discovery) but you have to start somewhere.
Installation
Most users will not need to install xpublish_host directly as a library but instead will use the Docker image to deploy an xpublish instance. If you want to use the xpublish_host tools and config objects directly in python code, you can of course install it:
For conda users you can
conda install --channel conda-forge xpublish_host
or, if you are a pip user
pip install xpublish_host
Usage
Configuration
The configuration is managed using Pydantic BaseSettings and GoodConf for loading configuration from files.
The xpublish_host configuration can be set in a few ways:
- Environmental variables - prefixed with XPUB_, they map directly to the pydantic settings classes.
- Environment files - Load environmental variables from a file. Uses XPUB_ENV_FILES to control the location of this file if it is defined. See the Pydantic docs for more information.
- Configuration files (JSON and YAML) - GoodConf based configuration files. When using the xpublish_host.config.serve helper this file can be set by defining XPUB_CONFIG_FILE.
- Python arguments (API only) - When using xpublish-host as a library you can use the args/kwargs of each configuration object to control your xpublish instance.
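The environmental variable option can be pictured with a small standard library sketch. It only shows the general idea of stripping the XPUB_ prefix and lower-casing the remainder to get a field name. The variable names XPUB_PUBLISH_PORT and XPUB_LOG_LEVEL are assumptions derived from the field names publish_port and log_level, and the helper collect_prefixed_settings is hypothetical. The real parsing and type validation are done by Pydantic and are not reproduced here.

```python
import os
from typing import Dict, Mapping, Optional

PREFIX = 'XPUB_'


def collect_prefixed_settings(
    environ: Optional[Mapping[str, str]] = None,
) -> Dict[str, str]:
    """Return {field_name: raw_value} for every variable starting with XPUB_.

    Hypothetical helper; values stay as raw strings (no type coercion).
    """
    environ = os.environ if environ is None else environ
    return {
        name[len(PREFIX):].lower(): value
        for name, value in environ.items()
        if name.startswith(PREFIX)
    }


example_env = {
    'XPUB_PUBLISH_PORT': '9001',  # assumed name: prefix + field name
    'XPUB_LOG_LEVEL': 'info',     # assumed name: prefix + field name
    'HOME': '/home/user',         # ignored, no XPUB_ prefix
}
print(collect_prefixed_settings(example_env))
```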
There are three Settings classes:
- PluginConfig - configure xpublish plugins,
- DatasetConfig - configure the datasets available to xpublish,
- RestConfig - configure how the xpublish instance is run, including the PluginConfig and DatasetConfig.
The best way to get familiar with which configuration options are available (until the documentation catches up) is to look at the actual configuration classes in xpublish_host/config.py and the tests in tests/test_config.py.
A feature-full configuration is as follows, which includes the defaults for each field.
# These are passed into the `xpublish.Rest.serve` method to control how the
# server is run
publish_host: "0.0.0.0"
publish_port: 9000
log_level: debug
# Dask cluster configuration. Currently uses a LocalCluster.
# The keyword arguments are passed directly into `dask.distributed.LocalCluster`
# Omitting cluster_config or setting to null will load the defaults.
# Setting cluster_config to an empty dict will avoid using a dask cluster.
cluster_config:
  processes: true
  n_workers: 8
  threads_per_worker: 1
  memory_limit: 4GiB
  host: "0.0.0.0"
  scheduler_port: 0  # random port
  dashboard_address: 0.0.0.0:0  # random port
  worker_dashboard_address: 0.0.0.0:0  # random port
# Should xpublish discover and load plugins?
plugins_load_defaults: true
# Define any additional plugins. This is where you can override
# default plugins. These will replace any auto-discovered plugins.
# The keys here (pc1) are not important and are not used internally
plugins_config:
  pc1:
    module: xpublish.plugins.included.zarr.ZarrPlugin
    kwargs:
      dataset_router_prefix: /zarr
# Keyword arguments to pass into `xpublish.Rest` as app_kws
# i.e. xpublish.Rest(..., app_kws=app_config)
app_config:
  docs_url: /api
  openapi_url: /api.json
# Keyword arguments to pass into `xpublish.Rest` as cache_kws
# i.e. xpublish.Rest(..., cache_kws=cache_config)
cache_config:
  available_bytes: 1e11
# Define all of the datasets to load into the xpublish instance.
# The keys here (dc1) are not important and are not used internally
datasets_config:
  dc1:
    # The ID is used as the "key" of the dataset in `xpublish.Rest`
    # i.e. xpublish.Rest({ [dataset.id]: [loader_function_return] })
    id: dataset_id
    title: Dataset Title
    description: Dataset Description
    # Path to an importable python function that returns the dataset you want
    # to pass into `xpublish.Rest`
    loader: [python module path]
    # Arguments passed into the `loader` function
    args:
      - [loader arg1]
      - [loader arg2]
    # Keyword arguments passed into the `loader` function. See the `examples`
    # directory for more details on how this can be used.
    kwargs:
      t_axis: 'time'
      y_axis: 'lat'
      x_axis: 'lon'
      open_kwargs:
        parallel: false
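The comments in the datasets_config block describe two behaviours: args and kwargs are passed into the loader, and the id becomes the key of the mapping handed to xpublish.Rest. The sketch below mimics that with plain dictionaries. The names fake_loader and build_datasets and the path /data/example.nc are hypothetical, the loader is assumed to be already resolved to a callable, and this is not the code xpublish-host itself runs.

```python
from typing import Any, Callable, Dict


def fake_loader(*args: Any, **kwargs: Any) -> Dict[str, Any]:
    """Stand-in loader; a real one would return e.g. an xarray.Dataset."""
    return {'args': args, 'kwargs': kwargs}


def build_datasets(datasets_config: Dict[str, Dict[str, Any]]) -> Dict[str, Any]:
    """Call each entry's loader and key the result by the entry's `id`."""
    datasets: Dict[str, Any] = {}
    for entry in datasets_config.values():  # outer keys (dc1) are ignored
        loader: Callable[..., Any] = entry['loader']
        args = entry.get('args', [])
        kwargs = entry.get('kwargs', {})
        datasets[entry['id']] = loader(*args, **kwargs)
    return datasets


config = {
    'dc1': {
        'id': 'dataset_id',
        'loader': fake_loader,  # assumed to be already resolved from its path
        'args': ['/data/example.nc'],  # hypothetical file path
        'kwargs': {'t_axis': 'time', 'open_kwargs': {'parallel': False}},
    },
}
datasets = build_datasets(config)
print(list(datasets))
```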
API
To deploy an xpublish instance while pulling settings from a yaml file and environmental variables you can use the serve function. This is what is used under the hood in the Docker image.
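import os

from xpublish_host.config import serve
# Pass the path to a configuration file directly
serve('config.yaml')
# Or load environmental variables from a file
os.environ['XPUB_ENV_FILES'] = '/home/user/.env'
serve()
# Or set the configuration file through an environmental variable
os.environ['XPUB_CONFIG_FILE'] = 'config.yaml'
serve()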
You can also use the RestConfig and DatasetConfig objects directly to serve datasets.
RestConfig
from xpublish_host.config import DatasetConfig, RestConfig

dc = DatasetConfig(
    id='id',
    title='title',
    description='description',
    loader='[python function path]',
)
rc = RestConfig(datasets_config={'ds': dc})
rc.load('[config_file]')  # optionally load a configuration file
rest = rc.setup()  # This returns an `xpublish.Rest` instance
rest.serve(
    host='0.0.0.0',
    port=9000,
    log_level='debug',
)
DatasetConfig
from xpublish_host.config import DatasetConfig

dc = DatasetConfig(
    id='id',
    title='title',
    description='description',
    loader='[python function path]',
)
# Keyword arguments are passed into RestConfig and can include all of the
# top level configuration options.
dc.serve()
CLI
There is a CLI command you can use to run an xpublish server and optionally pass in the path to a configuration file:
# Pass in a config file
python xpublish_host/config.py -c xpublish_host/examples/example.yaml
# Use ENV variables
XPUB_CONFIG_FILE=xpublish_host/examples/example.yaml python xpublish_host/config.py
Either way, xpublish will be running on port 9000 with (2) datasets: simple and kwargs. You can access the instance at http://[host]:9000/datasets/.
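Once a server is running, the dataset listing can also be checked from Python. This is a minimal sketch using only the standard library. The helper names datasets_url and list_datasets are hypothetical, and it assumes the /datasets/ endpoint returns a JSON body, which should be verified against your xpublish version.

```python
import json
import urllib.request
from typing import Any


def datasets_url(host: str, port: int = 9000) -> str:
    """Build the dataset listing URL for a running instance."""
    return f'http://{host}:{port}/datasets/'


def list_datasets(host: str, port: int = 9000, timeout: float = 5.0) -> Any:
    """Fetch and decode the dataset listing; requires a running server.

    Assumes the response body is JSON.
    """
    url = datasets_url(host, port)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        return json.loads(response.read().decode('utf-8'))


# Only the URL is built here, so the snippet runs without a server.
print(datasets_url('localhost'))
```

With the example configuration served locally, calling list_datasets('localhost') would be expected to include the simple and kwargs datasets.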
Docker
The Docker image by default loads a configuration file from /xpd/config.yaml and an environmental variable file from /xpd/.env. You can change the location of those files by setting the env variables XPUB_CONFIG_FILE and XPUB_ENV_FILES respectively.
docker build -t xpublish-host .
# Using default config path
docker run --rm -p 9000:9000 -v "$(pwd)/xpublish_host/examples/example.yaml:/xpd/config.yaml" xpublish-host
# Using ENV variables
docker run --rm -p 9000:9000 -e "XPUB_CONFIG_FILE=/xpd/xpublish_host/examples/example.yaml" xpublish-host
Either way, xpublish will be running on port 9000 with (2) datasets: simple and kwargs. You can access the instance at http://[host]:9000/datasets/.
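The default-plus-override behaviour of the Docker image can be summarised in a few lines of Python. This is only a sketch of the lookup order described above. The helper resolve_paths is hypothetical, and how the image actually applies its defaults (for example through ENV lines in the Dockerfile) is not shown in this document.

```python
import os
from typing import Mapping, Optional, Tuple

# Defaults documented for the Docker image
DEFAULT_CONFIG_FILE = '/xpd/config.yaml'
DEFAULT_ENV_FILE = '/xpd/.env'


def resolve_paths(environ: Optional[Mapping[str, str]] = None) -> Tuple[str, str]:
    """Return (config_file, env_file), preferring XPUB_* overrides when set."""
    environ = os.environ if environ is None else environ
    config_file = environ.get('XPUB_CONFIG_FILE', DEFAULT_CONFIG_FILE)
    env_file = environ.get('XPUB_ENV_FILES', DEFAULT_ENV_FILE)
    return config_file, env_file


# No overrides: the documented defaults are used
print(resolve_paths({}))
# Override only the configuration file, as in the second docker run command
print(resolve_paths({'XPUB_CONFIG_FILE': '/xpd/xpublish_host/examples/example.yaml'}))
```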