Skip to main content

A command line tool that makes it easy to run modular Trino environments locally.

Project description

Minitrino

A command line tool that makes it easy to run modular Trino environments locally. Compatible with Starburst versions 388-e and later.

PyPI version Build Status Trino Slack


Latest Stable Release: 2.0.4


Overview


Requirements

  • Docker Desktop >= 3.5
  • Python >= 3.8
  • Pip
  • Linux / MacOS

Installation and Upgrades

End Users

Minitrino is available on PyPI and the library is available for public download on GitHub. To set everything up, run:

  pip install minitrino
  minitrino -v lib_install

Using this installation method, the LIB_PATH variable will point to ~/.minitrino/lib/.

Developers

In the project's root directory, run ./install.sh to install the Minitrino CLI. If you encounter errors during installation, try running sudo -H ./install.sh -v.

Using this installation method, the LIB_PATH variable will point to ${MINITRINO_REPOSITORY_DIRECTORY}/lib/.

Using Colima and Other Docker Contexts

For users not operating on the default Docker Desktop context, you can set the DOCKER_HOST shell environment variable to point to the desired context's .sock file, e.g.:

echo 'export DOCKER_HOST="unix://${HOME}/.colima/default/docker.sock"' >> ~/.bash_profile

End User Upgrades

To upgrade the Minitrino CLI, run:

pip install minitrino --upgrade

Each CLI version has its own respective library. To install the updated library, run:

minitrino -v lib_install

Note: Installing the new library will overwrite all modules and snapshots in the current library. If you have customized modules or snapshot files in lib/snapshots/, make sure to take a backup of the ~/.minitrino/lib directory prior to running this command in order to persist your local changes.


Workflow Examples

Minitrino is best suited for local Trino development. This project is not intended for usage on a large scale, and is intentionally designed to limit the deployment to a single coordinator container.

Provision an Environment

Provision the postgres module with a specific SEP version:

minitrino -v -e STARBURST_VER=${VER} provision -m postgres

Provision the iceberg and oauth2 modules:

minitrino -v provision -m postgres -m oauth2

Append the running environment with a third module:

minitrino -v provision -m postgres -m oauth2 -m hive

All environments expose the Starburst service on localhost:8080, meaning you can visit the web UI directly, or you can connect external clients, such as DBeaver, to the localhost service. The trino container shell can be directly accessed via:

docker exec -it trino bash 

Modify Files in a Running Container

You can modify files inside a running container with this workflow:

docker exec -it trino bash 
echo "io.trino=DEBUG" >> /etc/starburst/log.properties
exit
docker restart trino

Shut Down an Environment

minitrino down

Remove Minitrino Resources

To remove all images, run:

minitrino remove --images

To remove images from a specific module, run:

minitrino remove --images \
  --label com.starburst.tests.module.${MODULE}=${MODULE_TYPE}-${MODULE}

Where ${MODULE_TYPE} is one of: admin, catalog, security.

You can also use the remove command to remove individual volumes with the --volumes option.

Snapshot a Customized Module

Users designing and customizing their own modules can persist them with the snapshot command. For example, if a user has modified the hive module, they can persist their changes by running:

minitrino snapshot --name ${SNAPSHOT_NAME} -m hive

By default, the snapshot file is placed in ${LIB_PATH}/${SNAPSHOT_NAME}.tar.gz. The snapshot can be saved to a different directory by passing the --directory option to the snapshot command.


Minitrino Configuration File

Sticky configuration is set in ~/.minitrino/minitrino.cfg. The sections in this file each serve a separate purpose.

[CLI] Section

These configs allow the user to customize the behavior of Minitrino.

  • LIB_PATH: The filesystem path of the Minitrino library (specifically to the lib/ directory).
  • TEXT_EDITOR: The text editor to use with the config command, e.g. "vi", "nano", etc. Defaults to the shell's default editor.

[MODULES] Section

This section has only one default config: SEP_LIC_PATH. This is required if using licensed Starburst Enterprise features. It can point to any valid license on your filesystem.

This section can also be used to set environment variables passed to containers provisioned by Minitrino. Environment variables are only passed to a container if the variable is specified in the module's docker-compose.yml file.

For example, if your minitrino.cfg config file contains this variable:

DB_PASSWORD=password123

And your docker-compose.yml file contains this:

services:
  trino:
    environment:
      DB_PASSWORD: ${DB_PASSWORD}

Then DB_PASSWORD is accessible inside of the Trino container as an environment variable. This functionality can be applied to any container as long as the convention above is followed.


Project Structure

The library is built around Docker Compose files and utilizes Docker's ability to extend Compose files.

The Starburst Trino service is defined in a Compose file at the library root, and all other services look up in the directory tree to reference the parent Trino service.

A simplified library structure:

lib
├── Dockerfile
├── docker-compose.yml
├── minitrino.env
├── modules
│   ├── admin
│      └── ...
│   ├── catalog
│      └── postgres
│          ├── metadata.json
│          ├── postgres.yml
│          ├── readme.md
│          └── resources
│              ├── postgres
│                 └── postgres.env
│              └── trino
│                  └── postgres.properties
│   ├── resources
│   └── security
│       └── ...
├── snapshots
└── version

Trino Dockerfile

Minitrino modifies Starburst's Docker image by adding the Trino CLI to the image as well as by adding sudo privileges to the trino user. This is required for certain bootstrap scripts (i.e. using microdnf to install packages in a Trino container for a module).


Add New Modules (Tutorial)

Adding new modules is relatively simple, but there are a few important guidelines to follow to ensure compatibility with the Minitrino CLI. Module design principals are the same all modules. The example below demonstrates the process of creating a new catalog module for a Postgres service.

Create the Module Directory

Create the module's directory in the lib/modules/catalog/ directory:

mkdir lib/modules/catalog/my-postgres/
cd lib/modules/catalog/my-postgres/

Add Trino Resources

All resources for a module go inside of a resources/ directory within the module. Inside this directory, place Trino-specific resources into a trino/ directory, then mount the resources to the Trino service defined in the root docker-compose.yml file.

mkdir -p resources/trino/

In the newly-created resources/trino/ directory, add a properties file.

bash -c "cat << EOF > postgres.properties
connector.name=postgresql
connection-url=jdbc:postgresql://postgres:5432/minitrino
connection-user=admin
connection-password=trinoRocks15
EOF"

Note: Passwords in the default modules tend to be trinoRocks15. For consistency throughout the library, it is recommended to use this as the password of choice for new module development.


Add the Docker Compose YAML

In lib/modules/catalog/my-postgres/, add a Docker Compose file:

touch my-postgres.yml

Notice the naming convention: my-postgres.yml. Giving the same root name of "my-postgres" to both the parent directory my-postgres/ and to the Docker Compose file my-postgres.yml will allow Minitrino to find the new catalog module.

Next, add an environment file for the Postgres service. Non-Trino resources should go into their own directory, so create one for postgres:

mkdir resources/postgres/

In the newly-created directory, add an environment file which will register the variables in the Postgres container when it is provisioned:

bash -c "cat << EOF > postgres.env
POSTGRES_USER=admin
POSTGRES_PASSWORD=trinoRocks15
POSTGRES_DB=minitrino
EOF"

This file will initialize Postgres with a database minitrino, a user trino, and a password trinoRocks15.

Add a Metadata File

The metadata.json file allows Minitrino to obtain key information about the module. It is required for a module to work with the CLI.

In lib/modules/catalog/my-postgres/, add a metadata.json file:

bash -c 'cat << EOF > metadata.json
{
  "description": "Creates a Postgres catalog using the standard Postgres connector.",
  "incompatibleModules": [],
  "dependentModules": [],
  "enterprise": false
}
EOF'
  • description: describes the module.
  • incompatibleModules: restricts certain modules from being provisioned alongside the given module. The * wildcard is a supported convention if the module is incompatible with all other modules.
  • dependentModules: specifies which modules must be provisioned alongside the target. Dependent modules will be automatically provisioned with the provision command.
  • enterprise: requires a Starburst license file (starburstdata.license).

The metadata file information can be exposed via the modules command.

Add a Readme File

This step is not required for personal development, but it is required to commit a module to the Minitrino repository.

In lib/modules/catalog/my-postgres/, add a readme.md file:

touch readme.md

This file should contain an overview of the module.

Review Progress

The resulting directory tree should look like this (from the lib/modules/catalog/ directory):

my-postgres
├── metadata.json
├── my-postgres.yml
├── readme.md
└── resources
    ├── postgres
       └── postgres.env
    └── trino
        └── postgres.properties

Configure the Docker Compose YAML File

We will now define the my-postgres.yml Docker Compose file. Set it up as follows:

version: '3.8'
services:

  trino:
    volumes:
      - ./modules/catalog/my-postgres/resources/trino/postgres.properties:/etc/starburst/catalog/postgres.properties

  postgres:
    image: postgres:${POSTGRES_VER}
    container_name: postgres
    env_file:
      - ./modules/catalog/my-postgres/resources/postgres/postgres.env
    labels:
      - com.starburst.tests=minitrino
      - com.starburst.tests.module.my-postgres=catalog-my-postgres

Important Implementation Details: Paths and Labels

We can observe a few things about the Compose file we just defined.

Path References for Volumes and Build Contexts

First, the volumes we mount are not relative to the Compose file itself, they are relative to the base docker-compose.yml file in the library root. This is because the CLI extends Compose files, meaning that all path references in child Compose files need to be relative to the positioning of the parent Compose file.

The base Compose file is determined when you execute a Docker Compose command––the first Compose file referenced in the command becomes the base file, and that happens to be the docker-compose.yml file in the library root. This is how Minitrino constructs these commands.

If this is confusing, you can read more about extending Compose files on the Docker docs.

Minitrino Docker Labels

Secondly, notice the applied sets of labels to the Postgres service. These labels tell the CLI which resources to target when executing commands.

In general, there is no need to apply labels to the Trino service since they are already applied in the parent Compose file unless the module is an extension of the Trino service itself (i.e. the biac module). Labels should always be applied to:

  • Docker services (AKA the resulting container)
  • Persistent volumes
  • Images built from a Dockerfile (see the main docker-compose.yml file for an example)

Labels should be defined in pairs of two. The convention is:

  • The standard Minitrino resource label: com.starburst.tests=minitrino
  • A module-specific resource label: com.starburst.tests.module.${MODULE_NAME}=${MODULE_CATEGORY}-${MODULE_NAME}
    • For this label, the module-type should be one of: admin, catalog, or security
    • This applies a unique label to the module, allowing it to be isolated when necessary

In Compose files where multiple services are defined, all services should be labeled with the same label sets (see the hive for an example).


Note: A named volume is defined explicitly in the Compose file, and these should always have label sets applied to them. Below is an example of the Compose file we just created with a named volume.


version: '3.8'
services:

  trino:
    volumes:
      - ./modules/catalog/my-postgres/resources/trino/postgres.properties:/etc/starburst/catalog/postgres.properties

  postgres:
    image: postgres:${POSTGRES_VER}
    container_name: postgres
    env_file:
      - ./modules/catalog/my-postgres/resources/postgres/postgres.env
    labels: # These labels are applied to the service/container
      - com.starburst.tests=minitrino
      - com.starburst.tests.module.my-postgres=catalog-my-postgres

volumes:
  postgres-data:
    labels: # These labels are applied to the named volume
      - com.starburst.tests=minitrino
      - com.starburst.tests.module.my-postgres=catalog-my-postgres

Note: Certain modules will only extend the parent Trino service and do not actually define any new services/containers. See the biac module for an example of this. For these modules, the only label requirement is to add the module-specific label to the Trino service in the relevant docker-compose.yml file.


Test the New Catalog

We are all finished up. We can test our new catalog through the Minitrino CLI:

minitrino provision -m my-postgres

We can now open a shell session in the trino container and run some tests:

docker exec -it trino bash 
trino-cli
trino> show catalogs;

Bootstrap Scripts

Minitrino supports container bootstrap scripts. These scripts do not replace the entrypoint (or default command) for a given container. The script is copied from the Minitrino library to the container, executed, and then removed from the container. Containers are restarted after each bootstrap script execution, so the bootstrap scripts themselves should not restart the container's service.

If a bootstrap script has already executed in a container and the volume associated with the container still exists, Minitrino will not re-execute the bootstrap script unless the contents of the script have changed. The is useful after running minitrino down --keep (which does not remove unnamed container volumes), so that the subsequent provision command will not re-execute the same bootstrap script(s).

If a bootstrap script is updated, it is recommended to destroy the associated container(s) via minitrino down and then to re-provision.

To add a bootstrap script, add a resources/bootstrap/ directory in any given module, create a shell script, and then reference the script name in the Compose YAML file:

version: '3.8'
services:

  trino:
    environment:
      MINITRINO_BOOTSTRAP: bootstrap.sh

The elasticsearch module is a good example of this.

Installing Shell Packages for Bootstrap Scripts

If you need to install a shell package for a bootstrap script, it is recommended that the package be added at the Dockerfile level instead of within the bootstrap script. This is to ensure compatibility between SEP Trino-based releases.

To add the necessary package, simply update shell dependencies in lib/dockerfile-resources/configure.sh.

Managing Trino's config.properties and jvm.config Files

Many modules may change the Trino config.properties and jvm.config files. There are two supported ways to modify these files with within the trino container.

Preferable Method: Environment Variables

Minitrino has special support for two Trino-specific environment variables: CONFIG_PROPERTIES and JVM_CONFIG. Below is an example of setting these variables:

trino:
  environment:
    CONFIG_PROPERTIES: |-
      insights.jdbc.url=jdbc:postgresql://postgresdb:5432/insights
      insights.jdbc.user=admin
      insights.jdbc.password=password
      insights.persistence-enabled=true
    JVM_CONFIG: |-
      -Xlog:gc:/var/log/sep-gc-%t.log:time:filecount=10

Secondary Method: Bootstrap Scripts

The config.properties and jvm.config files can also be modified directly with a Trino bootstrap script.


Troubleshooting

  • If you experience issues executing a Minitrino command, re-run it with the -v option for verbose output. This will often reveal the issue's root cause.
  • If you experience an issue with a particular Docker container, consider running these commands:
    • docker logs ${CONTAINER_NAME}: Print the logs for a given container to the terminal
    • docker ps: Show all running Docker containers and associated statistics
    • docker inspect ${CONTAINER_NAME} to see various details about a container
  • If you experience issues with a library module, check that that module is structured correctly according to the module tutorial, and ensure the library and the CLI versions match
  • Sometimes, a lingering persistent volume can cause problems (i.e. a stale Hive metastore database volume from a previous module deployment), so you can run:
    • minitrino down
    • minitrino -v remove --volumes to remove all existing Minitrino volumes. Alternatively, run minitrino -v remove --volumes --label <your label> to specify a specific module for which to remove volumes. See the removing resources section for more information.

If none of these troubleshooting tips help to resolve your issue, please file a GitHub issue and provide as much information as possible.


Reporting Bugs and Contributing

To report bugs, please file a GitHub issue on the Minitrino repository. Bug reports should:

  • Contain any relevant log messages (if the bug is tied to a command, running with the -v flag will make debugging easier)
  • Describe what the expected outcome is
  • Describe the proposed code fix (optional)

Contributors have two options:

  1. Fork the repository, then make a PR to merge your changes
  2. If you have been added as a contributor, you can go with the method above or you can create a feature branch, then submit a PR for that feature branch when it is ready to be merged.

In either case, please provide a comprehensive description of your changes with the PR.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

minitrino-2.0.4.tar.gz (41.4 kB view hashes)

Uploaded Source

Built Distribution

minitrino-2.0.4-py3-none-any.whl (40.2 kB view hashes)

Uploaded Python 3

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page