Gridworks Data Infrastructure
Project description
Gridworks Data
This project contains code related to working with databases in the GridWorks ecosystem.
Our platform is PostgreSQL, with the TimescaleDB (+toolkit) extensions for efficiently handling time-series data.
Prerequisites
- Docker Engine — to run PostgreSQL+TimescaleDB locally.
(If you have previously set up the RabbitMQ server from
gridworks-base, you should already have Docker.) psqlPostgreSQL command-line client (any modern version; tested with 14.x and 18.x). Install via your package manager (e.g.brew install libpq && brew link --force libpqon macOS,sudo apt install postgresql-clienton Linux). Check withpsql --version. (Optional:pgAdminas a GUI for inspecting the database.)- Python ≥ 3.12 and
uv(per the rest of the GridWorks toolchain).
Database Setup
The following steps will get you set up to run with the database, either locally or on a managed Tiger Cloud instance.
Local Pre-requisite: PostgreSQL+TimescaleDB container
If running locally, you'll need to start by pulling the official TimescaleDB image and starting a container. You need the -ha variant for PostgreSQL v18 and TimescaleDB v2.25 — without -ha you'll get the "lite" version of TimescaleDB which is missing required functionality.
If you don't already have a PostgreSQL listener on port 5432, map 5432:5432. If you do (native install, or another Docker postgres), pick a free port and map <HOST_PORT>:5432 instead (e.g. 5433:5432).
Pick a POSTGRES_PASSWORD — this becomes the password for the built-in postgres superuser; record it locally (e.g. in .env), don't commit it.
Concrete docker run example (uses 5433 and a placeholder password):
docker run -d \
--name gw-data-pg \
-e POSTGRES_PASSWORD=changeme \
-p 5433:5432 \
timescale/timescaledb-ha:pg18-ts2.25
Verify the container is healthy: docker ps should show it up, and PGPASSWORD=changeme psql -h 127.0.0.1 -p 5433 -U postgres -c "SELECT version();" should return PostgreSQL 18.x.
Detailed instructions and alternative configurations: https://www.tigerdata.com/docs/self-hosted/latest/install/installation-docker
Database Setup
This repo includes a numbered sequence of scripts in the src/gw_data/db/scripts folder that will get you up and running.
0. (Local-Only) Create the Database
This step is not required (or even possible) on a Tiger Cloud managed server.
Run 0_db_create.sql as the postgres user to create the tsdb database and add the timescaledb extension.
psql -d "postgresql://127.0.0.1:<PORT>/" -U postgres -f src/gw_data/db/scripts/0_server_init.psql
- Replace
<PORT>with the host port you mapped to 5432 (e.g.5433). - You'll be prompted for the
postgresuser password (i.e., the one you previously set asPOSTGRES_PASSWORDin yourdocker runcommand).
1. Create the Users
Run 1_db_user_setup.psql as follows to create the users we need. This script can be run as the postgres user locally, or the tsdbadmin user in Tiger Cloud.
psql -d "postgresql://<SERVER>:<PORT>/" -U <postgres|tsdbadmin> -f src/gw_data/db/scripts/1_db_user_setup.psql
- Replace
<SERVER>with the database server address (e.g.,127.0.0.1when running locally). - Again, replace
<PORT>with your mapped host port. - Again, you'll be prompted for the
postgresuser password.
This script creates three user roles: gw_admin (full ownership), gw_journalkeeper (insert/update/delete), and gw_visualizer (select-only).
Note: the script uses psql's interactive \password meta-command three times (once for each user). It must be run interactively — it will prompt for each password as it runs and cannot be piped or run via CI:
You'll be prompted for new passwords for each of the users. Pick whatever you like and record them somewhere.
For non-interactive setups (CI, scripted bootstraps), apply the equivalent SQL with explicit passwords; see the script as the canonical reference.
Reset shortcut: if your local DB ever gets into a weird state, you can wipe it and start over with:
psql -d "postgresql://127.0.0.1:<PORT>/" -U postgres -f src/gw_data/db/scripts/_XX_drop_all.sql
2. Create the gridworks Schema
Run 2_db_schema_setup.sql as follows to create our private gridworks schema and apply appropriate permissions. This should be run as the gw_admin user.
psql -d "postgresql://<SERVER>:<PORT>/" -U gw_admin -f src/gw_data/db/scripts/2_db_schema_setup.psql
- Again, replace
<SERVER>with the database server address. - Again, replace
<PORT>with your mapped host port. - This time you'll be prompted for the
gw_adminuser password.
3. Run the Alembic Migrations
Next we need to run the database migrations we've defined with Alembic to create the tables, etc. that we need.
But first we need to update our .env file. Copy template.env (at the repo root) to .env:
cp template.env .env
Then, edit .env as follows:
- Replace
<SERVER>with the database server address (e.g.,127.0.0.1when running locally). - Replace
<%PASSWORD%>inGW_DATA_DB_URLwith thegw_adminpassword you just set. - Replace
<%PORT%>with the host port you mapped to 5432 (e.g.5433).
Now we can run 3_db_alembic_upgrade.sh as follows:
`sh src/gw_data/db/scripts/3_db_alembic_upgrade.sh`
This should create 12 tables in the gridworks schema: alembic_version, connectivity_edges, customers, g_nodes, installations, installers, messages, position_points, reading_channels, readings, user_installation_roles, users.
Verify with psql -d "postgresql://<SERVER>:<PORT>/tsdb" -U gw_admin -c "\dt gridworks.*".
4. Seed the Database
With the database created, gw_admin ready, and .env filled in, you can seed some initial data:
uv run python ./src/gw_data/db/scripts/1_db_seed.py
Note: the seed script is also interactive — it uses Python's getpass to prompt for passwords for two seeded users (admin and beech-user). It must be run from a real terminal (no piping). The seed populates a small set of dev users, a customer, an installation, and one g_node.
In the future we will have a more comprehensive seeding process that will ingest some actual message data.
Best Practices for Databases
The following are some best practices that we should follow when at all possible:
- Primary Keys should be UUIDs, stored as the DB-native UUID type
- Dates/Times should be stored as
TIMESTAMPTZ - Migrations should be encouraged and done frequently to provide new functionality. Database schema is always temporary.
- Each application that uses that database should have its own dedicated user, with the minimal set of permissions. (In particular,
postgresshould never be used at all, andgw_adminshould only be used for migrations and other tasks that require full ownership of the database.)
TimescaleDB Performance
TimescaleDB does two main things to improve performance:
- Separates time-series tables (which it calls "hypertables") into "chunks", each of which cover a certain timeframe.
- Allows time-series data to be stored in column order with compression, which vastly improves performance for data that changes very little over time. This compression happens in a scheduled job, and only for data older than a configured threshold.
We have column-store compression enabled on the readings table for data older than 2 weeks, with compression segmented by the channel ID.
Useful Queries
The following queries are useful for analyzing TimescaleDB performance:
-- Display the size in MB of our two main tables
SELECT pg_size_pretty(hypertable_size('readings')) as "Readings Table Size", pg_size_pretty(hypertable_size('messages')) as "Messages Table Size";
-- Display the size of all database tables in order.
-- This will show each TimescaleDB chunk as a separate table.
SELECT
table_schema || '.' || table_name AS table_full_name,
pg_size_pretty(pg_total_relation_size('"' || table_schema || '"."' || table_name || '"')) AS size
FROM information_schema.tables
ORDER BY
pg_total_relation_size('"' || table_schema || '"."' || table_name || '"') DESC;
-- Display compression stats for each chunk in the readings table
SELECT
chunk_name,
pg_size_pretty(before_compression_total_bytes) AS size_before,
pg_size_pretty(after_compression_total_bytes) AS size_after,
100 - (after_compression_total_bytes::float / before_compression_total_bytes * 100) AS compression_ratio_pct
FROM chunk_compression_stats('readings');
-- Get the ID of the policy_compression job so you can manually run `CALL run_job` with it (e.g. after a bulk import).
SELECT job_id, proc_name, hypertable_name, config
FROM timescaledb_information.jobs;
-- Get info (table, time range, etc.) about the TimescaleDB chunks.
SELECT * FROM timescaledb_information.chunks
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file gw_data-0.3.0.tar.gz.
File metadata
- Download URL: gw_data-0.3.0.tar.gz
- Upload date:
- Size: 45.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.19 {"installer":{"name":"uv","version":"0.11.19","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
c453b479bccc6a17b1d0ba40b1cdafd3a1026c47ba2a32c594f1b7e81f3493a0
|
|
| MD5 |
99c7aa48c6c4b7faf9729252484f2993
|
|
| BLAKE2b-256 |
520719d75bfc3777cad6342e9868018e52f328e5ef4cd6f6cc2b21cb39985516
|
File details
Details for the file gw_data-0.3.0-py3-none-any.whl.
File metadata
- Download URL: gw_data-0.3.0-py3-none-any.whl
- Upload date:
- Size: 40.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.11.19 {"installer":{"name":"uv","version":"0.11.19","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
016eb81f40c91743be58cefa61399bfef830070365d0e0067131f86260928b2c
|
|
| MD5 |
d21fe52b7ae46b7de3a09978654b31b9
|
|
| BLAKE2b-256 |
ccb50178dd6171761824182323c1c4139b52a324eb3ce87b9f72c8a0264e2a3a
|