

This branch houses the effort to move our DB code to a SQLAlchemy-based stack and is under active development in parallel with the move to Google Analytics 4. For the current production code, see the main branch.

article-rec-db

(Badges: Python version, code style black, flake8, isort, mypy type checking, license)

Database models and migrations for the Local News Lab's article recommendation system.

Usage

A note before continuing: A lot of the commands you'll see below are wrapped inside Poe tasks defined in pyproject.toml. Poe is installed as a dev-dependency; run poe --help to get a list of tasks and their descriptions. It's entirely possible to run commands without using Poe, but if you decide to use Poe, make sure to read through the tasks to understand what they do.

As a package

We use SQLModel, a layer that combines SQLAlchemy with Pydantic, to define our tables. This is useful because importing this package gives you both a way to interact with the tables and Pydantic objects in Python that correspond to rows in those tables.

To install the package from PyPI, run pip install article-rec-db. Check existing versions on PyPI.
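A minimal sketch of the pattern (the Article model, its fields, and the connection string below are made up for illustration; the package ships its own models):

    from typing import Optional

    from sqlmodel import Field, Session, SQLModel, create_engine, select


    class Article(SQLModel, table=True):
        # Hypothetical table definition: it is both a SQLAlchemy table and a
        # Pydantic model, which is the main reason we use SQLModel.
        id: Optional[int] = Field(default=None, primary_key=True)
        title: str
        site: str


    # Placeholder connection string; point it at your own database.
    engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")

    with Session(engine) as session:
        # Rows come back as Article instances, so you get attribute access and
        # Pydantic validation for free.
        article = session.exec(select(Article).limit(1)).first()
        if article is not None:
            print(article.title)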

Database management

We use Terraform to manage cluster entities such as databases, roles, and extensions. The code is in the terraform directory. Stages (dev and prod) are represented as different databases. To make changes to an existing database:

  1. Make changes inside terraform/modules.
  2. Run poe terraform [stage] plan to see the changes that will be applied to the corresponding database.
  3. At this point, if you're happy, you can run poe terraform [stage] apply yourself, but we prefer a CI/CD approach. Merging a PR to the dev branch will trigger a plan to be applied to the dev database, and the same for the prod branch. We always merge to dev first, then do another merge from dev to prod.

Table and column migrations

So you've made some changes to the tables, columns, indexes, etc. and you'd like to update the databases. This is what Alembic is for! (Note the difference between Terraform and Alembic: Terraform manages cluster-level entities that are not specific to a single database, like roles and extensions, while Alembic manages entities within a database, like tables and columns.)

To generate a new revision after you've updated the models:

  1. Run this from the root of the project: DB_CONNECTION_STRING='postgresql://user:password@host:port/db_name' alembic revision --autogenerate -m "message". (There's a Poe task for this: run poe rmtdiff -d db_name -m "message")
  2. Check the /alembic/versions/ directory for the new revision and verify that it does what you want it to (a sketch of a typical revision file appears after this list). Run TYPE=alembic poe test to test models against a local DB initialized via Alembic, and resolve issues as needed.
  3. Run this from the root of the project: DB_CONNECTION_STRING='postgresql://user:password@host:port/db_name' alembic upgrade head. Note that you only need to generate the revision file (step 1) once because we want the same content in each environment's database, but you do need to run the alembic upgrade head command once for each database (change the db_name in the connection string to the desired target). (There's a Poe task for this: run poe rmtupgrade -d db_name)

As with database management, we let our CI/CD handle step 3.
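For reference, the revision file generated in step 1 is a small Python module with upgrade and downgrade functions. A rough sketch (the revision IDs, table, and column names here are made up for illustration):

    """add title column to article

    Revision ID: 1a2b3c4d5e6f
    Revises: 0f9e8d7c6b5a
    """
    import sqlalchemy as sa
    from alembic import op

    # Revision identifiers used by Alembic; these values are illustrative.
    revision = "1a2b3c4d5e6f"
    down_revision = "0f9e8d7c6b5a"
    branch_labels = None
    depends_on = None


    def upgrade() -> None:
        # Applied by `alembic upgrade head` (or `poe rmtupgrade`).
        op.add_column("article", sa.Column("title", sa.Text(), nullable=True))


    def downgrade() -> None:
        # Applied when rolling the revision back.
        op.drop_column("article", "title")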

Note to LNL devs: Our automated deployment process will run the Terraform changes first, then the Alembic changes. So, for example, using Terraform to create a new database and Alembic to create a new table in that database will work in just one PR, but creating a new table with Alembic and using Terraform to grant a role access to that table won't. Best to divide changes into atomic units, each handled by a single PR.

Development

This project uses Poetry to manage dependencies and to pin dependency and Python versions. We also use pre-commit with hooks for isort, black, and flake8 for consistent code style and readability. Note that this means code that doesn't meet the rules will fail to commit until it is fixed.

We use mypy for static type checking and pytest for tests. Both can be run manually, and the CI runs them on PRs to the main branch.

Setup

  1. Install Poetry.
  2. Run poetry install --no-root
  3. Run source $(poetry env list --full-path)/bin/activate && pre-commit install && deactivate to set up pre-commit

You're all set up! Your local environment should include all dependencies, including dev dependencies like black. This is done with Poetry via the poetry.lock file.

Run Code Format and Linting

pre-commit run --all-files runs isort, black, and flake8 all in one go, and is also run on every commit.

poe format does what pre-commit run --all-files does and also formats the Terraform code.

Run Static Type Checking

To manually run mypy, simply run mypy from the root directory of the project. It will use the default configuration specified in pyproject.toml.

Update Dependencies

To update dependencies in your local environment, make changes to the pyproject.toml file then run poetry update from the root directory of the project.

To update Terraform dependencies, make changes to versions.tf files as necessary.

Run Tests

To manually run tests, you need to have a Postgres instance running locally on port 5432. One way to do this is to run a Docker container, then run the tests while it is active.

  1. (If you don't already have the image locally) Run docker pull ankane/pgvector:v<version used in your remote db>
  2. Run docker run --rm --name postgres -e POSTGRES_PASSWORD=postgres -e POSTGRES_HOST_AUTH_METHOD=trust -p 127.0.0.1:5432:5432/tcp ankane/pgvector:v<version used in your remote db> (the image pulled in step 1)
  3. Run pytest tests from the root directory of the project. Explore the pytest docs for more options.

Note that if you decide to run the Postgres container with different credentials (a different password, port, etc.) or via a different method, you will likely need to update the test file to point to the correct Postgres instance.
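If you do need to point the tests elsewhere, the connection they expect looks roughly like this (a sketch; the exact constant and where it lives are defined in the test code, and the values below match the Docker command above):

    from sqlmodel import create_engine

    # Hypothetical example; adjust user, password, host, port, and database
    # name to match however you run Postgres locally.
    DB_CONNECTION_STRING = "postgresql://postgres:postgres@localhost:5432/postgres"
    engine = create_engine(DB_CONNECTION_STRING)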

Additionally, if you want to re-run the tests, make sure you start over from a fresh Postgres instance. If you run Postgres via Docker, you can simply Ctrl-C to stop the container and start a new one.

Steps 2 and 3 can be combined into one Poe task: poe test, which also stops the container after the tests are done, even if they fail. You can also run poe lclstart to just start the container and poe lclstop to stop it whenever you're done; poe lclconnect will connect you to the container via psql so you can poke around.

poe test, by default, is equivalent to TYPE=sqlmodel poe test, which tests the models against a local DB initialized via SQLModel. You can also run TYPE=alembic poe test to test the models against a local DB initialized via Alembic. The first is more convenient and good for development; the second reflects the production environment more closely and is good for testing Alembic revisions when you're about to submit a PR.
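Under the hood, the two test types differ only in how the local database gets its schema. Roughly (a sketch, not the actual test fixtures; it assumes the Alembic config picks up the same local connection string):

    import os

    from alembic import command
    from alembic.config import Config
    from sqlmodel import SQLModel, create_engine

    # Values match the Docker command above; adjust if yours differ.
    engine = create_engine("postgresql://postgres:postgres@localhost:5432/postgres")

    if os.environ.get("TYPE", "sqlmodel") == "sqlmodel":
        # TYPE=sqlmodel: create tables directly from the SQLModel metadata.
        SQLModel.metadata.create_all(engine)
    else:
        # TYPE=alembic: build the schema by replaying the revision history,
        # which mirrors how the production databases are managed.
        command.upgrade(Config("alembic.ini"), "head")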


Download files

Download the file for your platform.

Source Distribution

article_rec_db-0.0.11.tar.gz (12.3 kB)


Built Distribution

article_rec_db-0.0.11-py3-none-any.whl (11.2 kB)


File details

Details for the file article_rec_db-0.0.11.tar.gz.

File metadata

  • Download URL: article_rec_db-0.0.11.tar.gz
  • Size: 12.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.0 CPython/3.11.6 Linux/6.2.0-1018-azure

File hashes

Hashes for article_rec_db-0.0.11.tar.gz
  • SHA256: 16249f5e39ee39b335ebcd8bf5f15139bd9d5fbf4cc7199e9e48426a00dd61bb
  • MD5: abd9e64c2de1f8be7be20451d1df17de
  • BLAKE2b-256: 56dee8090effc21dbc0650a1f8d93119a956eb750d6265bea1863e3ca73c04f2


File details

Details for the file article_rec_db-0.0.11-py3-none-any.whl.

File metadata

  • Download URL: article_rec_db-0.0.11-py3-none-any.whl
  • Size: 11.2 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/1.7.0 CPython/3.11.6 Linux/6.2.0-1018-azure

File hashes

Hashes for article_rec_db-0.0.11-py3-none-any.whl
  • SHA256: d706ec6cec3a5f7a90449a4abfe27b436d09423195abccd12cee42605d11d8fd
  • MD5: 08de2e44abb01b1a9e49b4c2df5b4a01
  • BLAKE2b-256: 6400735aec05da1dc9ee6fbdb1f7735be62145e2663da73c439453aba3b4286e

