
A project manager for Python-based extractors


cogex

cogex is a tool for managing Python-based extractors for Cognite Data Fusion. It provides utilities for initializing new extractor projects and for building self-contained executables of Python-based extractors.

Important note for users running pyenv

pyenv is a neat tool for managing Python installations.

Since cogex uses PyInstaller to build executables, Python must be installed with a shared libpython library, which pyenv does not do by default. To fix this, add the --enable-shared flag when installing new Python versions with pyenv, like so:

env PYTHON_CONFIGURE_OPTS="--enable-shared" pyenv install 3.9.0

You can read more about this in the PyInstaller documentation.
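If you are not sure whether an existing interpreter was built this way, a quick check along these lines may help. It is a sketch that assumes a Linux or macOS build, where the Py_ENABLE_SHARED build variable reported by sysconfig is 1 for shared builds; has_shared_libpython is a hypothetical helper, not part of cogex.

```python
import sys
import sysconfig


def has_shared_libpython() -> bool:
    """Return True if this interpreter appears to be built with --enable-shared."""
    # On POSIX builds Py_ENABLE_SHARED is 1 for shared builds and 0 otherwise;
    # it may be None on platforms where the variable is not defined.
    return bool(sysconfig.get_config_var("Py_ENABLE_SHARED"))


if __name__ == "__main__":
    if has_shared_libpython():
        print("OK: libpython is shared")
    else:
        print(f"Python {sys.version.split()[0]} was not built with --enable-shared")
```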

Overview of features

Start a new extractor project

To start a new extractor project, move to the desired directory and run

cogex init

cogex will first prompt you for some information and then initialize the new project.

Add dependencies

Extractor projects initialized with cogex use poetry to manage dependencies. Running cogex init automatically installs the Cognite SDK and the extractor-utils framework. If your extractor needs any other dependencies, add them with poetry, like so:

poetry add requests

Type checking and code style

It is recommended that you run code checkers on your extractor, in particular:

  • black is an opinionated code formatter that enforces a consistent code style throughout your project. This helps avoid unnecessary changes and minimizes PR diffs.
  • isort is a tool that sorts your imports, which also contributes to a consistent code style and minimal PR diffs.
  • mypy is a static type checker for Python. It catches type errors that could otherwise go unnoticed until they suddenly break your extractor in production.

cogex installs all of these and runs them automatically on every commit. If you need to commit despite one of these checks failing, you can run git commit --no-verify, although this is not recommended.
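As a small illustration of the kind of mistake mypy catches, consider the hypothetical helper to_cdf_timestamp below (not part of the Cognite SDK), which converts a timezone-aware datetime to milliseconds since the Unix epoch. The commented-out call would only fail at runtime when it is reached, but mypy reports it when the checks run.

```python
from datetime import datetime, timezone


def to_cdf_timestamp(dt: datetime) -> int:
    """Convert a timezone-aware datetime to milliseconds since the Unix epoch."""
    return int(dt.timestamp() * 1000)


# Fine: the argument matches the annotation.
ms = to_cdf_timestamp(datetime(2024, 1, 1, tzinfo=timezone.utc))

# mypy rejects the next line with an error along the lines of:
#   Argument 1 to "to_cdf_timestamp" has incompatible type "str"; expected "datetime"
# to_cdf_timestamp("2024-01-01")
```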

Build and package an extractor project

Packaging a binary of your extractor

You cannot always rely on a Python installation being available on the machine where your extractor will be deployed. In those scenarios, it is useful to package the extractor, including its dependencies and the Python runtime, into a single self-contained executable. To do this, run

cogex build

This creates a new executable for the operating system you ran cogex build on, and places it in the dist directory.
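Code that reads bundled files (such as a default config) may need to behave differently when running from a PyInstaller executable. A common pattern, sketched here with hypothetical helper names, relies on PyInstaller setting sys.frozen and sys._MEIPASS at runtime; whether your extractor needs this depends on how it locates its files.

```python
import sys
from pathlib import Path


def is_frozen() -> bool:
    """True when running from a PyInstaller-built executable."""
    # PyInstaller's bootloader sets sys.frozen; a normal interpreter does not.
    return bool(getattr(sys, "frozen", False))


def resource_dir(source_dir: Path) -> Path:
    """Directory to load bundled data files from.

    source_dir is where the files live when running from source,
    typically Path(__file__).parent of the calling module.
    """
    if is_frozen():
        # PyInstaller exposes the directory holding bundled data as sys._MEIPASS
        # (for one-file builds, a temporary directory it unpacks to at startup).
        return Path(getattr(sys, "_MEIPASS", Path(sys.executable).parent))
    return source_dir


if __name__ == "__main__":
    print(is_frozen(), resource_dir(Path.cwd()))
```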

Making docker images

To build a docker image, you first need to add a [tool.cogex.docker] section to your pyproject.toml file. The required fields are

  • tags: A list of tags to apply to the resulting image. Tags support simple templating: {version} is replaced with the current version of the extractor, and {major} with the current major version.
  • entrypoint: If your [tool.poetry.scripts] section includes multiple entries, use this field to specify which one the docker image should run.

The following optional fields are also available:

  • base-image: The base image to use. By default, the debian-slim based Python image for the Python version currently running will be chosen.
  • install-dir: Where in the image the extractor should be installed.
  • preamble: Additional Dockerfile instructions to run at the beginning of the Dockerfile.
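The rules above can be summarized as a small validator. This is a hypothetical sketch, not cogex's actual implementation: resolve_docker_config expects the parsed pyproject.toml as a dict, and it assumes the default base image corresponds to the official python:<major>.<minor>-slim tag.

```python
import sys
from typing import Any


def resolve_docker_config(pyproject: dict[str, Any]) -> dict[str, Any]:
    """Validate [tool.cogex.docker] and fill in a default base image."""
    tool = pyproject.get("tool", {})
    docker = dict(tool.get("cogex", {}).get("docker", {}))
    scripts = tool.get("poetry", {}).get("scripts", {})

    if not docker.get("tags"):
        raise ValueError("[tool.cogex.docker] must define 'tags'")
    if len(scripts) > 1 and "entrypoint" not in docker:
        raise ValueError("multiple [tool.poetry.scripts] entries: set 'entrypoint'")

    # Assumption: the default maps to python:<major>.<minor>-slim for the running Python.
    major, minor = sys.version_info[:2]
    docker.setdefault("base-image", f"python:{major}.{minor}-slim")
    return docker
```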

Minimal example:

[tool.cogex.docker]
tags = ["cognite/my-extractor:{version}"]

Larger example (from the DB Extractor):

[tool.cogex.docker]
base-image = "python:3.10"
preamble = """
RUN apt-get update \
    && apt-get dist-upgrade -y dirmngr gnupg gnupg-l10n gnupg-utils gpg gpg-agent gpg-wks-client gpg-wks-server \
    gpgconf gpgsm gpgv libssl-dev libssl1.1 openssl
RUN apt-get install -y apt-utils build-essential
RUN apt-get install -y unixodbc-dev unixodbc
"""
tags = [
    "eu.gcr.io/cognite-registry/db-extractor-base:latest",
    "eu.gcr.io/cognite-registry/db-extractor-base:{version}",
    "cognite/db-extractor-base:{version}",
]
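The {version} and {major} placeholders in tags can be thought of as plain string substitution. The render_tags function below is a hypothetical sketch of that behavior, and the version 2.4.1 in the usage example is made up.

```python
def render_tags(tags: list[str], version: str) -> list[str]:
    """Replace {version} and {major} placeholders in each docker tag."""
    major = version.split(".")[0]
    # Plain substitution; tags without placeholders (e.g. ":latest") pass through.
    return [tag.replace("{version}", version).replace("{major}", major) for tag in tags]


if __name__ == "__main__":
    tags = [
        "eu.gcr.io/cognite-registry/db-extractor-base:latest",
        "cognite/db-extractor-base:{version}",
        "cognite/db-extractor-base:{major}",
    ]
    for tag in render_tags(tags, "2.4.1"):
        print(tag)
```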

You can now build and tag docker images with

cogex build --dockerimage

If you just want to see the generated dockerfile, instead run

cogex build --dockerfile

Creating a new version of your extractor

Versioning your extractor makes it easy to keep track of which version of the code base is running in a given deployment. When releasing a new version, run

poetry version [patch/minor/major]

This automatically bumps the corresponding part of the version number. Note that it only updates the version number in pyproject.toml. When you run cogex build, the new version number is propagated to the rest of the code base.
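The effect of each bump level on a MAJOR.MINOR.PATCH version can be sketched as follows. This mirrors the semantic versioning rules rather than poetry's own implementation, which also handles pre-release and other version formats; bump is a hypothetical helper.

```python
def bump(version: str, part: str) -> str:
    """Bump a MAJOR.MINOR.PATCH version string following semantic versioning."""
    major, minor, patch = (int(x) for x in version.split("."))
    if part == "major":
        return f"{major + 1}.0.0"  # breaking change: reset minor and patch
    if part == "minor":
        return f"{major}.{minor + 1}.0"  # new feature: reset patch
    if part == "patch":
        return f"{major}.{minor}.{patch + 1}"
    raise ValueError(f"unknown part: {part!r}")


if __name__ == "__main__":
    for part in ("patch", "minor", "major"):
        print(part, bump("1.0.1", part))
```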

Any extractor project should follow semantic versioning, which means you should bump

  • patch for any minor bug fixes or improvements
  • minor for new features or bigger improvements that don't break compatibility
  • major for new features or improvements that break compatibility with previous versions, in other words for those scenarios where the new version is not a drop-in replacement for an old version. For example:
    • When adding a new required config field
    • When removing a config field
    • When changing defaults in a way that could break existing deployments
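The config-related examples above lend themselves to a rough check. The sketch below uses a hypothetical Field type describing each config field and conservatively treats any change of default as breaking. It only covers config schema changes, so a code change can still warrant a bigger bump than it suggests.

```python
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Field:
    required: bool
    default: Any = None


def config_bump(old: dict[str, Field], new: dict[str, Field]) -> str:
    """Smallest version bump implied by a config schema change (heuristic)."""
    removed = old.keys() - new.keys()
    added = new.keys() - old.keys()
    common = old.keys() & new.keys()
    if removed:
        return "major"  # removing a config field
    if any(new[name].required for name in added):
        return "major"  # adding a new required field
    if any(new[name].required and not old[name].required for name in common):
        return "major"  # an optional field became required
    if any(old[name].default != new[name].default for name in common):
        return "major"  # changed default, treated as potentially breaking
    if added:
        return "minor"  # new optional fields are backwards compatible
    return "patch"
```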


Download files


Source Distribution

cognite_extractor_manager-1.0.1.tar.gz (16.4 kB)


Built Distribution

cognite_extractor_manager-1.0.1-py3-none-any.whl (18.8 kB)

