A CLI tool to define event schemas, lint them, interact with schema registries, and build corresponding data artifacts (e.g., dbt package).

Project description


A CLI tool to help Data, Engineering, and Product teams:

  • Define event schemas as code using JSON Schema, providing a version-controlled source of truth.
  • Lint schemas to enforce agreed-upon conventions (configurable). Run reflekt lint in a CI/CD pipeline to check:
    • Naming conventions (snake_case, camelCase, Title Case, etc.)
    • Descriptions are always included.
    • Required metadata is defined.
  • Interact with schema registries
    • Push schema(s) from a Reflekt project to a schema registry where they can be used for event data validation.
    • Pull schema(s) from a schema registry into a Reflekt project to build corresponding data artifacts.
  • Build data artifacts (e.g., dbt packages) based on schemas that model and document event data.
    • Keep data artifacts in sync with instrumentation - ready for use by engineers, analysts, and the business.
    • Reduce errors, improve data quality, and automate important (but boring) data tasks.


Reflekt is available on PyPI. Install with pip:

pip install reflekt


A list of CLI commands and arguments can be accessed by running reflekt --help. Each command also has a --help flag that shows its details (arguments, options, etc.). All commands except init can be run against one or more schemas. The command examples below give an overview of the syntax.

See the argument syntax section for more details on selecting schemas, specifying sources and SDKs used to collect event data.


Initialize a Reflekt project.

reflekt init --dir /path/to/project


Pull schemas from a schema registry and create the corresponding structure in the project's schemas/ directory.

# Pull all schemas from 'ecommerce' tracking plan in Segment to schemas/segment/ecommerce/
reflekt pull --select segment/ecommerce/

Supported registries: Segment, Avo


Push schemas in the project's schemas/ directory to a schema registry.

# Push all schemas in schemas/segment/ecommerce/ to Segment tracking plan 'ecommerce'
reflekt push --select segment/ecommerce/

Supported registries: Segment


Lint schemas in the project's schemas/ directory.

# Lint a single schema (.json is optional)
reflekt lint --select segment/ecommerce/CartViewed/1-0.json

Linting checks include:

  • Event and property names match the configured naming conventions in reflekt_project.yml.
  • Only valid data types are used (e.g., disallow null or any types).
  • Descriptions are included for all events and properties.
  • Event schema validates against the meta-schema schemas/.reflekt/event-meta/1-0.json, enforcing any required metadata.
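
The naming and description checks in the list above can be sketched in a few lines of Python. This is only an illustration of the idea, not Reflekt's implementation: the names `CASING_PATTERNS`, `matches_casing`, and `lint_schema` are hypothetical, and the regular expressions are assumptions about what each casing option (title | snake | camel | any) might accept.

```python
import re

# Hypothetical patterns for the casing options named in reflekt_project.yml.
CASING_PATTERNS = {
    "snake": re.compile(r"^[a-z][a-z0-9]*(_[a-z0-9]+)*$"),
    "camel": re.compile(r"^[a-z][a-zA-Z0-9]*$"),
    "title": re.compile(r"^[A-Z][a-z0-9]*( [A-Z][a-z0-9]*)*$"),
}


def matches_casing(name: str, casing: str) -> bool:
    """Return True if `name` follows `casing` ('any' accepts everything)."""
    if casing == "any":
        return True
    return bool(CASING_PATTERNS[casing].match(name))


def lint_schema(schema: dict, property_casing: str = "snake") -> list[str]:
    """Collect naming and description problems for a schema's properties."""
    problems = []
    for prop_name, prop in schema.get("properties", {}).items():
        if not matches_casing(prop_name, property_casing):
            problems.append(f"{prop_name}: not {property_casing} case")
        if not prop.get("description"):
            problems.append(f"{prop_name}: missing description")
    return problems
```

An empty list from `lint_schema` means the schema passed these two checks; a CI job could fail whenever the list is non-empty.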


Build data artifacts based on event schemas. Save time, reduce errors, and improve data quality by keeping models and documentation up to date with the latest version of each event schema.

# Build a dbt package for:
#   - Events collected using the Segment SDK
#   - Event schemas defined in my_reflekt_project/schemas/segment/ecommerce/
#   - Raw event data stored at specified source (snowflake.raw.segment_prod)
reflekt build dbt --select segment/ecommerce --source snowflake.raw.segment_prod --sdk segment

Supported data artifacts:

  • dbt packages - define dbt sources, models, and documentation for the selected schemas and the event data found in the specified --source.

Reflekt Project Setup

Project Structure

A Reflekt project is a Git repo with the following directory structure:

├── .logs/                # Reflekt command logs
├── .reflekt_cache/       # Local cache used by Reflekt
├── artifacts/            # Data artifacts are built here
├── schemas/              # Event schemas are defined here
├── .gitignore
└── reflekt_project.yml   # Project configuration

You can use the reflekt init command to create a new Reflekt project. Sync the project to GitHub to enable collaboration and version control across your teams.

Configuration Files

Reflekt requires two configuration files: reflekt_project.yml and reflekt_profiles.yml.


reflekt_project.yml: general project settings, schema and linting conventions, and data artifact configuration.

example_reflekt_project.yml
# Example reflekt_project.yml
# GENERAL CONFIG ----------------------------------------------------------------------
version: 1.0

name: reflekt_demo               # Project name
vendor: com.company_name         # Default vendor for schemas in reflekt project
default_profile: dev_reflekt     # Default profile to use from reflekt_profiles.yml
# profiles_path: optional/path/to/reflekt_profiles.yml  # Optional, defaults to ~/.reflekt/reflekt_profiles.yml

# SCHEMAS CONFIG ----------------------------------------------------------------------
schemas:                        # Define schema conventions
  conventions:
    event:
      casing: title             # title | snake | camel | any
      capitalize_camel: true    # Only used if 'casing: camel'
      numbers: false            # Allow numbers in event names
      reserved: []              # Reserved event names
    property:
      casing: snake             # title | snake | camel | any
      capitalize_camel: true    # Only used if 'casing: camel'
      numbers: false            # Allow numbers in property names
      reserved: []              # Reserved property names
    data_types: [               # Allowed data types
        string, integer, number, boolean, object, array, any, 'null'
    ]

# REGISTRY CONFIG ---------------------------------------------------------------------
registry:                       # Additional config for schema registry if needed
  avo:                          # Avo specific config
    branches:                   # Provide ID for Avo branches for `reflekt pull` to work
      staging: AbC12dEfG        # Safe to version control (see Avo docs to find branch ID)
      main: main                # 'main' always refers to the main branch

# ARTIFACTS CONFIG -----------------------------------------------------------------------
artifacts:                      # Configure how data artifacts are built
  dbt:                          # dbt package config
    sources:
      prefix: __src_            # Source files start with this prefix
    models:
      prefix: stg_              # Model files start with this prefix
    docs:
      prefix: _stg_             # Docs files start with this prefix
      in_folder: false          # Docs files in separate folder?
      tests:                    # Add generic dbt tests for columns found in schemas
        id: [unique, not_null]


reflekt_profiles.yml: defines connections to schema registries and to the sources where event data is stored.

example_reflekt_profile.yml
# Example reflekt_profiles.yml
version: 1.0

dev_reflekt:                                              # Profile name (multiple profiles can be defined)
  registry:                                               # Define connections to schema registries (multiple allowed)
    - type: segment
      api_token: segment_api_token
    - type: avo
      workspace_id: avo_workspace_id
      service_account_name: avo_service_account_name
      service_account_secret: avo_service_account_secret

  source:                                                 # Define connections to data warehouses where event data is stored (multiple TYPES allowed. Cannot have sources of the same TYPE)
    - type: snowflake                                     # Snowflake DWH. Credentials follow.
      account: abc12345
      database: raw
      warehouse: transforming
      role: transformer
      user: reflekt_user
      password: reflekt_user_password

    - type: redshift                                      # Redshift DWH. Credentials follow.
      database: analytics
      port: 5439
      user: reflekt_user
      password: reflekt_user_password


Required metadata can be globally defined for all events in a project by modifying the metadata object in the schemas/.reflekt/event-meta/1-0.json schema. This is optional and by default no metadata is required.

schemas/.reflekt/event-meta/1-0.json (example)
{
    "$schema": "",
    "$id": ".reflekt/event-meta/1-0.json",
    "description": "Meta-schema for all Reflekt events",
    "self": {
        "vendor": "reflekt",
        "name": "meta",
        "format": "jsonschema",
        "version": "1-0"
    },
    "type": "object",
    "allOf": [
        {
            "$ref": ""
        },
        {
            "properties": {
                "self": {
                    "type": "object",
                    "properties": {
                        "vendor": {
                            "type": "string",
                            "description": "The company, application, team, or system that authored the schema"
                        },
                        "name": {
                            "type": "string",
                            "description": "The schema name. Describes what the schema is meant to capture (e.g., pageViewed, clickedLink)"
                        },
                        "format": {
                            "type": "string",
                            "description": "The format of the schema",
                            "const": "jsonschema"
                        },
                        "version": {
                            "type": "string",
                            "description": "The schema version, in MODEL-ADDITION format (e.g., 1-0, 1-1, 2-3, etc.)",
                            "pattern": "^[1-9][0-9]*-(0|[1-9][0-9]*)$"
                        }
                    },
                    "required": ["vendor", "name", "format", "version"],
                    "additionalProperties": false
                },
                "metadata": {  // EXAMPLE: Defining required metadata (code_owner, product_owner, stakeholders)
                    "type": "object",
                    "description": "Required metadata for all event schemas",
                    "properties": {
                        "code_owner": {
                            "type": "string"
                        },
                        "product_owner": {
                            "type": "string"
                        },
                        "stakeholders": {
                            "type": "array",
                            "items": {"type": "string"}
                        }
                    },
                    "required": ["code_owner", "product_owner"],
                    "additionalProperties": false
                },
                "properties": {},
                "tests": {},
                "metrics": {
                    "type": "object",
                    "properties": {
                        "dimensions": {
                            "type": "array",
                            "description": "Schema properties to be used as dimensions",
                            "items": {"type": "string"}
                        },
                        "measures": {
                            "type": "array",
                            "description": "Schema properties to be used as measures",
                            "items": {"type": "string"}
                        }
                    },
                    "required": ["dimensions", "measures"],
                    "additionalProperties": false
                }
            },
            "required": ["self", "metadata", "properties"]
        }
    ]
}


Event schemas are stored as JSON files in the schemas/ directory of a project. Behind the scenes, Reflekt understands how different schema registries store and structure schemas, and creates a common, codified representation using JSON Schema. When pulling/pushing schemas from/to a schema registry, Reflekt handles the conversion between the registry's format and JSON Schema.

Schema $id

Schemas are identified in Reflekt by their $id property, which equals their path relative to the schemas/ directory. For example, the schema at my_reflekt_project/schemas/segment/ecommerce/CartViewed/1-0.json has the $id of segment/ecommerce/CartViewed/1-0.json.
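
As a small illustration of that rule, the $id can be derived from a file path with the standard library. The helper name `schema_id` is hypothetical and this is a sketch of the relationship described above, not code taken from Reflekt.

```python
from pathlib import PurePosixPath


def schema_id(schema_path: str, schemas_dir: str) -> str:
    """Return the path of `schema_path` relative to `schemas_dir`."""
    return PurePosixPath(schema_path).relative_to(schemas_dir).as_posix()


# Example from the paragraph above.
print(schema_id(
    "my_reflekt_project/schemas/segment/ecommerce/CartViewed/1-0.json",
    "my_reflekt_project/schemas",
))
```

`relative_to` raises a ValueError when the file is not inside the schemas/ directory, which is a convenient guard in this sketch.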

See the --select syntax docs for more details on selecting schemas when running commands.

Schema Versions

Schema changes are captured using a MAJOR-MINOR version spec (inspired by SchemaVer). Schema versions start at 1-0 and are incremented as follows:

  • MAJOR - Breaking schema changes incompatible with previous data. Examples:
    • Add/remove/rename a required property
    • Change a property from optional to required
    • Change a property's type
  • MINOR - Non-breaking schema changes compatible with previous data. Examples:
    • Add/remove/rename an optional property
    • Change a property from required to optional
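
The rules above can be sketched as a helper that decides which part of the version to bump. This is an illustrative sketch, not Reflekt's implementation: `parse_version`, `is_breaking`, and `next_version` are hypothetical names, the regular expression is the one the meta-schema uses for "version", and resetting MINOR to 0 after a MAJOR bump is an assumption that this README does not state.

```python
import re

# Same pattern the meta-schema uses for "version" (e.g., 1-0, 1-1, 2-3).
VERSION_PATTERN = re.compile(r"^[1-9][0-9]*-(0|[1-9][0-9]*)$")


def parse_version(version: str) -> tuple[int, int]:
    """Split '2-3' into (2, 3); raise ValueError for malformed versions."""
    if not VERSION_PATTERN.match(version):
        raise ValueError(f"Invalid schema version: {version!r}")
    major, minor = version.split("-")
    return int(major), int(minor)


def is_breaking(old: dict, new: dict) -> bool:
    """Apply the MAJOR rules listed above to two versions of a schema."""
    old_props, new_props = old.get("properties", {}), new.get("properties", {})
    old_required = set(old.get("required", []))
    new_required = set(new.get("required", []))
    # A required property was added, or an optional one became required.
    if new_required - old_required:
        return True
    # A required property was removed or renamed.
    if any(name not in new_props for name in old_required):
        return True
    # A property present in both versions changed type.
    shared = old_props.keys() & new_props.keys()
    return any(old_props[n].get("type") != new_props[n].get("type") for n in shared)


def next_version(current: str, breaking: bool) -> str:
    """Bump MAJOR for breaking changes, otherwise bump MINOR."""
    major, minor = parse_version(current)
    # Assumption: MINOR resets to 0 when MAJOR is bumped.
    return f"{major + 1}-0" if breaking else f"{major}-{minor + 1}"
```

For example, adding an optional property to a 1-0 schema would give `next_version("1-0", False)`, while changing a property's type would give `next_version("1-0", True)`.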

Schema Registries

Reflekt supports the following schema registries. Reflekt uses the MAJOR-MINOR versioning spec (written as MODEL-ADDITION in the meta-schema), but registries handle schema versions differently. Compatibility notes for each registry:

  • Segment Protocols: Only supports MODEL (breaking changes).
  • Avo: Schema changes are managed in Avo branches, so "version" is always "1-0". Avo customers can build data artifacts based on their Avo tracking plan using reflekt pull + reflekt build.

Example schema

An example ProductClicked event schema, based on the Segment Ecommerce Spec, is shown below.

my_reflekt_project/schemas/segment/ecommerce/ProductClicked/1-0.json
{
  "$id": "segment/ecommerce/ProductClicked/1-0.json",
  "$schema": "",
  "self": {
      "vendor": "com.company_name",
      "name": "ProductClicked",
      "format": "jsonschema",
      "version": "1-0"
  },
  "metadata": {
      "code_owner": "engineering/ecommerce-squad",
      "product_owner": ""
  },
  "type": "object",
  "properties": {
      "product_id": {
          "type": "string",
          "description": "Database id of the product being viewed"
      },
      "sku": {
          "type": "string",
          "description": "Sku of the product being viewed"
      },
      "category": {
          "type": "string",
          "description": "Category of the product being viewed"
      },
      "name": {
          "type": "string",
          "description": "Name of the product being viewed"
      },
      "brand": {
          "type": "string",
          "description": "Brand of the product being viewed"
      },
      "variant": {
          "type": "string",
          "description": "Variant of the product being viewed"
      },
      "price": {
          "type": "number",
          "description": "Price of the product ($) being viewed"
      },
      "quantity": {
          "type": "integer",
          "description": "Quantity of the product being viewed"
      },
      "coupon": {
          "type": "string",
          "description": "Coupon code associated with a product (for example, MAY_DEALS_3)"
      },
      "position": {
          "type": "integer",
          "description": "Position in the product list (ex. 3)"
      },
      "url": {
          "type": "string",
          "description": "URL of the product being viewed"
      },
      "image_url": {
          "type": "string",
          "description": "URL of the product image being viewed"
      }
  },
  "required": [],
  "additionalProperties": false
}

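To show how a schema like the ProductClicked example constrains event data, here is a minimal hand-rolled check in Python. It is a sketch under stated assumptions, not how Reflekt or a schema registry validates events: `JSON_TYPES` and `check_event` are hypothetical names, and only the "required", "additionalProperties", and basic "type" keywords are handled.

```python
# Maps JSON Schema type names to Python types (simplified for illustration).
JSON_TYPES = {
    "string": str,
    "integer": int,
    "number": (int, float),
    "boolean": bool,
    "object": dict,
    "array": list,
}


def check_event(properties: dict, schema: dict) -> list[str]:
    """Return a list of problems found in an event's properties."""
    errors = []
    declared = schema.get("properties", {})
    for name in schema.get("required", []):
        if name not in properties:
            errors.append(f"{name}: required property is missing")
    for name, value in properties.items():
        if name not in declared:
            if schema.get("additionalProperties") is False:
                errors.append(f"{name}: property is not defined in the schema")
            continue
        expected = declared[name].get("type")
        python_type = JSON_TYPES.get(expected)
        # bool is a subclass of int in Python, so reject it for numeric types.
        bool_as_number = isinstance(value, bool) and expected in ("integer", "number")
        if python_type and (not isinstance(value, python_type) or bool_as_number):
            errors.append(f"{name}: expected {expected}")
    return errors
```

Against the example schema, a payload such as {"product_id": "507f", "price": 18.99} yields no errors, while an undeclared property is reported because "additionalProperties" is false. A complete JSON Schema validator covers many more keywords than this sketch.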
Using Data Artifacts

dbt packages

To use a private dbt package built by Reflekt in a downstream dbt project, add it to the packages.yml of the project (see examples below) and then run dbt deps to import it.


packages:
  - git: "<your_user_or_org>/<your_repo>"  # Replace with GitHub repo URL for your Reflekt project
    subdirectory: "dbt-packages/<reflekt_dbt_package_name>"
    revision: v0.1.0___DBT_PKG_NAME_  # Example tag. Replace with branch, tag, or commit (full 40-character hash)


packages:
  - git: "https://{{env_var('DBT_ENV_SECRET_GITHUB_PAT')}}<your_user_or_org>/<your_repo>.git"  # Replace with your PAT and GitHub repo URL for your Reflekt project
    subdirectory: "dbt-packages/<reflekt_dbt_package_name>"
    revision: v0.1.0___DBT_PKG_NAME_  # Example tag. Replace with branch, tag, or commit (full 40-character hash)

To use the package with dbt Cloud, create a GitHub personal access token and configure it as an environment variable (e.g., DBT_ENV_SECRET_GITHUB_PAT) in your dbt Cloud account.

