ASGI Correlation ID middleware

Middleware for generating or propagating correlation IDs, making it possible to connect each of your logs to a single HTTP request.

Correlation IDs are propagated when HTTP requests contain the Correlation-ID header, and generated when no header is present. The header key can be customized; if you're on a platform like Heroku, for example, you should use X-Request-ID instead.

In addition to the core functionality, the package supports forwarding correlation IDs to Sentry events and Celery tasks. See the extensions section below for details.

Installation

pip install asgi-correlation-id

Setting up the middleware

To set up the package, you need to add the middleware and configure logging.

Adding the middleware

The middleware can be added like this

from fastapi import FastAPI
from starlette.middleware import Middleware

from asgi_correlation_id import CorrelationIdMiddleware

app = FastAPI(middleware=[Middleware(CorrelationIdMiddleware)])

or this

app = FastAPI()
app.add_middleware(CorrelationIdMiddleware)

or any other way your framework allows.

For Starlette apps, just substitute FastAPI with Starlette in the example above.
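
For reference, a minimal Starlette setup might look like this (a sketch along the same lines as the FastAPI example above):

from starlette.applications import Starlette
from starlette.middleware import Middleware

from asgi_correlation_id import CorrelationIdMiddleware

app = Starlette(middleware=[Middleware(CorrelationIdMiddleware)])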

Middleware settings

The middleware has a few settings. These are the defaults:

class CorrelationIdMiddleware(
    header_name='Correlation-ID',
    validate_guid=True,
    uuid_length=32,
)

Each individual setting is described below:

header_name

The HTTP header key to read IDs from.

In addition to Correlation-ID, another popular choice of header name is X-Request-ID. Among other things, this is the standard header for request IDs on Heroku.

Defaults to Correlation-ID.

validate_guid

By default, the middleware validates correlation IDs as valid UUIDs. If turned off, any string will be accepted.

An invalid header is discarded, and a fresh UUID is generated in its place.

Defaults to True.

uuid_length

Lets you optionally trim the length of correlation IDs. Probably not needed in most cases, but for local development, for example, 32-character UUIDs in every log line printed to your console can be excessive.

Defaults to 32.
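
As an illustration, a non-default configuration combining these settings might look like the following; the specific values are just examples:

from fastapi import FastAPI

from asgi_correlation_id import CorrelationIdMiddleware

app = FastAPI()
app.add_middleware(
    CorrelationIdMiddleware,
    header_name='X-Request-ID',  # e.g. Heroku-style request IDs
    validate_guid=False,         # accept any string as an incoming ID
    uuid_length=8,               # trim generated IDs for less noisy local logs
)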

Configuring logging

To benefit from the middleware, you'll want to configure your logging setup so that the correlation ID is included in each log record in some form or another. This way logs can be correlated to a single request - which is largely the point of the middleware.

To set up logging of the correlation ID, you simply have to add the log filter supplied by the package to your logging config.

If your logging config looks something like this:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'formatters': {
        'web': {
            'class': 'logging.Formatter',
            'datefmt': '%H:%M:%S',
            'format': '%(levelname)s ... %(name)s %(message)s',
        },
    },
    'handlers': {
        'web': {
            'class': 'logging.StreamHandler',
            'formatter': 'web',
        },
    },
    'loggers': {
        'my_project': {
            'handlers': ['web'],
            'level': 'DEBUG',
            'propagate': True,
        },
    },
}

You simply have to add the log filter and include the correlation ID in your log format, like this:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
+   'filters': {
+       'correlation_id': {'()': CorrelationId},
+   },
    'formatters': {
        'web': {
            'class': 'logging.Formatter',
            'datefmt': '%H:%M:%S',
+           'format': '%(levelname)s ... [%(correlation_id)s] %(name)s %(message)s',
        },
    },
    'handlers': {
        'web': {
            'class': 'logging.StreamHandler',
+           'filters': ['correlation_id'],
            'formatter': 'web',
        },
    },
    'loggers': {
        'my_project': {
            'handlers': ['web'],
            'level': 'DEBUG',
            'propagate': True,
        },
    },
}

And your log output will go from this

INFO ... project.views This is a DRF view log, and should have a GUID.
WARNING ... project.services.file Some warning in a function
INFO ... project.views This is a DRF view log, and should have a GUID.
INFO ... project.views This is a DRF view log, and should have a GUID.
WARNING ... project.services.file Some warning in a function
WARNING ... project.services.file Some warning in a function

to this, where each log line contains a correlation ID that connects it to a single request:

INFO ... [773fa6885e03493498077a273d1b7f2d] project.views This is a DRF view log, and should have a GUID.
WARNING ... [773fa6885e03493498077a273d1b7f2d] project.services.file Some warning in a function
INFO ... [0d1c3919e46e4cd2b2f4ac9a187a8ea1] project.views This is a DRF view log, and should have a GUID.
INFO ... [99d44111e9174c5a9494275aa7f28858] project.views This is a DRF view log, and should have a GUID.
WARNING ... [0d1c3919e46e4cd2b2f4ac9a187a8ea1] project.services.file Some warning in a function
WARNING ... [99d44111e9174c5a9494275aa7f28858] project.services.file Some warning in a function

If you're using something like a JSON formatter, just add correlation-id: %(correlation_id)s to your list of properties.
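
For example, a minimal config using python-json-logger (assumed here; any structured formatter works the same way) only needs %(correlation_id)s among its fields, together with the same CorrelationId filter as above:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'filters': {
        'correlation_id': {'()': CorrelationId},
    },
    'formatters': {
        'json': {
            '()': 'pythonjsonlogger.jsonlogger.JsonFormatter',
            'format': '%(levelname)s %(name)s %(message)s %(correlation_id)s',
        },
    },
    'handlers': {
        'json': {
            'class': 'logging.StreamHandler',
            'filters': ['correlation_id'],
            'formatter': 'json',
        },
    },
    'loggers': {
        'my_project': {
            'handlers': ['json'],
            'level': 'DEBUG',
            'propagate': True,
        },
    },
}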

Extensions

We've added a couple of (we think) nice extensions that broaden the scope of correlation IDs.

Sentry

If your project has sentry-sdk installed, correlation IDs will automatically be added to Sentry events as a transaction_id.

See this blog post for a bit more detail.

Celery

Calling task.delay() in the context of an HTTP request would normally mean that you lose the correlation ID completely, because the work is picked up by a separate worker in a separate context.

To make sure correlation IDs persist across this jump, we load Celery signal hooks that transfer and receive correlation IDs from and to task headers.

This behavior is enabled as long as Celery is installed.
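
This is not the package's actual implementation, just a rough sketch of how such signal hooks can work, assuming a correlation_id context variable and a custom CORRELATION_ID task header:

from contextvars import ContextVar

from celery.signals import before_task_publish, task_prerun

correlation_id = ContextVar('correlation_id', default=None)


@before_task_publish.connect
def transfer_correlation_id(headers=None, **kwargs):
    # Copy the current correlation ID into the outgoing task's message headers.
    cid = correlation_id.get()
    if cid and headers is not None:
        headers['CORRELATION_ID'] = cid


@task_prerun.connect
def receive_correlation_id(task=None, **kwargs):
    # Restore the correlation ID from the incoming task's headers before it runs.
    cid = task.request.get('CORRELATION_ID')
    if cid:
        correlation_id.set(cid)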

Extending Celery further

Loading correlation IDs from an HTTP request into a background task is enabled by default.

Beyond this, the package provides one more set of hooks you can use to improve tracing in Celery.

The feature

In the case of an HTTP request spawning a background task, we have full information about the sequence of events. But what happens if that background task spawns more background tasks, or retries and rejections are added to the mix? As soon as more than one task is spawned, the correlation ID is reduced to an "origin ID": the ID of the HTTP request that spawned the first worker.

Just as correlation IDs are useful because they connect logs to a single HTTP request, we would like something that gives us the sequence of events when things get more complicated. For this purpose, we can extend the concept of a correlation ID with a few more IDs, so the package provides two more log filters:

  • A current_id, which is a generated UUID, unique to each new worker process
  • A parent_id, which is the current_id of the process that issued the current task.

So to summarize, if you add all Celery hooks, you would end up with:

  • A correlation_id: The ID of an originating HTTP request or a generated ID in the case of scheduled tasks
  • A current_id: The ID of the current worker process
  • A parent_id: The ID of the parent worker process if one existed.

This means all logs can be correlated to a single correlation ID, and the sequence of events can be completely reconstructed, since each new task will have a reference to which process issued it.

Adding Celery event hooks

Setting up the event hooks is simple: just import configure_celery_current_and_parent_id and run it during startup.

from fastapi import FastAPI

from asgi_correlation_id import configure_celery_current_and_parent_id

app = FastAPI()

app.add_event_handler('startup', configure_celery_current_and_parent_id)

You can look over the event hooks here.
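
The hooks are just Celery signal receivers, so if your workers don't start through an ASGI startup event, calling the setup function directly when the Celery app module is imported should work as well. A hypothetical sketch (the app name and broker URL are placeholders):

from celery import Celery

from asgi_correlation_id import configure_celery_current_and_parent_id

celery_app = Celery('my_project', broker='redis://localhost:6379/0')

configure_celery_current_and_parent_id()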

Celery event hook settings

The setup function has a single setting.

uuid_length

Lets you optionally trim the length of IDs. Probably not needed in most cases, but for local development, for example, 32-character UUIDs in every log line printed to your console can be excessive.

Defaults to 32.
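
As an illustration, and assuming uuid_length is passed as a keyword argument to the setup function (a guess at the exact call signature), trimming the IDs during local development might look like this:

from fastapi import FastAPI

from asgi_correlation_id import configure_celery_current_and_parent_id

app = FastAPI()

# Hypothetical: uuid_length assumed to be a keyword argument of the setup function.
app.add_event_handler('startup', lambda: configure_celery_current_and_parent_id(uuid_length=8))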

Configuring logging

If this is your logging config after setting up the middleware:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'filters': {
        'correlation_id': {'()': CorrelationId},
    },
    'formatters': {
        'web': {
            'class': 'logging.Formatter',
            'datefmt': '%H:%M:%S',
            'format': '%(levelname)s ... [%(correlation_id)s] %(name)s %(message)s',
        },
    },
    'handlers': {
        'web': {
            'class': 'logging.StreamHandler',
            'filters': ['correlation_id'],
            'formatter': 'web',
        },
    },
    'loggers': {
        'my_project': {
            'handlers': ['web'],
            'level': 'DEBUG',
            'propagate': True,
        },
    },
}

You simply need to add these lines of code to start logging the current_id and parent_id:

LOGGING = {
    'version': 1,
    'disable_existing_loggers': False,
    'filters': {
        'correlation_id': {'()': CorrelationId},
+       'celery_tracing': {'()': CeleryTracingIds},
    },
    'formatters': {
        'web': {
            'class': 'logging.Formatter',
            'datefmt': '%H:%M:%S',
            'format': '%(levelname)s ... [%(correlation_id)s] %(name)s %(message)s',
        },
+       'celery': {
+           'class': 'logging.Formatter',
+           'datefmt': '%H:%M:%S',
+           'format': '%(levelname)s ... [%(correlation_id)s] [%(celery_parent_id)s-%(celery_current_id)s] %(name)s %(message)s',
+       },
    },
    'handlers': {
        'web': {
            'class': 'logging.StreamHandler',
            'filters': ['correlation_id'],
            'formatter': 'web',
        },
+       'celery': {
+           'class': 'logging.StreamHandler',
+           'filters': ['correlation_id', 'celery_tracing'],
+           'formatter': 'celery',
+       },
    },
    'loggers': {
        'my_project': {
+           'handlers': ['celery' if any('celery' in i for i in sys.argv) else 'web'],
            'level': 'DEBUG',
            'propagate': True,
        },
    },
}

This example implements separate logging handlers and formatters for Celery and non-Celery processes (selected by checking sys.argv, so the settings module needs to import sys), but that's not strictly necessary. Using a JSON formatter is probably also desirable once you get past a small number of log filters, since plain-text logs quickly become cluttered.

During development though, trimming the UUID length can be useful for limiting noise. Something like this is what we would use ourselves:

{
    'version': 1,
    'disable_existing_loggers': False,
    'filters': {
        'correlation_id': {'()': CorrelationId},
        'celery_tracing': {'()': CeleryTracingIds},
    },
    'formatters': {
        'dev': {
            'class': 'logging.Formatter',
            'datefmt': '%H:%M:%S',
            'format': '%(levelname)s:\t\b%(asctime)s %(name)s:%(lineno)d [%(correlation_id)s] %(message)s',
        },
        'dev-celery': {
            'class': 'logging.Formatter',
            'datefmt': '%H:%M:%S',
            'format': (
                '%(levelname)s:\t\b%(asctime)s %(name)s:%(lineno)d [%(correlation_id)s]'
                ' [%(celery_parent_id)s-%(celery_current_id)s] %(message)s'
            ),
        },
        'json': {
            '()': 'pythonjsonlogger.jsonlogger.JsonFormatter',
            'format': """
                asctime: %(asctime)s
                created: %(created)f
                filename: %(filename)s
                funcName: %(funcName)s
                levelname: %(levelname)s
                level: %(levelname)s
                levelno: %(levelno)s
                lineno: %(lineno)d
                message: %(message)s
                module: %(module)s
                msec: %(msecs)d
                name: %(name)s
                pathname: %(pathname)s
                process: %(process)d
                processName: %(processName)s
                relativeCreated: %(relativeCreated)d
                thread: %(thread)d
                threadName: %(threadName)s
                exc_info: %(exc_info)s
                correlation-id: %(correlation_id)s
                celery-current-id: %(celery_current_id)s
                celery-parent-id: %(celery_parent_id)s
            """,
            'datefmt': '%Y-%m-%d %H:%M:%S',
        },
    },
    'handlers': {
        'dev': {
            'class': 'logging.StreamHandler',
            'stream': 'ext://sys.stdout',
            'filters': ['correlation_id'],
            'formatter': 'dev',
        },
        'dev-celery': {
            'class': 'logging.StreamHandler',
            'stream': 'ext://sys.stdout',
            'filters': ['correlation_id', 'celery_tracing'],
            'formatter': 'dev-celery',
        },
        'json': {
            'class': 'logging.StreamHandler',
            'stream': 'ext://sys.stdout',
            'filters': ['correlation_id'],
            'formatter': 'json',
        },
    },
    'loggers': {
        'my_project': {
            'handlers': [
                'json' if settings.ENVIRONMENT != 'dev'
                else 'dev-celery' if any('celery' in i for i in sys.argv)
                else 'dev'
            ],
            'level': 'DEBUG',
            'propagate': True,
        },
    },
}
