Python HTTP logging server
Project description
Chronologer
Chronologer is a counterpart of Python stdlib’s logging HTTPHandler [1] and more. It provides RESTful API for accepting Python logging HTTP POST requests and optional UI for browsing and searching the logs. The idea is the same as for database backends of logging software, like rsyslog-mysql [2].
UI features are described in the frontend branch and released as ChronologerUI [17] package.
Scope
Chronologer is meant for small- and medium-sized logging workloads and audit logging. Practically it’s limited by its backend’s write throughput and capacity. In case of MySQL backend vertical scaling can suffice many types of applications.
Especially it’s useful for microservice architecture where file logging is no longer practical.
Installation
Chronologer is available as a Python package [3] and as a Docker image [4] (includes UI). The former can be installed like pip install chronologer[server,mysql,ui]. The latter can be used like in the following docker-compose.yml for deployment with MySQL database.
version: '2.4'
services:
web:
image: saaj/chronologer
environment:
CHRONOLOGER_STORAGE_DSN: mysql://chronologer:pass@mysql/chronologer
CHRONOLOGER_SECRET: some_long_random_string
CHRONOLOGER_USERNAME: logger
CHRONOLOGER_PASSWORD: another_long_random_string
depends_on:
mysql:
condition: service_healthy
ports:
- 8080:8080
mysql:
image: mysql:5.7
environment:
MYSQL_DATABASE: chronologer
MYSQL_USER: chronologer
MYSQL_PASSWORD: pass
MYSQL_ROOT_PASSWORD: pass
healthcheck:
test: mysqladmin ping --protocol=tcp --password=pass --silent
interval: 5s
retries: 4
It can be run like the following. The second line applies database migrations.
docker-compose up -d
docker-compose run --rm web python -m chronologer -e production migrate
To open the UI navigate to http://localhost:8080/.
Chronologer’s configuration can be fine-tuned with several environment variables defined in envconf [5]. Default envconf can be overridden completely, see python -m chronologer --help.
For examples of scaling the application server with docker-compose see perftest/stack directory [22]. There are examples for Nginx and Traefik.
Quickstart
Having Chronologer server running as described above, client logging configuration may look like the following. It requires chronologer package installed on the client as well (i.e. pip install chronologer).
import logging.config
config = {
'version' : 1,
'disable_existing_loggers' : False,
'handlers' : {
'http' : {
'class' : 'chronologer.client.QueueProxyHandler',
'queue' : {'()': 'queue.Queue', 'maxsize': 10 ** 4},
'target' : 'ext://chronologer.client.BatchJsonHandler',
'prefix' : 'appname',
'capacity' : 128,
'host' : 'chronologer_host:8080',
'url' : '/api/v1/record',
'credentials' : ('logger', 'another_long_random_string'),
'flushLevel' : 'ERROR',
'flushTimeout' : 30,
},
},
'root' : {
'handlers' : ['http'],
'level' : 'INFO'
}
}
logging.config.dictConfig(config)
The http handler buffers records for efficiency. It flushes its buffer to the server when one of the following occurs:
the buffer, of 128 records, has been filled,
a logging record with level ERROR or above has been logged,
while logging a record there’s a record in the buffer created earlier then 30 seconds ago.
chronologer.client itself doesn’t have dependencies but Python standard library. For working only with standard library logging.handlers.HTTPHandler read below.
Path of the logging handler
This section starts with logging.handlers.HTTPHandler and explains why chronologer.client builds on it and beyond. The naive imperative logging configuration looks like:
import logging.handlers
chrono = logging.handlers.HTTPHandler(
'localhost:8080', '/api/v1/record', 'POST', credentials = ('logger', ''))
handlers = [logging.StreamHandler(), chrono]
logging.basicConfig(level = logging.DEBUG, handlers = handlers)
The same can be expressed declaratively:
import logging.config
conf = {
'version' : 1,
'disable_existing_loggers' : False,
'handlers' : {
'console' : {
'class' : 'logging.StreamHandler',
},
'http' : {
'class' : 'logging.handlers.HTTPHandler',
'host' : 'localhost:8080',
'url' : '/api/v1/record',
'method' : 'POST',
'credentials' : ('logger', ''),
'secure' : False
},
},
'root' : {
'handlers' : ['console', 'http'],
'level' : 'DEBUG'
}
}
logging.config.dictConfig(conf)
This configuration is called naive because the handler is blocking. It may work in trivial cases but generally it’s discouraged because the network is not reliable [6]. Instead Python provides logging queueing in stdlib [7]:
Along with QueueHandler class, QueueListener is used to let handlers do their work on a separate thread. This is important for web and other applications where threads serving clients need to respond as quickly as possible, while any potentially slow, and especially complementary operations are done in background.
Here follows imperative configuration with memory queueing.
chrono = logging.handlers.HTTPHandler(
'localhost:8080', '/api/v1/record', 'POST', credentials = ('logger', ''))
q = queue.Queue(maxsize = 4096)
qh = logging.handlers.QueueHandler(q)
ql = logging.handlers.QueueListener(q, chrono)
ql.start()
handlers = [logging.StreamHandler(), qh]
logging.basicConfig(level = logging.DEBUG, handlers = handlers)
# somewhere on shutdown
ql.stop()
Because the queue listener’s shutdown procedure is inconvenient this way and it’s hard to express declaratively, QueueProxyHandler is suggested.
import logging.handlers
import logging.config
class QueueProxyHandler(logging.handlers.QueueHandler):
'''Queue handler which creates its own ``QueueListener`` to
proxy log records via provided ``queue`` to ``target`` handler.'''
_listener = None
'''Queue listener'''
def __init__(self, queue, target = logging.handlers.HTTPHandler, **kwargs):
# user-supplied factory is not converted by default
if isinstance(queue, logging.config.ConvertingDict):
queue = queue.configurator.configure_custom(queue)
super().__init__(queue)
self._listener = logging.handlers.QueueListener(queue, target(**kwargs))
self._listener.start()
def close(self):
super().close()
self._listener.stop()
conf = {
'version' : 1,
'disable_existing_loggers' : False,
'handlers' : {
'console' : {
'class' : 'logging.StreamHandler',
},
'http' : {
'class' : 'somemodule.QueueProxyHandler',
'queue' : {'()': 'queue.Queue', 'maxsize': 4096},
'host' : 'localhost:8080',
'url' : '/api/v1/record',
'method' : 'POST',
'credentials' : ('logger', ''),
'secure' : False
},
},
'root' : {
'handlers' : ['console', 'http'],
'level' : 'DEBUG'
}
}
logging.config.dictConfig(conf)
Client
For convenience reasons, the above is available as chronologer.client.QueueProxyHandler.
In addition it has logger name prefixing and suffixing capability, and some edge case resilience. prefix is passed to QueueProxyHandler on creation. It allows many applications logging into the same Chronologer instance to have separate logger namespaces (e.g. including aiohttp logging whose namespace is fixed). suffix is an extra attribute of LogRecord which allows to fine-tune the logger namespace for easier search of the records.
import logging.config
conf = {
'version' : 1,
'disable_existing_loggers' : False,
'handlers' : {
'console' : {
'class' : 'logging.StreamHandler',
},
'http' : {
'class' : 'chronologer.client.QueueProxyHandler',
'queue' : {'()': 'queue.Queue', 'maxsize': 4096},
'prefix' : 'appname',
'host' : 'localhost:8080',
'url' : '/api/v1/record',
'method' : 'POST',
'credentials' : ('logger', ''),
'secure' : False
},
},
'root' : {
'handlers' : ['console', 'http'],
'level' : 'DEBUG'
}
}
logging.config.dictConfig(conf)
logging.getLogger('some').info(
'Chronologer!', extra = {'suffix': 'important.transfer'})
The LogRecord corresponding to the last line will have name equal to 'appname.some.important.transfer'. If name is modified the original is saved as origname.
But this is unfortunately not it. Looking at logging.handlers.HTTPHandler carefully we can see a few flaws, including but not limited to:
it doesn’t validate response codes, say 403 Forbidden, and will silently ignore the error, i.e. not calling logging.Handler.handleError, will leads to data loss,
it doesn’t support request retries,
it doesn’t support buffering to improve throughput,
it doesn’t support other serialisation formats but application/x-www-form-urlencoded.
chronologer.client.BatchJsonHandler tries to address these issues, see Quickstart.
JSON input support
Besides application/x-www-form-urlencoded of HTTPHandler Chronologer supports application/json of the same structure. It also supports application/x-ndjson [19] for bulk ingestion.
JSON of arbitrary structure can be ingested in the raw mode. In the mode Chronologer will not classify input into logging meta, data and error and will not insist on presence of Python logging-specific keys. For example, a file containing newline separated JSON entries can be sent to Chronologer like:
curl -H "content-type: application/x-ndjson" --user logger: \
--data-binary @/path/to/some/file.ndjson localhost:8080/api/v1/record?raw=1
Record retention
When CHRONOLOGER_RETENTION_DAYS is set, daily, around midnight a background thread will purge records older than given number of days.
Authentication
Chronologer does not provide (neither intends to) a user management. The intent is to delegate authentication. The credentials and roles used by the server can be provided by the following environment variables:
CHRONOLOGER_USERNAME
CHRONOLOGER_PASSWORD
CHRONOLOGER_ROLES – space separated role list (see below)
Alternatively a JSON file located by CHRONOLOGER_AUTHFILE of the following structure can be used to authenticate multiple users:
[
{
"username": "bob",
"pbkdf2": "f57ef1e3e8f90cb367dedd44091f251b5b15c9c36ddd7923731fa7ee41cbaa82",
"hashname": "sha256",
"salt": "c0139cff",
"iterations": 32,
"roles": ["writer"]
}, {
"username": "obo",
"pbkdf2": "ff680a9237549f698da5345119dec1ed314eb4fdefe59837d0724d747c3169089ae45...",
"hashname": "sha384",
"salt": "9230dbdd5a13f009",
"iterations": 4096,
"roles": ["basic-reader", "query-reader"]
}
]
The value of pbkdf2 and keys hashname, salt, iterations correspond to Python hashlib.pbkdf2_hmac [21].
API
By default Chronologer listens port 8080 and is protected by HTTP Basic Authentication, username “logger” without password (see environment variables to override these).
Chronologer provides Record resource.
Create record
URL |
/api/v1/record |
Method |
POST |
Request content-type |
application/x-www-form-urlencoded, application/json, application/x-ndjson |
Request body |
Representation of logging.LogRecord |
Response content-type |
application/json |
Response body |
Representation of created model.Record, except for application/x-ndjson input where only a list of insert record identifiers is returned |
Successful response code |
201 Created |
Optional raw mode, accepting arbitrary JSON documents, is supported by passing raw=1 into the query string.
application/x-ndjson request body can produce 207 Multi-Status response when a successful chunk is followed by a failed chunk, say that contained malformed a JSON line. Multi-status body looks like:
{
"multistatus": [
{"status": 201, "body": [1, 2, "..."]},
{"status": 400, "body": "Invalid JSON document on line 2012"},
]
}
Retrieve record count
URL |
/api/v1/record |
Method |
HEAD |
Query string |
Optional filtering fields (see details below):
|
Response headers |
|
Successful response code |
200 OK |
Retrieve record timeline
URL |
/api/v1/record |
Method |
HEAD |
Query string |
Required fields:
Optional filtering fields (see details below):
|
Response headers |
|
Successful response code |
200 OK |
Retrieve record range
URL |
/api/v1/record |
Method |
GET |
Query string |
Required fields:
Optional filtering fields (see details below):
|
Response content-type |
application/json |
Response body |
|
Successful response code |
200 OK |
Retrieve record
URL |
/api/v1/record/{id} |
Method |
GET |
Response content-type |
application/json |
Response body |
logrec has two nested dictionaries. data has what was passed to extra [16] and meta has internal fields of logging.LogRecord. |
Successful response code |
200 OK |
Error representation
Errors for HTTP method requests that allow a response body are represented like:
{
"error" : {
"type" : "HTTPError",
"message" : "Nothing matches the given URI"
}
}
Errors for HTTP method requests that don’t allow a response body are represented in the headers:
X-Error-Type: StorageQueryError
X-Error-Message: Make sure the query filter is a valid WHERE expression
Response encoding
Chronologer supports Gzip and Brotli response body encoding. The latter takes precedence because it provides significant improvement for verbose logging records.
Filtering
Filter fields have the following semantics:
after – ISO8601 timestamp. The predicate is true for a record which was created after given timestamp.
before – ISO8601 timestamp. The predicate is true for a record which was created before given timestamp.
level – integer logging level. The predicate is true for a record whose severity level is greater or equal to given level.
name – logging record prefix. Optionally can be a comma-separated list of prefixes. The predicate is true for a record whose logger name starts with any of given prefixes.
query – storage-specific expression. Requires the user to have query-reader role. See JSON path description below.
MySQL
Chronologer relies on a compressed InnoDB table which provides good compromise between reliability, data modelling, search features, performance and size of logged data. The data of logging records are written into logrec JSON field (see the initial migration [9] and examples above).
It is a good idea to have dedicated MySQL instance for Chronologer. Then, for instance, it is possible to fine-tune MySQL’s ACID guarantees, namely innodb_flush_log_at_trx_commit = 0 allow MySQL to write 1-second batches [10]. Disabling performance schema [11] by setting performance_schema = 0 is also recommended, because it has significant overhead. Basic InnoDB settings should be reasonably configured:
JSON path query
query passes a storage-specific expression. Particularly, it’s useful to write post-filtering conditions for logrec JSON field using JSONPath expressions and -> operator [15]. It may look like the following, though arbitrary WHERE clause expressions are possible.
"logrec->'$.data.foo' = 387 AND logrec->'$.meta.lineno' = 20"
"logrec->'$.meta.threadName' != 'MainThread'"
Note that connection to MySQL works in ANSI_QUOTES mode [18], so " cannot be used to form string literals. ' must be used instead.
Compression tuning
Initial migration [9] sets KEY_BLOCK_SIZE = 4. It may be sub-optimal for the shape of your log records. MySQL provides guidelines for choosing KEY_BLOCK_SIZE [23] and monitoring “compression failures” at runtime [24].
If you want to change KEY_BLOCK_SIZE for record table, you can provide your own database migration. Chronologer uses yoyo-migrations [25] for database migrations. For example, to switch to KEY_BLOCK_SIZE = 8 migration file, named 20190803T1404_key_size.py, will look like:
from yoyo import step
step('ALTER TABLE record KEY_BLOCK_SIZE = 8')
It can be mounted into the migration directory of Chonologer’s container in your docker-compose.yml like:
volumes:
- ./20190803T1404_key_size.py:/opt/chronologer/chronologer/migration/mysql/20190803T1404_key_size.py
Then re-apply migrations with migrate or run serve with -m command line flag.
Size limit
Chronologer’s MySQL schema has the following size limits.
logger name – 127 characters
text message – 255 characters
The structured part of a log record, logrec column, as a JSON column in general, has a limit.
It is important to keep in mind that the size of any JSON document stored in a JSON column is limited to the value of the max_allowed_packet system variable.
Also note that if a query with a JSON value fits max_allowed_packet bytes, it doesn’t necessary mean the JSON value fits max_allowed_packet bytes in its MySQL serialised representation [26].
SQLite
SQLite is supported for very simple, one-off or evaluation cases. Also it doesn’t support compression. JSON1 extension [20] is required for JSON Path queries.
"json_extract(logrec, '$.data.foo') = 387 AND json_extract(logrec, '$.meta.lineno') = 20"
"json_extract(logrec, '$.meta.threadName') = 'MainThread'"
A one-off Chronologer container with SQLite storage can be run on port 8080 like:
docker run --rm -it -p 8080:8080 -v /tmp/db \ -e CHRONOLOGER_STORAGE_DSN=sqlite:////tmp/db/chrono.sqlite \ -e CHRONOLOGER_SECRET=some_long_random_string \ saaj/chronologer \ python3.7 -m chronologer -e production serve -u www-data -g www-data -m
Two things to note:
-m to serve runs migrations before starting the server,
SQLite needs permissions to the directory where a database file resides, to write its temporary files.
R&D roadmap
See the roadmap issue.
Credits
Logo is contributed by lightypaints.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.