Nubium Utils
Some Kafka utility functions and patterns for the nubium project.
Environment Variables
Every environment variable used by Nubium-Utils is prefixed with NU_. Similarly, anything DUDE-related starts with DUDE_.
Important Environment Variables
Nubium-Utils relies on having these environment variables defined:
- NU_SCHEMA_REGISTRY_URL: schema registry URL
- NU_SCHEMA_REGISTRY_USERNAME: schema registry username (if applicable)
- NU_SCHEMA_REGISTRY_PASSWORD: schema registry password (if applicable)
- NU_KAFKA_CLUSTERS_CONFIGS_JSON: JSON of clusters and their respective connection settings. EX: '{"cluster_0": {"url": "url", "username": "un", "password": "pw"}, "cluster_1": {"url": "url", "username": "un", "password": "pw"}}'
- NU_TOPIC_CONFIGS_JSON: JSON of topics and their respective cluster name (referencing the keys of NU_KAFKA_CLUSTERS_CONFIGS_JSON) plus optional create settings (see the parsing sketch after this list). EX: '{"topic_a": {"cluster": "cluster_0", "configs": {"num_partitions": 2, "replication_factor": 2, "config": {"segment.ms": 120000}}}, "topic_b": {"cluster": "cluster_1"}}'
- NU_HOSTNAME: used for identifying unique consumer instances
- NU_APP_NAME: used to define your consumer group
- NU_MP_PROJECT: used for OpenShift deployment
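For illustration, here is a minimal sketch (with hypothetical values) of how the two JSON variables parse; Nubium-Utils reads these for you, so this is only to show the expected shapes:

import json
import os

# Hypothetical values for local testing; real deployments set these in the environment.
os.environ.setdefault('NU_KAFKA_CLUSTERS_CONFIGS_JSON', '{"cluster_0": {"url": "localhost:9092", "username": "un", "password": "pw"}}')
os.environ.setdefault('NU_TOPIC_CONFIGS_JSON', '{"topic_a": {"cluster": "cluster_0", "configs": {"num_partitions": 2, "replication_factor": 2}}}')

clusters = json.loads(os.environ['NU_KAFKA_CLUSTERS_CONFIGS_JSON'])
topics = json.loads(os.environ['NU_TOPIC_CONFIGS_JSON'])

# Each topic entry names a cluster key from NU_KAFKA_CLUSTERS_CONFIGS_JSON.
for topic, config in topics.items():
    print(topic, '->', clusters[config['cluster']]['url'])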
Confluent-Kafka GTFO App Classes
Overview
Nubium Utils now includes an app framework, named GTFO, for managing exactly-once-processing confluent-kafka applications.
The idea is to simplify confluent-kafka down to a kafka-streams-like interface, where consumer management is largely handled for you under the hood. This is particularly nice for exactly-once processing, which the class implements and uses by default.
There are some other subclasses for more specific use cases, namely GtfoBatchApp and GtfoTableApp, but the base class GtfoApp has been written with the intention of being easily extensible. The recommended approaches for extending it are described below.
The Transaction and Gtfo classes
There are two essential classes you should understand before you dive in.
The Transaction classes are the heart of the GTFO framework; they are what your business logic methods will actually interact with on a message-to-message basis. That being said, for most use cases you won't need to do much with them other than Transaction.messages() to get the currently consumed message and Transaction.produce() to send a new message out.
The Gtfo-based classes generally wrap/interface with the Transaction objects, generating a new one for every transaction (which generally consists of consuming a message, producing the desired messages, and committing the consumed message). In general, you will really only use the Gtfo class to initialize everything, and it doesn't take much.
Finally, as a general rule of thumb for both, there are not many methods to interact with on either class... on purpose! The functionality outlined here will likely cover >90% of use cases.
Initializing/running a GtfoApp: basic example
NOTE: there is a lot that is managed for you via environment variables, so definitely take a look at the
"Important Environment Variables" section to see what you should have defined before trying to run a GTFO
app.
There are two basic components you need to initialize an app:
- app_function: the business logic of your application, which should take exactly one argument: the Transaction object that will be handed to it for every message.
- consume_topics_list: the topics you are consuming. Can be a comma-separated string or a python list.
That's it! That being said, this is if your app only consumes. To produce, you will additionally need, at minimum:
- produce_topic_schema_dict: a dict that maps {'topic_name': schema_obj}, where schema_obj is a valid avro schema dict.
Then, to run the app, do:
GtfoApp.run()
Altogether, that might look something like this:
from nubium_utils.confluent_utils.transaction_utils import GtfoApp

useless_schema = {
    "name": "CoolSchema",
    "type": "record",
    "fields": [
        {
            "name": "cool_field",
            "type": "string"
        }
    ]
}

def useless_example_func(transaction, thing_inited_at_runtime):
    msg = transaction.messages()  # can also do transaction.value(); uses the confluent-kafka method with single messages
    cool_message_out = msg.value()
    cool_message_out['cool_field'] = thing_inited_at_runtime  # 'cool value'
    transaction.produce({'topic': 'cool_topic_out', 'key': msg.key(), 'value': cool_message_out, 'headers': msg.headers()})

init_at_runtime_thing = 'cool value'  # can hand off objects you need to init at runtime, like an api session object

gtfo_app = GtfoApp(
    app_function=useless_example_func,
    consume_topics_list=['test_topic_1', 'test_topic_2'],
    produce_topic_schema_dict={'cool_topic_out': useless_schema},
    app_function_arglist=[init_at_runtime_thing])  # optional! Here to show functionality.
gtfo_app.run()
Using GtfoBatchApp (plus BatchTransaction)
Sometimes you want to handle multiple messages at once, such as doing a bulk upload of data to an API. In this case, treating each message as a separate transaction doesn't make much sense! For this, we have GtfoBatchApp!
We still rely on much of what Gtfo and Transaction lay out, but now we can specify how many messages should be consumed by default for a given transaction. Additionally, you can consume more messages on demand with the BatchTransaction object via BatchTransaction.consume() in your app_function, in case you'd like to follow the normal consume pattern most of the time but need it to change on demand.
Biggest thing to note here: all messages for that transaction will (just like Transaction) be accessible via BatchTransaction.messages(). Also, all of them get committed when you finish that instance of app_function, just like a typical singleton run of app_function, so keep that in mind!
You can tweak the default batch size via NU_CONSUMER_DEFAULT_BATCH_CONSUME_MAX_COUNT; it defaults to 1.
Here is an example:
from nubium_utils.confluent_utils.transaction_utils import GtfoBatchApp
from cool_api import bulk_upload

useless_schema = {
    "name": "CoolSchema",
    "type": "record",
    "fields": [
        {
            "name": "cool_field",
            "type": "string"
        }
    ]
}

def prep_data(msg_val):
    # do stuff
    pass

def useless_example_func(bulk_transaction):
    msgs = bulk_transaction.messages()
    last_msg = msgs[-1]
    # I originally set my max to 10 via the env var...but with some logic, if a message says I should do a bigger batch...
    if last_msg.headers()['needs_bigger_bulk'] == 'true':
        bulk_transaction.consume(consume_max_count=500)  # allow me to raise that max to 500, and consume up to that (will consume up to 490 more)!
    bulk_upload([prep_data(msg.value()) for msg in bulk_transaction.messages()])  # push all 500 messages

gtfo_app = GtfoBatchApp(
    app_function=useless_example_func,
    consume_topic='test_topic_in',
    produce_topic_schema_dict={'cool_topic_out': useless_schema})
gtfo_app.run()
Using GtfoTableApp (plus TableTransaction)
One of the more complex components of GTFO is the GtfoTableApp. It allows you to store state based on a kafka topic via a localized datastore. Basically, you can store whatever you want with respect to a given kafka key, and later reference/compare that data against a new version of that key.
There are some functional caveats/limitations that come along with this feature set:
- The app can only consume from one topic (you can make an upstream "funnel" app as needed).
- You must make a topic named {NU_APP_NAME}__changelog, with the same partition count as the topic it consumes from.
- You can only store/reference things based on the same key as the current message.
- Each instance of your application needs to use the same volume.
- Data needs to be stored as a json/dict (it is stored as a string and json.load-ed at runtime).
With that in mind, setup is almost exactly the same as GtfoApp; here is an example:
from nubium_utils.confluent_utils.transaction_utils import GtfoTableApp

useless_schema = {
    "name": "CoolSchema",
    "type": "record",
    "fields": [
        {
            "name": "cool_field",
            "type": "string"
        }
    ]
}

def useless_example_func(transaction):
    msg = transaction.messages()  # can also do transaction.value(); uses the confluent-kafka method with single messages
    cool_message_out = msg.value()
    previous_message = transaction.read_table_entry()
    # let's only do stuff if we have a previous version for the same key and it was "just cool"!
    if previous_message and previous_message['cool_field'] == 'just cool':
        cool_message_out['cool_field'] = 'very cool now'
        transaction.update_table_entry(cool_message_out)  # we want to update the table with our new value. It does not do this automatically!
        transaction.produce({'topic': 'cool_topic_out', 'key': msg.key(), 'value': cool_message_out, 'headers': msg.headers()})

gtfo_app = GtfoTableApp(
    app_function=useless_example_func,
    consume_topic='test_topic_in',
    produce_topic_schema_dict={'cool_topic_out': useless_schema})
gtfo_app.run()
Extending GtfoApp and Transaction
Of course, you can never cover every use case! As such, each class was designed with extensibility in mind!
Most often, what needs more customization is your consumption pattern (hence the batch class!), and there is a relatively painless way to address that with minimal alterations.
There is an init argument on GtfoApp named transaction_type; this allows you to easily pass an augmented version of Transaction with your changed consumption pattern, potentially without changing anything else at all!
Otherwise, hopefully things have been compartmentalized enough that you can just replace methods as needed. In general, you'll usually need to mess a little with both classes, but likely mostly Transaction.
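As a rough sketch of the transaction_type approach (assuming Transaction is importable from the same module; the subclass, its forward() helper, and the topic names are hypothetical), a custom Transaction could add a convenience method and be wired in via transaction_type:

from nubium_utils.confluent_utils.transaction_utils import GtfoApp, Transaction

useless_schema = {
    "name": "CoolSchema",
    "type": "record",
    "fields": [{"name": "cool_field", "type": "string"}]
}

class ForwardingTransaction(Transaction):
    # Hypothetical subclass: adds a helper that forwards the current message as-is.
    def forward(self, topic):
        msg = self.messages()
        self.produce({'topic': topic, 'key': msg.key(), 'value': msg.value(), 'headers': msg.headers()})

def useless_example_func(transaction):
    # transaction is a ForwardingTransaction thanks to transaction_type below
    transaction.forward('cool_topic_out')

gtfo_app = GtfoApp(
    app_function=useless_example_func,
    consume_topics_list=['test_topic_in'],
    produce_topic_schema_dict={'cool_topic_out': useless_schema},
    transaction_type=ForwardingTransaction)
gtfo_app.run()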
Monitoring
The monitoring utils enable metrics to be surfaced from the kafka applications so the Prometheus server can scrape them. The Prometheus server can't dynamically figure out pod IPs and scrape the services directly, so we're using a metrics cache instead.
The metrics cache is a StatefulSet with 2 services assigned to it. One service is a normal service, with a unique cluster IP address. The prometheus server scrapes this service endpoint. The other service doesn't have a cluster IP, which means that the monitoring utility can find the IP addresses of each of the backing pods, and send metrics to all of the pods. This setup gives us high-availability guarantees.
The Monitoring utils are organized into two classes, MetricsPusher and MetricsManager.
The MetricsManager is a container for all of the metrics for the app, and contains convenience methods for the 3 standardized metrics. These metrics are:
- messages_consumed: The number of messages consumed by the app
- messages_produced: The number of messages produced by the app
- message_errors: The number of exceptions caught in the app (labeled by type)
The MetricsPusher handles pushing the application's metrics to the metrics cache. It determines the list of IP addresses for all of the metrics cache pods, and sends the current metrics values for all of the metrics.
Metric names and labels
The names of the metrics in Prometheus are the same as their names as parameters:
- messages_consumed
- messages_produced
- message_errors
Two labels exist for every metric:
- app: The name of the microservice the metric came from
- job: The name of the individual pod the metric came from
The message_errors metric also has another label:
- exception: The name of the exception that triggered the metric
Monitoring Setup Examples
The initialization and update loop for application monitoring will differ from application to application based on their architecture. The following examples should cover the standard designs we use.
Default Kafka Client Application
A Kafka application that directly relies on interacting with Producer or Consumer clients should have its monitoring classes set up and its pushing thread started in the main run function and passed to the loop, as follows:
import os
from confluent_kafka import Consumer, Producer
from nubium_utils.metrics import MetricsManager, MetricsPusher, start_pushing_metrics

def run_function():
    # Client configs elided for brevity; configure bootstrap servers, group id, etc. as usual.
    consumer = Consumer()
    producer = Producer()
    metrics_pusher = MetricsPusher(
        os.environ['HOSTNAME'],
        os.environ['METRICS_SERVICE_NAME'],
        os.environ['METRICS_SERVICE_PORT'],
        os.environ['METRICS_POD_PORT'])
    metrics_manager = MetricsManager(job=os.environ['HOSTNAME'], app=os.environ['APP_NAME'], metrics_pusher=metrics_pusher)
    start_pushing_metrics(metrics_manager, int(os.environ['METRICS_PUSH_RATE']))
    try:
        while True:
            loop_function(consumer, producer, metrics_manager=metrics_manager)
    finally:
        consumer.close()
The consume_message() function from this library expects a metrics_manager object as an argument, so that it can increment the messages_consumed metric.
The messages_produced metric needs to be incremented by the application itself whenever a Kafka message is produced. The convenience method inc_messages_produced() on the metrics_manager makes this easier, since it automatically adds the necessary labels to the metric.
The application also needs to increment the message_errors metric whenever an exception is caught.
An example loop function might look like this:
import os
import logging
from nubium_utils import consume_message
from nubium_utils.custom_exceptions import NoMessageError

def loop_function(consumer, producer, metrics_manager):
    try:
        message = consume_message(consumer, int(os.environ['CONSUMER_POLL_TIMEOUT']), metrics_manager)
        outgoing_key = message.value()['email_address']
        producer.produce(topic='outgoing_topic', key=outgoing_key, value=message.value())
        metrics_manager.inc_messages_produced(1)
    except NoMessageError:
        pass
    except KeyError as error:
        metrics_manager.inc_message_errors(error)
        logging.debug('Message missing email address')
Flask Kafka Application
The setup becomes a little bit different with a Flask application. The metrics_manager should be accessible through the app's configuration, so that it can be accessed in route functions.
The preferred method for error monitoring is to hook into Flask's built-in error handling, using the @app.errorhandler decorator.
Here is an example create_app() function:
import flask
from werkzeug.exceptions import HTTPException

from .forms_producer_app import forms_producer
from .util_blueprint import app_util_bp

def create_app(config):
    """
    Creates app from config and needed blueprints
    :param config: (Config) object used to configure the flask app
    :return: (flask.App) the application object
    """
    app = flask.Flask(__name__)
    app.config.from_object(config)
    app.register_blueprint(forms_producer)
    app.register_blueprint(app_util_bp)

    @app.errorhandler(HTTPException)
    def handle_exception(e):
        """
        Increment error gauge on metrics_manager before returning error message
        """
        response = e.get_response()
        response.data = f'{e.code}:{e.name} - {e.description}'
        app.config['MONITOR'].inc_message_errors(e)
        return response

    @app.errorhandler(Exception)
    def unhandled_exception(error):
        app.logger.error(f'Unhandled exception: {error}')
        app.config['MONITOR'].inc_message_errors(error)
        return f'Unhandled exception: {error}', 500

    return app
The route functions for produced messages should increment the messages_produced metric when necessary. Example:
@forms_producer.route('/', methods=["POST"])
@AUTH.login_required
def handle_form():
    """
    Ingests a dynamic form from Eloqua and produces it to the topic
    """
    values = request.json
    string_values = {key: str(value) for key, value in values.items()}
    LOGGER.debug(f'Processing form: {values}')
    current_app.config['PRODUCER'].produce(
        topic=current_app.config['TOPIC'],
        key=values['C_EmailAddress'],
        value={'form_data': string_values},
        on_delivery=produce_message_callback
    )
    current_app.config['MONITOR'].inc_messages_produced(1)
    return jsonify(success=True)