Skip to main content

Versatile Data Kit SDK troubleshooting plugin to assist in troubleshooting deployed data jobs.

Project description

VDK-JOBS-TROUBLESHOOTING Plugin

monthly download count for vdk-jobs-troubleshooting

The VDK JOB Troubleshooting plugin provides the ability to add various troubleshooting utilities which can be accessed during the data job runtime.

Generally it's quite hard to produce a thread dump of a python process running inside a kubernetes pod. So this plugin provides a thread-dump utility

Adding the plugin to your data job

Before you can use the plugin you should add it to your data job or custom-sdk, by adding the following line to your requirements.txt file:

vdk-jobs-troubleshooting

Next you need to add the list of utilities to use to your data job configuration. For example, to enable the thread-dump utility you have to add the following to your data job's config.ini file.

[vdk]

TROUBLESHOOT_UTILITIES_TO_USE=thread-dump

Getting a thread dump from a running job

During the startup of the data job the troubleshooting utility will log a message with the port it is running on. Example:

Troubleshooting utility server will start on port 8783.

So in order to get a thread dump from the running data job do the following.

  1. Review the logs (using the kubectl logs pod/ command ) and find the troubleshooting utility port
  2. Start a port forward from your local machine to the target pod and port for example
kubectl port-forward pods/my-problematic-job-1691419320-jcsn7 8783:8783
  1. In a new/different console execute a curl command, to the pod. Example
curl localhost:8783/threads

The thread-dump will be printed in the console.

Sample output
Thread:MainThread alive:True daemon:False
Thread:troubleshooting_utility alive:True daemon:True
Thread:payload-aggregator alive:True daemon:True
Thread:payload-poster0 alive:True daemon:True
Thread:ThreadPoolExecutor-0_0 alive:True daemon:True
Thread:payload-aggregator alive:True daemon:True
Thread:payload-poster0 alive:True daemon:True
...
Thread:payload-poster9 alive:True daemon:True
Thread:troubleshooting_utility alive:True daemon:True
 # ThreadID: 140056075323136
 /usr/local/lib/python3.7/threading.py::890::_bootstrap::self._bootstrap_inner()
 /usr/local/lib/python3.7/threading.py::926::_bootstrap_inner::self.run()
 /usr/local/lib/python3.7/threading.py::870::run::self._target(*self._args, **self._kwargs)
 /usr/local/lib/python3.7/socketserver.py::232::serve_forever::ready = selector.select(poll_interval)
 /usr/local/lib/python3.7/selectors.py::415::select::fd_event_list = self._selector.poll(timeout)
 # ThreadID: 140054872303360
...
 # ThreadID: 140055860274944
 /usr/local/lib/python3.7/threading.py::890::_bootstrap::self._bootstrap_inner()
 /usr/local/lib/python3.7/threading.py::926::_bootstrap_inner::self.run()
 /usr/local/lib/python3.7/threading.py::870::run::self._target(*self._args, **self._kwargs)
 /usr/local/lib/python3.7/socketserver.py::237::serve_forever::self._handle_request_noblock()
 /usr/local/lib/python3.7/socketserver.py::316::_handle_request_noblock::self.process_request(request, client_address)
 /usr/local/lib/python3.7/socketserver.py::347::process_request::self.finish_request(request, client_address)
 /usr/local/lib/python3.7/socketserver.py::360::finish_request::self.RequestHandlerClass(request, client_address, self)
 /usr/local/lib/python3.7/socketserver.py::720::__init__::self.handle()
 /usr/local/lib/python3.7/http/server.py::434::handle::self.handle_one_request()
 /usr/local/lib/python3.7/http/server.py::422::handle_one_request::method()
 /vdk/site-packages/vdk/plugin/jobs_troubleshoot/troubleshoot_utilities/thread_dump.py::36::do_GET::self._log_thread_dump()
 /vdk/site-packages/vdk/plugin/jobs_troubleshoot/troubleshoot_utilities/thread_dump.py::58::_log_thread_dump::for filename, lineno, name, line in traceback.extract_stack(stack):
 # ThreadID: 140056193533760
 /vdk/vdk::8::::sys.exit(main())
 /vdk/site-packages/vdk/internal/cli_entry.py::186::main::command_line_args=sys.argv[1:],
 /vdk/site-packages/pluggy/_hooks.py::433::__call__::return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
 /vdk/site-packages/pluggy/_manager.py::112::_hookexec::return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 /vdk/site-packages/pluggy/_callers.py::80::_multicall::res = hook_impl.function(*args)
 /vdk/site-packages/vdk/internal/cli_entry.py::140::vdk_main::program_name=program_name,
 /vdk/site-packages/pluggy/_hooks.py::433::__call__::return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
 /vdk/site-packages/pluggy/_manager.py::112::_hookexec::return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 /vdk/site-packages/pluggy/_callers.py::80::_multicall::res = hook_impl.function(*args)
 /vdk/site-packages/vdk/internal/cli_entry.py::100::vdk_cli_execute::obj=core_context,
 /vdk/site-packages/click/core.py::1157::__call__::return self.main(*args, **kwargs)
 /vdk/site-packages/click/core.py::1078::main::rv = self.invoke(ctx)
 /vdk/site-packages/click/core.py::1688::invoke::return _process_result(sub_ctx.command.invoke(sub_ctx))
 /vdk/site-packages/click/core.py::1434::invoke::return ctx.invoke(self.callback, **ctx.params)
 /vdk/site-packages/click/core.py::783::invoke::return __callback(*args, **kwargs)
 /vdk/site-packages/click/decorators.py::33::new_func::return f(get_current_context(), *args, **kwargs)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/cli_run.py::221::run::context, pathlib.Path(data_job_directory), arguments
 /vdk/site-packages/vdk/internal/builtin_plugins/run/cli_run.py::143::create_and_run_data_job::execution_result = job.run(args)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/data_job.py::312::run::return self._plugin_hook.run_job(context=job_context)
 /vdk/site-packages/pluggy/_hooks.py::433::__call__::return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
 /vdk/site-packages/pluggy/_manager.py::112::_hookexec::return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 /vdk/site-packages/pluggy/_callers.py::80::_multicall::res = hook_impl.function(*args)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/data_job.py::142::run_job::context=context, step=current_step
 /vdk/site-packages/pluggy/_hooks.py::433::__call__::return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
 /vdk/site-packages/pluggy/_manager.py::112::_hookexec::return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 /vdk/site-packages/pluggy/_callers.py::80::_multicall::res = hook_impl.function(*args)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/data_job.py::73::run_step::step_executed = step.runner_func(step, context.job_input)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/file_based_step.py::103::run_python_step::StepFuncFactory.invoke_run_function(func, job_input, step.name)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/file_based_step.py::139::invoke_run_function::func(**actual_arguments)
 /job/starshot-prod-processing-csp-jira/common_library/send_slack_msg_on_job_failure.py::12::run::return func(job_input) /job/starshot-prod-processing-csp-jira/processing-csp-jira.py::8::run::load_dw_objects(job_input, dw_objects_to_load)
 /job/starshot-prod-processing-csp-jira/common_library/processing_templates.py::60::load_dw_objects::additional_params=additional_params
 /job/starshot-prod-processing-csp-jira/common_library/processing_templates.py::36::load_dw_object::template_args=template_parameters
 /vdk/site-packages/supercollider/vdk/telemetry/telemetry_plugin.py::83::execute_template::return core_execute_template(template_name, template_args)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/job_input.py::167::execute_template::result = self.__templates.execute_template(template_name, template_args)
 /vdk/site-packages/vdk/internal/builtin_plugins/templates/template_impl.py::53::execute_template::result = template_job.run(template_args, name)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/data_job.py::312::run::return self._plugin_hook.run_job(context=job_context)
 /vdk/site-packages/pluggy/_hooks.py::433::__call__::return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
 /vdk/site-packages/pluggy/_manager.py::112::_hookexec::return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 /vdk/site-packages/pluggy/_callers.py::80::_multicall::res = hook_impl.function(*args)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/data_job.py::142::run_job::context=context, step=current_step
 /vdk/site-packages/pluggy/_hooks.py::433::__call__::return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
 /vdk/site-packages/pluggy/_manager.py::112::_hookexec::return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 /vdk/site-packages/pluggy/_callers.py::80::_multicall::res = hook_impl.function(*args)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/data_job.py::73::run_step::step_executed = step.runner_func(step, context.job_input)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/file_based_step.py::103::run_python_step::StepFuncFactory.invoke_run_function(func, job_input, step.name)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/file_based_step.py::139::invoke_run_function::func(**actual_arguments)
 /vdk/site-packages/vdk/plugin/impala/templates/load/dimension/scd1/02-handle-quality-checks.py::59::run::job_input.execute_query(insert_into_target)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/job_input.py::127::execute_query::return connection.execute_query(query)
 /vdk/site-packages/vdk/internal/builtin_plugins/connection/managed_connection_base.py::120::execute_query::cur.execute(query)
 /vdk/site-packages/vdk/internal/builtin_plugins/connection/managed_cursor.py::96::execute::result = self._execute_operation(managed_operation)
 /vdk/site-packages/vdk/internal/builtin_plugins/connection/managed_cursor.py::168::_execute_operation::execution_cursor=execution_cursor
 /vdk/site-packages/pluggy/_hooks.py::433::__call__::return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
 /vdk/site-packages/pluggy/_manager.py::112::_hookexec::return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 /vdk/site-packages/pluggy/_callers.py::80::_multicall::res = hook_impl.function(*args)
 /vdk/site-packages/vdk/internal/builtin_plugins/connection/connection_hooks.py::33::db_connection_execute_operation::native_result = execution_cursor.execute(managed_operation.get_operation())
 /vdk/site-packages/vdk/internal/builtin_plugins/connection/pep249/interfaces.py::64::execute::return self._cursor.execute(operation)
 /vdk/site-packages/impala/hiveserver2.py::343::execute::self._wait_to_finish()  # make execute synchronous
 /vdk/site-packages/impala/hiveserver2.py::438::_wait_to_finish::time.sleep(self._get_sleep_interval(loop_start))

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

vdk-jobs-troubleshooting-0.2.1190994517.tar.gz (10.6 kB view details)

Uploaded Source

File details

Details for the file vdk-jobs-troubleshooting-0.2.1190994517.tar.gz.

File metadata

File hashes

Hashes for vdk-jobs-troubleshooting-0.2.1190994517.tar.gz
Algorithm Hash digest
SHA256 cfdf788e6ad7f58467978b73a91a11384af6a3546fffdf043b9f0a32eb470162
MD5 9faf7b4858d2070a0fa6f8e6ae6089c8
BLAKE2b-256 accd1190249cd9e0286298e8897b5d5e3e8792ce1efee021ebf2404c0a724b06

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page