
Versatile Data Kit SDK troubleshooting plugin to assist in troubleshooting deployed data jobs.


VDK-JOBS-TROUBLESHOOTING Plugin


The VDK Jobs Troubleshooting plugin lets you add troubleshooting utilities that can be accessed while a data job is running.

Producing a thread dump of a Python process running inside a Kubernetes pod is generally hard, so this plugin provides a thread-dump utility.
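A thread dump like the sample output further down can be produced from inside a Python process with the standard library alone. The following is a minimal sketch, not the plugin's actual code: the function name `format_thread_dump` is hypothetical, and it relies on `sys._current_frames()`, which is CPython-specific. The stack trace in the sample output shows the plugin iterating over `traceback.extract_stack`, and this sketch imitates that layout (a `Thread:` summary line per thread, then `# ThreadID:` followed by `file::line::function::code` frames).

```python
import sys
import threading
import traceback


def format_thread_dump() -> str:
    """Return a text dump of every thread in the current Python process."""
    lines = []
    # One summary line per live threading.Thread object.
    for thread in threading.enumerate():
        lines.append(
            f"Thread:{thread.name} alive:{thread.is_alive()} daemon:{thread.daemon}"
        )
    # sys._current_frames() (CPython-specific) maps each thread identifier
    # to the frame that thread is currently executing.
    for thread_id, frame in sys._current_frames().items():
        lines.append(f" # ThreadID: {thread_id}")
        # extract_stack() lists frames from the outermost call down to `frame`.
        for summary in traceback.extract_stack(frame):
            lines.append(
                f" {summary.filename}::{summary.lineno}::{summary.name}::{summary.line or ''}"
            )
    return "\n".join(lines)


if __name__ == "__main__":
    print(format_thread_dump())
```

The hard part inside a pod is not building the dump but triggering it from outside the process, which is why the plugin exposes it over a local HTTP port, as described below.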

Adding the plugin to your data job

Before you can use the plugin, add it to your data job or custom SDK by adding the following line to your requirements.txt file:

vdk-jobs-troubleshooting

Next, add the utilities you want to use to your data job configuration. For example, to enable the thread-dump utility, add the following to your data job's config.ini file:

[vdk]
TROUBLESHOOT_UTILITIES_TO_USE=thread-dump
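The configuration above is a plain INI file. As a rough illustration of how such a value could be read, here is a sketch using Python's `configparser`. The helper name `utilities_to_use` is hypothetical, and splitting on commas is an assumption: this README only shows a single utility, and VDK itself may load and parse its configuration differently.

```python
import configparser
from typing import List


def utilities_to_use(config_text: str) -> List[str]:
    """Return the utility names listed under [vdk] TROUBLESHOOT_UTILITIES_TO_USE.

    Hypothetical helper: assumes several utilities would be comma-separated.
    """
    parser = configparser.ConfigParser()
    parser.read_string(config_text)
    # Option lookups are case-insensitive because ConfigParser lower-cases keys.
    raw = parser.get("vdk", "TROUBLESHOOT_UTILITIES_TO_USE", fallback="")
    return [name.strip() for name in raw.split(",") if name.strip()]


example = """
[vdk]
TROUBLESHOOT_UTILITIES_TO_USE=thread-dump
"""
print(utilities_to_use(example))  # ['thread-dump']
```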

Getting a thread dump from a running job

When the data job starts, the troubleshooting utility logs a message with the port it is running on, for example:

Troubleshooting utility server will start on port 8783.
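If you would rather not search the logs by eye, the port can be extracted from that startup message. A minimal sketch, assuming the message text stays exactly as shown above; the function name `find_troubleshooting_port` is hypothetical.

```python
import re
from typing import Iterable, Optional

# Matches the startup message shown above, for example
# "Troubleshooting utility server will start on port 8783."
PORT_PATTERN = re.compile(r"Troubleshooting utility server will start on port (\d+)")


def find_troubleshooting_port(log_lines: Iterable[str]) -> Optional[int]:
    """Return the port from the first matching log line, or None if there is none."""
    for line in log_lines:
        match = PORT_PATTERN.search(line)
        if match:
            return int(match.group(1))
    return None


logs = [
    "some earlier log line",
    "Troubleshooting utility server will start on port 8783.",
]
print(find_troubleshooting_port(logs))  # 8783
```

For example, you could save the output of `kubectl logs` to a file and pass its lines to the function.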

To get a thread dump from the running data job:

  1. Review the logs (using the kubectl logs pod/ command) and find the troubleshooting utility port.
  2. Start a port forward from your local machine to the target pod and port, for example:
     kubectl port-forward pods/my-problematic-job-1691419320-jcsn7 8783:8783
  3. In a new console, run curl against the forwarded port, for example:
     curl localhost:8783/threads
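The curl call in step 3 can also be made from Python, which is handy when scripting these steps. This sketch uses `urllib.request` from the standard library; it assumes the port forward from step 2 is running and that the endpoint returns the dump as the response body, as the curl example suggests. Both helper names are hypothetical.

```python
import urllib.request


def build_threads_url(port: int, host: str = "localhost") -> str:
    """Return the URL of the /threads endpoint on the local end of a port forward."""
    return f"http://{host}:{port}/threads"


def fetch_thread_dump(port: int, host: str = "localhost", timeout: float = 10.0) -> str:
    """Fetch the thread dump, like `curl localhost:<port>/threads`."""
    url = build_threads_url(port, host)
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # UTF-8 is an assumption about how the response body is encoded.
        return response.read().decode("utf-8", errors="replace")
```

With the port forward from step 2 active, `print(fetch_thread_dump(8783))` should print the same text as the curl command.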

The thread dump is printed in the console.

Sample output
Thread:MainThread alive:True daemon:False
Thread:troubleshooting_utility alive:True daemon:True
Thread:payload-aggregator alive:True daemon:True
Thread:payload-poster0 alive:True daemon:True
Thread:ThreadPoolExecutor-0_0 alive:True daemon:True
Thread:payload-aggregator alive:True daemon:True
Thread:payload-poster0 alive:True daemon:True
...
Thread:payload-poster9 alive:True daemon:True
Thread:troubleshooting_utility alive:True daemon:True
 # ThreadID: 140056075323136
 /usr/local/lib/python3.7/threading.py::890::_bootstrap::self._bootstrap_inner()
 /usr/local/lib/python3.7/threading.py::926::_bootstrap_inner::self.run()
 /usr/local/lib/python3.7/threading.py::870::run::self._target(*self._args, **self._kwargs)
 /usr/local/lib/python3.7/socketserver.py::232::serve_forever::ready = selector.select(poll_interval)
 /usr/local/lib/python3.7/selectors.py::415::select::fd_event_list = self._selector.poll(timeout)
 # ThreadID: 140054872303360
...
 # ThreadID: 140055860274944
 /usr/local/lib/python3.7/threading.py::890::_bootstrap::self._bootstrap_inner()
 /usr/local/lib/python3.7/threading.py::926::_bootstrap_inner::self.run()
 /usr/local/lib/python3.7/threading.py::870::run::self._target(*self._args, **self._kwargs)
 /usr/local/lib/python3.7/socketserver.py::237::serve_forever::self._handle_request_noblock()
 /usr/local/lib/python3.7/socketserver.py::316::_handle_request_noblock::self.process_request(request, client_address)
 /usr/local/lib/python3.7/socketserver.py::347::process_request::self.finish_request(request, client_address)
 /usr/local/lib/python3.7/socketserver.py::360::finish_request::self.RequestHandlerClass(request, client_address, self)
 /usr/local/lib/python3.7/socketserver.py::720::__init__::self.handle()
 /usr/local/lib/python3.7/http/server.py::434::handle::self.handle_one_request()
 /usr/local/lib/python3.7/http/server.py::422::handle_one_request::method()
 /vdk/site-packages/vdk/plugin/jobs_troubleshoot/troubleshoot_utilities/thread_dump.py::36::do_GET::self._log_thread_dump()
 /vdk/site-packages/vdk/plugin/jobs_troubleshoot/troubleshoot_utilities/thread_dump.py::58::_log_thread_dump::for filename, lineno, name, line in traceback.extract_stack(stack):
 # ThreadID: 140056193533760
 /vdk/vdk::8::::sys.exit(main())
 /vdk/site-packages/vdk/internal/cli_entry.py::186::main::command_line_args=sys.argv[1:],
 /vdk/site-packages/pluggy/_hooks.py::433::__call__::return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
 /vdk/site-packages/pluggy/_manager.py::112::_hookexec::return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 /vdk/site-packages/pluggy/_callers.py::80::_multicall::res = hook_impl.function(*args)
 /vdk/site-packages/vdk/internal/cli_entry.py::140::vdk_main::program_name=program_name,
 /vdk/site-packages/pluggy/_hooks.py::433::__call__::return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
 /vdk/site-packages/pluggy/_manager.py::112::_hookexec::return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 /vdk/site-packages/pluggy/_callers.py::80::_multicall::res = hook_impl.function(*args)
 /vdk/site-packages/vdk/internal/cli_entry.py::100::vdk_cli_execute::obj=core_context,
 /vdk/site-packages/click/core.py::1157::__call__::return self.main(*args, **kwargs)
 /vdk/site-packages/click/core.py::1078::main::rv = self.invoke(ctx)
 /vdk/site-packages/click/core.py::1688::invoke::return _process_result(sub_ctx.command.invoke(sub_ctx))
 /vdk/site-packages/click/core.py::1434::invoke::return ctx.invoke(self.callback, **ctx.params)
 /vdk/site-packages/click/core.py::783::invoke::return __callback(*args, **kwargs)
 /vdk/site-packages/click/decorators.py::33::new_func::return f(get_current_context(), *args, **kwargs)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/cli_run.py::221::run::context, pathlib.Path(data_job_directory), arguments
 /vdk/site-packages/vdk/internal/builtin_plugins/run/cli_run.py::143::create_and_run_data_job::execution_result = job.run(args)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/data_job.py::312::run::return self._plugin_hook.run_job(context=job_context)
 /vdk/site-packages/pluggy/_hooks.py::433::__call__::return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
 /vdk/site-packages/pluggy/_manager.py::112::_hookexec::return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 /vdk/site-packages/pluggy/_callers.py::80::_multicall::res = hook_impl.function(*args)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/data_job.py::142::run_job::context=context, step=current_step
 /vdk/site-packages/pluggy/_hooks.py::433::__call__::return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
 /vdk/site-packages/pluggy/_manager.py::112::_hookexec::return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 /vdk/site-packages/pluggy/_callers.py::80::_multicall::res = hook_impl.function(*args)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/data_job.py::73::run_step::step_executed = step.runner_func(step, context.job_input)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/file_based_step.py::103::run_python_step::StepFuncFactory.invoke_run_function(func, job_input, step.name)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/file_based_step.py::139::invoke_run_function::func(**actual_arguments)
 /job/starshot-prod-processing-csp-jira/common_library/send_slack_msg_on_job_failure.py::12::run::return func(job_input)
 /job/starshot-prod-processing-csp-jira/processing-csp-jira.py::8::run::load_dw_objects(job_input, dw_objects_to_load)
 /job/starshot-prod-processing-csp-jira/common_library/processing_templates.py::60::load_dw_objects::additional_params=additional_params
 /job/starshot-prod-processing-csp-jira/common_library/processing_templates.py::36::load_dw_object::template_args=template_parameters
 /vdk/site-packages/supercollider/vdk/telemetry/telemetry_plugin.py::83::execute_template::return core_execute_template(template_name, template_args)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/job_input.py::167::execute_template::result = self.__templates.execute_template(template_name, template_args)
 /vdk/site-packages/vdk/internal/builtin_plugins/templates/template_impl.py::53::execute_template::result = template_job.run(template_args, name)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/data_job.py::312::run::return self._plugin_hook.run_job(context=job_context)
 /vdk/site-packages/pluggy/_hooks.py::433::__call__::return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
 /vdk/site-packages/pluggy/_manager.py::112::_hookexec::return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 /vdk/site-packages/pluggy/_callers.py::80::_multicall::res = hook_impl.function(*args)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/data_job.py::142::run_job::context=context, step=current_step
 /vdk/site-packages/pluggy/_hooks.py::433::__call__::return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
 /vdk/site-packages/pluggy/_manager.py::112::_hookexec::return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 /vdk/site-packages/pluggy/_callers.py::80::_multicall::res = hook_impl.function(*args)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/data_job.py::73::run_step::step_executed = step.runner_func(step, context.job_input)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/file_based_step.py::103::run_python_step::StepFuncFactory.invoke_run_function(func, job_input, step.name)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/file_based_step.py::139::invoke_run_function::func(**actual_arguments)
 /vdk/site-packages/vdk/plugin/impala/templates/load/dimension/scd1/02-handle-quality-checks.py::59::run::job_input.execute_query(insert_into_target)
 /vdk/site-packages/vdk/internal/builtin_plugins/run/job_input.py::127::execute_query::return connection.execute_query(query)
 /vdk/site-packages/vdk/internal/builtin_plugins/connection/managed_connection_base.py::120::execute_query::cur.execute(query)
 /vdk/site-packages/vdk/internal/builtin_plugins/connection/managed_cursor.py::96::execute::result = self._execute_operation(managed_operation)
 /vdk/site-packages/vdk/internal/builtin_plugins/connection/managed_cursor.py::168::_execute_operation::execution_cursor=execution_cursor
 /vdk/site-packages/pluggy/_hooks.py::433::__call__::return self._hookexec(self.name, self._hookimpls, kwargs, firstresult)
 /vdk/site-packages/pluggy/_manager.py::112::_hookexec::return self._inner_hookexec(hook_name, methods, kwargs, firstresult)
 /vdk/site-packages/pluggy/_callers.py::80::_multicall::res = hook_impl.function(*args)
 /vdk/site-packages/vdk/internal/builtin_plugins/connection/connection_hooks.py::33::db_connection_execute_operation::native_result = execution_cursor.execute(managed_operation.get_operation())
 /vdk/site-packages/vdk/internal/builtin_plugins/connection/pep249/interfaces.py::64::execute::return self._cursor.execute(operation)
 /vdk/site-packages/impala/hiveserver2.py::343::execute::self._wait_to_finish()  # make execute synchronous
 /vdk/site-packages/impala/hiveserver2.py::438::_wait_to_finish::time.sleep(self._get_sleep_interval(loop_start))
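Long dumps like the one above are easier to read if you look at the innermost (last) frame of each thread, which is where that thread is currently executing. For example, the thread with ID 140056193533760 ends in `_wait_to_finish` in `impala/hiveserver2.py`, calling `time.sleep` between polls. The parser below is a hypothetical helper, sketched against the `file::line::function::code` layout of the sample; it is not part of the plugin.

```python
from typing import Dict, List, Optional, Tuple

# (filename, line number, function name, source line), as printed in the dump
Frame = Tuple[str, int, str, str]


def parse_thread_dump(text: str) -> Dict[str, List[Frame]]:
    """Group 'file::line::function::code' frames under their '# ThreadID:' header."""
    stacks: Dict[str, List[Frame]] = {}
    current: Optional[str] = None
    for raw_line in text.splitlines():
        line = raw_line.strip()
        if line.startswith("# ThreadID:"):
            current = line.split(":", 1)[1].strip()
            stacks[current] = []
        elif current is not None and line.count("::") >= 3:
            # maxsplit=3 keeps any '::' inside the source code in the last field.
            filename, lineno, name, code = line.split("::", 3)
            if lineno.isdigit():
                stacks[current].append((filename, int(lineno), name, code))
    return stacks


def innermost_frames(text: str) -> Dict[str, Frame]:
    """Return the last (innermost) frame of each thread: where it is currently running."""
    return {tid: frames[-1] for tid, frames in parse_thread_dump(text).items() if frames}
```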



Download files


Source Distribution

vdk_jobs_troubleshooting-0.2.1431637373.tar.gz (10.7 kB)



Hashes for vdk_jobs_troubleshooting-0.2.1431637373.tar.gz
Algorithm     Hash digest
SHA256        60514c455da26e6415df53559eea02886027072039fc61dc7051ea075189a59c
MD5           ed9a9b83b34b14d6f8b157657fe7c328
BLAKE2b-256   8885a27a7a1e6a2fbf892b2a2e2cfe2e9c27fbf2dc5da2929f7e6d7599c6f821

