TakeResolution
Gabriel Salgado and Moises Mendes

This project builds pipelines that compute a resolution score for Take BLiP chatbots.

Overview
Intro
This project sets out to answer the question: how resolutive is this chatbot, that is, how often does it resolve its users' issues?
To answer it, the analysed data comprises the bot structure and its interaction events. These data are obtained from a Spark database on a Databricks cluster. A single run of this project analyses the data of a single bot in this database.
There are so far two pipelines: bot flow and bot events.
Bot Flow Pipeline
The first step of the bot flow pipeline is to collect bot data from the Spark database. Bot data is a table with the bot identity, the flow described as JSON, and other information. The flow defined for the bot is selected in this step.
The second step extracts the bot flow as a graph. The tool used here is networkx, which represents the bot flow as a directed graph.
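The extraction step can be sketched with networkx. The JSON schema below (states with outputs pointing to other states) is an assumption for illustration only; the actual Blip flow format may differ:

```python
import json
import networkx as nx

# Hypothetical bot flow JSON: states with outgoing transitions.
# The real Blip flow schema may differ; this is only an illustration.
flow_json = json.dumps({
    "states": [
        {"id": "onboarding", "outputs": [{"stateId": "menu"}]},
        {"id": "menu", "outputs": [{"stateId": "faq"}, {"stateId": "handoff"}]},
        {"id": "faq", "outputs": [{"stateId": "menu"}]},
        {"id": "handoff", "outputs": []},
    ]
})

def build_flow_graph(raw: str) -> nx.DiGraph:
    """Build a directed graph with one node per state and one edge per transition."""
    flow = json.loads(raw)
    graph = nx.DiGraph()
    for state in flow["states"]:
        graph.add_node(state["id"])
        for output in state["outputs"]:
            graph.add_edge(state["id"], output["stateId"])
    return graph

graph = build_flow_graph(flow_json)
```

A directed graph keeps the direction of each transition, so downstream steps can ask, for example, which states can reach a resolutive state.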
Bot Events Pipeline
We begin this pipeline by extracting bot events from a Spark database. From the events database, we select the following columns for a specific bot identity and time period:
- Category: name given to some tracked point in the bot flow.
- Action: subgroups within Category.
- Extras: extra information saved.
- ContactIdentity: user identity.
- OwnerIdentity: bot identity.
- StorageDateBR: datetime when the event was saved.
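The selection above can be sketched in plain Python (the real pipeline queries Spark; the column names follow the list above, and the sample rows are made up):

```python
from datetime import datetime

# Made-up sample of raw tracking events; the real data lives in a Spark table.
events = [
    {"Category": "resolution", "Action": "resolved", "Extras": "{}",
     "ContactIdentity": "user1@msging.net", "OwnerIdentity": "mybot@msging.net",
     "StorageDateBR": datetime(2020, 5, 1, 10, 0)},
    {"Category": "menu", "Action": "opened", "Extras": "{}",
     "ContactIdentity": "user2@msging.net", "OwnerIdentity": "otherbot@msging.net",
     "StorageDateBR": datetime(2020, 5, 1, 11, 0)},
]

def select_events(rows, bot_identity, start, end):
    """Keep only the events of one bot inside the [start, end) period."""
    return [row for row in rows
            if row["OwnerIdentity"] == bot_identity
            and start <= row["StorageDateBR"] < end]

selected = select_events(events, "mybot@msging.net",
                         datetime(2020, 5, 1), datetime(2020, 6, 1))
```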
DAUs Pipeline
We also compute DAUs (daily active users): the total number of distinct users per day.
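A minimal sketch of the DAU computation, counting distinct users per day from event rows (the field names follow the event columns above; the data is illustrative):

```python
from collections import defaultdict
from datetime import datetime

# Illustrative event rows; in the pipeline these come from Spark.
events = [
    {"ContactIdentity": "user1", "StorageDateBR": datetime(2020, 5, 1, 9)},
    {"ContactIdentity": "user1", "StorageDateBR": datetime(2020, 5, 1, 18)},
    {"ContactIdentity": "user2", "StorageDateBR": datetime(2020, 5, 1, 12)},
    {"ContactIdentity": "user2", "StorageDateBR": datetime(2020, 5, 2, 12)},
]

def daily_active_users(rows):
    """Map each day to the number of distinct users seen on it."""
    users_by_day = defaultdict(set)
    for row in rows:
        users_by_day[row["StorageDateBR"].date()].add(row["ContactIdentity"])
    return {day: len(users) for day, users in users_by_day.items()}

dau = daily_active_users(events)
```

Note that a user appearing twice on the same day counts once, since each day keeps a set of identities.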
Resolution Rate
Finally, with the events on resolutive states and the total users per day, the resolution rate is calculated for each day. The final rate is the mean of the daily resolution rates weighted by DAUs, so that days with more users are considered more important.
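The weighted mean described above can be written as a short sketch (the daily rates and DAUs below are made-up numbers):

```python
def weighted_resolution_rate(daily_rate, dau):
    """Mean of the daily resolution rates weighted by daily active users."""
    total_users = sum(dau.values())
    return sum(daily_rate[day] * dau[day] for day in daily_rate) / total_users

# Illustrative values: a low-traffic day and a high-traffic day.
daily_rate = {"2020-05-01": 0.50, "2020-05-02": 0.80}
dau = {"2020-05-01": 100, "2020-05-02": 300}

rate = weighted_resolution_rate(daily_rate, dau)  # (0.5*100 + 0.8*300) / 400 = 0.725
```

The high-traffic day pulls the final rate toward its own daily rate, which is exactly the intended weighting.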
As the project progresses, this description will include more details.
Configure
Here are the recommended practices to configure the project locally.
Virtual environment
This step can be done with commands or on PyCharm.
On commands
It is recommended to use a virtual environment. The venv module ships with Python 3.3+, so there is nothing to install.
Create a virtual environment:
python -m venv venv
Enter the virtual environment (Windows):
./venv/Scripts/activate
Enter the virtual environment (Linux):
source ./venv/bin/activate
To exit the virtual environment:
deactivate
On PyCharm
Open File/Settings... or press Ctrl+Alt+S. This opens the settings window.
Open Project: ResolutionAnalysis/Project Interpreter in the left menu.
Open the Project Interpreter combobox and click on Show All.... This opens a window with Python interpreters.
Click on + or press Alt+Insert. This opens a window to create a new Python interpreter.
We will choose the default options, which create a new virtual environment inside the project.
Click on the Ok button. Click on the Ok button again. And again.
Configuring on PyCharm
If you are using PyCharm, it is better to show PyCharm where the source code is in the project.
Right click on the src folder in the Project window at the left side. This opens a context menu.
Choose the Mark Directory as/Sources Root option. This marks src as the source root directory. It will appear as a blue folder in the Project navigator.
Install
The take_resolution package can be installed from PyPI:
pip install take_resolution
Or from setup.py, located in the src folder:
cd src
pip install . -U
cd ..
Installing take_resolution also installs all required libraries.
But we may intend to install only the dependencies, or to update our environment after the requirements changed. All dependencies are declared in src/requirements.txt.
Installing the dependencies can be done on the command line or on PyCharm.
On command
To install the dependencies into the environment, run:
python commands.py install
On PyCharm
After you create the virtual environment, or on opening PyCharm, it will ask whether you want to install the requirements. Choose Install.
Test
You can run the tests on the command line or on PyCharm. This feature is still being built.
On commands
First, enter the virtual environment. Then run the kedro tests:
python commands.py test
When this feature is built, coverage results will be available at htmlcov/index.html.
On PyCharm
Click on Edit Configurations... beside the Run icon. This opens the Run/Debug Configurations window.
Click on + or press Alt+Insert. Choose the Python tests/pytest option.
Fill the Target field with the path to the tests folder, as <path to project>/src/tests.
Click on the Ok button.
Click on the Run icon. This runs the tests.
Open the Terminal window and run this command to generate the HTML report:
coverage html
See the coverage results at htmlcov/index.html.
Package
First, enter the virtual environment.
To package this project into .egg and .whl:
python commands.py package
Generated packages will be in the src/dist folder.
For each new package, do not forget to increase the version at src/take_resolution/__init__.py.
Upload
To upload the built package to PyPI:
python commands.py upload
This uploads the latest built version. Afterwards, the package can be downloaded and installed by pip anywhere with Python and pip:
pip install take_resolution
Pipelines
Pipelines are described in a configuration file, conf/base/pipelines.json. See an example of its content:
{
    "pipeline_1": {
        "nodes": [
            {
                "input": [
                    "input.number",
                    "params.a",
                    "params.b"
                ],
                "output": "output_1",
                "function": "my_module.function_1"
            },
            {
                "input": [
                    "output_1",
                    [
                        "params.x1",
                        "params.x2"
                    ],
                    [
                        "params.y1",
                        "params.y2"
                    ]
                ],
                "output": "output_2",
                "function": "my_module.function_2"
            },
            {
                "input": [
                    "output_2"
                ],
                "output": "output_3",
                "function": "my_module.function_3"
            }
        ],
        "output": {
            "raw": [
                "output_1"
            ],
            "intermediate": [
                "output_2"
            ],
            "primary": [
                "output_3"
            ]
        }
    },
    "pipeline_2": {
        "nodes": [
            {
                "input": [
                    "input.number",
                    "params.q"
                ],
                "output": "output_4",
                "function": "my_module.function_4"
            }
        ],
        "output": {
            "raw": [
                "output_4"
            ]
        }
    }
}
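A minimal interpreter for this config shape could look like the sketch below. It resolves input.* and params.* names, calls each node's function in order, and groups the declared outputs by data layer. This is a simplified illustration, not the actual take_resolution implementation; the functions and config are hypothetical:

```python
def run_pipeline(pipeline, functions, inputs, params):
    """Execute the nodes of one pipeline entry from pipelines.json in order."""
    results = {}

    def resolve(name):
        # Nested lists in "input" are resolved element-wise.
        if isinstance(name, list):
            return [resolve(item) for item in name]
        if name.startswith("input."):
            return inputs[name[len("input."):]]
        if name.startswith("params."):
            return params[name[len("params."):]]
        return results[name]  # output of a previous node

    for node in pipeline["nodes"]:
        args = [resolve(name) for name in node["input"]]
        results[node["output"]] = functions[node["function"]](*args)

    # Group the declared outputs by data layer (raw / intermediate / primary).
    return {layer: {name: results[name] for name in names}
            for layer, names in pipeline["output"].items()}

# Hypothetical functions and a two-node pipeline in the same shape as above.
pipeline = {
    "nodes": [
        {"input": ["input.number", "params.a"], "output": "output_1",
         "function": "my_module.function_1"},
        {"input": ["output_1"], "output": "output_2",
         "function": "my_module.function_2"},
    ],
    "output": {"raw": ["output_1"], "primary": ["output_2"]},
}
functions = {"my_module.function_1": lambda number, a: number + a,
             "my_module.function_2": lambda x: x * 2}

out = run_pipeline(pipeline, functions, {"number": 12}, {"a": 3})
```

Because each node's output is stored under its "output" name, later nodes can consume it by name, which is what chains the nodes into a pipeline.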
Run
To run a given pipeline:
import take_resolution as tr
input = {'number': 12}
tr.run('pipeline_1', **input)
Where 'pipeline_1' is the pipeline name, and this pipeline is described in pipelines.json.
To run all pipelines described in pipelines.json:
import take_resolution as tr
input = {'number': 12}
tr.run(**input)
Notebooks
This project is packaged to be installed on a specific Databricks cluster. This is the cluster where we work with ML experiments using mlflow.
An experiment is done as an example notebook on shared, like this:
import mlflow as ml
import take_resolution as tr

with ml.start_run():
    # experiment code using our pipelines
    input = {}
    output = tr.run('pipeline_1', **input)
    # logging our parameters
    params = tr.load_params()
    ml.log_params(params)
    # logging some value on output
    output_3 = output['primary']['output_3']
    ml.log_metric('output_3', output_3)
Tips
In order to maintain the project:
- Do not remove or change any lines in the .gitignore unless you know what you are doing.
- When developing experiments and production, follow the data standard related to suitable layers.
- When developing experiments, put them into notebooks, following code policies.
- Write notebooks on Databricks, synchronize them to this repository into a particular sub-folder of the notebooks folder, and commit them.
- Do not commit any data.
- Do not commit any log file.
- Do not commit any credentials or local configuration.
- Keep all credentials and local configuration in the conf/local/ folder.
- Do not commit any file generated by the testing or building processes.
- Run the tests before a pull request to make sure there are no bugs.
- Follow git flow practices:
  - Create a feature branch for each new feature from the dev branch. Work on this branch with commits and pushes. Send a pull request to the dev branch when the work is finished.
  - When a set of features is ready to release, merge the dev branch into the test branch. Apply several strict tests to be sure that everything is fine. On finding errors, fix them all and apply the tests again. When all is ok, merge from test to master, increasing the release version and uploading to PyPI.
  - If some bug is found in production (the master branch), create a hotfix branch from master. Correct all errors and apply tests as in the test branch. When all is ok, merge from the hotfix branch to master and then merge from master to dev.
Project details
Download files
Hashes for take_resolution-0.12.1-py3.7.egg

| Algorithm | Hash digest |
|---|---|
| SHA256 | aa2a7b221573ec897b9b9ed83a13ee60d7275a36c7dc3fe39d9142af38f72161 |
| MD5 | 674cdea16be6b563a2b3974a1a818099 |
| BLAKE2b-256 | bbe2865b3a0b28854b0fdf62ca13fcbd8bde7849527d9a42902f6cd621e85d40 |
Hashes for take_resolution-0.12.1-py3-none-any.whl

| Algorithm | Hash digest |
|---|---|
| SHA256 | a2458929f0c1c01712f5d2226adf087ace19cf11c53a299aeb2154b98be470c4 |
| MD5 | eb5777d8d050d47a59e6c88e4aa63628 |
| BLAKE2b-256 | fcec9c06431949a484c0dc9288beca85b568db60cc3e8fa1fca4855b6a9bbd99 |