
This project builds pipelines for the Take BLiP resolution score.


TakeResolution

Gabriel Salgado and Moises Mendes

Overview

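This document covers the following topics: intro, configuration, installation, tests, packaging, upload, pipelines, running, notebooks and tips.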

Intro

This project tries to answer one question: how much resolution does this chatbot achieve?

To find the answer, we analyse the bot structure and its interaction events. These data come from a Spark database on a Databricks cluster. A single run of this project analyses the data of a single bot in this database.

There are so far two pipelines: bot flow and bot events.

Bot Flow Pipeline

The first step of the bot flow pipeline collects bot data from the Spark database. Bot data is, in short, a table with the bot identity, the flow described as JSON, and other information. The defined bot flow is then selected in this step.
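A minimal sketch of the selection, assuming the bot table has already been read from Spark into a list of row dictionaries. The column names BotIdentity and Flow are hypothetical placeholders, not the real table schema.

```python
import json
from typing import Any, Dict, Iterable


def select_bot_flow(rows: Iterable[Dict[str, Any]], bot_identity: str) -> Dict[str, Any]:
    """Return the parsed JSON flow of the bot with the given identity.

    Assumes hypothetical columns 'BotIdentity' and 'Flow' (flow stored as JSON text).
    """
    for row in rows:
        if row["BotIdentity"] == bot_identity:
            # The flow is stored as JSON text, so decode it into Python objects.
            return json.loads(row["Flow"])
    raise LookupError(f"bot {bot_identity!r} not found")
```

On the cluster, the identity filter could instead be pushed into the Spark query, so that only one row is collected.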

The second step extracts the bot flow as a graph. The tool used here is networkx, which represents the bot flow as a directed graph.
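A minimal networkx sketch, assuming a simplified, hypothetical flow layout in which each state id maps to an object whose outputs list holds the ids of the states it can move to. The real BLiP flow JSON is richer, so the parsing would need to be adapted to it.

```python
from typing import Any, Dict

import networkx as nx


def flow_to_graph(flow: Dict[str, Any]) -> nx.DiGraph:
    """Build a directed graph with one node per state and one edge per possible transition."""
    graph = nx.DiGraph()
    for state_id, state in flow.items():
        graph.add_node(state_id)
        # Hypothetical key: ids of the states reachable from this one.
        for target_id in state.get("outputs", []):
            graph.add_edge(state_id, target_id)
    return graph


flow = {
    "start": {"outputs": ["menu"]},
    "menu": {"outputs": ["faq", "human"]},
    "faq": {},
    "human": {},
}
graph = flow_to_graph(flow)
```

Once the flow is a DiGraph, standard networkx queries such as nx.has_path(graph, "start", "human") can tell whether a state is reachable from another.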

Bot Events Pipeline

We begin this pipeline by extracting bot events from a Spark database. From the events database, we select the following columns for a specific bot identity and time period:

  • Category: name given to a tracked point in the bot flow.
  • Action: subgroups within Category.
  • Extras: extra information saved.
  • ContactIdentity: user identity.
  • OwnerIdentity: bot identity.
  • StorageDateBR: datetime when the event was saved.
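The selection can be sketched in plain Python, assuming the events have already been collected as row dictionaries keyed by the column names above and that StorageDateBR holds datetime values. This is only an illustration of the filter; on the cluster it would more likely be written as part of the Spark query.

```python
from datetime import datetime
from typing import Any, Dict, Iterable, List

# Columns selected from the events database, as listed above.
EVENT_COLUMNS = ["Category", "Action", "Extras", "ContactIdentity", "OwnerIdentity", "StorageDateBR"]


def select_bot_events(
    rows: Iterable[Dict[str, Any]], bot_identity: str, start: datetime, end: datetime
) -> List[Dict[str, Any]]:
    """Keep events of one bot saved in [start, end), projected onto EVENT_COLUMNS."""
    return [
        {column: row[column] for column in EVENT_COLUMNS}
        for row in rows
        if row["OwnerIdentity"] == bot_identity and start <= row["StorageDateBR"] < end
    ]
```

The half-open period [start, end) is an arbitrary choice here: it keeps an event on a boundary from being counted twice when consecutive periods are analysed.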

As the project progresses, this description will include more details.

Configure

Below are the recommended practices for configuring the project locally.

Virtual environment

This step can be done from the command line or in PyCharm.

On the command line

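It is recommended to use a virtual environment. The venv module is part of the Python 3 standard library, so it does not need to be installed with pip (some Linux distributions package it separately, e.g. python3-venv on Debian and Ubuntu).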

Create a virtual environment:

python -m venv venv

Activate the virtual environment (Windows PowerShell; in cmd.exe use venv\Scripts\activate):

./venv/Scripts/activate

Activate the virtual environment (Linux). The script must be sourced, not executed:

source venv/bin/activate

To exit the virtual environment:

deactivate

On PyCharm

Open File/Settings... or press Ctrl+Alt+S. This opens the Settings window.

Open Project: ResolutionAnalysis/Project Interpreter in the left menu.

Open the Project Interpreter combo box and click Show All.... This opens a window listing the Python interpreters.

Click + or press Alt+Insert. This opens a window for creating a new Python interpreter.

Keep the default options, which create a new virtual environment inside the project, and click OK.

Click OK twice more to close the remaining windows.

Configuring PyCharm

If you are using PyCharm, it is better to tell PyCharm where the project's source code is. Right-click the src folder in the Project window on the left side. This opens a context menu.

Choose the Mark Directory as/Sources Root option. This marks src as the sources root directory, and it will appear as a blue folder in the Project navigator.

Install

The take_resolution package can be installed from PyPI:

pip install take_resolution

Or from setup.py, located in the src folder:

cd src
pip install . -U
cd ..

Installing take_resolution also installs all required libraries. However, we may want to install only the dependencies, or to update our environment when the requirements change.

All dependencies are declared in src/requirements.txt. They can be installed from the command line or in PyCharm.

On the command line

To install dependencies on environment, run:

python commands.py install
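This README does not show what commands.py contains. As a rough, hypothetical sketch, a script like it could map each subcommand to a tool run with the current interpreter. The COMMANDS table, build_command and run are illustrative names, and the exact tool invocations are assumptions rather than the project's real script. Only install and test are shown; package and upload could follow the same pattern.

```python
import subprocess
import sys
from typing import Dict, List

# Hypothetical mapping from subcommand to the command line it runs.
COMMANDS: Dict[str, List[str]] = {
    # Install only the declared dependencies.
    "install": [sys.executable, "-m", "pip", "install", "-r", "src/requirements.txt"],
    # Run the tests under coverage so that `coverage html` can build a report afterwards.
    "test": [sys.executable, "-m", "coverage", "run", "-m", "pytest", "src/tests"],
}


def build_command(name: str) -> List[str]:
    """Return the command line for a subcommand, or exit with a message for unknown names."""
    try:
        return COMMANDS[name]
    except KeyError:
        raise SystemExit(f"unknown command {name!r}; choose from {sorted(COMMANDS)}") from None


def run(name: str) -> int:
    """Run a subcommand and return its exit code."""
    return subprocess.call(build_command(name))
```

Using sys.executable means each tool runs inside whichever virtual environment is active; the script's entry point would then call run(sys.argv[1]).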

On PyCharm

After you create the virtual environment, or when you open the project, PyCharm will ask whether you want to install the requirements. Choose Install.

Test

You can run the tests from the command line or in PyCharm. This feature is still being built.

On the command line

First, activate the virtual environment. Then run the kedro tests:

python commands.py test

Once this feature is built, coverage results will be available at htmlcov/index.html.

On PyCharm

Click Edit Configurations... next to the Run icon. This opens the Run/Debug Configurations window.

Click on + or press Alt+Insert.

Choose the Python tests/pytest option.

Fill the Target field with the path to the tests folder, <path to project>/src/tests.

Click OK.

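Click the Run icon to run the tests.

Open the Terminal window. The coverage html command needs coverage data from a previous run, so if none exists yet, first run the tests under coverage with coverage run -m pytest src/tests. Then generate the HTML report: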

coverage html

See coverage results at htmlcov/index.html.

Package

First, activate the virtual environment. To package this project into .egg and .whl files:

python commands.py package

Generated packages are placed in the src/dist folder. For each new package, remember to increase the version in src/take_resolution/__init__.py.
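Increasing the version can be scripted. The sketch below assumes that __init__.py declares the version on a line such as __version__ = "0.10.0"; the variable name and the X.Y.Z format are assumptions, and bump_version and bump_version_file are hypothetical helpers, not part of take_resolution.

```python
import re
from pathlib import Path

# Assumed layout: a line like __version__ = "0.10.0" (single or double quotes).
VERSION_RE = re.compile(r"""^__version__\s*=\s*(['"])(\d+)\.(\d+)\.(\d+)\1""", re.MULTILINE)


def bump_version(text: str, part: str = "minor") -> str:
    """Return text with the major, minor or patch number of __version__ incremented."""
    match = VERSION_RE.search(text)
    if match is None:
        raise ValueError("no __version__ = 'X.Y.Z' line found")
    quote = match.group(1)
    major, minor, patch = (int(number) for number in match.group(2, 3, 4))
    if part == "major":
        major, minor, patch = major + 1, 0, 0
    elif part == "minor":
        minor, patch = minor + 1, 0
    elif part == "patch":
        patch += 1
    else:
        raise ValueError(f"unknown part {part!r}")
    new_line = f"__version__ = {quote}{major}.{minor}.{patch}{quote}"
    # Replace only the matched declaration, leaving the rest of the file untouched.
    return text[: match.start()] + new_line + text[match.end():]


def bump_version_file(path: Path, part: str = "minor") -> None:
    """Rewrite the version declaration in a file in place."""
    path.write_text(bump_version(path.read_text(), part))
```

For example, bump_version_file(Path("src/take_resolution/__init__.py")) would turn 0.10.0 into 0.11.0 before running python commands.py package.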

Upload

To upload the built package to PyPI:

python commands.py upload

This uploads the latest built version. Afterwards, the package can be installed with pip anywhere Python and pip are available:

pip install take_resolution

Pipelines

Pipelines are described in the configuration file conf/base/pipelines.json. Here is an example of its content:

{
    "pipeline_1": {
        "nodes": [
            {
                "input": [
                    "input.number",
                    "params.a",
                    "params.b"
                ],
                "output": "output_1",
                "function": "my_module.function_1"
            },
            {
                "input": [
                    "output_1",
                    [
                        "params.x1",
                        "params.x2"
                    ],
                    [
                        "params.y1",
                        "params.y2"
                    ]
                ],
                "output": "output_2",
                "function": "my_module.function_2"
            },
            {
                "input": [
                    "output_2"
                ],
                "output": "output_3",
                "function": "my_module.function_3"
            }
        ],
        "output": {
            "raw": [
                "output_1"
            ],
            "intermediate": [
                "output_2"
            ],
            "primary": [
                "output_3"
            ]
        }
    },
    "pipeline_2": {
        "nodes": [
            {
                "input": [
                    "input.number",
                    "params.q"
                ],
                "output": "output_4",
                "function": "my_module.function_4"
            }
        ],
        "output": {
            "raw": [
                "output_4"
            ]
        }
    }
}
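To make the file's conventions concrete, here is a minimal, hypothetical interpreter for one pipeline entry. It assumes that input.<name> refers to a keyword argument passed to run, that params.<name> refers to a parameter value, that any other string names the output of an earlier node, that nested lists are passed as list arguments, and that nodes run in the order listed. These rules are inferred from the example above, not taken from take_resolution's code, and run_pipeline is an illustrative name.

```python
import importlib
from typing import Any, Callable, Dict, List, Union

# A reference is a name string or a (possibly nested) list of references.
Ref = Union[str, List[Any]]


def resolve_function(path: str) -> Callable[..., Any]:
    """Import 'package.module.function' and return the function object."""
    module_name, _, function_name = path.rpartition(".")
    return getattr(importlib.import_module(module_name), function_name)


def resolve_ref(ref: Ref, inputs: Dict[str, Any], params: Dict[str, Any], outputs: Dict[str, Any]) -> Any:
    """Turn one entry of a node's input list into a value."""
    if isinstance(ref, list):
        # Nested lists become list arguments, e.g. ["params.x1", "params.x2"].
        return [resolve_ref(item, inputs, params, outputs) for item in ref]
    if ref.startswith("input."):
        return inputs[ref[len("input."):]]
    if ref.startswith("params."):
        return params[ref[len("params."):]]
    # Any other name is the output of an earlier node.
    return outputs[ref]


def run_pipeline(spec: Dict[str, Any], inputs: Dict[str, Any], params: Dict[str, Any]) -> Dict[str, Dict[str, Any]]:
    """Run the nodes in order and group the declared outputs by layer."""
    outputs: Dict[str, Any] = {}
    for node in spec["nodes"]:
        args = [resolve_ref(ref, inputs, params, outputs) for ref in node["input"]]
        outputs[node["output"]] = resolve_function(node["function"])(*args)
    return {
        layer: {name: outputs[name] for name in names}
        for layer, names in spec.get("output", {}).items()
    }
```

For instance, a first node using operator.add with inputs {"number": 12} and params {"a": 3} would store 15 under output_1. Grouping results by layer matches how the notebook example reads output['primary']['output_3'].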

Run

To run a given pipeline:

import take_resolution as tr

# Each key is referenced in pipelines.json as input.<key> (here, input.number).
inputs = {'number': 12}
tr.run('pipeline_1', **inputs)

Here, 'pipeline_1' is the name of a pipeline described in pipelines.json.

To run all pipelines described on pipelines.json:

import take_resolution as tr

inputs = {'number': 12}
# Without a pipeline name, every pipeline in pipelines.json is run.
tr.run(**inputs)

Notebooks

This project is packaged so that it can be installed on a specific Databricks cluster, the one where we run ML experiments with mlflow. An experiment is written as a notebook in the shared workspace, like this example:

import mlflow as ml
import take_resolution as tr


with ml.start_run():
    # experiment code using our pipelines
    inputs = {'number': 12}
    output = tr.run('pipeline_1', **inputs)

    # logging our parameters
    params = tr.load_params()
    ml.log_params(params)

    # logging some value on output
    output_3 = output['primary']['output_3']
    ml.log_metric('output_3', output_3)

Tips

In order to maintain the project:

  • Do not remove or change any lines in the .gitignore unless you know what you are doing.
  • When developing experiments and production code, follow the data standards of the appropriate layers (such as raw, intermediate and primary).
  • When developing experiments, put them into notebooks, following the code policies.
  • Write notebooks on Databricks, synchronize them into the appropriate sub-folder of the notebooks folder in this repository, and commit them.
  • Do not commit any data.
  • Do not commit any log file.
  • Do not commit any credentials or local configuration.
  • Keep all credentials or local configuration in folder conf/local/.
  • Do not commit any files generated by the testing or build processes.
  • Run the tests before opening a pull request to make sure nothing is broken.
  • Follow git flow practices:
    • Create a feature branch for each new feature from the dev branch. Work on this branch with commits and pushes. When the work is done, send a pull request to the dev branch.
    • When a set of features is ready for release, merge the dev branch into the test branch. Apply thorough, strict tests to make sure everything works. If errors are found, fix them all and run the tests again. When everything passes, merge test into master, increase the release version, and upload to PyPI.
    • If a bug is found in production (the master branch), create a hotfix branch from master. Fix the errors and test as in the test branch. When everything passes, merge the hotfix branch into master, then merge master into dev.



Download files


Built Distributions

take_resolution-0.10.0-py3.7.egg (30.9 kB)


take_resolution-0.10.0-py3-none-any.whl (15.1 kB)

