Global utilities for Humankind data science

Project description

Global Utilities

last modified 8 March 2023 by Colleen Treado

The utilities-hki repository contains the common utilities required by multiple other humankind-datascience repositories. Unlike the old utilities repo, this package contains no encrypted files, and credentials are now passed into the utility functions as input arguments.
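
As a rough illustration of the credentials-as-arguments pattern described above, the sketch below passes credentials explicitly into a utility function instead of reading them from an encrypted file. The names `DbCredentials` and `build_connection_url`, the `postgresql` URL scheme, and the example values are all hypothetical and are not part of the utilities-hki API; see the package docstrings for the actual function signatures.

```python
from typing import NamedTuple
from urllib.parse import quote_plus


class DbCredentials(NamedTuple):
    """Hypothetical credentials container (not part of utilities-hki)."""
    user: str
    password: str
    host: str
    port: int
    database: str


def build_connection_url(creds: DbCredentials) -> str:
    """Build a connection URL purely from the credentials passed in.

    Nothing is read from encrypted files or module-level globals; the
    caller decides where the credentials come from.
    The 'postgresql' scheme is an assumption made for this example only.
    """
    return (
        f"postgresql://{quote_plus(creds.user)}:{quote_plus(creds.password)}"
        f"@{creds.host}:{creds.port}/{creds.database}"
    )


# The caller loads credentials elsewhere (for example, from the credentials
# repository) and hands them to the utility function explicitly.
creds = DbCredentials("analyst", "p@ss word", "db.example.com", 5432, "visits")
url = build_connection_url(creds)
```

Keeping credentials out of the package in this way means the same utility function can be pointed at a different database or account by changing only the caller.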

Current status

This repository holds the code for the utilities-hki pip package, which, together with the new credentials repository, replaces the current utilities submodule in the other repositories used for data science at Humankind. The package was created by following this guide and is available on PyPI. Most of our repositories have been updated to import the new utilities-hki pip package and call the updated utility functions, passing in the credentials from the new credentials repo. The repositories that still need to be updated are:

  • daily_volume_predict
  • volume_predict (but this is not in use)
  • facebook_ads

Installation and setup

For first-time setup, clone the repository into a fresh work area:

# cloning via ssh is preferred but requires an ssh key connection in your account
git clone git@github.com:humankind-datascience/utilities-hki.git

The code requires a number of Python packages to run, which should be installed inside of a dedicated virtual environment. The preferred virtual environment tool is virtualenvwrapper.

To install the required packages in a new virtual environment, run the following command from the top-level directory of the git repository:

pip install -r requirements.txt

If code changes require additional packages, add them to the requirements-top-level.txt file. Then run the commands below to install (and upgrade) the top-level dependencies and update the requirements.txt file for future use.

pip install -r requirements-top-level.txt --upgrade
pip freeze -r requirements-top-level.txt > requirements.txt
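
After regenerating requirements.txt, it can be useful to confirm that every top-level dependency actually ended up pinned in it. The following is a minimal sketch of such a check and is not part of the repository; the helper names `package_names` and `missing_from_frozen` are hypothetical, and the parsing handles only simple requirement lines (name plus optional version specifier, extras, or marker).

```python
import re
from typing import List, Set


def package_names(requirements_text: str) -> Set[str]:
    """Return normalized package names found in the text of a requirements file."""
    names = set()
    for line in requirements_text.splitlines():
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # The name is everything before the first specifier, extra, or marker.
        name = re.split(r"[<>=!~;\[\s@]", line, maxsplit=1)[0]
        # Normalize as in PEP 503: lowercase, runs of -, _, . become one hyphen.
        names.add(re.sub(r"[-_.]+", "-", name).lower())
    return names


def missing_from_frozen(top_level_text: str, frozen_text: str) -> List[str]:
    """List top-level packages that do not appear in the frozen requirements."""
    return sorted(package_names(top_level_text) - package_names(frozen_text))


# Inline example data standing in for the contents of the two files.
top_level = "pandas>=1.0\nbotocore\n# analysis\nscikit_learn==0.24\n"
frozen = (
    "pandas==1.1.5\n"
    "## The following requirements were added by pip freeze:\n"
    "numpy==1.19.5\n"
    "scikit-learn==0.24.0\n"
)
missing = missing_from_frozen(top_level, frozen)
```

In practice the two strings would be read from requirements-top-level.txt and requirements.txt; a non-empty result suggests the freeze step was run in an environment where some top-level packages were not installed.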

Additionally, the AWS Command Line Interface (AWS CLI) is required for use of the botocore library, which is used in database utility functions to read from and write to the AWS RDS databases. See the AWS CLI documentation for installation instructions.

Now you can run the top-level scripts:

python <utilities-script.py>

The only top-level scripts in the utilities-hki repository are test scripts, designed to exercise the utility functions during package development.

Code updates

When making changes to the code, follow GitHub flow, i.e. create a new branch, make changes on that branch, frequently committing and pushing those changes to that branch, and then create a pull request to merge those changes into master upon review and approval.

Utility code overview

The utilities-hki repository contains common utility functions used across repositories in the Humankind Data Science code base. The utility functions are grouped by type into separate modules, as outlined below.

  • analy_utils: analysis utility functions, including procedures for cleaning the visit-level data and assigning engagement types to it;
  • db_utils: database utility functions;
  • email_utils: email utility functions;
  • fb_utils: Facebook Ads utility functions.

Standard cleaning of the visit-level data should be implemented at the start of any analysis and can be achieved by calling the analy_utils.clean_visits function (see the docstrings for more details).

Sample code for applying the trained clustering model and assigning the letter/numeric grades to the engagement types for each visit is provided below, where visit is a DataFrame. Read the docstrings for assign_cluster() and get_cluster_grades() for more details.

# hyphens are not valid in Python module names; utilities_hki is the assumed import name
from utilities_hki import analy_utils
# assumes visit data has already been pulled or loaded into visit

engagement = analy_utils.assign_cluster(visit)
engagement = engagement.reset_index().merge(
    analy_utils.get_cluster_grades(), how='left', on='engagement_type')
visit = visit.merge(engagement, how='inner', on='visit_id')
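
To make the shape of these merges concrete, here is a toy, self-contained sketch that uses small stand-in DataFrames in place of the real function outputs. It assumes that assign_cluster() returns a frame indexed by visit_id with an engagement_type column and that get_cluster_grades() returns one row per engagement_type; the column names `duration` and `grade` and all values are invented for illustration, so check the docstrings for the actual outputs.

```python
import pandas as pd

# Toy visit-level data (values are made up).
visit = pd.DataFrame({"visit_id": [1, 2, 3], "duration": [10, 25, 5]})

# Stand-in for analy_utils.assign_cluster(visit): assumed to be indexed by visit_id.
engagement = pd.DataFrame(
    {"engagement_type": ["a", "b", "a"]},
    index=pd.Index([1, 2, 3], name="visit_id"),
)

# Stand-in for analy_utils.get_cluster_grades(): one row per engagement type.
grades = pd.DataFrame({"engagement_type": ["a", "b"], "grade": ["A", "C"]})

# reset_index() turns visit_id back into a column; the left merge keeps every
# visit even if its engagement type has no grade.
engagement = engagement.reset_index().merge(
    grades, how="left", on="engagement_type")

# The inner merge keeps only visits that received a cluster assignment.
result = visit.merge(engagement, how="inner", on="visit_id")
```

Under these assumptions, `result` has one row per visit with the engagement type and its grade attached as additional columns.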


