Skip to main content

Risk modeling and prediction

Project description

Triage

Data Science Toolkit for Social Good and Public Policy Problems

image image image

Building data science systems requires answering many design questions, turning them into modeling choices, which in turn run machine learning models. Questions such as cohort selection, unit of analysis determination, outcome determination, feature (explanantory variables) generation, model/classifier training, evaluation, selection, and list generation are often complicated and hard to choose apriori. In addition, once these choices are made, they have to be combined in different ways throughout the course of a project.

Triage is designed to:

  • Guide users (data scientists, analysts, researchers) through these design choices by highlighting critical operational use questions.
  • Provide an integrated interface to components that are needed throughout a data science project workflow.

Quick Links

Installation

To install Triage, you need:

  • Python 3.6
  • A PostgreSQL 9.6+ database with your source data (events, geographical data, etc) loaded.
    • NOTE: If your database is PostgreSQL 11+ you will get some speed improvements. We recommend to update to a recent version of PostgreSQL.
  • Ample space on an available disk, (or for example in Amazon Web Services's S3), to store the needed matrices and models for your experiments

We recommend starting with a new python virtual environment (with Python 3.6 or greater) and pip installing triage there.

$ virtualenv triage-env
$ . triage-env/bin/activate
(triage-env) $ pip install triage

Data

Triage needs data in a postgres database and a configuration file that has credentials for the database. The Triage CLI defaults database connection information to a file stored in 'database.yaml' (example in example/database.yaml).

Configure Triage for your project

Triage is configured with a config.yaml file that has parameters defined for each component. You can see some sample configuration with explanations to see what configuration looks like.

Using Triage

  1. Via CLI:
triage experiment example/config/experiment.yaml
  1. Import as a python package:
from triage.experiments import SingleThreadedExperiment

experiment = SingleThreadedExperiment(
    config=experiment_config, # a dictionary
    db_engine=create_engine(...), # http://docs.sqlalchemy.org/en/latest/core/engines.html
    project_path='/path/to/directory/to/save/data' # could be an S3 path too: 's3://mybucket/myprefix/'
)
experiment.run()

There are a plethora of options available for experiment running, affecting things like parallelization, storage, and more. These options are detailed in the Running an Experiment page.

Development

Triag was initially developed at University of Chicago's Center For Data Science and Public Policy and is now being maintained at Carnegie Mellon University.

To build this package (without installation), its dependencies may alternatively be installed from the terminal using pip:

pip install -r requirement/main.txt

Testing

To add test (and development) dependencies, use test.txt:

pip install -r requirement/test.txt [-r requirement/dev.txt]

Then, to run tests:

pytest

Development Environment

To quickly bootstrap a development environment, having cloned the repository, invoke the executable develop script from your system shell:

./develop

A "wizard" will suggest set-up steps and optionally execute these, for example:

(install) begin

(pyenv) installed

(python-3.6.2) installed

(virtualenv) installed

(activation) installed

(libs) install?
1) yes, install {pip install -r requirement/main.txt -r requirement/test.txt -r requirement/dev.txt}
2) no, ignore
#? 1

Contributing

If you'd like to contribute to Triage development, see the CONTRIBUTING.md document.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

triage-4.4.0.tar.gz (3.8 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

triage-4.4.0-py2.py3-none-any.whl (227.3 kB view details)

Uploaded Python 2Python 3

File details

Details for the file triage-4.4.0.tar.gz.

File metadata

  • Download URL: triage-4.4.0.tar.gz
  • Upload date:
  • Size: 3.8 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.7.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.6

File hashes

Hashes for triage-4.4.0.tar.gz
Algorithm Hash digest
SHA256 ec339f4eb0c8fbdf4f7506d9afe13a1f8742d2c476914ef917c15dde363b0a67
MD5 f8e90ae69830f5a2d7c2638f13a8fae9
BLAKE2b-256 fe0d73ecec53ae981a99e82ce6c8c80e8aaa59b096e26a1f759acaea366442b0

See more details on using hashes here.

File details

Details for the file triage-4.4.0-py2.py3-none-any.whl.

File metadata

  • Download URL: triage-4.4.0-py2.py3-none-any.whl
  • Upload date:
  • Size: 227.3 kB
  • Tags: Python 2, Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.4.2 importlib_metadata/4.7.1 pkginfo/1.7.1 requests/2.26.0 requests-toolbelt/0.9.1 tqdm/4.62.2 CPython/3.9.6

File hashes

Hashes for triage-4.4.0-py2.py3-none-any.whl
Algorithm Hash digest
SHA256 bdfe08712fcf57ce9b1da4f02830f85f0b8063df355fd7879a9f15c781481bcf
MD5 5fb89398b7536e0dd80cfd338b42aede
BLAKE2b-256 921067251e15b97f67226c778c5043af4e807c895d37e03731622dc2b48639d0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page