Tools for healthcare machine learning

# healthcareai

[![Appveyor build status](https://ci.appveyor.com/api/projects/status/github/HealthCatalyst/healthcareai-py?branch=master&svg=true)](https://ci.appveyor.com/project/CatalystAdmin/healthcareai-py/branch/master)
[![Build Status](https://travis-ci.org/HealthCatalyst/healthcareai-py.svg?branch=master)](https://travis-ci.org/HealthCatalyst/healthcareai-py)
[![Anaconda-Server Badge](https://anaconda.org/catalyst/healthcareai/badges/version.svg)](https://anaconda.org/catalyst/healthcareai)
[![Anaconda-Server Badge](https://anaconda.org/catalyst/healthcareai/badges/installer/conda.svg)](https://conda.anaconda.org/catalyst)
[![PyPI version](https://badge.fury.io/py/healthcareai.svg)](https://badge.fury.io/py/healthcareai)
[![GitHub license](https://img.shields.io/badge/license-MIT-blue.svg)](https://raw.githubusercontent.com/HealthCatalyst/healthcareai-py/master/LICENSE)

The aim of **healthcareai** is to streamline machine learning in healthcare. The package has two main goals:

- Make it easy to create models based on tabular data and to deploy the best model, which pushes predictions to a database such as MSSQL, MySQL, or SQLite, or to a CSV flat file.
- Provide tools related to data cleaning, manipulation, and imputation.

## Installation

### Windows

- If you haven't already, install 64-bit Python 3.5 via [the Anaconda distribution](https://repo.continuum.io/archive/Anaconda3-4.2.0-Windows-x86_64.exe)
    - **Important:** When prompted for the **Installation Type**, select **Just Me (recommended)**. This makes permissions later in the process much simpler.
- Open a terminal (CMD or PowerShell)
- Run `conda install pyodbc`
- Upgrade to the latest scipy by removing and reinstalling it (the upgrade command itself took a very long time):
    - Run `conda remove scipy`
    - Run `conda install scipy`
- Run `conda install scikit-learn`
- Install healthcareai using **one and only one** of these three methods (ordered from easiest to hardest).
    1. **Recommended:** Install the latest release with conda by running `conda install -c catalyst healthcareai`
    2. Install the latest release with pip by running `pip install healthcareai`
    3. If you know what you're doing, and instead want the bleeding-edge version direct from our github repo, run `pip install https://github.com/HealthCatalyst/healthcareai-py/zipball/master`

#### Why Anaconda?

We recommend using the Anaconda python distribution when working on Windows. There are a number of reasons:
- When running anaconda and installing packages using the `conda` command, you don't need to worry about [dependency hell](https://en.wikipedia.org/wiki/Dependency_hell), particularly because packages aren't compiled on your machine; `conda` installs pre-compiled binaries.
- A great example of the pain that using `conda` saves you is the python package **scipy**, which, by [their own admission](http://www.scipy.org/scipylib/building/windows.html), *"is difficult"* to build on Windows.

### Linux

You may need to install the following dependencies:
- `sudo apt-get install python-tk`
- `sudo pip install pyodbc`
- Note that you might run into trouble with the `pyodbc` dependency. You may first need to run `sudo apt-get install unixodbc-dev`, then retry `sudo pip install pyodbc`. Credit: [stackoverflow](http://stackoverflow.com/questions/2960339/unable-to-install-pyodbc-on-linux)

Once you have the dependencies satisfied, run `pip install healthcareai` or `sudo pip install healthcareai`

### macOS

- `pip install healthcareai` or `sudo pip install healthcareai`

### Linux and macOS (via docker)

- Install [docker](https://docs.docker.com/engine/installation/)
- Clone this repo (look for the green button on the repo main page)
- `cd` into the cloned directory
- Run `docker build -t healthcareai .`
- Run the docker instance with `docker run -p 8888:8888 healthcareai`
- You should then have a jupyter notebook available at `http://localhost:8888`.

### Verify Installation

To verify that *healthcareai* installed correctly, open a terminal and run `python`. This opens an interactive python
console (also known as a [REPL](https://en.wikipedia.org/wiki/Read%E2%80%93eval%E2%80%93print_loop)). Then enter this
command: `from healthcareai import SupervisedModelTrainer` and hit enter. If no error is thrown, you are ready to rock.

If you did get an error, or run into other installation issues, please [let us know](http://healthcare.ai/contact.html)
or better yet post on [Stack Overflow](http://stackoverflow.com/questions/tagged/healthcare-ai) (with the healthcare-ai
tag) so we can help others along this process.
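
The same check can also be scripted, for instance as a quick smoke test after an install. This is a minimal sketch that relies only on the import named above; the helper name `can_import` is hypothetical and is not part of healthcareai.

```python
import importlib


def can_import(module_name, attribute):
    """Return True if `module_name` imports cleanly and exposes `attribute`."""
    try:
        module = importlib.import_module(module_name)
    except ImportError:
        # Covers both a missing package and a missing compiled dependency.
        return False
    return hasattr(module, attribute)


if __name__ == "__main__":
    ok = can_import("healthcareai", "SupervisedModelTrainer")
    print("healthcareai is ready" if ok else "healthcareai import failed")
```

Running the file prints one of the two messages, so it can be dropped into a post-install script without opening a REPL.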

## Getting started

- Visit [healthcare.ai](http://healthcareai-py.readthedocs.io/en/latest/) to read the docs and find examples.
    * Including this [notebook](notebooks/Example1.ipynb)
- Open Spyder (which is installed with Anaconda) and copy the examples into a new file
- Modify the queries and parameters to match your data
- If you plan on deploying a model (i.e., pushing predictions to SQL Server), run this in SSMS beforehand:
```sql
CREATE TABLE [SAM].[dbo].[HCAIClassificationBASE] (
    [BindingID] [int],
    [BindingNM] [varchar] (255),
    [LastLoadDTS] [datetime2] (7),
    [PatientEncounterID] [decimal] (38, 0), --< change to your grain col
    [PredictedProbNBR] [decimal] (38, 2),
    [Factor1TXT] [varchar] (255),
    [Factor2TXT] [varchar] (255),
    [Factor3TXT] [varchar] (255));

CREATE TABLE [SAM].[dbo].[HCAIPredictionRegressionBASE] (
    [BindingID] [int],
    [BindingNM] [varchar] (255),
    [LastLoadDTS] [datetime2] (7),
    [PatientEncounterID] [decimal] (38, 0), --< change to your grain col
    [PredictedValueNBR] [decimal] (38, 2),
    [Factor1TXT] [varchar] (255),
    [Factor2TXT] [varchar] (255),
    [Factor3TXT] [varchar] (255));
```
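
To get a feel for the shape of a prediction row before touching SQL Server, the classification table can be mocked up in an in-memory SQLite database. This is only a sketch: the column names are copied from the script above, but the SQLite types are approximations, the sample values are made up, and the helpers `make_test_db` and `insert_prediction` are hypothetical. It does not show how healthcareai itself writes predictions.

```python
import sqlite3
from datetime import datetime

# Column names come from the SQL Server script; types are SQLite stand-ins.
CREATE_CLASSIFICATION = """
CREATE TABLE HCAIClassificationBASE (
    BindingID INTEGER,
    BindingNM TEXT,
    LastLoadDTS TEXT,
    PatientEncounterID INTEGER,  -- change to your grain column
    PredictedProbNBR REAL,
    Factor1TXT TEXT,
    Factor2TXT TEXT,
    Factor3TXT TEXT
)
"""


def make_test_db():
    """Create an in-memory SQLite database holding the classification table."""
    conn = sqlite3.connect(":memory:")
    conn.execute(CREATE_CLASSIFICATION)
    return conn


def insert_prediction(conn, row):
    """Insert one prediction row given as a dict keyed by column name."""
    columns = ", ".join(row)
    placeholders = ", ".join("?" for _ in row)
    # Column names are trusted here; only the values are parameterized.
    conn.execute(
        f"INSERT INTO HCAIClassificationBASE ({columns}) VALUES ({placeholders})",
        list(row.values()),
    )


conn = make_test_db()
insert_prediction(conn, {
    "BindingID": 0,
    "BindingNM": "example",      # made-up value for illustration
    "LastLoadDTS": datetime(2017, 1, 1).isoformat(),
    "PatientEncounterID": 1,
    "PredictedProbNBR": 0.42,
    "Factor1TXT": "FeatureA",    # made-up feature names
    "Factor2TXT": "FeatureB",
    "Factor3TXT": "FeatureC",
})
```

Each row carries one grain identifier, one predicted value, and the top three contributing factors, which is the same layout the regression table uses with `PredictedValueNBR` in place of `PredictedProbNBR`.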

## For Issues

- Double check that the code follows the examples [here](http://healthcareai-py.readthedocs.io/en/latest/)
- If you're still seeing an error, create a post on [Stack Overflow](http://stackoverflow.com/questions/tagged/healthcare-ai) (with the healthcare-ai tag) that contains:
    * Details on your environment (OS, database type, R vs Py)
    * Goals (i.e., what you are trying to accomplish)
    * Crystal clear steps for reproducing the error
- You can also log a new issue in the GitHub repo by clicking [here](https://github.com/HealthCatalyst/healthcareai-py/issues/new)

## PyPI Package Creation and Updating

**Note these instructions are for maintainers only.**

First, read this [Packaging and Distributing Projects](https://packaging.python.org/distributing/) guide.

It's also worth noting that while this *should* be done on the [pypi test site](https://testpypi.python.org/pypi), I've
run into a great deal of trouble with conflicting guides on authenticating to the test site. So be smart about this.

1. **Build a source distribution**: from Python 3 (tested with Anaconda Python 3 on Windows), run `python setup.py sdist`
2. **Register the package** by using the [form on pypi](https://pypi.python.org/pypi?%3Aaction=pkg_edit&name=healthcareai). Upload the `PKG-INFO` that was generated inside the `.egg` file.
3. **Upload the package** using [twine](https://pypi.python.org/pypi/twine)
    - `twine upload dist/healthcareai-<version>.tar.gz`
    - **NOTE:** You can only ever upload a given file name **once**. To get around this I was adding an *rc* number to the version in `setup.py`. However, this **will break the appveyor build**, so you'll need to remove the `.rc` before you push to github.
4. Verify the install on all three platforms (linux, macOS, windows) by:
    1. `pip uninstall healthcareai`
    2. `pip install healthcareai`
    3. From a python console, type `from healthcareai import SupervisedModelTrainer`
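
Because a leftover *rc* suffix breaks the appveyor build, a small guard run before pushing can help. The sketch below is hypothetical: the exact suffix format is not specified above, so it assumes forms like `1.0rc1` or `1.0.rc2`, and the function names are made up for illustration.

```python
import re

# Assumed release-candidate forms: "1.0rc1" or "1.0.rc2".
RC_PATTERN = re.compile(r"\.?rc\d+$")


def is_release_candidate(version):
    """Return True if the version string ends in an rc marker."""
    return RC_PATTERN.search(version) is not None


def strip_rc(version):
    """Return the version with any trailing rc marker removed."""
    return RC_PATTERN.sub("", version)


def sdist_filename(version, name="healthcareai"):
    """Build the sdist path that `twine upload` expects for this version."""
    return f"dist/{name}-{version}.tar.gz"


if __name__ == "__main__":
    version = "1.0rc1"  # example value; read it from setup.py in practice
    if is_release_candidate(version):
        print(f"Remove the rc suffix before pushing: {version} -> {strip_rc(version)}")
    print(f"Upload target: {sdist_filename(version)}")
```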

### Release process (Including Read The Docs)

1. Update all version numbers
    - `setup.py`
2. Update CHANGELOG
    - Move all items under **unreleased** to a new release number
    - Leave the template under **unreleased**
3. Merge in the PR
4. Create a release on github releases (making sure this matches the release number in `setup.py`)
5. Create and upload the new pypi release (see above)
6. Update readthedocs settings
    - **Admin** > **Versions**
    - Ensure that the new release number is checked for **public**
7. Manually build the new read the docs
    - **Builds** > **Build version <new release>**
8. Verify the new version builds and is viewable at the public url
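
The requirement that the github release match the version in `setup.py` is easy to check mechanically. The following is a rough sketch, assuming the version is written as a plain string literal (`version='...'`) in `setup.py` and that release tags may carry a leading `v`; both are assumptions, and the function names are hypothetical.

```python
import re


def version_from_setup(setup_py_text):
    """Return the value of the first `version=...` string literal, or None."""
    match = re.search(r"""version\s*=\s*['"]([^'"]+)['"]""", setup_py_text)
    return match.group(1) if match else None


def release_matches(setup_py_text, release_tag):
    """Compare the setup.py version with a release tag, ignoring leading 'v' characters."""
    version = version_from_setup(setup_py_text)
    return version is not None and version == release_tag.lstrip("v")


# Example input standing in for the real file contents.
example = "setup(name='healthcareai', version='1.0')"
print(release_matches(example, "v1.0"))
```

In practice the text would come from reading `setup.py` from disk, and the tag from whatever name was entered on the releases page.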

### Conda Packaging and Distribution

Creating a conda package is much easier if you have already built the PyPI package.

1. Install prerequisites (only needed once)
    + Install conda build: `conda install conda-build`
    + Install the anaconda cli: `conda install anaconda-client`
    + Log in to anaconda.org with `anaconda login`
2. Configure conda
    + `conda config --set always_yes true`
    + `conda config --set anaconda_upload no`
3. Create the skeleton conda recipe from the existing PyPI package
    + `conda skeleton pypi healthcareai`
4. Build the conda package for the main python versions
    + `conda build --python 2.7 healthcareai`
    + `conda build --python 3.4 healthcareai`
    + `conda build --python 3.5 healthcareai`
    + `conda build --python 3.6 healthcareai`
5. Convert the existing builds to work on all platforms (win32, win64, osx64, linux32, linux64). Note this can take a while.
    + `conda convert --platform all win-64/healthcareai-*-py*.tar.bz2 -o <PATH_TO_BUILD_DIRECTORY>`
6. Upload to anaconda using the anaconda cli
    + Note that you'll have to keep track of where the builds are put!
    + `anaconda upload <PATH_TO_BUILD_DIRECTORY>/**/healthcareai*.tar.bz2`
7. Clean up the mess
    + `conda build purge`
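
The per-version builds in step 4 are repetitive enough to script. This is a minimal sketch that only assembles the `conda build` commands listed above and, by default, prints them rather than running them; the helper names are hypothetical.

```python
import subprocess

# Python versions taken from step 4 above.
PYTHON_VERSIONS = ("2.7", "3.4", "3.5", "3.6")


def conda_build_commands(recipe="healthcareai", versions=PYTHON_VERSIONS):
    """Return one `conda build` argument list per Python version."""
    return [["conda", "build", "--python", v, recipe] for v in versions]


def run_all(commands, dry_run=True):
    """Print each command, or execute it when dry_run is False."""
    for command in commands:
        if dry_run:
            print(" ".join(command))
        else:
            # check=True stops at the first failing build.
            subprocess.run(command, check=True)


run_all(conda_build_commands())  # dry run: only prints the commands
```

Passing `dry_run=False` would execute the builds in order, which requires `conda-build` to be installed as described in step 1.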

##### Helpful Resources

- Conda [Building Packages](https://conda.io/docs/building/build.html)
- [Anaconda.org dashboard](https://anaconda.org/catalyst/healthcareai)
- Taken from the excellent [conda.io docs](https://conda.io/docs/build_tutorials/pkgs.html)
- Also, some taken from this [Travis CI build](https://gist.github.com/yoavram/05a3c04ddcf317a517d5)


## Sphinx Progress

Ideally, this project will have a user guide (currently in the form of the docs folder) and method-level documentation generated by sphinx.

1. Install sphinx
2. install

From the `docs/_build` directory (you may need to create it if it doesn't exist), run `sphinx-apidoc.exe -f -o ../ ../../healthcareai && sphinx-build.exe -b html ../ ./ && python -m http.server 8888 --bind 127.0.0.1`

### Sphinx resources

- [An idiot’s guide to Python documentation with Sphinx and ReadTheDocs](https://samnicholls.net/2016/06/15/how-to-sphinx-readthedocs/)
- [First Steps with Sphinx](http://www.sphinx-doc.org/en/stable/tutorial.html)
- [Napoleon - Marching toward legible docstrings](https://sphinxcontrib-napoleon.readthedocs.io/en/latest/)
- [napoleon configuration](http://www.sphinx-doc.org/en/stable/ext/napoleon.html#configuration)
