
AnnoMate

A package for using and creating interactive dashboards for manual review.

[Figure: Purity AnnoMate Reviewer dashboard]

Quick Start

Install

Set up Conda Environment

Using a dedicated conda environment is highly recommended to manage the different dependencies required by different reviewers.

  1. Install conda

Credit to Raymond Chu for this article: https://medium.com/google-cloud/set-up-anaconda-under-google-cloud-vm-on-windows-f71fc1064bd7

    sudo apt-get update
    sudo apt-get install bzip2 libxml2-dev
    
    # Download and run the Miniconda installer, then remove the installer script
    wget https://repo.anaconda.com/miniconda/Miniconda3-latest-Linux-x86_64.sh
    bash Miniconda3-latest-Linux-x86_64.sh
    rm Miniconda3-latest-Linux-x86_64.sh
    # Reload your shell configuration so conda is on the PATH
    source ~/.bashrc
    conda install scikit-learn pandas jupyter ipython
    
  2. Create a conda environment

    If you do not already have a designated environment:

    conda create --name <your_env> python==<py_version>
    

    <your_env> is the name of your environment (e.g., purity_review_env). Check the corresponding reviewer's setup.py file to get the proper Python version for <py_version>.

  3. Add conda environment to ipykernel

    Credit to Nihal Sangeeth from StackOverflow: https://stackoverflow.com/questions/53004311/how-to-add-conda-environment-to-jupyter-lab.

    conda activate <your_env>
    conda install ipykernel
    ipython kernel install --user --name=<your_env>
    conda deactivate
    

    When you open a Jupyter notebook, you can then switch the kernel so that the notebook cells run in <your_env>.
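
    To confirm the new kernel is registered, you can list the installed kernel specs from Python; a minimal sketch using jupyter_client, which is installed alongside Jupyter:

    # <your_env> should appear among the registered kernel names.
    from jupyter_client.kernelspec import KernelSpecManager

    for name, path in KernelSpecManager().find_kernel_specs().items():
        print(name, "->", path)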

Install AnnoMate with pip

If you are developing a brand new reviewer, you can install AnnoMate from PyPI:

conda activate <your_env>
pip install AnnoMate

Install with Git

AnnoMate and most prebuilt reviewers can be cloned with git.

git clone git@github.com:getzlab/AnnoMate.git
cd AnnoMate
pip install -e .
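
Either way, you can double-check the install from Python with the standard library's importlib.metadata (available in Python 3.8+):

# Confirm AnnoMate is installed and print its version.
from importlib.metadata import version

print(version("AnnoMate"))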

Tutorials and Documentation

See a more detailed tutorial in tutorial_notebooks/Intro_to_Reviewers.ipynb.

View the catalog of existing reviewers at catalog/ReviewerCatalog.ipynb.

For developers, see tutorial_notebooks/Developer_Jupyter_Reviewer_Tutorial.ipynb.

Why AnnoMate

Why and how we review data

Part of any study is ensuring data are consistent and drawing conclusions about the data from multiple sources. Studies are often novel, so many steps along the way have no existing, validated automation techniques. Therefore, we must perform manual review.

Typically, the person reviewing all this data opens many windows to view data from different places (a clinical information spreadsheet from a collaborator, a few outputs from a Terra workflow, previous notes from another reviewer, etc.). Next, they look at all the data, keeping notes in yet another document, such as a spreadsheet or digital or physical notes. Then they go row by row, sample by sample, until they finish.

Why we need something better

While straightforward in theory, this review method is brittle, error-prone, and time-consuming.

Reviewing can take a very long time, for example when datasets are large (on the order of hundreds to thousands of data points) or when the review needs to be repeated multiple times due to changes in upstream processes.

Some review processes are iterative, new information arrives from another source to inform the review, or the review needs to be handed off to someone else. We should be able to easily incorporate old data with new data, and share that history and information with others.

Some reviews require calculations, or exploring the data in ways that a static plot cannot provide. Some Terra workflows do produce interactive HTML files, but this is rare. Sometimes a reviewer realizes midway through the review process that a different kind of plot could be very informative. It should be easy to generate such a plot on the fly without having to modify or create a new Terra workflow, or open a new notebook to calculate manually.

Lastly, humans are humans, and we make mistakes. It can be very tedious to maintain and update a large spreadsheet with hundreds of rows and multiple columns to annotate. Annotations are difficult to enforce in this setting, and changes (intentional or accidental) are difficult to track.

The Solution: Jupyter notebook and Plotly-Dash!

Most ACBs use Jupyter notebooks for their analysis, so why not keep the review process in Jupyter notebooks too? Additionally, there already exist great tools for making interactive figures and dashboards. We can use these packages to automatically consolidate information and create figures that make it easier to review, enforce annotation standards, and track changes over time.
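
For a flavor of these underlying tools, here is a minimal, self-contained Plotly Dash sketch; this is not AnnoMate's API, and the DataFrame, column names, and layout are made-up illustrations:

# A tiny dashboard: pick a sample from a dropdown, see its (made-up) metrics.
import pandas as pd
import plotly.express as px
from dash import Dash, Input, Output, dcc, html

# Hypothetical review data; in practice this would come from your study.
df = pd.DataFrame({
    "sample": ["S1", "S1", "S2", "S2"],
    "coverage": [30, 45, 25, 60],
    "purity": [0.70, 0.75, 0.40, 0.50],
})

app = Dash(__name__)
app.layout = html.Div([
    dcc.Dropdown(sorted(df["sample"].unique()), "S1", id="sample-dropdown"),
    dcc.Graph(id="sample-plot"),
])

@app.callback(Output("sample-plot", "figure"), Input("sample-dropdown", "value"))
def update_plot(sample):
    # Redraw the scatter plot for whichever sample the reviewer selects.
    return px.scatter(df[df["sample"] == sample], x="coverage", y="purity")

if __name__ == "__main__":
    app.run(debug=True)  # on older Dash 2.x releases, use app.run_server

AnnoMate builds on these same pieces, so a reviewer dashboard runs directly inside the notebook.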

The AnnoMate package makes it simple to create dashboards for reviewing data. Developers and users can easily customize their dashboards to incorporate any data they like, and AnnoMate automatically provides reviewers an easy way to annotate their data, track changes, and share their annotations with others.

Get Started

See tutorial_notebooks/ for documentation and tutorials.

For AnnoMate Developers

New AnnoMate features are pushed to dev_branch. Separate large features being developed simultaneously can live in their own branches and should be merged into dev_branch first.

Only when we merge dev_branch into master do we also push to PyPI; at that time we decide the new version number.

We follow semantic versioning (https://packaging.python.org/en/latest/discussions/versioning/). The idea of semantic versioning (or SemVer) is to use 3-part version numbers, major.minor.patch, where the project author increments (a short version-comparison sketch follows the lists below):

  • major when they make incompatible API changes,
  • minor when they add functionality in a backwards-compatible manner, and
  • patch, when they make backwards-compatible bug fixes.

For AnnoMate:

  • Major: Not backwards compatible
  • Minor: up to 3 backwards-compatible functionality changes within a year, for example:
    • pickle versioning in reviewdatainterface
    • hot keys
    • Adding components or new custom pre-built reviewers
  • Patch: up to 5 bug fixes within 6 months
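
As a quick illustration of how these version numbers compare, a minimal sketch using the third-party packaging library (pip install packaging):

# packaging.version orders major.minor.patch the way SemVer intends.
from packaging.version import Version

assert Version("1.1.0") > Version("1.0.5")  # a minor bump outranks any patch bump
assert Version("2.0.0") > Version("1.9.9")  # a major bump signals breaking changes

v = Version("1.2.3")
print(v.major, v.minor, v.micro)  # -> 1 2 3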

GitHub continuous integration: https://docs.github.com/en/actions/automating-builds-and-tests/building-and-testing-python
