
Package for detecting changes in time-series data

Project description

Change detection in prescribing data

Detects changes in time series using a Python wrapper around the R package gets (https://cran.r-project.org/web/packages/gets/index.html). Data are queried with a combination of Google BigQuery and Python, then fed to the R change detection code. The output is a table containing the results.

Installation

pip install change_detection

Anaconda users may need to run conda install rpy2 and conda install geopandas if these packages are not already installed.

Usage

See https://github.com/ebmdatalab/change_detection/blob/master/examples/examples.ipynb for examples of use.

Data flow

  1. Get the data by one of the following:
    • using a CSV file in data/<name>, which must contain only the fields code, month, numerator and denominator
    • writing a BigQuery SQL query in the same folder as the notebook you're using; the query must produce a table containing only the fields code, month, numerator and denominator
    • querying any number of the OpenPrescribing measures in BigQuery
  2. Reshape the data with pandas
  3. Split the data into chunks and pass each chunk to the R change detection code
  4. Extract the resulting output with further R code
  5. Concatenate the R outputs
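The input rules and the reshape/chunk steps can be pictured with a small, stdlib-only sketch. It is not the package's own code (which uses pandas and R). It assumes the series analysed for each code is numerator / denominator, that months are ISO-style strings so they sort chronologically, and that the chunk size is a free parameter; read_input, reshape and chunk are hypothetical names.

```python
import csv
import io
from collections import defaultdict

REQUIRED_FIELDS = ["code", "month", "numerator", "denominator"]


def read_input(text):
    """Read CSV text and enforce the 'only these four fields' rule."""
    reader = csv.DictReader(io.StringIO(text))
    if sorted(reader.fieldnames or []) != sorted(REQUIRED_FIELDS):
        raise ValueError(f"expected fields {REQUIRED_FIELDS}, got {reader.fieldnames}")
    return list(reader)


def reshape(rows):
    """Pivot long rows into {code: [(month, ratio), ...]} sorted by month.

    Assumption: the value analysed is numerator / denominator; months with a
    zero denominator are skipped rather than imputed.
    """
    series = defaultdict(list)
    for row in rows:
        denominator = float(row["denominator"])
        if denominator == 0:
            continue
        ratio = float(row["numerator"]) / denominator
        series[row["code"]].append((row["month"], ratio))
    return {code: sorted(points) for code, points in series.items()}


def chunk(codes, size):
    """Split entity codes into chunks of at most `size` (hypothetical chunk size)."""
    codes = sorted(codes)
    return [codes[i:i + size] for i in range(0, len(codes), size)]


sample = (
    "code,month,numerator,denominator\n"
    "A,2019-01-01,5,10\n"
    "A,2019-02-01,2,10\n"
    "B,2019-01-01,1,4\n"
)
data = reshape(read_input(sample))
print(data["A"])       # [('2019-01-01', 0.5), ('2019-02-01', 0.2)]
print(chunk(data, 1))  # [['A'], ['B']]
```

Each chunk would then be handed to the R side in turn; keeping chunks as lists of codes means the outputs can be concatenated afterwards in any order.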

Options

  • name: the name of the custom SQL file, or the name of the BigQuery measure to be queried
  • verbose: makes the R output more verbose, to help with debugging (default = False)
  • sample: for testing purposes, takes a random sample of 100 entities to reduce processing time (default = False)
  • measure: specifies that name refers to a measure rather than to custom SQL (default = False)
  • direction: the direction in which to look for changes; may be 'up', 'down' or 'both' (default = 'both')
  • use_cache: passed as the use_cache option to bq.cached_read (default = True)
  • csv_name: specifies a .csv file to use for change detection, instead of getting the data from BigQuery
  • overwrite: forces the change detection to be re-run; by default it is not re-run if the output files already exist (default = False)
  • draw_figures: draws an R plot of each time series, including the regression lines and breaks, and stores it in the figures folder; options are 'no' or 'yes' (default = 'no')
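The documented defaults and allowed values can be collected in one place. The dataclass below is a hypothetical illustration, not the package's API: ChangeDetectionOptions is an invented name, and csv_name defaulting to None is an assumption because no default is documented for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeDetectionOptions:
    """Hypothetical container mirroring the documented options and defaults.

    Not the package's API: it only records the documented defaults and
    checks the allowed values of direction and draw_figures.
    """

    name: str
    verbose: bool = False
    sample: bool = False
    measure: bool = False
    direction: str = "both"
    use_cache: bool = True
    csv_name: Optional[str] = None  # assumption: no default is documented
    overwrite: bool = False
    draw_figures: str = "no"

    def __post_init__(self):
        if self.direction not in ("up", "down", "both"):
            raise ValueError(
                f"direction must be 'up', 'down' or 'both', not {self.direction!r}"
            )
        if self.draw_figures not in ("no", "yes"):
            raise ValueError(
                f"draw_figures must be 'no' or 'yes', not {self.draw_figures!r}"
            )


opts = ChangeDetectionOptions("my_measure", measure=True, direction="down")
print(opts.direction, opts.use_cache, opts.draw_figures)  # down True no
```

Validating direction and draw_figures up front means a typo such as 'downn' fails immediately, rather than after a slow BigQuery read.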

Output table

Timing Measures

  • is.tfirst: first negative break
  • is.tfirst.pknown: first negative break after a known intervention date
  • is.tfirst.pknown.offs: first negative break after a known intervention date not offset by a XX% increase
  • is.tfirst.offs: first negative break not offset by a XX% increase
  • is.tfirst.big: steepest break, as identified by is.slope.ma
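The two simplest timing measures can be read as a search over the detected breaks. This is a hypothetical illustration, not the package's R code. It assumes breaks are available as (time index, signed size) pairs and treats "after a known intervention date" as inclusive. The .offs variants are left out because their XX% threshold is not specified.

```python
from typing import List, Optional, Tuple

# (time index, signed size of the break); negative means a drop
Break = Tuple[int, float]


def first_negative_break(breaks: List[Break], after: Optional[int] = None) -> Optional[int]:
    """Return the time index of the earliest negative break, or None.

    after=None mirrors the description of is.tfirst; passing the index of a
    known intervention date mirrors is.tfirst.pknown. Treating `after` as
    inclusive is an assumption.
    """
    candidates = [
        t for t, size in breaks if size < 0 and (after is None or t >= after)
    ]
    return min(candidates) if candidates else None


breaks = [(5, 0.3), (8, -0.2), (14, -0.5)]
print(first_negative_break(breaks))             # 8
print(first_negative_break(breaks, after=10))   # 14
print(first_negative_break([(3, 1.0)]))         # None
```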

Slope Measures

  • is.slope.ma: average slope over the steepest segment contributing at least XX% of the total drop
  • is.slope.ma.prop: average slope as a proportion of the prior level
  • is.slope.ma.prop.lev: percentage of the total drop made up by the segment used to evaluate the slope
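Once a segment has been chosen, the three slope measures are simple arithmetic. The sketch below assumes a fitted series indexed by month, takes the prior level to be the fitted value at the segment start, and treats total_drop as a positive number. Choosing the steepest qualifying segment depends on the elided XX% threshold and is not shown. segment_slope_measures is a hypothetical name.

```python
def segment_slope_measures(fitted, start, end, total_drop):
    """Slope measures for the segment fitted[start]..fitted[end].

    Assumptions: `fitted` is a list of fitted values indexed by month, the
    prior level is fitted[start], and `total_drop` is the overall decrease
    from the pre-drop level to the end level (a positive number).
    """
    if end <= start:
        raise ValueError("end must come after start")
    slope = (fitted[end] - fitted[start]) / (end - start)  # is.slope.ma
    prop = slope / fitted[start]                            # is.slope.ma.prop
    segment_drop = fitted[start] - fitted[end]
    pct_of_total = 100 * segment_drop / total_drop          # is.slope.ma.prop.lev
    return slope, prop, pct_of_total


fitted = [10, 10, 8, 6, 5, 5]
print(segment_slope_measures(fitted, 1, 3, total_drop=10 - 5))  # (-2.0, -0.2, 80.0)
```

Here the segment from month 1 to month 3 falls from 10 to 6, so it accounts for 4 of the total drop of 5, or 80%.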

Level Measures

  • is.intlev.initlev: pre-drop level
  • is.intlev.finallev: end level
  • is.intlev.levd: difference between the pre-drop and end levels
  • is.intlev.levdprop: proportion of drop
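A worked example of how the level measures relate to each other, under stated assumptions: is.intlev.levd is taken as initlev minus finallev (positive for a drop), and "proportion of drop" is taken as that difference divided by initlev. Both readings are assumptions rather than the package's documented formulas; level_measures is a hypothetical name.

```python
def level_measures(initlev: float, finallev: float) -> dict:
    """Level measures from a pre-drop level and an end level.

    Assumptions: levd = initlev - finallev (positive for a drop) and
    levdprop = levd / initlev.
    """
    if initlev == 0:
        raise ValueError("initlev must be non-zero to express the drop as a proportion")
    levd = initlev - finallev
    return {
        "is.intlev.initlev": initlev,
        "is.intlev.finallev": finallev,
        "is.intlev.levd": levd,
        "is.intlev.levdprop": levd / initlev,
    }


result = level_measures(10.0, 5.0)
print(result["is.intlev.levd"], result["is.intlev.levdprop"])  # 5.0 0.5
```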

Requirements

Python with an associated install of R. Python dependencies should be handled during installation (though in my case I had to install rpy2 separately). R packages should be installed when the package is first loaded.

Python installation requires:

R installation requires:

  • zoo
  • caTools
  • gets



Download files


Source Distributions

No source distribution files are available for this release.

Built Distribution

change_detection-0.3.5-py2.py3-none-any.whl (13.1 kB)

Uploaded for Python 2 and Python 3
