package for detecting change in time-series data
Change detection in prescribing data
Detects changes in time series using a Python wrapper around the R package gets (https://cran.r-project.org/web/packages/gets/index.html). The data is queried with a combination of Google BigQuery and Python, then fed to the R change detection code. The output is a table containing the results.
Installation
pip install change_detection
Anaconda users may have to conda install rpy2 and conda install geopandas if not already installed.
Usage
See https://github.com/ebmdatalab/change_detection/blob/master/examples/examples.ipynb for examples of use.
Data flow
- Get data, by:
  - using a csv in `data/<name>`, which must have only the fields `code`, `month`, `numerator` and `denominator`
  - creating a BigQuery SQL query in the same folder as the notebook that you're using; the query must produce a table with only the fields `code`, `month`, `numerator` and `denominator`
  - querying any number of the OpenPrescribing measures in BigQuery
- Reshapes data with Pandas
- Splits data into chunks and passes each chunk to the R change detection code
- The resulting output is then extracted with further R code
- The R outputs are then concatenated
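The first steps of this flow can be sketched in plain pandas. This is a minimal illustration, not the package's own code: the function names `load_and_reshape` and `split_into_chunks` are hypothetical, and computing a `numerator / denominator` ratio and pivoting to one column per `code` are assumptions about how the reshaping might be done.

```python
import io

import pandas as pd

REQUIRED_FIELDS = ["code", "month", "numerator", "denominator"]


def load_and_reshape(csv_source):
    """Read a CSV, check it has only the required fields, and pivot it wide."""
    df = pd.read_csv(csv_source, parse_dates=["month"])
    if sorted(df.columns) != sorted(REQUIRED_FIELDS):
        raise ValueError(f"expected only {REQUIRED_FIELDS}, got {list(df.columns)}")
    # Assumption: the series of interest is the numerator/denominator ratio.
    df["ratio"] = df["numerator"] / df["denominator"]
    # Rows are months, columns are entity codes.
    return df.pivot(index="month", columns="code", values="ratio").sort_index()


def split_into_chunks(wide, n_chunks):
    """Split the columns of a wide table into at most n_chunks groups."""
    codes = list(wide.columns)
    size = max(1, -(-len(codes) // n_chunks))  # ceiling division, at least 1
    return [wide[codes[i:i + size]] for i in range(0, len(codes), size)]


# Made-up example data with the four required fields.
example_csv = io.StringIO(
    "code,month,numerator,denominator\n"
    "A,2020-01-01,1,10\n"
    "A,2020-02-01,2,10\n"
    "B,2020-01-01,3,10\n"
    "B,2020-02-01,4,10\n"
    "C,2020-01-01,5,10\n"
    "C,2020-02-01,6,10\n"
)
wide = load_and_reshape(example_csv)
chunks = split_into_chunks(wide, n_chunks=2)
```

Each element of `chunks` is a smaller table that could be handed to a separate worker, which is the role the R change detection code plays in the real package.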
Options
- `name`: specifies either the name of the custom SQL file, or the name of the BigQuery measure to be queried
- `verbose`: makes the R output more verbose to help with bug fixing (default = False)
- `sample`: for testing purposes, takes a random sample of 100 entities to reduce processing time (default = False)
- `measure`: specifies that the `name` specified refers to a measure, rather than custom SQL (default = False)
- `direction`: specifies which direction to look for changes; may be 'up', 'down', or 'both' (default = 'both')
- `use_cache`: passes the `use_cache` option to `bq.cached_read` (default = True)
- `csv_name`: specifies a .csv file to be used in the change detection, rather than getting the data from BigQuery
- `overwrite`: forces reprocessing of the change detection; the default behaviour is to not re-run if the output files exist (default = False)
- `draw_figures`: draws an R plot for each of the time series, along with regression lines/breaks. These are stored in the `figures` folder. Options are 'no' or 'yes' (default = 'no')
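To keep the options and their defaults in one place, they could be mirrored in a small dataclass. This is only a sketch: `ChangeDetectionOptions` is a hypothetical name and not part of the package, and the `None` default for `csv_name` is an assumption, since no default is documented for it.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class ChangeDetectionOptions:
    """Hypothetical container mirroring the documented options and defaults."""

    name: str
    verbose: bool = False
    sample: bool = False
    measure: bool = False
    direction: str = "both"
    use_cache: bool = True
    csv_name: Optional[str] = None  # assumption: no default is documented
    overwrite: bool = False
    draw_figures: str = "no"

    def __post_init__(self):
        # Reject values outside the documented choices.
        if self.direction not in ("up", "down", "both"):
            raise ValueError("direction must be 'up', 'down', or 'both'")
        if self.draw_figures not in ("no", "yes"):
            raise ValueError("draw_figures must be 'no' or 'yes'")


# Only the name is required; everything else falls back to its default.
opts = ChangeDetectionOptions(name="my_query", direction="down")
```

Validating `direction` and `draw_figures` up front means a typo fails immediately rather than after data has been queried.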
Output table
Timing Measures
is.tfirst First negative break
is.tfirst.pknown First negative break after a known intervention date
is.tfirst.pknown.offs First negative break after a known intervention date not offset by a XX% increase
is.tfirst.offs First negative break not offset by a XX% increase
is.tfirst.big Steepest break as identified by is.slope.ma
Slope Measures
is.slope.ma Average slope over steepest segment contributing at least XX% of total drop
is.slope.ma.prop Average slope as proportion to prior level
is.slope.ma.prop.lev Percentage of the total drop the segment used to evaluate the slope makes up
Level Measures
is.intlev.initlev Pre-drop level
is.intlev.finallev End level
is.intlev.levd Difference between pre and end level
is.intlev.levdprop Proportion of drop
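As a worked illustration of how the level measures relate to each other, the arithmetic might look like the sketch below. The function `level_measures` is hypothetical, and two things are assumptions rather than documented behaviour: that the difference is taken as the pre-drop level minus the end level, and that the proportion is that difference divided by the pre-drop level.

```python
def level_measures(initial_level, final_level):
    """Compute illustrative level measures from a pre-drop and an end level."""
    # Assumption: a drop is reported as a positive difference.
    level_drop = initial_level - final_level
    # Assumption: the proportion is relative to the pre-drop level.
    proportion = level_drop / initial_level if initial_level else float("nan")
    return {
        "is.intlev.initlev": initial_level,
        "is.intlev.finallev": final_level,
        "is.intlev.levd": level_drop,
        "is.intlev.levdprop": proportion,
    }


# Made-up example: a series that falls from 0.5 to 0.125.
example = level_measures(0.5, 0.125)
```

Under these assumptions, a fall from 0.5 to 0.125 is a difference of 0.375, which is three quarters of the starting level.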
Requirements
Python with an associated install of R. Python dependencies should be dealt with on installation (though for my install, I had to install rpy2 separately). R packages should be installed when the package is first loaded.
Python installation requires:
- ebmdatalab library https://github.com/ebmdatalab/datalab-pandas
- rpy2 (to install R and the below libraries)
- pandas
- pandas-gbq
- numpy
R installation requires:
- zoo
- caTools
- gets