Skip to main content

Implements temporal validation for machine learning/

Project description

Copyright ©2017,. The University of Chicago (“Chicago”). All Rights Reserved.

Permission to use, copy, modify, and distribute this software, including all object code and source code, and any accompanying documentation (together the “Program”) for educational and not-for-profit research purposes, without fee and without a signed licensing agreement, is hereby granted, provided that the above copyright notice, this paragraph and the following three paragraphs appear in all copies, modifications, and distributions. For the avoidance of doubt, educational and not-for-profit research purposes excludes any service or part of selling a service that uses the Program. To obtain a commercial license for the Program, contact the Technology Commercialization and Licensing, Polsky Center for Entrepreneurship and Innovation, University of Chicago, 1452 East 53rd Street, 2nd floor, Chicago, IL 60615.

Created by Data Science and Public Policy, University of Chicago

The Program is copyrighted by Chicago. The Program is supplied "as is", without any accompanying services from Chicago. Chicago does not warrant that the operation of the Program will be uninterrupted or error-free. The end-user understands that the Program was developed for research purposes and is advised not to rely exclusively on the Program for any reason.

IN NO EVENT SHALL CHICAGO BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THE PROGRAM, EVEN IF CHICAGO HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. CHICAGO SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE PROGRAM PROVIDED HEREUNDER IS PROVIDED "AS IS". CHICAGO HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

Description: # timechop
Generate temporal cross-validation time windows for matrix creation


[![Build Status](https://travis-ci.org/dssg/timechop.svg?branch=master)](https://travis-ci.org/dssg/timechop)

[![codecov](https://codecov.io/gh/dssg/timechop/branch/master/graph/badge.svg)](https://codecov.io/gh/dssg/timechop)

[![codeclimate](https://codeclimate.com/github/dssg/timechop.png)](https://codeclimate.com/github/dssg/timechop)

In predictive analytics, temporal validation can be complicated. There are a variety of questions to balance: How frequently to retrain model? Should the time between rows for the same entity in the train and test matrices be different? Keeping track of how to create matrix time windows that successfully answer all of these questions is difficult.

That's why we created timechop. Timechop takes in high-level time configuration (e.g. lists of train label windows, update windows) and returns all matrix time definitions. Here's an example:

```
chopper = Timechop(
beginning_of_time = datetime.datetime(1990, 1, 1, 0, 0),
modeling_start_time = datetime.datetime(2010, 1, 1, 0, 0),
modeling_end_time = datetime.datetime(2010, 1, 16, 0, 0),
update_window = '5 days',
train_example_frequency = '1 days',
test_example_frequency = '3 days',
train_durations = ['5 days'],
test_durations = ['5 days'],
train_label_windows=['1 day'],
test_label_windows=['3 months']
)
result = chopper.chop_time()
print(result)
```

```
[
{
'beginning_of_time': datetime.datetime(1990, 1, 1, 0, 0),
'modeling_start_time': datetime.datetime(2010, 1, 1, 0, 0),
'modeling_end_time': datetime.datetime(2010, 1, 16, 0, 0),
'train_matrix': {
'matrix_start_time': datetime.datetime(2010, 1, 1, 0, 0),
'matrix_end_time': datetime.datetime(2010, 1, 6, 0, 0),
'as_of_times': [
datetime.datetime(2010, 1, 1, 0, 0),
datetime.datetime(2010, 1, 2, 0, 0),
datetime.datetime(2010, 1, 3, 0, 0),
datetime.datetime(2010, 1, 4, 0, 0),
datetime.datetime(2010, 1, 5, 0, 0)
],
'label_window': '1 day',
'example_frequency': '1 days',
'train_duration': '5 days'
},
'test_matrices': [{
'matrix_start_time': datetime.datetime(2010, 1, 6, 0, 0),
'matrix_end_time': datetime.datetime(2010, 1, 11, 0, 0),
'as_of_times': [
datetime.datetime(2010, 1, 6, 0, 0),
datetime.datetime(2010, 1, 9, 0, 0),
],
'label_window': '3 months',
'example_frequency': '3 days'
}]
},
{
'beginning_of_time': datetime.datetime(1990, 1, 1, 0, 0),
'modeling_start_time': datetime.datetime(2010, 1, 1, 0, 0),
'modeling_end_time': datetime.datetime(2010, 1, 16, 0, 0),
'train_matrix': {
'matrix_start_time': datetime.datetime(2010, 1, 6, 0, 0),
'matrix_end_time': datetime.datetime(2010, 1, 11, 0, 0),
'as_of_times': [
datetime.datetime(2010, 1, 6, 0, 0),
datetime.datetime(2010, 1, 7, 0, 0),
datetime.datetime(2010, 1, 8, 0, 0),
datetime.datetime(2010, 1, 9, 0, 0),
datetime.datetime(2010, 1, 10, 0, 0)
],
'label_window': '1 day',
'example_frequency': '1 days',
'train_duration': '5 days'
},
'test_matrices': [{
'matrix_start_time': datetime.datetime(2010, 1, 11, 0, 0),
'matrix_end_time': datetime.datetime(2010, 1, 16, 0, 0),
'as_of_times': [
datetime.datetime(2010, 1, 11, 0, 0),
datetime.datetime(2010, 1, 14, 0, 0)
],
'label_window': '3 months',
'example_frequency': '3 days'
}]
}
]
```

Timechop currently works with the following:

- `beginning_of_time` - earliest date at which features are calculated
- `modeling_start_time` - earliest date in any model
- `modeling_end_time` - all dates in any model are < this date
- `update_window` - how frequently to retrain models
- `train_example_frequency` - time between rows for same entity in train matrix
- `test_example_frequency` - time between rows for same entity in test matrix
- `train_durations` - keep creating rows in train matrix for this duration
- `test_durations` - keep creating rows in test matrix for this duration
- `train_label_windows` - how much time is included in a label in the train matrix
- `test_label_windows` - how much time is included in a label in the test matrix

The output of Timechop works as input to the [architect.Planner](https://github.com/dssg/architect/blob/master/architect/planner.py).

Keywords: timechop
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: License :: OSI Approved :: MIT License
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

timechop-0.1.3.tar.gz (5.6 kB view details)

Uploaded Source

Built Distributions

timechop-0.1.3-py3-none-any.whl (9.3 kB view details)

Uploaded Python 3

timechop-0.1.3-py2-none-any.whl (9.3 kB view details)

Uploaded Python 2

File details

Details for the file timechop-0.1.3.tar.gz.

File metadata

  • Download URL: timechop-0.1.3.tar.gz
  • Upload date:
  • Size: 5.6 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No

File hashes

Hashes for timechop-0.1.3.tar.gz
Algorithm Hash digest
SHA256 914cdf1bf61f472915b466c40e0ce46313f1c1c5627bcda8b61264287f7c3aca
MD5 e27fd56dcf313713e0878de007ef9726
BLAKE2b-256 b300359523b24fdf6f1a8fa540b8b6d7251ea0ed1cbdb13b677d5aed7a47e54e

See more details on using hashes here.

File details

Details for the file timechop-0.1.3-py3-none-any.whl.

File metadata

File hashes

Hashes for timechop-0.1.3-py3-none-any.whl
Algorithm Hash digest
SHA256 63bcf5fcf99855746bd03c8393721d33978de8f09d748efcb128ab373d71f07a
MD5 8bc05debf3f3e41ee7090c941c8b064b
BLAKE2b-256 b1c1fff23c25b45a01197774c62de523d83d3fbb66d3354e65e459ac42ac121c

See more details on using hashes here.

File details

Details for the file timechop-0.1.3-py2-none-any.whl.

File metadata

File hashes

Hashes for timechop-0.1.3-py2-none-any.whl
Algorithm Hash digest
SHA256 0040a242f4a78f447560700d0ab4347856e09c6b846d793c62ab32f74f87cb5c
MD5 bd7ff282c53dcb0a90573b4e064c3dc7
BLAKE2b-256 2480cb184f735072c33a1d8e9e61f6f089b9f212804aa41a6fbe1f6c3d08b60f

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page