
Store/Read train and test matrices

Project description

Copyright ©2017. The University of Chicago (“Chicago”). All Rights Reserved.

Permission to use, copy, modify, and distribute this software, including all object code and source code, and any accompanying documentation (together the “Program”) for educational and not-for-profit research purposes, without fee and without a signed licensing agreement, is hereby granted, provided that the above copyright notice, this paragraph and the following three paragraphs appear in all copies, modifications, and distributions. For the avoidance of doubt, educational and not-for-profit research purposes excludes any service or part of selling a service that uses the Program. To obtain a commercial license for the Program, contact the Technology Commercialization and Licensing, Polsky Center for Entrepreneurship and Innovation, University of Chicago, 1452 East 53rd Street, 2nd floor, Chicago, IL 60615.

Created by Data Science and Public Policy, University of Chicago

The Program is copyrighted by Chicago. The Program is supplied "as is", without any accompanying services from Chicago. Chicago does not warrant that the operation of the Program will be uninterrupted or error-free. The end-user understands that the Program was developed for research purposes and is advised not to rely exclusively on the Program for any reason.

IN NO EVENT SHALL CHICAGO BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THE PROGRAM, EVEN IF CHICAGO HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. CHICAGO SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE PROGRAM PROVIDED HEREUNDER IS PROVIDED "AS IS". CHICAGO HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

# metta-data
Train Matrix and Test Matrix Storage

[![Build Status](https://travis-ci.org/dssg/metta-data.svg?branch=master)](https://travis-ci.org/dssg/metta-data)
[![codecov](https://codecov.io/gh/dssg/metta-data/branch/master/graph/badge.svg)](https://codecov.io/gh/dssg/metta-data)


## Description

Python library for storing and recalling metadata, along with the DataFrames of training and
testing sets.

## Installation
To get the latest stable version:
```
pip install metta-data
```

To get the current master branch:
```
pip install git+https://github.com/dssg/metta-data.git
```


## How-to

`metta` expects a configuration dictionary for each DataFrame, with the following keys:
- `beginning_of_time` (`datetime.date`): The earliest time that enters your covariate calculations.
- `end_time` (`datetime.date`): The last time that enters your covariate calculations.
- `label_window` (str): The length of the labeling window used in this matrix, e.g. `'1y'`, `'6m'`.
- `label_name` (str): The outcome variable's column name. This column must be the last column of your DataFrame.
- `matrix_id` (str): A human-readable ID for the dataset.

The examples below also pass `label_type`, `feature_names`, and `indices`.
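The key list above can be checked before anything is archived. The following is a minimal sketch of such a check, assuming you validate configs yourself; `check_config` and `REQUIRED_KEYS` are hypothetical names introduced here (not part of `metta`), and the expected types follow this README's examples. It also enforces the rule that the label column comes last.

```python
import datetime

# Hypothetical check (not part of metta): the keys listed above, with the
# types used in this README's examples.
REQUIRED_KEYS = {
    'beginning_of_time': datetime.date,
    'end_time': datetime.date,
    'label_window': str,
    'label_name': str,
    'matrix_id': str,
}


def _as_date(value):
    # datetime.datetime subclasses datetime.date; compare plain dates only.
    return value.date() if isinstance(value, datetime.datetime) else value


def check_config(config, columns):
    """Return a list of problems with `config` for a DataFrame whose columns are `columns`."""
    problems = []
    for key, expected_type in REQUIRED_KEYS.items():
        if key not in config:
            problems.append('missing key: {}'.format(key))
        elif not isinstance(config[key], expected_type):
            problems.append('{} should be {}'.format(key, expected_type.__name__))
    start, end = config.get('beginning_of_time'), config.get('end_time')
    if isinstance(start, datetime.date) and isinstance(end, datetime.date):
        if _as_date(start) > _as_date(end):
            problems.append('beginning_of_time is after end_time')
    # The label column must be the last column of the DataFrame.
    label = config.get('label_name')
    if not columns or columns[-1] != label:
        problems.append('label column {!r} must be last'.format(label))
    return problems


example_config = {'beginning_of_time': datetime.date(2012, 12, 20),
                  'end_time': datetime.date(2016, 12, 20),
                  'label_window': '3m',
                  'label_name': 'inspection_1yr',
                  'matrix_id': 'CDPH_2012'}
print(check_config(example_config,
                   ['break_last_3yr', 'soil', 'pressure_zone', 'inspection_1yr']))  # []
```

With a pandas DataFrame you would pass `list(df.columns)` as `columns`.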

### Storing a train and test pair
```
import datetime

import metta

# X_train and X_test are your pandas DataFrames, each with the label column last.

train_config = {'beginning_of_time': datetime.date(2012, 12, 20),
                'end_time': datetime.date(2016, 12, 20),
                'label_window': '3m',
                'label_name': 'inspection_1yr',
                'label_type': 'binary',
                'matrix_id': 'CDPH_2012',
                'feature_names': ['break_last_3yr', 'soil', 'pressure_zone'],
                'indices': ['entity_id', 'as_of_date']}

test_config = {'beginning_of_time': datetime.date(2015, 12, 20),
               'end_time': datetime.date(2016, 12, 21),
               'label_window': '3m',
               'label_name': 'inspection_1yr',
               'label_type': 'binary',
               'matrix_id': 'CDPH_2015',
               'feature_names': ['break_last_3yr', 'soil', 'pressure_zone'],
               'indices': ['entity_id', 'as_of_date']}

# Store both matrices and their metadata in ./old_matrices using the 'hd5'
# format; overwrite=False keeps files that already exist.
metta.archive_train_test(train_config,
                         X_train,
                         test_config,
                         X_test,
                         directory='./old_matrices',
                         format='hd5',
                         overwrite=False)
```
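A model fit on the training matrix is scored on the test matrix, so the two configs in a pair normally describe the same label and features. A minimal sketch of a consistency check, assuming that rule applies to your project; `mismatched_keys` and `SHARED_KEYS` are hypothetical names (not part of `metta`), and the choice of keys is an assumption based on the example above, where only the dates and `matrix_id` differ.

```python
# Hypothetical helper (not part of metta): keys that, by assumption, should
# agree between a training config and the test config it is scored on.
SHARED_KEYS = ('label_name', 'label_type', 'label_window', 'feature_names', 'indices')


def mismatched_keys(train_config, test_config):
    """Return the shared keys whose values differ; a missing key is treated as None."""
    return [key for key in SHARED_KEYS
            if train_config.get(key) != test_config.get(key)]


train_cfg = {'label_name': 'inspection_1yr',
             'label_type': 'binary',
             'label_window': '3m',
             'feature_names': ['break_last_3yr', 'soil', 'pressure_zone'],
             'indices': ['entity_id', 'as_of_date']}
test_cfg = dict(train_cfg, label_window='6m')
print(mismatched_keys(train_cfg, test_cfg))  # ['label_window']
```

A misspelled key in one config also shows up here, because `.get` on the correct spelling returns `None`.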

### Storing a train and multiple test sets
```
import datetime

from dateutil.relativedelta import relativedelta

import metta


train_config = {'beginning_of_time': datetime.date(2012, 12, 20),
                'end_time': datetime.date(2016, 12, 20),
                'label_window': '3m',
                'label_name': 'inspection_1yr',
                'label_type': 'binary',
                'matrix_id': 'CDPH_2012',
                'feature_names': ['break_last_3yr', 'soil', 'pressure_zone'],
                'indices': ['entity_id', 'as_of_date']}


base_test_config = {'beginning_of_time': datetime.date(2015, 12, 20),
                    'end_time': datetime.date(2016, 12, 21),
                    'label_window': '3m',
                    'label_name': 'inspection_1yr',
                    'label_type': 'binary',
                    'matrix_id': 'CDPH_2015',
                    'feature_names': ['break_last_3yr', 'soil', 'pressure_zone'],
                    'indices': ['entity_id', 'as_of_date']}

# Archive the training matrix once and keep its UUID.
train_uuid = metta.archive_matrix(train_config, X_train, directory='./matrices')

test_uuids = []

# Shift the test window forward one year at a time and archive each test
# matrix, linking it to the training matrix through train_uuid.
for years in range(1, 5):
    test_config = base_test_config.copy()
    test_config['beginning_of_time'] += relativedelta(years=years)
    test_config['end_time'] += relativedelta(years=years)
    test_config['matrix_id'] = 'CDPH_{}'.format(test_config['end_time'].year)
    # df_data: the test DataFrame you built for this window.
    test_uuids.append(metta.archive_matrix(
        test_config,
        df_data,
        directory='./matrices',
        overwrite=False,
        format='csv',
        train_uuid=train_uuid
    ))
```
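To preview which test windows and `matrix_id` values the loop above will produce before archiving anything, the date arithmetic can be reproduced with the standard library alone. This is a sketch, not part of `metta`: `shift_years` and `shifted_test_windows` are hypothetical helpers, and they clamp Feb 29 to Feb 28 in non-leap years, which is how `relativedelta(years=n)` behaves for that date.

```python
import datetime


def shift_years(day, years):
    """Move a date by whole years; Feb 29 becomes Feb 28 in a non-leap target year."""
    try:
        return day.replace(year=day.year + years)
    except ValueError:  # raised only for Feb 29 -> non-leap year
        return day.replace(year=day.year + years, day=28)


def shifted_test_windows(base_config, years_range, prefix='CDPH'):
    """Yield (matrix_id, beginning_of_time, end_time) for each shift in years_range."""
    for years in years_range:
        start = shift_years(base_config['beginning_of_time'], years)
        end = shift_years(base_config['end_time'], years)
        # Same naming rule as the loop above: prefix plus the end year.
        yield '{}_{}'.format(prefix, end.year), start, end


base = {'beginning_of_time': datetime.date(2015, 12, 20),
        'end_time': datetime.date(2016, 12, 21)}
for matrix_id, start, end in shifted_test_windows(base, range(1, 5)):
    print(matrix_id, start, end)
# CDPH_2017 2016-12-20 2017-12-21
# CDPH_2018 2017-12-20 2018-12-21
# CDPH_2019 2018-12-20 2019-12-21
# CDPH_2020 2019-12-20 2020-12-21
```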


### Uploading to S3
```
import yaml

import metta

# aws_keys.yaml holds the AWSAccessKey, AWSSecretKey, Bucket and Folder entries.
with open('aws_keys.yaml') as f:
    dict_config = yaml.safe_load(f)

metta.upload_to_s3(access_key_id=dict_config['AWSAccessKey'],
                   secret_access_key=dict_config['AWSSecretKey'],
                   bucket=dict_config['Bucket'],
                   folder=dict_config['Folder'],
                   directory='./old_matrices')
```
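If you prefer not to keep credentials in `aws_keys.yaml`, the keyword arguments for `upload_to_s3` can be assembled with a fallback to environment variables. A minimal sketch, assuming your credentials live in the standard AWS variables `AWS_ACCESS_KEY_ID` and `AWS_SECRET_ACCESS_KEY`; `s3_upload_kwargs` and `ENV_FALLBACKS` are hypothetical names, and only the keyword names passed to `upload_to_s3` come from the example above.

```python
import os

# Hypothetical mapping: YAML keys from the example above -> standard AWS
# environment variables used as a fallback (an assumption about your setup).
ENV_FALLBACKS = {
    'AWSAccessKey': 'AWS_ACCESS_KEY_ID',
    'AWSSecretKey': 'AWS_SECRET_ACCESS_KEY',
}


def s3_upload_kwargs(dict_config, environ=os.environ):
    """Map a loaded aws_keys.yaml dict onto upload_to_s3 keyword arguments."""
    def pick(key):
        value = dict_config.get(key)
        if not value and key in ENV_FALLBACKS:
            value = environ.get(ENV_FALLBACKS[key])
        if not value:
            raise KeyError('no value for {}'.format(key))
        return value

    return {'access_key_id': pick('AWSAccessKey'),
            'secret_access_key': pick('AWSSecretKey'),
            'bucket': pick('Bucket'),
            'folder': pick('Folder')}


demo = s3_upload_kwargs({'Bucket': 'my-bucket', 'Folder': 'matrices'},
                        environ={'AWS_ACCESS_KEY_ID': 'AKIA-EXAMPLE',
                                 'AWS_SECRET_ACCESS_KEY': 'secret-example'})
print(demo['access_key_id'])  # AKIA-EXAMPLE
```

The result would then be passed as `metta.upload_to_s3(directory='./old_matrices', **s3_upload_kwargs(dict_config))`.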



Keywords: metta
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 2
Classifier: Programming Language :: Python :: 2.6
Classifier: Programming Language :: Python :: 2.7
Classifier: Programming Language :: Python :: 3
Classifier: Programming Language :: Python :: 3.3
Classifier: Programming Language :: Python :: 3.4
Classifier: Programming Language :: Python :: 3.5



Download files


Source Distribution: metta-data-0.2.2.tar.gz (8.3 kB)

Built Distribution: metta_data-0.2.2-py2.py3-none-any.whl (10.7 kB), for Python 2 and Python 3
