
Copyright ©2017. The University of Chicago (“Chicago”). All Rights Reserved.

Permission to use, copy, modify, and distribute this software, including all object code and source code, and any accompanying documentation (together the “Program”) for educational and not-for-profit research purposes, without fee and without a signed licensing agreement, is hereby granted, provided that the above copyright notice, this paragraph and the following three paragraphs appear in all copies, modifications, and distributions. For the avoidance of doubt, educational and not-for-profit research purposes excludes any service or part of selling a service that uses the Program. To obtain a commercial license for the Program, contact the Technology Commercialization and Licensing, Polsky Center for Entrepreneurship and Innovation, University of Chicago, 1452 East 53rd Street, 2nd floor, Chicago, IL 60615.

Created by Data Science and Public Policy, University of Chicago

The Program is copyrighted by Chicago. The Program is supplied "as is", without any accompanying services from Chicago. Chicago does not warrant that the operation of the Program will be uninterrupted or error-free. The end-user understands that the Program was developed for research purposes and is advised not to rely exclusively on the Program for any reason.

IN NO EVENT SHALL CHICAGO BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THE PROGRAM, EVEN IF CHICAGO HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. CHICAGO SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE PROGRAM PROVIDED HEREUNDER IS PROVIDED "AS IS". CHICAGO HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.

# metta-data
Train Matrix and Test Matrix Storage

[![Build Status](https://travis-ci.org/dssg/metta-data.svg?branch=master)](https://travis-ci.org/dssg/metta-data)
[![codecov](https://codecov.io/gh/dssg/metta-data/branch/master/graph/badge.svg)](https://codecov.io/gh/dssg/metta-data)


## Description

Python library for storing and recalling metadata and DataFrames for training and
testing sets.

## Installation
To get the latest stable version:
```
pip install metta-data
```

To get the current master branch:
```
pip install git+git://github.com/dssg/metta-data.git
```


## How-to

`metta` expects you to hand it a dictionary for each dataframe with the following keys:
- `feature_start_time` (datetime.date): The earliest time that enters your covariate calculations.
- `end_time` (datetime.date): The last time that enters your covariate calculations.
- `label_timespan` (str): The length of the labeling window you are using in this matrix, e.g. '1y', '6m'.
- `label_name` (str): The outcome variable's column name. This column must be in the last position in your dataframe.
- `matrix_id` (str): Human-readable id for the dataset.

The examples below also set `label_type`, `feature_names`, and `indices`.

### Storing a train and test pair
```
import datetime

import metta


train_config = {'feature_start_time': datetime.date(2012, 12, 20),
                'end_time': datetime.date(2016, 12, 20),
                'label_timespan': '3m',
                'label_name': 'inspection_1yr',
                'label_type': 'binary',
                'matrix_id': 'CDPH_2012',
                'feature_names': ['break_last_3yr', 'soil', 'pressure_zone'],
                'indices': ['entity_id', 'as_of_date']}


test_config = {'feature_start_time': datetime.date(2015, 12, 20),
               'end_time': datetime.date(2016, 12, 21),
               'label_timespan': '3m',
               'label_name': 'inspection_1yr',
               'label_type': 'binary',
               'matrix_id': 'CDPH_2015',
               'feature_names': ['break_last_3yr', 'soil', 'pressure_zone'],
               'indices': ['entity_id', 'as_of_date']}


metta.archive_train_test(train_config,
                         X_train,
                         test_config,
                         X_test,
                         directory='./old_matrices',
                         format='hd5',
                         overwrite=False)
```
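The matrices themselves (`X_train`, `X_test`) are not shown above. A minimal sketch of one, with hypothetical data, under the constraints the how-to states: indexed by the `indices` columns, with the `label_name` column in the last position.

```python
import datetime

import pandas as pd

# Hypothetical training matrix: two entities on one as-of date.
# Column names match the train_config above; the values are made up.
X_train = pd.DataFrame({
    'entity_id': [1, 2],
    'as_of_date': [datetime.date(2014, 1, 1)] * 2,
    'break_last_3yr': [0, 3],
    'soil': [0.4, 0.7],
    'pressure_zone': [2, 5],
    'inspection_1yr': [0, 1],  # label column, kept in the last position
}).set_index(['entity_id', 'as_of_date'])

print(list(X_train.columns))  # label 'inspection_1yr' comes last
```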

### Storing a train and multiple test sets
```
import datetime

import metta
from dateutil.relativedelta import relativedelta


train_config = {'feature_start_time': datetime.date(2012, 12, 20),
                'end_time': datetime.date(2016, 12, 20),
                'label_timespan': '3m',
                'label_name': 'inspection_1yr',
                'label_type': 'binary',
                'matrix_id': 'CDPH_2012',
                'feature_names': ['break_last_3yr', 'soil', 'pressure_zone'],
                'indices': ['entity_id', 'as_of_date']}


base_test_config = {'feature_start_time': datetime.date(2015, 12, 20),
                    'end_time': datetime.date(2016, 12, 21),
                    'label_timespan': '3m',
                    'label_name': 'inspection_1yr',
                    'label_type': 'binary',
                    'matrix_id': 'CDPH_2015',
                    'feature_names': ['break_last_3yr', 'soil', 'pressure_zone'],
                    'indices': ['entity_id', 'as_of_date']}

train_uuid = metta.archive_matrix(train_config, X_train, directory='./matrices')

test_uuids = []

for years in range(1, 5):
    test_config = base_test_config.copy()
    test_config['feature_start_time'] += relativedelta(years=years)
    test_config['end_time'] += relativedelta(years=years)
    test_config['matrix_id'] = 'CDPH_{}'.format(test_config['end_time'].year)
    test_uuids.append(metta.archive_matrix(
        test_config,
        df_data,
        directory='./matrices',
        overwrite=False,
        format='csv',
        train_uuid=train_uuid
    ))
```
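To see what the loop produces, here is a stdlib-only sketch of just the date arithmetic, using `date.replace(year=...)` as a stand-in for `dateutil`'s `relativedelta(years=years)` (equivalent here because these dates never fall on Feb 29):

```python
import datetime

base_test_config = {'feature_start_time': datetime.date(2015, 12, 20),
                    'end_time': datetime.date(2016, 12, 21),
                    'matrix_id': 'CDPH_2015'}

matrix_ids = []
for years in range(1, 5):
    test_config = base_test_config.copy()
    # Shift both dates forward by whole years, as relativedelta would.
    for key in ('feature_start_time', 'end_time'):
        d = test_config[key]
        test_config[key] = d.replace(year=d.year + years)
    test_config['matrix_id'] = 'CDPH_{}'.format(test_config['end_time'].year)
    matrix_ids.append(test_config['matrix_id'])

print(matrix_ids)  # one yearly test config each for 2017 through 2020
```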


### Uploading to S3
```
import yaml

import metta

dict_config = yaml.safe_load(open('aws_keys.yaml'))

metta.upload_to_s3(access_key_id=dict_config['AWSAccessKey'],
                   secret_access_key=dict_config['AWSSecretKey'],
                   bucket=dict_config['Bucket'],
                   folder=dict_config['Folder'],
                   directory='./old_matrices')
```
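The `aws_keys.yaml` file is not shown in the original. A minimal sketch of its expected shape, assuming it holds exactly the four keys the snippet reads (the values below are placeholders):

```
# Hypothetical aws_keys.yaml -- key names taken from the snippet above.
AWSAccessKey: AKIAEXAMPLEKEYID
AWSSecretKey: example/secret/access/key
Bucket: my-matrix-bucket
Folder: old_matrices
```

Keep this file out of version control, since it holds credentials.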


