Store results of modeling runs
Project description
Copyright ©2016. The University of Chicago (“Chicago”). All Rights Reserved.
Permission to use, copy, modify, and distribute this software, including all object code and source code, and any accompanying documentation (together the “Program”) for educational and not-for-profit research purposes, without fee and without a signed licensing agreement, is hereby granted, provided that the above copyright notice, this paragraph and the following three paragraphs appear in all copies, modifications, and distributions. For the avoidance of doubt, educational and not-for-profit research purposes excludes any service or part of selling a service that uses the Program. To obtain a commercial license for the Program, contact the Technology Commercialization and Licensing, Polsky Center for Entrepreneurship and Innovation, University of Chicago, 1452 East 53rd Street, 2nd floor, Chicago, IL 60615.
Created by Data Science and Public Policy, University of Chicago
The Program is copyrighted by Chicago. The Program is supplied "as is", without any accompanying services from Chicago. Chicago does not warrant that the operation of the Program will be uninterrupted or error-free. The end-user understands that the Program was developed for research purposes and is advised not to rely exclusively on the Program for any reason.
IN NO EVENT SHALL CHICAGO BE LIABLE TO ANY PARTY FOR DIRECT, INDIRECT, SPECIAL, INCIDENTAL, OR CONSEQUENTIAL DAMAGES, INCLUDING LOST PROFITS, ARISING OUT OF THE USE OF THE PROGRAM, EVEN IF CHICAGO HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH DAMAGE. CHICAGO SPECIFICALLY DISCLAIMS ANY WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE. THE PROGRAM PROVIDED HEREUNDER IS PROVIDED "AS IS". CHICAGO HAS NO OBLIGATION TO PROVIDE MAINTENANCE, SUPPORT, UPDATES, ENHANCEMENTS, OR MODIFICATIONS.
# results-schema
Store results of modeling runs in a relational database
## Quick Start
1. Install
`pip install git+https://github.com/dssg/results-schema.git`
2. Create a YAML file with your database credentials (see example_db_config.yaml), or set a 'DBURL' environment variable containing a connection string (a sketch of the environment-variable route follows this list). The database must already exist.
3. Call the 'upgrade_db' function from a Python console or script:
```
>>> from results_schema import upgrade_db
>>> upgrade_db('my_db_config.yaml')
```
This command will create a 'results' schema and the necessary tables.
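If you go the environment-variable route instead of a config file, a minimal sketch is below. It assumes `upgrade_db` falls back to 'DBURL' when called without a config file, and the connection string is a placeholder:
```
import os

# Placeholder connection string -- substitute your own credentials and database name.
os.environ['DBURL'] = 'postgresql://user:password@localhost:5432/mydatabase'

from results_schema import upgrade_db
upgrade_db()  # assumed to pick up DBURL when no config file is passed
```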
## Modifying the schema
[Alembic](http://alembic.zzzcomputing.com/en/latest/tutorial.html) is a schema migrations library written in Python. It allows us to auto-generate migrations to run incremental database schema changes, such as adding or removing a column. This is done by comparing the definition of a schema in code with that of a live database. There are many valid ways to create migrations, which you can read about in [Alembic's documentation](http://alembic.zzzcomputing.com/en/latest/tutorial.html). But here is a common workflow we will use to modify the schema.
1. Have a candidate database for comparison. You can use a toy database that you upgrade to the current master, or your project database if its results schema has not been manually modified.
2. Make the desired modifications to [results_schema.schema](results_schema/schema.py).
3. Autogenerate a migration: `alembic -c results_schema/alembic.ini -x db_config_file=my_db_config.yaml revision --autogenerate`. This compares your schema definition with the live database and generates a new file in results_schema/alembic/versions/.
4. Inspect the file generated in step 3 and make sure the changes it suggests make sense (a hedged sketch of such a file follows this list). Make any modifications you want; the autogenerate functionality is just meant as a guideline.
5. Upgrade the database: `alembic -c results_schema/alembic.ini -x db_config_file=my_db_config.yaml upgrade head`
6. Update the [factories file](results_schema/factories/__init__.py) with your changes - see more on factories below if you are unfamiliar with them.
7. If everything looks good, create a pull request!
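For orientation, here is a hedged sketch of what an autogenerated revision file in results_schema/alembic/versions/ might look like. The revision identifiers, table, and column are hypothetical; your actual file will reflect the changes you made in step 2:
```
"""add a random_seed column to results.models (hypothetical example)"""
from alembic import op
import sqlalchemy as sa

# revision identifiers, used by Alembic (placeholders here)
revision = 'abc123def456'
down_revision = '0123456789ab'


def upgrade():
    # the change autogenerate detected: a new nullable column on the models table
    op.add_column('models', sa.Column('random_seed', sa.Integer(), nullable=True), schema='results')


def downgrade():
    op.drop_column('models', 'random_seed', schema='results')
```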
## Using Factories
When you want to create rows of these results tables for a unit test, the included factories make this easier and cut down on boilerplate. Factories let you specify only the attributes that are important to your test, while reasonable defaults are chosen for all other attributes. results_schema uses [FactoryBoy](http://factoryboy.readthedocs.io/en/latest/index.html) to accomplish this.
A simple example is to just instantiate an `EvaluationFactory`. `Evaluations` depend on `Models`, which depend on both `ModelGroups` and `Experiments`. So instantiating an `EvaluationFactory` actually creates four objects in the database.
```
# 'engine' is assumed to be a SQLAlchemy engine pointed at a database that
# already has the results schema installed (e.g. via upgrade_db)
from results_schema.factories import EvaluationFactory, init_engine, session

init_engine(engine)

# creates an Evaluation plus the Model, ModelGroup, and Experiment it depends on
EvaluationFactory()
session.commit()

results = engine.execute('select model_id, metric, parameter, value from results.evaluations')
for row in results:
    print(row)
```
```
(1, 'precision@', '100_abs', Decimal('0.76'))
```
This is all well and good, but often your tests will require some more control over the relationships between the objects you create, like creating different evaluations keyed to the same model. You do this by instantiating a `ModelFactory` first and then passing that to each `EvaluationFactory`:
```
from results_schema.factories import EvaluationFactory, ModelFactory, init_engine, session

init_engine(engine)

# one model, several evaluations that reference it via model_rel
model = ModelFactory()
for metric, value in [
    ('precision@', 0.4),
    ('recall@', 0.3),
]:
    EvaluationFactory(
        model_rel=model,
        metric=metric,
        parameter='100_abs',
        value=value
    )
session.commit()

results = engine.execute('select model_id, metric, parameter, value from results.evaluations')
for row in results:
    print(row)
```
```
(1, 'precision@', '100_abs', Decimal('0.4'))
(1, 'recall@', '100_abs', Decimal('0.3'))
```
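When step 6 of the schema-modification workflow asks you to update the [factories file](results_schema/factories/__init__.py), that usually means giving any new column a sensible default. Here is a minimal sketch of what such a factory can look like, using FactoryBoy's SQLAlchemy support; the class body and default values are hypothetical, and the real definitions live in the factories file:
```
from factory.alchemy import SQLAlchemyModelFactory

from results_schema import schema
from results_schema.factories import session

class ModelFactory(SQLAlchemyModelFactory):
    class Meta:
        model = schema.Model          # the mapped class defined in results_schema/schema.py
        sqlalchemy_session = session  # shared session used by the packaged factories

    # Defaults applied whenever a test does not override them (values are illustrative).
    model_type = 'sklearn.ensemble.RandomForestClassifier'
    random_seed = 12345               # default for a hypothetical newly added column
```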
Keywords: analytics datascience modeling modelevaluation
Platform: UNKNOWN
Classifier: Development Status :: 2 - Pre-Alpha
Classifier: Intended Audience :: Developers
Classifier: Natural Language :: English
Classifier: Programming Language :: Python :: 3
Download files
Source Distribution: results_schema-2.0.0.tar.gz (9.7 kB, details below)
Built Distribution: results_schema-2.0.0-py3-none-any.whl (17.3 kB, details below)
File details
Details for the file results_schema-2.0.0.tar.gz.
File metadata
- Download URL: results_schema-2.0.0.tar.gz
- Upload date:
- Size: 9.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | d93cfe59ca67c943f40cb861e955b7a3c6ab1e8b219442d47c9485dc64bbccf3 |
| MD5 | ffc89586ded139584d2c26259d4e65b4 |
| BLAKE2b-256 | cae49bcd1898af5858eb73bc4d46808953e6a05c4f2948364e462a7071652317 |
File details
Details for the file results_schema-2.0.0-py3-none-any.whl.
File metadata
- Download URL: results_schema-2.0.0-py3-none-any.whl
- Upload date:
- Size: 17.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 06b00abea115794f95ab19c6d2c21db4c2f016a4388473457df8a04424b70af3 |
| MD5 | 9b15584c38f770095f832f8fb5e41789 |
| BLAKE2b-256 | ed9b970856bc73e4ab513b29029c16058931529a682e00afa977528d7fdf751f |