A tool for programmatic and customizable generation of Raven's Progressive Matrices in the style of the RAVEN dataset
Project description
raven-gen
This repo contains a rewrite of the data-generation code originating from the CVPR paper:
RAVEN: A Dataset for Relational and Analogical Visual rEasoNing
Chi Zhang*, Feng Gao*, Baoxiong Jia, Yixin Zhu, Song-Chun Zhu
Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR), 2019
(* indicates equal contribution.)
Dramatic progress has been witnessed in basic vision tasks involving low-level perception, such as object recognition, detection, and tracking. Unfortunately, there is still an enormous performance gap between artificial vision systems and human intelligence in terms of higher-level vision problems, especially ones involving reasoning. Earlier attempts in equipping machines with high-level reasoning have hovered around Visual Question Answering (VQA), one typical task associating vision and language understanding. In this work, we propose a new dataset, built in the context of Raven's Progressive Matrices (RPM) and aimed at lifting machine intelligence by associating vision with structural, relational, and analogical reasoning in a hierarchical representation. Unlike previous works in measuring abstract reasoning using RPM, we establish a semantic link between vision and reasoning by providing structure representation. This addition enables a new type of abstract reasoning by jointly operating on the structure representation. Machine reasoning ability using modern computer vision is evaluated in this newly proposed dataset. Additionally, we also provide human performance as a reference. Finally, we show consistent improvement across all models by incorporating a simple neural module that combines visual understanding and structure reasoning.
@inproceedings{zhang2019raven,
title={RAVEN: A Dataset for Relational and Analogical Visual rEasoNing},
author={Zhang, Chi and Gao, Feng and Jia, Baoxiong and Zhu, Yixin and Zhu, Song-Chun},
booktitle={Proceedings of the IEEE Conference on Computer Vision and Pattern Recognition (CVPR)},
year={2019}
}
DISCLAIMER:
I do not offer assurances that the code in this repository functions identically to the code in the original author's repository, or that the performance of models on data generated with this code may be directly and fairly compared to the performance of models on the published RAVEN dataset of Zhang et al. (2019). Authors who utilize this code to generate data and wish to compare model performance against that of models trained with the original RAVEN dataset should offer support for the fairness of this comparison and/or replicate previous work as necessary.
Sampling and Format
Due to deficiencies in the sampling strategy implemented by the original RAVEN authors, a number of researchers observed that puzzles could be solved from the answer grid alone, and implemented improved perturbation strategies to prevent this issue. This package does not yet implement these strategies, so it does not offer convenience methods to synthesize incomplete matrices or answer grids. Saving a generated matrix instead produces completed matrices, one named NAME_answer.png
with a correct bottom right tile, and zero or more alternatives named NAME_alternative_{i}.png
with incorrect bottom right tiles. This makes raven-gen
datasets suitable for binary classification rather than 8-way classification when training models to solve RPMs in the traditional sense. If training models to answer questions about other patterns exhibited by (correct) RPMs, various kinds of annotations may be derived by inspecting the abstractions discussed below and contained in this package. Convenience functions for retrieving and saving such alternative annotations may be added in future releases.
Goals & Usage
When producing RPM-style puzzles for downstream machine learning tasks, it is important to be able to adjust both the logical and rendering-related parameters of the RPM generation process. This allows researchers to tailor the patterns they expect their models to generalize, and to adjust puzzle resolution to fit their needs. raven-gen
aims to offer simple and flexible abstractions for programmatic RPM generation. The following demonstrates basic usage of the Matrix
class to generate a correct matrix of a random type (one of MatrixType.{ONE_SHAPE,FOUR_SHAPE,FIVE_SHAPE,NINE_SHAPE,TWO_SHAPE_VERTICAL_SEP,TWO_SHAPE_HORIZONTAL_SEP,SHAPE_IN_SHAPE,FOUR_SHAPE_IN_SHAPE}
) with default parameters (480x480 resolution, white background, 3 pixel line width, 2 pixel shape border width) in the current directory:
>>> from raven_gen import Matrix, MatrixType
>>> import numpy as np
>>> rpm = Matrix.make(np.random.choice(list(MatrixType)))
>>> rpm.save(".", "test")
>>> print(rpm)
>>> print(rpm.rules)
To generate incorrect puzzles as well, one may add an rpm.make_alternatives(n_alternatives)
call before calling rpm.save
; or one may provide the n_alternatives
keyword argument to Matrix.make
. Note that calls to rpm.make_alternatives
are destructive; if repeated, they will erase the previously-generated incorrect puzzle variants.
Custom Rulesets & Attribute Setting Ranges
Puzzles exhibit all kinds of row-wise variation (any of RuleType.{CONSTANT,PROGRESSION,ARITHMETIC,DISTRIBUTE_THREE}
) across all applicable traits (AttributeType.{POSITION,NUMBER,SIZE,SHAPE,COLOR}
) except shape, for which the arithmetic rule is disallowed by default due to being unintuitive. To customize this behavior, provide a custom Ruleset
to Matrix.make
:
>>> from raven_gen import Ruleset, RuleType
>>> ruleset = Ruleset(size_rules=[RuleType.CONSTANT, RuleType.PROGRESSION], shape_rules=list(RuleType))
>>> rpm = Matrix.make(np.random.choice(list(MatrixType)), ruleset=ruleset)
The resulting RPM may utilize the constant or progression rules to determine the sizes of shapes across rows, and may utilize any rule, including the arithmetic rule, to determine the number of sides those shapes will have.
To adjust the permissible ranges of attributes, one may modify the Matrix.attribute_bounds
class variable. The contents of this dictionary are validated each time a matrix is generated, so the insertion of invalid settings should not prevent puzzle generation.
Low Resolution Puzzle Generation
A common reason for changing the valid range of an attribute setting is enforcing larger minimum shape sizes. This is necessary to produce legible puzzles at smaller resolutions. At lower resolutions, oblique angle rotations can also cause shapes to be hard to discern. The following demonstrates the application of custom settings suitable for generating legible (if not very aesthetically-pleasing) 96x96 resolution RPMs:
>>> Matrix.oblique_angle_rotations(allowed=False)
>>> ruleset = Ruleset(size_rules=[RuleType.CONSTANT])
>>> matrix_types = [
... MatrixType.ONE_SHAPE, MatrixType.FOUR_SHAPE, MatrixType.FIVE_SHAPE,
... MatrixType.TWO_SHAPE_VERTICAL_SEP, MatrixType.TWO_SHAPE_HORIZONTAL_SEP,
... MatrixType.SHAPE_IN_SHAPE,
... ]
>>> weights = [.15, .2, .2, .15, .15, .15]
>>> Matrix.attribute_bounds[MatrixType.FOUR_SHAPE][(
... ComponentType.NONE, LayoutType.GRID_FOUR)]["size_min"] = 3
>>> Matrix.attribute_bounds[MatrixType.FIVE_SHAPE][(
... ComponentType.NONE, LayoutType.GRID_FIVE)]["size_min"] = 5
>>> Matrix.attribute_bounds[MatrixType.TWO_SHAPE_VERTICAL_SEP][(
... ComponentType.LEFT, LayoutType.CENTER)]["size_min"] = 3
>>> Matrix.attribute_bounds[MatrixType.TWO_SHAPE_VERTICAL_SEP][(
... ComponentType.RIGHT, LayoutType.CENTER)]["size_min"] = 3
>>> Matrix.attribute_bounds[MatrixType.TWO_SHAPE_HORIZONTAL_SEP][(
... ComponentType.UP, LayoutType.CENTER)]["size_min"] = 3
>>> Matrix.attribute_bounds[MatrixType.TWO_SHAPE_HORIZONTAL_SEP][(
... ComponentType.DOWN, LayoutType.CENTER)]["size_min"] = 3
>>> Matrix.attribute_bounds[MatrixType.SHAPE_IN_SHAPE][(
... ComponentType.OUT, LayoutType.CENTER)]["size_min"] = 5
>>> Matrix.attribute_bounds[MatrixType.SHAPE_IN_SHAPE][(
... ComponentType.IN, LayoutType.CENTER)]["size_min"] = 5
>>> for i in trange(size):
... rpm = Matrix.make(np.random.choice(matrix_types, p=weights),
... ruleset=ruleset,
... n_alternatives=1)
... for background_color in range(28, 225, 28):
... rpm.save(".",
... f"rpm_{i}_background_{background_color}",
... background_color,
... image_size=96,
... line_thickness=1,
... shape_border_thickness=1)
The above also exhibits the usage of the line_thickness
, shape_border_thickness
, and background_color
parameters to rpm.save
. Generating data in all background colors (or in random background colors) may be helpful for balancing rewards in representation learning tasks.
Custom Attribute Values
To change the values that underlie specific attribute settings, one may redefine the "constants" defined in raven_gen.attribute
:
>>> import raven_gen
>>> raven_gen.attribute.SIZE_VALUES
(0.4, 0.5, 0.6, 0.7, 0.8, 0.9)
>>> raven_gen.attribute.SIZE_VALUES = (0.3, 0.45, 0.6, 0.75, 0.9)
>>> raven_gen.attribute.SIZE_MIN
0
>>> raven_gen.attribute.SIZE_MAX = len(raven_gen.attribute.SIZE_VALUES) - 1
The bounds applied in Matrix.attribute_bounds
will now refer (by index) to this underlying set of values when determining the sizes of shapes. Please take caution when adjusting attribute values as the defaults have been tuned for appearance under the default rendering settings. If one desires only to change the range of settings that is permitted, it is preferable to modify Matrix.attribute_bounds
. Specific assumptions governing these constants include:
raven_gen.attribute.*_VALUES
constants are assumed to represent strictly increasing or decreasing sequences. If this assumption is violated rules may have an incoherent presentation.raven_gen.attribute.NUM_VALUES
must be of the formtuple(range(1, x))
wherex >= 10
.raven_gen.attribute.SHAPE_VALUES
cannot be adjusted.raven_gen.attribute.SIZE_VALUES
must be numbers in (0, 1]; extreme values in this range will be indiscernible or too large for the panels they occupy.raven_gen.attribute.COLOR_VALUES
are integers in [0, 256).raven_gen.attribute.UNI_VALUES
are booleans; the relative occurrence ofTrue
andFalse
in this sequence indicates the probability that angles are random or uniform across rows.
Upon altering raven_gen.attribute.*_VALUES
, one must ensure that raven_gen.attribute.*_{MIN, MAX}
refer to the correct boundary indices of the new sequence.
Custom Components/Layouts
The only setting that cannot be customized by the above means (aside from the set of valid shapes) are the positions and sizes of the boxes that shapes occupy within a panel. These settings are determined by the branch of MatrixType
that is supplied to Matrix.make
. One may generate matrices that customize this behavior by inheriting from the Matrix
class, imitating the structure of the methods called by Matrix.make
, and adding top-level entries to Matrix.attribute_bounds
. However, the stdlib enums used to describe the types of matrices, components, and layouts cannot be flexibly extended, so the str
representations of rpm
and rpm.rules
will not be available. This may be ameliorated in later releases.
Serialization and Deserialization
The Matrix
class does not currently offer deserialization from human readable formats. Users interested in rerendering a matrix with different settings or otherwise interrogating the logical specification of a matrix outside of the originating program/interpreter lifetime must pickle Matrix objects. Matrix
objects do not store previous renderings, so this will not duplicate the memory of previously generated images. A human readable serialization of Matrix
objects and their rules may be accessed/persisted by calling str
and saving this output. This summary CANNOT be used to regain the original Matrix
object and this package does not offer any programmatic tools for inspecting these summaries.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
File details
Details for the file raven_gen-0.3.2.tar.gz
.
File metadata
- Download URL: raven_gen-0.3.2.tar.gz
- Upload date:
- Size: 34.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 4f4fb82022c7c5abbe00270d5238feeaa63c403752f62c1cf0fe7808cd066bb6 |
|
MD5 | 293746774eaa5120a1e2bca7cb612d86 |
|
BLAKE2b-256 | 93c80557ad538ea0c41764737798bb082d5c01a013e39f25ac04967020646b9d |
File details
Details for the file raven_gen-0.3.2-py3-none-any.whl
.
File metadata
- Download URL: raven_gen-0.3.2-py3-none-any.whl
- Upload date:
- Size: 32.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.8.10
File hashes
Algorithm | Hash digest | |
---|---|---|
SHA256 | 49bd5d0dc68ad16a579f901ae16728e2614dbbb1db3b8a0306e16c04222862a1 |
|
MD5 | 8877b36030c2d09ab7d1bbf7d491e366 |
|
BLAKE2b-256 | 10039efc2f99539f8c7f0bb2617542318b2ce07e49062751d2ffaf4d1c3339db |