Skip to main content

Python's DSL to describe the search space of search problems

Project description

🔭 Python Search Space

wakatime unit and statistic testing

🤔 What's it?

A Python library that defines a DSL to describe search spaces. The DSL defines design patterns with which the user can decorate their Python classes to describe their space of interest. The resulting descriptions are declarative, expressive, context sensitive, extensible and scalable. In addition, the system has a sample generator system that is consistent and coherent with the selected description.
Principal Skills: Python Metaprogramming DSL AST Visitor Pattern

🤓 Motivation

After a study of the different systems that try to solve search problems, parametric optimization or AutoML, it was detected that there were limitations to describe in detail the search spaces. As well as the lack of a tool whose main objective is the description and generation of samples of such spaces. Except for the tools with imperative syntax, none of the tools studied can express constraints with contextual or conditional dependencies or dynamic dimension spaces. The list of tools studied include a: @autogoal, @optuna, @hyperopt, @auto-sklearn, @autogluon

⭐ Quickstart

Factory of Search Space

The Domain space factory is the main public interface to the Python Search Space. This factory is generic to the type of space you want to generate. In addition, its constructor can use the following parameters to modify the characteristics of the generated space:

  • min and max: with which the limits of a number space can be defined.
  • options: list of options to choose from, this parameter transforms any space to a categorical space.
  • distribute_like: string that refers to the random distribution with which the samples of the resulting space will be generated. Its default value is uniform distribution.
# binary space
B = Domain[bool]()
# space of real numbers, (-oo, +oo)
R = Domain[float]()
# space of natural numbers, [0, +oo)
N = Domain[int](min=0)
# space of negative integers plus zero, [-oo, 0]
_N = Domain[int](max=0)
# categorical space of the first 100 powers of 10
ten_powers = [pow(10, i) for i in range(100)]
P = Domain[int](options = ten_powers)
# space of colors
C = Domain[str](options = ['Red', 'Blue', 'Black', 'White'])

Any space generated by the Domain factory define the get_sample method generates random samples according to the respective distribution.

space = Domain[int](min=0, max=1)
value, sample_context = sample.get_sample()
# value is 1 or 0

The get_sample method, in addition to the generated sample, returns the sampling context relative to that sample. This context can be used as a parameter to the get_sample function and ensures that whenever an attempt is made to generate a sample from the same space with the same context, the same sample will be generated.

space = Domain[int](min=0, max=1)
value, sample_context = sample.get_sample()
# Let value = 1, so
value1, _ = sample.get_sample(context = sample_context)
# value1 = 1

Tensorial Spaces

With the Domain factory we can also create tensor spaces. To describe each of the dimensions of the tensors that belong to the resulting space, the DSL allows to use ints or random spaces previously defined.

# space of integer lists with size 10
L10 = Domain[int][10]()
# space of all integer lists
random_len = Domain[int]()
Ln = Domain[int][random_len]()
# Space of all square matrices
Mnxn = Domain[float][random_len][random_len]()
# Space of all lists of tuple with size 2
Mnx2 = Domain[float][random_len][2]()

In the case of tensor spaces, the parameters of the constructor of the Domain factory are applied to each of the internal spaces of the domain.

Constraint Function

The DSL defines a syntax to be able to describe the details of the search spaces we are defining. This syntax is inspired by the definition of mathematical domains, where a set is described from the particular characteristics of any element of the set. To describe these characteristics we use the lambda function syntax of Python. And to use these restriction functions we must know the following characteristics.

  • The constraint functions can define as many parameters as we need, but the first one always refers to any element of the space we are describing. The rest of the variables, if they do not have a default value declared, will represent dimensions of the space we are describing.

    # x is any element of A space,
    # So A = {x in Z, x != 10}
    A = Domain[int] | (lambda x: x != 10)
    # B is the space of all strictly increasing lists with size 10
    B = Domain[int][10] | (lambda x, i: x[i] > x[i+1])
    # C is the space of all symmetric square matrices with size 10
    C = Domain[int][10][10] | (lambda x, i, j: x[i][j] == x[j][i])
    
  • The constraint functions support all Python arithmetic and comparison operations but some settings are not valid. Check these tests for some of the valid and invalid combinations

    valid_space = Domain[int] | (lambda x : x / 10 < 0) # OK
    invalid_space = Domain[int] | (lambda x : 10 / x < 0) # raise UnSupportOpError
    
  • If we want to describe more than one constraint within the same function we must write them all between parentheses and separated by commas.

    # So A = {x in Z, x, in (0, 20) x != 10}
    A = Domain[int] | (lambda x: (x != 10, x > 0, x < 20)),
    
  • The &(AND) and |(OR) operators have a cutting behavior that allows you to define conditional constraints. If either of these operators can know its result only by evaluating the most left expression then the expression on the right will never be evaluated. In cases where the operands would be constraints then the & operator works like commas and the | operator defines a segmented domain.

    A = Domain[int] | (lambda x: False & (x == 0)) # A = Z (integer domain)
    A = Domain[int] | (lambda x: True & (x == 0)) # A = { 10 }
    B = Domain[int] | (lambda x: True | (x == 0)) # B = Z (integer domain)
    # C is a segmented domain, C = [(-oo, -1), (101, +oo)]
    # This means that in order to generate samples,
    # you must first choose between the following segments
    C = Domain[int] | (lambda x: (x < 0) | (x > 100))
    

Custom Spaces

Among the main objectives of the DSL is to define a design pattern for writing a search space using Python's classes. To define a custom space we only have to assign to each of the parameters of the functions of the classes a search space as a default value.

class Line:
  """
    We want to find the line that best interpolates our data.
    And for some reason, we know that the slope is an integer
    between 50 and 100, but it is not 65. We also know that
    the intercept of the line is less than 50
  """

  def __init__(
      self,
      m: int = Domain[int](min=50, max=100) | (lambda x: x != 65),
      n: float = Domain[float]() | (lambda x: x < 50)
  ) -> None:
      self.m, self.n = m, n

  def get_point(self, x: float = Domain[float](min=0, max=10))
    return x, self.m * x + self.n

  def contains(self, x, y):
    return y == self.m * x + self.n

If we use these decorated classes as a generic type of the Domain factory we can create random spaces where the samples are instances of these classes.

space = Domain[Line]()
line_instance, _ = space.get_sample()
type(line_instance) # Line

#If a class method was decorated with the DSL design pattern
#then its parameters become optional.
#If no values are specified for those parameters
#then random ones are generated.
random_point = line_instance.get_point()
specific_point = line_instance.get_point(1)

line_instance.contains(0, 0) # OK
line_instance.contains() # Error

Contextual Dependencies

To describe contextual dependencies, a reference to a previously defined search space is needed. This reference will be used as the default value of one of the parameters of the restriction function of the dependent space. In the same way that we describe the characteristics of a set by means of the particular characteristics of any element of it, we will describe the relation between two sets by means of the particular relation of any two elements of them.

class CenterPoint:
  """
  We want to find the pointer more centered and with more
  density around it. For some reason, we know that our data
  looks like a heavy diagonal 20u thick.
  All points in the dataset lie between the lines
  y = x + 10 and y = x - 10.
  """

  Y_Domain = Domain[int]() # previous definition

  def __init__(
      self,
      x: float = Domain[float]() | (
          #Let y be any element of the domain Y_Domain
          lambda x, y=Y_Domain: (x - 10 < y, y < x + 10)
      ),
      y: int = Y_Domain # use the previous definition
  ) -> None:
      self.x, self.y = x, y

Examples of Use Cases

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

python-search-space-1.0.1.tar.gz (7.8 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

python_search_space-1.0.1-py3-none-any.whl (7.3 kB view details)

Uploaded Python 3

File details

Details for the file python-search-space-1.0.1.tar.gz.

File metadata

  • Download URL: python-search-space-1.0.1.tar.gz
  • Upload date:
  • Size: 7.8 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.9.15

File hashes

Hashes for python-search-space-1.0.1.tar.gz
Algorithm Hash digest
SHA256 afc76f542790063a808899258b19887ecba5ab61f64b3893abf1e7e7f017066e
MD5 b689f98aa49f36fe78194a2f0d90035a
BLAKE2b-256 d63c5c786d5962bdc13d7ee681a38aa7ad147df4e71cdd0929a8212be3e2a5a0

See more details on using hashes here.

File details

Details for the file python_search_space-1.0.1-py3-none-any.whl.

File metadata

File hashes

Hashes for python_search_space-1.0.1-py3-none-any.whl
Algorithm Hash digest
SHA256 67e84ecf2ea5f7aa2147a813a7a31c66feddba4b98a068e5be17f364548ae229
MD5 b47643e5721be27c9124c16ae3b43a72
BLAKE2b-256 0dde7d72adb6479083cb17d93081b40b730aa3db61d93694dc70beb628583b83

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page