Verify the structure of YAML-formatted configuration files.
The yamlfig package provides developers with a framework for defining rules that test and verify a config file's structure. Those rules are captured in a parser object which can be applied to YAML-based config files to validate them.
In particular, this module enables a developer to:
- define which fields are required, optional, or will be assigned default values if omitted;
- declare types for those fields (e.g., str, int, date, dict, list, etc.);
- run arbitrary functions to test the values in each field (e.g., regular-expression matches or file-path-existence checks).
After a config file is parsed, validated, and accepted, the returned object can be used to access the field values with some confidence that they exist, are of the expected type, and have whatever other properties the rules established. If a config file is rejected, an error explaining the violated rule is raised.
This package was inspired by the similar capability that argparse
brought to command-line argument parsing.
Getting Started
Installation
pip install yamlfig
Website and Repository
The Python package is hosted on PyPI:
The source code, documentation, and issue tracker are hosted on GitHub:
Example Usage
As an example for when a developer might use yamlfig, consider developing a server that binds to an address and port. When any of a list of authorized users connects, it displays the server's name and the contents of a file. The following YAML file could act as the config file for such a server:
$ cat > quickstart_server.yaml
name: Simple Single-File Server
server:
  port: 81
file_path: 'quickstart_shared_file.txt'
users:
- alice
- bob
- carol
The following script uses the yamlfig package to construct a parser for this example server. It instantiates a parser object and adds a set of rules that establish which fields and structures should be in a server's config file. Those rules establish what form those fields must take and what to do if they are missing. It then invokes this parser on a config file passed as a command line argument. Where an actual server script would then use those values to spin up a server, this script just demonstrates that the values can be accessed from the parsed object by printing them:
$ cat > quickstart_server.py
from __future__ import print_function
import sys
from yamlfig import YamlConfigParser, test
# Construct a parser for the server config file
confp = YamlConfigParser()
confp.add_rule('name', path_type=str)
confp.add_rule('description', optional=True)
confp.add_rule('server.addr', path_type=str, default='127.0.0.1', test=test.is_ipv4_address)
confp.add_rule('server.port', path_type=int, test=test.is_interval(1, 65535, include_upper=True))
confp.add_rule('file_path', path_type=str, test=test.is_file_path('exists', 'isfile'))
confp.add_rule('users', path_type=list)
confp.add_rule('users.*', test=test.is_regex('^[a-z][a-z0-9]*$'))
# Parse the config file
conf = confp.parse_file(sys.argv[1])
# Retrieve values from the conf object
print('conf.name = {0}'.format(repr(conf.name)))
print('conf.description = {0}'.format(repr(conf.description)))
print('conf.server.addr = {0}'.format(repr(conf.server.addr)))
print('conf.server.port = {0}'.format(repr(conf.server.port)))
print('conf.file_path = {0}'.format(repr(conf.file_path)))
for idx in conf.users:
    print('conf.users[{0}] = {1}'.format(idx, repr(conf.users[idx])))
When we execute this server script on the above config file, it prints the following values:
$ python quickstart_server.py quickstart_server.yaml
conf.name = 'Simple Single-File Server'
conf.description = None
conf.server.addr = '127.0.0.1'
conf.server.port = 81
conf.file_path = 'quickstart_shared_file.txt'
conf.users[0] = 'alice'
conf.users[1] = 'bob'
conf.users[2] = 'carol'
Note how the fields and values printed came not only from the config file but also from the rules. Fields marked as optional or taking a default are present even though they were not in the config file. Also note how the values have the type and form required by the rules (e.g., an integer within a given interval, a string in IPv4 dotted-quad notation, and a list of usernames that all match a regular expression).
Walk-through
In this example, the config file specified a name to display, a
port to listen on within a server block, a file_path to a
file to share, and the list of accounts of authorized users. The
rules added to the confp parser ensure that those fields exist; they
also define rules for some paths that are not in the config file and
are configured as optional or as taking a default value:
- a description field, if provided, would be displayed to the user, but it is optional;
- an addr field within the server block that, if provided, would set the binding IP address, but that takes a default of 127.0.0.1 if omitted.
Additionally, the confp parser verifies that the values present in
the config file are suitable for our intended use. Some of the values
are type-checked or otherwise validated:
- the addr field, if provided, will be tested to confirm that it has the format of an IPv4 address (i.e., a string in dotted-quad notation);
- the port field will have its type checked to ensure it is an int, and its value will be tested to confirm that it is in the range 1 to 65535;
- the file_path will be interpreted as a path to a filesystem object, and that object will be tested to confirm it exists and is a file (rather than a directory);
- all the account names in the users list will be tested against a regular expression to confirm they match the format of accounts on this particular system (i.e., they start with a lowercase letter followed by zero or more digits or lowercase letters).
Once the confp parser is constructed and configured, its
parse_file method is called on the config-file path given on the
command line, and it returns a parsed conf object. To demonstrate
that the conf object contains all the fields and values from the
config file merged with the optional fields and defaults from the
parser rules, it prints those fields and values.
The values in the conf object returned by parse_file have also
been type-checked and tested. Had the fields and values in the config
file not conformed to the rules of the confp parser, a ParseError
exception would have been raised. Some examples:
- If the name field were omitted:
  ParseError: quickstart_server.yaml: "name" is missing
- If the server block contained a field called the_ip_address that did not match any of the parser's rules:
  ParseError: quickstart_server.yaml: "server.the_ip_address" unexpected by parser
- If the port field of the server block contained the string "eighty-one" rather than the integer 81:
  ParseError: quickstart_server.yaml: "server.port" has type str not type int
- If the addr field were present in the server block and had the value 452.34.256.193:
  ParseError: quickstart_server.yaml: "server.addr" failed test: 1st octet of "452.34.256.193" exceeds 255
- If the file_path field had been the path to an existing directory named some_directory instead of the path to an existing file:
  ParseError: quickstart_server.yaml: "file_path" failed test: "some_directory" is not a file
- If the 3rd value of the users list had been the display name Carol C. instead of the username carol (and noting zero-based indexing):
  ParseError: quickstart_server.yaml: "users.2" failed test: "Carol C." does not match /^[a-z][a-z0-9]*$/
The presence of such errors in the config file would have stopped
execution and provided a relatively informative explanation of which
rule failed and why. Because none of these errors were raised, a
developer has some assurance that the structure and values in the
conf object meet their expectations.
Just as important as what happened in the example above is what
didn't happen. When the parse_file method returned the conf
object, it didn't raise a ParseError exception. Since it executed
successfully, we know that all the parser's assertions hold about which
fields must exist and what formats they take; the remaining code does
not need to perform such checks and error handling itself.
Details
What yamlfig provides beyond a standard YAML parser is validation, specifically verification that a config file conforms to the various rules established for it. In this section, we introduce and describe these rules, and the various constraints that can be placed on a config file's structure and values.
Basic Usage
The typical steps when using yamlfig are:
1. instantiate a YamlConfigParser object, which we usually call confp,
2. configure it by using add_rule to add rules for each field we intend to control through a config file,
3. invoke parse_file on a config file, which either raises a ParseError or returns a YamlConfig object, usually called conf, and
4. use that YamlConfig in subsequent code, confident that its structure and values have already been validated.
The following script illustrates this typical pattern by using yamlfig. For the sake of the example, let's say we need a config file to drive how often a loop is run, which of two functions is called by the loop, and what parameter is passed to that function:
$ cat basic_usage.py
import sys
from yamlfig import YamlConfigParser
# 1. Instantiate a YamlConfigParser object (confp)
confp = YamlConfigParser()
# 2. Configure the parser by adding rules for each field
confp.add_rule('loop_count')
confp.add_rule('do_special_function')
confp.add_rule('function_parameter')
# 3. Invoke the parser on a config file (provided as an argument)
conf = confp.parse_file(sys.argv[1])
# 4. Use the YamlConfig object in subsequent code
for loop_index in range(conf.loop_count):
    if conf.do_special_function:
        special_function(conf.function_parameter)
    else:
        regular_function(conf.function_parameter)
One config file would cause the script to produce one behavior:
$ cat basic_config_1.yaml
loop_count: 7
do_special_function: yes
function_parameter: "a meerkat"
Running the script on basic_config_1.yaml would cause
special_function to be invoked 7 times, each with the parameter "a meerkat".
Another config file would cause the script to produce a different behavior:
$ cat basic_config_2.yaml
loop_count: 3
do_special_function: no
function_parameter: a pony
Running the script on basic_config_2.yaml would cause
regular_function to be invoked 3 times, each with the parameter "a pony".
Just as important is understanding the behavior of the script on a bad config file. The following config file is missing one of the three required fields:
$ cat basic_config_bad.yaml
loop_count: 3
function_parameter: 'a unicorn'
Running the script on basic_config_bad.yaml exits unsuccessfully and
prints an exception:
Traceback (most recent call last):
File "basic_usage.py", line 22, in <module>
conf = confp.parse_file(sys.argv[1])
[...]
yamlfig.base.ParseError: basic_config_bad.yaml: "do_special_function" is missing
The error is raised within the confp.parse_file method. All
verification and validation occurs as part of that call in
step 3 of the pattern, so if it returns successfully, the YamlConfig
object conforms to the parser rules.
Fields, Paths, and Structure
The fundamental thing that yamlfig rules do is establish which fields should be in a config file and which fields should not.
Test that a field exists
confp.add_rule('dirname')
The first argument to add_rule is the rule_path. Every rule added
to a confp object must have one, and—unless additional
modifiers make the field optional or take a default value—it is
an existence requirement for the field. Given the rule above, any
config file must contain a line such as:
dirname: /var/share/SomeApp/SharedDir
A config file without a dirname field would generate a parse error.
Test that a path exists
Part of YAML's descriptive power comes from its ability to encode nested structures, like maps and lists, and yamlfig rules can describe constraints on that structure:
confp.add_rule('server.storage.dirname')
Rules use the '.' character
to delimit fields within a nested structure. The rule above expects
there to be a server block, within which is contained a storage
block, within which there is a dirname field. The following config
would satisfy such a rule:
server:
  storage:
    dirname: /var/share/SomeApp/SharedDir
Such a rule path implicitly includes existence requirements for
server and server.storage. The existence of those paths would not
need to be explicitly required through separate rules, unless we
wanted to modify them (e.g., by making them optional or take defaults,
as described in a later section).
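The implicit existence requirements can be pictured with a small stand-alone sketch. It is not yamlfig's implementation: the helper names implied_paths and path_exists are hypothetical and only illustrate how a single dotted rule path implies each of its parent paths, using a plain dict in place of a parsed config file.

```python
def implied_paths(rule_path):
    """Return every path whose existence a dotted rule path implies."""
    fields = rule_path.split('.')
    # 'a.b.c' implies 'a', then 'a.b', then 'a.b.c' itself
    return ['.'.join(fields[:n]) for n in range(1, len(fields) + 1)]


def path_exists(data, path):
    """Check that a dotted path can be followed through nested dicts."""
    node = data
    for field in path.split('.'):
        if not isinstance(node, dict) or field not in node:
            return False
        node = node[field]
    return True


# Plain dict standing in for the parsed YAML shown above
config = {'server': {'storage': {'dirname': '/var/share/SomeApp/SharedDir'}}}
required = implied_paths('server.storage.dirname')
print(required)
print(all(path_exists(config, path) for path in required))
```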
Test that a block has a specific substructure
In this example, we need a config file to describe how a server's local storage cache is configured (i.e., where it is on the filesystem, how big it can grow, and what permissions the cache files have).
confp.add_rule('server.storage.dirname')
confp.add_rule('server.storage.maxsize')
confp.add_rule('server.storage.umask')
In combination, these three rules describe the structure that the
server.storage block must have (i.e., three fields with the names
dirname, maxsize, and umask).
The following config file would be accepted by this parser:
server:
  storage:
    dirname: /var/share/SomeApp/SharedDir
    maxsize: 10GB
    umask: 0644
Test that a field or path does not exist
Any field or path for which there is no matching rule will raise a parse error. In a sense, the yamlfig field-existence validation is deny-by-default. We do not need to do anything specific to assert that a field does not exist; just don't add an existence requirement.
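As a rough illustration of the deny-by-default idea, the following stand-alone sketch lists every path in a nested dict that no rule path covers. The function names all_paths and unexpected_paths are made up for this example, wildcards are ignored, and the logic is an assumption about the concept rather than a description of how yamlfig is written.

```python
def all_paths(data, prefix=''):
    """Yield the dotted path of every field in a nested dict."""
    for field, value in data.items():
        path = prefix + field
        yield path
        if isinstance(value, dict):
            for sub_path in all_paths(value, path + '.'):
                yield sub_path


def unexpected_paths(data, rule_paths):
    """Return the paths in data that are not covered by any rule path."""
    allowed = set()
    for rule in rule_paths:
        fields = rule.split('.')
        # a rule also allows each of its parent blocks
        for n in range(1, len(fields) + 1):
            allowed.add('.'.join(fields[:n]))
    return [path for path in all_paths(data) if path not in allowed]


rules = ['server.storage.dirname', 'server.storage.maxsize']
config = {'server': {'storage': {'dirname': '/tmp', 'the_size': '10GB'}}}
print(unexpected_paths(config, rules))
```

Here the_size is reported because nothing in the rule list mentions it, while maxsize would be reported as missing by a separate existence check.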
Test that a block contains fields without specifying which fields by using wildcards
A rule path can contain wildcards. For the sake of this example, we
need a config file to specify upload paths for each of one or more
users. A server.upload_paths block will map from username to their
corresponding upload directory, as in the following example:
server:
  upload_paths:
    alice: /home/alice/uploads
    bob: /home/bob/public
Since we do not want to hardcode the usernames in the parser, we can use a wildcard rule to accept one or more fields within a block without specifying the specific field names:
confp.add_rule('server.upload_paths.*')
Such a rule asserts that the server.upload_paths block contains
non-empty substructure (i.e., it is a block), but not the specific
field names within the substructure. In the above example config
file, the wildcard would match both alice and bob, even though
neither is an explicitly listed field path.
A new user could be added with their own upload path, and the same parser would accept the config file:
server:
  upload_paths:
    alice: /home/alice/uploads
    bob: /home/bob/public
    carol: /home/carol/tmp
Note that a wildcard rule must match one or more fields, not zero.
If the server.upload_paths block were empty, the config file would
raise an error: "server.upload_paths" must contain at least one field. A later
segment
describes how the optional and default flags can be used with
wildcards to implement a zero-or-more match.
Also note that partial wildcard matches are not currently supported.
A path like server.upload_paths.user-* intending to accept fields
like user-alice and user-bob would instead raise an error. As
described in a different later
segment, one way to implement such a
check would be to write a function that tests every field within a
block against a regular expression, and then specify that function as
a test function for the server.upload_paths block.
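A sketch of such a function might look like the following. The calling convention is an assumption made for illustration: the test is taken to receive the parsed config, the rule path, and the block's value, and to signal failure by returning a message (None meaning success). The factory name fields_match is hypothetical, and the actual convention should be confirmed in the segment on test functions before relying on this.

```python
import re


def fields_match(pattern):
    """Build a test that checks every field name in a block against pattern."""
    regex = re.compile(pattern)

    def test_fields(conf, path, value):
        # Assumed convention: (conf, path, value) in, error message or None out
        for field in value:
            if not regex.match(field):
                return '"{0}" has field "{1}" not matching /{2}/'.format(
                    path, field, pattern)
        return None

    return test_fields


check = fields_match(r'^user-[a-z]+$')
good = {'user-alice': '/home/alice/uploads', 'user-bob': '/home/bob/public'}
bad = {'user-alice': '/home/alice/uploads', 'carol': '/home/carol/tmp'}
print(check(None, 'server.upload_paths', good))
print(check(None, 'server.upload_paths', bad))
```

Under that assumption, the function would be passed through the test parameter of add_rule on the server.upload_paths rule, alongside a plain wildcard rule for server.upload_paths.* that accepts the fields themselves.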
Wildcard fields can have substructure and rules can enforce matching substructure
For the sake of this example, a server hosts one or more projects,
each of which has a directory of static web pages associated with it,
and a backend database. Our configuration file maps from one or more
arbitrary project names (e.g., ProjectX and meerkat_works) to
blocks that contain precisely three data points: (1) a path to a
directory of webpages, (2) a path or URL to a database, and (3) the
type of the database (e.g., sqlite, mysql, or mongodb).
The following config file gives an example of this structure:
projects:
  ProjectX:
    webpath: /home/alice/projx/html
    dbpath: /home/alice/projx/project.db
    dbtype: sqlite
  meerkat_works:
    webpath: /home/bob/public/meerkat/www
    dbpath: mongodb://192.168.1.200:27017
    dbtype: mongodb
With yamlfig, we can specify wildcards on paths while still requiring that any fields matching the wildcard have a required, fixed-field substructure. The following parser will accept one or more blocks, each corresponding to a project name, but every one must have the three required fields:
confp.add_rule('projects.*.webpath')
confp.add_rule('projects.*.dbpath')
confp.add_rule('projects.*.dbtype')
These rules implicitly assert that the projects block exists and
contains one or more sub-blocks, with no restriction on their field
names. The rules explicitly assert that each one of those sub-blocks
must contain exactly three fields: webpath, dbpath, and dbtype.
This parser would accept the example config file above.
If a project sub-block were missing one of the three required fields or had an extra field, an error would have been raised.
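The way a wildcard lines up with concrete field paths can be sketched in a few lines. This is an illustrative model only, with a hypothetical helper named rule_matches; it assumes that each '*' stands for exactly one field name at its level of nesting, as the examples in this section suggest.

```python
def rule_matches(rule_path, field_path):
    """Return True if a rule path with '*' wildcards matches a field path."""
    rule_fields = rule_path.split('.')
    path_fields = field_path.split('.')
    if len(rule_fields) != len(path_fields):
        return False
    # '*' stands for exactly one field name at that level of nesting
    return all(r == '*' or r == p for r, p in zip(rule_fields, path_fields))


rule = 'projects.*.webpath'
print(rule_matches(rule, 'projects.ProjectX.webpath'))
print(rule_matches(rule, 'projects.meerkat_works.webpath'))
print(rule_matches(rule, 'projects.ProjectX.owner'))
print(rule_matches(rule, 'projects.webpath'))
```

The first two paths match, the third fails on its last field, and the fourth fails because it has one level fewer than the rule.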
Wildcards are also useful for accepting lists of values
YAML's nested structure supports not only the mappings described above
but also lists. Technically, YAML offers several other
nesting types (e.g., omap, pairs, etc.), but our Python parser
represents them all as either dict or list objects, with mappings
represented as dict objects and lists as list objects. To most easily
and succinctly accommodate both mappings and lists, yamlfig
effectively treats lists as a very specific kind of mapping, where
each field is a list index and each value is the item in the list.
Consider a config file where a users block contains a list of one or
more authorized users:
users:
- alice
- bob
- carol
The following rule would accept such a users block:
confp.add_rule('users.*')
Note that the wildcard rule by itself just ensures that there are subfields, not that they take the form of a list as opposed to a mapping. That same parser would accept a config file with a mapping:
users:
  alice: Alice A.
  bob: Bob B.
  carol: Carol C.
These two config files—the list version and the mapping
version—have very different structures, and a program would
likely be expecting one and not the other. To ensure that a rule with
a wildcard matches only a list (and not a map) or only a map (and not
a list), we would need to use type checking, as described in a later
segment, to assert that
the type of the block is either list or dict respectively.
Also note that when accessing list values parsed into a conf object,
we need to be aware of some differences between their behavior and that of
a standard Python list, as described in the Handling Parsed
Objects section.
Test that a list has exactly n elements
While not a typical occurrence, parser rules can be configured to
ensure that a list has a specific number of elements. The following
rules would accept a list of length 2 by explicitly requiring fields
named 0 and 1:
confp.add_rule('network.route.0')
confp.add_rule('network.route.1')
For the sake of the example, perhaps the application must have two network routes, a primary and a secondary.
As noted above, yamlfig treats lists as mappings from numeric fields to values, so the parser would accept the following config:
network:
  route:
  - 192.0.2.1
  - 198.51.100.1
As a side note, the same two rules would accept a config in which the
route block contained a mapping from numeric string fields (i.e.,
"0" and "1") to the two IPv4 addresses. As we keep stressing, the
rules simply treat lists as mappings from numeric fields to the list
elements. To differentiate a list from a mapping, we would need to
use type checking, as described in a later
segment.
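The list-as-mapping view can be made concrete with a short stand-alone sketch. The helper as_fields is hypothetical and is not part of yamlfig; it only shows why a list of two routes and a mapping keyed by "0" and "1" look identical to rules that see nothing but field names, and why a type check is needed to tell them apart.

```python
def as_fields(block):
    """View a block as a mapping from field names to values."""
    if isinstance(block, list):
        # a list becomes a mapping from string indexes to its items
        return {str(idx): item for idx, item in enumerate(block)}
    return {str(field): value for field, value in block.items()}


route_list = ['192.0.2.1', '198.51.100.1']
route_map = {'0': '192.0.2.1', '1': '198.51.100.1'}

# Both blocks expose the same fields, so rules on route.0 and route.1
# cannot distinguish them
print(as_fields(route_list) == as_fields(route_map))

# Only the type differs
print(isinstance(route_list, list), isinstance(route_map, list))
```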
Handling Parsed Objects
As described in the Basic Usage section, to parse a
config file, a parser's parse_file method would be called with the
name of the file:
conf = confp.parse_file(conffile)
Assuming the parsing and validation succeeds, the conf object would
have type YamlConfig or YamlConfigList, depending on whether the
root-level YAML object in the config file is a mapping or a list.
Typically, a YAML-formatted config file will have a mapping as its
root-level structure, and so we will consider that common case first.
Throughout this section, assume that we have successfully parsed the
following config file into a YamlConfig object named conf:
dirname: /var/share/SomeApp/SharedDir
server:
  projects:
    ProjectX:
      webpath: /home/alice/projx/html
      dbpath: /home/alice/projx/project.db
      dbtype: sqlite
    meerkat_works:
      webpath: /home/bob/public/meerkat/www
      dbpath: mongodb://192.168.1.200:27017
      dbtype: mongodb
users:
- alice
- bob
- carol
- dave
While the focus of this section is on accessing the conf object
after confp successfully parses and validates the config file, for
the sake of completeness, the following rules would configure a
parser that accepts this file:
confp = YamlConfigParser()
confp.add_rule('dirname')
confp.add_rule('server.projects.*.webpath')
confp.add_rule('server.projects.*.dbpath')
confp.add_rule('server.projects.*.dbtype')
confp.add_rule('users.*')
Fields and paths can be accessed as attributes
Fields in a config file can be accessed as attributes of the
YamlConfig object.
conf.dirname # '/var/share/SomeApp/SharedDir'
If an attribute corresponds to a block in a config file, it will
return that block as a YamlConfig or YamlConfigList object.
conf.server # <YamlConfig object at 0x[...]>
conf.users # <YamlConfigList object at 0x[...]>
As such, attributes can be strung together in a sequence:
conf.server.projects.ProjectX.webpath # '/home/alice/projx/html'
conf.server.projects.ProjectX.dbpath # '/home/alice/projx/project.db'
conf.server.projects.ProjectX.dbtype # 'sqlite'
conf.server.projects.meerkat_works.dbtype # 'mongodb'
Note that to be accessed as an attribute, a field must be a valid Python attribute (e.g., must be a string, cannot start with a number, etc.).
Fields and paths can be accessed via index lookups
Values stored in a YamlConfig object can also be accessed via
index lookup.
conf.server.projects['ProjectX'].dbtype # 'sqlite'
proj = 'ProjectX'
conf.server.projects[proj].dbtype # 'sqlite'
conf['server']['projects']['ProjectX']['dbtype'] # 'sqlite'
import functools
path = ['server', 'projects', 'ProjectX', 'dbtype']
functools.reduce(lambda d, idx: d[idx], path, conf) # 'sqlite'
List values can be accessed via index lookups
Index lookups must be used to access the elements of a YamlConfigList
since attributes cannot be numbers.
conf.users[0] # 'alice'
conf.users[1] # 'bob'
conf.users[2] # 'carol'
conf.users[3] # 'dave'
conf.users[-1] # 'dave'
conf.users[-2] # 'carol'
In a departure from standard Python lists, a YamlConfigList object
will translate to or from a string representation of an index as
needed.
conf.users["1"] # 'bob'
conf.users['-2'] # 'carol'
Once again, this is to allow—as much as possible—lists to be treated like mappings from the list indexes to the list elements.
Length checks can be used to determine the number of fields
As with dict and list objects, we can see how many elements are
in a YamlConfig or YamlConfigList object by querying its
length.
len(conf) # 3
len(conf.server) # 1
len(conf.server.projects) # 2
len(conf.server.projects.ProjectX) # 3
len(conf.server.projects.meerkat_works) # 3
len(conf.users) # 4
Iterators return field names for YamlConfig objects
Iterating on a YamlConfig object will return the field names
contained within the block, like what we would get from iterating on
a dict object:
list(conf) # ['dirname', 'server', 'users']
list(conf.server) # ['projects']
list(conf.server.projects) # ['ProjectX', 'meerkat_works']
list(conf.server.projects.ProjectX) # ['webpath', 'dbpath', 'dbtype']
list(conf.server.projects.meerkat_works) # ['webpath', 'dbpath', 'dbtype']
Note that the order in which YamlConfig fields are returned is the
order in which the rules were added to the parser, not the order in
which the fields appear in the config file. When a single parser rule
matches multiple fields (i.e., a wildcard rule), the fields are returned in
arbitrary order. Note, however, that as of roughly Python 3.6 and 3.7,
they are returned in the order they appear in the
config file, likely because dict objects began to return keys in
the order they were inserted.
Iterators return list indexes not values for YamlConfigList objects
Iterating on a YamlConfigList object is significantly different
from iterating on a Python list. In particular, it will return the
list of index values as strings, not the actual list values:
list(conf.users) # ['0','1','2','3']
As noted previously, a YamlConfigList treats lists less like lists
per-se and more like mappings from zero-based, sequential, numeric
indexes to values. As such, its iterator returns field names that can
be used as indexes to look up values, not the values themselves.
This behavior is likely unexpected at first and arguably
controversial, but was chosen for greater overall simplicity. A lot
of code can iterate over fields, descend into blocks, and so on much
more simply, when it does not need to treat YamlConfigList objects
as a special case, separate from YamlConfig objects.
To get the values rather than the indexes, we recommend list comprehension:
[conf.users[idx] for idx in conf.users] # ['alice', 'bob', 'carol', 'dave']
Unlike the fields of a YamlConfig, indexes of a YamlConfigList
will be returned in a specific order: sequential and increasing from
a base of zero.
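To show why this choice simplifies generic code, here is a toy sketch. The ListBlock class is a hypothetical stand-in that mimics only the behaviors described above (string or integer indexes, iteration over string indexes); it is not the real YamlConfigList, and a plain dict stands in for YamlConfig. With those two pieces, one recursive function can walk the whole structure without a special case for lists.

```python
class ListBlock(object):
    """Toy stand-in mimicking the list behavior described in this section."""

    def __init__(self, items):
        self._items = list(items)

    def __getitem__(self, idx):
        # accept 0, -1, '0', or '-1'
        return self._items[int(idx)]

    def __iter__(self):
        # yield indexes as strings, increasing from zero
        return iter(str(n) for n in range(len(self._items)))

    def __len__(self):
        return len(self._items)


def walk(block, prefix=''):
    """Yield (path, value) for every leaf, treating lists like mappings."""
    for field in block:
        value = block[field]
        path = prefix + field
        if isinstance(value, (dict, ListBlock)):
            for item in walk(value, path + '.'):
                yield item
        else:
            yield path, value


conf = {'dirname': '/var/share', 'users': ListBlock(['alice', 'bob'])}
for path, value in walk(conf):
    print(path, '=', value)
```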
Optional, Default, and No-Follow Rules
Having examined how to configure a parser to require certain fields and structure, and how values will be represented in the parsed object, we introduce ways to make rules optional, take default values, and have the parser ignore their substructure.
A field flagged as optional can be omitted
When instantiating and adding a new rule, we can specify optional=True:
confp.add_rule('name')
confp.add_rule('description', optional=True)
The above parser would require a name field but not a description
field, as in the following config file:
name: Simple Single-File Server
The parser will accept the file, create a description field, and assign
it the value None.
conf.name # 'Simple Single-File Server'
conf.description # None
A program acting on the conf object can assume that the optional
field exists, but it will have the value None if it was not present
in the config file (or if it was explicitly assigned the value None
since the two are treated as equivalent).
Optional fields can have required substructure
A rule representing a nested block can be marked optional and still
have substructure with required fields. For the sake of example, a
server requires three files in order to encrypt its communications
using SSL. If a server.ssl block is present in the config file,
those files must be provided, and the server will use SSL. If the
block is omitted, the config file should still be accepted, but the
server will fall back to unencrypted communications.
The following parser is configured with an optional server.ssl block
that, if it exists, must have three specific fields:
confp.add_rule('server.addr')
confp.add_rule('server.port')
confp.add_rule('server.ssl', optional=True)
confp.add_rule('server.ssl.key')
confp.add_rule('server.ssl.cert')
confp.add_rule('server.ssl.chain')
In the following config, the optional ssl block and its substructure
have been omitted:
server:
  addr: 127.0.0.1
  port: 81
Since the block was not included, the ssl field is present in the
conf object but assigned a value of None.
conf.server.ssl # None
In the following config, the optional ssl block and its substructure
have been included:
server:
  addr: 127.0.0.1
  port: 81
  ssl:
    key: /etc/ssl/privkey.pem
    cert: /etc/ssl/cert.pem
    chain: /etc/ssl/full_chain.pem
Since the block was included, its substructure was parsed and
validated. The conf object includes the block and its substructure.
conf.server.ssl # <YamlConfig object at 0x[...]>
conf.server.ssl.key # '/etc/ssl/privkey.pem'
conf.server.ssl.cert # '/etc/ssl/cert.pem'
conf.server.ssl.chain # '/etc/ssl/full_chain.pem'
The existence requirements on the substructure will only be checked
and enforced if the optional field is present. In the following
config, the optional ssl block is present, but it is missing one of
its required fields:
server:
  addr: 127.0.0.1
  port: 81
  ssl:
    key: /etc/ssl/privkey.pem
    # cert: /etc/ssl/cert.pem
    chain: /etc/ssl/full_chain.pem
When parse_file is invoked on this config file, a ParseError is
raised: "server.ssl.cert" is missing.
A default field will take a default value if omitted
When instantiating and adding a new rule, we can specify a default.
confp.add_rule('server.addr', default='127.0.0.1')
confp.add_rule('server.port')
In the following config, the addr field, which takes a default, has been omitted:
server:
  port: 81
The parser will accept the file, create not only a port field but
also an addr field within the server block, and since the addr
field does not appear in the config, it will assign the default value
(127.0.0.1) to the field.
conf.server.port # 81
conf.server.addr # '127.0.0.1'
Default substructure must still undergo validation
The following config rule will provide an entire server block if none is specified in the config file:
confp.add_rule('server', default={'addr': '127.0.0.1', 'port': 81})
Providing such structure is possible, but the above rule would
generate a ParseError unless it was accompanied by rules to accept
the server.addr and server.port paths. With only the rule above,
a config file that triggered the default would raise a ParseError:
"server.addr" unexpected by parser.
We need to add rules to prepare the parser for the substructure, as in the following parser that accompanies the default rule with two more:
confp.add_rule('server', default={'addr': '127.0.0.1', 'port': 81})
confp.add_rule('server.addr')
confp.add_rule('server.port')
With these two additional rules, a config file will be accepted with
the default values if server is omitted, and it will require those
two values be present if a server block is present. In both cases,
once parsing is successful, the program can assume that
conf.server.addr and conf.server.port exist.
If we really did not want to validate the fields of the default substructure, rather than adding rules for the fields, we could mark the block as no-follow as described in a later segment.
Fields cannot both be optional and take a default
The optional and default parameters to add_rule are mutually
exclusive; if both are specified, an error will be raised.
Essentially, optional=True acts like a default rule for which the
default value is None. In fact, setting optional=True is the only
way for a missing field to be assigned a value of None, since
setting default=None is a no-op. A value of None for default
actually signals that no default has been specified, so the field is
still required.
It is unclear what the semantics would even be for an optional rule that also takes a default, so the pairing is just not allowed.
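The interaction of the two flags can be summarized in a toy model. The function make_rule below is hypothetical and does not reflect yamlfig's internals; in particular, ValueError is only a placeholder, since the text above does not say which exception type add_rule raises.

```python
def make_rule(path, optional=False, default=None):
    """Toy model of the optional/default semantics described above."""
    if optional and default is not None:
        # placeholder exception type; the real one is not specified here
        raise ValueError(
            '"{0}" cannot be both optional and take a default'.format(path))
    # default=None means "no default given", so the field stays required
    required = not optional and default is None
    # value assigned when the field is omitted (unused if required)
    fill = None if optional else default
    return {'path': path, 'required': required, 'fill': fill}


print(make_rule('name'))
print(make_rule('description', optional=True))
print(make_rule('server.addr', default='127.0.0.1'))
print(make_rule('server.port', default=None))
```

The last call shows the no-op: passing default=None leaves server.port required, exactly as if no default had been given.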
A default path can have optional subpaths and vice versa
In the following parser configuration, the server block takes a default,
while the server.ssl block is optional:
confp.add_rule('server', default={'addr': '127.0.0.1', 'port': 81})
confp.add_rule('server.addr')
confp.add_rule('server.port')
confp.add_rule('server.ssl', optional=True)
confp.add_rule('server.ssl.key')
confp.add_rule('server.ssl.cert')
confp.add_rule('server.ssl.chain')
We can see what will happen in the following config, where the
server field is omitted. Note that this config file uses the
convention that leaving a field value blank assigns it a value of
None (or null in YAML terms), and that causes it to be treated as
omitted by yamlfig:
server:
The parser above will accept this config file. Since the server
block has been omitted, it will substitute its default value. Since
the server.addr and server.port fields are provided by the
default, they will pass the rules requiring their existence. The
server.ssl field has not been provided by the default, but since
it is flagged as optional, the field will be created and assigned a
value of None:
conf.server # <YamlConfig object at 0x[...]>
conf.server.addr # '127.0.0.1'
conf.server.port # 81
conf.server.ssl # None
The reverse is also true. Default fields can be included in the substructure of an optional field, and they will take the default values if they are omitted from the config but the optional block is included. Other combinations work as well (e.g., default fields within default blocks; optional fields within optional blocks; optional fields within default blocks within optional blocks; etc.).
If it helps, we can think of optional and default flags being handled
from the top down in a cascade. If a parent field is omitted, it will
be checked for optional or default flags first. If it is
optional, the field will be created with a None value and the
parsing will move on. If it takes a default, the field will be
created with the default value or substructure, and the parser will
descend into that substructure, checking those fields and values before
moving on. The parser will only encounter child fields and values
after the parent's optional or default nature has been handled.
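The cascade can be sketched with plain dictionaries. This is not yamlfig's implementation: `apply_rules`, the nested `RULES` layout, and the error text are all hypothetical, and the sketch does not check types or unexpected fields. It only shows the top-down order, where a parent's optional or default handling happens before its children are visited.

```python
import copy


def apply_rules(block, rules, prefix=""):
    """Fill in optional and default fields of `block`, working from the top down."""
    block = block or {}
    result = {}
    for name, rule in rules.items():
        path = prefix + name
        value = block.get(name)  # a null field is treated the same as an omitted one
        if value is None:
            if rule.get("optional"):
                result[name] = None  # created as None; children are never visited
                continue
            if rule.get("default") is None:
                raise ValueError('"{0}" is missing'.format(path))
            value = copy.deepcopy(rule["default"])
        children = rule.get("children")
        if children:
            # Descend only after the parent's own optional/default handling is done.
            value = apply_rules(value, children, path + ".")
        result[name] = value
    return result


RULES = {
    "server": {
        "default": {"addr": "127.0.0.1", "port": 81},
        "children": {
            "addr": {},
            "port": {},
            "ssl": {"optional": True,
                    "children": {"key": {}, "cert": {}, "chain": {}}},
        },
    },
}

conf = apply_rules({"server": None}, RULES)
print(conf["server"]["addr"])  # 127.0.0.1
print(conf["server"]["port"])  # 81
print(conf["server"]["ssl"])   # None
```

If the ssl block were present but lacked one of its required fields, the sketch would raise an error for that child path, because the children of an included optional block are still checked.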
Default block and optional wildcarded path recognize zero-or-more
As noted earlier, a wildcard rule path requires that a config file have one or more fields matching the path. By default, a wildcard rule will raise an error if there are no fields matching it, but there are times when we want to accept zero-or-more matches.
The following rules configure a parser for cases where we want a block with zero or more subfields:
confp.add_rule('server.upload_paths', default={})
confp.add_rule('server.upload_paths.*', optional=True)
The optional flag on the wildcard path will cause the parser to
allow the upload_paths block to contain no fields. The default on
the upload_paths field will create that empty block if the field is
null. The following config file would be accepted by this parser:
server:
  upload_paths:
By leaving upload_paths null, we signal that it has been explicitly
omitted, so it is given its default value (i.e., an empty mapping).
Then, since the wildcard path for the fields within upload_paths is
flagged as optional, that rule will be satisfied with zero fields.
The YamlConfig object would exist but contain zero fields:
conf.server.upload_paths # <YamlConfig object at 0x[...]>
len(conf.server.upload_paths) # 0
For completeness, the same parser would also accept a config
file with one or more fields within upload_paths:
server:
  upload_paths:
    alice: /home/alice/uploads
    bob: /home/bob/public
In this case, the YamlConfig object would contain two fields:
conf.server.upload_paths # <YamlConfig object at 0x[...]>
len(conf.server.upload_paths) # 2
set(conf.server.upload_paths) # {'alice','bob'}
This pattern—with the block taking an empty substructure as the default and the wildcard rule flagged as optional—is the recommended way to implement a parser that accepts zero-or-more fields or list elements.
A path marked no-follow can have any and arbitrary substructure
In some cases, we want to stop a yamlfig parser from attempting to validate a substructure, either because the program is designed to handle whatever is beneath that value or, more often, because the structure follows a complicated syntax and the program will pass it to another package that has its own functions for validating the input.
For example, consider a program that needs to pull back a list of projects from a MongoDB database. The following config file provides values that might be needed to (1) access the database, (2) reference the specific collection within the database, and (3) filter the results to only a subset of all projects, using a MongoDB query:
mongodburl: mongodb://192.168.1.200:27017/
collection: projects
filterquery: { 'is_private': { '$ne': true } }
All three fields must exist, but the filterquery field contains a
MongoDB query as its substructure. MongoDB queries can be expressed
as JSON objects, and YAML syntax is a superset of JSON, so the query
can be expressed as JSON/YAML right within the YAML config file.
By default, the yamlfig parser will try to validate that object
(i.e., check whether the paths filterquery.is_private and
filterquery.is_private['$ne'] are expected by the parser).
Configuring a parser to validate the syntax of an arbitrary MongoDB
query would be impractical and needlessly complex. As soon as the
script hands the query off to MongoDB, MongoDB will do a much better
job of validating it.
The following parser is configured to accept the above config file:
confp.add_rule('mongodburl')
confp.add_rule('collection')
confp.add_rule('filterquery', nofollow=True)
This parser will require that a filterquery field exists along with
the mongodburl and collection fields, but the nofollow argument
ensures that it will not descend into the substructure within the
filterquery field. No additional validation of that substructure
will take place.
The value at conf.filterquery is a standard Python dict which can be
passed to a MongoDB find command as-is.
conf.filterquery # {'is_private': {'$ne': True}}
As an aside, note that until this example, we have been using YAML block-structure syntax rather than JSON syntax, but there is no difference between the two formats once parsed. The config file above could have been written equivalently as follows:
mongodburl: mongodb://192.168.1.200:27017/
collection: projects
filterquery:
  is_private:
    "$ne": true
A distinct alternative would have been to encode the MongoDB query object as a string, as in the following line:
filterquery: '{ "is_private": { "$ne": true } }'
While a viable alternative, there are benefits to not doing so. By storing the query object as a query object, we actually do perform some syntax checking at parse time, before handing it off to MongoDB. We ensure that the brackets are balanced and the JSON is legal. We also get whatever syntax highlighting our editor provides to YAML/JSON. A string would simply be treated as a string by the yamlfig parser, and we would eventually have to invoke a JSON parser ourselves.
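To make the trade-off concrete, here is a minimal sketch of what the program would have to do if the query were stored as a string. It assumes the string holds valid JSON and uses only the standard-library json module; `query_text` and `broken_text` are hypothetical stand-ins for the value the parser would return.

```python
import json

# Hypothetical value of conf.filterquery if the query had been stored as a string.
query_text = '{ "is_private": { "$ne": true } }'

# The program has to parse the string itself before handing it to the database.
query = json.loads(query_text)
print(query)  # {'is_private': {'$ne': True}}

# A malformed string is only caught at this point, not when the config is read.
broken_text = '{ "is_private": { "$ne": true }'
try:
    json.loads(broken_text)
except json.JSONDecodeError as err:
    print("invalid query string:", err.msg)
```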
A path marked no-follow can also be optional or take defaults (but not both)
The nofollow parameter really affects the handling of the value, not
the field, whereas optional and default are parameters that affect
the handling of the field (i.e., what to do if it is omitted).
If a field is omitted from a config, and if its path is marked in the
parser as both optional and no-follow, the field will be created and
assigned the value None. If it were not optional, an error would be
raised. Since the value None is terminal and has no substructure,
being marked no-follow has little effect.
If a field is omitted from a config, and if its path is marked in the
parser as no-follow and taking a default, the field will be created
and the default value will be substituted. If the default value has
substructure (i.e., it is a dict or a list), then the no-follow
marking would apply and no additional validation would be performed by
the parser on that substructure.
A path marked no-follow cannot have any subrules
A parser configuration such as the following would raise an error:
confp.add_rule('filterquery', nofollow=True)
confp.add_rule('filterquery.is_private')
The no-follow condition on a path means that no rules on descendant
paths will ever be checked or validated, so we prevent such rules from
being added. For this example, a ValueError would be raised
explaining that "filterquery.is_private" is a descendant of a no-follow rule.
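A guard of this kind can be sketched in a few lines. The helpers below are hypothetical and are not yamlfig's code; they assume dotted rule paths and show only how a descendant of a no-follow path might be detected when a rule is added.

```python
def is_descendant(path, ancestor, delimiter="."):
    """Return True if `path` lies strictly beneath `ancestor` in a dotted rule path."""
    return path.startswith(ancestor + delimiter)


# Paths that have already been registered with nofollow=True (illustrative state).
nofollow_paths = {"filterquery"}


def check_new_rule(path):
    """Reject a rule path that sits beneath any no-follow path."""
    for ancestor in nofollow_paths:
        if is_descendant(path, ancestor):
            raise ValueError(
                '"{0}" is a descendant of a no-follow rule'.format(path))
    return path


print(check_new_rule("collection"))         # collection
print(check_new_rule("filterquery_notes"))  # filterquery_notes
```

Appending the delimiter before comparing keeps a sibling such as filterquery_notes from being mistaken for a descendant of filterquery.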
Path Type Checking
Within the yamlfig parser, after establishing that every required field exists, that every optional or default field has been handled, and that there are no unexpected fields, the parser's next step is to check that any type assertions on the values for each field are satisfied.
Ensure that a field is a str (or int or bool or float, etc.)
When instantiating and adding a new rule, we can specify a path_type.
confp.add_rule('server.addr', path_type=str)
confp.add_rule('server.port', path_type=int)
In addition to requiring that the server block contains an addr
field and a port field, these rules will further check that the
values are instances of the given path_type types.
The following config has a null in the addr field:
server:
  addr: ~
  port: 81
The above parser would raise an error: "server.addr" has type NoneType not type str.
Likewise, the following config has a string in the port field:
server:
  addr: 127.0.0.1
  port: "81"
The above parser would raise an error: "server.port" has type str not type int.
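The check behind these messages can be approximated with isinstance. The helper below is a sketch, not yamlfig's code: `check_type` is a hypothetical name, and it returns the explanation as a string rather than raising, purely to keep the example short.

```python
def check_type(path, value, path_type):
    """Return None if value is an instance of path_type, else an explanation."""
    if isinstance(value, path_type):
        return None
    return '"{0}" has type {1} not type {2}'.format(
        path, type(value).__name__, path_type.__name__)


print(check_type("server.addr", None, str))  # "server.addr" has type NoneType not type str
print(check_type("server.port", "81", int))  # "server.port" has type str not type int
print(check_type("server.port", 81, int))    # None
```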
The type that a value takes is determined by the underlying raw-YAML
parser that yamlfig uses. By default, we use SafeLoader within
PyYAML. It recognizes the
following types:
- bool
- str
- unicode (in Python 2, when the value contains non-ASCII characters)
- int
- long (in Python 2, when the value is larger than sys.maxint)
- float
- date (in the datetime package)
- datetime (in the datetime package)
- dict (for mappings and mapping-like tags)
- list (for lists and list-like tags)
- NoneType (i.e., path_type=type(None))
Any of those types could arise in a config and be accepted or rejected
by a path_type argument. Additionally, it is possible to replace
SafeLoader with a different YAML parsing class, in which case the
set of types would depend on what types it constructed.
Union types handle complex types like a number or a string
In the following config, the timeout field will be parsed as a float:
server:
  timeout: 1.2
But in the following config, the timeout value will be parsed as an int:
server:
  timeout: 1
Assuming the underlying server wants a float but handles the
conversion of an int all by itself, we don't really want to force a
user to add spurious decimal points (e.g., changing 1 to 1. will
ensure the parser returns a float) just to consistently achieve a
single type across all configs.
The following parser configuration rule will accept a timeout that
is either an int or a float:
confp.add_rule('server.timeout', path_type=(int, float))
By specifying a tuple of types, we can direct yamlfig to accept values that are instances of either type.
This feature was a lot more urgent in Python 2, where we almost always
wanted a string to be checked against (str, unicode), so that the
appearance of a word with an accent or umlaut in a descriptive string
wouldn't suddenly cause our config file to be rejected. Things have
gotten calmer with Python 3 (e.g., str vs unicode and int vs
long are no longer issues), but union types do still arise (e.g.,
int vs float).
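In plain Python, a tuple of types behaves the same way when passed to isinstance, which is a convenient mental model for union types. The sketch below uses a hypothetical helper, `accepts`, and makes no claim about how yamlfig performs the check internally. It also shows one consequence of that model worth knowing: bool is a subclass of int.

```python
def accepts(value, path_type):
    """Return True if value is an instance of the type or any type in the tuple."""
    return isinstance(value, path_type)


number = (int, float)
print(accepts(1, number))     # True
print(accepts(1.2, number))   # True
print(accepts("1", number))   # False

# bool is a subclass of int in Python, so a plain isinstance check also accepts True.
print(accepts(True, number))  # True

# Once either type has been accepted, the program can normalize to the one it wants.
timeout = float(1)
print(timeout)                # 1.0
```

If a boolean timeout should be rejected, a test function is one place where such an extra condition could live.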
Ensure that a path contains a map or a list
Consider this parser configured to accept mappings from project names to descriptions:
confp.add_rule('projects', path_type=dict)
confp.add_rule('projects.*')
Contrast it with this parser configured to accept lists of authorized user names:
confp.add_rule('users', path_type=list)
confp.add_rule('users.*')
In both cases, the wildcard rule accepts one-or-more arbitrary fields
within the block, but as noted in an earlier
segment,
the wildcard does not distinguish a map from a list. The
path_type=dict constraint is what ensures that the first parser
accepts config files with a map, like this:
projects:
  ProjectX: "Project X is an eXtreme project (for more info talk to Alice)"
  meerkat_works: "Bob's not-quite skunkworks project"
The path_type=list constraint is what ensures that the second parser
accepts config files with a list, like this:
users:
- alice
- bob
- carol
Since few programs are written to expect either a mapping or a list, we typically want to use type checking to ensure that a config-file block contains the expected structure.
A config file itself can be verified as either a list or a map
While we have so far considered YAML files with a map structure at the root level, a YAML file could also be a list:
- addr: 192.0.2.200
  port: 81
- addr: 192.0.2.201
  port: 81
- addr: 198.51.100.15
  port: 8080
- addr: 203.0.113.130
  port: 8080
In this example, these address-port pairs might be a list of mirrors, ordered by proximity. The following parser is configured to check that the YAML file itself is a list, and then that each element of the list has the proper substructure:
confp = YamlConfigParser(path_type=list)
confp.add_rule('*.addr', path_type=str)
confp.add_rule('*.port', path_type=int)
After reading and validating the above config file, this parser
returns a YamlConfigList object:
len(conf) # 4
conf[0].addr # '192.0.2.200'
conf[0].port # 81
Note that in this example, we actually included the instantiation of
the YamlConfigParser as confp. All our previous examples (after
Basic Usage) assumed that step. But when we are
making assertions about the top-level object parsed from the config
file, those are configured as part of the YamlConfigParser
instantiation.
Down deep, every rule that gets added to the parser is of type
YamlConfigRule. The YamlConfigParser class inherits from
YamlConfigRule, and any arguments are used to validate the
root-level object rather than any particular field or path within the
object. The primary difference between the root YamlConfigParser
object and the YamlConfigRule objects that are added to it is that
the YamlConfigParser cannot have a rule_path while the other
objects must. Additionally, the root-level object cannot be optional
or take a default. (It can be flagged no-follow, though.)
A config file cannot be an atomic value; it must be a map or list
A yamlfig parser will not accept a config file without any substructure. Technically, this is a valid YAML file:
42
A standard YAML parser will parse it as an int. However,
yamlfig will raise an error: config is a(n) int but a record or list is expected.
Honestly, if our program must accept config files consisting of a single value, yamlfig might not be the right tool for the situation. If we still desired to make a go of it, we could nest that value in a single-field mapping, like so:
number: 42
The following parser would accept that config file, with a single field, the value of which is an int:
confp = YamlConfigParser(path_type=dict)
confp.add_rule('number', path_type=int)
Even more concisely, we could nest the value in a singleton list:
- 42
The following parser would accept such a config file:
confp = YamlConfigParser(path_type=list)
confp.add_rule('0', path_type=int)
Note the space between - and 42. Without it, the singleton list
collapses back to a single (negative) integer:
-42
Such a single value would not be accepted.
Rule Test Functions
While type checking helps validate the values in a config file, we
often want to place additional constraints on those values. For
instance, we might want a value not only to be an int but to fall
within a particular range. We might want another value not only to be
a str, but also to match a regular expression. We might want a
third value not only to be a str but also to point to an existing
file.
When adding a rule to a parser, we can specify a test function using
the test argument, to perform additional checking of values. The
yamlfig.test package contains a variety of pre-packaged test
functions for some common validation scenarios.
Verify that a value matches a regular expression
This parser rule will constrain username to start with a lower case
letter and be followed by zero or more lowercase letters or numbers:
confp.add_rule('username', test=test.is_regex('^[a-z][a-z0-9]*$'))
This config file would be accepted by such a parser:
username: 'carol57'
This config file would be rejected:
username: 'Carol C.'
The parse error would include the explanation "username" failed test: "Carol C." does not match /^[a-z][a-z0-9]*$/.
Verify that a value is an IPv4 address
This parser rule will constrain the addr field of the server block to
be a valid dotted-quad IPv4 address:
confp.add_rule('server.addr', test=test.is_ipv4_address)
Test functions packaged within yamlfig
The yamlfig.test module that contains various common validation
tests has been directly imported as the test object in the examples
of this documentation, but would otherwise be accessed as
yamlfig.test (e.g., test=yamlfig.test.is_regex('^[a-z][a-z0-9]*$')).
The following test functions are available in the yamlfig.test
module:
- is_interval(lower, upper) verifies that the value is within the range defined by the lower and upper bounds;
- is_regex(regex, invert=False) verifies that the value matches the regular expression (or does not match it, if inverted);
- is_ipv4_address verifies that the value is an IPv4 address in dotted-quad notation;
- is_domain_name verifies that the value conforms to the specification of a DNS domain name (which, note, is a looser constraint than that it be an actual operating and reachable domain name);
- is_email_address verifies that the value (roughly) conforms to the specification of an email address;
- is_url verifies that the value (roughly) conforms to the specification for URLs;
- is_file_path(*ostests) takes one or more strings corresponding to properties of filesystem objects, interprets the value as a filesystem path, and verifies that the path satisfies all of the listed properties. Properties include:
  - 'exists' and '!exists': the path exists (or does not);
  - 'isdir' and '!isdir': the path is a directory (or is not);
  - 'isfile' and '!isfile': the path is a file (or is not);
  - 'islink' and '!islink': the path is a symlink (or is not);
  - 'ismount' and '!ismount': the path is a mount point (or is not).
Note that for higher-order functions (i.e., the ones that return the
test functions suitable for test), not all optional parameters are
shown and described. Check the help documentation for each function
for additional detail on usage and options.
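The property strings listed for is_file_path line up with predicates in the standard-library os.path module. The sketch below is a hypothetical re-creation, not the packaged function: `path_has_properties`, the `PREDICATES` table, and the error text are assumptions. It shows how a leading '!' might be interpreted as negation.

```python
import os.path

# Map each property name to the os.path predicate that decides it.
PREDICATES = {
    "exists": os.path.exists,
    "isdir": os.path.isdir,
    "isfile": os.path.isfile,
    "islink": os.path.islink,
    "ismount": os.path.ismount,
}


def path_has_properties(*ostests):
    """Build a test function checking a path against each listed property."""
    def test(conf, path, value):
        for ostest in ostests:
            negate = ostest.startswith("!")
            name = ostest[1:] if negate else ostest
            # A check fails when the predicate's answer equals the negation flag.
            if PREDICATES[name](value) == negate:
                return '"{0}" fails the "{1}" check'.format(value, ostest)
        return None
    return test


is_existing_dir = path_has_properties("exists", "isdir")
print(is_existing_dir(None, "server.root", os.getcwd()))  # None
```

An unknown property name raises a KeyError in this sketch; a fuller version would report it more helpfully.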
Writing our own test functions
The test parameter to add_rule takes a function with three
parameters: conf, path, and value. It signals acceptance of the
value by returning None, and rejection by returning a string
explaining what caused the failure.
Some of the test functions packaged within yamlfig, like
test.is_ipv4_address, directly match that specification. Others,
like test.is_regex, return a function that matches the specification
based on their arguments.
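The second style, a function that returns a test function, is an ordinary Python closure. The sketch below is a hypothetical stand-in written with the standard-library re module; `matches_regex` is not the packaged test.is_regex and its message format is an assumption. It follows the contract stated above: return None to accept, or a string explaining the rejection.

```python
import re


def matches_regex(pattern):
    """Return a test function that accepts values matching `pattern`."""
    compiled = re.compile(pattern)

    def test(conf, path, value):
        if compiled.search(str(value)):
            return None
        return '"{0}" does not match /{1}/'.format(value, pattern)
    return test


username_test = matches_regex(r"^[a-z][a-z0-9]*$")
print(username_test(None, "username", "carol57"))   # None
print(username_test(None, "username", "Carol C."))  # "Carol C." does not match /^[a-z][a-z0-9]*$/
```

Compiling the pattern once in the outer function means the cost is paid when the rule is defined rather than each time a value is tested.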
In an earlier
segment,
we explained that rule paths could not contain partial wildcards
(e.g., user-* to require that all fields start with a particular
prefix). However, test functions do offer a way to recognize such
properties and more. Consider a config file in which a block must
contain a default field, and can optionally contain zero or more
fields that must all match a partial wildcard like user-*.
Test functions offer a way to implement any test that we can write as a Python function:
def has_default_and_user_fields(conf, path, value):
    if 'default' not in value:
        return '"default" field is missing'
    for field in value:
        if field != 'default' and not field.startswith('user-'):
            return '"{0}" is neither "default" nor starts with "user-"'.format(field)
    return None
confp = YamlConfigParser()
confp.add_rule('uploads', test=has_default_and_user_fields)
confp.add_rule('uploads.*', path_type=str, optional=True)
The following config file would be accepted by this parser:
uploads:
  default: /var/share/SomeApp/uploads
  user-alice: /home/alice/uploads
  user-bob: /home/bob/public
The following config file would be rejected by this parser:
uploads:
  user-alice: /home/alice/uploads
  user-bob: /home/bob/public
The parse error would include the explanation "uploads" test failed: "default" field is missing.
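Because a test function is just a Python function, it can be exercised on its own before being attached to a rule. The sketch below repeats the function so the block is self-contained and calls it with plain dicts. Using a dict as the value is an assumption: the object yamlfig passes may be a different container, though the function only relies on membership tests and iteration over field names.

```python
def has_default_and_user_fields(conf, path, value):
    """Require a "default" field; every other field must start with "user-"."""
    if 'default' not in value:
        return '"default" field is missing'
    for field in value:
        if field != 'default' and not field.startswith('user-'):
            return '"{0}" is neither "default" nor starts with "user-"'.format(field)
    return None


accepted = {'default': '/var/share/SomeApp/uploads',
            'user-alice': '/home/alice/uploads'}
no_default = {'user-alice': '/home/alice/uploads',
              'user-bob': '/home/bob/public'}
misnamed = {'default': '/var/share/SomeApp/uploads', 'guest': '/tmp/guest'}

print(has_default_and_user_fields(None, 'uploads', accepted))    # None
print(has_default_and_user_fields(None, 'uploads', no_default))  # "default" field is missing
print(has_default_and_user_fields(None, 'uploads', misnamed))    # "guest" is neither "default" nor starts with "user-"
```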
Of the three parameters (i.e., conf, path, and value), all of
the packaged test functions depend only on the value, and that will
typically be the case. By providing the entire conf object as well
as the path to the value being verified, yamlfig enables the
test to evaluate the value in the context of the rest of the config
file if necessary.
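As a sketch of a test that does use the surrounding config, the function below compares a port against a limit stored elsewhere. Everything here is hypothetical: the `limits.min_port` field, the function name, and the use of nested dicts in place of the parsed config object. It assumes conf supports index lookup, and it does not establish at what stage of parsing yamlfig invokes tests.

```python
def port_above_minimum(conf, path, value):
    """Accept `value` only if it is not below the configured minimum port."""
    minimum = conf["limits"]["min_port"]  # hypothetical field elsewhere in the config
    if value < minimum:
        return '{0} is below limits.min_port ({1})'.format(value, minimum)
    return None


# Nested dicts stand in for the parsed config object in this sketch.
conf = {"limits": {"min_port": 1024}, "server": {"port": 81}}
print(port_above_minimum(conf, "server.port", conf["server"]["port"]))
# 81 is below limits.min_port (1024)
```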
Warnings and Caveats
- Field names with leading underscores – While accessing YamlConfig fields as attributes is convenient, the drawback is that any field names that start with a leading underscore risk colliding with the methods and attributes that implement the class. Consequently, if the parser encounters any field that starts with '_', a warning will be raised. The warning can be suppressed by setting yamlfig.print_underscore_warning.off = True. As with non-string field names, we can always look up a field with leading underscores via index lookup (e.g., conf["_field"] rather than conf._field), but the warning is intended to make us aware of the possibility for collision.
- References to transform functions in code and documentation – The API for the YamlConfigRule class and the YamlBaseConfig classes exposes references to a transform function or the ability to invoke do_transform on the container object. The ability to specify a transform is planned (and described in more detail below). As the code indicates, work on this feature was already underway when this version was released, but consider it untested, incompletely documented, and subject to change.
Next Steps
Future Work
Several features are already on our list of things we would like to or have started to implement:
- Allow standard fielded rules to co-exist alongside wildcard rules, with the standard rule taking precedence if it matches and the wildcard being used as a catch-all. We probably still want to disallow partial wildcards since (a) they can already be handled with test functions, and (b) they would raise the possibility of allowing multiple partial wildcard rules attached to the same parent path, and that would raise all sorts of ambiguity about what to do if multiple rules match the same field.
- Allow a user to more easily specify a constraint on field names, for instance a field_type to do the same type checking on a field that path_type does on the value, and/or a field_regex to specify a pattern that the field must match. Currently these are possible, but would require the user to implement their own rule test, as described above.
- Add options to test.is_file_path that (1) allow the user to specify a directory from which all relative paths are resolved, and (2) allow the user to specify a path into conf where such a base directory would be stored. These would enable support for a config file where one homepath field specifies where the program will chdir to, and then all the other paths (e.g., dbfile or htmldir) are specified relative to homepath.
- Extend the rule_path specification to allow us to express field types that are not strings or are strings that include 'special characters' like whitespace, the delimiter (.), or the wildcard (*). Right now, we're thinking of using square brackets in a rule path, so that the string resembles the path that would be used to access the value once parsed (e.g., rule_path="dbhosts['192.0.2.1'].port" would indicate the config file had a structure where a dbhosts block contained a field named 192.0.2.1 which maps to a sub-block that has a port field).
- Implement a test.is_in_choiceset which verifies that the value is one of the configured options or choices. So, if a rule for a dbtype path had test=test.is_in_choiceset(['sqlite', 'mysql', 'mongodb']), it would verify that the dbtype value took one of those values, returning an explanatory error message if not.
- Extend YamlConfigParser with a function to write a config-file template (or an actual config file if provided with a conf object) to use for the values. For each rule, it is already possible to specify a desc describing the purpose of the path, and an example value. These values could be incorporated into the config-file template, making it somewhat self documented, and making it easier for a program to provide its users with a template. In truth, we would probably want to implement this as a Representer class that inherits from and extends SafeDumper, but with that class invoked by something like conf.write_file(filename, conf=None).
- Extend YamlConfigRule with support for transformations. A goal for yamlfig was to gather into one package all of the things that we find ourselves doing over and over again when we read in a config file (e.g., checking whether fields exist, that they have the right types, and that they meet various other conditions). Another thing we do at this stage is converting them to the objects that we really want to use in our program. For instance, we don't really want the path to the log file, we want the open filehandle to it; we don't really want the IPv4 address in dotted-quad notation, we want the IPv4Address object that we can construct with it. Some initial groundwork for such transformations already exists within the code, but it needs to be built out and tested (lots and lots of testing, especially the interaction with the write-out-configs extension described above).
This list is neither exhaustive nor a promise of what is certain to come. Other suggestions are welcome.
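Until a packaged choice-set test exists, the behavior described in the list above can be approximated with a user-written test function. The sketch below is one possible shape and not the planned implementation: the local `is_in_choiceset` and its message format are assumptions, and it follows the test-function contract of returning None on acceptance or an explanatory string on rejection.

```python
def is_in_choiceset(choices):
    """Return a test function that accepts only values found in `choices`."""
    allowed = list(choices)

    def test(conf, path, value):
        if value in allowed:
            return None
        return '{0!r} is not one of {1!r}'.format(value, allowed)
    return test


dbtype_test = is_in_choiceset(['sqlite', 'mysql', 'mongodb'])
print(dbtype_test(None, 'dbtype', 'mysql'))     # None
print(dbtype_test(None, 'dbtype', 'postgres'))  # 'postgres' is not one of ['sqlite', 'mysql', 'mongodb']
```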
Support and Collaboration
We welcome reports of issues and other contributions through our package's page on GitHub:
Note that this is our first open-source project, and it was shared in part so that we could get more experience with the standard tools and workflows. We aim to respond to any issues, requests, or other feedback promptly and professionally, but some understanding may be required since we are learning as we go.