A parser for the KDL language.
KDL-py
A handwritten Python 3.7+ implementation of a parser for the KDL Document Language, fully compliant with KDL 1.0.0.
KDL is, as the name suggests, a document language, filling approximately the same niche as JSON/YAML/XML/etc. It combines the best of several of these languages, while avoiding their pitfalls: more general than JSON and more powerful than XML, while avoiding the verbosity of XML or the explosive complexity of YAML.
Installing
[will be on pypi shortly]
When installed, a kdlreformat command-line program is also made available, which can canonicalize a KDL document. See the kdlreformat section below for options.
Using
The kdl.parse(str, parseConfig?) function parses, you guessed it, a string of KDL into a KDL document object:
>>> import kdl
>>> doc = kdl.parse('''
... node_name "arg" {
...     child_node foo=1 bar=true
... }
... ''')
>>>
>>> doc
Document(
    children=[
        Node(
            name='node_name',
            values=['arg'],
            children=[
                Node(
                    name='child_node',
                    properties=OrderedDict([
                        ('foo', 1.0),
                        ('bar', True)
                    ])
                )
            ]
        )
    ]
)
You can also create a kdl.Parser() object and call its .parse() method; Parser objects can set up parsing and printing options that'll apply by default. See below for how to configure parsing options.
Either way, you'll get back a kdl.Document object, which is fully mutable. By default, untagged KDL values are represented with native Python objects.
>>> doc.children[0].children[0].properties["foo"] = 2
>>>
>>> print(doc)
node_name "arg" {
	child_node bar=true foo=2
}
Stringifying a kdl.Document object will produce a valid KDL document back. You can also call doc.print(printConfig?) to customize the printing with a PrintConfig object; see below for how to configure printing options.
Inserting Native Values
kdl-py allows a number of native Python objects to be used directly in KDL documents by default, and allows you to customize your own objects for use.
kdl-py automatically recognizes and correctly serializes the following objects:
- bool: as untagged true or false
- None: as untagged null
- int, float: as untagged decimal number
- str: as untagged string
- bytes: as (base64)-tagged string
- decimal.Decimal: as (decimal)-tagged string
- datetime, date, and time: as (date-time), (date), or (time)-tagged strings
- ipaddress.IPv4Address and ipaddress.IPv6Address: as (ipv4) or (ipv6)-tagged strings
- urllib.parse.ParseResult (the result of calling urllib.parse.urlparse()): as (url)-tagged string
- uuid.UUID: as (uuid)-tagged string
- re.Pattern (the result of calling re.compile()): as (regex)-tagged raw string
All of the tags used above are reserved and predefined by the KDL specification.
In addition, any value with a .to_kdl() method can be used in a kdl-py document.
The method will be called when the document is stringified,
and must return one of the kdl-py types,
or any of the native types defined above.
(For parsing KDL into these native types, or your own types, see the ParseConfig section, below.)
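As a minimal sketch of the .to_kdl() hook, the class below is a hypothetical application type (Point is not part of kdl-py). Its method returns a plain str, one of the native types listed above, so the printer would serialize it as an untagged string.

```python
from dataclasses import dataclass


@dataclass
class Point:
    """Hypothetical application type, not part of kdl-py."""
    x: int
    y: int

    def to_kdl(self):
        # Return one of the native types listed above (here a plain str),
        # which kdl-py serializes as an untagged string.
        return f"{self.x},{self.y}"


point = Point(3, 4)
print(point.to_kdl())  # 3,4
```

A Point instance could then be stored in a document wherever a value is accepted, for example as a property value in the same way as the foo assignment shown earlier.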
Customizing Parsing
Parsing can be controlled via a kdl.ParseConfig object, which can be provided in three ways. In order of importance:
- Passing a ParseConfig object to kdl.parse(str, ParseConfig?) or parser.parse(str, ParseConfig?) (if you've constructed a kdl.Parser).
- Creating a kdl.Parser(parseConfig?, printConfig?), which automatically applies it to its .parse() method if not overridden.
- Fiddling with the kdl.parsing.defaults object, which is used if nothing else provides a config.
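The lookup order above can be modeled in a few lines. This is only a sketch of the described precedence, not kdl-py's actual implementation; resolve_config is a hypothetical helper, and plain strings stand in for ParseConfig objects. Whether the library merges configs field by field is not stated in this document, so the sketch simply picks the first config that was provided.

```python
def resolve_config(call_config=None, parser_config=None, defaults=None):
    """Return the first config that was actually provided, in priority order."""
    for candidate in (call_config, parser_config, defaults):
        if candidate is not None:
            return candidate
    return None


# Plain strings stand in for ParseConfig objects in this sketch.
print(resolve_config(None, "parser-level", "module defaults"))        # parser-level
print(resolve_config("per-call", "parser-level", "module defaults"))  # per-call
print(resolve_config(None, None, "module defaults"))                  # module defaults
```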
A ParseConfig object has the following properties:
- nativeUntaggedValues: bool = True

  Controls whether the parser produces native Python objects (str, int, float, bool, None) when parsing untagged values (those without a (foo) prefix), or always produces kdl-py objects (such as kdl.String, kdl.Decimal, etc).

- nativeTaggedValues: bool = True

  Controls whether the parser produces native Python objects when parsing tagged values, for some of KDL's predefined tags:

  - i8, i16, i32, i64, u8, u16, u32, u64 on numbers: Checks that the value is in the specified range, then converts it to an int. (It will serialize back out as an ordinary untagged number.)
  - f32, f64 on numbers: Converts it to a float. (It will serialize back out as an ordinary untagged number.)
  - decimal64, decimal128 on numbers, and decimal on strings: Converts it to a decimal.Decimal object. (Always reserializes to a (decimal)-tagged string.)
  - date-time, date, time on strings: Converts it to a datetime, date, or time object.
  - ipv4, ipv6 on strings: Converts it to an ipaddress.IPv4Address or ipaddress.IPv6Address object.
  - url on strings: Converts it to a urllib.parse.ParseResult tuple.
  - uuid on strings: Converts it to a uuid.UUID object.
  - regex on strings: Converts it to a re.Pattern object. (It will serialize back out as a raw string.)
  - base64 on strings: Converts it to a bytes object.

- valueConverters: Dict[str, Callable] = {}

  A dictionary of tag->converter functions, letting you parse tagged values (like (date)"2021-01-01") into whatever types you'd like.

  Whenever a value is encountered with the given tag, your converter will be called with two arguments: the fully-constructed kdl-py object, and a ParseFragment object giving you access to the precise characters parsed from the document. Whatever you return will be inserted into the document instead.

  (Note that this does not specialize on value type; a converter set to handle, say, a "base6" tag, intending it to be used on numbers like (base6)123450, will get called for (base6)"a string" too. If you intend to only handle specific types of values, make sure to check the value's type and return it unchanged if you don't intend to handle it.)

  You can produce KDL values (such as parsing (hex)"0x12.e5" into a kdl.Decimal, since KDL doesn't support fractional hex values), or any other type. Note that non-kdl-py types are automatically handled by the printer if they have a .to_kdl() method.

- nodeConverters: Dict[Union[str, Tuple[str, str]], Callable] = {}

  Similar to valueConverters, except the converters here are called on kdl.Nodes.

  The keys for the map are different as well, because the node name is at least as important as the tag for indicating identity. You can use either a (tag, name) tuple, which will be called only when both match, or just a name string (not a tag), which will be used when there's not a more specific tag+name match.
ParseFragment
`kdl.ParseFragment` is passed to your custom converters,
specified in `kdl.ParseConfig.valueConverters`,
giving you direct access to the input characters
before any additional processing was done on them.
This is useful, for example,
to handle numeric types
that might have lost precision in the normal parse.
It exposes a `.fragment` property,
containing the raw text of the value
(after the tag, if any).
It also exposes a `.error(str)` method,
which takes a custom error message
and returns a `kdl.ParseError`
with the `ParseFragment`'s location already built in.
This should be called if your conversion fails for any reason.
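To illustrate the precision use case, the sketch below builds a decimal.Decimal from the raw source text instead of the already-parsed number, and reports failures through .error(). FakeFragment is a hypothetical stand-in that mimics only the two members described above; in real use kdl-py supplies the ParseFragment, and .error() returns a kdl.ParseError rather than the ValueError used here as a placeholder.

```python
from decimal import Decimal, InvalidOperation


class FakeFragment:
    """Demo stand-in exposing the two members described above."""

    def __init__(self, fragment):
        self.fragment = fragment

    def error(self, msg):
        # kdl-py returns a kdl.ParseError here; ValueError is a placeholder.
        return ValueError(msg)


def exact_decimal(val, fragment):
    """Build a Decimal from the raw source text instead of the parsed number."""
    try:
        return Decimal(fragment.fragment)
    except InvalidOperation:
        raise fragment.error(f"Not a decimal number: {fragment.fragment}")


print(exact_decimal(None, FakeFragment("0.10000000000000000000000001")))
# 0.10000000000000000000000001
```

Because the converter raises the exception returned by .error(), a failed conversion would surface with the fragment's location attached.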
Customizing Printing
Like parsing, printing a kdl-py Document back to a KDL string can be controlled by a kdl.PrintConfig object, which can be provided in three ways. In order of importance:
- Passing a PrintConfig object to doc.print(PrintConfig?).
- Setting doc.printConfig to a PrintConfig. (This is done automatically for any documents produced by a Parser, if you pass the printConfig option to the constructor.)
- Fiddling with the kdl.printing.defaults object, which is used if nothing else provides a config.
A PrintConfig object has the following properties:

- indent: str = "\t"

  The string used for each indent level. Defaults to tabs, but can be set to a sequence of spaces if desired (or anything else).

- semicolons: bool = False

  Whether or not nodes are ended with semicolons. (The printer always ends nodes with a newline anyway, so this is purely a stylistic choice.)

- printNullArgs: bool = True

  When False, automatically skips over any "null"/None arguments. This will corrupt documents that use the "null" keyword intentionally, but can be useful if you'd prefer to use a None value as a signal that the argument has been removed.

- printNullProps: bool = True

  Identical to printNullArgs, but applies to properties rather than arguments.

- respectStringType: bool = True

  When True, the printer will output strings as the same type they were in the input, either raw (r#"foo"#) or normal ("foo"). When False, the printer always outputs normal strings.

  Note that this only has an effect on kdl.String and kdl.RawString objects; if the document contains Python str objects, they will always output as normal strings.

- respectRadix: bool = True

  Similar to respectStringType, when True the printer will output numbers as the radix they were in the input, like 0x1b for hex numbers. When False, the printer always outputs decimal numbers.

  Again, this only has an effect on kdl-py objects; native Python numbers are printed as normal for Python.

- exponent: str = "e"

  What character to use for the exponent part of decimal numbers, when printed with scientific notation. Should only be set to "e" or "E".

  Like the previous options, this only has an effect on kdl-py objects; native Python numbers are printed as normal for Python.
Full API Reference
in progress
- kdl.parse(str, config: kdl.ParseConfig?) -> kdl.Document
- kdl.Parser(parseConfig: kdl.ParseConfig?, printConfig: kdl.PrintConfig?)
  - parser.parse(str, config: kdl.ParseConfig?) -> kdl.Document
  - parser.print(config: kdl.PrintConfig?) -> str
- kdl.Document(nodes: list[kdl.Node]?, printConfig: kdl.PrintConfig?)
  - doc.print(PrintConfig?) -> str
- kdl.Node(name: str, tag: str?, args: list[Any]?, props: dict[str, Any]?, nodes: list[kdl.Node]?)
- kdl.Binary(value: int, tag: str?)
- kdl.Octal(value: int, tag: str?)
- kdl.Decimal(mantissa: int|float, exponent: int?, tag: str?)
  - dec.value: readonly, mantissa * (10**exponent)
- kdl.Hex(value: int, tag: str?)
- kdl.Bool(value: bool, tag: str?)
- kdl.Null(tag: str?)
  - null.value: readonly, always None
- kdl.RawString(value: str, tag: str?)
- kdl.String(value: str, tag: str?)
- kdl.ExactValue(chars: str, tag: str?) †
- kdl.Value, kdl.Numberish, kdl.Stringish ‡
- kdl.ParseConfig(...): see above for options
  - kdl.parsing.defaults: default ParseConfig
- kdl.PrintConfig(...): see above for options
  - kdl.printing.defaults: default PrintConfig
- kdl.ParseError: thrown for all parsing errors
  - error.msg: str: hopefully informative
  - error.line: int: 1-indexed
  - error.col: int: 1-indexed
- kdl.ParseFragment: passed to converter functions
  - pf.fragment: slice from the source string
  - pf.error(msg: str): returns a kdl.ParseError with error location set properly already
† Not produced by the parser.
Can be returned by a user's .to_kdl() method
if they want to produce a value precisely in a particular syntax,
in a way that the built-in kdl-py classes don't.
‡ Not produced by the parser.
These are abstract base classes to help in type testing:
Value matches all eight value classes,
Numberish matches all four numeric value classes,
and Stringish matches both string value classes.
kdlreformat
The kdlreformat command-line program is installed by default
when you install this module from PyPI.
It can also be run manually from the kdlreformat.py file
at the root of this repository
(or from the kdl.cli.cli() function):
usage: kdlreformat [-h] [--indent INDENT] [--semicolons] [--radix]
                   [--no-radix] [--raw-strings] [--no-raw-strings]
                   [--exponent EXPONENT]
                   [infile] [outfile]

KDL parser/printer, letting you easily reformat KDL files into a canonical
representation.

positional arguments:
  infile
  outfile

optional arguments:
  -h, --help           show this help message and exit
  --indent INDENT      How many spaces for each level of indent. -1 indicates
                       to indent with tabs.
  --semicolons         Whether to end nodes with semicolons or not.
  --radix              Output numeric values in the radix used by the input.
                       (0x1a outputs as 0x1a)
  --no-radix           Convert all numeric arguments to decimal. (0x1a outputs
                       as 26)
  --raw-strings        Output string values in the string type used by the
                       input.
  --no-raw-strings     Convert all string arguments into plain strings.
  --exponent EXPONENT  What character to use ('e' or 'E') for indicating
                       exponents on scinot numbers.