Speed up PyYAML by automatically enabling LibYAML bindings.
pylibyaml
pylibyaml is a simple Python module that monkey patches PyYAML to automatically enable the fast LibYAML-based parser and emitter if they are installed.
Installation
To install, run:
pip install pylibyaml
There is no explicit requirement for PyYAML or LibYAML to be installed in advance, but this package will be useless without them. Please refer to the PyYAML installation documentation, especially the points about installing the LibYAML bindings.
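Before relying on the speed-up, it may help to confirm that PyYAML was actually built with the LibYAML bindings. The following is a minimal sketch, not part of pylibyaml: the helper name libyaml_bindings_available is hypothetical, and it assumes that PyYAML exposes the module-level flag __with_libyaml__ (read defensively with getattr in case a given version does not).

```python
import importlib.util


def libyaml_bindings_available():
    """Report whether PyYAML's LibYAML bindings look usable.

    Returns None if PyYAML is not installed at all, otherwise True or False.
    Hypothetical helper for illustration; not part of pylibyaml or PyYAML.
    """
    if importlib.util.find_spec("yaml") is None:
        return None  # PyYAML itself is missing
    import yaml

    # Assumption: PyYAML sets __with_libyaml__ when the C extension imported.
    return bool(getattr(yaml, "__with_libyaml__", False))


status = libyaml_bindings_available()
print({None: "PyYAML not installed",
       False: "PyYAML installed, LibYAML bindings missing",
       True: "LibYAML bindings available"}[status])
```

This is meant to be run as a standalone check, for example in a fresh interpreter, rather than at the top of a program that later imports pylibyaml.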
Usage
Run import pylibyaml BEFORE import yaml, and enjoy!
import pylibyaml
import yaml
yaml.safe_load(stream)
yaml.load(stream, Loader=yaml.SafeLoader)
yaml.safe_dump(data)
yaml.dump(data, Dumper=yaml.SafeDumper)
Most existing code should run without modification. Any references to yaml.Loader and yaml.Dumper (including the Safe, Unsafe, and Full flavors) will automatically point to their yaml.cyaml.CLoader and yaml.cyaml.CDumper equivalents. The convenience methods (safe_load, safe_dump, etc.) will all use the C classes, as will the methods for adding resolvers, constructors, or representers. Objects that inherit from YAMLObject should work as intended.
Details
Background
PyYAML is the canonical YAML parser and emitter library for Python. It is not particularly fast.
LibYAML is a C library for parsing and emitting YAML. It is very fast.
By default, the setup.py script for PyYAML checks whether LibYAML is installed and if so, builds and installs LibYAML bindings.
For the bindings to actually be used, they need to be explicitly selected. The PyYAML documentation suggests some variations of the following:
When LibYAML bindings are installed, you may use fast LibYAML-based parser and emitter as follows:
>>> yaml.load(stream, Loader=yaml.CLoader)
>>> yaml.dump(data, Dumper=yaml.CDumper)
In order to use LibYAML based parser and emitter, use the classes CParser and CEmitter. For instance,
from yaml import load, dump
try:
    from yaml import CLoader as Loader, CDumper as Dumper
except ImportError:
    from yaml import Loader, Dumper
# ...
data = load(stream, Loader=Loader)
# ...
output = dump(data, Dumper=Dumper)
This approach is repetitive, inconvenient, and ineffectual when dealing with third-party libraries that also use PyYAML.
Implementation
The approach taken by pylibyaml is to rebind the global names of the Loaders and Dumpers in the yaml module to the LibYAML versions if they are available, before the various functions and classes are defined.
For example, compare the following.
Without pylibyaml:
>>> import yaml
>>> yaml.Loader
<class 'yaml.loader.Loader'>
>>> yaml.Dumper
<class 'yaml.dumper.Dumper'>
>>> help(yaml.dump)
Help on function dump in module yaml:
dump(data, stream=None, Dumper=<class 'yaml.dumper.Dumper'>, **kwds)
    Serialize a Python object into a YAML stream.
    If stream is None, return the produced string instead.
Using pylibyaml (with LibYAML bindings available):
>>> import pylibyaml
>>> import yaml
>>> yaml.Loader
<class 'yaml.cyaml.CLoader'>
>>> yaml.Dumper
<class 'yaml.cyaml.CDumper'>
>>> help(yaml.dump)
Help on function dump in module yaml:
dump(data, stream=None, Dumper=<class 'yaml.cyaml.CDumper'>, **kwds)
    Serialize a Python object into a YAML stream.
    If stream is None, return the produced string instead.
Note that the top-level names now point to the cyaml versions, and that the default function arguments have changed.
The code samples above will still run without modification, but the second can be simplified: the logic for determining the best loader and dumper is no longer required.
import pylibyaml
from yaml import load, dump
from yaml import Loader, Dumper
# ...
data = load(stream, Loader=Loader)
# ...
output = dump(data, Dumper=Dumper)
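Why the rebinding has to happen before the functions are defined comes down to how Python handles default arguments: they are evaluated once, when the def statement runs. The sketch below illustrates this with stand-in classes; PureLoader, FastLoader, and both load functions are hypothetical names for illustration, not PyYAML's.

```python
class PureLoader:
    """Stand-in for a pure-Python loader (hypothetical)."""


class FastLoader:
    """Stand-in for a C-accelerated loader (hypothetical)."""


Loader = PureLoader


def load_defined_before_rebind(stream, Loader=Loader):
    # The default was captured when this 'def' ran, so it stays PureLoader.
    return Loader.__name__


# Rebinding the global name now is too late for the function above.
Loader = FastLoader


def load_defined_after_rebind(stream, Loader=Loader):
    # This 'def' ran after the rebinding, so the default is FastLoader.
    return Loader.__name__


print(load_defined_before_rebind("a: 1"))  # PureLoader
print(load_defined_after_rebind("a: 1"))   # FastLoader
```

Simply assigning yaml.Loader = yaml.CLoader after import yaml would behave like the first function: the defaults already baked into functions such as dump would still point to the pure-Python classes.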
Caveats
This is a rather ugly hack.
In order to rebind the names of the default loaders and dumpers prior to the function and class definitions in PyYAML's __init__.py, we use inspect to get the source, edit it, and reload it with importlib. This works for now (the current version of PyYAML is 5.3.1), and as far back as 3.11, but it may not keep working with future releases.
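The general get-source, edit, re-execute idea can be sketched on a toy module. This is an illustration under stated assumptions, not pylibyaml's actual code: the module toy_yaml, its contents, and the string edit are all hypothetical, and the edited source is re-executed with compile and exec rather than through whatever importlib call pylibyaml itself uses.

```python
import importlib.util
import inspect
import os
import tempfile
import textwrap

# Hypothetical stand-in for a package's __init__.py; not PyYAML's real source.
TOY_SOURCE = textwrap.dedent('''
    class Loader:
        name = "pure-python"

    class CLoader:
        name = "c-accelerated"

    def load(stream, Loader=Loader):
        return Loader.name
''')


def import_from_path(name, path):
    """Import a module from an explicit file path."""
    spec = importlib.util.spec_from_file_location(name, path)
    module = importlib.util.module_from_spec(spec)
    spec.loader.exec_module(module)
    return module


with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "toy_yaml.py")
    with open(path, "w") as fh:
        fh.write(TOY_SOURCE)

    toy = import_from_path("toy_yaml", path)
    before = toy.load("a: 1")

    # Fetch the module source and insert a rebinding ahead of the first
    # function definition, so the default argument picks up the fast class.
    source = inspect.getsource(toy)
    edited = source.replace("def load(", "Loader = CLoader\n\ndef load(", 1)

    # Re-execute the edited source inside the existing module namespace.
    exec(compile(edited, path, "exec"), toy.__dict__)
    after = toy.load("a: 1")

print(before)  # pure-python
print(after)   # c-accelerated
```

The fragility mentioned above is visible here: the edit depends on the exact text of the source, so a change in how the target module is laid out could silently turn the replacement into a no-op.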
LibYAML and PyYAML are not 100% interchangeable.
From the PyYAML docs:
Note that there are some subtle (but not really significant) differences between pure Python and LibYAML based parsers and emitters.