
Speed up PyYAML by automatically enabling LibYAML bindings.


pylibyaml

pylibyaml is a simple Python module that monkey patches PyYAML to automatically enable the fast LibYAML-based parser and emitter if they are installed.

Installation

To install, run:

pip install pylibyaml

There is no explicit requirement for PyYAML or LibYAML to be installed in advance, but this package will be useless without them. Please refer to the PyYAML installation documentation, especially the points about installing the LibYAML bindings.
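Because the package does nothing useful without the bindings, it can help to check whether your PyYAML build includes them. The following is a minimal sketch, assuming the __with_libyaml__ attribute that PyYAML itself sets on the yaml module (it is not part of pylibyaml); the helper name libyaml_status is hypothetical.

```python
import importlib


def libyaml_status():
    """Report whether PyYAML is importable and built with LibYAML bindings."""
    try:
        yaml = importlib.import_module("yaml")
    except ImportError:
        return "PyYAML is not installed"
    # PyYAML sets __with_libyaml__ to True when its C extension imported cleanly.
    if getattr(yaml, "__with_libyaml__", False):
        return "PyYAML is installed with LibYAML bindings"
    return "PyYAML is installed without LibYAML bindings"


print(libyaml_status())
```

If this reports that the bindings are missing, pylibyaml has nothing to switch to and PyYAML keeps using its pure Python classes.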

Usage

Run import pylibyaml BEFORE import yaml, and enjoy!

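import pylibyaml  # must come before the first import of yaml
import yaml

stream = "answer: 42"   # any YAML string or open file object
data = {"answer": 42}   # any object the safe dumper can represent

yaml.safe_load(stream)
yaml.load(stream, Loader=yaml.SafeLoader)

yaml.safe_dump(data)
yaml.dump(data, Dumper=yaml.SafeDumper)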

Most existing code should run without modification. Any references to yaml.Loader and yaml.Dumper (including Safe, Unsafe, and Full flavors) will automatically point to their yaml.cyaml.CLoader and yaml.cyaml.CDumper equivalents. The convenience methods (safe_load, safe_dump, etc.) will all use the C classes, as well as the methods for adding resolvers, constructors, or representers. Objects that inherit from YAMLObject should work as intended.

Details

Background

PyYAML is the canonical YAML parser and emitter library for Python. It is not particularly fast.

LibYAML is a C library for parsing and emitting YAML. It is very fast.

By default, the setup.py script for PyYAML checks whether LibYAML is installed and if so, builds and installs LibYAML bindings.

For the bindings to actually be used, they need to be explicitly selected. The PyYAML documentation suggests some variations of the following:

When LibYAML bindings are installed, you may use fast LibYAML-based parser and emitter as follows:

>>> yaml.load(stream, Loader=yaml.CLoader)
>>> yaml.dump(data, Dumper=yaml.CDumper)

In order to use LibYAML based parser and emitter, use the classes CParser and CEmitter. For instance,

from yaml import load, dump
try:
    from yaml import CLoader as Loader, CDumper as Dumper
except ImportError:
    from yaml import Loader, Dumper
# ...
data = load(stream, Loader=Loader)
# ...
output = dump(data, Dumper=Dumper)

This approach is repetitive, inconvenient, and ineffectual when dealing with third-party libraries that also use PyYAML.

Implementation

The approach taken by pylibyaml is to rebind the global names of the Loaders and Dumpers in the yaml module to the LibYAML versions if they are available, before the various functions and classes are defined.

For example, compare the following.

Without pylibyaml:

>>> import yaml
>>> yaml.Loader
<class 'yaml.loader.Loader'>
>>> yaml.Dumper
<class 'yaml.dumper.Dumper'>
>>> help(yaml.dump)
Help on function dump in module yaml:

dump(data, stream=None, Dumper=<class 'yaml.dumper.Dumper'>, **kwds)
    Serialize a Python object into a YAML stream.
    If stream is None, return the produced string instead.

Using pylibyaml (with LibYAML bindings available):

>>> import pylibyaml
>>> import yaml
>>> yaml.Loader
<class 'yaml.cyaml.CLoader'>
>>> yaml.Dumper
<class 'yaml.cyaml.CDumper'>
>>> help(yaml.dump)
Help on function dump in module yaml:

dump(data, stream=None, Dumper=<class 'yaml.cyaml.CDumper'>, **kwds)
    Serialize a Python object into a YAML stream.
    If stream is None, return the produced string instead.

Note that the top-level names now point to the cyaml versions, and that the default function arguments have changed.

The code samples above will still run without modification, but the second can be simplified: the logic for determining the best loader and dumper is no longer required.

import pylibyaml
from yaml import load, dump
from yaml import Loader, Dumper
# ...
data = load(stream, Loader=Loader)
# ...
output = dump(data, Dumper=Dumper)
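The rebinding has to happen before PyYAML's functions are defined because Python evaluates default arguments once, when the def statement runs. The sketch below illustrates this with stand-in classes and hypothetical function names; it does not use PyYAML or pylibyaml.

```python
class Loader:
    """Stand-in for the pure Python loader."""


class CLoader:
    """Stand-in for the LibYAML-backed loader."""


def load_defined_early(stream, Loader=Loader):
    # The default was captured when this def statement ran.
    return Loader.__name__


# Rebinding the name now does not change defaults that were already captured.
Loader = CLoader


def load_defined_late(stream, Loader=Loader):
    # Defined after the rebinding, so the default is CLoader.
    return Loader.__name__


print(load_defined_early("a: 1"))  # Loader
print(load_defined_late("a: 1"))   # CLoader
```

This is why assigning yaml.Loader = yaml.CLoader after importing yaml would not be enough on its own: functions such as yaml.dump would keep the defaults they were defined with.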

Caveats

This is a rather ugly hack.

To rebind the names of the default loaders and dumpers before the function and class definitions in PyYAML's __init__.py are executed, we use inspect to get the source, edit it, and reload it with importlib. This works with PyYAML versions from 3.11 up to 5.3.1 (the current version at the time of writing), but it may not keep working with future releases.
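The general idea of fetching a module's source, editing it, and executing the edited version can be sketched on a toy module. This is only an illustration under stated assumptions, not pylibyaml's actual code: the module toy_yaml, its marker comment, and the function name patch_before_definitions_demo are all hypothetical, and the sketch uses exec on the module's namespace as a simplification of the reload step.

```python
import importlib
import inspect
import sys
import tempfile
from pathlib import Path

# Hypothetical toy module standing in for a package's __init__.py.
TOY_SOURCE = '''\
class Loader:
    """Stand-in for a slow pure Python loader."""

class CLoader:
    """Stand-in for a fast C-backed loader."""

# -- functions below capture their defaults when defined --
def load(stream, Loader=Loader):
    return Loader.__name__
'''


def patch_before_definitions_demo():
    """Return the loader name used by load() before and after patching."""
    with tempfile.TemporaryDirectory() as tmp:
        Path(tmp, "toy_yaml.py").write_text(TOY_SOURCE, encoding="utf-8")
        sys.path.insert(0, tmp)
        importlib.invalidate_caches()
        try:
            toy = importlib.import_module("toy_yaml")
            before = toy.load("a: 1")

            # Fetch the source and insert a rebinding ahead of the functions.
            source = inspect.getsource(toy)
            marker = "# -- functions below"
            patched = source.replace(marker, "Loader = CLoader\n" + marker, 1)

            # Re-execute the edited source in the module's own namespace.
            exec(compile(patched, "<patched toy_yaml>", "exec"), toy.__dict__)
            after = toy.load("a: 1")
        finally:
            sys.path.remove(tmp)
            sys.modules.pop("toy_yaml", None)
    return before, after


print(patch_before_definitions_demo())  # ('Loader', 'CLoader')
```

The fragility mentioned above is visible even here: the edit depends on the exact text of the source, so any change to the module's layout can break it.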

LibYAML and PyYAML are not 100% interchangeable.

From the PyYAML docs:

Note that there are some subtle (but not really significant) differences between pure Python and LibYAML based parsers and emitters.

