read-protobuf

Small library to read serialized protobuf(s) directly into Pandas DataFrame.

This is intended to be a simple shortcut for translating serialized protobuf bytes / files directly to a dataframe.

Install

Available via pip:

$ pip install read-protobuf

Usage

Run the demo-notebook for an interactive demo.

import demo_pb2                             # compiled protobuf message module 
from read_protobuf import read_protobuf

MessageType = demo_pb2.MessageType()        # instantiate a new message type
df = read_protobuf(b'\x00\x00', MessageType)    # create a dataframe from serialized protobuf bytes
df = read_protobuf([b'\x00\x00', b'\x00\x00'], MessageType)    # read multiple protobuf bytes

df = read_protobuf('demo.pb', MessageType)    # use file instead of bytes
df = read_protobuf(['demo.pb', 'demo2.pb'], MessageType)    # read multiple files

# options
df = read_protobuf('demo.pb', MessageType, flatten=False)    # don't flatten pb messages
df = read_protobuf('demo.pb', MessageType, prefix_nested=True)    # prefix nested messages with parent keys (like pandas.io.json.json_normalize)
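
The effect of the flatten and prefix_nested options can be pictured with plain dictionaries. The following is a minimal sketch, not the library's implementation: flatten_record is a hypothetical helper, and the "." separator and the resulting column names are assumptions borrowed from json_normalize, since this README does not state them.

```python
def flatten_record(record, prefix_nested=False, sep=".", _parent=""):
    """Flatten a nested dict into one level (illustrative helper, not part of read_protobuf)."""
    flat = {}
    for key, value in record.items():
        # With prefix_nested, child keys carry the parent key as a prefix.
        name = f"{_parent}{sep}{key}" if (prefix_nested and _parent) else key
        if isinstance(value, dict):
            # Recurse into nested "messages", passing the current key down as the parent.
            flat.update(flatten_record(value, prefix_nested, sep, name))
        else:
            flat[name] = value
    return flat


# A decoded message pictured as a dict with one nested message.
record = {"id": 1, "position": {"x": 2.0, "y": 3.0}}

unprefixed = flatten_record(record)                      # keys: id, x, y
prefixed = flatten_record(record, prefix_nested=True)    # keys: id, position.x, position.y
```

Without a prefix, nested fields that share a name with a top-level field would overwrite it in this sketch, which is one reason to prefer prefixed keys for deeply nested messages.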

To compile a protobuf Message class for Python, use:

$ protoc --python_out="." demo.proto

Alternatives

protobuf-to-dict

https://github.com/benhodgson/protobuf-to-dict

This library was developed earlier to convert protobufs to JSON via a dict.

MessageToDict, MessageToJson

The Google protobuf library includes utilities that convert messages to a dict or JSON, which can then be loaded by Pandas.

from google.protobuf.json_format import MessageToJson
from google.protobuf.json_format import MessageToDict

In brief tests, the read_protobuf package is about 2x as fast as using MessageToDict and 3x as fast as MessageToJson.
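
For comparison, the MessageToDict route might look like the sketch below. It assumes the protobuf and pandas packages are installed, and it uses FieldDescriptorProto (a message class that ships with protobuf) as a stand-in for a compiled message such as demo_pb2.MessageType; messages_to_frame is a hypothetical helper name.

```python
import pandas as pd
from google.protobuf import descriptor_pb2
from google.protobuf.json_format import MessageToDict


def messages_to_frame(messages):
    """Convert parsed protobuf messages to a DataFrame via MessageToDict (illustrative helper)."""
    records = [
        MessageToDict(message, preserving_proto_field_name=True)
        for message in messages
    ]
    # json_normalize flattens nested dicts into dotted column names.
    return pd.json_normalize(records)


# Stand-in for serialized protobuf bytes read from a file or a stream.
payloads = [
    descriptor_pb2.FieldDescriptorProto(name="id", number=1).SerializeToString(),
    descriptor_pb2.FieldDescriptorProto(name="label", number=2).SerializeToString(),
]

# Parse each payload back into a message, then build the DataFrame.
messages = [descriptor_pb2.FieldDescriptorProto.FromString(p) for p in payloads]
df = messages_to_frame(messages)
```

Each message is converted to an intermediate dict before pandas sees it, and that extra step is the overhead the comparison above refers to.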

Develop

To install a development version of the package, run from the root directory:

$ pip install -e .
To install development dependencies, use the optional [dev] dependencies:
$ pip install -e ".[dev]"

Format

Uses black and isort to format files.

$ make black
$ make isort

Lint

Uses ruff to lint the application.

$ make ruff

Test

Uses pytest to run unit tests. From the root of the repository, run:

$ make pytest

# specify test
$ pytest -k "TestRead::test_read_bytes"

Code Coverage

Use coverage to monitor code coverage during tests. To record coverage while running tests, run:

$ make pytest-cov

License

MIT License

