read-protobuf

Small library to read serialized protobuf(s) directly into Pandas DataFrame.

This is intended to be a simple shortcut for translating serialized protobuf bytes / files directly to a dataframe.

Install

Available via pip:

$ pip install read-protobuf

Usage

Run the demo-notebook for an interactive demo.

import demo_pb2                             # compiled protobuf message module 
from read_protobuf import read_protobuf

MessageType = demo_pb2.MessageType()        # instantiate a new message type
df = read_protobuf(b'\x00\x00', MessageType)    # create a dataframe from serialized protobuf bytes
df = read_protobuf([b'\x00\x00', b'\x00\x00'], MessageType)    # read multiple protobuf bytes

df = read_protobuf('demo.pb', MessageType)    # use file instead of bytes
df = read_protobuf(['demo.pb', 'demo2.pb'], MessageType)    # read multiple files

# options
df = read_protobuf('demo.pb', MessageType, flatten=False)    # don't flatten pb messages
df = read_protobuf('demo.pb', MessageType, prefix_nested=True)    # prefix nested message fields with their parent keys (like pandas.json_normalize)
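
The effect of flattening and of prefixing nested keys can be illustrated with plain dictionaries. The following is a minimal sketch of the idea only, not the library's implementation: `flatten_record` is a hypothetical helper, and the `.` separator is an assumption borrowed from pandas' `json_normalize`; the column names that read_protobuf actually produces may differ.

```python
from typing import Any, Dict


def flatten_record(
    record: Dict[str, Any], prefix_nested: bool = False, _parent: str = ""
) -> Dict[str, Any]:
    """Flatten a nested dict into a single level (hypothetical helper).

    With prefix_nested=True, nested keys are prefixed with their parent key,
    joined by "." (an assumed separator, as in pandas.json_normalize).
    """
    flat: Dict[str, Any] = {}
    for key, value in record.items():
        # Build the column name, optionally carrying the parent key along.
        name = f"{_parent}.{key}" if (prefix_nested and _parent) else key
        if isinstance(value, dict):
            # Recurse into nested messages and merge their fields.
            flat.update(flatten_record(value, prefix_nested, name))
        else:
            flat[name] = value
    return flat


# A nested record standing in for a decoded protobuf message.
record = {"id": 1, "position": {"x": 1.5, "y": 2.5}}

plain = flatten_record(record)
prefixed = flatten_record(record, prefix_nested=True)
```

Without prefixes, nested fields keep their own names (`x`, `y`), which is compact but can collide when two nested messages share field names; with prefixes (`position.x`, `position.y`) the origin of each column stays visible.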

To compile a Python module containing the Message class from a protobuf definition (.proto file), use:

$ protoc --python_out="." demo.proto

Alternatives

protobuf-to-dict

https://github.com/benhodgson/protobuf-to-dict

This library was developed earlier to convert protobufs to JSON via a dict.

MessageToDict, MessageToJson

The Google protobuf library comes with utilities to convert messages to a dict or JSON, which can then be loaded by Pandas.

from google.protobuf.json_format import MessageToJson
from google.protobuf.json_format import MessageToDict

In brief tests, the read_protobuf package is about 2x as fast as using MessageToDict and 3x as fast as MessageToJson.
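
For comparison, the MessageToDict route might look roughly like the sketch below. It assumes the `protobuf` and `pandas` packages are installed and uses `FieldDescriptorProto` from `google.protobuf.descriptor_pb2` as a stand-in message type (an assumption made so the example runs without a compiled `demo_pb2` module); the `make_payload` helper is hypothetical.

```python
import pandas as pd
from google.protobuf import descriptor_pb2
from google.protobuf.json_format import MessageToDict


def make_payload(name: str, number: int) -> bytes:
    """Serialize a stand-in message (hypothetical helper for this sketch)."""
    msg = descriptor_pb2.FieldDescriptorProto(name=name, number=number)
    return msg.SerializeToString()


# Serialized protobuf bytes, as they might arrive from a file or a stream.
payloads = [make_payload("id", 1), make_payload("value", 2)]

records = []
for raw in payloads:
    msg = descriptor_pb2.FieldDescriptorProto()
    msg.ParseFromString(raw)  # decode the bytes into a message
    # Convert the message to a plain dict, keeping the .proto field names.
    records.append(MessageToDict(msg, preserving_proto_field_name=True))

# One row per message, one column per populated field.
df = pd.DataFrame(records)
```

Each message goes through an intermediate dict before reaching the DataFrame, which is the extra step that read_protobuf is intended to shortcut.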

Develop

To install a development version of the package, run from the root directory:

$ pip install -e .

To install development dependencies, use the optional [dev] extra:

$ pip install -e ".[dev]"

Format

Uses black and isort to format files.

$ make black
$ make isort

Lint

Uses ruff to lint the application.

$ make ruff

Test

Uses pytest to run unit tests. From the root of the repository, run:

$ make pytest

# specify test
$ pytest -k "TestRead::test_read_bytes"

Code Coverage

Use coverage to monitor code coverage during tests. To record coverage while running tests, run:

$ make pytest-cov

License

MIT License
