Small library to read serialized protobuf(s) directly into Pandas DataFrame
read-protobuf
This is intended as a simple shortcut for translating serialized protobuf bytes or files directly into a DataFrame.
Install
Available via pip:
$ pip install read-protobuf
Usage
Run the demo-notebook for an interactive demo.
import demo_pb2 # compiled protobuf message module
from read_protobuf import read_protobuf
MessageType = demo_pb2.MessageType() # instantiate a new message type
df = read_protobuf(b'\x00\x00', MessageType) # create a dataframe from serialized protobuf bytes
df = read_protobuf([b'\x00\x00', b'\x00\x00'], MessageType) # read multiple protobuf bytes
df = read_protobuf('demo.pb', MessageType) # use file instead of bytes
df = read_protobuf(['demo.pb', 'demo2.pb'], MessageType) # read multiple files
# options
df = read_protobuf('demo.pb', MessageType, flatten=False) # don't flatten pb messages
df = read_protobuf('demo.pb', MessageType, prefix_nested=True) # prefix nested messages with parent keys (like pandas.io.json.json_normalize)
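The flatten and prefix_nested options control how nested messages become columns. As a rough illustration of the idea only (not the library's implementation), the sketch below flattens a nested dict, standing in for a decoded message, with and without parent-key prefixes. The flatten_record helper, the "." separator, and the sample field names are all hypothetical.

```python
def flatten_record(record, prefix_nested=False, parent="", sep="."):
    """Flatten nested dicts into a single-level dict of column name -> value."""
    flat = {}
    for key, value in record.items():
        # With prefix_nested, nested keys carry their parent path (e.g. "position.x");
        # without it, nested keys keep their own short name (e.g. "x").
        name = f"{parent}{sep}{key}" if (prefix_nested and parent) else key
        if isinstance(value, dict):
            flat.update(flatten_record(value, prefix_nested, name, sep))
        else:
            flat[name] = value
    return flat


# Hypothetical decoded message with one nested sub-message.
record = {"id": 1, "position": {"x": 2.0, "y": 3.0}}

plain = flatten_record(record)                         # keys: id, x, y
prefixed = flatten_record(record, prefix_nested=True)  # keys: id, position.x, position.y
```

Prefixing avoids column-name collisions when two nested messages share a field name, at the cost of longer column names.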
To compile a protobuf Message class for Python, use:
$ protoc --python_out="." demo.proto
Alternatives
protobuf-to-dict
https://github.com/benhodgson/protobuf-to-dict
This library was developed earlier to convert protobufs to JSON via a dict.
MessageToDict, MessageToJson
The Google protobuf library comes with utilities to convert messages to a dict or JSON, which can then be loaded by Pandas.
from google.protobuf.json_format import MessageToJson
from google.protobuf.json_format import MessageToDict
In brief tests, the read_protobuf package is about 2x as fast as using MessageToDict and 3x as fast as MessageToJson.
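A comparison like this can be reproduced with a small timing harness. The sketch below uses only the standard library's timeit module; the compare helper and the two stand-in workloads are hypothetical placeholders, to be replaced with calls to read_protobuf, MessageToDict, or MessageToJson on your own messages. Results will vary with message shape and size.

```python
import json
import timeit


def compare(candidates, number=1000):
    """Return the best-of-3 total runtime in seconds for each named callable."""
    return {
        name: min(timeit.repeat(fn, number=number, repeat=3))
        for name, fn in candidates.items()
    }


# Stand-in workloads (assumptions): swap in real decoding calls to benchmark them.
payload = {"id": 1, "values": list(range(100))}
timings = compare({
    "via_dict": lambda: dict(payload),
    "via_json": lambda: json.loads(json.dumps(payload)),
})

# Report each candidate relative to the fastest one.
baseline = min(timings.values())
for name, seconds in sorted(timings.items(), key=lambda kv: kv[1]):
    print(f"{name}: {seconds:.4f}s ({seconds / baseline:.1f}x the fastest)")
```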
Develop
To install a development version of the package, run from the root directory:
$ pip install -e .
- To install development dependencies, use the optional [dev] dependencies:
$ pip install -e ".[dev]"
Format
Uses black and isort to format files.
$ make black
$ make isort
Lint
Uses ruff to lint the application.
$ make ruff
Test
Uses pytest to run unit tests. From the root of the repository, run:
$ make pytest
# specify test
$ pytest -k "TestRead::test_read_bytes"
Code Coverage
Use coverage to monitor code coverage during tests.
To record coverage while running tests, run:
$ make pytest-cov
License
Download files
File details
Details for the file read-protobuf-0.2.0.tar.gz.
File metadata
- Download URL: read-protobuf-0.2.0.tar.gz
- Upload date:
- Size: 5.6 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 42755793bc107317bca4400851056f7e1c73376a8e90d748c2ed8bbc9c6372c2 |
MD5 | cc41c5ad56efc964ae421384a16f31d6 |
BLAKE2b-256 | 6d0215b98644924d4d2d7b637ba6b6d8ca033d465415f66e00e157033caf4b4a |
File details
Details for the file read_protobuf-0.2.0-py3-none-any.whl.
File metadata
- Download URL: read_protobuf-0.2.0-py3-none-any.whl
- Upload date:
- Size: 4.7 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.11.3
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 643ff9dfc4185f7e5f89d4447c6e452e5d86cf95051ab3f95cd5c39aec7a3d79 |
MD5 | 980be6975bcdd9e0a31d42b48c767fb8 |
BLAKE2b-256 | 9d1b46c10517f5fe91298be9f4bc2f185a87deb691b7f3b10401c55aa85772c5 |