Skip to main content

No project description provided

Project description

Actions Status PyPI Bytewax User Guide

Bytewax is an open source Python framework for building highly scalable dataflows in a streaming or batch context.

Get started

Check out our getting started guide.

About

Bytewax lets you build Python based dataflows to process your data for augmentation, advanced analysis, machine learning and more. It is based on Timely Dataflow, which is a cyclic dataflow computational model. At a high-level, dataflow programming is a programming paradigm where program execution is conceptualized as data flowing through a series of operator based steps. Operators are the processing primitives of bytewax. Each of them gives you a “shape” of data transformation, and you give them functions to customize them to a specific task you need. The combination of each operator and their custom logic functions we call a dataflow step. You chain together steps in a dataflow to solve your high-level data processing problem.

At a high level, Bytewax provides a few major benefits:

  • The operators in Bytewax are largely “data-parallel”, meaning they can operate on independent parts of the data concurrently.
  • Bytewax offers the ability to express higher-level control constructs, like iteration.
  • Bytewax allows you to develop and run your code locally, and then easily scale that code to multiple workers or processes without changes.
  • Bytewax can be used in both a streaming and batch context
  • Ability to leverage the Python ecosystem directly

Community

Slack Is the main forum for communication and discussion.

GitHub Issues is reserved only for actual issues. Please use the slack community for discussions.

Code of Conduct

Usage

Install the latest release with pip:

pip install bytewax

Example

Here is an example of a simple dataflow program using Bytewax:

from bytewax import Dataflow, run


flow = Dataflow()
flow.map(lambda x: x * x)
flow.capture()


if __name__ == "__main__":
    for epoch, x in sorted(run(flow, enumerate(range(10)))):
        print(x)

Running the program prints the following output:

0
1
4
9
16
25
36
49
64
81

For a more complete example, and documentation on the available operators, check out the User Guide.

For an exhaustive list of examples, checkout the /examples folder

License

Bytewax is licensed under the Apache-2.0 license.

Contributing

Contributions are welcome! This community and project would not be what it is without the contributors. All contributions, from bug reports to new features, are welcome and encouraged.



With ❤️ Bytewax

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Built Distributions

bytewax-0.10.0-cp310-none-win_amd64.whl (3.2 MB view hashes)

Uploaded cp310

bytewax-0.10.0-cp39-none-win_amd64.whl (3.2 MB view hashes)

Uploaded cp39

bytewax-0.10.0-cp38-none-win_amd64.whl (3.2 MB view hashes)

Uploaded cp38

bytewax-0.10.0-cp37-none-win_amd64.whl (3.2 MB view hashes)

Uploaded cp37

Supported by

AWS AWS Cloud computing Datadog Datadog Monitoring Facebook / Instagram Facebook / Instagram PSF Sponsor Fastly Fastly CDN Google Google Object Storage and Download Analytics Huawei Huawei PSF Sponsor Microsoft Microsoft PSF Sponsor NVIDIA NVIDIA PSF Sponsor Pingdom Pingdom Monitoring Salesforce Salesforce PSF Sponsor Sentry Sentry Error logging StatusPage StatusPage Status page