Pathway is a data processing framework which takes care of streaming data updates for you.

Getting Started | Example | Performance | Deployment | Resources | Documentation | Blog | Get Help

Pathway

Pathway is an open framework for high-throughput, low-latency real-time data processing. You use it to write Python code that seamlessly combines batch processing, streaming, and real-time APIs for LLM apps. Pathway's distributed runtime (🦀-🐍) delivers fresh results from your data pipelines whenever new inputs and requests arrive.

Pathway is an incremental data stream processing engine

Pathway was originally designed as a life-saver (or at least a time-saver) for Python developers and ML/AI engineers working with live data sources, where reacting quickly to fresh data matters. It is also a general-purpose tool: if you want to do streaming in Python, build an AI data pipeline, or are looking for your next Python data processing framework, keep reading.

Pathway provides a high-level programming interface in Python for defining data transformations, aggregations, and other operations on data streams. With Pathway, you can effortlessly design and deploy sophisticated data workflows that efficiently handle high volumes of data in real time.

Pathway is interoperable with various data sources and sinks such as Kafka, CSV files, SQL/NoSQL databases, and REST APIs, allowing you to connect and process data from different storage systems.

Typical use cases of Pathway include real-time data processing, ETL (Extract, Transform, Load) pipelines, data analytics, monitoring, anomaly detection, and recommendation. Pathway can also independently provide the backbone of a light LLMOps stack for real-time LLM applications.

In Pathway, data is represented in the form of Tables. Live data streams are also treated as Tables. The library provides a rich set of operations like filtering, joining, grouping, and windowing.
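To make "filtering", "grouping", and "windowing" concrete, here is a small conceptual sketch in plain Python. It deliberately does not use the Pathway API: the function name `tumbling_window_sums` and the `(timestamp, value)` event format are assumptions chosen for illustration only. It filters a list of events and then sums their values per fixed-size, non-overlapping (tumbling) time window.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def tumbling_window_sums(
    events: Iterable[Tuple[int, int]], window_size: int
) -> Dict[int, int]:
    """Sum event values per fixed-size, non-overlapping time window.

    Each event is a (timestamp, value) pair. The returned dict maps the
    start of every non-empty window to the sum of the values inside it.
    """
    sums: Dict[int, int] = defaultdict(int)
    for timestamp, value in events:
        # Round the timestamp down to the start of its window.
        window_start = (timestamp // window_size) * window_size
        sums[window_start] += value
    return dict(sums)


# Hypothetical input: (timestamp, value) pairs.
events = [(1, 10), (4, 5), (7, 1), (12, 2), (13, 3)]

# Filtering: keep only events whose value is at least 2.
filtered = [(t, v) for t, v in events if v >= 2]

# Grouping + windowing: sum the remaining values per 5-unit window.
result = tumbling_window_sums(filtered, window_size=5)
print(result)
```

In a streaming engine the same logic is expressed once on a Table and kept up to date as new rows arrive, rather than being recomputed over a finished list as in this sketch.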

For any questions, you will find the community and team behind the project on Discord.

Screencast animation of converting batch code to streaming by changing one keyword argument in the script.

Getting started

Installation

Pathway requires Python 3.10 or above.

You can install the current release of Pathway using pip:

$ pip install -U pathway

⚠️ Pathway is available on macOS and Linux. Users of other systems should run Pathway on a virtual machine.

Running Pathway locally

To use Pathway, you only need to import it:

import pathway as pw

Now, you can easily create your processing pipeline, and let Pathway handle the updates. Once your pipeline is created, you can launch the computation on streaming data with a one-line command:

pw.run()

You can then run your Pathway project (say, main.py) just like a normal Python script: $ python main.py. Alternatively, use the Pathway command-line launcher:

$ pathway spawn python main.py

Pathway natively supports multithreading. To launch your application with 3 threads, you can do as follows:

$ pathway spawn --threads 3 python main.py

To jumpstart a Pathway project, you can use our cookiecutter template.

Example

import pathway as pw

# Using the `demo` module to create a data stream
table = pw.demo.range_stream(nb_rows=50)
# Storing the stream into a CSV file
pw.io.csv.write(table, "output_table.csv")

# Summing all the values in a new table
sum_table = table.reduce(sum=pw.reducers.sum(pw.this.value))
# Storing the sum (which is a stream) in another CSV file
pw.io.csv.write(sum_table, "sum_table.csv")

# Now that the pipeline is built, the computation is started
pw.run()

Run this example in Google Colab!
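In the example above, the sum is itself a stream: every new input row changes the result. As a rough illustration of what "incremental" means here, the following plain-Python sketch keeps a running sum and, for each new value, emits a retraction of the previous result followed by an insertion of the new one. It does not use Pathway, and the `(value, diff)` update format and the name `incremental_sum` are assumptions made for this illustration, not a description of Pathway's internals or its exact output format.

```python
from typing import Iterable, Iterator, Optional, Tuple

# (value, diff): diff is +1 for an insertion and -1 for a retraction.
Update = Tuple[int, int]


def incremental_sum(values: Iterable[int]) -> Iterator[Update]:
    """Yield the updates to a running sum as input values arrive one by one."""
    total: Optional[int] = None  # no result has been emitted yet
    for value in values:
        if total is None:
            total = value
        else:
            yield (total, -1)  # retract the previously emitted sum
            total += value
        yield (total, +1)  # insert the updated sum


updates = list(incremental_sum(range(4)))
for value, diff in updates:
    print(value, diff)
```

Applying all updates in order leaves exactly one live result, the final sum, while any consumer that reads the stream midway still sees a consistent value.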

Deployment

Do you feel limited by a local run? If you want to scale your Pathway application, you may be interested in Pathway for Enterprise, which is tailored to end-to-end data processing and real-time intelligent analytics. It scales using distributed computing in the cloud and supports Kubernetes deployment.

You can learn more about the features of Pathway for Enterprise on our website.

If you are interested, don't hesitate to contact us to learn more.

Monitoring Pathway

Pathway comes with a monitoring dashboard that allows you to keep track of the number of messages sent by each connector and the latency of the system. The dashboard also includes log messages.

This dashboard is enabled by default; you can disable it by passing monitoring_level=pathway.MonitoringLevel.NONE to pathway.run().

Pathway dashboard

In addition to Pathway's built-in dashboard, you can use Prometheus to monitor your Pathway application.

Resources

See also: 📖 Pathway Documentation webpage (including API Docs).

Videos about Pathway

▶️ Building an LLM Application without a vector database - by Jan Chorowski (7min 56s)

▶️ Linear regression on a Kafka Stream - by Richard Pelgrim (7min 53s)

▶️ Introduction to reactive data processing - by Adrian Kosowski (27min 54s)

Guides

Tutorials

Showcases

External and community content

If you would like to share with us some Pathway-related content, please give an admin a shout on Discord.

Manul conventions

Manuls (aka Pallas's Cats) are creatures with fascinating habits. As a tribute to them, we usually read pw, one of the most frequent tokens in Pathway code, as: "paw".

manul

Performance

Pathway is designed to outperform state-of-the-art technologies for streaming and batch data processing, including Flink, Spark, and Kafka Streams. It also makes it possible to implement many algorithms and UDFs in streaming mode that are not readily supported by other streaming frameworks (in particular temporal joins, iterative graph algorithms, and machine learning routines).

If you are curious, here are some benchmarks to play with.

WordCount Graph

If you run your own benchmarks, please let us know. We treat cases where Pathway underperforms the same way as bugs (i.e., to our knowledge, they shouldn't happen...).

Coming soon

Here are some features we plan to incorporate in the near future:

  • Enhanced monitoring, observability, and data drift detection (integrates with Grafana visualization and other dashboarding tools).
  • New connectors: interoperability with Delta Lake and Snowflake data sources.
  • Easier connection setup for MongoDB.
  • More performant garbage collection.

Dependencies

Pathway is made to run in a "clean" Linux/macOS + Python environment. When installing the pathway package with pip (from a wheel), you are likely to encounter a small number of Python package dependencies, such as sqlglot (used in the SQL API) and python-sat (useful for resolving dependencies during compilation). All necessary Rust crates are pre-built; the Rust compiler is not required to install Pathway unless you build from source. A modified version of Timely/Differential Dataflow (which provides a dataflow assembly layer) is part of this repo.

License

Pathway is distributed under the BSL 1.1 License, which allows unlimited non-commercial use, as well as use of the Pathway package for most commercial purposes, free of charge. Code in this repository automatically converts to Open Source (Apache 2.0 License) after 4 years. Some public repos which are complementary to this one (examples, libraries, connectors, etc.) are licensed as Open Source, under the MIT license.

Contribution guidelines

If you develop a library or connector which you would like to integrate with this repo, we suggest releasing it first as a separate repo under an MIT or Apache 2.0 license.

For all concerns regarding core Pathway functionalities, Issues are encouraged. For further information, don't hesitate to engage with Pathway's Discord community.

Get Help

If you have any questions, issues, or just want to chat about Pathway, we're here to help!

Our team is always happy to help you and ensure that you get the most out of Pathway. If you would like to better understand how best to use Pathway in your project, please don't hesitate to reach out to us.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distributions

No source distribution files available for this release. See tutorial on generating distribution archives.

Built Distributions

If you're not sure about the file name format, learn more about wheel file names.

pathway-0.7.6-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl (15.1 MB)

Uploaded: CPython 3.10+, manylinux (glibc 2.17+), x86-64

pathway-0.7.6-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl (14.1 MB)

Uploaded: CPython 3.10+, manylinux (glibc 2.17+), ARM64

pathway-0.7.6-cp310-abi3-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl (27.1 MB)

Uploaded: CPython 3.10+, macOS 10.15+ universal2 (ARM64, x86-64), macOS 10.15+ x86-64, macOS 11.0+ ARM64

File hashes

Hashes for pathway-0.7.6-cp310-abi3-manylinux_2_17_x86_64.manylinux2014_x86_64.whl
SHA256: 2f12c208bea532ad39bb8e9856b0d7e05562bac2d4606fb2e5197586710680a5
MD5: a47bbbd97219c0b319e2aebb6cb2c0f6
BLAKE2b-256: 5ab765c1f90fc2a9926d9786079c50cfd30de0b07b01f7943f04ea44d76cb562

Hashes for pathway-0.7.6-cp310-abi3-manylinux_2_17_aarch64.manylinux2014_aarch64.whl
SHA256: 08e61f24ef87312fb3477e6b02912ed393811f8e4b03611afed883cda3a3ddd7
MD5: ee1197568f8d677bf779468538b5b206
BLAKE2b-256: c97e381a264ad14835f57ced32ba1c9d66111d5141c181229f819c7091c4865d

Hashes for pathway-0.7.6-cp310-abi3-macosx_10_15_x86_64.macosx_11_0_arm64.macosx_10_15_universal2.whl
SHA256: 7898a01ff5a45f9e43681077098049738606a66ef05534063ac97a1798d981d5
MD5: ccd24a6f744f0ed4a70a04c1e1368f45
BLAKE2b-256: b2b70e94a6bd089117748104e601597a89c80b8566aa915bc18ff834f4e9bc98