PathsData Distributed Query Engine - Python client for distributed SQL execution
PathsData Distributed
This project is versioned and released independently from the main Ballista project and is intentionally not part of the default Cargo workspace so that it doesn't cause overhead for maintainers of the main Ballista codebase.
Creating a SessionContext
[!IMPORTANT] The current approach is to support the DataFusion Python API. This approach has known limitations, and some cases produce errors. We are still working out the best way to support the DataFusion Python interface. More details can be found at #1142.
Creates a new context and connects to a Ballista scheduler process.
>>> from pathsdata_distributed import BallistaBuilder
>>> ctx = BallistaBuilder().standalone()
Example SQL Usage
>>> ctx.sql("create external table t stored as parquet location './testdata/test.parquet'")
>>> df = ctx.sql("select * from t limit 5")
>>> pyarrow_batches = df.collect()
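When the table name or file location comes from a variable, it can help to build the statement in one place. The following is a minimal sketch, not part of the package: `external_parquet_ddl` and `register_parquet` are hypothetical helper names, and `ctx` is assumed to be a context created as shown earlier. The only call made on it is `ctx.sql`, as in the example above.

```python
import re

# Conservative pattern for unquoted table names (an assumption of this sketch).
_IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def external_parquet_ddl(table_name: str, location: str) -> str:
    """Build the same CREATE EXTERNAL TABLE statement as the example above."""
    if not _IDENTIFIER.fullmatch(table_name):
        raise ValueError(f"unsafe table name: {table_name!r}")
    # Double any single quote so the path stays inside the SQL string literal.
    escaped = location.replace("'", "''")
    return (
        f"create external table {table_name} "
        f"stored as parquet location '{escaped}'"
    )


def register_parquet(ctx, table_name: str, location: str) -> None:
    """Register a Parquet file as a table on an existing context."""
    ctx.sql(external_parquet_ddl(table_name, location))
```

With a context in hand, `register_parquet(ctx, "t", "./testdata/test.parquet")` issues the same statement as the first line of the SQL example.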
Example DataFrame Usage
>>> df = ctx.read_parquet('./testdata/test.parquet').limit(5)
>>> pyarrow_batches = df.collect()
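The DataFrame calls above can be wrapped in a small function for quick sanity checks. This is a sketch under stated assumptions: `preview_row_count` is a hypothetical helper, and it assumes `collect()` returns a list of PyArrow record batches (as the variable name `pyarrow_batches` suggests), each exposing `num_rows`.

```python
def preview_row_count(ctx, path: str, limit: int = 5) -> int:
    """Read a Parquet file, keep at most `limit` rows, and count them."""
    # Same call chain as the DataFrame example: read_parquet -> limit -> collect.
    df = ctx.read_parquet(path).limit(limit)
    batches = df.collect()
    # Assumption: each batch is a pyarrow.RecordBatch with a num_rows attribute.
    return sum(batch.num_rows for batch in batches)
```

For example, `preview_row_count(ctx, "./testdata/test.parquet")` should return at most 5 for the file used above.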
Scheduler and Executor
The scheduler and executors can be configured and started from Python code.
To start a scheduler:
from pathsdata_distributed import BallistaScheduler
scheduler = BallistaScheduler()
scheduler.start()
scheduler.wait_for_termination()
To start an executor:
from pathsdata_distributed import BallistaExecutor
executor = BallistaExecutor()
executor.start()
executor.wait_for_termination()
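Both snippets follow the same start-then-wait pattern, so a launcher script can share one helper. The sketch below is illustrative only: `run_until_terminated` is a hypothetical name, and it relies solely on the `start()` and `wait_for_termination()` methods shown above. Treating Ctrl+C as a normal way to stop waiting is an assumption of this sketch, not documented behavior of the package.

```python
def run_until_terminated(service) -> None:
    """Start a scheduler or executor and block until it terminates."""
    service.start()
    try:
        # Blocks the calling thread, as in the snippets above.
        service.wait_for_termination()
    except KeyboardInterrupt:
        # Assumption: Ctrl+C in the launcher should end the wait quietly.
        pass
```

A scheduler script would then call `run_until_terminated(BallistaScheduler())`, and an executor script `run_until_terminated(BallistaExecutor())`.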
Development Process
Creating Virtual Environment
python3 -m venv venv
source venv/bin/activate
pip3 install -r requirements.txt
Building
maturin develop
Note that you can also run maturin develop --release to get a release build locally.
Testing
python3 -m pytest
File details
Details for the file pathsdata_distributed-43.0.0.tar.gz.
File metadata
- Download URL: pathsdata_distributed-43.0.0.tar.gz
- Upload date:
- Size: 47.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | a0eb0f566d8687f9a4a7741d0b2c163bd9c68352def15272328a4a767dd3c4d1 |
| MD5 | c19b5efa9c9c428f946365248f87f8de |
| BLAKE2b-256 | 1a0f3365f5c199219d8ed05641b8392701cd8294a25ccab4a31c1120dd8ae573 |
File details
Details for the file pathsdata_distributed-43.0.0-cp38-abi3-manylinux_2_39_x86_64.whl.
File metadata
- Download URL: pathsdata_distributed-43.0.0-cp38-abi3-manylinux_2_39_x86_64.whl
- Upload date:
- Size: 55.2 MB
- Tags: CPython 3.8+, manylinux: glibc 2.39+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: maturin/1.9.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 9324c4de6942fb67c29f065407f0650e3bb7e2258449fe5cde9518636b8ff1cb |
| MD5 | 56272f5b2ee5eb529288054eab316b5c |
| BLAKE2b-256 | 8d101722f5d554b745503364f8d990fe57b7adae9a238823ee3f63b561354d2d |