Python bindings and extensions for Velox

Project description

Velox is a composable execution engine distributed as an open source C++ library. It provides reusable, extensible, and high-performance data processing components that can be (re-)used to build data management systems focused on different analytical workloads, including batch, interactive, stream processing, and AI/ML. Velox was created by Meta and it is currently developed in partnership with IBM/Ahana, Intel, Voltron Data, Microsoft, ByteDance and many other companies.

In common usage scenarios, Velox takes a fully optimized query plan as input and performs the described computation. Because Velox does not provide a SQL parser, a dataframe layer, or a query optimizer, it is usually not meant to be used directly by end users; rather, it is mostly used by developers integrating and optimizing their compute engines.

Velox provides the following high-level components:

  • Type: a generic typing system that supports scalar, complex, and nested types, such as structs, maps, arrays, etc.
  • Vector: an Arrow-compatible columnar memory layout module, providing encodings such as Flat, Dictionary, Constant, and Sequence/RLE, in addition to a lazy materialization pattern and support for out-of-order writes.
  • Expression Eval: a fully vectorized expression evaluation engine that allows expressions to be efficiently executed on top of Vector/Arrow encoded data.
  • Functions: sets of vectorized scalar, aggregate, and window function implementations following Presto and Spark semantics.
  • Operators: implementations of relational operators such as scans, writes, projections, filtering, grouping, ordering, shuffle/exchange, joins (hash, merge, and nested loop), unnest, and more.
  • I/O: a connector interface for extensible data sources and sinks, supporting different file formats (ORC/DWRF, Parquet, Nimble) and storage adapters (S3, HDFS, GCS, ABFS, local files).
  • Network Serializers: an interface where different wire protocols can be implemented, used for network communication, supporting PrestoPage and Spark's UnsafeRow.
  • Resource Management: a collection of primitives for handling computational resources, such as memory arenas and buffer management, tasks, drivers, and thread pools for CPU and thread execution, spilling, and caching.

Velox is extensible and allows developers to define their own engine-specific specializations, including:

  1. Custom types
  2. Simple and vectorized functions
  3. Aggregate functions
  4. Window functions
  5. Operators
  6. File formats
  7. Storage adapters
  8. Network serializers

Examples

Examples of extensibility and integration with different component APIs can be found here.

Documentation

Developer guides detailing many aspects of the library, in addition to the list of available functions, can be found here.

Blog posts are available here.

Community

Velox is an open source project supported by a community of individual contributors and organizations. The project's technical governance mechanics are described in this document.

Project maintainers are listed here.

The main communication channels with the Velox OSS community are the Velox-OSS Slack workspace, GitHub Issues, and Discussions.

For access to the Velox Slack workspace, please add a comment to this Discussion.

Contributing

Check our contributing guide to learn about how to contribute to the project.

License

Velox is licensed under the Apache 2.0 License. A copy of the license can be found here.

Getting Started

Get the Velox Source

git clone https://github.com/facebookincubator/velox.git
cd velox

Once Velox is checked out, the first step is to install the dependencies. Details on the dependencies and how Velox manages some of them for you can be found here.

Velox also provides the following scripts to help developers set up and install Velox dependencies for a given platform.

Setting up dependencies

The following setup scripts use the DEPENDENCY_DIR environment variable to set the location to download and build packages. This defaults to deps-download in the current working directory.

Use INSTALL_PREFIX to set the install directory of the packages. This defaults to deps-install in the current working directory on macOS and to the default install location (e.g. /usr/local) on Linux. Using the default install location /usr/local on macOS is discouraged since this location is used by certain Homebrew versions.

Manually add the INSTALL_PREFIX value to your IDE or shell environment, for example by adding export INSTALL_PREFIX=/Users/$USERNAME/velox/deps-install to ~/.zshrc, so that subsequent Velox builds can use the installed packages.

You can reuse DEPENDENCY_DIR and INSTALL_PREFIX for Velox clients such as Prestissimo by specifying a common shared directory.
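As a minimal sketch of the shared-directory setup described above, the snippet below derives both variables from a single root. VELOX_DEPS_ROOT and its default path are hypothetical names introduced here for illustration; only DEPENDENCY_DIR and INSTALL_PREFIX are the variables the text says the setup scripts use.

```shell
# VELOX_DEPS_ROOT is a hypothetical helper variable, not something the Velox scripts read.
VELOX_DEPS_ROOT="${VELOX_DEPS_ROOT:-$HOME/velox-deps}"

# Location where the setup scripts download and build packages.
export DEPENDENCY_DIR="$VELOX_DEPS_ROOT/deps-download"
# Location where the built packages are installed.
export INSTALL_PREFIX="$VELOX_DEPS_ROOT/deps-install"

echo "DEPENDENCY_DIR=$DEPENDENCY_DIR"
echo "INSTALL_PREFIX=$INSTALL_PREFIX"
```

Placing these lines in a shell profile (or sourcing them before running a setup script) would let Velox and a client such as Prestissimo point at the same directories.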

The BUILD_THREADS environment variable controls the build parallelism for dependencies, overriding the default number of parallel processes used for compiling and linking. The default value is the number of cores on your machine. Lowering it is useful if your machine has many cores but not enough memory to run all compile and link processes in parallel, which can result in OOM kills by the kernel.
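One possible way to choose a BUILD_THREADS value is to cap the core count by how many compile jobs fit in memory. The helper below is a sketch under that assumption; the function name pick_build_threads and the 2 GB-per-job figure are illustrative and do not come from the Velox scripts.

```shell
# Print the smaller of the core count and the number of jobs that fit in memory.
# Arguments: <cores> <memory in GB> <assumed GB needed per compile job>
pick_build_threads() {
  cores=$1
  mem_gb=$2
  gb_per_job=$3
  by_mem=$(( mem_gb / gb_per_job ))
  # Always allow at least one job.
  if [ "$by_mem" -lt 1 ]; then
    by_mem=1
  fi
  if [ "$by_mem" -lt "$cores" ]; then
    echo "$by_mem"
  else
    echo "$cores"
  fi
}

# Example: 16 cores and 8 GB of RAM, assuming 2 GB per job (an illustrative figure).
BUILD_THREADS="$(pick_build_threads 16 8 2)"
export BUILD_THREADS
echo "BUILD_THREADS=$BUILD_THREADS"
```

With these example inputs, memory is the limiting factor, so the value is lower than the core count. Exporting the variable before running a setup script applies the limit to the dependency builds.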

Setting up on macOS

On a macOS machine (either Intel or Apple silicon) you can set up and then build like so:

$ ./scripts/setup-macos.sh
$ make

With macOS 14.4 and Xcode 15.3, where m4 is missing, you can either

  1. install m4 via brew:
$ brew install m4
$ export PATH=/opt/homebrew/opt/m4/bin:$PATH
  2. or use gm4 instead:
$ M4=/usr/bin/gm4 make
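The choice between the two options can also be made automatically. The following is a sketch, not part of the Velox scripts: pick_m4 is a hypothetical helper that prefers m4 and falls back to gm4 when only the latter is on PATH.

```shell
# Print the path of the first m4-compatible binary found on PATH, preferring m4.
pick_m4() {
  for candidate in m4 gm4; do
    if command -v "$candidate" >/dev/null 2>&1; then
      command -v "$candidate"
      return 0
    fi
  done
  return 1
}

if M4_BIN="$(pick_m4)"; then
  echo "using M4=$M4_BIN"
else
  echo "no m4 or gm4 found on PATH" >&2
fi
```

The result could then be passed to the build in the same way as the second option, for example M4="$M4_BIN" make.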

Setting up on Ubuntu (20.04 or later)

The supported architectures are x86_64 (avx, sse), and AArch64 (apple-m1+crc, neoverse-n1). You can build like so:

$ ./scripts/setup-ubuntu.sh
$ make

Setting up on CentOS 9 Stream with adapters

Velox adapters include file systems such as AWS S3, Google Cloud Storage, and Azure Blob File System. These adapters require installation of additional libraries. Once you have checked out Velox, you can set up and build like so:

$ ./scripts/setup-centos9.sh
$ ./scripts/setup-adapters.sh
$ make

Note that setup-adapters.sh supports macOS and Ubuntu 20.04 or later.

Using Clang on Linux

Clang 15 can additionally be installed during the setup step for Ubuntu 22.04/24.04 and CentOS 9 by setting the USE_CLANG environment variable prior to running the platform-specific setup script.

$ export USE_CLANG=true

This will install and use Clang 15 to build the dependencies instead of using the default GCC compiler.

Once completed, and before running any make command, set the compiler to be used:

$ export CC=/usr/bin/clang-15
$ export CXX=/usr/bin/clang++-15
$ make

Building Velox

Run make in the root directory to compile the sources. For development, use make debug to build a non-optimized debug version, or make release to build an optimized version. Use make unittest to build and run tests.

Note that:

  • Velox requires at minimum GCC 11.0 or Clang 15.0.
  • Velox requires the CPU to support instruction sets:
    • bmi
    • bmi2
    • f16c
  • Velox tries to use the following (or equivalent) instruction sets where available:
    • On Intel CPUs
      • avx
      • avx2
      • sse
    • On ARM
      • Neon
      • Neon64
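On an x86-64 Linux machine, the required instruction sets listed above can be checked against the flags line of /proc/cpuinfo. This is a sketch rather than an official check: missing_cpu_flags is a hypothetical helper, and it assumes the Linux flag name bmi1 for the bmi requirement. On other platforms /proc/cpuinfo is absent or lacks a flags line, so the script only reports that it could not read the flags.

```shell
# Print each required flag that is absent from the given space-separated flag list.
# Assumption: Linux reports the BMI instruction set as "bmi1" in /proc/cpuinfo.
missing_cpu_flags() {
  flags=" $1 "
  for required in bmi1 bmi2 f16c; do
    case "$flags" in
      *" $required "*) ;;
      *) echo "$required" ;;
    esac
  done
}

# Read the flag list of the first CPU entry (x86-64 Linux only).
cpu_flags="$(grep -m1 '^flags' /proc/cpuinfo 2>/dev/null | cut -d: -f2 || true)"
if [ -z "$cpu_flags" ]; then
  echo "could not read CPU flags on this platform"
else
  missing="$(missing_cpu_flags "$cpu_flags")"
  echo "missing required flags: ${missing:-none}"
fi
```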

Build metrics for Velox are published at https://facebookincubator.github.io/velox/bm-report/

Building Velox with docker-compose

If you don't want to install the system dependencies required to build Velox, you can also build and run tests for Velox in a Docker container using docker-compose. Use the following commands:

$ docker-compose build ubuntu-cpp
$ docker-compose run --rm ubuntu-cpp

If you want to increase or decrease the number of threads used when building Velox you can override the NUM_THREADS environment variable by doing:

$ docker-compose run -e NUM_THREADS=<NUM_THREADS_TO_USE> --rm ubuntu-cpp

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pyvelox-0.2.0.tar.gz (13.2 MB)

Uploaded Source

Built Distributions

pyvelox-0.2.0-cp313-cp313-manylinux_2_28_x86_64.whl (69.4 MB)

Uploaded CPython 3.13 manylinux: glibc 2.28+ x86-64

pyvelox-0.2.0-cp313-cp313-macosx_14_0_arm64.whl (38.5 MB)

Uploaded CPython 3.13 macOS 14.0+ ARM64

pyvelox-0.2.0-cp313-cp313-macosx_13_0_x86_64.whl (44.7 MB)

Uploaded CPython 3.13 macOS 13.0+ x86-64

pyvelox-0.2.0-cp312-cp312-manylinux_2_28_x86_64.whl (69.4 MB)

Uploaded CPython 3.12 manylinux: glibc 2.28+ x86-64

pyvelox-0.2.0-cp312-cp312-macosx_14_0_arm64.whl (38.5 MB)

Uploaded CPython 3.12 macOS 14.0+ ARM64

pyvelox-0.2.0-cp312-cp312-macosx_13_0_x86_64.whl (44.7 MB)

Uploaded CPython 3.12 macOS 13.0+ x86-64

pyvelox-0.2.0-cp311-cp311-manylinux_2_28_x86_64.whl (69.4 MB)

Uploaded CPython 3.11 manylinux: glibc 2.28+ x86-64

pyvelox-0.2.0-cp311-cp311-macosx_14_0_arm64.whl (38.5 MB)

Uploaded CPython 3.11 macOS 14.0+ ARM64

pyvelox-0.2.0-cp311-cp311-macosx_13_0_x86_64.whl (44.7 MB)

Uploaded CPython 3.11 macOS 13.0+ x86-64

pyvelox-0.2.0-cp310-cp310-manylinux_2_28_x86_64.whl (69.4 MB)

Uploaded CPython 3.10 manylinux: glibc 2.28+ x86-64

pyvelox-0.2.0-cp310-cp310-macosx_14_0_arm64.whl (38.5 MB)

Uploaded CPython 3.10 macOS 14.0+ ARM64

pyvelox-0.2.0-cp310-cp310-macosx_13_0_x86_64.whl (44.7 MB)

Uploaded CPython 3.10 macOS 13.0+ x86-64

File details

Details for the file pyvelox-0.2.0.tar.gz.

File metadata

  • Download URL: pyvelox-0.2.0.tar.gz
  • Upload date:
  • Size: 13.2 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.12.9

File hashes

Hashes for pyvelox-0.2.0.tar.gz
Algorithm Hash digest
SHA256 090e9ef3fcfd35f8dd9a44b50e6f8e498f68865501dec1cdd36afa01d2c486fc
MD5 a5fd87f9a59fe9c64b11776df2340a4f
BLAKE2b-256 24f8191b530d92434f5ba4f3136924848ecfcf76ece1be9a0085c979a1174c82

See more details on using hashes here.
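As a sketch of how a published digest might be checked after a download, the helpers below compare a file's SHA256 with an expected value. The function names sha256_of and verify_sha256 are hypothetical, and the snippet assumes that either sha256sum or shasum is installed; the expected digest is the SHA256 value listed above for the source distribution.

```shell
# Compute the SHA256 digest of a file using whichever tool is available.
sha256_of() {
  if command -v sha256sum >/dev/null 2>&1; then
    sha256sum "$1" | cut -d' ' -f1
  else
    shasum -a 256 "$1" | cut -d' ' -f1
  fi
}

# Succeed only if the file's digest equals the expected value.
verify_sha256() {
  [ "$(sha256_of "$1")" = "$2" ]
}

# Expected digest copied from the hash listing for pyvelox-0.2.0.tar.gz.
expected=090e9ef3fcfd35f8dd9a44b50e6f8e498f68865501dec1cdd36afa01d2c486fc
if [ -f pyvelox-0.2.0.tar.gz ]; then
  if verify_sha256 pyvelox-0.2.0.tar.gz "$expected"; then
    echo "hash OK"
  else
    echo "hash MISMATCH"
  fi
fi
```

The check is skipped when the archive is not present in the current directory, so the snippet is safe to run before downloading.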

File details

Details for the file pyvelox-0.2.0-cp313-cp313-manylinux_2_28_x86_64.whl.

File hashes

Hashes for pyvelox-0.2.0-cp313-cp313-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 d127d541ad37192dd8c62502e11ba6590a4a3a94add57e965b10c11dacde57aa
MD5 51aa462989d122e0b07cade86e5da9f5
BLAKE2b-256 872277253ec1d353d27d2b2087508100e6e9d157e814baf01242ce632487e2d4

File details

Details for the file pyvelox-0.2.0-cp313-cp313-macosx_14_0_arm64.whl.

File hashes

Hashes for pyvelox-0.2.0-cp313-cp313-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 fca3bc68832539f7631a3ba9ac6fd5ede03fd8421158a50d5a71d03c729d0cda
MD5 8236a49d88dc3c53552ab5877995a3b8
BLAKE2b-256 c21aabeb3c321df4c81a07d0ce9092aab2bda92104e0193f52478428f7f3d760

File details

Details for the file pyvelox-0.2.0-cp313-cp313-macosx_13_0_x86_64.whl.

File hashes

Hashes for pyvelox-0.2.0-cp313-cp313-macosx_13_0_x86_64.whl
Algorithm Hash digest
SHA256 b66ada75ab22ee490cdd290ce1bac8b1ce5751ccb8a43a667ffa6b8b5d2e9853
MD5 47b67999a1192487ab8223002af1a237
BLAKE2b-256 ccad9187de922350f9ea0c983828c70abc0929c89d7e16e54f34846a09e3bda7

File details

Details for the file pyvelox-0.2.0-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for pyvelox-0.2.0-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 a025508b92070729ecdfbe3c21f98744a3ce960b175961df1a49405783b0ccff
MD5 71b00bf8f707f194859545fe13307d02
BLAKE2b-256 747e3a71d3ef4fd85e843956548785c44ce0bbc02ca77e19572ba12755d312f8

File details

Details for the file pyvelox-0.2.0-cp312-cp312-macosx_14_0_arm64.whl.

File hashes

Hashes for pyvelox-0.2.0-cp312-cp312-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 36f3862afaa79585f869659b7fee2c1dbcd3beacc639e4930c2b063380b7ee2c
MD5 8e81e6e6cea5ed020508477875d8db86
BLAKE2b-256 b3250bf913857f18a034c7916625f62b4f45f48ac22f95c4af2a9225ff815d3d

File details

Details for the file pyvelox-0.2.0-cp312-cp312-macosx_13_0_x86_64.whl.

File hashes

Hashes for pyvelox-0.2.0-cp312-cp312-macosx_13_0_x86_64.whl
Algorithm Hash digest
SHA256 e516263180da0a1a7b6c51cffcb5b2ba6a13361c2c64042d10606422dd303190
MD5 1b8bc4c91ead6b9ac7be5ca2cb89c0af
BLAKE2b-256 5010ddba1caaa58912fc1dd8af7c8112423a6173aa4eb22e6ed0c4d44a1448dc

File details

Details for the file pyvelox-0.2.0-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for pyvelox-0.2.0-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 f0977dc9a5d0383a2dee1a80d4c6960b08690649750e542e41793337bc28d7b6
MD5 7ba97753328c588e1a31876f85ebad5d
BLAKE2b-256 22d9a1b2f6e466439db5395d72c2ae00636ba5d351333d606f2c1c6bca3ac0ef

File details

Details for the file pyvelox-0.2.0-cp311-cp311-macosx_14_0_arm64.whl.

File hashes

Hashes for pyvelox-0.2.0-cp311-cp311-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 f5a3b6e964440ea0a9c5a9d5e04dfcf74c8af72e70d6168ace6bdb8994cffcf6
MD5 99c4928d4a14719d2820e3b1987ffbe4
BLAKE2b-256 25f4a1624dce8d71b1442bd7528e2488281bbee709ba46951b824f375b0b9024

File details

Details for the file pyvelox-0.2.0-cp311-cp311-macosx_13_0_x86_64.whl.

File hashes

Hashes for pyvelox-0.2.0-cp311-cp311-macosx_13_0_x86_64.whl
Algorithm Hash digest
SHA256 c49de2be9023a14fa193cef0cf92234543d180cfd5cbb04db1c4c9da28912606
MD5 3cd1eb4ffad7274e4c3b89906efbb49e
BLAKE2b-256 a727c58f18dda352a705ed627f822fa73535c783030d76c2d4d7d9132ac4d423

File details

Details for the file pyvelox-0.2.0-cp310-cp310-manylinux_2_28_x86_64.whl.

File hashes

Hashes for pyvelox-0.2.0-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 7df90bf169dec9a7ebf1ef3922ce1e9f6cee3488aa2b62a1a26a5ca3103a8d2e
MD5 9c2b2d8ac0c36eebfd77cd0a04627c00
BLAKE2b-256 c12fd1e4cddb219bb213fdd0f04b3aae09e602d1f10b3d69cd46248e701428e7

File details

Details for the file pyvelox-0.2.0-cp310-cp310-macosx_14_0_arm64.whl.

File hashes

Hashes for pyvelox-0.2.0-cp310-cp310-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 51159f82532bb774e8ae38d88055cbe7946514b0cb03901ac593d3fc7c7d3c4a
MD5 2f307a2011826705d8f1b79eb60aa116
BLAKE2b-256 c40943cf60c776aac40fffa7612d38db8a250e3ea8efffd64d3d4a29693b6a34

File details

Details for the file pyvelox-0.2.0-cp310-cp310-macosx_13_0_x86_64.whl.

File hashes

Hashes for pyvelox-0.2.0-cp310-cp310-macosx_13_0_x86_64.whl
Algorithm Hash digest
SHA256 77ab7e5676c12ed25612779310c73783354b71d52f3359aabc344131db13eee7
MD5 42ed10cc5ad8faca48a73e40edcfb738
BLAKE2b-256 4705c62fe73de222075437785c3e695f7a7363f72ffca45b0ae3393d63120651
