A High-Performance Rendering Framework with Layered and Unified Interfaces on Stream Architectures

LuisaCompute

LuisaCompute is a high-performance cross-platform computing framework for graphics and beyond.

LuisaCompute is also the rendering framework described in the SIGGRAPH Asia 2022 paper

LuisaRender: A High-Performance Rendering Framework with Layered and Unified Interfaces on Stream Architectures.

See also LuisaRender for the rendering application as described in the paper; and please visit the project page for other information about the paper and the project.

You are welcome to join the discussion channel on Discord!

Users in mainland China are also welcome to join our QQ group: 295618382.

Overview

LuisaCompute seeks to balance the seemingly ever-conflicting pursuits of unification, programmability, and performance. To achieve this goal, we design three major components:

  • A domain-specific language (DSL) embedded inside modern C++ for kernel programming exploiting JIT code generation and compilation;
  • A unified runtime with resource wrappers for cross-platform resource management and command scheduling; and
  • Multiple optimized backends, including CUDA, DirectX, Metal, and CPU.

To demonstrate the practicality of the system, we also build a Monte Carlo renderer, LuisaRender, atop the framework, which is faster than the state-of-the-art rendering frameworks on modern GPUs.

Embedded Domain-Specific Language

The DSL in our system provides a unified approach to authoring kernels, i.e., programmable computation tasks on the device. Distinct from typical graphics APIs that use standalone shading languages for device code, our system unifies the authoring of both the host-side logic and device-side kernels into the same language, i.e., modern C++.

The implementation purely relies on the C++ language itself, without any custom preprocessing pass or compiler extension. We exploit meta-programming techniques to simulate the syntax, and function/operator overloading to dynamically trace the user-defined kernels. ASTs are constructed during the tracing as an intermediate representation and later handed over to the backends for generating concrete, platform-dependent shader source code.
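
To make the tracing idea concrete, here is a standalone miniature in plain C++. It is only a sketch of the general technique (operator overloading that records a tree instead of computing values), not LuisaCompute's actual classes; `Node`, `Traced`, `variable`, `emit`, and `madd` are all hypothetical names invented for this illustration.

```cpp
#include <memory>
#include <string>
#include <utility>

// Hypothetical miniature of DSL tracing; these are not LuisaCompute types.
struct Node {
    std::string op;                 // "var" for leaves, otherwise the operator symbol
    std::string text;               // variable name for leaves
    std::shared_ptr<Node> lhs, rhs; // children of binary nodes
};

// A value that records how it was computed instead of computing anything.
struct Traced {
    std::shared_ptr<Node> node;
};

inline Traced variable(std::string name) {
    return {std::make_shared<Node>(Node{"var", std::move(name), nullptr, nullptr})};
}

inline Traced binary(const char *op, const Traced &a, const Traced &b) {
    return {std::make_shared<Node>(Node{op, "", a.node, b.node})};
}

// Overloaded operators append nodes to the tree while ordinary C++ code runs.
inline Traced operator+(const Traced &a, const Traced &b) { return binary("+", a, b); }
inline Traced operator*(const Traced &a, const Traced &b) { return binary("*", a, b); }

// A toy "backend": walks the recorded tree and emits source text.
inline std::string emit(const Node &n) {
    if (n.op == "var") { return n.text; }
    return "(" + emit(*n.lhs) + " " + n.op + " " + emit(*n.rhs) + ")";
}

// An ordinary C++ function template; calling it with Traced arguments records a tree.
template<typename T>
T madd(T a, T b, T c) { return a * b + c; }
```

Calling `madd(variable("x"), variable("y"), variable("z"))` runs the function body once on the host and leaves behind a tree that `emit` turns into the text `((x * y) + z)`. A real backend would generate platform-specific shader source from a much richer representation.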

Example program in the embedded DSL:

Callable to_srgb = [](Float3 x) {
    $if (x <= 0.00031308f) {
        x = 12.92f * x;
    } $else {
        x = 1.055f * pow(x, 1.f / 2.4f) - .055f;
    };
    return x;
};
Kernel2D fill = [&](ImageFloat image) {
    auto coord = dispatch_id().xy();
    auto size = make_float2(dispatch_size().xy());
    auto rg = make_float2(coord) / size;
    // invoke the callable
    auto srgb = to_srgb(make_float3(rg, 1.f));
    image.write(coord, make_float4(srgb, 1.f));
};

Unified Runtime with Resource Wrappers

Like the RHIs in game engines, we introduce an abstract runtime layer to re-unify the fragmented graphics APIs across platforms. It extracts the common concepts and constructs shared by the backend APIs and plays the bridging role between the high-level frontend interfaces and the low-level backend implementations.

On the programming interfaces for users, we provide high-level resource wrappers to ease programming and eliminate boilerplate code. They are strongly and statically typed modern C++ objects, which not only simplify the generation of commands via convenient member methods but also support close interaction with the DSL. Moreover, with the resource usage information in kernels and commands, the runtime automatically probes the dependencies between commands and re-schedules them to improve hardware utilization.

Multiple Backends

The backends are the final realizers of computation. They generate concrete shader sources from the ASTs and compile them into native shaders. They implement the virtual device interfaces with low-level platform-dependent API calls and translate the intermediate command representations into native kernel launches and command dispatches.

Currently, we have 3 working GPU backends for the C++ and Python frontends, based on CUDA, Metal, and DirectX, respectively, and a CPU backend (re-)implemented in Rust for debugging purposes and as a fallback.

Python Frontend

Besides the native C++ DSL and runtime interfaces, we are also working on a Python frontend and have published early-access packages to PyPI. You may install the pre-built wheels with pip (Python >= 3.10 required):

python -m pip install luisa-python

You may also build your own wheels with pip:

python -m pip wheel <path-to-project> -w <output-dir>

Examples using the Python frontend can be found under src/tests/python.

Note: Due to the different syntax and idioms of Python and C++, the Python frontend does not mirror the C++ DSL and APIs one-to-one. For instance, Python does not have a dedicated reference type qualifier, so we follow the Python idiom that structures and arrays are passed as references to @luisa.func and built-in types (scalar, vector, matrix, etc.) as values by default.

C API and Frontends in Other Languages

We are also making a C API for creating other language bindings and frontends (e.g., in Rust and C#).

Building

Note: LuisaCompute is a rendering framework rather than a renderer itself. It is designed to provide general computation functionalities on modern stream-processing hardware, on which high-performance, cross-platform graphics applications can be easily built. If you would like to just try a Monte Carlo renderer out of the box rather than building one from scratch, please see LuisaRender.

Preparation

  • Check your hardware and platform. Currently, we support CUDA on Linux and Windows; DirectX on Windows; Metal on macOS; and CPU on all the major platforms. For CUDA, an RTX-enabled graphics card, e.g., NVIDIA RTX 20 and 30 series, is required. For DirectX, a DirectX-12.1 & Shader Model 6.5 compatible graphics card is required.

  • Prepare the environment and dependencies. We recommend using the latest IDEs, Compilers, XMake/CMake, CUDA drivers, etc. Since we aggressively use new technologies like C++20 and OptiX 8, you may need to, for example, upgrade your VS to 2019 or 2022 and install CUDA 11.7+ and NVIDIA driver R535+.

  • Clone the repo with the --recursive option:

    git clone -b next https://github.com/LuisaGroup/LuisaCompute.git/ --recursive
    

    Since we use Git submodules to manage third-party dependencies, a --recursive clone is required.

  • Detailed requirements for each platform are listed in BUILD.md.

Build via the Bootstrap Script

The easiest way to build LuisaCompute is to use the bootstrap script. It can even download and install the required dependencies and build the project.

python bootstrap.py cmake -f cuda -b # build with CUDA backend using CMake
python bootstrap.py cmake -f cuda -b -- -DCMAKE_BUILD_TYPE=RelWithDebInfo # everything after -- will be passed to CMake

You may specify -f all to enable all available features on your platform.

To install certain dependencies, you can use the --install or -i option. For example, to install Rust, you can use:

python bootstrap.py -i rust

Alternatively, the bootstrap script can output a configuration file for the build system without actually building the project. This is useful when you want to use the project inside an IDE.

python bootstrap.py cmake -f cuda -c -o cmake-build-release # generate CMake configuration in ./cmake-build-release

Please use python bootstrap.py --help for more details.

Build from Source with XMake/CMake

LuisaCompute follows the standard XMake and CMake build process. Please see also BUILD.md for details on platform requirements, configuration options, and other precautions.

Usage

A Minimal Example

Using LuisaCompute to construct a graphics application basically involves the following steps:

  1. Create a Context and load a Device plug-in;
  2. Create a Stream for command submission and other device resources (e.g., Buffer<T>s for linear storage, Image<T>s for 2D readable/writable textures, and Meshes and Accels for ray-scene intersection testing structures) via Device's create_* interfaces;
  3. Author Kernels to describe the on-device computation tasks, and compile them into Shaders via Device's compile interface;
  4. Generate Commands via each resource's interface (e.g., Buffer<T>::copy_to), or Shader's operator() and dispatch, and submit them to the stream;
  5. Wait for the results by inserting a synchronize phony command into the Stream.

Putting the above together, a minimal example program that writes a gradient color to an image looks like this:

#include <luisa-compute.h>

// For the DSL sugar macros like $if.
// We exclude this header from <luisa-compute.h> to avoid pollution.
// So you have to include it explicitly to use the sugar macros.
#include <dsl/sugar.h>

using namespace luisa;
using namespace luisa::compute;

int main(int argc, char *argv[]) {

    // Step 1.1: Create a context
    Context context{argv[0]};
    
    // Step 1.2: Load the CUDA backend plug-in and create a device
    Device device = context.create_device("cuda");
    
    // Step 2.1: Create a stream for command submission
    Stream stream = device.create_stream();
    
    // Step 2.2: Create a 1024x1024 image with 4-channel 8-bit storage for each pixel; the template 
    //           argument `float` indicates that pixel values reading from or writing to the image
    //           are converted from `byte4` to `float4` or `float4` to `byte4` automatically
    Image<float> device_image = device.create_image<float>(PixelStorage::BYTE4, 1024u, 1024u, 0u);
    
    // Step 3.1: Define kernels to describe the device-side computation
    // 
    //           A `Callable` is a function *entity* (not directly inlined during 
    //           the AST recording) that is invocable from kernels or other callables
    Callable linear_to_srgb = [](Float4 /* alias for Var<float4> */ linear) noexcept {
        // The DSL syntax is much like the original C++
        auto x = linear.xyz();
        return make_float4(
            select(1.055f * pow(x, 1.0f / 2.4f) - 0.055f,
                   12.92f * x,
                   x <= 0.00031308f),
            linear.w);
    };
    //           A `Kernel` is an *entry* function to the device workload 
    Kernel2D fill_image_kernel = [&linear_to_srgb](ImageFloat /* alias for Var<Image<float>> */ image) noexcept {
        Var coord = dispatch_id().xy();
        Var rg = make_float2(coord) / make_float2(dispatch_size().xy());
        image->write(coord, linear_to_srgb(make_float4(rg, 1.0f, 1.0f)));
    };
    
    // Step 3.2: Compile the kernel into a shader (i.e., a runnable object on the device)
    auto fill_image = device.compile(fill_image_kernel);
    
    // Prepare the host memory for holding the image
    std::vector<std::byte> downloaded_image(1024u * 1024u * 4u);
    
    // Step 4: Generate commands from resources and shaders, and
    //         submit them to the stream to execute on the device
    stream << fill_image(device_image.view(0)).dispatch(1024u, 1024u)
           << device_image.copy_to(downloaded_image.data())
           << synchronize();// Step 5: Synchronize the stream
    
    // Now, you have the device-computed pixels in the host memory!
    your_image_save_function("color.png", downloaded_image, 1024u, 1024u, 4u);
}
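
The save function in the example is a placeholder for whatever image writer you use. As one possible stand-in that needs no third-party library, the sketch below writes the host pixel buffer as a binary PPM (P6) file instead of a PNG. The name `save_ppm` and its signature are assumptions made for this illustration and are not part of LuisaCompute.

```cpp
#include <cstddef>
#include <cstdint>
#include <fstream>
#include <string>
#include <vector>

// Hypothetical helper: writes the R, G, B channels of tightly packed 8-bit pixels
// as a binary PPM (P6) file. Any alpha channel is dropped.
inline bool save_ppm(const std::string &path, const std::vector<std::byte> &pixels,
                     std::uint32_t width, std::uint32_t height, std::uint32_t channels) {
    auto expected = static_cast<std::size_t>(width) * height * channels;
    if (channels < 3u || pixels.size() != expected) { return false; }
    std::ofstream file{path, std::ios::binary};
    if (!file) { return false; }
    file << "P6\n" << width << ' ' << height << "\n255\n";
    for (std::size_t i = 0u; i < pixels.size(); i += channels) {
        // Each pixel contributes exactly three bytes to the output.
        file.write(reinterpret_cast<const char *>(pixels.data() + i), 3);
    }
    return static_cast<bool>(file);
}
```

With the buffer from the example, the call would look like `save_ppm("color.ppm", pixels, 1024u, 1024u, 4u)`, where `pixels` is the host vector filled by the download command. PPM is chosen here only because it is trivial to write; most image viewers can open it.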

Basic Types

In addition to standard C++ scalar types (e.g., int, uint --- alias of uint32_t, float, and bool), LuisaCompute provides vector/matrix types for 3D graphics, including the following types:

// boolean vectors
using bool2 = Vector<bool, 2>;   // alignment: 2B
using bool3 = Vector<bool, 3>;   // alignment: 4B
using bool4 = Vector<bool, 4>;   // alignment: 4B
// signed and unsigned integer vectors
using int2 = Vector<int, 2>;     // alignment: 8B
using int3 = Vector<int, 3>;     // alignment: 16B
using int4 = Vector<int, 4>;     // alignment: 16B
using uint2 = Vector<uint, 2>;   // alignment: 8B
using uint3 = Vector<uint, 3>;   // alignment: 16B
using uint4 = Vector<uint, 4>;   // alignment: 16B
// floating-point vectors and matrices
using float2 = Vector<float, 2>; // alignment: 8B
using float3 = Vector<float, 3>; // alignment: 16B
using float4 = Vector<float, 4>; // alignment: 16B
using float2x2 = Matrix<2>;      // column-major, alignment: 8B
using float3x3 = Matrix<3>;      // column-major, alignment: 16B
using float4x4 = Matrix<4>;      // column-major, alignment: 16B

⚠️ Please pay attention to the alignment of 3D vectors and matrices --- they are aligned like 4D ones rather than packed. Also, we do not provide 64-bit integer or floating-point vector/matrix types, as they are less useful and typically unsupported on GPUs.
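
The practical consequence of this alignment rule shows up when sizing host-side staging arrays. The sketch below uses hypothetical stand-in structs (`Float3Like` and `PackedFloat3`, not the library's own types) to mimic the documented layout and compare it with a packed one. It assumes a 4-byte `float`.

```cpp
#include <cstddef>

// Hypothetical stand-ins; they only mimic the documented 16B alignment of 3D vectors.
struct alignas(16) Float3Like { float x, y, z; };
struct PackedFloat3 { float x, y, z; };

static_assert(sizeof(PackedFloat3) == 12, "three tightly packed floats");
static_assert(sizeof(Float3Like) == 16, "padded to the size of a 4D vector");
static_assert(alignof(Float3Like) == 16, "aligned like a 4D vector");

// Bytes needed to stage n elements on the host for each layout.
constexpr std::size_t staging_bytes_aligned(std::size_t n) { return n * sizeof(Float3Like); }
constexpr std::size_t staging_bytes_packed(std::size_t n) { return n * sizeof(PackedFloat3); }
```

For four elements, the aligned layout needs 64 bytes while the packed one needs 48, so copying tightly packed `xyz` triples straight into a buffer of 3D vectors would misplace every element after the first.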

To make vectors/matrices, we provide make_* and read-only swizzle interfaces, e.g.,

auto a = make_float2();       // (0.f, 0.f)
auto b = make_int3(1);        // (1,   1,   1)
auto c = make_uint3(b);       // (1u,  1u,  1u): converts from a same-dimensional but (possibly) differently typed vector
auto d = make_float3(a, 1.f); // (0.f, 0.f, 1.f): construct float3 from float2 and a float scalar
auto e = d.zzxy();            // (1.f, 1.f, 0.f, 0.f): swizzle
auto m = make_float2x2(1.f);  // ((1.f, 0.f), (0.f, 1.f)): diagonal matrix from a scalar
...

Operators are also overloaded for scalar-vector, vector-vector, scalar-matrix, vector-matrix, and matrix-matrix calculations, e.g.,

auto one = make_float2(1.f); // (1.f, 1.f)
auto two = 2.f;
auto three = one + two;      // (3.f, 3.f), scalar broadcast to vector
auto m2 = make_float2x2(2.f); // ((2.f, 0.f), (0.f, 2.f))
auto m3 = 1.5f * m2;         // ((3.f, 0.f), (0.f, 3.f)), scalar-matrix multiplication
auto v = m3 * one;           // (3.f, 3.f), matrix-vector multiplication, the vector should always
                             // appear at the right-hand side and is interpreted as a column vector
auto m6 = m2 * m3;           // ((6.f, 0.f), (0.f, 6.f)), matrix-matrix multiplication
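
For readers who want to check the column-vector convention by hand, here is a standalone sketch of a column-major 2x2 matrix-vector product. `Vec2`, `Mat2`, `diagonal`, and `mul` are hypothetical names for this illustration only; the library's own types and operators are the ones shown above.

```cpp
#include <array>

struct Vec2 { float x, y; };

// cols[j] is the j-th column, matching the column-major convention.
struct Mat2 { std::array<Vec2, 2> cols; };

// Diagonal matrix from a scalar, analogous to constructing a matrix from one value.
constexpr Mat2 diagonal(float s) { return Mat2{{Vec2{s, 0.f}, Vec2{0.f, s}}}; }

// The vector is on the right-hand side and treated as a column vector:
// the result is v.x * column0 + v.y * column1.
constexpr Vec2 mul(const Mat2 &m, Vec2 v) {
    return Vec2{m.cols[0].x * v.x + m.cols[1].x * v.y,
                m.cols[0].y * v.x + m.cols[1].y * v.y};
}
```

Multiplying any matrix by the vector `(1.f, 0.f)` returns its first column, which is a quick way to confirm the storage order.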

The scalar, vector, matrix, and array types are also supported in the DSL, together with make_*, swizzles, and operators. Just wrap them in the Var<T> template or use the pre-defined aliases:

// scalar types; note that 64-bit ones are not supported
using Int = Var<int>;
using UInt = Var<uint>;
using Float = Var<float>;
using Bool = Var<bool>;

// vector types
using Int2 = Var<int2>; // = Var<Vector<int, 2>>
using Int3 = Var<int3>; // = Var<Vector<int, 3>>
/* ... */

// matrix types
using Float2x2 = Var<float2x2>; // = Var<Matrix<2>>
using Float3x3 = Var<float3x3>; // = Var<Matrix<3>>
using Float4x4 = Var<float4x4>; // = Var<Matrix<4>>

// array types
template<typename T, size_t N>
using ArrayVar = Var<std::array<T, N>>;

// make_*
auto a = make_float2(one);    // Float2(1.f, 1.f), suppose one = Float(1.f)
auto m = make_float2x2(a, a); // Float2x2((1.f, 1.f), (1.f, 1.f))
auto c = make_int2(a);        // Int2(1, 1)
auto d = c.xxx();             // Int3(1, 1, 1)
auto e = d[0];                // 1
/* ... */

// operators
auto v2 = a * 2.f;  // Float2(2.f, 2.f)
auto eq = v2 == v2; // Bool2(true, true)
/* ... */

⚠️ The only exception is that we disable operator&& and operator|| in the DSL for scalars. The DSL does not support short-circuit semantics, so we disable these operators to avoid ambiguity. Please use operator& and operator| instead, which have consistent non-short-circuit semantics on both the host and device sides.
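
The difference between the two operator families can be observed on the host with plain C++. The sketch below is not DSL code; `Probe` and the two helper functions are hypothetical names that simply count how many operands get evaluated.

```cpp
// Counts evaluations so that short-circuiting becomes observable.
struct Probe {
    int calls = 0;
    bool eval(bool v) { ++calls; return v; }
};

// Logical AND: the right operand is skipped once the left one is false.
inline int calls_with_logical_and() {
    Probe p;
    bool r = p.eval(false) && p.eval(true);
    (void)r;
    return p.calls;
}

// Bitwise AND: both operands are always evaluated.
inline int calls_with_bitwise_and() {
    Probe p;
    bool r = p.eval(false) & p.eval(true);
    (void)r;
    return p.calls;
}
```

Here `calls_with_logical_and()` returns 1 and `calls_with_bitwise_and()` returns 2. Code that relies on the right operand being skipped (for example, guarding an out-of-bounds read) therefore needs an explicit `$if` in the DSL rather than `&`.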

Besides the Var<T> template, there's also an Expr<T>, which is to Var<T> what const T & is to T on the host side. In other words, Expr<T> stands for a const DSL variable reference, which does not create variable copies when passed around. However, note that the parameters of Callable/Kernel definition functions may only be Var<T>. This restriction might be removed in the future.

To conveniently convert a C++ variable to the DSL, we provide a helper template function def<T>:

auto a = def(1.f);              // equivalent to auto a = def<float>(1.f);
auto b_host = make_float2(1.f); // host C++ variable float2(1.f, 1.f)
auto b_device = def(b_host);    // device DSL variable Float2(1.f, 1.f)
/* ... */

Structures

To export a C++ data struct to the DSL, we provide a helper macro LUISA_STRUCT, which (semi-)automatically reflects the member layouts of the input structure:

// A C++ data structure
namespace foo {
struct alignas(8) S {
    float a;
    int   b;
};
}

// A reflected DSL structure
LUISA_STRUCT(foo::S, a, b) {
/* device-side member functions, e.g., */
    [[nodiscard]] auto twice_a() const noexcept { return 2.f * a; }
};

⚠️ The LUISA_STRUCT may only be used in the global namespace. The C++ structure to be exported may only contain scalar, vector, matrix, array, and other already exported structure types. The alignment of the whole structure specified with alignas will be reflected but must be under 16B; member alignments specified with alignas are not supported.
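
Before exporting a structure, it can help to pin down its host-side layout with compile-time checks. The sketch below repeats the example structure in a separate, hypothetical namespace (`foo_sketch`) and uses only standard C++; it does not involve LUISA_STRUCT itself. It assumes 4-byte `float` and `int`.

```cpp
#include <cstddef>

namespace foo_sketch {
// Same shape as the example structure above, duplicated for a standalone check.
struct alignas(8) S {
    float a;
    int b;
};
}// namespace foo_sketch

// Whole-structure alignment comes from alignas and stays under the 16B limit.
static_assert(alignof(foo_sketch::S) == 8, "structure alignment");
static_assert(sizeof(foo_sketch::S) == 8, "no tail padding needed here");
// Members keep their natural alignment; no per-member alignas is used.
static_assert(offsetof(foo_sketch::S, a) == 0, "a is the first member");
static_assert(offsetof(foo_sketch::S, b) == 4, "b follows a directly");

// Host-side counterpart of the device-side member function in the example.
inline float twice_a(const foo_sketch::S &s) { return 2.f * s.a; }
```

If one of these assertions fails after a structure is changed, the host data would no longer match what the reflected DSL structure expects.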

Built-in Functions

For the DSL, we provide a rich set of built-in functions, in the following categories:

  • Thread coordinate and launch configuration queries, including block_id, thread_id, dispatch_size, and dispatch_id;
  • Mathematical routines, such as max, abs, sin, pow, and sqrt;
  • Resource accessing and modification methods, such as texture sampling, buffer read/write, and ray intersection;
  • Variable construction and type conversion, e.g., the aforementioned make_*, cast<T> for static type casting, and as<T> for bitwise type casting; and
  • Optimization hints for backend compilers, which currently consist of assume and unreachable.

The mathematical functions basically mirror GLSL. We are working on documentation that will describe them in more detail.

Control Flows

The DSL in LuisaCompute supports device-side control flows. They are provided as special macros prefixed with $:

$if (cond) { /*...*/ };
$if (cond) { /*...*/ } $else { /*...*/ };
$if (cond) { /*...*/ } $elif (cond2) { /*...*/ };
$if (cond) { /*...*/ } $elif (cond2) { /*...*/ } $else { /*...*/ };

$while (cond) { /*...*/ };
$for (variable, n) { /*...*/ };
$for (variable, begin, end) { /*...*/ };
$for (variable, begin, end, step) { /*...*/ };
$loop { /*...*/ }; // infinite loop, unless $break'ed

$switch (variable) {
    $case (value) { /*...*/ }; // no $break needed inside, as we automatically add one
    $default { /*...*/ };      // no $break needed inside, as we automatically add one
};

$break;
$continue;

Note that users are still able to use the native C++ control flows, i.e., if, while, etc. without the $ prefix. In that case the native control flows act like a meta-stage to the DSL that directly controls the generation of the callables/kernels. This can be a powerful means to achieve multi-stage programming patterns. Such usages can be found throughout LuisaRender. We will cover them in future tutorials.
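
The meta-stage idea can be illustrated without the DSL. In the standalone sketch below, a hypothetical `Recorder` stands in for kernel tracing: native `for` and `if` statements run on the host while recording, and only the statements they emit end up in the recorded "kernel". All names here (`Recorder`, `record_blur`, and the emitted pseudo-statements) are invented for this illustration.

```cpp
#include <string>
#include <utility>
#include <vector>

// Hypothetical recorder standing in for kernel tracing.
struct Recorder {
    std::vector<std::string> statements;
    void emit(std::string s) { statements.push_back(std::move(s)); }
};

// Host-side control flow decides what gets recorded.
inline Recorder record_blur(int radius, bool normalize) {
    Recorder r;
    r.emit("sum = 0");
    for (int i = -radius; i <= radius; i++) {// native loop: unrolled at record time
        r.emit("sum += load(x + " + std::to_string(i) + ")");
    }
    if (normalize) {// native branch: selects a kernel variant at record time
        r.emit("sum /= " + std::to_string(2 * radius + 1));
    }
    return r;
}
```

`record_blur(1, true)` records five statements with no loop or branch left in them, while `record_blur(2, false)` records six. A `$for` or `$if` would instead be recorded as a loop or branch that the device evaluates at run time.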

Callables and Kernels

LuisaCompute supports two categories of device functions: Kernels (Kernel1D, Kernel2D, or Kernel3D) and Callables. Kernels are entries to the parallelized computation tasks on the device (equivalent to CUDA's __global__ functions). Callables are function objects invocable from kernels or other callables (i.e., like CUDA's __device__ functions). Both kinds are template classes that are constructible from C++ functions or function objects including lambda expressions:

// Define a callable from a lambda expression
Callable add_one = [](Float x) { return x + 1.f; };

// A callable may invoke another callable
Callable add_two = [&add_one](Float x) {
    return add_one(add_one(x));
};

// A callable may use captured device resources or resources in the argument list
auto buffer = device.create_buffer<float>(...);
Callable copy = [&buffer](BufferFloat buffer2, UInt index) {
    auto x = buffer.read(index); // use captured resource
    buffer2.write(index, x);     // use declared resource in the argument list
};

// Define a 1D kernel from a lambda expression
Kernel1D add_one_and_some = [&buffer, &add_one](Float some, BufferFloat out) {
    auto index = dispatch_id().x;    // query thread index in the whole grid with built-in dispatch_id()
    auto x = buffer.read(index);     // use resource through capturing
    auto result = add_one(x) + some; // invoke a callable
    out.write(index, result);        // use resource in the argument list
};

⚠️ Note that parameters of the definition functions for callables and kernels must be Var<T> or Var<T> & (or their aliases).

Kernels can be compiled into shaders by the device:

auto some_shader = device.compile(some_kernel);

⚠️ Note that the compilation blocks the calling thread. For large kernels this might take a considerably long time. You may accelerate the process by compiling multiple kernels concurrently, e.g., with thread pools.

Most backends support caching the compiled shaders to accelerate future compilations of the same shader. The cache files are at <build-folder>/bin/.cache.

Backends, Context, Devices and Resources

LuisaCompute currently supports these backends:

  • CUDA
  • DirectX
  • Metal
  • CPU (Clang + LLVM)

More backends might be added in the future. A device backend is implemented as a plug-in, which follows the lc-backend-<name> naming convention and is placed under <build-folder>/bin.

The Context object is responsible for loading and managing these plug-ins and creating/destroying devices. Users have to pass the executable path (typically, argv[0]) or the runtime directory to a context's constructor (so that it's able to locate the plug-ins), and pass the backend name to create the corresponding device object.

int main(int argc, char *argv[]) {
    Context context{argv[0]};
    Device device = context.create_device("cuda");
    /* ... */
}

⚠️ Creating multiple devices inside the same application is allowed. However, resources are not shared across devices. Accessing one device's resources from another device's commands/shaders leads to undefined behavior.

The device object provides methods for backend-specific operations, typically, creating resources. LuisaCompute supports the following resource types:

  • Buffer<T>s, which are linear memory ranges on the device for structured data storage;
  • Image<T>s and Volume<T>s, which are 2D/3D textures of scalars or vectors readable and writable from the shader, possibly with hardware-accelerated caching and format conversion;
  • BindlessArrays, which provide slots for references to buffers and textures (Images or Volumes bound with texture samplers, read-only in the shader), helpful for reducing the overhead and bypassing the limitations of binding shader parameters;
  • Meshes and Accels (short for acceleration structures) for high-performance ray intersection tests, with hardware acceleration if available (e.g., on graphics cards that feature RT-Cores).

Devices are also responsible for

  • Creating Streams and Events (the former are for command submission and the latter are for host-stream and stream-stream synchronization); and
  • Compiling kernels into shaders, as introduced before.

All resources, shaders, streams, and events are C++ objects with move constructors/assignments that follow the RAII idiom, i.e., they automatically call the Device::destroy_* interfaces when destructed.

⚠️ Users need to take care not to dangle a resource, e.g., by accidentally releasing it before the dependent commands finish.
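
The ownership pattern described here can be sketched in standalone C++. `FakeDevice` and `Resource` below are hypothetical stand-ins, not the library's classes; they only show how a move-only wrapper releases its handle exactly once.

```cpp
#include <utility>

// Hypothetical device that merely counts live resources.
struct FakeDevice {
    int live = 0;
    int create() { return ++live; }
    void destroy(int /* handle */) { --live; }
};

// Move-only RAII wrapper: destruction releases the handle unless it was moved away.
class Resource {
    FakeDevice *_device{nullptr};
    int _handle{0};

    void release() noexcept {
        if (_device != nullptr) {
            _device->destroy(_handle);
            _device = nullptr;
        }
    }

public:
    explicit Resource(FakeDevice &d) : _device{&d}, _handle{d.create()} {}
    Resource(Resource &&other) noexcept
        : _device{std::exchange(other._device, nullptr)}, _handle{other._handle} {}
    Resource &operator=(Resource &&other) noexcept {
        if (this != &other) {
            release();
            _device = std::exchange(other._device, nullptr);
            _handle = other._handle;
        }
        return *this;
    }
    Resource(const Resource &) = delete;
    Resource &operator=(const Resource &) = delete;
    ~Resource() { release(); }
    bool valid() const noexcept { return _device != nullptr; }
};
```

Because destruction happens at the end of the enclosing scope, a wrapper created inside a helper function is released when that function returns, even if commands that use it are still queued. Keeping the object alive until the stream has been synchronized avoids that situation.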

Command Submission and Synchronization

LuisaCompute adopts the explicit command-based execution model. Conceptually, commands are description units of atomic computation tasks, such as transferring data between the device and host, or from one resource to another; building meshes and acceleration structures; populating or updating bindless arrays; and most importantly, launching shaders.

Commands are organized into command buffers and then submitted to streams which are essentially queues forwarding commands to the backend devices in a logically first-in-first-out (FIFO) manner.

The resource wrappers provide convenient methods for creating commands, e.g.,

auto buffer_upload_command   = buffer.copy_from(host_data);
auto accel_build_command     = accel.build();
auto shader_dispatch_command = shader(args...).dispatch(n);

Command buffers group commands that are submitted together:

auto command_buffer = stream.command_buffer();
command_buffer
    << raytrace_shader(framebuffer, accel, resolution)
        .dispatch(resolution)
    << accumulate_shader(accum_image, framebuffer)
        .dispatch(resolution)
    << hdr2ldr_shader(accum_image, ldr_image)
        .dispatch(resolution)
    << ldr_image.copy_to(host_image.data())
    << commit(); // the commands are submitted to the stream together on commit()

For convenience, a stream implicitly creates a proxy object, which submits the commands in its internal command buffer at the end of the statement:

stream << buffer.copy_from(host_data) // a stream proxy is created on Stream::operator<<()
       << accel.build()               // consecutive commands are stored in the implicit command buffer in the proxy object
       << raytracing(image, accel, i)
           .dispatch(width, height);  // the proxy object automatically submits the commands at the end of the statement

⚠️ Since commands are asynchronously executed, users should pay attention to resource and host data lifetimes.

The backends in LuisaCompute can automatically determine the dependencies between the commands in a command buffer, and re-schedule them into an optimized order to improve hardware utilization. Therefore, larger command buffers might be preferred for better computation throughput.
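
To give a feel for what such dependency analysis involves, here is a deliberately simplified, standalone sketch. It is not the backends' actual scheduler: `Cmd`, `depends`, and `schedule_levels` are hypothetical, resources are identified by name, and whole-resource read/write sets are assumed.

```cpp
#include <cstddef>
#include <set>
#include <string>
#include <vector>

// Hypothetical command description; resource names stand in for handles.
struct Cmd {
    std::string name;
    std::set<std::string> reads;
    std::set<std::string> writes;
};

inline bool overlaps(const std::set<std::string> &a, const std::set<std::string> &b) {
    for (const auto &x : a) {
        if (b.count(x) != 0u) { return true; }
    }
    return false;
}

// A later command depends on an earlier one on read-after-write,
// write-after-read, or write-after-write hazards.
inline bool depends(const Cmd &later, const Cmd &earlier) {
    return overlaps(later.reads, earlier.writes) ||
           overlaps(later.writes, earlier.reads) ||
           overlaps(later.writes, earlier.writes);
}

// Level of a command = 1 + the highest level among earlier commands it depends on.
// Commands that share a level have no hazards between them.
inline std::vector<int> schedule_levels(const std::vector<Cmd> &cmds) {
    std::vector<int> level(cmds.size(), 0);
    for (std::size_t i = 0; i < cmds.size(); i++) {
        for (std::size_t j = 0; j < i; j++) {
            if (depends(cmds[i], cmds[j]) && level[j] + 1 > level[i]) {
                level[i] = level[j] + 1;
            }
        }
    }
    return level;
}
```

For two independent uploads followed by a kernel that reads both and a download of the kernel's output, the levels come out as 0, 0, 1, 2: the uploads may overlap, while the kernel and the download must wait. A longer command buffer gives this kind of analysis more independent work to overlap.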

Multiple streams run concurrently. Therefore, users may require synchronizations between them or with respect to the host via Events, similar to condition variables that ensure ordering across threads:

auto event = device.create_event();
stream_a << command_a
         << event.signal(); // signals an event
stream_b << event.wait()    // waits until the event signals
         << command_b       // will be executed after the event signals
         << event.signal(); // signals again
event.synchronize();        // blocks until the event signals

Automatic Differentiation

We implemented reverse mode autodiff using source-to-source transformation. The autodiff supports control flows such as if-else and switch, as well as callables. The following example shows how to use the autodiff to compute the gradient of a function f(t, x, y) = t < 1 ? x * y : x + y with respect to x and y:

Var<float> x = ...;
Var<float> y = ...;
Var<float> t = ...;
$autodiff {
    requires_grad(x, y);
    Var<float> z;
    $if(t < 1.0) {
        auto no_grad = some_non_differentiable_function(x, y);
        z = x * y;
    } $else {
        z = callable(x, y);
    };
    backward(z);
    dx->write(tid, grad(x));
    dy->write(tid, grad(y));
};
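
As a sanity check for the values this example should produce, the gradient of f can be written out by hand on the host. The sketch below is plain C++, not DSL code; `Grad`, `f`, and `grad_f` are hypothetical names, and the callable in the else-branch is assumed to compute x + y as in the definition of f above.

```cpp
// Gradient of f with respect to x and y.
struct Grad { float dx, dy; };

// f(t, x, y) = t < 1 ? x * y : x + y
inline float f(float t, float x, float y) { return t < 1.f ? x * y : x + y; }

// Only the branch actually taken contributes to the gradient:
//   t < 1:     d(x * y)/dx = y,  d(x * y)/dy = x
//   otherwise: d(x + y)/dx = 1,  d(x + y)/dy = 1
inline Grad grad_f(float t, float x, float y) {
    return t < 1.f ? Grad{y, x} : Grad{1.f, 1.f};
}
```

For t = 0.5, x = 2, y = 3, the expected gradient is (3, 2); for t = 2, it is (1, 1). The non-differentiable helper call in the first branch does not affect either result.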

Limitation (might be removed in the future):

  • Loops with dynamic iteration counts are not supported. To differentiate a loop, users have to unroll it with a native C++ loop, e.g., for (auto i = 0; i < count; i++) { dsl_body(i); }.

Applications

We implement several proof-of-concept examples in-tree under src/tests (sorry for the misleading naming; they are also the test programs we used during development). Besides, you may also find the following applications interesting:

Documentation and Tutorials

Sorry that we are still working on them. Currently, we recommend reading the original paper and learning through the examples and applications.

If you have any problems or suggestions, please feel free to open an issue or start a discussion. We are very happy to hear from you!

Roadmap

See ROADMAP.md.

Citation

@article{Zheng2022LuisaRender,
    author = {Zheng, Shaokun and Zhou, Zhiqian and Chen, Xin and Yan, Difei and Zhang, Chuyan and Geng, Yuefeng and Gu, Yan and Xu, Kun},
    title = {LuisaRender: A High-Performance Rendering Framework with Layered and Unified Interfaces on Stream Architectures},
    year = {2022},
    issue_date = {December 2022},
    publisher = {Association for Computing Machinery},
    address = {New York, NY, USA},
    volume = {41},
    number = {6},
    issn = {0730-0301},
    url = {https://doi.org/10.1145/3550454.3555463},
    doi = {10.1145/3550454.3555463},
    journal = {ACM Trans. Graph.},
    month = {nov},
    articleno = {232},
    numpages = {19},
    keywords = {stream architecture, rendering framework, cross-platform renderer}
}

The publisher version of the paper is open-access. You may download it for free.

File details

Details for the file luisa_python-0.3.7-pp310-pypy310_pp73-win_amd64.whl.

File metadata

File hashes

Hashes for luisa_python-0.3.7-pp310-pypy310_pp73-win_amd64.whl
Algorithm Hash digest
SHA256 bbc9fcf5fc78d9411ad5b48666c46ede38bd543377713bfee0efbd5463923926
MD5 ae4bcbbc1b0ee62541a59d8e65c9d966
BLAKE2b-256 358b101a907e02b6baca9ba8c2d044840feebcdb65b321a53d52fd86ba3332fb

See more details on using hashes here.

File details

Details for the file luisa_python-0.3.7-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl.

File metadata

File hashes

Hashes for luisa_python-0.3.7-pp310-pypy310_pp73-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 1ddb7fa864246ba26bdc22b5592adeec4b43fc24d48253bd1ebb0c7a8620d134
MD5 807ed4f9233812c03b240e2d7528f988
BLAKE2b-256 59b3e1853ea686082a0b63b0386b4630cd1799836a3c81eed0f4df5409210121

File details

Details for the file luisa_python-0.3.7-pp310-pypy310_pp73-macosx_14_0_arm64.whl.

File hashes

Hashes for luisa_python-0.3.7-pp310-pypy310_pp73-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 89b477f12909255ab447184d69576779c21512bc30e0cef1428a4e7d1689db16
MD5 7dde3d03933811c2616e9a478fcb66dc
BLAKE2b-256 23251370efac82ad39b9ef0cbf0e6316dffceba1f0aa97721863cb6dd952430d

File details

Details for the file luisa_python-0.3.7-pp310-pypy310_pp73-macosx_13_0_x86_64.whl.

File hashes

Hashes for luisa_python-0.3.7-pp310-pypy310_pp73-macosx_13_0_x86_64.whl
Algorithm Hash digest
SHA256 85425ef62396edf6ac0da9ea20e7002e02933f7575a1167a300e9a4538cfe98f
MD5 2070b72a78c510e3eeb0280457a6e404
BLAKE2b-256 4d515480df12eac1498c669682d6ef3fc0e428f1806dbc508020a6ffc0ac62f2

File details

Details for the file luisa_python-0.3.7-pp310-pypy310_pp73-macosx_13_0_arm64.whl.

File hashes

Hashes for luisa_python-0.3.7-pp310-pypy310_pp73-macosx_13_0_arm64.whl
Algorithm Hash digest
SHA256 d9e50e64eccfeeb7acae1d4fe4dbe63f5b41ba9c4a5c9c01d906c816427cca0b
MD5 1eb9ac7f42ebf2a21f54c2b3756a4b76
BLAKE2b-256 1c41e1df30cc0804ade944039f103fbc9bdd6ca36f58a630634abc1b4728615c

File details

Details for the file luisa_python-0.3.7-cp312-cp312-win_amd64.whl.

File hashes

Hashes for luisa_python-0.3.7-cp312-cp312-win_amd64.whl
Algorithm Hash digest
SHA256 6551750fe242788ec6e48217f1068e47866b9fa9847bfb2d47418cf16e4fad90
MD5 bca0a54de231c044c9f641b5118e3bd3
BLAKE2b-256 02fabb1d4440b89c1dfa3fdc09d313b17102df9c75a936182f2f387896befa50

File details

Details for the file luisa_python-0.3.7-cp312-cp312-manylinux_2_28_x86_64.whl.

File hashes

Hashes for luisa_python-0.3.7-cp312-cp312-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 88475b00915ce7eac92f0c919070c708c01195c617e08302fde766cf9d87c3f7
MD5 4aebeb860f9b466c698d58292741b971
BLAKE2b-256 17205a05a8679e133073bd17cb58a8ec14e336737e5b52f65198ac803a365aef

File details

Details for the file luisa_python-0.3.7-cp312-cp312-macosx_14_0_arm64.whl.

File hashes

Hashes for luisa_python-0.3.7-cp312-cp312-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 7a24d7c5dced3cdc573d9a6e2f578abe2d4a98fe1ee2aef57628f4d0d4e5a7ee
MD5 a7dd9f2d90e594a00ff0ea4f054f5132
BLAKE2b-256 c639466298bc05e98f8395729dcbf7b133b437734fd2ddd8c106fbad29b659f4

File details

Details for the file luisa_python-0.3.7-cp312-cp312-macosx_13_0_x86_64.whl.

File hashes

Hashes for luisa_python-0.3.7-cp312-cp312-macosx_13_0_x86_64.whl
Algorithm Hash digest
SHA256 f65c48b5549087bd537c78402d588ecf600d8c52da0cb1ed7894a7ac1ba93ce7
MD5 cccea007779d7201982b77d7c006d423
BLAKE2b-256 6f3e69cfe40ad63988cff780d2444b53a195dd054bc96f70b3cd050488fe8975

File details

Details for the file luisa_python-0.3.7-cp312-cp312-macosx_13_0_arm64.whl.

File hashes

Hashes for luisa_python-0.3.7-cp312-cp312-macosx_13_0_arm64.whl
Algorithm Hash digest
SHA256 12df43f64c62f8226978e1a5373a7a7235c649b101bffe31b3d15bddb5767891
MD5 9bc7ccfb3f81d0011eb4239f00dfd3c3
BLAKE2b-256 ec6c93df68af0d120ffeb707c7ceadbd4dbacf229a93174cf2d40b073bc82eed

File details

Details for the file luisa_python-0.3.7-cp311-cp311-win_amd64.whl.

File hashes

Hashes for luisa_python-0.3.7-cp311-cp311-win_amd64.whl
Algorithm Hash digest
SHA256 3e0227c0e923125369c1fbe8dfdf3d5797a32ba57e42fdb926c7231b0bf4362c
MD5 ae3eacd9d9cf57295a0bb0801bbfe933
BLAKE2b-256 a7dff8241dff635bd466a97a483822b02a0762dcba15e34eef4e89d2db565cc4

File details

Details for the file luisa_python-0.3.7-cp311-cp311-manylinux_2_28_x86_64.whl.

File hashes

Hashes for luisa_python-0.3.7-cp311-cp311-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 71f2e904532962cc2bd3549c11abdfdbbb6eb5f39476760a7242905a05a86559
MD5 8f99a61d6d1e722e01a8bbeac079336e
BLAKE2b-256 c2c963340c40bd76f534e4fbb1b17cc3001bd4e0169ff3f7c72f9c34b1ab78ad

File details

Details for the file luisa_python-0.3.7-cp311-cp311-macosx_14_0_arm64.whl.

File hashes

Hashes for luisa_python-0.3.7-cp311-cp311-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 1164b2b4981b976c5a82b6de1d6458e13faa6f111944db762afaf39a9bcfc1b5
MD5 c4d43e5b6d12670fd086a40a5c3558dd
BLAKE2b-256 3c3aeefb8b82e050935646b2174bf56cf90da474cdc8021e4f94ad5c72577239

File details

Details for the file luisa_python-0.3.7-cp311-cp311-macosx_13_0_x86_64.whl.

File hashes

Hashes for luisa_python-0.3.7-cp311-cp311-macosx_13_0_x86_64.whl
Algorithm Hash digest
SHA256 a5a20e6d217ae218b078d677e51a22e46e94a1737cb455e152b5a182528eb963
MD5 136a986380242d1429dcc71990afe0a9
BLAKE2b-256 189ccbca7cb78eb5e1135c3a0ceea30a40b746ebf14eb1a97e1b71f79ac8c735

File details

Details for the file luisa_python-0.3.7-cp311-cp311-macosx_13_0_arm64.whl.

File hashes

Hashes for luisa_python-0.3.7-cp311-cp311-macosx_13_0_arm64.whl
Algorithm Hash digest
SHA256 f6b8d8030d01857045dcf2070da9072bfb2526a649325a3b1d8aad6cbc58156c
MD5 93fabfdc64eb347a3d63d191e3a52471
BLAKE2b-256 01449f59415a09fe4e2020e5044781a1c5a2e29c96ceef30151d3ae37d112b86

File details

Details for the file luisa_python-0.3.7-cp310-cp310-win_amd64.whl.

File hashes

Hashes for luisa_python-0.3.7-cp310-cp310-win_amd64.whl
Algorithm Hash digest
SHA256 666137d0f2bf7be483ec662ed856e13d284acfa1e0db3ca8968a7f63721c2169
MD5 6094d3f0a05e0fcc0f89c972c34015a0
BLAKE2b-256 ed41ee04234232186510363b863e258c22382c20b1c2da051bd9e4e0427268a7

File details

Details for the file luisa_python-0.3.7-cp310-cp310-manylinux_2_28_x86_64.whl.

File hashes

Hashes for luisa_python-0.3.7-cp310-cp310-manylinux_2_28_x86_64.whl
Algorithm Hash digest
SHA256 bdc04710d085843efbdf3901ef8b6d0aad7c7a5ab1cf64d7382803542bb39f66
MD5 71aed6cc2e0717a58b9638774d277884
BLAKE2b-256 686e46703faabc2fef43cb53562290ca58404427fd89e09ab82eb6600d146447

File details

Details for the file luisa_python-0.3.7-cp310-cp310-macosx_14_0_arm64.whl.

File hashes

Hashes for luisa_python-0.3.7-cp310-cp310-macosx_14_0_arm64.whl
Algorithm Hash digest
SHA256 b959ede089dd8e2ac7d4429063399f4a18ada1f4b0223bf88cf8264f6639782f
MD5 75b269016d3efc23f853bf8a1595b715
BLAKE2b-256 4e3f8adb340d2c06485a1e5201197c224d4abedea4c2f14e607d5b8517d0ca92

File details

Details for the file luisa_python-0.3.7-cp310-cp310-macosx_13_0_x86_64.whl.

File hashes

Hashes for luisa_python-0.3.7-cp310-cp310-macosx_13_0_x86_64.whl
Algorithm Hash digest
SHA256 991e5f9a27d865cdd6eeb36882f550917fee4c8490b8f60ef06995b313baab3c
MD5 7ea0535c1b3e7e2351a19f21e77d6779
BLAKE2b-256 cc1d80518f8338d02103908062087e3d41f91721f8dcd29c8eceb31ba26f5ddb

File details

Details for the file luisa_python-0.3.7-cp310-cp310-macosx_13_0_arm64.whl.

File hashes

Hashes for luisa_python-0.3.7-cp310-cp310-macosx_13_0_arm64.whl
Algorithm Hash digest
SHA256 ed8cb526c2fde45416d91d36f531b46848218872deaf1cfcf90906f809572528
MD5 82de818ebaa6adeeef709203043ffaf1
BLAKE2b-256 4cef7cf383fc8247596398e4aefb9de773342ec39ce0fd39484c5c283c7b5f5b
