FLIR - ROCm Domain Specific Language for layout algebra (Python + embedded MLIR runtime)

Project description

FlyDSL (Flexible layout python DSL)

A Python DSL and an MLIR stack for authoring high‑performance GPU kernels with explicit layouts and tiling.

FlyDSL is the Python front‑end of the project: a Flexible Layout Python DSL for expressing tiling, partitioning, data movement, and kernel structure at a high level.

FlyDSL is powered by FLIR (Flexible Layout Intermediate Representation), an end‑to‑end, MLIR‑native compiler stack for GPU kernels. Its core is the flir dialect, a first‑class layout IR with explicit algebra and coordinate mapping, plus a composable lowering pipeline to GPU/ROCDL.

Overview

FlyDSL (Python DSL): author kernels in Python and compile them through FLIR
- Primary package: flydsl/ (flydsl/src/flydsl/)
- Kernel examples: kernels/ (importable as kernels.*)
FLIR (flir dialect): the layout IR and compiler foundation
- Core abstractions: !flir.shape, !flir.stride, !flir.layout, !flir.coord
- Algebra ops: composition/product/divide/partition + coordinate mapping ops
- Tooling: flir-opt for pass testing and IR experimentation
Embedded MLIR Python runtime (_mlir)
- No external MLIR Python wheel is required: the MLIR Python bindings are built and staged into .flir/build/python_packages/flydsl/_mlir (default; the legacy build/ directory also works)
- Python package root: .flir/build/python_packages/flydsl/

Repository layout

FlyDSL/
├── scripts/                   # helper scripts (build llvm, tests, packaging)
├── flir/                      # C++ sources + build scripts (CMake, embedded python bindings)
│   ├── CMakeLists.txt
│   ├── build.sh               # build FLIR + python bindings (recommended)
│   ├── include/flir/          # dialect headers + TableGen definitions
│   ├── lib/                   # dialect implementation (Dialect/, Transforms/)
│   ├── python_bindings/       # MLIR python bindings + runtime wrappers
│   └── tools/flir-opt/        # flir-opt CLI tool
├── flydsl/                    # Python sources (src/flydsl) + python-only docs/reqs
├── tests/                     # mlir + python tests/benchmarks
│   ├── mlir/                  # MLIR file tests
│   ├── pyir/                  # Python IR tests (no GPU required)
│   └── kernels/               # GPU execution tests
└── kernels/                   # Python kernels (importable as `kernels.*`)

Getting started

ROCm: required for GPU execution tests/benchmarks (IR-only tests do not need a GPU).
Build tools: cmake, C++ compiler, and optionally ninja (faster).
Python: Python 3 + pip.
- scripts/build_llvm.sh installs nanobind, numpy, pybind11.
- flydsl/requirements.txt lists auxiliary dependencies (such as numpy) used for runtime data initialization and result checking.

Build

A) Build / use an existing llvm-project (MLIR)

If you already have an MLIR build, set:

export MLIR_PATH=/path/to/llvm-project/build

Or use the helper script (clones ROCm llvm-project and builds MLIR):

bash scripts/build_llvm.sh

B) Build FLIR (C++ + embedded python package)

./flir/build.sh

After a successful build, you will have:

.flir/build/bin/flir-opt (default; legacy build/bin/flir-opt also works)
Python package root at:
- .flir/build/python_packages/flydsl/
- This contains:
  - flydsl/ (your Python API)
  - _mlir/ (embedded MLIR python bindings)
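To confirm that a build produced the outputs listed above, a small check along these lines may help. It is a minimal sketch: the helper name `find_build_outputs` is hypothetical, and the only paths it knows about are the default and legacy locations named in this README.

```python
from pathlib import Path


def find_build_outputs(repo_root="."):
    """Return the first build tree that holds flir-opt and the Python package root.

    Checks the default (.flir/build) tree first, then the legacy (build) tree.
    Returns None when neither tree looks complete.
    """
    root = Path(repo_root)
    for build_dir in (root / ".flir" / "build", root / "build"):
        flir_opt = build_dir / "bin" / "flir-opt"
        package_root = build_dir / "python_packages" / "flydsl"
        has_packages = (package_root / "flydsl").is_dir() and (package_root / "_mlir").is_dir()
        if flir_opt.is_file() and has_packages:
            return {"flir_opt": flir_opt, "package_root": package_root}
    return None


if __name__ == "__main__":
    outputs = find_build_outputs(".")
    print(outputs if outputs else "no complete build tree found")
```

Run it from the repository root; the returned package_root is the directory to put on PYTHONPATH.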

Python install

python3 -m pip install -e .
# For development, you can also use:
python3 setup.py develop

Build a wheel (default output under dist/):

python3 setup.py bdist_wheel
ls dist/

Run tests

bash scripts/run_tests.sh

What run_tests.sh does (high level):

MLIR file tests: runs tests/mlir/*.mlir through flir-opt --flir-to-standard
Python IR tests: runs tests/pyir/test_*.py (no GPU required)
Kernel/GPU execution tests (only if ROCm is detected): runs tests/kernels/test_*.py

For the test folder organization, see tests/ (mlir/, pyir/, kernels/).

Troubleshooting

flir-opt not found
- Run ./flir/build.sh, or build it explicitly:
  - cmake --build .flir/build --target flir-opt -j$(nproc) (use build instead of .flir/build for a legacy build tree)
Python import issues (No module named flydsl / No module named mlir)
- Ensure you are using the embedded package:
  - export PYTHONPATH=$(pwd)/.flir/build/python_packages/flydsl:$PYTHONPATH (for a legacy build tree, use $(pwd)/build/python_packages/flydsl)
- Or prefer in-tree sources:
  - export PYTHONPATH=$(pwd)/flydsl/src:$(pwd)/.flir/build/python_packages/flydsl:$PYTHONPATH
MLIR .so load errors
- Add MLIR build lib dir to the loader path:
  - export LD_LIBRARY_PATH=$MLIR_PATH/lib:$LD_LIBRARY_PATH
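When an import fails, or the wrong copy of a package is picked up, it can help to ask Python where it would load each module from. The sketch below uses only the standard library; the helper name `locate` is hypothetical, and the module names `flydsl` and `_mlir` are taken from the package root layout described above.

```python
import importlib.util


def locate(module_name):
    """Return the path a module would be imported from, or None if it is not found."""
    try:
        spec = importlib.util.find_spec(module_name)
    except (ImportError, ValueError):
        return None
    if spec is None:
        return None
    if spec.origin is not None:
        return spec.origin
    # Namespace packages have no single origin file; report their first search path.
    locations = list(spec.submodule_search_locations or [])
    return locations[0] if locations else None


for name in ("flydsl", "_mlir"):
    print(f"{name} -> {locate(name) or 'NOT FOUND'}")
```

If `flydsl` resolves to an unexpected directory, reorder PYTHONPATH so that the intended location comes first.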

📐 FLIR Layout System

FLIR = Flexible Layout Intermediate Representation.

FLIR introduces a layout system to express complex data mapping patterns on GPUs (tiling, swizzling, vectorization).

Core Abstractions

Shape: The extent of each dimension (e.g., (M, N)).
Stride: The step in memory between consecutive elements along each dimension (e.g., (1, M) for a column-major (M, N) shape).
Layout: A pair of (Shape, Stride) that maps a logical Coordinate to a physical linear Index.

Formula: Index = dot(Coord, Stride) = sum(c_i * s_i)
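The formula can be checked by hand with a few lines of plain Python. This is an illustrative sketch for flat (non-nested) coordinates only; the function below is a stand-in written for this example and is not the flydsl implementation.

```python
def crd2idx(coord, stride):
    """Map a logical coordinate to a linear index: sum(c_i * s_i)."""
    if len(coord) != len(stride):
        raise ValueError("coord and stride must have the same rank")
    return sum(c * s for c, s in zip(coord, stride))


# Column-major strides for an (M, N) = (8, 16) shape: (1, M) = (1, 8).
col_major = (1, 8)
# Row-major strides for the same shape: (N, 1) = (16, 1).
row_major = (16, 1)

print(crd2idx((3, 2), col_major))  # 3*1 + 2*8 = 19
print(crd2idx((3, 2), row_major))  # 3*16 + 2*1 = 50
```

The same logical coordinate lands at a different physical index under each stride, which is the whole point of keeping the layout explicit.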

Operations

Construction: make_shape, make_stride, make_layout, make_coord
Mapping:
- crd2idx(coord, layout) -> index: Convert logical coordinate to physical index.
- idx2crd(index, layout) -> coord: Convert physical index to logical coordinate.
Inspection: size, cosize, rank
Algebra:
- composition(A, B): Compose layouts (A ∘ B).
- product(A, B): Combine layouts (Logical, Tiled, Blocked, etc.).
- divide(A, B): Partition layout A by B (Logical, Tiled, etc.).
- local_partition(layout, tile, index): Slice layout for a specific thread/block.
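The mapping and inspection operations can be modeled in plain Python for flat layouts. The sketch below is written under stated assumptions: `cosize` follows the CuTe convention (one past the largest index the layout can produce, for non-negative strides), and `idx2crd` is only valid for compact layouts with non-zero strides, such as plain row-major or column-major. Whether the flir ops behave identically in every case is not stated here.

```python
from math import prod


def crd2idx(coord, stride):
    """Logical coordinate -> linear index."""
    return sum(c * s for c, s in zip(coord, stride))


def idx2crd(index, shape, stride):
    """Linear index -> logical coordinate (compact layouts with non-zero strides only)."""
    return tuple((index // d) % s for s, d in zip(shape, stride))


def size(shape):
    """Number of logical elements in the layout's domain."""
    return prod(shape)


def cosize(shape, stride):
    """One past the largest index produced (assumes non-negative strides)."""
    return 1 + sum((s - 1) * d for s, d in zip(shape, stride))


shape, stride = (8, 16), (1, 8)          # compact column-major layout
print(idx2crd(19, shape, stride))        # (3, 2)
print(size(shape), cosize(shape, stride))  # 128 128

padded = (1, 10)                         # same shape, columns padded to 10 elements
print(size(shape), cosize(shape, padded))  # 128 158
```

For the compact layout size and cosize agree; with padding the cosize grows, which is the number to use when sizing a buffer.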

Example (MLIR)

func.func @layout_example(%i: index, %j: index) -> index {
  %c8 = arith.constant 8 : index
  %c16 = arith.constant 16 : index
  %c1 = arith.constant 1 : index

  // Create 2D layout (8, 16) with column-major stride (1, 8)
  %shape = flir.make_shape %c8, %c16 : (index, index) -> !flir.shape<(8,16)>
  %stride = flir.make_stride %c1, %c8 : (index, index) -> !flir.stride<(1,8)>
  %layout = flir.make_layout %shape, %stride : (!flir.shape<(8,16)>, !flir.stride<(1,8)>) -> !flir.layout<(8,16):(1,8)>

  // Convert coordinate (i, j) to linear index
  %coord = flir.make_coord %i, %j : (index, index) -> !flir.coord<(?,?)>
  %idx = flir.crd2idx %coord, %layout : (!flir.coord<(?,?)>, !flir.layout<(8,16):(1,8)>) -> index

  return %idx : index
}

🐍 Python API (`flydsl`)

Python package: flydsl (C++/dialect namespace: flir).

The flydsl package provides a high-level Python API for generating kernels on top of FLIR.

Layout Construction

from flydsl.dialects.ext import flir

class _LayoutExample(flir.MlirModule):
    @flir.jit
    def layout_ops(self: flir.T.i64):
        # Create Layout (8x16, column-major)
        shape = flir.make_shape(8, 16)
        stride = flir.make_stride(1, 8)
        layout = flir.make_layout(shape, stride)

        # Query layout properties
        total_size = flir.size(shape)
        layout_rank = flir.rank(layout)
        return total_size

Pipeline API

Compilation passes are chained through a fluent Pipeline builder:

from flydsl.compiler.pipeline import Pipeline

# Build and run optimization pipeline
pipeline = (
    Pipeline()
    .flir_to_standard()
    .canonicalize()
    .cse()
    .rocdl_attach_target(chip="gfx942")
    # convert-gpu-to-rocdl must run under gpu.module
    .Gpu(Pipeline().convert_gpu_to_rocdl(runtime="HIP"))
    .gpu_to_llvm()
    .lower_to_llvm()
    .gpu_module_to_binary(format="bin")
)

binary_module = pipeline.run(module)
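To illustrate the fluent-builder idea behind such a pipeline, here is a self-contained sketch that records passes and renders them as an MLIR textual pass pipeline. The `PassPipeline` class is hypothetical and is not flydsl's `Pipeline`. The pass names are assumptions inferred from the method names above and from standard MLIR passes, and `lower_to_llvm` is left out because its corresponding pass is not stated in this document.

```python
class PassPipeline:
    """Hypothetical fluent builder that records passes and renders them as text."""

    def __init__(self):
        self._passes = []

    def add(self, name, **options):
        """Append a pass, with optional key=value options in braces."""
        if options:
            opts = " ".join(f"{key}={value}" for key, value in options.items())
            self._passes.append(f"{name}{{{opts}}}")
        else:
            self._passes.append(name)
        return self  # returning self is what makes chaining possible

    def nested(self, anchor, inner):
        """Run another pipeline nested under an op such as gpu.module."""
        self._passes.append(f"{anchor}({inner.render()})")
        return self

    def render(self):
        return ",".join(self._passes)

    def __str__(self):
        return f"builtin.module({self.render()})"


pipeline = (
    PassPipeline()
    .add("flir-to-standard")
    .add("canonicalize")
    .add("cse")
    .add("rocdl-attach-target", chip="gfx942")
    .nested("gpu.module", PassPipeline().add("convert-gpu-to-rocdl", runtime="HIP"))
    .add("gpu-to-llvm")
    .add("gpu-module-to-binary", format="bin")
)
print(pipeline)
```

The nested call mirrors the comment in the original snippet: convert-gpu-to-rocdl is anchored on gpu.module rather than on the top-level module.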

⚙️ Hierarchical Kernel Control

FLIR keeps the tiling hierarchy explicit across block, warp, thread, and instruction scopes:

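# Fragment: THR_M/THR_N, VAL_M/VAL_N, M, N, A, T, and tid_linear are assumed
# to be defined by the enclosing kernel.
# Define thread and value layouts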
thr_layout = flir.make_ordered_layout((THR_M, THR_N), order=(1, 0))
val_layout = flir.make_ordered_layout((VAL_M, VAL_N), order=(1, 0))

# Create tiled copy with vectorized atoms
copy_atom = flir.make_copy_atom(T.f32(), vector_size=8)
tiled = flir.make_tiled_copy_tv(copy_atom, thr_layout, val_layout,
                                thr_shape=(THR_M, THR_N), val_shape=(VAL_M, VAL_N))

# Partition tensor across blocks and threads
tensor_A = flir.make_tensor(A, shape=(M, N), strides=(N, 1))
tiles = flir.zipped_divide(tensor_A, (THR_M * VAL_M, THR_N * VAL_N))
blk_tile = tiles[(flir.block_idx("y"), flir.block_idx("x"))]
thr_tile = tiled.get_slice(tid_linear).partition_S(blk_tile)

With per-level partitions, you can allocate register fragments, emit predicate masks, and schedule MFMA/vector instructions while retaining full knowledge of the execution hierarchy.
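The block-level step of this hierarchy can be sketched in plain Python: dividing a row-major (M, N) tensor by a tile of (THR_M * VAL_M, THR_N * VAL_N) splits every global coordinate into a tile coordinate and a coordinate inside that tile. The concrete sizes and helper names below are assumptions chosen for illustration, and the sketch deliberately does not model how make_tiled_copy_tv assigns elements to threads inside a tile.

```python
# Example sizes (assumptions for illustration, not taken from the project).
THR_M, THR_N = 4, 8
VAL_M, VAL_N = 2, 4
TILE_M, TILE_N = THR_M * VAL_M, THR_N * VAL_N  # 8 x 32 elements per block tile
M, N = 64, 128                                 # row-major tensor, strides (N, 1)


def split_coord(m, n):
    """Split a global coordinate into (tile coordinate, coordinate inside the tile)."""
    return (m // TILE_M, n // TILE_N), (m % TILE_M, n % TILE_N)


def linear_offset(tile, inner):
    """Recombine tile and inner coordinates into a row-major linear offset."""
    (tile_m, tile_n), (inner_m, inner_n) = tile, inner
    m = tile_m * TILE_M + inner_m
    n = tile_n * TILE_N + inner_n
    return m * N + n


tile, inner = split_coord(19, 70)
print(tile, inner)                 # (2, 2) (3, 6)
print(linear_offset(tile, inner))  # 19*128 + 70 = 2502
```

In the snippet above, the tile coordinate plays the role of the (block_idx("y"), block_idx("x")) index into the divided tensor.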

🧮 Minimal VecAdd Example

This condensed snippet mirrors tests/kernels/test_vec_add.py, showing how to define GPU kernels with tiled copies and fragments:

import flydsl
from flydsl.dialects.ext import flir
import _mlir.extras.types as T

THREADS, TILE, VEC = 256, 8, 4

class VecAddKernel(flir.MlirModule):
    GPU_MODULE_NAME = "vec_kernels"
    GPU_MODULE_TARGETS = ['#rocdl.target<chip = "gfx942">']

    @flir.kernel
    def vec_add(self: flir.T.i64,
                A: lambda: T.memref(T.dynamic(), T.f32()),
                B: lambda: T.memref(T.dynamic(), T.f32()),
                C: lambda: T.memref(T.dynamic(), T.f32()),
                n: lambda: T.index()):
        tid = flir.thread_idx("x")
        bid = flir.block_idx("x")

        # Define thread/value layouts for tiled copy
        thr_layout = flir.make_ordered_layout((THREADS,), order=(0,))
        val_layout = flir.make_ordered_layout((TILE,), order=(0,))
        copy_atom = flir.make_copy_atom(T.f32(), vector_size=VEC)
        tiled = flir.make_tiled_copy_tv(copy_atom, thr_layout, val_layout,
                                        thr_shape=(THREADS,), val_shape=(TILE,))

        # Partition tensors across blocks and threads
        tensor_A = flir.make_tensor(A, shape=(n,), strides=(1,))
        tiles_A = flir.zipped_divide(tensor_A, (THREADS * TILE,))
        blkA = tiles_A[(bid,)]
        thrA = tiled.get_slice(tid).partition_S(blkA)

        # Load to registers, compute, store
        frgA = flir.make_fragment_like(thrA, T.f32())
        flir.copy(tiled, thrA, frgA)
        # ... repeat for B/C, add, store results

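# Compile and run
# a_dev, b_dev, and c_dev are device buffers and size is the element count;
# they are allocated by the caller (see the test file for the full setup).
module = VecAddKernel().module
exe = flydsl.compile(module)
exe(a_dev, b_dev, c_dev, size)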

See tests/kernels/test_vec_add.py for the complete implementation with benchmarking.
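Because each block takes one tile of THREADS * TILE elements, the number of blocks and the valid range of the last block follow from simple arithmetic. The helpers below are a hypothetical host-side sketch; how the actual test sizes its launch grid or handles a partial final tile is not shown in this excerpt.

```python
THREADS, TILE = 256, 8
ELEMS_PER_BLOCK = THREADS * TILE  # 2048 elements handled by one block


def grid_size(n):
    """Number of blocks needed to cover n elements (ceiling division)."""
    return (n + ELEMS_PER_BLOCK - 1) // ELEMS_PER_BLOCK


def block_range(bid, n):
    """Half-open range [start, end) of elements owned by block bid, clipped to n."""
    start = bid * ELEMS_PER_BLOCK
    return start, min(start + ELEMS_PER_BLOCK, n)


n = 5000
print(grid_size(n))       # 3
print(block_range(2, n))  # (4096, 5000)
```

When n is not a multiple of 2048, the last block owns fewer elements, so out-of-range accesses would need to be masked or otherwise avoided.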

✅ Testing Status

| Category | Status | Description |
| --- | --- | --- |
| MLIR Core | ✅ Passing | Type parsing, op verification, basic transforms |
| FLIR Ops | ✅ Passing | Layout algebra, coordinate lowering |
| GPU Backend | ✅ Passing | GPU kernel compilation, shared memory, vectorization |
| Hardware | ✅ Passing | MFMA (Matrix Fused Multiply-Add) execution on MI300-family GPUs |

Verified Platforms:

AMD MI300X/MI308X (gfx942), AMD MI350 (gfx950)
Linux / ROCm 6.x, 7.x

🙏 Acknowledgements

FLIR's design is inspired by ideas from several projects:

Categorical Foundations for CuTe Layouts – mathematical framework for layout algebra (companion code)
NVIDIA CUTLASS – CuTe layout algebra concepts (BSD-3-Clause parts only; no EULA-licensed code was referenced)
ROCm Composable Kernel – tile-based kernel design patterns for AMD GPUs
Triton – Python DSL for GPU kernel authoring

📄 License

Apache License 2.0

Project details

Release history

0.1.6

May 8, 2026

0.1.6.dev529 pre-release

May 6, 2026

0.1.5

May 2, 2026

0.1.5.dev515 pre-release

Apr 29, 2026

0.1.5.dev504 pre-release

Apr 28, 2026

0.1.4.2

Apr 23, 2026

0.1.4

Apr 20, 2026

0.1.3.1

Apr 14, 2026

0.1.3

Apr 10, 2026

0.1.2

Apr 7, 2026

0.1.1

Mar 20, 2026

0.1.1.dev442 pre-release

Apr 5, 2026

0.1.1.dev435 pre-release

Apr 7, 2026

0.1.1.dev409 pre-release

Mar 24, 2026

0.1.1.dev408 pre-release

Mar 24, 2026

This version

0.0.1.dev95158637 pre-release

Feb 8, 2026

Download files

Source Distributions

No source distribution files are available for this release.

Built Distributions

flydsl-0.0.1.dev95158637-cp312-cp312-manylinux_2_35_x86_64.whl (72.3 MB)
- Upload date: Feb 8, 2026
- Tags: CPython 3.12, manylinux: glibc 2.35+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `c9f6bb2cd7ec23a995103aa1380f997952d19e56b98de0e39f9f0bb8c0b53962` |
| MD5 | `000b43bb7f3da2db74cf5a271ecd2bc4` |
| BLAKE2b-256 | `dbd64004356982a6ba34fd6c6bdbad93468b6b578387120d52b9c0d04ff47104` |

flydsl-0.0.1.dev95158637-cp310-cp310-manylinux_2_35_x86_64.whl (72.3 MB)
- Upload date: Feb 8, 2026
- Tags: CPython 3.10, manylinux: glibc 2.35+ x86-64
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.2.0 CPython/3.10.12

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `8186ab87d88383118ef7cc88cdf3f265186c6a4fbf6077c530a8e7a6d528278f` |
| MD5 | `5f656fee954255c25aa3f352d700deda` |
| BLAKE2b-256 | `ffcf39b78ea119ddffdf4fc9d9843cf7b9708af76722400c0a2606bedf5863b6` |