Skip to main content

Portable Efficient Assembly Codegen in Higher-level Python

Project description

PEACH-Py is a Python framework for writing high-performance assembly kernels. PEACH-Py is developed to simplify writing optimized assembly kernels while preserving all optimization opportunities of traditional assembly. Some PEACH-Py features:

  • Automatic register allocation

  • Stack frame management, including re-aligning of stack frame as needed

  • Generating versions of a function for different calling conventions from the same source (e.g. functions for Microsoft x64 ABI and System V x86-64 ABI can be generated from the same source)

  • Allows to define constants in the place where they are used (just like in high-level languages)

  • Tracking of instruction extensions used in the function.

  • Multiplexing of multiple instruction streams (helpful for software pipelining)


from peachpy.x64 import *

# Use 'x64-ms' for Microsoft x64 ABI
abi = peachpy.c.ABI('x64-sysv')
assembler = Assembler(abi)

# Implement function void add_1(const uint32_t *src, uint32_t *dst, size_t length)
src_argument = peachpy.c.Parameter("src", peachpy.c.Type("const uint32_t*"))
dst_argument = peachpy.c.Parameter("dst", peachpy.c.Type("uint32_t*"))
len_argument = peachpy.c.Parameter("length", peachpy.c.Type("size_t"))

# This optimized kernel will target Intel Nehalem processors. Any instructions which are not
# supported on Intel Nehalem (e.g. AVX instructions) will generate an error. If you don't have
# a particular target in mind, use "Unknown"
with Function(assembler, "add_1", (src_argument, dst_argument, len_argument), "Nehalem"):
    # Load arguments into registers
    srcPointer = GeneralPurposeRegister64()
    LOAD.PARAMETER( srcPointer, src_argument )

    dstPointer = GeneralPurposeRegister64()
    LOAD.PARAMETER( dstPointer, dst_argument )

    length = GeneralPurposeRegister64()
    LOAD.PARAMETER( length, len_argument )

    # Main processing loop. Length must be a multiple of 4.
    LABEL( 'loop' )

    x = SSERegister()
    MOVDQU( x, [srcPointer] )
    ADD( srcPointer, 16 )

    # Add 1 to x
    PADDD( x, Constant.uint32x4(1) )

    MOVDQU( [dstPointer], x )
    ADD( dstPointer, 16 )

    SUB( length, 4 )
    JNZ( 'loop' )


print assembler

Project details

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution (116.9 kB view hashes)

Uploaded source

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page