
Portable Efficient Assembly Codegen in Higher-level Python


PEACH-Py is a Python framework for writing high-performance assembly kernels. It is designed to simplify writing optimized assembly kernels while preserving all the optimization opportunities of traditional hand-written assembly. Its features include:

  • Automatic register allocation
  • Stack frame management, including realigning the stack frame as needed
  • Generating versions of a function for different calling conventions from the same source (e.g., functions for both the Microsoft x64 ABI and the System V x86-64 ABI can be generated from the same source)
  • Constants can be defined where they are used, just as in high-level languages
  • Tracking of the instruction set extensions used by each function
  • Multiplexing of multiple instruction streams (helpful for software pipelining)
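To make the calling-convention point concrete, the plain-Python sketch below shows where integer and pointer parameters live at function entry under the two 64-bit ABIs. It is independent of PeachPy: the names `INT_ARG_REGISTERS`, `STACK_ARGS_OFFSET`, and `argument_locations` are hypothetical, and the sketch only models integer/pointer arguments (floating-point arguments, and the Microsoft ABI's positional slots shared between integer and floating-point arguments, are ignored).

```python
# Hypothetical helper, not part of PeachPy: where integer/pointer arguments
# live at function entry under each 64-bit ABI.
INT_ARG_REGISTERS = {
    "x64-sysv": ("rdi", "rsi", "rdx", "rcx", "r8", "r9"),  # System V x86-64
    "x64-ms": ("rcx", "rdx", "r8", "r9"),                  # Microsoft x64
}

# Offset of the first stack-passed argument from rsp at function entry:
# 8 bytes of return address, plus 32 bytes of shadow space on Microsoft x64.
STACK_ARGS_OFFSET = {"x64-sysv": 8, "x64-ms": 40}


def argument_locations(abi, parameter_names):
    """Map integer/pointer parameters to a register or an rsp-relative stack slot."""
    registers = INT_ARG_REGISTERS[abi]
    locations = {}
    for index, name in enumerate(parameter_names):
        if index < len(registers):
            locations[name] = registers[index]
        else:
            # Each stack-passed argument occupies one 8-byte slot.
            offset = STACK_ARGS_OFFSET[abi] + 8 * (index - len(registers))
            locations[name] = "[rsp + %d]" % offset
    return locations


print(argument_locations("x64-sysv", ["src", "dst", "length"]))
# {'src': 'rdi', 'dst': 'rsi', 'length': 'rdx'}
print(argument_locations("x64-ms", ["src", "dst", "length"]))
# {'src': 'rcx', 'dst': 'rdx', 'length': 'r8'}
```

This is the kind of difference that lets the same parameter-loading source produce different register moves depending on which ABI is selected.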


import peachpy.c
from peachpy.x64 import *

# Use 'x64-ms' for Microsoft x64 ABI
abi = peachpy.c.ABI('x64-sysv')
assembler = Assembler(abi)

# Implement the C function void add_1(const uint32_t *src, uint32_t *dst, size_t length)
src_argument = peachpy.c.Parameter("src", peachpy.c.Type("const uint32_t*"))
dst_argument = peachpy.c.Parameter("dst", peachpy.c.Type("uint32_t*"))
len_argument = peachpy.c.Parameter("length", peachpy.c.Type("size_t"))

# This optimized kernel will target Intel Nehalem processors. Any instructions which are not
# supported on Intel Nehalem (e.g. AVX instructions) will generate an error. If you don't have
# a particular target in mind, use "Unknown"
with Function(assembler, "add_1", (src_argument, dst_argument, len_argument), "Nehalem"):
    # Load arguments into registers
    srcPointer = GeneralPurposeRegister64()
    LOAD.PARAMETER( srcPointer, src_argument )

    dstPointer = GeneralPurposeRegister64()
    LOAD.PARAMETER( dstPointer, dst_argument )

    length = GeneralPurposeRegister64()
    LOAD.PARAMETER( length, len_argument )

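    # Main processing loop: each iteration processes four 32-bit elements, so length must be
    # a positive multiple of 4 (a length of 0 would wrap around instead of ending the loop).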
    LABEL( 'loop' )

    x = SSERegister()
    MOVDQU( x, [srcPointer] )
    ADD( srcPointer, 16 )

    # Add 1 to each of the four 32-bit lanes of x
    PADDD( x, Constant.uint32x4(1) )

    MOVDQU( [dstPointer], x )
    ADD( dstPointer, 16 )

    SUB( length, 4 )
    JNZ( 'loop' )


print(assembler)
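When testing a kernel like add_1, it helps to have a plain-Python model of what it should compute. The sketch below is one such model; the name `add_1_reference` is hypothetical and not part of PeachPy. `PADDD` adds packed 32-bit integers with wraparound (no saturation), so each output element is `(src[i] + 1) mod 2**32`, and the loop structure requires a length that is a positive multiple of 4.

```python
def add_1_reference(src):
    """Model of add_1: dst[i] = (src[i] + 1) mod 2**32 for every element."""
    # The kernel consumes four 32-bit elements per iteration and only stops
    # when length reaches exactly zero, so other lengths are rejected here.
    if len(src) == 0 or len(src) % 4 != 0:
        raise ValueError("length must be a positive multiple of 4")
    # PADDD wraps around on overflow, hence the 32-bit mask.
    return [(x + 1) & 0xFFFFFFFF for x in src]


print(add_1_reference([0, 1, 2, 0xFFFFFFFF]))  # [1, 2, 3, 0]
```

Comparing this model against the assembled kernel's output on random inputs (including values near 0xFFFFFFFF) is a cheap way to catch overflow and loop-count mistakes; how the kernel is loaded and called from Python is outside the scope of this sketch.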
