
Portable Efficient Assembly Codegen in Higher-level Python

Project Description

PEACH-Py is a Python framework for writing high-performance assembly kernels. It aims to simplify writing optimized assembly kernels while preserving all the optimization opportunities of traditional assembly. Some PEACH-Py features:

  • Automatic register allocation
  • Stack frame management, including realigning the stack frame as needed
  • Generation of versions of a function for different calling conventions from the same source (e.g. functions for the Microsoft x64 ABI and the System V x86-64 ABI can be generated from the same source)
  • Definition of constants at the place where they are used (just like in high-level languages)
  • Tracking of the instruction extensions used in a function
  • Multiplexing of multiple instruction streams (helpful for software pipelining)
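
To illustrate why generating several calling conventions from one source is useful: the Microsoft x64 and System V x86-64 conventions pass the first integer or pointer arguments in different registers, so a kernel that hard-codes argument registers works under only one of them. The sketch below is not PEACH-Py code. The names `INTEGER_ARGUMENT_REGISTERS` and `argument_registers` are hypothetical and exist only for this illustration; the register orders are those defined by the two calling conventions.

```python
# Illustrative only: not part of the PEACH-Py API.
# Registers that carry the first integer/pointer arguments under each ABI.
INTEGER_ARGUMENT_REGISTERS = {
    "x64-ms": ["rcx", "rdx", "r8", "r9"],
    "x64-sysv": ["rdi", "rsi", "rdx", "rcx", "r8", "r9"],
}


def argument_registers(abi_name, parameter_names):
    """Map each parameter name to the register that carries it under the given ABI."""
    registers = INTEGER_ARGUMENT_REGISTERS[abi_name]
    if len(parameter_names) > len(registers):
        raise ValueError("remaining arguments are passed on the stack")
    return dict(zip(parameter_names, registers))


# A function with three integer or pointer parameters
for abi_name in ("x64-ms", "x64-sysv"):
    print(abi_name, argument_registers(abi_name, ["src", "dst", "length"]))
```

Under the Microsoft convention the third argument arrives in `r8`, while under System V it arrives in `rdx`, which is why loading parameters symbolically lets the same kernel source serve both.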


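import peachpy.c  # referenced below by its full name (peachpy.c.ABI, peachpy.c.Parameter, peachpy.c.Type)
from peachpy.x64 import *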

# Use 'x64-ms' for Microsoft x64 ABI
abi = peachpy.c.ABI('x64-sysv')
assembler = Assembler(abi)

# Implement function void add_1(const uint32_t *src, uint32_t *dst, size_t length)
src_argument = peachpy.c.Parameter("src", peachpy.c.Type("const uint32_t*"))
dst_argument = peachpy.c.Parameter("dst", peachpy.c.Type("uint32_t*"))
len_argument = peachpy.c.Parameter("length", peachpy.c.Type("size_t"))

# This optimized kernel will target Intel Nehalem processors. Any instructions which are not
# supported on Intel Nehalem (e.g. AVX instructions) will generate an error. If you don't have
# a particular target in mind, use "Unknown"
with Function(assembler, "add_1", (src_argument, dst_argument, len_argument), "Nehalem"):
    # Load arguments into registers
    srcPointer = GeneralPurposeRegister64()
    LOAD.PARAMETER( srcPointer, src_argument )

    dstPointer = GeneralPurposeRegister64()
    LOAD.PARAMETER( dstPointer, dst_argument )

    length = GeneralPurposeRegister64()
    LOAD.PARAMETER( length, len_argument )

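    # Main processing loop. Length must be a nonzero multiple of 4:
    # each iteration handles 4 elements, and the counter is tested only after the loop body has run.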
    LABEL( 'loop' )

    x = SSERegister()
    MOVDQU( x, [srcPointer] )
    ADD( srcPointer, 16 )

    # Add 1 to x
    PADDD( x, Constant.uint32x4(1) )

    MOVDQU( [dstPointer], x )
    ADD( dstPointer, 16 )

    SUB( length, 4 )
    JNZ( 'loop' )


print(assembler)
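
As a cross-check on what the kernel above is meant to compute, here is a minimal pure-Python reference model. It is a sketch under stated assumptions, not part of PEACH-Py: the name `add_1_reference` is hypothetical, the output is returned as a list rather than written through a pointer, and the explicit rejection of a zero length is a modeling choice that mirrors the kernel's requirement that the length be a nonzero multiple of 4.

```python
LANES = 4                 # a 128-bit SSE register holds four 32-bit lanes
UINT32_MASK = 0xFFFFFFFF  # PADDD wraps around modulo 2**32 in each lane


def add_1_reference(src, length):
    """Pure-Python model of add_1; returns the values the kernel would store to dst."""
    if length == 0 or length % LANES != 0:
        raise ValueError("length must be a nonzero multiple of 4")
    dst = [0] * length
    for offset in range(0, length, LANES):
        # One loop iteration: load four values, add 1 to each lane, store four values.
        for lane in range(LANES):
            dst[offset + lane] = (src[offset + lane] + 1) & UINT32_MASK
    return dst


print(add_1_reference([0, 1, 41, 0xFFFFFFFF], 4))  # prints [1, 2, 42, 0]
```

The last element shows the wraparound behavior: adding 1 to the largest 32-bit value yields 0 rather than an error.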


Download Files


Source distribution (116.9 kB), uploaded Nov 24, 2013
