Python-to-Many code translation

MultiGen: Multi-Language Code Generator

MultiGen is a code generator that translates Python to C, C++, Rust, Go, Haskell, OCaml, and LLVM IR while preserving semantics and performance characteristics.

Overview

MultiGen extends the CGen (Python-to-C) project into a multi-language translation system with enhanced runtime libraries, code generation, and a clean backend architecture.

Key Features

  • Multi-Language Support: Generate code for C, C++, Rust, Go, Haskell, OCaml, and LLVM IR
  • Universal Preference System: Customize code generation for each backend with language-specific preferences
  • Advanced Python Support: Object-oriented programming, comprehensions, string methods, augmented assignment
  • Modern Libraries: C++ STL, Rust standard library, Go standard library, Haskell containers, OCaml standard library
  • Clean Architecture: Extensible backend system with abstract interfaces for adding new target languages
  • Type-Safe Generation: Leverages Python type annotations for accurate and safe code translation
  • Runtime Libraries: Enhanced C backend with 50KB+ runtime libraries providing Python-like semantics
  • CLI Interface: Simple command-line tool with preference customization for conversion and building
  • Production-Ready: 1183 passing tests ensuring translation accuracy and code quality
  • LLVM Backend: Native compilation via LLVM IR with O0-O3 optimization levels

Supported Languages

| Language | Status     | Extension | Build System      | Advanced Features                                      | Benchmarks |
|----------|------------|-----------|-------------------|--------------------------------------------------------|------------|
| C        | Production | .c        | Makefile / gcc    | OOP, STC containers, string methods, comprehensions    | 7/7 (100%) |
| C++      | Production | .cpp      | Makefile / g++    | OOP, STL containers, string methods, comprehensions    | 7/7 (100%) |
| Rust     | Production | .rs       | Cargo / rustc     | OOP, ownership-aware, string methods, comprehensions   | 7/7 (100%) |
| Go       | Production | .go       | go.mod / go build | OOP, defer pattern, string methods, comprehensions     | 7/7 (100%) |
| Haskell  | Production | .hs       | Cabal / ghc       | Pure functional, comprehensions, type safety           | 7/7 (100%) |
| OCaml    | Production | .ml       | dune / ocamlc     | Functional, pattern matching, mutable refs             | 7/7 (100%) |
| LLVM     | Production | .ll       | llvmlite / clang  | Native compilation, O0-O3 optimization, multi-platform | 7/7 (100%) |

Benchmark Results

% make benchmark # run on an M1 MacBook Air
================================================================================
BENCHMARK SUMMARY
================================================================================
Total: 7 benchmarks × 7 backends = 49 runs
Success: 49 | Failed: 0

Backend      Success  Compile (s)  Run (s)      Binary (KB)  LOC
--------------------------------------------------------------------------------
c            7/7       0.390        0.275189     94.9         76
cpp          7/7       0.435        0.251988     36.1         51
go           7/7       0.190        0.265097     2365.4       38
haskell      7/7       0.156        0.024035     19944.6      65
llvm         7/7       0.310        0.251354     49.0         321
ocaml        7/7       0.234        0.271373     826.3        27
rust         7/7       0.266        0.250707     443.0        37
===============================================================================
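The summary is easier to compare once it is parsed into typed rows. A minimal sketch, assuming the column layout shown above; the `Row` type and the `parse_summary` function are illustrative names and not part of MultiGen:

```python
from typing import List, NamedTuple

# Rows copied verbatim from the benchmark summary above.
SUMMARY = """\
c            7/7       0.390        0.275189     94.9         76
cpp          7/7       0.435        0.251988     36.1         51
go           7/7       0.190        0.265097     2365.4       38
haskell      7/7       0.156        0.024035     19944.6      65
llvm         7/7       0.310        0.251354     49.0         321
ocaml        7/7       0.234        0.271373     826.3        27
rust         7/7       0.266        0.250707     443.0        37
"""


class Row(NamedTuple):
    backend: str
    compile_s: float
    run_s: float
    binary_kb: float
    loc: int


def parse_summary(text: str) -> List[Row]:
    """Split each whitespace-separated row into typed fields."""
    rows = []
    for line in text.splitlines():
        backend, _success, compile_s, run_s, binary_kb, loc = line.split()
        rows.append(Row(backend, float(compile_s), float(run_s), float(binary_kb), int(loc)))
    return rows


rows = parse_summary(SUMMARY)
fastest_run = min(rows, key=lambda r: r.run_s).backend
smallest_binary = min(rows, key=lambda r: r.binary_kb).backend
lowest_loc = min(rows, key=lambda r: r.loc).backend
print(fastest_run, smallest_binary, lowest_loc)  # haskell cpp ocaml
```

On this run Haskell has the shortest run time and also the largest binary, C++ has the smallest binary, and OCaml has the lowest LOC.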

Quick Start

Installation

Install from PyPI

pip install multigen

Install from source

git clone https://github.com/shakfu/multigen
cd multigen
pip install -e .

Optional Dependencies

MultiGen has zero required dependencies for core functionality (C, C++, Rust, Go, Haskell, OCaml backends). Optional features can be installed as needed:

# LLVM backend support (native compilation, WebAssembly)
pip install multigen[llvm]

# Z3 theorem prover (formal verification)
pip install multigen[z3]

# All optional dependencies
pip install multigen[all]

Basic Usage

# List available backends
multigen backends

# Convert Python to C (with advanced features)
multigen --target c convert my_script.py

# Convert Python to C++ (with STL support)
multigen --target cpp convert my_script.py

# Convert Python to Rust and build the result
multigen --target rust build my_script.py

# Convert Python to Go (with enhanced features)
multigen --target go convert my_script.py

# Convert Python to Haskell (with functional programming features)
multigen --target haskell convert my_script.py

# Convert Python to OCaml (with functional programming and pattern matching)
multigen --target ocaml convert my_script.py

# Batch convert all Python files
multigen --target cpp batch --source-dir ./examples

Backend Preferences

Customize code generation for each target language with the --prefer flag:

# Haskell with native comprehensions (idiomatic)
multigen --target haskell convert my_script.py --prefer use_native_comprehensions=true

# C with custom settings
multigen --target c convert my_script.py --prefer use_stc_containers=false --prefer indent_size=2

# C++ with modern features
multigen --target cpp convert my_script.py --prefer cpp_standard=c++20 --prefer use_modern_cpp=true

# Rust with specific edition
multigen --target rust convert my_script.py --prefer rust_edition=2018 --prefer clone_strategy=explicit

# Go with version targeting
multigen --target go convert my_script.py --prefer go_version=1.19 --prefer use_generics=false

# OCaml with functional programming preferences
multigen --target ocaml convert my_script.py --prefer prefer_immutable=true --prefer use_pattern_matching=true

# Multiple preferences
multigen --target haskell build my_script.py \
  --prefer use_native_comprehensions=true \
  --prefer camel_case_conversion=false \
  --prefer strict_data_types=true
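When the tool is driven from a script, the same command line can be assembled as an argument list. The sketch below only builds and prints the command; it uses the flags shown above and nothing else. The helper name `multigen_command` is made up for this example and is not part of MultiGen.

```python
import shlex
from typing import List


def multigen_command(target: str, action: str, source: str, **prefs: object) -> List[str]:
    """Build an argv list of the form: multigen --target T ACTION FILE --prefer k=v ..."""
    argv = ["multigen", "--target", target, action, source]
    for key, value in prefs.items():
        # Booleans are written in lowercase, matching the examples above.
        text = str(value).lower() if isinstance(value, bool) else str(value)
        argv += ["--prefer", f"{key}={text}"]
    return argv


argv = multigen_command(
    "haskell", "build", "my_script.py",
    use_native_comprehensions=True,
    camel_case_conversion=False,
    strict_data_types=True,
)
print(shlex.join(argv))
# To execute it (requires multigen on PATH): subprocess.run(argv, check=True)
```

Passing a list to `subprocess.run` avoids shell quoting problems if a file name contains spaces.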

Preference System

MultiGen features a preference system that allows you to choose between cross-language consistency (default) and language-specific idiomatic optimizations.

Design Philosophy

  • Default (Consistent): Uses runtime library functions for predictable behavior across all languages
  • Idiomatic (Optimized): Uses native language features for better performance and familiarity

Available Preference Categories

| Backend | Key Preferences                                                     | Description                                           |
|---------|---------------------------------------------------------------------|-------------------------------------------------------|
| Haskell | use_native_comprehensions, camel_case_conversion, strict_data_types | Native vs runtime comprehensions, naming, type system |
| C       | use_stc_containers, brace_style, indent_size                        | Container choice, brace style, indentation            |
| C++     | cpp_standard, use_modern_cpp, use_stl_containers                    | Language standard, modern features, STL usage         |
| Rust    | rust_edition, clone_strategy, use_iterators                         | Edition targeting, ownership patterns, functional style |
| Go      | go_version, use_generics, naming_convention                         | Version compatibility, language features, Go idioms   |
| OCaml   | prefer_immutable, use_pattern_matching, curried_functions           | Functional style, pattern matching, currying style    |

Example: Haskell Comprehensions

Python Source:

def filter_numbers(numbers):
    return [x * 2 for x in numbers if x > 5]

Default (Runtime Consistency):

filterNumbers numbers = listComprehensionWithFilter numbers (\x -> x > 5) (\x -> x * 2)

Native (Idiomatic Haskell):

filterNumbers numbers = [x * 2 | x <- numbers, x > 5]

Example: OCaml Functional Programming

Python Source:

def process_items(items):
    return [item.upper() for item in items if len(item) > 3]

Default (Runtime Consistency):

let process_items items =
  list_comprehension_with_filter items (fun item -> len item > 3) (fun item -> upper item)

Functional (Idiomatic OCaml):

let process_items items =
  List.filter (fun item -> String.length item > 3) items
  |> List.map String.uppercase_ascii
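Both variants are meant to compute the same result. The Python model below shows why: the runtime helper is a filter followed by a map, which is what a filtered comprehension does. The helper's name and argument order (items, predicate, transform) are inferred from the generated code above; this is a sketch of the semantics, not MultiGen's runtime source.

```python
from typing import Callable, Iterable, List, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def list_comprehension_with_filter(
    items: Iterable[T],
    predicate: Callable[[T], bool],
    transform: Callable[[T], U],
) -> List[U]:
    """Keep the items that satisfy `predicate`, then apply `transform` to each."""
    return [transform(item) for item in items if predicate(item)]


def process_items_runtime(items: List[str]) -> List[str]:
    # Same shape as the default output: one call to a runtime helper.
    return list_comprehension_with_filter(items, lambda s: len(s) > 3, str.upper)


def process_items_native(items: List[str]) -> List[str]:
    # Same shape as the idiomatic output: the language's own construct.
    return [item.upper() for item in items if len(item) > 3]


sample = ["go", "rust", "ocaml", "c"]
print(process_items_runtime(sample))  # ['RUST', 'OCAML']
print(process_items_runtime(sample) == process_items_native(sample))  # True
```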

For complete preference documentation, see PREFERENCES.md.

Examples

Simple Functions

Python Input:

def add(x: int, y: int) -> int:
    return x + y

def main() -> None:
    result = add(5, 3)
    print(result)

Generated C++:

#include <iostream>
#include <vector>
#include <unordered_map>
#include "runtime/multigen_cpp_runtime.hpp"

using namespace std;
using namespace multigen;

int add(int x, int y) {
    return (x + y);
}

void main() {
    int result = add(5, 3);
    cout << result << endl;
}

Generated C:

#include <stdio.h>
#include "multigen_runtime.h"

int add(int x, int y) {
    return (x + y);
}

void main() {
    int result = add(5, 3);
    printf("%d\n", result);
}

Generated Go:

package main

import "multigen"

func add(x int, y int) int {
    return (x + y)
}

func main() {
    result := add(5, 3)
    multigen.Print(result)
}

Generated Rust:

// Include MultiGen Rust runtime
mod multigen_rust_runtime;
use multigen_rust_runtime::*;

fn add(x: i32, y: i32) -> i32 {
    (x + y)
}

fn main() {
    let mut result = add(5, 3);
    print_value(result);
}

Generated Haskell:

module Main where

import MultiGenRuntime
import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.Map (Map)
import Data.Set (Set)

add :: Int -> Int -> Int
add x y = (x + y)

main :: IO ()
main = printValue (add 5 3)

Generated OCaml:

(* Generated OCaml code from Python *)

open Mgen_runtime

let add x y =
  (x + y)

let main () =
  let result = add 5 3 in
  print_value result

let () = print_value "Generated OCaml code executed successfully"

Advanced Features (Object-Oriented Programming)

Python Input:

class Calculator:
    def __init__(self, name: str):
        self.name: str = name
        self.total: int = 0

    def add(self, value: int) -> None:
        self.total += value

    def get_result(self) -> str:
        return self.name.upper() + ": " + str(self.total)

def process() -> list:
    calc = Calculator("math")
    calc.add(10)
    return [calc.get_result() for _ in range(2)]

Generated C++:

#include <iostream>
#include <string>
#include <vector>
#include "runtime/multigen_cpp_runtime.hpp"

using namespace std;
using namespace multigen;

class Calculator {
public:
    std::string name;
    int total;

    Calculator(std::string name) {
        this->name = name;
        this->total = 0;
    }

    void add(int value) {
        this->total += value;
    }

    std::string get_result() {
        return (StringOps::upper(this->name) + (": " + to_string(this->total)));
    }
};

std::vector<std::string> process() {
    Calculator calc("math");
    calc.add(10);
    return list_comprehension(Range(2), [&](auto _) {
        return calc.get_result();
    });
}

Generated Go:

package main

import "multigen"

type Calculator struct {
    Name string
    Total int
}

func NewCalculator(name string) Calculator {
    obj := Calculator{}
    obj.Name = name
    obj.Total = 0
    return obj
}

func (obj *Calculator) Add(value int) {
    obj.Total += value
}

func (obj *Calculator) GetResult() string {
    return (multigen.StrOps.Upper(obj.Name) + (": " + multigen.ToStr(obj.Total)))
}

func process() []interface{} {
    calc := NewCalculator("math")
    calc.Add(10)
    return multigen.Comprehensions.ListComprehension(multigen.NewRange(2), func(item interface{}) interface{} {
        _ := item.(int)
        return calc.GetResult()
    })
}

Generated Rust:

use std::collections::{HashMap, HashSet};

// Include MultiGen Rust runtime
mod multigen_rust_runtime;
use multigen_rust_runtime::*;

#[derive(Clone)]
struct Calculator {
    name: String,
    total: i32,
}

impl Calculator {
    fn new(name: String) -> Self {
        Calculator {
            name: name,
            total: 0,
        }
    }

    fn add(&mut self, value: i32) {
        self.total += value;
    }

    fn get_result(&mut self) -> String {
        ((StrOps::upper(&self.name) + ": ".to_string()) + to_string(self.total))
    }
}

fn process() -> Vec<String> {
    let mut calc = Calculator::new("math".to_string());
    calc.add(10);
    Comprehensions::list_comprehension(new_range(2).collect(), |_| calc.get_result())
}

Generated Haskell:

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE FlexibleInstances #-}

module Main where

import MultiGenRuntime
import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.Map (Map)
import Data.Set (Set)

data Calculator = Calculator
  { name :: String
  , total :: Int
  } deriving (Show, Eq)

newCalculator :: String -> Calculator
newCalculator name = Calculator { name = name, total = 0 }

add :: Calculator -> Int -> ()
add obj value = ()  -- Haskell immutable approach

getResult :: Calculator -> String
getResult obj = (upper (name obj)) + ": " + (toString (total obj))

process :: [String]
process =
  let calc = newCalculator "math"
  in listComprehension (rangeList (range 2)) (\_ -> getResult calc)

Generated OCaml:

(* Generated OCaml code from Python *)

open Mgen_runtime

type calculator = {
  name : string;
  total : int;
}

let create_calculator name =
  {
    name = name;
    total = 0;
  }

let calculator_add (calculator_obj : calculator) value =
  (* Functional update creating new record *)
  { calculator_obj with total = calculator_obj.total + value }

let calculator_get_result (calculator_obj : calculator) =
  (calculator_obj.name ^ ": " ^ string_of_int calculator_obj.total)

let process () =
  let calc = create_calculator "math" in
  let updated_calc = calculator_add calc 10 in
  list_comprehension (range_list (range 2)) (fun _ -> calculator_get_result updated_calc)

Architecture

MultiGen follows a clean, extensible architecture with well-defined components:

7-Phase Translation Pipeline

  1. Validation: Verify Python source compatibility
  2. Analysis: Analyze code structure and dependencies
  3. Python Optimization: Apply Python-level optimizations
  4. Mapping: Map Python constructs to target language equivalents
  5. Target Optimization: Apply target language-specific optimizations
  6. Generation: Generate target language code
  7. Build: Compile/build using target language toolchain
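The seven phases can be pictured as functions applied in order to a shared context. The sketch below illustrates that idea only: `Context`, `run_pipeline`, and the phase functions are hypothetical and do not reflect MultiGen's internal code. Only the first two phases do any work here, using the standard `ast` module; the rest are placeholders.

```python
import ast
from dataclasses import dataclass, field
from typing import Callable, List, Optional, Tuple


@dataclass
class Context:
    """State handed from one phase to the next (hypothetical structure)."""
    source: str
    target: str
    tree: Optional[ast.Module] = None
    functions: List[str] = field(default_factory=list)
    completed: List[str] = field(default_factory=list)


def validate(ctx: Context) -> Context:
    # ast.parse raises SyntaxError if the source is not valid Python.
    ctx.tree = ast.parse(ctx.source)
    return ctx


def analyze(ctx: Context) -> Context:
    # Collect function names as a stand-in for real structural analysis.
    ctx.functions = [n.name for n in ast.walk(ctx.tree) if isinstance(n, ast.FunctionDef)]
    return ctx


def placeholder(ctx: Context) -> Context:
    # Stand-in for the phases this sketch does not model.
    return ctx


PHASES: List[Tuple[str, Callable[[Context], Context]]] = [
    ("validation", validate),
    ("analysis", analyze),
    ("python_optimization", placeholder),
    ("mapping", placeholder),
    ("target_optimization", placeholder),
    ("generation", placeholder),
    ("build", placeholder),
]


def run_pipeline(source: str, target: str) -> Context:
    ctx = Context(source=source, target=target)
    for name, phase in PHASES:
        ctx = phase(ctx)  # a failing phase raises and stops the pipeline
        ctx.completed.append(name)
    return ctx


ctx = run_pipeline("def add(x: int, y: int) -> int:\n    return x + y\n", "c")
print(ctx.functions, len(ctx.completed))  # ['add'] 7
```

Running validation first means later phases can assume a parsed tree, and a syntax error stops the run before any target code is produced.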

Frontend (Language-Agnostic)

  • Type Inference: Analyzes Python type annotations and infers types
  • Static Analysis: Validates code compatibility and detects unsupported features
  • AST Processing: Parses and transforms Python abstract syntax tree

Backends (Language-Specific)

Each backend implements abstract interfaces:

  • AbstractEmitter: Code generation for target language
  • AbstractFactory: Factory for backend components
  • AbstractBuilder: Build system integration
  • AbstractContainerSystem: Container and collection handling
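As an illustration of how abstract interfaces constrain a backend, here is a sketch using Python's `abc` module. The class names `AbstractEmitter` and `AbstractBuilder` come from the list above. The method names and signatures, and the `ToyC*` classes, are assumptions made for this example and are not MultiGen's actual API.

```python
from abc import ABC, abstractmethod
from typing import List


class AbstractEmitter(ABC):
    @abstractmethod
    def emit_function(self, name: str, params: List[str], body: str) -> str:
        """Return target-language source for one function (hypothetical method)."""


class AbstractBuilder(ABC):
    @abstractmethod
    def compile_command(self, source_file: str, output: str) -> List[str]:
        """Return the argv used to compile a generated file (hypothetical method)."""


class ToyCEmitter(AbstractEmitter):
    def emit_function(self, name: str, params: List[str], body: str) -> str:
        # Every parameter is treated as int to keep the example small.
        args = ", ".join(f"int {p}" for p in params)
        return f"int {name}({args}) {{\n    {body}\n}}"


class ToyCBuilder(AbstractBuilder):
    def compile_command(self, source_file: str, output: str) -> List[str]:
        return ["gcc", source_file, "-o", output]


print(ToyCEmitter().emit_function("add", ["x", "y"], "return (x + y);"))
print(ToyCBuilder().compile_command("add.c", "add"))

# A class with unimplemented abstract methods cannot be instantiated.
try:
    AbstractEmitter()
except TypeError as exc:
    print("TypeError:", exc)
```

Because instantiation fails until every abstract method is implemented, an incomplete backend is caught when it is constructed rather than partway through a translation.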

Runtime Libraries (C Backend)

  • Error Handling (multigen_error_handling.h/.c): Python-like exception system
  • Memory Management (multigen_memory_ops.h/.c): Safe allocation and cleanup
  • Python Operations (multigen_python_ops.h/.c): Python built-ins and semantics
  • String Operations (multigen_string_ops.h/.c): String methods with memory safety
  • STC Integration (multigen_stc_bridge.h/.c): Smart Template Container bridge

CLI Commands

Convert

Convert Python files to target language:

multigen --target <language> convert <input.py>
multigen --target rust convert example.py

Build

Convert and compile/build the result:

multigen --target <language> build <input.py>
multigen --target go build --makefile example.py  # Generate build file
multigen --target c build example.py              # Direct compilation

Batch

Process multiple files:

multigen --target <language> batch --source-dir <dir>
multigen --target rust batch --source-dir ./src --build

Backends

List available language backends:

multigen backends

Clean

Clean build artifacts:

multigen clean

Development

Running Tests

make test           # Run all 1183 tests
make lint           # Run code linting with ruff
make type-check     # Run type checking with mypy

Test Organization

MultiGen maintains a test suite organized into focused modules:

  • test_backend_c_*.py: C backend tests (191 tests)
    • Core functionality, OOP, comprehensions, string methods, runtime libraries
  • test_backend_cpp_*.py: C++ backend tests (104 tests)
    • STL integration, modern C++ features, OOP support
  • test_backend_rust_*.py: Rust backend tests (176 tests)
    • Ownership patterns, memory safety, standard library
  • test_backend_go_*.py: Go backend tests (95 tests)
    • Go idioms, standard library, concurrency patterns
  • test_backend_haskell_*.py: Haskell backend tests (93 tests)
    • Functional programming, type safety, comprehensions
  • test_backend_ocaml_*.py: OCaml backend tests (51 tests)
    • Functional programming, pattern matching, immutability
  • test_backend_llvm_*.py: LLVM backend tests (130 tests)
    • Native compilation, optimization levels, IR generation
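The per-backend counts above add up to less than the 1183 tests quoted for the whole suite. Assuming both figures describe the same release, the remainder would be tests outside the backend-specific modules; which components they cover is not stated here. The arithmetic:

```python
# Per-backend counts as listed above.
backend_tests = {
    "c": 191,
    "cpp": 104,
    "rust": 176,
    "go": 95,
    "haskell": 93,
    "ocaml": 51,
    "llvm": 130,
}
TOTAL_TESTS = 1183  # figure quoted for the whole suite

backend_total = sum(backend_tests.values())
other_tests = TOTAL_TESTS - backend_total
print(backend_total, other_tests)  # 840 343
```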

Adding New Backends

To add support for a new target language:

  1. Create backend directory: src/multigen/backends/mylang/
  2. Implement required abstract interfaces:
    • MyLangBackend(LanguageBackend): Main backend class
    • MyLangFactory(AbstractFactory): Component factory
    • MyLangEmitter(AbstractEmitter): Code generation
    • MyLangBuilder(AbstractBuilder): Build system integration
    • MyLangContainerSystem(AbstractContainerSystem): Container handling
    • MyLangPreferences(BasePreferences): Language-specific preferences
  3. Register backend in src/multigen/backends/registry.py
  4. Add tests in tests/test_backend_mylang_*.py
  5. Update documentation

See existing backends (C, C++, Rust, Go, Haskell, OCaml, LLVM) for implementation examples.
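Step 3 implies a lookup from backend name to backend class. One common way to build such a lookup is a decorator-based registry. The sketch below is hypothetical: `register_backend`, `get_backend`, and the stand-in `LanguageBackend` are invented for this example and do not show the contents of registry.py.

```python
from typing import Dict, Type


class LanguageBackend:
    """Stand-in for the base class named above; the real one has more members."""
    name: str = ""
    extension: str = ""


_REGISTRY: Dict[str, Type[LanguageBackend]] = {}


def register_backend(cls: Type[LanguageBackend]) -> Type[LanguageBackend]:
    """Class decorator that records a backend under its `name` (hypothetical API)."""
    if cls.name in _REGISTRY:
        raise ValueError(f"backend {cls.name!r} is already registered")
    _REGISTRY[cls.name] = cls
    return cls


def get_backend(name: str) -> LanguageBackend:
    """Instantiate the backend registered under `name`."""
    try:
        return _REGISTRY[name]()
    except KeyError:
        raise ValueError(f"unknown backend {name!r}; available: {sorted(_REGISTRY)}") from None


@register_backend
class MyLangBackend(LanguageBackend):
    name = "mylang"
    extension = ".mylang"


backend = get_backend("mylang")
print(type(backend).__name__, backend.extension)  # MyLangBackend .mylang
```

Rejecting duplicate names at registration time keeps two backends from silently claiming the same `--target` value.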

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Advanced Features

Supported Python Features

All backends support core Python features:

  • Object-Oriented Programming: Classes, methods, constructors, instance variables, method calls
  • Augmented Assignment: All operators (+=, -=, *=, /=, //=, %=, |=, ^=, &=, <<=, >>=)
  • String Operations: upper(), lower(), strip(), find(), replace(), split()
  • Comprehensions: List, dict, and set comprehensions with range iteration and conditional filtering
  • Control Structures: if/elif/else, while loops, for loops with range()
  • Built-in Functions: abs(), bool(), len(), min(), max(), sum()
  • Type Inference: Automatic type detection from annotations and assignments
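For reference, here is a small input that stays within the features listed above: a class, augmented assignment, string methods, a filtered comprehension over `range()`, the three control structures, and a few built-ins. It runs as ordinary Python; whether every backend accepts each line exactly as written has not been verified here.

```python
class Inventory:
    def __init__(self, label: str) -> None:
        self.label: str = label
        self.count: int = 0

    def add(self, amount: int) -> None:
        self.count += amount  # augmented assignment

    def summary(self) -> str:
        # String methods plus str() conversion.
        return self.label.strip().upper() + ": " + str(self.count)


def squares_of_evens(limit: int) -> list:
    # List comprehension over range() with a conditional filter.
    return [n * n for n in range(limit) if n % 2 == 0]


def classify(n: int) -> str:
    if n < 0:
        return "negative"
    elif n == 0:
        return "zero"
    else:
        return "positive"


def capped_sum(limit: int, cap: int) -> int:
    # Add 0, 1, 2, ... while the running total stays within cap.
    total: int = 0
    n: int = 0
    while n < limit:
        if total + n <= cap:
            total += n
        n += 1
    return total


def main() -> None:
    inv = Inventory("  widgets ")
    for step in range(3):  # for loop with range()
        inv.add(step + 1)
    values = squares_of_evens(7)
    print(inv.summary())
    print(values, len(values), min(values), max(values), sum(values))
    print(classify(abs(-4)), capped_sum(10, 20))


main()
```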

Container Support by Language

  • C: STC (Smart Template Container) library with optimized C containers (864KB integrated library)
  • C++: STL containers (std::vector, std::unordered_map, std::unordered_set)
  • Rust: Standard library collections (Vec, HashMap, HashSet) with memory safety
  • Go: Standard library containers with idiomatic Go patterns
  • Haskell: Standard library containers with type-safe functional operations
  • OCaml: Standard library with immutable data structures and pattern matching

Test Coverage

MultiGen's test suite checks translation accuracy across all backends:

  • 1183 total tests across all components and backends
  • 49/49 benchmarks passing (100%) across all 7 backends
  • Comprehensive backend coverage testing all major Python features
  • Test categories: basics, OOP, comprehensions, string methods, augmented assignment, control flow, integration
  • All tests passing with zero regressions (100%)

Development Roadmap

Completed Milestones

  • Multi-language backend system with C, C++, Rust, Go, Haskell, and OCaml support
  • Advanced C runtime integration with 50KB+ of runtime libraries
  • Sophisticated Python-to-C conversion with complete function and control flow support
  • Object-oriented programming support across all backends
  • Advanced Python language features: comprehensions, string methods, augmented assignment
  • Complete STC library integration (864KB Smart Template Container library)
  • Architecture consolidation with unified C backend module
  • Professional test organization with 821 tests in focused, single-responsibility files
  • Universal preference system with language-specific customization
  • Production-ready code generation with clean, efficient output
  • 7 production-ready backends (C++, C, Rust, Go, Haskell, OCaml, LLVM) with 100% benchmark success
  • Exception handling (try/except/raise) across all backends
  • Context managers (with statement) across all backends

Future Development

  • Advanced Frontend Analysis: Integrate optimization detection and static analysis engine
  • STC Performance Optimization: Container specialization and memory layout optimization
  • Formal Verification: Theorem proving and memory safety proofs integration
  • Cross-Language Runtime: Extend runtime concepts to other backends (C++, Rust, Go)
  • Performance Benchmarking: Comprehensive performance analysis across all target languages
  • IDE Integration: Language server protocol support for MultiGen syntax
  • Web Interface: Online code conversion tool
  • Plugin System: External backend support and extensibility
