Python-to-Many code translation

MultiGen: Multi-Language Code Generator

MultiGen is a Python-to-multiple-languages code generator that translates Python code to C, C++, Rust, Go, Haskell, OCaml, and LLVM IR while preserving semantics and performance characteristics.

Overview

MultiGen extends the CGen (Python-to-C) project into a multi-language translation system with enhanced runtime libraries, code generation, and a clean backend architecture.

Key Features

  • Multi-Language Support: Generate code for C, C++, Rust, Go, Haskell, OCaml, and LLVM IR
  • Universal Preference System: Customize code generation for each backend with language-specific preferences
  • Advanced Python Support: Object-oriented programming, comprehensions, string methods, augmented assignment
  • Modern Libraries: C++ STL, Rust standard library, Go standard library, Haskell containers, OCaml standard library
  • Clean Architecture: Extensible backend system with abstract interfaces for adding new target languages
  • Type-Safe Generation: Leverages Python type annotations for accurate and safe code translation
  • Runtime Libraries: Enhanced C backend with 50KB+ runtime libraries providing Python-like semantics
  • CLI Interface: Simple command-line tool with conversion, building, validation (multigen check), and batch processing
  • Production-Ready: 1353 passing tests ensuring translation accuracy and code quality
  • LLVM Backend: Native compilation via LLVM IR with O0-O3 optimization levels

Supported Languages

| Language | Status | Extension | Build System | Advanced Features | Benchmarks |
|---|---|---|---|---|---|
| C | Production | .c | Makefile / gcc | OOP, STC containers, string methods, comprehensions | 7/7 (100%) |
| C++ | Production | .cpp | Makefile / g++ | OOP, STL containers, string methods, comprehensions | 7/7 (100%) |
| Rust | Production | .rs | Cargo / rustc | OOP, ownership-aware, string methods, comprehensions | 7/7 (100%) |
| Go | Production | .go | go.mod / go build | OOP, defer pattern, string methods, comprehensions | 7/7 (100%) |
| Haskell | Production | .hs | Cabal / ghc | Pure functional, comprehensions, type safety | 7/7 (100%) |
| OCaml | Production | .ml | dune / ocamlc | Functional, pattern matching, mutable refs | 7/7 (100%) |
| LLVM | Production | .ll | llvmlite / clang | Native compilation, O0-O3 optimization, multi-platform | 7/7 (100%) |

Benchmark Results

% make benchmark # run on an M1 MacBook Air
================================================================================
BENCHMARK SUMMARY
================================================================================
Total: 7 benchmarks × 7 backends = 49 runs
Success: 49 | Failed: 0

Backend      Success  Compile (s)  Run (s)      Binary (KB)  LOC
--------------------------------------------------------------------------------
c            7/7       0.390        0.275189     94.9         76
cpp          7/7       0.435        0.251988     36.1         51
go           7/7       0.190        0.265097     2365.4       38
haskell      7/7       0.156        0.024035     19944.6      65
llvm         7/7       0.310        0.251354     49.0         321
ocaml        7/7       0.234        0.271373     826.3        27
rust         7/7       0.266        0.250707     443.0        37
================================================================================

Quick Start

Installation

Install from PyPI

pip install multigen

Install from source

git clone https://github.com/shakfu/multigen
cd multigen
pip install -e .

Optional Dependencies

MultiGen has zero required dependencies for core functionality (C, C++, Rust, Go, Haskell, OCaml backends). Optional features can be installed as needed:

# LLVM backend support (native compilation, WebAssembly)
pip install multigen[llvm]

# Z3 theorem prover (formal verification)
pip install multigen[z3]

# All optional dependencies
pip install multigen[all]

Basic Usage

# List available backends
multigen backends

# Convert Python to C (with advanced features)
multigen --target c convert my_script.py

# Convert Python to C++ (with STL support)
multigen --target cpp convert my_script.py

# Convert Python to Rust with build
multigen --target rust build my_script.py

# Convert Python to Go (with enhanced features)
multigen --target go convert my_script.py

# Convert Python to Haskell (with functional programming features)
multigen --target haskell convert my_script.py

# Convert Python to OCaml (with functional programming and pattern matching)
multigen --target ocaml convert my_script.py

# Batch convert all Python files
multigen --target cpp batch --source-dir ./examples
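
The commands above can also be assembled from a script. The following is a minimal sketch, assuming the multigen executable is on PATH; the helper name convert_command and the TARGETS list are illustrative and not part of MultiGen's API. Only the target names shown with --target in this section are included.

```python
import shlex

# Target names as used with --target in the examples above.
TARGETS = ["c", "cpp", "rust", "go", "haskell", "ocaml"]


def convert_command(target: str, source: str) -> list[str]:
    """Build the argv for `multigen --target <target> convert <source>`.

    Hypothetical helper: it only assembles the command line shown above.
    """
    if target not in TARGETS:
        raise ValueError(f"unknown target: {target}")
    return ["multigen", "--target", target, "convert", source]


if __name__ == "__main__":
    for target in TARGETS:
        # Printed rather than executed; pass the list to subprocess.run(...)
        # to actually invoke the tool.
        print(shlex.join(convert_command(target, "my_script.py")))
```

Returning an argument list rather than a single string avoids shell quoting issues when a file path contains spaces.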

Backend Preferences

Customize code generation for each target language with the --prefer flag:

# Haskell with native comprehensions (idiomatic)
multigen --target haskell convert my_script.py --prefer use_native_comprehensions=true

# C with custom settings
multigen --target c convert my_script.py --prefer use_stc_containers=false --prefer indent_size=2

# C++ with modern features
multigen --target cpp convert my_script.py --prefer cpp_standard=c++20 --prefer use_modern_cpp=true

# Rust with specific edition
multigen --target rust convert my_script.py --prefer rust_edition=2018 --prefer clone_strategy=explicit

# Go with version targeting
multigen --target go convert my_script.py --prefer go_version=1.19 --prefer use_generics=false

# OCaml with functional programming preferences
multigen --target ocaml convert my_script.py --prefer prefer_immutable=true --prefer use_pattern_matching=true

# Multiple preferences
multigen --target haskell build my_script.py \
  --prefer use_native_comprehensions=true \
  --prefer camel_case_conversion=false \
  --prefer strict_data_types=true
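
Each --prefer flag carries one key=value pair. How MultiGen parses these internally is not described here; the sketch below only illustrates one plausible way to turn such pairs into typed values. The function name parse_preferences and its coercion rules (booleans and whole numbers are converted, everything else stays a string) are assumptions.

```python
def parse_preferences(pairs: list[str]) -> dict[str, object]:
    """Turn ["key=value", ...] into a dict, coercing booleans and integers.

    Hypothetical helper; the coercion rules are an assumption of this sketch.
    """
    prefs: dict[str, object] = {}
    for pair in pairs:
        key, sep, raw = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        lowered = raw.lower()
        if lowered in ("true", "false"):
            prefs[key] = lowered == "true"
        elif raw.isdigit():
            prefs[key] = int(raw)
        else:
            prefs[key] = raw  # e.g. "c++20", "explicit", "1.19"
    return prefs


if __name__ == "__main__":
    print(parse_preferences(["use_stc_containers=false", "indent_size=2"]))
```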

Preference System

MultiGen features a preference system that allows you to choose between cross-language consistency (default) and language-specific idiomatic optimizations.

Design Philosophy

  • Default (Consistent): Uses runtime library functions for predictable behavior across all languages
  • Idiomatic (Optimized): Uses native language features for better performance and familiarity

Available Preference Categories

| Backend | Key Preferences | Description |
|---|---|---|
| Haskell | use_native_comprehensions, camel_case_conversion, strict_data_types | Native vs runtime comprehensions, naming, type system |
| C | use_stc_containers, brace_style, indent_size | Container choice, code style, memory management |
| C++ | cpp_standard, use_modern_cpp, use_stl_containers | Language standard, modern features, STL usage |
| Rust | rust_edition, clone_strategy, use_iterators | Edition targeting, ownership patterns, functional style |
| Go | go_version, use_generics, naming_convention | Version compatibility, language features, Go idioms |
| OCaml | prefer_immutable, use_pattern_matching, curried_functions | Functional style, pattern matching, function curry style |

Example: Haskell Comprehensions

Python Source:

def filter_numbers(numbers):
    return [x * 2 for x in numbers if x > 5]

Default (Runtime Consistency):

filterNumbers numbers = listComprehensionWithFilter numbers (\x -> x > 5) (\x -> x * 2)

Native (Idiomatic Haskell):

filterNumbers numbers = [x * 2 | x <- numbers, x > 5]
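
To make the two output styles concrete, here is a Python model of the helper-call shape used in the default output. It is a sketch under the assumption that the runtime helper filters first and then maps, with arguments in the order shown in the Haskell line above (items, predicate, transform); the Python function is a stand-in and not MultiGen's runtime.

```python
from typing import Callable, Iterable, TypeVar

T = TypeVar("T")
U = TypeVar("U")


def list_comprehension_with_filter(
    items: Iterable[T],
    predicate: Callable[[T], bool],
    transform: Callable[[T], U],
) -> list[U]:
    """Python model of the helper-call shape: keep matching items, then map."""
    return [transform(x) for x in items if predicate(x)]


def filter_numbers(numbers: list[int]) -> list[int]:
    """The original Python source from the example above."""
    return [x * 2 for x in numbers if x > 5]


if __name__ == "__main__":
    data = [3, 6, 9]
    via_helper = list_comprehension_with_filter(data, lambda x: x > 5, lambda x: x * 2)
    # Both forms are expected to produce the same list.
    print(via_helper == filter_numbers(data))
```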

Example: OCaml Functional Programming

Python Source:

def process_items(items):
    return [item.upper() for item in items if len(item) > 3]

Default (Runtime Consistency):

let process_items items =
  list_comprehension_with_filter items (fun item -> len item > 3) (fun item -> upper item)

Functional (Idiomatic OCaml):

let process_items items =
  List.filter (fun item -> String.length item > 3) items
  |> List.map String.uppercase_ascii

For complete preference documentation, see PREFERENCES.md.

Examples

Simple Functions

Python Input:

def add(x: int, y: int) -> int:
    return x + y

def main() -> None:
    result = add(5, 3)
    print(result)

Generated C++:

#include <iostream>
#include <vector>
#include <unordered_map>
#include "runtime/multigen_cpp_runtime.hpp"

using namespace std;
using namespace multigen;

int add(int x, int y) {
    return (x + y);
}

void main() {
    int result = add(5, 3);
    cout << result << endl;
}

Generated C:

#include <stdio.h>
#include "multigen_runtime.h"

int add(int x, int y) {
    return (x + y);
}

void main() {
    int result = add(5, 3);
    printf("%d\n", result);
}

Generated Go:

package main

import "multigen"

func add(x int, y int) int {
    return (x + y)
}

func main() {
    result := add(5, 3)
    multigen.Print(result)
}

Generated Rust:

// Include MultiGen Rust runtime
mod multigen_rust_runtime;
use multigen_rust_runtime::*;

fn add(x: i32, y: i32) -> i32 {
    (x + y)
}

fn main() {
    let mut result = add(5, 3);
    print_value(result);
}

Generated Haskell:

module Main where

import MultiGenRuntime
import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.Map (Map)
import Data.Set (Set)

add :: Int -> Int -> Int
add x y = (x + y)

main :: IO ()
main = printValue (add 5 3)

Generated OCaml:

(* Generated OCaml code from Python *)

open Mgen_runtime

let add x y =
  (x + y)

let main () =
  let result = add 5 3 in
  print_value result

let () = print_value "Generated OCaml code executed successfully"

Advanced Features (Object-Oriented Programming)

Python Input:

class Calculator:
    def __init__(self, name: str):
        self.name: str = name
        self.total: int = 0

    def add(self, value: int) -> None:
        self.total += value

    def get_result(self) -> str:
        return self.name.upper() + ": " + str(self.total)

def process() -> list:
    calc = Calculator("math")
    calc.add(10)
    return [calc.get_result() for _ in range(2)]

Generated C++:

#include <iostream>
#include <string>
#include <vector>
#include "runtime/multigen_cpp_runtime.hpp"

using namespace std;
using namespace multigen;

class Calculator {
public:
    std::string name;
    int total;

    Calculator(std::string name) {
        this->name = name;
        this->total = 0;
    }

    void add(int value) {
        this->total += value;
    }

    std::string get_result() {
        return (StringOps::upper(this->name) + (": " + to_string(this->total)));
    }
};

std::vector<std::string> process() {
    Calculator calc("math");
    calc.add(10);
    return list_comprehension(Range(2), [&](auto _) {
        return calc.get_result();
    });
}

Generated Go:

package main

import "multigen"

type Calculator struct {
    Name string
    Total int
}

func NewCalculator(name string) Calculator {
    obj := Calculator{}
    obj.Name = name
    obj.Total = 0
    return obj
}

func (obj *Calculator) Add(value int) {
    obj.Total += value
}

func (obj *Calculator) GetResult() string {
    return (multigen.StrOps.Upper(obj.Name) + (": " + multigen.ToStr(obj.Total)))
}

func process() []interface{} {
    calc := NewCalculator("math")
    calc.Add(10)
    return multigen.Comprehensions.ListComprehension(multigen.NewRange(2), func(item interface{}) interface{} {
        _ := item.(int)
        return calc.GetResult()
    })
}

Generated Rust:

use std::collections::{HashMap, HashSet};

// Include MultiGen Rust runtime
mod multigen_rust_runtime;
use multigen_rust_runtime::*;

#[derive(Clone)]
struct Calculator {
    name: String,
    total: i32,
}

impl Calculator {
    fn new(name: String) -> Self {
        Calculator {
            name: name,
            total: 0,
        }
    }

    fn add(&mut self, value: i32) {
        self.total += value;
    }

    fn get_result(&mut self) -> String {
        ((StrOps::upper(&self.name) + ": ".to_string()) + to_string(self.total))
    }
}

fn process() -> Vec<String> {
    let mut calc = Calculator::new("math".to_string());
    calc.add(10);
    Comprehensions::list_comprehension(new_range(2).collect(), |_| calc.get_result())
}

Generated Haskell:

{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE FlexibleInstances #-}

module Main where

import MultiGenRuntime
import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.Map (Map)
import Data.Set (Set)

data Calculator = Calculator
  { name :: String
  , total :: Int
  } deriving (Show, Eq)

newCalculator :: String -> Calculator
newCalculator name = Calculator { name = name, total = 0 }

add :: Calculator -> Int -> ()
add obj value = ()  -- Haskell immutable approach

getResult :: Calculator -> String
getResult obj = (upper (name obj)) + ": " + (toString (total obj))

process :: [String]
process =
  let calc = newCalculator "math"
  in listComprehension (rangeList (range 2)) (\_ -> getResult calc)

Generated OCaml:

(* Generated OCaml code from Python *)

open Mgen_runtime

type calculator = {
  name : string;
  total : int;
}

let create_calculator name =
  {
    name = name;
    total = 0;
  }

let calculator_add (calculator_obj : calculator) value =
  (* Functional update creating new record *)
  { calculator_obj with total = calculator_obj.total + value }

let calculator_get_result (calculator_obj : calculator) =
  (calculator_obj.name ^ ": " ^ string_of_int calculator_obj.total)

let process () =
  let calc = create_calculator "math" in
  let updated_calc = calculator_add calc 10 in
  list_comprehension (range_list (range 2)) (fun _ -> calculator_get_result updated_calc)

Architecture

MultiGen follows a clean, extensible architecture with well-defined components:

7-Phase Translation Pipeline

  1. Validation: Verify Python source compatibility
  2. Analysis: Analyze code structure and dependencies
  3. Python Optimization: Apply Python-level optimizations
  4. Mapping: Map Python constructs to target language equivalents
  5. Target Optimization: Apply target language-specific optimizations
  6. Generation: Generate target language code
  7. Build: Compile/build using target language toolchain
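
The phase list above can be pictured as a chain of functions that pass a shared context along. This is a minimal sketch and not MultiGen's implementation: the context dict, the phase function names, and the placeholder phases are all assumptions. Only the first two phases do real work here, using the standard ast module.

```python
import ast
from typing import Callable

# Each phase takes and returns a shared context dict (hypothetical design).
Phase = Callable[[dict], dict]


def validate(ctx: dict) -> dict:
    # ast.parse raises SyntaxError if the source is not valid Python.
    ctx["tree"] = ast.parse(ctx["source"])
    return ctx


def analyze(ctx: dict) -> dict:
    ctx["functions"] = [
        node.name for node in ast.walk(ctx["tree"]) if isinstance(node, ast.FunctionDef)
    ]
    return ctx


def passthrough(name: str) -> Phase:
    """Placeholder for phases this sketch does not model."""
    def phase(ctx: dict) -> dict:
        ctx.setdefault("log", []).append(name)
        return ctx
    return phase


PIPELINE: list[tuple[str, Phase]] = [
    ("validation", validate),
    ("analysis", analyze),
    ("python_optimization", passthrough("python_optimization")),
    ("mapping", passthrough("mapping")),
    ("target_optimization", passthrough("target_optimization")),
    ("generation", passthrough("generation")),
    ("build", passthrough("build")),
]


def run_pipeline(source: str) -> dict:
    ctx: dict = {"source": source}
    for _name, phase in PIPELINE:
        ctx = phase(ctx)
    return ctx


if __name__ == "__main__":
    result = run_pipeline("def add(x: int, y: int) -> int:\n    return x + y\n")
    print(result["functions"], result["log"])
```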

Frontend (Language-Agnostic)

  • Type Inference: Analyzes Python type annotations and infers types
  • Static Analysis: Validates code compatibility and detects unsupported features
  • AST Processing: Parses and transforms Python abstract syntax tree
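
As an illustration of what reading type annotations from the AST can look like, the sketch below collects parameter and return annotations with the standard ast module. The function name collect_signatures and the "<unannotated>" marker are hypothetical; MultiGen's own type inference is not shown here and may work differently.

```python
import ast


def collect_signatures(source: str) -> dict[str, dict[str, str]]:
    """Map each function name to its annotated parameter and return types."""
    signatures: dict[str, dict[str, str]] = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            types = {
                arg.arg: ast.unparse(arg.annotation) if arg.annotation else "<unannotated>"
                for arg in node.args.args
            }
            types["return"] = ast.unparse(node.returns) if node.returns else "<unannotated>"
            signatures[node.name] = types
    return signatures


if __name__ == "__main__":
    SOURCE = (
        "def add(x: int, y: int) -> int:\n"
        "    return x + y\n"
        "\n"
        "def main() -> None:\n"
        "    print(add(5, 3))\n"
    )
    print(collect_signatures(SOURCE))
```

Note that ast.unparse requires Python 3.9 or later.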

Backends (Language-Specific)

Each backend implements abstract interfaces:

  • AbstractEmitter: Code generation for target language
  • AbstractFactory: Factory for backend components
  • AbstractBuilder: Build system integration
  • AbstractContainerSystem: Container and collection handling
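
The interface names above come from MultiGen, but their method signatures are not listed in this document. The sketch below therefore invents the methods (emit_function, compile_command) and the Toy* classes purely to show the pattern of an abstract interface with a per-language implementation.

```python
from abc import ABC, abstractmethod


class AbstractEmitter(ABC):
    """Code generation for a target language (method name is an assumption)."""

    @abstractmethod
    def emit_function(
        self, name: str, params: list[tuple[str, str]], return_type: str, body: str
    ) -> str: ...


class AbstractBuilder(ABC):
    """Build system integration (method name is an assumption)."""

    @abstractmethod
    def compile_command(self, source_file: str, output: str) -> list[str]: ...


class ToyCEmitter(AbstractEmitter):
    # Hypothetical mapping from Python annotation names to C types.
    TYPE_MAP = {"int": "int", "float": "double", "None": "void"}

    def emit_function(self, name, params, return_type, body):
        args = ", ".join(f"{self.TYPE_MAP[t]} {p}" for p, t in params)
        return f"{self.TYPE_MAP[return_type]} {name}({args}) {{\n    {body}\n}}"


class ToyCBuilder(AbstractBuilder):
    def compile_command(self, source_file, output):
        return ["gcc", source_file, "-o", output]


if __name__ == "__main__":
    emitter = ToyCEmitter()
    print(emitter.emit_function("add", [("x", "int"), ("y", "int")], "int", "return (x + y);"))
    print(ToyCBuilder().compile_command("add.c", "add"))
```

Because the base classes use abstractmethod, a backend that forgets to implement a method fails at instantiation rather than at code generation time.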

Runtime Libraries (C Backend)

  • Error Handling (multigen_error_handling.h/.c): Python-like exception system
  • Memory Management (multigen_memory_ops.h/.c): Safe allocation and cleanup
  • Python Operations (multigen_python_ops.h/.c): Python built-ins and semantics
  • String Operations (multigen_string_ops.h/.c): String methods with memory safety
  • STC Integration (multigen_stc_bridge.h/.c): Smart Template Container bridge

CLI Commands

Convert

Convert Python files to target language:

multigen --target <language> convert <input.py>
multigen --target rust convert example.py

Build

Convert and compile/build the result:

multigen --target <language> build <input.py>
multigen --target go build --makefile example.py  # Generate build file
multigen --target c build example.py              # Direct compilation

Batch

Process multiple files:

multigen --target <language> batch --source-dir <dir>
multigen --target rust batch --source-dir ./src --build

Backends

List available language backends:

multigen backends

Check

Validate Python files against the supported subset without converting:

multigen check my_script.py              # Validate a file
multigen check --report my_script.py     # Full feature support report
multigen check file1.py file2.py         # Validate multiple files
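
Conceptually, a subset check walks the syntax tree and reports constructs outside the supported set. The sketch below shows that idea with the standard ast module. The REJECTED tuple is illustrative only: the actual supported subset is whatever multigen check --report says, and the function name check_source is hypothetical.

```python
import ast

# Illustrative only: MultiGen's real unsupported-feature list is not given here.
REJECTED = (ast.AsyncFunctionDef, ast.Await, ast.Global)


def check_source(source: str) -> list[str]:
    """Return one message per construct that falls outside the example subset."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, REJECTED):
            problems.append(f"line {node.lineno}: {type(node).__name__} not in subset")
    return problems


if __name__ == "__main__":
    print(check_source("def f(x: int) -> int:\n    return x\n"))
    print(check_source("async def g():\n    pass\n"))
```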

Clean

Clean build artifacts:

multigen clean

Development

Running Tests

make test           # Run all 1353 tests
make lint           # Run code linting with ruff
make typecheck      # Run type checking with mypy

Test Organization

MultiGen maintains a test suite organized into focused modules:

  • test_backend_c_*.py: C backend tests (191 tests total)
    • Core functionality, OOP, comprehensions, string methods, runtime libraries
  • test_backend_cpp_*.py: C++ backend tests (104 tests)
    • STL integration, modern C++ features, OOP support
  • test_backend_rust_*.py: Rust backend tests (176 tests)
    • Ownership patterns, memory safety, standard library
  • test_backend_go_*.py: Go backend tests (95 tests)
    • Go idioms, standard library, concurrency patterns
  • test_backend_haskell_*.py: Haskell backend tests (93 tests)
    • Functional programming, type safety, comprehensions
  • test_backend_ocaml_*.py: OCaml backend tests (51 tests)
    • Functional programming, pattern matching, immutability
  • test_backend_llvm_*.py: LLVM backend tests (130 tests)
    • Native compilation, optimization levels, IR generation

Adding New Backends

To add support for a new target language:

  1. Create backend directory: src/multigen/backends/mylang/
  2. Implement required abstract interfaces:
    • MyLangBackend(LanguageBackend): Main backend class
    • MyLangFactory(AbstractFactory): Component factory
    • MyLangEmitter(AbstractEmitter): Code generation
    • MyLangBuilder(AbstractBuilder): Build system integration
    • MyLangContainerSystem(AbstractContainerSystem): Container handling
    • MyLangPreferences(BasePreferences): Language-specific preferences
  3. Register backend in src/multigen/backends/registry.py
  4. Add tests in tests/test_backend_mylang_*.py
  5. Update documentation

See existing backends (C, C++, Rust, Go, Haskell, OCaml, LLVM) for implementation examples.
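
Step 3 refers to a registry module whose contents are not shown here. As a rough picture of what registering a backend can involve, the following sketch uses a name-to-class dictionary and a decorator. Everything in it (the stand-in LanguageBackend base, register_backend, get_backend, the .mylang extension) is an assumption and not the API of src/multigen/backends/registry.py.

```python
class LanguageBackend:
    """Stand-in base class for this sketch; the real one lives in MultiGen."""

    name = ""
    extension = ""


_REGISTRY: dict[str, type[LanguageBackend]] = {}


def register_backend(cls: type[LanguageBackend]) -> type[LanguageBackend]:
    """Class decorator that records a backend under its name (hypothetical)."""
    if not cls.name:
        raise ValueError("backend must define a non-empty name")
    if cls.name in _REGISTRY:
        raise ValueError(f"backend already registered: {cls.name}")
    _REGISTRY[cls.name] = cls
    return cls


def get_backend(name: str) -> LanguageBackend:
    """Instantiate a registered backend; raises KeyError for unknown names."""
    return _REGISTRY[name]()


@register_backend
class MyLangBackend(LanguageBackend):
    name = "mylang"
    extension = ".mylang"


if __name__ == "__main__":
    backend = get_backend("mylang")
    print(type(backend).__name__, backend.extension)
```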

Contributing

  1. Fork the repository
  2. Create a feature branch
  3. Add tests for new functionality
  4. Ensure all tests pass
  5. Submit a pull request

License

MIT License - see LICENSE file for details.

Advanced Features

Supported Python Features

All backends support core Python features:

  • Object-Oriented Programming: Classes, methods, constructors, instance variables, method calls
  • Augmented Assignment: All operators (+=, -=, *=, /=, //=, %=, |=, ^=, &=, <<=, >>=)
  • String Operations: upper(), lower(), strip(), find(), replace(), split()
  • Comprehensions: List, dict, and set comprehensions with range iteration and conditional filtering
  • Control Structures: if/elif/else, while loops, for loops with range()
  • Built-in Functions: abs(), bool(), len(), min(), max(), sum()
  • Type Inference: Automatic type detection from annotations and assignments
  • Slicing: List slicing (arr[1:3], arr[1:], arr[:2]) and string slicing (s[1:3]), available on 6 of the 7 backends
  • F-String Format Specs: f"{x:.2f}", f"{n:x}", f"{n:d}" with precision and radix formatting
  • Exception Handling: try/except/else/finally, raise, 6 exception types
  • Context Managers: with open(...) as f: for file I/O
  • Generators: yield, yield from, generator expressions (eager collection)
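
The small program below exercises several of the listed features together (a class, augmented assignment, an f-string format spec, exception handling, a generator, a filtered comprehension, and slicing). It is ordinary Python written for illustration; whether every construct in it converts on a given backend is an assumption that should be confirmed with multigen check.

```python
class Account:
    def __init__(self, owner: str, balance: float) -> None:
        self.owner: str = owner
        self.balance: float = balance

    def deposit(self, amount: float) -> None:
        if amount <= 0:
            raise ValueError("amount must be positive")
        self.balance += amount  # augmented assignment

    def summary(self) -> str:
        # String method plus an f-string format spec with precision.
        return f"{self.owner.upper()}: {self.balance:.2f}"


def evens(limit: int):
    # Generator; the feature list describes generator support as eager collection.
    for i in range(limit):
        if i % 2 == 0:
            yield i


def demo() -> list:
    acct = Account("ada", 10.0)
    try:
        acct.deposit(-1.0)  # rejected, raises ValueError
    except ValueError:
        acct.deposit(2.5)
    squares = [n * n for n in evens(10) if n > 2]  # comprehension with filter
    return [acct.summary(), squares[1:3], "multigen"[0:5]]  # list and string slicing


if __name__ == "__main__":
    print(demo())
```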

Container Support by Language

  • C: STC (Smart Template Container) library with optimized C containers (864KB integrated library)
  • C++: STL containers (std::vector, std::unordered_map, std::unordered_set)
  • Rust: Standard library collections (Vec, HashMap, HashSet) with memory safety
  • Go: Standard library containers with idiomatic Go patterns
  • Haskell: Standard library containers with type-safe functional operations
  • OCaml: Standard library with immutable data structures and pattern matching

Test Coverage

MultiGen maintains test coverage ensuring translation accuracy:

  • 1353 total tests across all components and backends
  • 49/49 benchmarks passing (100%) across all 7 backends
  • Comprehensive backend coverage testing all major Python features
  • Test categories: basics, OOP, comprehensions, string methods, augmented assignment, control flow, integration, exception handling, context managers, generators, slicing, f-string format specs
  • All tests passing with zero regressions (100%)

Development Roadmap

Completed Milestones

  • Multi-language backend system with C, C++, Rust, Go, Haskell, and OCaml support
  • Advanced C runtime integration with 50KB+ of runtime libraries
  • Sophisticated Python-to-C conversion with complete function and control flow support
  • Object-oriented programming support across all backends
  • Advanced Python language features: comprehensions, string methods, augmented assignment
  • Complete STC library integration (864KB Smart Template Container library)
  • Architecture consolidation with unified C backend module
  • Professional test organization with 1353 tests in focused, single-responsibility files
  • Universal preference system with language-specific customization
  • Production-ready code generation with clean, efficient output
  • 7 production-ready backends (C++, C, Rust, Go, Haskell, OCaml, LLVM) with 100% benchmark success
  • Exception handling (try/except/else/finally/raise) across all backends
  • Context managers (with statement) across all backends
  • Generator/yield support (eager collection) across all backends
  • List and string slicing across 6/7 backends
  • F-string format specifications across all backends
  • multigen check CLI command for validation without conversion

Future Development

  • Advanced Frontend Analysis: Integrate optimization detection and static analysis engine
  • STC Performance Optimization: Container specialization and memory layout optimization
  • Formal Verification: Theorem proving and memory safety proofs integration
  • Cross-Language Runtime: Extend runtime concepts to other backends (C++, Rust, Go)
  • Performance Benchmarking: Comprehensive performance analysis across all target languages
  • IDE Integration: Language server protocol support for MultiGen syntax
  • Web Interface: Online code conversion tool
  • Plugin System: External backend support and extensibility
