Python-to-Many code translation
MultiGen: Multi-Language Code Generator
MultiGen is a Python-to-multiple-languages code generator that translates Python code to C, C++, Rust, Go, Haskell, OCaml, and LLVM IR while preserving semantics and performance characteristics.
Overview
MultiGen extends the CGen (Python-to-C) project into a multi-language translation system with enhanced runtime libraries, code generation, and a clean backend architecture.
Key Features
- Multi-Language Support: Generate code for C, C++, Rust, Go, Haskell, OCaml, and LLVM IR
- Universal Preference System: Customize code generation for each backend with language-specific preferences
- Advanced Python Support: Object-oriented programming, comprehensions, string methods, augmented assignment
- Modern Libraries: C++ STL, Rust standard library, Go standard library, Haskell containers, OCaml standard library
- Clean Architecture: Extensible backend system with abstract interfaces for adding new target languages
- Type-Safe Generation: Leverages Python type annotations for accurate and safe code translation
- Runtime Libraries: Enhanced C backend with 50KB+ runtime libraries providing Python-like semantics
- CLI Interface: Simple command-line tool with conversion, building, validation (multigen check), and batch processing
- Production-Ready: 1353 passing tests ensuring translation accuracy and code quality
- LLVM Backend: Native compilation via LLVM IR with O0-O3 optimization levels
Supported Languages
| Language | Status | Extension | Build System | Advanced Features | Benchmarks |
|---|---|---|---|---|---|
| C | Production | .c | Makefile / gcc | OOP, STC containers, string methods, comprehensions | 7/7 (100%) |
| C++ | Production | .cpp | Makefile / g++ | OOP, STL containers, string methods, comprehensions | 7/7 (100%) |
| Rust | Production | .rs | Cargo / rustc | OOP, ownership-aware, string methods, comprehensions | 7/7 (100%) |
| Go | Production | .go | go.mod / go build | OOP, defer pattern, string methods, comprehensions | 7/7 (100%) |
| Haskell | Production | .hs | Cabal / ghc | Pure functional, comprehensions, type safety | 7/7 (100%) |
| OCaml | Production | .ml | dune / ocamlc | Functional, pattern matching, mutable refs | 7/7 (100%) |
| LLVM | Production | .ll | llvmlite / clang | Native compilation, O0-O3 optimization, multi-platform | 7/7 (100%) |
Benchmark Results
% make benchmark # ran on m1 macbook air
================================================================================
BENCHMARK SUMMARY
================================================================================
Total: 7 benchmarks × 7 backends = 49 runs
Success: 49 | Failed: 0
Backend Success Compile (s) Run (s) Binary (KB) LOC
--------------------------------------------------------------------------------
c 7/7 0.390 0.275189 94.9 76
cpp 7/7 0.435 0.251988 36.1 51
go 7/7 0.190 0.265097 2365.4 38
haskell 7/7 0.156 0.024035 19944.6 65
llvm 7/7 0.310 0.251354 49.0 321
ocaml 7/7 0.234 0.271373 826.3 27
rust 7/7 0.266 0.250707 443.0 37
===============================================================================
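To compare backends on a single axis, the summary figures can be loaded into a small script. The sketch below copies the run-time and binary-size columns from the table above and ranks the backends by either one. The names RESULTS and rank_by are illustrative and are not part of MultiGen.

```python
# Figures copied from the benchmark summary above:
# backend -> (run time in seconds, binary size in KB).
RESULTS = {
    "c": (0.275189, 94.9),
    "cpp": (0.251988, 36.1),
    "go": (0.265097, 2365.4),
    "haskell": (0.024035, 19944.6),
    "llvm": (0.251354, 49.0),
    "ocaml": (0.271373, 826.3),
    "rust": (0.250707, 443.0),
}


def rank_by(column: int) -> list:
    """Return backend names sorted ascending by a column (0 = run time, 1 = binary size)."""
    return sorted(RESULTS, key=lambda name: RESULTS[name][column])


if __name__ == "__main__":
    print("shortest run time:", rank_by(0)[0])
    print("smallest binary:", rank_by(1)[0])
```

In this particular run the Haskell backend has the shortest run time and also the largest binary, which is a reminder that no single column tells the whole story.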
Quick Start
Installation
Install from PyPI
pip install multigen
Install from source
git clone https://github.com/shakfu/multigen
cd multigen
pip install -e .
Optional Dependencies
MultiGen has zero required dependencies for core functionality (C, C++, Rust, Go, Haskell, OCaml backends). Optional features can be installed as needed:
# LLVM backend support (native compilation, WebAssembly)
pip install multigen[llvm]
# Z3 theorem prover (formal verification)
pip install multigen[z3]
# All optional dependencies
pip install multigen[all]
Basic Usage
# List available backends
multigen backends
# Convert Python to C (with advanced features)
multigen --target c convert my_script.py
# Convert Python to C++ (with STL support)
multigen --target cpp convert my_script.py
# Convert Python to Rust with build
multigen --target rust build my_script.py
# Convert Python to Go (with enhanced features)
multigen --target go convert my_script.py
# Convert Python to Haskell (with functional programming features)
multigen --target haskell convert my_script.py
# Convert Python to OCaml (with functional programming and pattern matching)
multigen --target ocaml convert my_script.py
# Batch convert all Python files
multigen --target cpp batch --source-dir ./examples
Backend Preferences
Customize code generation for each target language with the --prefer flag:
# Haskell with native comprehensions (idiomatic)
multigen --target haskell convert my_script.py --prefer use_native_comprehensions=true
# C with custom settings
multigen --target c convert my_script.py --prefer use_stc_containers=false --prefer indent_size=2
# C++ with modern features
multigen --target cpp convert my_script.py --prefer cpp_standard=c++20 --prefer use_modern_cpp=true
# Rust with specific edition
multigen --target rust convert my_script.py --prefer rust_edition=2018 --prefer clone_strategy=explicit
# Go with version targeting
multigen --target go convert my_script.py --prefer go_version=1.19 --prefer use_generics=false
# OCaml with functional programming preferences
multigen --target ocaml convert my_script.py --prefer prefer_immutable=true --prefer use_pattern_matching=true
# Multiple preferences
multigen --target haskell build my_script.py \
--prefer use_native_comprehensions=true \
--prefer camel_case_conversion=false \
--prefer strict_data_types=true
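The repeated --prefer key=value flags shown above map naturally onto a dictionary of settings. The following is a minimal sketch of that idea using Python's argparse. It is not MultiGen's actual argument handling: the parser only models --target and --prefer, and the names build_parser and parse_preferences are assumptions made for illustration.

```python
import argparse


def parse_preferences(pairs):
    """Turn ["key=value", ...] into a dict, rejecting entries without '='."""
    prefs = {}
    for pair in pairs:
        key, sep, value = pair.partition("=")
        if not sep or not key:
            raise ValueError(f"expected key=value, got {pair!r}")
        prefs[key] = value
    return prefs


def build_parser():
    # Hypothetical parser: models only --target and a repeatable --prefer flag.
    parser = argparse.ArgumentParser(prog="multigen-sketch")
    parser.add_argument("--target", required=True)
    parser.add_argument("--prefer", action="append", default=[])
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args([
        "--target", "haskell",
        "--prefer", "use_native_comprehensions=true",
        "--prefer", "strict_data_types=true",
    ])
    print(args.target, parse_preferences(args.prefer))
```

Values stay as strings in this sketch; converting "true" or "2" to typed values would be a per-preference decision.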
Preference System
MultiGen features a preference system that allows you to choose between cross-language consistency (default) and language-specific idiomatic optimizations.
Design Philosophy
- Default (Consistent): Uses runtime library functions for predictable behavior across all languages
- Idiomatic (Optimized): Uses native language features for better performance and familiarity
Available Preference Categories
| Backend | Key Preferences | Description |
|---|---|---|
| Haskell | use_native_comprehensions, camel_case_conversion, strict_data_types | Native vs runtime comprehensions, naming, type system |
| C | use_stc_containers, brace_style, indent_size | Container choice, code style, memory management |
| C++ | cpp_standard, use_modern_cpp, use_stl_containers | Language standard, modern features, STL usage |
| Rust | rust_edition, clone_strategy, use_iterators | Edition targeting, ownership patterns, functional style |
| Go | go_version, use_generics, naming_convention | Version compatibility, language features, Go idioms |
| OCaml | prefer_immutable, use_pattern_matching, curried_functions | Functional style, pattern matching, function curry style |
Example: Haskell Comprehensions
Python Source:
def filter_numbers(numbers):
return [x * 2 for x in numbers if x > 5]
Default (Runtime Consistency):
filterNumbers numbers = listComprehensionWithFilter numbers (\x -> x > 5) (\x -> x * 2)
Native (Idiomatic Haskell):
filterNumbers numbers = [x * 2 | x <- numbers, x > 5]
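One way to picture how a single preference selects between these two outputs is a function that renders the same filtered comprehension in either form. The sketch below is purely illustrative: emit_haskell_filter_map and its parameters are hypothetical and do not describe MultiGen's emitter, though the two strings it produces match the Haskell lines shown above.

```python
def emit_haskell_filter_map(func, source, var, cond, expr,
                            use_native_comprehensions=False):
    """Render `[expr for var in source if cond]` as a one-line Haskell definition."""
    if use_native_comprehensions:
        # Idiomatic form: a native Haskell list comprehension.
        return f"{func} {source} = [{expr} | {var} <- {source}, {cond}]"
    # Default form: a call to the runtime helper with filter and map lambdas.
    return (f"{func} {source} = listComprehensionWithFilter {source} "
            f"(\\{var} -> {cond}) (\\{var} -> {expr})")


if __name__ == "__main__":
    for native in (False, True):
        print(emit_haskell_filter_map(
            "filterNumbers", "numbers", "x", "x > 5", "x * 2",
            use_native_comprehensions=native,
        ))
```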
Example: OCaml Functional Programming
Python Source:
def process_items(items):
return [item.upper() for item in items if len(item) > 3]
Default (Runtime Consistency):
let process_items items =
list_comprehension_with_filter items (fun item -> len item > 3) (fun item -> upper item)
Functional (Idiomatic OCaml):
let process_items items =
List.filter (fun item -> String.length item > 3) items
|> List.map String.uppercase_ascii
For complete preference documentation, see PREFERENCES.md.
Examples
Simple Functions
Python Input:
def add(x: int, y: int) -> int:
return x + y
def main() -> None:
result = add(5, 3)
print(result)
Generated C++:
#include <iostream>
#include <vector>
#include <unordered_map>
#include "runtime/multigen_cpp_runtime.hpp"
using namespace std;
using namespace multigen;
int add(int x, int y) {
return (x + y);
}
void main() {
int result = add(5, 3);
cout << result << endl;
}
Generated C:
#include <stdio.h>
#include "multigen_runtime.h"
int add(int x, int y) {
return (x + y);
}
void main() {
int result = add(5, 3);
printf("%d\n", result);
}
Generated Go:
package main
import "multigen"
func add(x int, y int) int {
return (x + y)
}
func main() {
result := add(5, 3)
multigen.Print(result)
}
Generated Rust:
// Include MultiGen Rust runtime
mod multigen_rust_runtime;
use multigen_rust_runtime::*;
fn add(x: i32, y: i32) -> i32 {
(x + y)
}
fn main() {
let mut result = add(5, 3);
print_value(result);
}
Generated Haskell:
module Main where
import MultiGenRuntime
import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.Map (Map)
import Data.Set (Set)
add :: Int -> Int -> Int
add x y = (x + y)
main :: IO ()
main = printValue (add 5 3)
Generated OCaml:
(* Generated OCaml code from Python *)
open Mgen_runtime
let add x y =
(x + y)
let main () =
let result = add 5 3 in
print_value result
let () = print_value "Generated OCaml code executed successfully"
Advanced Features (Object-Oriented Programming)
Python Input:
class Calculator:
def __init__(self, name: str):
self.name: str = name
self.total: int = 0
def add(self, value: int) -> None:
self.total += value
def get_result(self) -> str:
return self.name.upper() + ": " + str(self.total)
def process() -> list:
calc = Calculator("math")
calc.add(10)
return [calc.get_result() for _ in range(2)]
Generated C++:
#include <iostream>
#include <string>
#include <vector>
#include "runtime/multigen_cpp_runtime.hpp"
using namespace std;
using namespace multigen;
class Calculator {
public:
std::string name;
int total;
Calculator(std::string name) {
this->name = name;
this->total = 0;
}
void add(int value) {
this->total += value;
}
std::string get_result() {
return (StringOps::upper(this->name) + (": " + to_string(this->total)));
}
};
std::vector<std::string> process() {
Calculator calc("math");
calc.add(10);
return list_comprehension(Range(2), [&](auto _) {
return calc.get_result();
});
}
Generated Go:
package main
import "multigen"
type Calculator struct {
Name string
Total int
}
func NewCalculator(name string) Calculator {
obj := Calculator{}
obj.Name = name
obj.Total = 0
return obj
}
func (obj *Calculator) Add(value int) {
obj.Total += value
}
func (obj *Calculator) GetResult() string {
return (multigen.StrOps.Upper(obj.Name) + (": " + multigen.ToStr(obj.Total)))
}
func process() []interface{} {
calc := NewCalculator("math")
calc.Add(10)
return multigen.Comprehensions.ListComprehension(multigen.NewRange(2), func(item interface{}) interface{} {
_ := item.(int)
return calc.GetResult()
})
}
Generated Rust:
use std::collections::{HashMap, HashSet};
// Include MultiGen Rust runtime
mod multigen_rust_runtime;
use multigen_rust_runtime::*;
#[derive(Clone)]
struct Calculator {
name: String,
total: i32,
}
impl Calculator {
fn new(name: String) -> Self {
Calculator {
name: name,
total: 0,
}
}
fn add(&mut self, value: i32) {
self.total += value;
}
fn get_result(&mut self) -> String {
((StrOps::upper(&self.name) + ": ".to_string()) + to_string(self.total))
}
}
fn process() -> Vec<String> {
let mut calc = Calculator::new("math".to_string());
calc.add(10);
Comprehensions::list_comprehension(new_range(2).collect(), |_| calc.get_result())
}
Generated Haskell:
{-# LANGUAGE OverloadedStrings #-}
{-# LANGUAGE FlexibleInstances #-}
module Main where
import MultiGenRuntime
import qualified Data.Map as Map
import qualified Data.Set as Set
import Data.Map (Map)
import Data.Set (Set)
data Calculator = Calculator
{ name :: String
, total :: Int
} deriving (Show, Eq)
newCalculator :: String -> Calculator
newCalculator name = Calculator { name = name, total = 0 }
add :: Calculator -> Int -> ()
add obj value = () -- Haskell immutable approach
getResult :: Calculator -> String
getResult obj = (upper (name obj)) + ": " + (toString (total obj))
process :: [String]
process =
let calc = newCalculator "math"
in listComprehension (rangeList (range 2)) (\_ -> getResult calc)
Generated OCaml:
(* Generated OCaml code from Python *)
open Mgen_runtime
type calculator = {
name : string;
total : int;
}
let create_calculator name =
{
name = name;
total = 0;
}
let calculator_add (calculator_obj : calculator) value =
(* Functional update creating new record *)
{ calculator_obj with total = calculator_obj.total + value }
let calculator_get_result (calculator_obj : calculator) =
(calculator_obj.name ^ ": " ^ string_of_int calculator_obj.total)
let process () =
let calc = create_calculator "math" in
let updated_calc = calculator_add calc 10 in
list_comprehension (range_list (range 2)) (fun _ -> calculator_get_result updated_calc)
Architecture
MultiGen follows a clean, extensible architecture with well-defined components:
7-Phase Translation Pipeline
- Validation: Verify Python source compatibility
- Analysis: Analyze code structure and dependencies
- Python Optimization: Apply Python-level optimizations
- Mapping: Map Python constructs to target language equivalents
- Target Optimization: Apply target language-specific optimizations
- Generation: Generate target language code
- Build: Compile/build using target language toolchain
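The seven phases above run in a fixed order, each consuming the result of the previous one. A minimal sketch of that shape follows, assuming a shared state object is threaded through the phases. PipelineState, make_phase, and run_pipeline are hypothetical names, and the placeholder phases only record that they ran.

```python
from dataclasses import dataclass, field


@dataclass
class PipelineState:
    """Hypothetical state object passed from phase to phase."""
    source: str
    log: list = field(default_factory=list)


# Phase names taken from the list above, in order.
PHASE_NAMES = [
    "validation", "analysis", "python_optimization", "mapping",
    "target_optimization", "generation", "build",
]


def make_phase(name):
    # Placeholder phase: a real phase would transform the state;
    # this one only appends its own name to the log.
    def run(state):
        state.log.append(name)
        return state
    return run


def run_pipeline(source):
    state = PipelineState(source)
    for phase in map(make_phase, PHASE_NAMES):
        state = phase(state)
    return state


if __name__ == "__main__":
    print(run_pipeline("def add(x: int, y: int) -> int: return x + y").log)
```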
Frontend (Language-Agnostic)
- Type Inference: Analyzes Python type annotations and infers types
- Static Analysis: Validates code compatibility and detects unsupported features
- AST Processing: Parses and transforms Python abstract syntax tree
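As a rough illustration of what reading type annotations from the AST involves, the sketch below uses Python's standard ast module to collect parameter and return annotations for each function. It shows the general technique only and is not MultiGen's frontend; function_signatures is a hypothetical helper, and ast.unparse requires Python 3.9 or later.

```python
import ast


def function_signatures(source):
    """Map each function name to its parameter and return annotations (None if missing)."""
    signatures = {}
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, ast.FunctionDef):
            sig = {
                arg.arg: ast.unparse(arg.annotation) if arg.annotation else None
                for arg in node.args.args
            }
            # "return" cannot clash with a parameter name because it is a keyword.
            sig["return"] = ast.unparse(node.returns) if node.returns else None
            signatures[node.name] = sig
    return signatures


if __name__ == "__main__":
    code = "def add(x: int, y: int) -> int:\n    return x + y\n"
    print(function_signatures(code))
```

A parameter reported as None is one where a translator would have to infer the type from usage or reject the function.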
Backends (Language-Specific)
Each backend implements abstract interfaces:
- AbstractEmitter: Code generation for target language
- AbstractFactory: Factory for backend components
- AbstractBuilder: Build system integration
- AbstractContainerSystem: Container and collection handling
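The interface names above come from the project, but their methods are not listed here. The sketch below therefore uses a stand-in AbstractEmitter with a single assumed method, emit_function, to show how an abstract base class forces each backend to supply its own code generation. ToyCEmitter is a hypothetical example backend.

```python
from abc import ABC, abstractmethod


class AbstractEmitter(ABC):
    """Stand-in for the interface named above; emit_function is an assumed method."""

    @abstractmethod
    def emit_function(self, name, params, return_type, body):
        """Return target-language source for one function."""


class ToyCEmitter(AbstractEmitter):
    """Hypothetical emitter that renders a C-style function definition."""

    def emit_function(self, name, params, return_type, body):
        # params is a list of (parameter name, C type) pairs.
        args = ", ".join(f"{ptype} {pname}" for pname, ptype in params)
        return f"{return_type} {name}({args}) {{\n    {body}\n}}"


if __name__ == "__main__":
    emitter = ToyCEmitter()
    print(emitter.emit_function("add", [("x", "int"), ("y", "int")], "int", "return (x + y);"))
```

Instantiating AbstractEmitter directly raises TypeError, so a backend that forgets to implement the method fails at construction time rather than during generation.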
Runtime Libraries (C Backend)
- Error Handling (multigen_error_handling.h/.c): Python-like exception system
- Memory Management (multigen_memory_ops.h/.c): Safe allocation and cleanup
- Python Operations (multigen_python_ops.h/.c): Python built-ins and semantics
- String Operations (multigen_string_ops.h/.c): String methods with memory safety
- STC Integration (multigen_stc_bridge.h/.c): Smart Template Container bridge
CLI Commands
Convert
Convert Python files to target language:
multigen --target <language> convert <input.py>
multigen --target rust convert example.py
Build
Convert and compile/build the result:
multigen --target <language> build <input.py>
multigen --target go build --makefile example.py # Generate build file
multigen --target c build example.py # Direct compilation
Batch
Process multiple files:
multigen --target <language> batch --source-dir <dir>
multigen --target rust batch --source-dir ./src --build
Backends
List available language backends:
multigen backends
Check
Validate Python files against the supported subset without converting:
multigen check my_script.py # Validate a file
multigen check --report my_script.py # Full feature support report
multigen check file1.py file2.py # Validate multiple files
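Validating a file against a supported subset generally amounts to walking the AST and reporting constructs that fall outside it. The sketch below shows that technique with the standard ast module. The set of rejected node types is invented for illustration; the real supported subset is defined by MultiGen, and check_source is a hypothetical helper.

```python
import ast

# Assumption for illustration only: these node types are treated as unsupported.
REJECTED_NODES = (ast.AsyncFunctionDef, ast.Await, ast.Global)


def check_source(source):
    """Return one message per rejected construct; an empty list means the check passed."""
    problems = []
    for node in ast.walk(ast.parse(source)):
        if isinstance(node, REJECTED_NODES):
            problems.append(f"line {node.lineno}: {type(node).__name__} is not supported")
    return problems


if __name__ == "__main__":
    print(check_source("def f(x: int) -> int:\n    return x\n"))
    print(check_source("async def g():\n    pass\n"))
```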
Clean
Clean build artifacts:
multigen clean
Development
Running Tests
make test # Run all 1353 tests
make lint # Run code linting with ruff
make typecheck # Run type checking with mypy
Test Organization
MultiGen maintains a test suite organized into focused modules:
- test_backend_c_*.py: C backend tests (191 tests total)
  - Core functionality, OOP, comprehensions, string methods, runtime libraries
- test_backend_cpp_*.py: C++ backend tests (104 tests)
  - STL integration, modern C++ features, OOP support
- test_backend_rust_*.py: Rust backend tests (176 tests)
  - Ownership patterns, memory safety, standard library
- test_backend_go_*.py: Go backend tests (95 tests)
  - Go idioms, standard library, concurrency patterns
- test_backend_haskell_*.py: Haskell backend tests (93 tests)
  - Functional programming, type safety, comprehensions
- test_backend_ocaml_*.py: OCaml backend tests (51 tests)
  - Functional programming, pattern matching, immutability
- test_backend_llvm_*.py: LLVM backend tests (130 tests)
  - Native compilation, optimization levels, IR generation
Adding New Backends
To add support for a new target language:
- Create backend directory: src/multigen/backends/mylang/
- Implement required abstract interfaces:
  - MyLangBackend(LanguageBackend): Main backend class
  - MyLangFactory(AbstractFactory): Component factory
  - MyLangEmitter(AbstractEmitter): Code generation
  - MyLangBuilder(AbstractBuilder): Build system integration
  - MyLangContainerSystem(AbstractContainerSystem): Container handling
  - MyLangPreferences(BasePreferences): Language-specific preferences
- Register backend in src/multigen/backends/registry.py
- Add tests in tests/test_backend_mylang_*.py
- Update documentation
See existing backends (C, C++, Rust, Go, Haskell, OCaml, LLVM) for implementation examples.
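The registration step can be pictured as a name-to-class lookup table. The sketch below shows one common way to build such a table with a class decorator. It is a generic pattern and makes no claim about the contents of registry.py; register_backend, get_backend, and the file_extension attribute are all hypothetical.

```python
_BACKENDS = {}


def register_backend(name):
    """Class decorator that records a backend class under a target name."""
    def decorator(cls):
        if name in _BACKENDS:
            raise ValueError(f"backend {name!r} already registered")
        _BACKENDS[name] = cls
        return cls
    return decorator


def get_backend(name):
    """Look up a registered backend class, listing the known names on failure."""
    try:
        return _BACKENDS[name]
    except KeyError:
        raise ValueError(
            f"unknown backend {name!r}; available: {sorted(_BACKENDS)}"
        ) from None


@register_backend("mylang")
class MyLangBackend:
    # Hypothetical attribute used only to make the example concrete.
    file_extension = ".mylang"


if __name__ == "__main__":
    print(get_backend("mylang").file_extension)
```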
Contributing
- Fork the repository
- Create a feature branch
- Add tests for new functionality
- Ensure all tests pass
- Submit a pull request
License
MIT License - see LICENSE file for details.
Advanced Features
Supported Python Features
All backends support core Python features:
- Object-Oriented Programming: Classes, methods, constructors, instance variables, method calls
- Augmented Assignment: All operators (+=, -=, *=, /=, //=, %=, |=, ^=, &=, <<=, >>=)
- String Operations: upper(), lower(), strip(), find(), replace(), split()
- Comprehensions: List, dict, and set comprehensions with range iteration and conditional filtering
- Control Structures: if/elif/else, while loops, for loops with range()
- Built-in Functions: abs(), bool(), len(), min(), max(), sum()
- Type Inference: Automatic type detection from annotations and assignments
- Slicing: List slicing (arr[1:3], arr[1:], arr[:2]) and string slicing (s[1:3])
- F-String Format Specs: f"{x:.2f}", f"{n:x}", f"{n:d}" with precision and radix formatting
- Exception Handling: try/except/else/finally, raise, 6 exception types
- Context Managers: with open(...) as f: for file I/O
- Generators: yield, yield from, generator expressions (eager collection)
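For a concrete sense of what an input file drawing on the feature list above looks like, here is a short annotated Python program that combines a list comprehension over range(), slicing, augmented assignment, len(), and f-string format specs. It is ordinary Python written to stay within the listed features; it has not been run through MultiGen, so treat it as a sketch of suitable input and not as a verified test case.

```python
def summarize(values: list) -> str:
    window: list = values[1:4]          # list slicing
    total: int = 0
    for i in range(len(window)):        # for loop with range() and len()
        total += window[i]              # augmented assignment
    mean: float = total / 3
    # f-string format specs: decimal, two-digit precision, hexadecimal
    return f"{total:d} {mean:.2f} {total:x}"


def main() -> None:
    values: list = [i * i for i in range(6)]   # list comprehension over range()
    print(summarize(values))


if __name__ == "__main__":
    main()
```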
Container Support by Language
- C: STC (Smart Template Container) library with optimized C containers (864KB integrated library)
- C++: STL containers (std::vector, std::unordered_map, std::unordered_set)
- Rust: Standard library collections (Vec, HashMap, HashSet) with memory safety
- Go: Standard library containers with idiomatic Go patterns
- Haskell: Standard library containers with type-safe functional operations
- OCaml: Standard library with immutable data structures and pattern matching
Test Coverage
MultiGen maintains test coverage ensuring translation accuracy:
- 1353 total tests across all components and backends
- 49/49 benchmarks passing (100%) across all 7 backends
- Comprehensive backend coverage testing all major Python features
- Test categories: basics, OOP, comprehensions, string methods, augmented assignment, control flow, integration, exception handling, context managers, generators, slicing, f-string format specs
- All tests passing with zero regressions (100%)
Development Roadmap
Completed Milestones
- Multi-language backend system with C, C++, Rust, Go, Haskell, and OCaml support
- Advanced C runtime integration with 50KB+ of runtime libraries
- Sophisticated Python-to-C conversion with complete function and control flow support
- Object-oriented programming support across all backends
- Advanced Python language features: comprehensions, string methods, augmented assignment
- Complete STC library integration (864KB Smart Template Container library)
- Architecture consolidation with unified C backend module
- Professional test organization with 1353 tests in focused, single-responsibility files
- Universal preference system with language-specific customization
- Production-ready code generation with clean, efficient output
- 7 production-ready backends (C++, C, Rust, Go, Haskell, OCaml, LLVM) with 100% benchmark success
- Exception handling (try/except/else/finally/raise) across all backends
- Context managers (with statement) across all backends
- Generator/yield support (eager collection) across all backends
- List and string slicing across 6/7 backends
- F-string format specifications across all backends
- multigen check CLI command for validation without conversion
Future Development
- Advanced Frontend Analysis: Integrate optimization detection and static analysis engine
- STC Performance Optimization: Container specialization and memory layout optimization
- Formal Verification: Theorem proving and memory safety proofs integration
- Cross-Language Runtime: Extend runtime concepts to other backends (C++, Rust, Go)
- Performance Benchmarking: Comprehensive performance analysis across all target languages
- IDE Integration: Language server protocol support for MultiGen syntax
- Web Interface: Online code conversion tool
- Plugin System: External backend support and extensibility