Code RL
code-rl: Reinforcement Learning for Code Generation
Overview
code-rl is a Python package for training language models to generate code with reinforcement learning. It integrates with OpenAI Gym, providing a structured environment for assessing the quality of model-generated code. As of version 0.0.3, code-rl supports only the C programming language. Future versions aim to add support for Java and Go.
Installation
To install code-rl, run the following command:
pip install code-rl
Make sure Python 3.x is installed beforehand.
Getting Started
Prerequisites
Supported systems: Linux and macOS.
gcc 6+ or clang 11+ must be preinstalled.
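One way to confirm the compiler prerequisite is to look for gcc or clang on the PATH. The sketch below uses only the standard library. The helper name `find_c_compiler` is illustrative and not part of code-rl, and it does not verify the compiler version.

```python
import shutil


def find_c_compiler(candidates=("gcc", "clang")):
    """Return the path of the first candidate compiler found on PATH, or None.

    Hypothetical helper for illustration; not part of code-rl.
    """
    for name in candidates:
        path = shutil.which(name)
        if path is not None:
            return path
    return None


compiler = find_c_compiler()
if compiler is None:
    print("No C compiler found; install gcc 6+ or clang 11+ first.")
else:
    print(f"Found C compiler at {compiler}")
```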
Basic Usage
A simple example of how to use the package:
from coderl import CodeCompilerEnv

# Dummy model: always returns the same C program
def model(prompt):
    return "int main(){return 0;}"

env = CodeCompilerEnv()

# Example of running a training episode
observation = "Give me a code to print hello world"
for _ in range(1000):
    action = model(observation)
    observation, reward, done, info = env.step(action)
    if done:
        break
    # Update your prompt based on the stderr/stdout provided at info

print(reward)
print(info)
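The comment about updating the prompt from `info` could be implemented along these lines. This is a minimal sketch that assumes `info` is a dict with optional `'stderr'` and `'stdout'` keys, as in the outputs shown in this README. The function `build_followup_prompt` is hypothetical and not part of code-rl.

```python
def build_followup_prompt(task, info):
    """Append compiler or program feedback from `info` to the original task.

    Assumes `info` may contain 'stderr' and/or 'stdout' string values.
    Hypothetical helper for illustration; not part of code-rl.
    """
    parts = [task]
    stderr = info.get("stderr")
    stdout = info.get("stdout")
    if stderr:
        # Compiler diagnostics take priority over program output.
        parts.append("The previous attempt produced these diagnostics:\n" + stderr)
    elif stdout:
        parts.append("The previous attempt printed:\n" + stdout)
    return "\n\n".join(parts)


prompt = build_followup_prompt(
    "Give me a code to print hello world",
    {"stderr": "temp_code.c: ..."},
)
print(prompt)
```

The returned string can then be passed to the model on the next iteration in place of the original prompt.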
API Reference
- CodeCompilerEnv: The Gym environment for code evaluation.
- reset(): Resets the environment to its initial state.
- step(action): Executes an action in the environment.
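The reset/step cycle can be wrapped in a small episode runner. The sketch below runs without a compiler by using `StubEnv`, a stand-in that only mimics the call shape seen in this README (a `reset()` method and a `step(action)` that returns a 4-tuple). Both `StubEnv` and `run_episode` are hypothetical names, and the stub's reward rule is invented for illustration; it is not how CodeCompilerEnv scores code.

```python
class StubEnv:
    """Stand-in exposing reset() and step(action) like a Gym-style environment.

    Not the real CodeCompilerEnv: it never invokes a compiler and simply
    rewards any action that contains the text 'main'.
    """

    def reset(self):
        return 0

    def step(self, action):
        ok = "main" in action
        observation = 1 if ok else 0
        reward = 1 if ok else -1
        info = {"stdout": ""} if ok else {"stderr": "no main function"}
        return observation, reward, True, info


def run_episode(env, model, prompt, max_steps=10):
    """Query the model and step the env until done or max_steps is reached.

    Returns (total_reward, last_info).
    """
    env.reset()
    total_reward = 0
    info = {}
    for _ in range(max_steps):
        action = model(prompt)
        _, reward, done, info = env.step(action)
        total_reward += reward
        if done:
            break
    return total_reward, info


total, info = run_episode(StubEnv(), lambda p: "int main(){return 0;}", "print hello world")
print(total, info)
```

Swapping `StubEnv()` for a real environment instance should work as long as it follows the same reset/step shape.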
Examples
from coderl import CodeCompilerEnv

env = CodeCompilerEnv()

code = """
#include<stdio.h>
int main(){
    printf("Hello World");
    return 0;
}"""
result = env.step(code)
print(result) # (1, 1, True, {'stdout': 'Hello World'})

code = """
#include<stdio.h>
struct Point {
    int x, y;
};
int main(){
    if (1){
        printf("Hello World");
    }
    struct Point p = { 1 }; // Not initializing y will throw error in -Wextra
    printf("%d", p.x );
    return 0;
}"""
result = env.step(code)
print(result) # (0, -1, True, {'stderr': 'temp_code.c: ...'})

code = """
#include<stdio.h>
int main(){
    int x; // unused variable -Wall will trigger warning
    printf("Hello World");
    return 0;
}"""
result = env.step(code)
print(result) # (0, -2, True, {'stderr': 'temp_code.c:4:9: error: unused variable ‘x’ ...'})

code = """
#include<stdio.h>
void foo(void){
    return;
}
int main(){
    printf("Hello World");
    foo();
}
"""
result = env.step(code)
print(result) # (0, -2, True, {'stderr': 'temp_code.c: ...'})
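The tuples printed above are easier to read with named fields. The following sketch wraps a step result in a `NamedTuple`. The field names follow the unpacking used in Basic Usage; `StepResult` and `describe` are hypothetical helpers, not part of code-rl.

```python
from typing import Any, Dict, NamedTuple


class StepResult(NamedTuple):
    """Named view over the 4-tuple returned by step() in the examples."""
    observation: Any
    reward: float
    done: bool
    info: Dict[str, str]


def describe(result):
    """Format a step result tuple as a one-line summary."""
    step = StepResult(*result)
    # Prefer stderr when present, otherwise fall back to stdout.
    stream = "stderr" if "stderr" in step.info else "stdout"
    return f"reward={step.reward} done={step.done} {stream}={step.info.get(stream, '')!r}"


print(describe((1, 1, True, {"stdout": "Hello World"})))
# prints: reward=1 done=True stdout='Hello World'
```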
Contributing
We warmly welcome contributions to code-rl and value your efforts to improve and expand this package. If you're interested in contributing, please start by forking the repository and submitting your changes through a pull request. We encourage you to adhere to established Python coding standards (PEP 8) for consistency. When submitting a pull request, please provide a clear description of the changes and any relevant issue numbers. We also recommend adding tests for new features to ensure reliability. For substantial changes, please open an issue first to discuss what you would like to change. Your contributions play a significant role in the development of code-rl, and we look forward to collaborating with the community!
License
MIT License
Built Distribution
File details
Details for the file code_rl-0.0.9-py3-none-any.whl.
File metadata
- Download URL: code_rl-0.0.9-py3-none-any.whl
- Upload date:
- Size: 10.6 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.13
File hashes
Algorithm | Hash digest
---|---
SHA256 | 88cfc716dd43820e2bcf8f9cca09baedfabdf35611dd9bac7fb47d302ff6212b
MD5 | c02feb468274c801d8d6279a93167bb4
BLAKE2b-256 | 295d307f65c3856d5ec3cef2016588efef84fb21a6940d98f64214876aaa77f9