Framework for code synthesis and AI4SE research
Synthegrator
Synthegrator is a framework for code generation problems. It simplifies the process of loading common datasets and solving them with language models.
Installation
pip install "synthegrator @ git+https://github.com/DaiseyCode/Synthegrator.git"
Executing generated code also requires Docker to be installed.
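Because execution depends on Docker, it can help to confirm that Docker is reachable before starting a long run. The following is a minimal sketch using only the Python standard library. It is not part of Synthegrator: the helper names `docker_cli_path` and `docker_daemon_responds` are hypothetical, and the check assumes that `docker info` exits with a non-zero status when the daemon cannot be reached, which is the behavior of recent Docker releases.

```python
import shutil
import subprocess
from typing import Optional


def docker_cli_path() -> Optional[str]:
    """Return the path of the `docker` executable, or None if it is not on PATH."""
    return shutil.which("docker")


def docker_daemon_responds(timeout_seconds: float = 10.0) -> bool:
    """Return True if `docker info` completes successfully within the timeout.

    Hypothetical helper for illustration; Synthegrator may perform its own checks.
    """
    path = docker_cli_path()
    if path is None:
        return False
    try:
        result = subprocess.run(
            [path, "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_seconds,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The binary could not be started, or the daemon did not answer in time.
        return False
    return result.returncode == 0


if __name__ == "__main__":
    print("Docker CLI found:", docker_cli_path() is not None)
    print("Docker daemon responds:", docker_daemon_responds())
```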
Example
Let's take a look at an example of how we can run a solver over the HumanEval dataset, which collects 164 function synthesis problems.
# Imports
from lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames
from synthegrator.code_solver import LmCodeSolverAutoRegressive
from synthegrator.execution_threading import solve_and_evaluate_problems
from synthegrator.synthdatasets.human_eval import yield_human_eval
from synthegrator.df_converters import solution_evals_to_df
# Load problems from one of the supported AI4SE datasets (HumanEval here)
problems = list(yield_human_eval())
# Create a solver that can solve a problem
lm = get_open_ai_lm(OpenAiModelNames.gpt_3_5_turbo_instruct)
# ^ Make sure to add your API key. See https://github.com/DNGros/lmwrapper
solver = LmCodeSolverAutoRegressive(lm)
# Generate code and execute each problem's test cases
evals = list(solve_and_evaluate_problems(
    solver=solver,
    problems=problems,
    max_threads_eval=4,
))
# Convert to a dataframe
df = solution_evals_to_df(
    evals,
    pickle_gzip_whole_solution_eval=True,
)
print("Fraction Passing", df.main_metric__is_success.mean())
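Running the full example presumably involves a model call and a test execution for each of the 164 problems, so a quick smoke test on a handful of problems may be worthwhile first. The sketch below assumes that `yield_human_eval()` returns an ordinary Python iterator, as its use with `list(...)` in the example suggests. The helper `take_first` and the stand-in generator `fake_problem_stream` are hypothetical and exist only so the snippet runs without Synthegrator installed.

```python
from itertools import islice
from typing import Iterable, Iterator, List, TypeVar

T = TypeVar("T")


def take_first(items: Iterable[T], n: int) -> List[T]:
    """Return at most the first n items, without consuming the rest of the iterable."""
    return list(islice(items, n))


def fake_problem_stream() -> Iterator[str]:
    """Hypothetical stand-in for yield_human_eval(); yields 164 placeholder IDs."""
    for i in range(164):
        yield f"problem-{i}"


# Keep only the first five problems for a quick trial run.
subset = take_first(fake_problem_stream(), 5)
print(len(subset), subset[0], subset[-1])
```

With Synthegrator installed, the same helper could presumably be applied as `problems = take_first(yield_human_eval(), 5)` in place of `list(yield_human_eval())`.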
Architecture
Guiding Design Requirements
- DR-1 Support Diverse Datasets and Tasks. We want an architecture that can support diverse tasks (including potentially complex, repository-level tasks).
- DR-2 Consistent & Efficient Execution. Experiments often involve running LLM-generated code. We want this to be fast, efficient, and reasonably secure.
- DR-3 Adaptable to State-of-the-Art Models. This includes models like those from OpenAI or on HuggingFace. The architecture should also adapt to models that perform complex retrieval or reasoning.
- DR-4 Maintainable. Try to follow best practices around automated testing and continuous integration.
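To make the requirements above more concrete, here is a toy sketch of the separation they imply: a problem definition, a swappable solver interface (DR-3), and an evaluation step that runs generated code outside the main process (DR-2). This is not Synthegrator's actual class hierarchy. Every name here (`ToyProblem`, `ToySolver`, `CannedSolver`, `passes`) is hypothetical, and a plain Python subprocess with a timeout stands in for the Docker-based execution, which offers much stronger isolation.

```python
import subprocess
import sys
from dataclasses import dataclass
from typing import Protocol


@dataclass(frozen=True)
class ToyProblem:
    """Hypothetical problem: a prompt plus test code that checks a solution."""
    prompt: str
    test_code: str


class ToySolver(Protocol):
    """Hypothetical solver interface: anything that maps a problem to source code."""

    def solve(self, problem: ToyProblem) -> str: ...


class CannedSolver:
    """Stand-in for a language-model-backed solver; always returns fixed code."""

    def __init__(self, code: str) -> None:
        self._code = code

    def solve(self, problem: ToyProblem) -> str:
        return self._code


def passes(problem: ToyProblem, solver: ToySolver, timeout_seconds: float = 10.0) -> bool:
    """Run the solver's code followed by the tests in a separate Python process.

    A subprocess is only a weak stand-in for container isolation; do not run
    untrusted model output this way.
    """
    program = solver.solve(problem) + "\n" + problem.test_code
    try:
        result = subprocess.run(
            [sys.executable, "-c", program],
            capture_output=True,
            timeout=timeout_seconds,
            check=False,
        )
    except subprocess.TimeoutExpired:
        return False
    # A failing assertion raises AssertionError, which gives a non-zero exit code.
    return result.returncode == 0


problem = ToyProblem(prompt="def add(a, b):", test_code="assert add(2, 3) == 5")
good = CannedSolver("def add(a, b):\n    return a + b")
bad = CannedSolver("def add(a, b):\n    return a - b")
print(passes(problem, good), passes(problem, bad))
```

Because `passes` depends only on the `solve` method, a solver backed by a hosted model, a local model, or a retrieval pipeline could be swapped in without touching the evaluation code.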
Diagram
TODO, add docs walking through each component
Datasets and Solvers
docs TODO