Framework for code synthesis and AI4SE research
Synthegrator
Synthegrator is a framework for code generation problems. It simplifies the process of loading common datasets and solving them with language models.
Installation
pip install synthegrator
Executing generated code also requires Docker, which must be installed separately.
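Because execution depends on Docker, it can help to confirm that Docker is reachable before starting a long run. The following is a minimal sketch and not part of Synthegrator: `docker_available` is a hypothetical helper name, and it assumes that the `docker info` command exits with a non-zero status when the daemon cannot be reached.

```python
import shutil
import subprocess


def docker_available(timeout_s: float = 10.0) -> bool:
    """Return True if a `docker` CLI is on PATH and `docker info` succeeds.

    Hypothetical helper for illustration; not a Synthegrator function.
    """
    # No docker executable on PATH means nothing to run.
    if shutil.which("docker") is None:
        return False
    try:
        result = subprocess.run(
            ["docker", "info"],
            stdout=subprocess.DEVNULL,
            stderr=subprocess.DEVNULL,
            timeout=timeout_s,
            check=False,
        )
    except (OSError, subprocess.TimeoutExpired):
        # The CLI could not be launched or did not answer in time.
        return False
    # Assumption: a non-zero exit status means the daemon is unreachable.
    return result.returncode == 0


if __name__ == "__main__":
    print("Docker available:", docker_available())
```

If the check prints `False`, starting Docker (or fixing permissions for the current user) before running the example is likely to save time.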
Example
Let's take a look at an example of how we can run a solver over the HumanEval dataset, which collects 164 function synthesis problems.
# Imports
from lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames
from synthegrator.code_solver import LmCodeSolverAutoRegressive
from synthegrator.execution_threading import solve_and_evaluate_problems
from synthegrator.synthdatasets.human_eval import yield_human_eval
from synthegrator.df_converters import solution_evals_to_df
# Load one of the supported AI4SE datasets (here, HumanEval)
problems = list(yield_human_eval())
# Create a solver that can solve a problem
lm = get_open_ai_lm(OpenAiModelNames.gpt_3_5_turbo_instruct)
# ^ Make sure to add your API key to OPENAI_API_KEY or a file.
# See https://github.com/DaiseyCode/lmwrapper for more.
solver = LmCodeSolverAutoRegressive(lm)
# Generate code and execute each problem's test cases
evals = list(solve_and_evaluate_problems(
    solver=solver,
    problems=problems,
    max_threads_eval=4,
))
# Convert to a dataframe
df = solution_evals_to_df(
    evals,
    pickle_gzip_whole_solution_eval=True,
)
print("Fraction Passing", df.main_metric__is_success.mean())
Architecture
Guiding Design Requirements
- DR-1 Support Diverse Datasets and Tasks. We want an architecture that can support diverse tasks (including potentially complex, repository-level tasks).
- DR-2 Consistent & Efficient Execution. Experiments often involve running LLM-generated code. We want this to be fast, efficient, and reasonably secure.
- DR-3 Adaptable to State-of-the-Art Models. This includes models like those from OpenAI or on HuggingFace. It should also be adaptable to models that might do complex retrieval or reasoning.
- DR-4 Maintainable. Try to follow best practices around automated testing and continuous integration.
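To make DR-3 more concrete, the sketch below shows one way a solver abstraction could let a plain model and a retrieval-augmented model be swapped freely. It does not show Synthegrator's actual classes: `Solver`, `DirectSolver`, `RetrievalSolver`, `run_all`, and `fake_lm` are all hypothetical names invented for illustration, and the "model" is a stand-in function rather than a real language model.

```python
from dataclasses import dataclass
from typing import Callable, Protocol, Sequence


class Solver(Protocol):
    """Hypothetical interface: anything that maps a prompt to code text."""

    def solve(self, prompt: str) -> str: ...


@dataclass
class DirectSolver:
    """Sends the prompt straight to a text-completion callable."""

    complete: Callable[[str], str]

    def solve(self, prompt: str) -> str:
        return self.complete(prompt)


@dataclass
class RetrievalSolver:
    """Prepends retrieved snippets to the prompt before completing."""

    complete: Callable[[str], str]
    retrieve: Callable[[str], Sequence[str]]

    def solve(self, prompt: str) -> str:
        context = "\n".join(self.retrieve(prompt))
        full_prompt = f"{context}\n{prompt}" if context else prompt
        return self.complete(full_prompt)


def run_all(solver: Solver, prompts: Sequence[str]) -> list[str]:
    """Works with any object that has a compatible `solve` method."""
    return [solver.solve(p) for p in prompts]


def fake_lm(prompt: str) -> str:
    """Stand-in for a real model call; only reports the prompt length."""
    return f"# completion for {len(prompt)} chars"


if __name__ == "__main__":
    direct = DirectSolver(complete=fake_lm)
    augmented = RetrievalSolver(complete=fake_lm, retrieve=lambda p: ["xy"])
    print(run_all(direct, ["abc"]))
    print(run_all(augmented, ["abc"]))
```

The point of the design is that the evaluation loop depends only on the `solve` method, so a model that does extra retrieval or reasoning steps can be added without changing the code that runs it.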
Diagram
TODO, add docs walking through each component
Datasets and Solvers
docs TODO