Framework for code synthesis and AI4SE research
Synthegrator
Synthegrator is a framework for code generation problems. It simplifies the process of loading common datasets and solving them with language models.
Installation
pip install "synthegrator @ git+https://github.com/DaiseyCode/Synthegrator.git"
To execute generated code, you will also need to install Docker.
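Before running the example, it can help to confirm that a `docker` executable is on your PATH. The following is a minimal sketch using only the Python standard library. The helper name `docker_on_path` is hypothetical and not part of Synthegrator, and it only checks that the binary exists, not that the Docker daemon is running.

```python
import shutil


def docker_on_path() -> bool:
    """Return True if a `docker` executable can be found on PATH.

    Hypothetical helper, not part of Synthegrator. It does not verify
    that the Docker daemon is actually running.
    """
    return shutil.which("docker") is not None


if __name__ == "__main__":
    if docker_on_path():
        print("docker found")
    else:
        print("docker not found; install it before executing generated code")
```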
Example
Let's take a look at an example of how we can run a solver over the HumanEval dataset, which collects 164 function synthesis problems.
# Imports
from lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames
from synthegrator.code_solver import LmCodeSolverAutoRegressive
from synthegrator.execution_threading import solve_and_evaluate_problems
from synthegrator.synthdatasets.human_eval import yield_human_eval
from synthegrator.df_converters import solution_evals_to_df
# Load a dataset (here, HumanEval)
problems = list(yield_human_eval())
# Create a solver backed by a language model
lm = get_open_ai_lm(OpenAiModelNames.gpt_3_5_turbo_instruct)
# ^ Make sure to add your API key. See https://github.com/DNGros/lmwrapper
solver = LmCodeSolverAutoRegressive(lm)
# Generate code and run each problem's test cases
evals = list(solve_and_evaluate_problems(
    solver=solver,
    problems=problems,
    max_threads_eval=4,
))
# Convert to a dataframe
df = solution_evals_to_df(
    evals,
    pickle_gzip_whole_solution_eval=True
)
print("Fraction Passing", df.main_metric__is_success.mean())
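The last line of the example works because the mean of a boolean column is the fraction of `True` values. Here is a minimal sketch with pandas, using a made-up four-row DataFrame in place of real results. Only the column name `main_metric__is_success` comes from the example; the `problem_id` column and all of the values are invented for illustration.

```python
import pandas as pd

# Toy stand-in for the DataFrame returned by solution_evals_to_df.
# The rows are invented; the `problem_id` column is an assumption.
df = pd.DataFrame({
    "problem_id": ["p0", "p1", "p2", "p3"],
    "main_metric__is_success": [True, False, True, True],
})

# Mean of a boolean column = fraction of True values.
fraction_passing = df["main_metric__is_success"].mean()
print("Fraction Passing", fraction_passing)

# List the problems that failed (hypothetical ids).
failed = df.loc[~df["main_metric__is_success"], "problem_id"].tolist()
print("Failed:", failed)
```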
Architecture
Guiding Design Requirements
- DR-1 Support Diverse Datasets and Tasks. We want an architecture that can support diverse tasks (including potentially complex, repository-level tasks).
- DR-2 Consistent & Efficient Execution. Experiments often involve running LLM-generated code. We want this to be fast, efficient, and reasonably secure.
- DR-3 Adaptable to State-of-the-Art Models. This includes models like those from OpenAI or on HuggingFace. It should also be adaptable to models that might do complex retrieval or reasoning.
- DR-4 Maintainable. Try to follow best practices around automated testing and continuous integration.
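One way to read DR-3 is that the solving logic should depend only on a narrow model interface, so that an OpenAI model, a HuggingFace model, or a retrieval-based pipeline could be swapped in. The sketch below illustrates that idea with `typing.Protocol`. Every name in it (`TextModel`, `EchoModel`, `solve`) is hypothetical and is not Synthegrator's actual API.

```python
from typing import Protocol


class TextModel(Protocol):
    """Hypothetical minimal interface a solver needs from a model."""

    def complete(self, prompt: str) -> str: ...


class EchoModel:
    """Stand-in model for illustration; always returns a fixed function body."""

    def complete(self, prompt: str) -> str:
        return "    return a + b\n"


def solve(model: TextModel, prompt: str) -> str:
    """Concatenate the prompt with whatever the model completes."""
    return prompt + model.complete(prompt)


prompt = "def add(a, b):\n"
source = solve(EchoModel(), prompt)

# Execute the synthesized function in a fresh namespace.
# This bare exec is for illustration only; untrusted model output
# should be run in a sandbox (the installation notes point to Docker).
namespace: dict = {}
exec(source, namespace)
print(namespace["add"](2, 3))
```

Because `solve` only relies on `complete`, any object with that method satisfies the protocol without inheriting from it.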
Diagram
TODO, add docs walking through each component
Datasets and Solvers
docs TODO