Framework for code synthesis and AI4SE research

Synthegrator

Synthegrator is a framework for code generation problems. It simplifies the process of loading common datasets and solving them with language models.

Installation

```bash
pip install "synthegrator @ git+https://github.com/DaiseyCode/Synthegrator.git"
```

Executing generated code also requires Docker to be installed.
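This README does not document how Synthegrator drives Docker. As a hedged illustration of why Docker is needed, the sketch below shows one generic way to run untrusted, model-generated Python inside a throwaway container using only the standard library and the `docker` CLI. The helper names (`docker_available`, `build_sandbox_command`, `run_untrusted`), the `python:3.11-slim` image, and the resource limits are hypothetical choices, not Synthegrator APIs.

```python
import shutil
import subprocess


def docker_available() -> bool:
    """Return True if a `docker` executable is on PATH (the daemon may still be down)."""
    return shutil.which("docker") is not None


def build_sandbox_command(
    image: str = "python:3.11-slim",  # assumption: any image with a `python` binary works
    memory: str = "512m",
    cpus: str = "1",
) -> list[str]:
    """Build a `docker run` command that executes a Python program read from stdin."""
    return [
        "docker", "run",
        "--rm",               # delete the container when it exits
        "-i",                 # keep stdin open so the program can be piped in
        "--network", "none",  # no network access for untrusted code
        "--memory", memory,   # cap memory usage
        "--cpus", cpus,       # cap CPU usage
        image,
        "python", "-",        # `python -` reads the program from stdin
    ]


def run_untrusted(code: str, timeout_s: float = 10.0) -> subprocess.CompletedProcess:
    """Pipe `code` into a fresh container; raises subprocess.TimeoutExpired on timeout."""
    return subprocess.run(
        build_sandbox_command(),
        input=code,
        capture_output=True,
        text=True,
        timeout=timeout_s,
    )
```

On a timeout, `subprocess.run` kills only the local `docker` client, so a production harness would likely also name the container (`--name`) and stop it explicitly with `docker kill`.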

Example

The example below runs a solver over the HumanEval dataset, a collection of 164 function-synthesis problems, and reports the fraction of problems whose tests pass.

```python
# Imports
from lmwrapper.openai_wrapper import get_open_ai_lm, OpenAiModelNames
from synthegrator.code_solver import LmCodeSolverAutoRegressive
from synthegrator.execution_threading import solve_and_evaluate_problems
from synthegrator.synthdatasets.human_eval import yield_human_eval
from synthegrator.df_converters import solution_evals_to_df

# Load HumanEval, one of the AI4SE datasets Synthegrator can load
problems = list(yield_human_eval())

# Create a solver backed by a language model
lm = get_open_ai_lm(OpenAiModelNames.gpt_3_5_turbo_instruct)
#    ^ Make sure to add your API key. See https://github.com/DNGros/lmwrapper
solver = LmCodeSolverAutoRegressive(lm)

# Generate code for each problem and execute its test cases
evals = list(solve_and_evaluate_problems(
    solver=solver,
    problems=problems,
    max_threads_eval=4,
))
# Convert the evaluation results to a dataframe
df = solution_evals_to_df(
    evals,
    pickle_gzip_whole_solution_eval=True
)
print("Fraction Passing", df.main_metric__is_success.mean())
```
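The `max_threads_eval=4` argument presumably caps how many evaluations run at once, and "Fraction Passing" is the mean of the per-problem success flags. The standard-library sketch below illustrates both ideas generically; it is not Synthegrator's implementation, and `evaluate_all`, `fraction_passing`, and the `evaluate` callback are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable, TypeVar

P = TypeVar("P")


def evaluate_all(
    problems: Iterable[P],
    evaluate: Callable[[P], bool],
    max_threads: int = 4,
) -> list[bool]:
    """Evaluate every problem with at most `max_threads` running at once.

    `pool.map` returns results in the same order as `problems`.
    """
    with ThreadPoolExecutor(max_workers=max_threads) as pool:
        return list(pool.map(evaluate, problems))


def fraction_passing(results: list[bool]) -> float:
    """Mean of boolean outcomes, i.e. the share of problems whose tests passed."""
    if not results:
        raise ValueError("no results to summarize")
    return sum(results) / len(results)


results = evaluate_all(range(10), lambda n: n % 2 == 0, max_threads=4)
print(fraction_passing(results))  # 0.5
```

Threads are a reasonable fit here because each evaluation mostly waits on an external process (such as a Docker container) rather than on Python computation.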

Architecture

Guiding Design Requirements

  • DR-1 Support Diverse Datasets and Tasks. We want an architecture that can support diverse tasks (including potentially complex, repository-level tasks).
  • DR-2 Consistent & Efficient Execution. Experiments often involve running LLM-generated code. We want this to be fast, efficient, and reasonably secure.
  • DR-3 Adaptable to State-of-the-Art Models. This includes models such as those from OpenAI or hosted on HuggingFace, as well as models that perform complex retrieval or reasoning.
  • DR-4 Maintainable. Try to follow best practices around automated testing and continuous integration.
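DR-1 and DR-3 suggest keeping problems and solvers behind small, independent interfaces, so that a new dataset or a new kind of model can be swapped in without touching the other side. The sketch below illustrates that separation with `typing.Protocol`; `Problem`, `Solver`, `EchoSolver`, and `run` are hypothetical names, not Synthegrator's classes (the real solver in the example above is `LmCodeSolverAutoRegressive`).

```python
from dataclasses import dataclass
from typing import Callable, Protocol


@dataclass(frozen=True)
class Problem:
    """Hypothetical minimal problem: a prompt plus a check on the solution code."""
    problem_id: str
    prompt: str
    # Returns True if the solution passes; per DR-2 this would run in a sandbox.
    check: Callable[[str], bool]


class Solver(Protocol):
    """Anything that turns a problem into candidate code (an LM, a retrieval pipeline, ...)."""
    def solve(self, problem: Problem) -> str: ...


class EchoSolver:
    """Toy solver standing in for a language-model-backed one."""
    def __init__(self, answer: str) -> None:
        self.answer = answer

    def solve(self, problem: Problem) -> str:
        return self.answer


def run(solver: Solver, problems: list[Problem]) -> dict[str, bool]:
    """Solve each problem and record whether its check passed."""
    return {p.problem_id: p.check(solver.solve(p)) for p in problems}
```

Because `Solver` is structural, any object with a matching `solve` method qualifies without inheriting from a base class.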

Diagram

[Figure: Synthegrator architecture diagram]

TODO: add docs walking through each component.

Datasets and Solvers

docs TODO


Download files


Source Distribution

synthegrator-0.9.3.5.tar.gz (121.5 kB)

Uploaded Source

Built Distribution

synthegrator-0.9.3.5-py3-none-any.whl (149.0 kB)

Uploaded Python 3

File details

Details for the file synthegrator-0.9.3.5.tar.gz.

File metadata

  • Download URL: synthegrator-0.9.3.5.tar.gz
  • Upload date:
  • Size: 121.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.0.0 CPython/3.10.14

File hashes

Hashes for synthegrator-0.9.3.5.tar.gz
Algorithm Hash digest
SHA256 d0163267cda63c8615b3e93d2b7241d31e4b26f1406ba9086d537d4cb70d8eb2
MD5 37725206e02437b7fb28437f49a2cf50
BLAKE2b-256 4b7bb74906dec69c3f0e3ef14fa04a6d9e2cb87946791be27d828b4c7c7c3918

File details

Details for the file synthegrator-0.9.3.5-py3-none-any.whl.

File hashes

Hashes for synthegrator-0.9.3.5-py3-none-any.whl
Algorithm Hash digest
SHA256 92436936c1b5475d630fedd35c5575001156b3d2094221d0d8c98e4e8d33b71f
MD5 c9a7fd2f8edcefe18286934d482230e2
BLAKE2b-256 4e801caf60789efc4712473a7fec054737866827d7b428a378f246c715646195
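To confirm that a downloaded file matches the SHA256 digests published in these tables, the standard-library sketch below recomputes the digest in chunks and compares it. `sha256_of` and `verify` are hypothetical helper names; pip can also enforce digests itself through its hash-checking mode (`--require-hashes`).

```python
import hashlib
from pathlib import Path

# SHA256 digests as published in the hash tables for release 0.9.3.5.
EXPECTED_SHA256 = {
    "synthegrator-0.9.3.5.tar.gz":
        "d0163267cda63c8615b3e93d2b7241d31e4b26f1406ba9086d537d4cb70d8eb2",
    "synthegrator-0.9.3.5-py3-none-any.whl":
        "92436936c1b5475d630fedd35c5575001156b3d2094221d0d8c98e4e8d33b71f",
}


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in 1 MiB chunks so large files need not fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: Path) -> bool:
    """Return True if the file's SHA256 matches the published digest for its name."""
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no published digest for {path.name}")
    return sha256_of(path) == expected
```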
