
Project description

open-llm-swe

Note: This README is outdated. Most recent improvements focus on the team and conversation aspects.

This project explores various concepts to enhance the performance of Large Language Model (LLM) agents in producing code for non-trivial codebases. We have identified several key challenges that LLMs face when tasked with software development.

Data Model

mermaid

classDiagram
class LLMTool

    class LLMAgent {
        +List[LLMTool] tools
    }

    class LLMMessage {
        
    }

    class ConversationParticipant {
        <<enumeration>>
        Human
        System
        LLMAgent
    }

    class ConversationMessage {
        +ConversationParticipant sender
        +ConversationParticipant recipient
        +String content
    }

    class Conversation {
        +List[ConversationParticipant] participants
        +List[ConversationMessage] conversation_history
    }

    class Team {
        +List[LLMAgent] agents
        +LLMAgent teamManager
        +Workspace workspace
    }

    class Project {
        +List[Team] teams
        +String description
        +Workspace workspace
    }

    LLMAgent "0..*" --> "0..*" LLMTool : has
    ConversationMessage --> "2" ConversationParticipant : has
    Conversation "1" --> "2..*" ConversationParticipant : has
    Conversation "1" --> "0..*" ConversationMessage : has
    Team "1" --> "1..*" LLMAgent : has
    Team "1" --> "1" LLMAgent : has teamManager
    Project "1" --> "1..*" Team : has
    ConversationParticipant <|-- LLMAgent
    ConversationParticipant <|-- Human
    ConversationParticipant <|-- System
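
As a rough illustration, the diagram above could map onto Python dataclasses along these lines. This is a minimal sketch rather than the project's actual code: the `name` fields, the `send` helper, and the `team_manager` spelling are assumptions made for the example, and `LLMMessage`, `System`, `Workspace`, and `Project` are omitted for brevity.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class LLMTool:
    name: str  # hypothetical field; the diagram leaves LLMTool empty


@dataclass
class ConversationParticipant:
    """Base type for anything that can send or receive a message."""
    name: str  # hypothetical field, used here only to tell participants apart


@dataclass
class Human(ConversationParticipant):
    pass


@dataclass
class LLMAgent(ConversationParticipant):
    tools: List[LLMTool] = field(default_factory=list)


@dataclass
class ConversationMessage:
    sender: ConversationParticipant
    recipient: ConversationParticipant
    content: str


@dataclass
class Conversation:
    participants: List[ConversationParticipant]
    conversation_history: List[ConversationMessage] = field(default_factory=list)

    def __post_init__(self) -> None:
        # The diagram gives Conversation a "2..*" multiplicity on participants.
        if len(self.participants) < 2:
            raise ValueError("a conversation needs at least two participants")

    def send(self, sender: ConversationParticipant,
             recipient: ConversationParticipant, content: str) -> ConversationMessage:
        """Hypothetical helper: record a message between two known participants."""
        for participant in (sender, recipient):
            if participant not in self.participants:
                raise ValueError(f"{participant.name} is not in this conversation")
        message = ConversationMessage(sender, recipient, content)
        self.conversation_history.append(message)
        return message


@dataclass
class Team:
    agents: List[LLMAgent]
    team_manager: LLMAgent  # "teamManager" in the diagram
```

For example, `Conversation(participants=[Human("alice"), LLMAgent("coder")])` creates an empty conversation, and each call to `send` appends one `ConversationMessage` to `conversation_history`.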

Identified Challenges

1. Limitations in Handling Large Codebases

LLMs are constrained by their context window, which limits the amount of information they can process at once. This presents significant challenges when dealing with large, complex codebases:

Limited context window: LLMs can only reason within a finite amount of input text, making it difficult to understand extensive codebases in their entirety.
Computational and financial costs: Even if we could feed an entire codebase into the context, it would incur substantial computational and financial expenses.
Inefficient information retrieval: Basic retrieval-augmented generation (RAG) might not be sufficient for accurately finding and utilizing relevant code snippets. Using embeddings to retrieve fixed-size chunks might be too simplistic due to the limitations in embedding matching accuracy.
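
To make the chunking concern concrete, the sketch below contrasts fixed-size line chunks with chunks aligned to top-level definitions. It is only one possible alternative, not a description of what this project implements; it uses the standard-library `ast` module, and the function names are hypothetical.

```python
import ast
from typing import List, Tuple


def fixed_size_chunks(source: str, lines_per_chunk: int) -> List[str]:
    """Split source into chunks of N lines, ignoring code structure."""
    lines = source.splitlines()
    return [
        "\n".join(lines[i:i + lines_per_chunk])
        for i in range(0, len(lines), lines_per_chunk)
    ]


def definition_chunks(source: str) -> List[Tuple[str, str]]:
    """Return (name, source text) for each top-level function or class.

    Decorator lines are not included, because a definition node's position
    starts at its `def` or `class` keyword.
    """
    tree = ast.parse(source)
    chunks: List[Tuple[str, str]] = []
    for node in tree.body:
        if isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef, ast.ClassDef)):
            segment = ast.get_source_segment(source, node)
            if segment is not None:
                chunks.append((node.name, segment))
    return chunks
```

A fixed-size split can cut a function in half, so a retrieved chunk may lack the signature or the return statement. Definition-aligned chunks keep each unit whole, at the cost of uneven chunk sizes.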

To mitigate these issues, we need to develop tools that allow LLMs to efficiently navigate large codebases, similar to how human developers use abstractions to work within their limited working memory. For instance:

Providing only the required information to the LLM based on the task at hand (e.g., directory structure for deciding where to place a utility subroutine, or specific code sections for debugging).
Creating sophisticated retrieval mechanisms that can accurately locate and present relevant code snippets to the LLM.
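
As an example of the first point, a compact directory outline is often enough context for deciding where a new utility belongs. The helper below is a hypothetical sketch (the name `directory_outline` and its `max_depth` parameter are assumptions, not part of this project's API) that renders a depth-limited tree suitable for pasting into a prompt.

```python
from pathlib import Path
from typing import List


def directory_outline(root: Path, max_depth: int = 2) -> str:
    """Render a depth-limited directory tree, directories first, hidden entries skipped."""
    lines: List[str] = [root.name + "/"]

    def walk(directory: Path, depth: int) -> None:
        if depth > max_depth:
            return
        # Sort directories before files, then alphabetically.
        entries = sorted(directory.iterdir(), key=lambda p: (p.is_file(), p.name))
        for entry in entries:
            if entry.name.startswith("."):
                continue
            indent = "  " * depth
            if entry.is_dir():
                lines.append(f"{indent}{entry.name}/")
                walk(entry, depth + 1)
            else:
                lines.append(f"{indent}{entry.name}")

    walk(root, 1)
    return "\n".join(lines)
```

Limiting the depth keeps the outline small for large repositories; a debugging task would instead pass the specific code sections involved.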

2. Inherent Limitations of LLM Behavior

Several aspects of LLM behavior can impede their effectiveness in software development tasks:

Overreliance on initial outputs: LLMs tend to provide quick initial responses, which may not always be the most accurate or well-thought-out solutions.
Reluctance to provide negative or empty answers: LLMs might struggle to admit when they don't have enough information or when no action is necessary.
Difficulty in maintaining long-term context: LLMs may struggle to keep track of information across multiple interactions or code sections, leading to inconsistencies in larger projects.
Lack of meta-cognitive abilities: LLMs might have difficulty assessing their own understanding and capabilities in relation to the specific codebase they're working on.
Uncertainty quantification: LLMs may struggle to accurately express their level of certainty about their understanding or proposals, potentially leading to overconfident or unreliable outputs.

Project Overview

Our approach divides the code generation process into two main phases:

Requirements Gathering
Code Generation

We incorporate several advanced concepts to improve the overall performance and reliability of the system.

Requirements Gathering Process

graph TD
A[User Input] --> B[PM Agent: Requirements Gathering]
B -->|Clarification Needed| C[User Feedback]
C --> B
B -->|Requirements Clear| D{Approach Selection}
D -->|TDD Approach| E[QA Agent: Test Case Generation]
D -->|Prototype Approach| F[Prototype Generator]
E & F --> G[User Review]
G -->|Approved| H[Code Generator Agent]
G -->|Revisions Needed| B
H --> I[Code Reviewer Agent]
I --> J[User Final Review]
J -->|Approved| K[Integration with Codebase]
J -->|Revisions Needed| H

The requirements gathering phase involves:

PM Agent for clarifying and refining user requirements
Approach selection between Test-Driven Development (TDD) and Prototype-First development
User review and feedback loops
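
One way to read the flowchart above is as a small state machine. The sketch below transcribes its edges into a transition table. The state names are derived from the diagram's node labels, while the `feedback_received` and `done` events are hypothetical names for edges the diagram leaves unlabeled.

```python
from typing import Dict, Iterable, Tuple

# (state, event) -> next state, following the edges of the flowchart.
TRANSITIONS: Dict[Tuple[str, str], str] = {
    ("requirements_gathering", "clarification_needed"): "user_feedback",
    ("user_feedback", "feedback_received"): "requirements_gathering",
    ("requirements_gathering", "requirements_clear"): "approach_selection",
    ("approach_selection", "tdd"): "test_case_generation",
    ("approach_selection", "prototype"): "prototype_generation",
    ("test_case_generation", "done"): "user_review",
    ("prototype_generation", "done"): "user_review",
    ("user_review", "approved"): "code_generation",
    ("user_review", "revisions_needed"): "requirements_gathering",
    ("code_generation", "done"): "code_review",
    ("code_review", "done"): "user_final_review",
    ("user_final_review", "approved"): "integration",
    ("user_final_review", "revisions_needed"): "code_generation",
}


def next_state(state: str, event: str) -> str:
    """Look up the next state, rejecting transitions the flowchart does not allow."""
    try:
        return TRANSITIONS[(state, event)]
    except KeyError:
        raise ValueError(f"no transition from {state!r} on {event!r}") from None


def run(events: Iterable[str], start: str = "requirements_gathering") -> str:
    """Apply a sequence of events and return the final state."""
    state = start
    for event in events:
        state = next_state(state, event)
    return state
```

Keeping the workflow as data makes the two feedback loops (back to requirements gathering, and back to code generation) easy to inspect and test.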

Advanced Code Generation System

graph TD
A[Code Generation Request] --> B[Orchestrator Agent]
B --> C[Context Retrieval System]
C --> D[Hierarchical Retriever]
C --> E[Adaptive Retriever]
D & E --> F[Context Synthesizer]
F --> G[Exploration Agent]
G --> H{Confidence Threshold Met?}
H -->|No| I[Additional Exploration]
I --> G
H -->|Yes| J[Code Generation Agent]
J --> K[Uncertainty Quantifier]
K --> L[Proposal Evaluator]
L -->|Meets Criteria| M[Code Integration Agent]
L -->|Doesn't Meet Criteria| N[Refinement Loop]
N --> J
M --> O[Final Output]

    subgraph Modular Specialized Agents
    P[Syntax Specialist]
    Q[Logic Flow Specialist]
    R[Optimization Specialist]
    S[Security Specialist]
    end

    J -.-> P & Q & R & S

    subgraph External Resources
    T[Project Knowledge Base]
    U[Coding Standards]
    V[External APIs/Libraries]
    end

    T & U & V -.-> B & C & G & J & L

The code generation phase incorporates several concepts:

Hierarchical Retrieval: Efficiently manages large codebases by fetching context at different levels of abstraction.
Iterative Exploration: Ensures thorough understanding of the codebase before code generation begins.
Confidence Threshold: Prevents premature code generation by ensuring sufficient understanding of the task and context.
Explicit Uncertainty Quantification: Helps identify areas that may need human attention or further refinement.
Unknown State Handling: Improves the system's ability to acknowledge and work with incomplete information.
Modular Architecture: Utilizes specialized agents for different aspects of code generation, such as syntax, logic flow, optimization, and security.
Adaptive Retrieval: Improves relevance of retrieved information over time through learning.
Context Synthesis: Combines and summarizes retrieved information for efficient use by other components.
Proposal Evaluation: Assesses generated code against project requirements and standards.
Refinement Loop: Allows for iterative improvement of generated code based on evaluation results.
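
The interaction between the confidence threshold, unknown state handling, and the refinement loop can be sketched as a single control loop. This is an illustrative outline under stated assumptions, not the project's implementation: `explore`, `generate`, and `evaluate` are hypothetical callables standing in for the Exploration Agent, Code Generation Agent, and Proposal Evaluator, and the default limits are arbitrary.

```python
from typing import Callable, List, Optional, Tuple


def generate_with_gates(
    explore: Callable[[List[str]], Tuple[str, float]],
    generate: Callable[[List[str], Optional[str]], str],
    evaluate: Callable[[str], Optional[str]],
    confidence_threshold: float = 0.8,
    max_explorations: int = 5,
    max_refinements: int = 3,
) -> Optional[str]:
    """Explore until confident, then generate and refine; None means 'unknown'.

    explore(notes)            -> (new note, confidence in [0, 1])
    generate(notes, feedback) -> candidate code
    evaluate(code)            -> None if accepted, otherwise feedback text
    """
    notes: List[str] = []
    for _ in range(max_explorations):
        note, confidence = explore(notes)
        notes.append(note)
        if confidence >= confidence_threshold:
            break
    else:
        # Threshold never met: report an unknown state instead of guessing.
        return None

    feedback: Optional[str] = None
    for _ in range(max_refinements + 1):
        code = generate(notes, feedback)
        feedback = evaluate(code)
        if feedback is None:
            return code
    # The evaluator kept rejecting proposals; surface that rather than ship them.
    return None
```

Returning `None` in both failure cases gives the caller an explicit signal to ask for human input, which is the behaviour the Unknown State Handling item describes.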

Getting Started

The project uses Poetry as the Python package manager and build tool, and npm as the package manager for JavaScript.

Only Anthropic is currently supported as the backing LLM API, although support for other LLM APIs can be added relatively easily if there is demand for it.

LLM Sonder

src contains the Python API behind agent, conversation, and team creation.

Installing packages:

poetry install

Creating a basic agent to experiment with:

poetry run python -m src.agents.basic_agent

Creating an LLM team:

poetry run python -m src.agents.team_creator_agent

Agent Dashboard

agent_dashboard currently houses the Web UI for the project.

It comprises the following:

AgentTrace - The monitoring dashboard for generated messages.
Conversations - A view for creating conversations and messages from the web UI.

Installing packages:

cd agent_dashboard/frontend
npm install

Running the backend:

poetry run python -m agent_dashboard.backend.server

Running the frontend:

cd agent_dashboard/frontend
npm run dev

Contributing

Conventions

All serializable objects expose to_dict and from_dict methods.
The serialized object contains a class_name key that identifies the original class.
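
A minimal sketch of this convention is shown below. Only `to_dict`, `from_dict`, and the `class_name` key come from the conventions above; the `Serializable` base class, the `_REGISTRY` lookup, and the `ToolCall` example type are assumptions made for illustration, and nested objects are not handled.

```python
from dataclasses import dataclass
from typing import Any, Dict, Type

# Hypothetical registry mapping class names to classes, filled in automatically.
_REGISTRY: Dict[str, Type["Serializable"]] = {}


class Serializable:
    def __init_subclass__(cls, **kwargs: Any) -> None:
        super().__init_subclass__(**kwargs)
        _REGISTRY[cls.__name__] = cls

    def to_dict(self) -> Dict[str, Any]:
        """Copy the instance attributes and tag them with the class name."""
        data = dict(vars(self))
        data["class_name"] = type(self).__name__
        return data

    @classmethod
    def from_dict(cls, data: Dict[str, Any]) -> "Serializable":
        """Rebuild the object named by class_name, without mutating the input."""
        payload = dict(data)
        target = _REGISTRY[payload.pop("class_name")]
        if not issubclass(target, cls):
            raise TypeError(f"{target.__name__} is not a {cls.__name__}")
        return target(**payload)


@dataclass
class ToolCall(Serializable):  # hypothetical example type
    tool: str
    argument: str
```

With this in place, `Serializable.from_dict(obj.to_dict())` reconstructs an equal object without the caller needing to know its concrete class in advance.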

License

(Include license information here)

Contact

Feel free to contact me at zhuweiji1997@gmail.com or raise an issue on the GitHub repo!

Project details

Release history

0.1.8

Nov 27, 2024

0.1.7

Nov 27, 2024

0.1.6

Nov 27, 2024

0.1.5

Nov 27, 2024

0.1.4

Nov 27, 2024

0.1.3

Nov 27, 2024

0.1.2

Nov 27, 2024

0.1.1

Nov 27, 2024

This version

0.1.0

Nov 3, 2024

Download files

Source Distribution

llm_team-0.1.0.tar.gz (37.3 kB view details)

Uploaded Nov 3, 2024 Source

Built Distribution

llm_team-0.1.0-py3-none-any.whl (44.3 kB view details)

Uploaded Nov 3, 2024 Python 3

File details

Details for the file llm_team-0.1.0.tar.gz.

File metadata

Download URL: llm_team-0.1.0.tar.gz
Upload date: Nov 3, 2024
Size: 37.3 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.4.26

File hashes

Hashes for llm_team-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`348bd1fc2e52409e71c0811849ed430512d3c1549db4f38577312e049e42a6f0`
MD5	`b8291ea38985154209237c4e32be1768`
BLAKE2b-256	`82b38b0197991527723231d6a62e4b5869f4e49d989e2c602c0137cabc90de84`

File details

Details for the file llm_team-0.1.0-py3-none-any.whl.

File metadata

Download URL: llm_team-0.1.0-py3-none-any.whl
Upload date: Nov 3, 2024
Size: 44.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: uv/0.4.26

File hashes

Hashes for llm_team-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`fcb75910071d23e90f0a595a437edcc8232d28f903d445c656db1486a28d2255`
MD5	`fd65c0712197673f3c3719c96881cfb7`
BLAKE2b-256	`75919132da15d022e95d02ac24f4a8a23d9ac11fd0b366fe44c10d547c30e344`

llm-team 0.1.0
