Add your description here
Project description
open-llm-swe
Note: This README is outdated - Much of the improvements is now focused on the team and conversation aspect.
This project explores various concepts to enhance the performance of Large Language Model (LLM) agents in producing code for non-trivial codebases. We have identified several key challenges that LLMs face when tasked with software development.
Data Model
classDiagram
class LLMTool
class LLMAgent {
+List[LLMTool] tools
}
class LLMMessage {
}
class ConversationParticipant {
<<enumeration>>
Human
System
LLMAgent
}
class ConversationMessage {
+ConversationParticipant sender
+ConversationParticipant recipient
+String content
}
class Conversation {
+List[ConversationParticipant] participants
+List[ConversationMessage] conversation_history
}
class Team {
+List[LLMAgent] agents
+LLMAgent teamManager
+Workspace workspace
}
class Project {
+List[Team] teams
+String description
+Workspace workspace
}
LLMAgent "0..*" --> "0..*" LLMTool : has
ConversationMessage --> "2" ConversationParticipant : has
Conversation "1" --> "2..*" ConversationParticipant : has
Conversation "1" --> "0..*" ConversationMessage : has
Team "1" --> "1..*" LLMAgent : has
Team "1" --> "1" LLMAgent : has teamManager
Project "1" --> "1..*" Team : has
ConversationParticipant <|-- LLMAgent
ConversationParticipant <|-- Human
ConversationParticipant <|-- System
Identified Challenges
1. Limitations in Handling Large Codebases
LLMs are constrained by their context window, which limits the amount of information they can process at once. This presents significant challenges when dealing with large, complex codebases:
- Limited context window: LLMs can only reason within a finite amount of input text, making it difficult to understand extensive codebases in their entirety.
- Computational and financial costs: Even if we could feed an entire codebase into the context, it would incur substantial computational and financial expenses.
- Inefficient information retrieval: Basic retrieval-augmented generation (RAG) might not be sufficient for accurately finding and utilizing relevant code snippets. Using embeddings to retrieve fixed-size chunks might be too simplistic due to the limitations in embedding matching accuracy.
To mitigate these issues, we need to develop tools that allow LLMs to efficiently navigate large codebases, similar to how human developers use abstractions to work within their limited working memory. For instance:
- Providing only the required information to the LLM based on the task at hand (e.g., directory structure for deciding where to place a utility subroutine, or specific code sections for debugging).
- Creating sophisticated retrieval mechanisms that can accurately locate and present relevant code snippets to the LLM.
2. Inherent Limitations of LLM Behavior
Several aspects of LLM behavior can impede their effectiveness in software development tasks:
-
Overreliance on initial outputs: LLMs tend to provide quick initial responses, which may not always be the most accurate or well-thought-out solutions.
-
Reluctance to provide negative or empty answers: LLMs might struggle to admit when they don't have enough information or when no action is necessary.
-
Difficulty in maintaining long-term context: LLMs may struggle to keep track of information across multiple interactions or code sections, leading to inconsistencies in larger projects.
-
Lack of meta-cognitive abilities: LLMs might have difficulty assessing their own understanding and capabilities in relation to the specific codebase they're working on.
-
Uncertainty quantification: LLMs may struggle to accurately express their level of certainty about their understanding or proposals, potentially leading to overconfident or unreliable outputs.
Project Overview
Our approach divides the code generation process into two main phases:
- Requirements Gathering
- Code Generation
We incorporate several advanced concepts to improve the overall performance and reliability of the system.
Requirements Gathering Process
graph TD
A[User Input] --> B[PM Agent: Requirements Gathering]
B -->|Clarification Needed| C[User Feedback]
C --> B
B -->|Requirements Clear| D{Approach Selection}
D -->|TDD Approach| E[QA Agent: Test Case Generation]
D -->|Prototype Approach| F[Prototype Generator]
E & F --> G[User Review]
G -->|Approved| H[Code Generator Agent]
G -->|Revisions Needed| B
H --> I[Code Reviewer Agent]
I --> J[User Final Review]
J -->|Approved| K[Integration with Codebase]
J -->|Revisions Needed| H
The requirements gathering phase involves:
- PM Agent for clarifying and refining user requirements
- Approach selection between Test-Driven Development (TDD) and Prototype-First development
- User review and feedback loops
Advanced Code Generation System
graph TD
A[Code Generation Request] --> B[Orchestrator Agent]
B --> C[Context Retrieval System]
C --> D[Hierarchical Retriever]
C --> E[Adaptive Retriever]
D & E --> F[Context Synthesizer]
F --> G[Exploration Agent]
G --> H{Confidence Threshold Met?}
H -->|No| I[Additional Exploration]
I --> G
H -->|Yes| J[Code Generation Agent]
J --> K[Uncertainty Quantifier]
K --> L[Proposal Evaluator]
L -->|Meets Criteria| M[Code Integration Agent]
L -->|Doesn't Meet Criteria| N[Refinement Loop]
N --> J
M --> O[Final Output]
subgraph Modular Specialized Agents
P[Syntax Specialist]
Q[Logic Flow Specialist]
R[Optimization Specialist]
S[Security Specialist]
end
J -.-> P & Q & R & S
subgraph External Resources
T[Project Knowledge Base]
U[Coding Standards]
V[External APIs/Libraries]
end
T & U & V -.-> B & C & G & J & L
The code generation phase incorporates several concepts:
-
Hierarchical Retrieval: Efficiently manages large codebases by fetching context at different levels of abstraction.
-
Iterative Exploration: Ensures thorough understanding of the codebase before code generation begins.
-
Confidence Threshold: Prevents premature code generation by ensuring sufficient understanding of the task and context.
-
Explicit Uncertainty Quantification: Helps identify areas that may need human attention or further refinement.
-
Unknown State Handling: Improves the system's ability to acknowledge and work with incomplete information.
-
Modular Architecture: Utilizes specialized agents for different aspects of code generation, such as syntax, logic flow, optimization, and security.
-
Adaptive Retrieval: Improves relevance of retrieved information over time through learning.
-
Context Synthesis: Combines and summarizes retrieved information for efficient use by other components.
-
Proposal Evaluation: Assesses generated code against project requirements and standards.
-
Refinement Loop: Allows for iterative improvement of generated code based on evaluation results.
Getting Started
The project uses poetry as the python package manager and builder. It uses NPM as the package manager for JS.
Only Anthropic is supported as the backing LLM API for now, although support for the other LLM APIs can be added relatively easily if there is demand for it.
LLM Sonder
src is the python API behind the agent, conversation and team creation.
Installing packages:
poetry install
Creating a basic agent to play around with
poetry run python -m src.agents.basic_agent
Creating an LLM team
poetry run python -m src.agents.team_creator_agent
Agent Dashboard
agent_dashboard currently houses the Web UI for the project.
This is comprised of the following
- AgentTrace - The monitoring dashboard for generated messages
- Conversations - Creating conversations and messages using the web UI.
Installing packages:
cd agent_dashboard/frontend
npm install
Running the backend:
poetry run python -m agent_dashboard.backend.server
Running the frontend:
cd agent_dashboard/frontend
npm run dev
Contributing
Conventions
- All serializable objects expose
to_dictandfrom_dictmethods - The serialized object contain a
class_namekey to identify the original object
License
(Include license information here)
Contact
Feel free to contact me at zhuweiji1997@gmail.com or raise an issue on the github repo!
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file llm_team-0.1.0.tar.gz.
File metadata
- Download URL: llm_team-0.1.0.tar.gz
- Upload date:
- Size: 37.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.4.26
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
348bd1fc2e52409e71c0811849ed430512d3c1549db4f38577312e049e42a6f0
|
|
| MD5 |
b8291ea38985154209237c4e32be1768
|
|
| BLAKE2b-256 |
82b38b0197991527723231d6a62e4b5869f4e49d989e2c602c0137cabc90de84
|
File details
Details for the file llm_team-0.1.0-py3-none-any.whl.
File metadata
- Download URL: llm_team-0.1.0-py3-none-any.whl
- Upload date:
- Size: 44.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: uv/0.4.26
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
fcb75910071d23e90f0a595a437edcc8232d28f903d445c656db1486a28d2255
|
|
| MD5 |
fd65c0712197673f3c3719c96881cfb7
|
|
| BLAKE2b-256 |
75919132da15d022e95d02ac24f4a8a23d9ac11fd0b366fe44c10d547c30e344
|