TestTeller: A versatile RAG AI agent that uses LLMs to generate test cases from project documentation (PRDs, contracts, design docs, etc.) and project code.
TestTeller RAG Agent
TestTeller RAG Agent is a versatile CLI-based RAG (Retrieval Augmented Generation) agent designed to generate software test cases. It leverages Google's Gemini LLM and ChromaDB as a vector store. The agent can process various input sources, including PRD documentation, API contracts, technical design documents (HLD/LLD), and code from GitHub repositories or local folders.
The agent aims to produce both:
- Technical Test Cases: Focusing on individual components, APIs, and system architecture.
- User Journey Test Cases: Driven by customer-backward scenarios and end-to-end flows.
Features
- Multi-Source Ingestion:
  - Documents: .docx, .pdf, .xlsx, .txt, .md
  - Code: Clones public/private GitHub repositories or reads from local folders (supports various programming languages via file extensions).
- RAG Pipeline:
  - Uses Google Gemini for generating embeddings and for text generation.
  - Utilizes ChromaDB for efficient similarity search and retrieval of relevant context.
  - Text chunking for effective processing of large documents and code files.
- Comprehensive Test Case Generation:
  - Generates both technical component-level and user-journey-driven test cases.
  - Prompt-engineered to guide the LLM for specific test case formats and considerations.
- Command-Line Interface (CLI):
  - User-friendly CLI built with Typer for all operations (ingestion, generation, status, clearing data).
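The text chunking step listed above can be illustrated with a minimal sketch. This is not the code in text_splitter.py; the function name chunk_text, the character-based windows, and the default chunk_size and overlap values are assumptions chosen for illustration only.

```python
def chunk_text(text: str, chunk_size: int = 1000, overlap: int = 200) -> list[str]:
    """Split text into character windows of chunk_size that overlap by `overlap`.

    Hypothetical helper: the real splitter may split on tokens, sentences,
    or code structure instead of raw characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= overlap < chunk_size:
        raise ValueError("overlap must satisfy 0 <= overlap < chunk_size")

    step = chunk_size - overlap  # how far the window advances each time
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # the window already reached the end of the text
    return chunks
```

Overlapping windows keep a sentence or function signature that straddles a boundary visible in both neighbouring chunks, at the cost of storing some text twice.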
Project Structure
testteller-rag-agent/
├── main.py # CLI entry point
├── agent.py # Core RAG Agent logic
├── config.py # Configuration (Pydantic settings)
├── data_ingestion/
│ ├── __init__.py
│ ├── document_loader.py # Handles .docx, .pdf, .xlsx, .txt
│ ├── code_loader.py # Handles GitHub/local code loading
│ └── text_splitter.py # Text chunking logic
├── vector_store/
│ ├── __init__.py
│ └── chromadb_manager.py # ChromaDB interactions
├── llm/
│ ├── __init__.py
│ └── gemini_client.py # Gemini LLM and embedding interactions
├── prompts.py # Prompt templates for test case generation
├── utils/
│ ├── __init__.py
│ ├── helpers.py # Logging setup
│ └── retry_utils.py # Tenacity retry decorators
├── .env.example # Example .env file
├── .env # Local environment configurations (Gitignored)
├── requirements.txt # Python dependencies
└── README.md # This file
Prerequisites
- Python 3.9+
- Access to Google Gemini API (requires an API key from Google AI Studio).
- (Optional) GitHub Personal Access Token (PAT) if you intend to clone private repositories. The token needs the repo scope.
Setup
- Clone the Repository (if applicable):
  git clone <your-repo-url>
  cd testteller_rag_agent
- Create and Activate a Virtual Environment:
  python -m venv venv
  # On macOS/Linux:
  source venv/bin/activate
  # On Windows:
  # venv\Scripts\activate
- Install Dependencies:
  pip install -r requirements.txt
- Configure Environment Variables:
  - Copy the .env.example file to .env:
    cp .env.example .env
  - Edit the .env file and add your Google Gemini API key:
    GOOGLE_API_KEY="YOUR_GEMINI_API_KEY"
    # Optional: For private GitHub repos
    # GITHUB_TOKEN="YOUR_GITHUB_PAT"
    # You can override other default settings from config.py here:
    # LOG_LEVEL="DEBUG" # For more verbose logging
    # CHROMA_DB_PATH="./my_vector_data"
    # DEFAULT_COLLECTION_NAME="my_project_kb"
  - Important: Replace "YOUR_GEMINI_API_KEY" with your actual key.
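To show how the environment variables named in this section might map onto settings, here is a standard-library sketch. The project itself uses Pydantic settings in config.py, so this is a stand-in rather than the actual implementation; the Settings class, load_settings function, and every default value below are hypothetical.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class Settings:
    """Hypothetical container for the variables shown in the .env example."""
    google_api_key: str
    github_token: Optional[str]
    log_level: str
    chroma_db_path: str
    default_collection_name: str


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from an environment-like mapping, failing fast on a missing key."""
    api_key = env.get("GOOGLE_API_KEY", "")
    if not api_key or api_key == "YOUR_GEMINI_API_KEY":
        raise RuntimeError("GOOGLE_API_KEY is missing or still set to the placeholder")
    return Settings(
        google_api_key=api_key,
        github_token=env.get("GITHUB_TOKEN") or None,  # optional, private repos only
        log_level=env.get("LOG_LEVEL", "INFO"),  # assumed default
        chroma_db_path=env.get("CHROMA_DB_PATH", "./chroma_data"),  # assumed default
        default_collection_name=env.get("DEFAULT_COLLECTION_NAME", "default_collection"),  # assumed default
    )
```

Rejecting the placeholder value up front turns a confusing API authentication error later in the run into an immediate, readable message.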
Usage (CLI)
The main interface to the agent is through main.py.
python main.py --help
1. Ingesting Data
You need to ingest relevant documents and code into a ChromaDB collection before generating test cases.
Ingest Documents:
- From a directory (processes all supported files recursively):
python main.py ingest-docs ./path/to/your/documents/ --collection-name project_alpha_docs
- From a single document file:
python main.py ingest-docs ./path/to/your/prd.pdf --collection-name project_alpha_docs
Ingest Code:
- From a GitHub repository:
python main.py ingest-code https://github.com/owner/repo.git --collection-name project_alpha_code
- From a local code folder:
python main.py ingest-code ./path/to/your/local_codebase/ --collection-name project_alpha_code
- To prevent deletion of a cloned GitHub repository after ingestion (useful for debugging):
python main.py ingest-code https://github.com/owner/repo.git --collection-name project_alpha_code --no-cleanup-github
Note: You can use the same collection name for both documents and code, or separate them.
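As a rough illustration of what "processes all supported files recursively" could mean for ingest-docs, the sketch below walks a path and keeps files whose extension is in the supported list from the Features section. The function name find_supported_files and the handling of single-file paths are assumptions, not the project's document_loader.py.

```python
from pathlib import Path

# Document extensions listed in the Features section.
DOC_EXTENSIONS = frozenset({".docx", ".pdf", ".xlsx", ".txt", ".md"})


def find_supported_files(root: str, extensions: frozenset = DOC_EXTENSIONS) -> list[Path]:
    """Return supported files under root (or root itself if it is a supported file)."""
    path = Path(root)
    if path.is_file():
        return [path] if path.suffix.lower() in extensions else []
    # rglob("*") descends into every subdirectory; sorting keeps the order stable.
    return sorted(
        p for p in path.rglob("*")
        if p.is_file() and p.suffix.lower() in extensions
    )
```

The same pattern would apply to code ingestion with a different extension set, which is presumably how "various programming languages via file extensions" is handled.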
2. Generating Test Cases
Once data is ingested, you can ask the agent to generate test cases.
python main.py generate "Generate test cases for the user login feature based on the PRD and API docs." --collection-name project_alpha_docs
# Specify number of retrieved context documents and output file
python main.py generate "Create API tests for the /users endpoint considering success and failure scenarios." \
--collection-name project_alpha_code \
--num-retrieved 7 \
--output-file user_api_tests.md
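The --num-retrieved option controls how many context chunks are pulled from the collection before the prompt is built. ChromaDB performs this search internally with its own index and distance metric; the plain-Python sketch below only illustrates the idea of ranking stored embeddings by similarity to a query embedding. The function names and the tiny two-dimensional vectors are made up for the example.

```python
import math


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def top_k(query_vec: list[float], items: list[tuple[str, list[float]]], k: int = 5):
    """Return the k (score, text) pairs most similar to query_vec, best first."""
    scored = [(cosine_similarity(query_vec, vec), text) for text, vec in items]
    scored.sort(key=lambda pair: pair[0], reverse=True)
    return scored[:k]


if __name__ == "__main__":
    chunks = [
        ("login API contract", [1.0, 0.0]),
        ("billing design doc", [0.0, 1.0]),
        ("login PRD section", [1.0, 1.0]),
    ]
    for score, text in top_k([1.0, 0.0], chunks, k=2):
        print(f"{score:.3f}  {text}")
```

A larger value gives the LLM more context at the cost of a longer prompt, so it is worth tuning per collection.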
3. Checking Collection Status
To see how many items are in a specific collection:
python main.py status --collection-name project_alpha_docs
4. Clearing Data
To remove all data from a collection and associated temporary files (like cloned repos):
# Will ask for confirmation
python main.py clear-data --collection-name project_alpha_docs
# Force clear without confirmation
python main.py clear-data --collection-name project_alpha_docs --force
Configuration
Key configurations can be managed via:
- The .env file for sensitive keys and common overrides.
- The config.py file for default values and Pydantic settings schema.
Refer to config.py for all available settings (e.g., chunk size, model names, log level).
Logging
- Logs are output to the console.
- Log format can be set to text (default) or json via the LOG_FORMAT environment variable or in config.py. JSON logs are recommended for production environments for easier parsing by log management systems.
- Log level can be controlled by LOG_LEVEL (e.g., INFO, DEBUG).
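One way a text/json switch like this can be built with the standard logging module is sketched below. It is not the code in utils/helpers.py; the JsonFormatter class, the field names in the JSON payload, and the text format string are assumptions. Only the LOG_FORMAT and LOG_LEVEL variable names come from this README.

```python
import json
import logging
import os


class JsonFormatter(logging.Formatter):
    """Render each log record as a single JSON object per line."""

    def format(self, record: logging.LogRecord) -> str:
        payload = {
            "time": self.formatTime(record),
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }
        if record.exc_info:
            payload["exception"] = self.formatException(record.exc_info)
        return json.dumps(payload)


def build_handler(log_format: str = "text") -> logging.Handler:
    """Console handler using JSON output when log_format is 'json', plain text otherwise."""
    handler = logging.StreamHandler()
    if log_format.lower() == "json":
        handler.setFormatter(JsonFormatter())
    else:
        handler.setFormatter(
            logging.Formatter("%(asctime)s %(levelname)s %(name)s: %(message)s")
        )
    return handler


def setup_logging() -> None:
    """Configure the root logger from LOG_FORMAT and LOG_LEVEL."""
    root = logging.getLogger()
    root.handlers[:] = [build_handler(os.environ.get("LOG_FORMAT", "text"))]
    root.setLevel(os.environ.get("LOG_LEVEL", "INFO").upper())
```

Emitting one JSON object per line is what makes the output easy for log management systems to parse without custom patterns.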
Troubleshooting
- TypeError: Expected str, not <class 'pydantic.types.SecretStr'>:
  - Ensure your GOOGLE_API_KEY is correctly set in .env.
  - Make sure llm/gemini_client.py is calling settings.google_api_key.get_secret_value() when configuring genai.
  - Delete all __pycache__ directories and *.pyc files in your project and try again.
- TypeError: BaseEventLoop.run_in_executor() got an unexpected keyword argument '...':
  - This usually means functools.partial was not used correctly to bind arguments for functions run in the thread executor. Ensure the wrapper methods (like _run_collection_method in chromadb_manager.py or the pattern in gemini_client.py) correctly bind all keyword arguments to the target function.
- Authentication Issues with GitHub:
  - Ensure your GITHUB_TOKEN (if used) has the correct repo scope for private repositories.
  - For public repositories, no token is usually needed.
  - Consider setting up SSH keys for Git if HTTPS token authentication is problematic.
- ChromaDB Issues:
  - Ensure the CHROMA_DB_PATH is writable.
  - If you encounter persistent issues, try deleting the ChromaDB storage directory and re-ingesting.
- Gemini API Errors:
  - Check your API key and ensure it has the necessary permissions.
  - If you hit rate limits, consider implementing exponential backoff or retry logic in your calls.
  - Ensure the google_genai package is up-to-date.
- Document Ingestion Issues:
  - Ensure the file formats are supported and not corrupted.
  - For large documents, consider increasing the CHUNK_SIZE in config.py.
  - If you encounter memory issues, try processing smaller batches of files.
- Code Ingestion Issues:
  - Ensure the local folder or GitHub repository is accessible.
  - For large codebases, consider increasing the CHUNK_SIZE or processing files in smaller batches.
  - If cloning a GitHub repo fails, check your network connection and GitHub access permissions.
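The run_in_executor error listed above arises because loop.run_in_executor(executor, func, *args) forwards positional arguments only. A minimal sketch of the functools.partial pattern follows; query_collection is a made-up stand-in for a blocking call, and run_blocking is a hypothetical wrapper, not the project's _run_collection_method.

```python
import asyncio
import functools


def query_collection(query: str, n_results: int = 5, where=None) -> dict:
    """Stand-in for a blocking call such as a vector store query."""
    return {"query": query, "n_results": n_results, "where": where}


async def run_blocking(func, *args, **kwargs):
    """Run a blocking function in the default thread executor."""
    loop = asyncio.get_running_loop()
    # Passing n_results=... directly to run_in_executor would raise the TypeError,
    # so all arguments are bound to the function first.
    bound = functools.partial(func, *args, **kwargs)
    return await loop.run_in_executor(None, bound)


async def main() -> None:
    result = await run_blocking(query_collection, "login tests", n_results=7)
    print(result)


if __name__ == "__main__":
    asyncio.run(main())
```

If a wrapper in your copy of the code passes keyword arguments straight through to run_in_executor, binding them as shown is the usual fix.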
Download files
File details
Details for the file testteller-0.1.0.tar.gz.
File metadata
- Download URL: testteller-0.1.0.tar.gz
- Upload date:
- Size: 30.0 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 3f3bc556187a2f123c41d18ee0709e74816749b6409347ff11050bdd880513a5 |
| MD5 | af6d38ca17a7007396f676d7da051c57 |
| BLAKE2b-256 | 4f58873a9b803af84ed25981217e971fea2af52c1ce781a2035c34d6e72a1938 |
File details
Details for the file testteller-0.1.0-py3-none-any.whl.
File metadata
- Download URL: testteller-0.1.0-py3-none-any.whl
- Upload date:
- Size: 30.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.13.3
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | c9f92c52763c6f1be22f3683b626d4328499fbb89be090d712fef22b7b3dff46 |
| MD5 | a14db4c439890e94e9d201752805bcc8 |
| BLAKE2b-256 | 1d95804bce45d3c090befa46b8c1498891643688c27cb9e459a9984f1f6a7e85 |