Generate startup ideas grounded in real YC data using Retrieval-Augmented Generation (RAG).

RAGVenture

RAGVenture is an intelligent startup idea generator powered by Retrieval-Augmented Generation (RAG). It helps entrepreneurs generate innovative startup ideas by learning from successful companies, combining the power of large language models with real-world startup data.

Why RAGVenture?

Traditional startup ideation tools either rely on paid API calls or generate ideas without real-world context. RAGVenture addresses both problems:

Completely FREE: Runs entirely on your machine with no API costs - zero API keys required!
Smart Model Management: Automatically handles model deprecation and failures with intelligent fallback
Data-Driven: Learns from real startup data to ground suggestions in reality
Context-Aware: Understands patterns from successful startups
Intelligent: Uses RAG to combine LLM capabilities with precise information retrieval
Resilient: Works offline with local models when external APIs are unavailable
Production-Ready: 177 tests with comprehensive coverage, Docker runtime fixes, and monitoring

System Requirements

Python 3.11 or higher
8GB RAM minimum (16GB recommended)
2GB disk space for models and data
Operating Systems:
- Linux (recommended)
- macOS
- Windows (with WSL for best performance)

Quick Start

Installation:

# Clone the repository
git clone https://github.com/valginer0/RAGVenture.git
cd RAGVenture

# Create virtual environment
python -m venv .venv

# Activate virtual environment
# On Windows:
.venv\Scripts\activate
# On Unix or MacOS:
source .venv/bin/activate

# Install dependencies
pip install -r requirements.txt

# Install spaCy language model for market analysis
python -m spacy download en_core_web_sm

Environment Setup (optional; RAGVenture runs completely free without any of these variables):

# Optional: HuggingFace token for enhanced remote models
export HUGGINGFACE_TOKEN="your-token-here"  # Get from huggingface.co

# Smart model management (enabled by default)
export RAG_SMART_MODELS=true
export RAG_MODEL_CHECK_INTERVAL=3600
export RAG_MODEL_TIMEOUT=60

# Optional: LangChain tracing (debugging)
export LANGCHAIN_TRACING_V2=true
export LANGCHAIN_ENDPOINT="https://api.smith.langchain.com"
export LANGCHAIN_API_KEY="your-langsmith-api-key"
export LANGCHAIN_PROJECT="your-project-name"
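
The variables above could be read on the Python side roughly as follows. This is a minimal sketch, not RAGVenture's actual configuration code: the `ModelSettings` class and `load_settings` function are hypothetical names, and treating 3600 and 60 as fallback defaults is an assumption based on the example values shown here.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional


@dataclass(frozen=True)
class ModelSettings:
    """Hypothetical container for the environment variables listed above."""
    smart_models: bool
    check_interval_s: int
    timeout_s: int
    hf_token: Optional[str]


def load_settings(env: Optional[Mapping[str, str]] = None) -> ModelSettings:
    """Read settings from the environment, using the example values as fallbacks."""
    env = os.environ if env is None else env
    truthy = {"1", "true", "yes", "on"}
    return ModelSettings(
        # Smart model management is described as enabled by default.
        smart_models=env.get("RAG_SMART_MODELS", "true").strip().lower() in truthy,
        # Assumed defaults: the example values from the export lines.
        check_interval_s=int(env.get("RAG_MODEL_CHECK_INTERVAL", "3600")),
        timeout_s=int(env.get("RAG_MODEL_TIMEOUT", "60")),
        # The token is optional; an empty string is treated as "not set".
        hf_token=env.get("HUGGINGFACE_TOKEN") or None,
    )
```

Passing a plain dictionary instead of `os.environ` makes the loader easy to exercise in tests without touching the real environment.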

Generate Ideas:

# Generate 3 startup ideas in the AI domain
python -m rag_startups.cli generate-all "AI" --num-ideas 3

# Generate ideas without market analysis
python -m rag_startups.cli generate-all "fintech" --num-ideas 2 --no-market

# Check model health and status
python -m rag_startups.cli models status

# Use custom startup data file
python -m rag_startups.cli generate-all "education" --file custom_startups.json

Features & Capabilities

Core Features

Intelligent Idea Generation:
- Uses RAG to combine LLM knowledge with real startup data
- Generates contextually relevant and grounded ideas
- Provides structured output with problem, solution, and market analysis

Command-Line Interface

Commands:

generate-all: Generate startup ideas with market analysis
- Required argument: Topic or domain (e.g., "AI", "fintech")
- Options:
  - --num-ideas: Number of ideas (1-5, default: 1)
  - --file: Custom startup data file (default: yc_startups.json)
  - --market/--no-market: Include/exclude market analysis
  - --temperature: Model creativity (0.0-1.0)
  - --print-examples: Show relevant examples
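
If you want to drive the CLI from a script, one option is to assemble the documented arguments and call it as a subprocess. The sketch below uses only the options listed above; the helper names `build_generate_command` and `run_generate` are hypothetical and are not part of the package.

```python
import subprocess
import sys
from typing import List, Optional


def build_generate_command(
    topic: str,
    num_ideas: int = 1,
    market: bool = True,
    data_file: Optional[str] = None,
    temperature: Optional[float] = None,
) -> List[str]:
    """Build the argument list for `python -m rag_startups.cli generate-all`."""
    if not 1 <= num_ideas <= 5:
        raise ValueError("--num-ideas must be between 1 and 5")
    cmd = [sys.executable, "-m", "rag_startups.cli", "generate-all", topic,
           "--num-ideas", str(num_ideas)]
    cmd.append("--market" if market else "--no-market")
    if data_file is not None:
        cmd += ["--file", data_file]
    if temperature is not None:
        if not 0.0 <= temperature <= 1.0:
            raise ValueError("--temperature must be between 0.0 and 1.0")
        cmd += ["--temperature", str(temperature)]
    return cmd


def run_generate(topic: str, **options) -> str:
    """Run the CLI and return its standard output (requires the package installed)."""
    cmd = build_generate_command(topic, **options)
    return subprocess.run(cmd, capture_output=True, text=True, check=True).stdout
```

Keeping command construction separate from execution lets you validate the arguments without actually launching a model.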

Smart Model Management

Automatic Fallback: Falls back to local models when external APIs fail
Model Migration Intelligence: Handles model deprecation (e.g., Mistral v0.2→v0.3) automatically
Health Monitoring: Continuous model health checks and status reporting
Local Resilience: Works completely offline with local models
CLI Management: models command for status, testing, and diagnostics
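
The automatic fallback described above follows a common pattern: try each backend in priority order and move on when one fails. The following is an illustrative sketch of that pattern only. The function `generate_with_fallback` and the backend callables are hypothetical, and RAGVenture's real model manager may work differently.

```python
from typing import Callable, List, Sequence, Tuple

# A backend is any callable that turns a prompt into generated text.
Backend = Callable[[str], str]


def generate_with_fallback(
    prompt: str, backends: Sequence[Tuple[str, Backend]]
) -> Tuple[str, str]:
    """Return (backend_name, text) from the first backend that succeeds."""
    errors: List[str] = []
    for name, backend in backends:
        try:
            return name, backend(prompt)
        except Exception as exc:  # broad on purpose: any failure triggers fallback
            errors.append(f"{name}: {exc}")
    raise RuntimeError("all backends failed: " + "; ".join(errors))


if __name__ == "__main__":
    def remote_model(prompt: str) -> str:
        # Stand-in for a hosted model that is currently unreachable.
        raise ConnectionError("remote API unavailable")

    def local_model(prompt: str) -> str:
        # Stand-in for a model running on the local machine.
        return f"[local] idea for: {prompt}"

    print(generate_with_fallback("AI", [("remote", remote_model), ("local", local_model)]))
```

Ordering the list from most capable to most reliable means the local model is used only when the remote one is unavailable.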

Technical Features

Smart Analysis:
- Semantic search for relevant examples
- Automatic metadata extraction
- Pattern recognition from successful startups
Performance Optimized:
- One-time embedding generation (~22s)
- Fast idea generation (~0.5s per idea)
- Efficient data processing (~0.1s load time)
Production Quality:
- Comprehensive automated test suite
- Automated code formatting
- Extensive error handling
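
To illustrate the semantic search step mentioned above, here is a small sketch that ranks stored examples by cosine similarity to a query vector. It assumes embeddings are already available as plain lists of floats. The toy three-dimensional vectors and the function names are hypothetical, since real embedding models produce vectors with hundreds of dimensions and RAGVenture's retrieval code is not shown here.

```python
import math
from typing import Dict, List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Cosine of the angle between two vectors; 0.0 if either has zero length."""
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm if norm else 0.0


def top_k(
    query: Sequence[float], corpus: Dict[str, Sequence[float]], k: int = 2
) -> List[Tuple[str, float]]:
    """Return the k corpus entries most similar to the query, best first."""
    scored = [(name, cosine_similarity(query, vec)) for name, vec in corpus.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:k]


if __name__ == "__main__":
    # Toy embeddings, invented for illustration only.
    corpus = {
        "payments API": [0.9, 0.1, 0.0],
        "tutoring marketplace": [0.1, 0.9, 0.2],
        "fraud detection": [0.8, 0.2, 0.1],
    }
    for name, score in top_k([1.0, 0.0, 0.0], corpus):
        print(f"{name}: {score:.3f}")
```

The retrieved examples are what a RAG pipeline would place into the prompt so that the language model grounds its output in real companies.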

Performance

Typical processing times on a standard machine:

Initial Setup: ~22s (one-time embedding generation)
Data Loading: ~0.1s
Idea Generation: ~0.5s per idea
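
The one-time setup cost comes from computing embeddings once and reusing them afterwards. A minimal sketch of that caching idea follows, assuming a cache keyed by a hash of the input texts and stored as JSON. The `cached_embeddings` function, the cache file naming, and the `embed` callable are hypothetical and do not describe RAGVenture's actual cache format.

```python
import hashlib
import json
from pathlib import Path
from typing import Callable, List

Vectors = List[List[float]]


def cached_embeddings(
    texts: List[str], embed: Callable[[List[str]], Vectors], cache_dir: Path
) -> Vectors:
    """Return embeddings for texts, computing them only on the first call."""
    # The key changes whenever the input texts change, which invalidates the cache.
    key = hashlib.sha256(json.dumps(texts).encode("utf-8")).hexdigest()
    path = cache_dir / f"embeddings-{key[:16]}.json"
    if path.exists():
        return json.loads(path.read_text(encoding="utf-8"))
    vectors = embed(texts)  # the slow step, run once per distinct input
    cache_dir.mkdir(parents=True, exist_ok=True)
    path.write_text(json.dumps(vectors), encoding="utf-8")
    return vectors
```

With this pattern the first run pays the embedding cost and later runs only pay for reading a file, which matches the timing profile listed above.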

Docker Support

For containerized deployment, we provide both CPU and GPU support.

Prerequisites

Docker and Docker Compose
For GPU support:
- NVIDIA GPU with CUDA
- NVIDIA Container Toolkit
- nvidia-docker2

Quick Start with Docker

# CPU Version (recommended - fully tested)
docker-compose up app-cpu

# GPU Version (with NVIDIA support)
docker-compose up app-gpu

# Run with custom data file
docker-compose run --rm app-cpu python -m rag_startups.cli generate-all fintech --num-ideas 1 --file /app/yc_startups.json

Docker Status: ✅ Production Ready - All runtime issues resolved, works end-to-end with real data.

Development Setup

Clone and setup:

git clone https://github.com/valginer0/RAGVenture.git
cd RAGVenture
python -m venv .venv
source .venv/bin/activate  # On Windows: .venv\Scripts\activate

Install development dependencies:

pip install -r requirements.txt
pre-commit install  # Sets up automatic code formatting

Run tests:

pytest tests/  # All tests should pass

Testing & Offline Policy

This project enforces fully offline, deterministic tests:

Tests block outbound HTTP(S) by default via an autouse fixture in tests/conftest.py that patches requests.sessions.Session.request.
Autouse fixtures also mock model-loading/network paths:
- huggingface_hub.model_info in rag_startups/cli.py preflight
- transformers.pipeline at all call sites (e.g., rag_startups.embed_master, rag_startups.core.rag_chain, CLI)
- huggingface_hub.InferenceClient and the bound imports used by rag_startups/idea_generator/generator.py
- rag_startups.embed_master.calculate_result is replaced with a deterministic helper during tests
Offline env vars are forced: HUGGINGFACE_HUB_OFFLINE=1, TRANSFORMERS_OFFLINE=1.
To explicitly allow network in a specific test, add the marker: @pytest.mark.allow_network.

Runtime (non-test) CLI runs are allowed to use the network and will honor your .env.

Data Requirements

RAGVenture works with startup data in JSON format. Two options:

Use YC Data (Recommended):

Download from Y Combinator

Convert CSV to JSON:

python -m rag_startups.data.convert_yc_data input.csv -o startups.json

Use Custom Data:
- Prepare JSON file with required fields
- See docs/data_format.md for schema
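
If your custom data starts out as CSV, a generic conversion to a JSON list of records can be sketched as below. This is not the `rag_startups.data.convert_yc_data` implementation, and it does not enforce the schema in docs/data_format.md. The function names and the column names used in the usage note are hypothetical.

```python
import csv
import json
from typing import Dict, List, TextIO


def csv_to_records(source: TextIO) -> List[Dict[str, str]]:
    """Read CSV rows into dictionaries keyed by the header row."""
    reader = csv.DictReader(source)
    return [
        # Strip stray whitespace; skip overflow cells that have no header (key None).
        {key.strip(): (value or "").strip() for key, value in row.items() if key}
        for row in reader
    ]


def convert_file(csv_path: str, json_path: str) -> int:
    """Convert a CSV file to a JSON array of objects; return the record count."""
    with open(csv_path, newline="", encoding="utf-8") as src:
        records = csv_to_records(src)
    with open(json_path, "w", encoding="utf-8") as dst:
        json.dump(records, dst, indent=2, ensure_ascii=False)
    return len(records)
```

For example, a CSV with a header row of `name,description` would produce one JSON object per startup with those two keys. Check the resulting keys against the documented schema before passing the file to `--file`.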

Troubleshooting

Embedding Generation Time:
- First run takes ~22s to generate embeddings
- Subsequent runs use cached embeddings
- GPU can significantly speed up this process
Common Issues:
- Missing HUGGINGFACE_TOKEN: The token is optional; to use remote models, create one at huggingface.co
- Memory errors: Reduce batch size with --max-lines
- GPU errors: Ensure CUDA toolkit is properly installed

Documentation

docs/api.md: API documentation
docs/examples.md: Usage examples
docs/data_format.md: Data schema
CONTRIBUTING.md: Development guidelines

Contributing

See CONTRIBUTING.md for development setup and guidelines.

License

This project is licensed under the MIT License - see LICENSE for details.

Startup Names and Legal Considerations

Name Generation

Each generated startup name includes a unique identifier (e.g., "TechStartup-x7y9z")
This identifier ensures technical uniqueness within the tool
The unique identifier is NOT a substitute for legal name verification

Important Notes for Users

Generated names are suggestions only
The uniqueness of a name at generation time does not guarantee its availability
Users must perform their own due diligence before using any name

Name Verification Resources

USPTO Trademark Database: https://www.uspto.gov/trademarks
State Business Registries
Domain Name Availability Tools
Professional Legal Counsel

Future Features

Name availability checking tool (planned)
Integration with business registry APIs

Release history

0.9.2 (this version): Aug 11, 2025
0.9.0: Aug 1, 2025

Download files

Source Distribution: rag_startups-0.9.2.tar.gz (103.1 kB), uploaded Aug 11, 2025
Built Distribution: rag_startups-0.9.2-py3-none-any.whl (78.3 kB), uploaded Aug 11, 2025, Python 3

File details

Details for the file rag_startups-0.9.2.tar.gz.

File metadata

Download URL: rag_startups-0.9.2.tar.gz
Upload date: Aug 11, 2025
Size: 103.1 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.10

File hashes

Hashes for rag_startups-0.9.2.tar.gz
Algorithm	Hash digest
SHA256	`5a0f648bada61650cfaebfcbe2c58e25807ae69b9e0423d778134e5eb7a7f5df`
MD5	`ecc2d8e0e9966e41b77599338f7b1334`
BLAKE2b-256	`998d89e30dca7fdbe0d5a6623bf82fc58b6025cc8fc751b45d12909bf04d348f`

File details

Details for the file rag_startups-0.9.2-py3-none-any.whl.

File metadata

Download URL: rag_startups-0.9.2-py3-none-any.whl
Upload date: Aug 11, 2025
Size: 78.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.11.10

File hashes

Hashes for rag_startups-0.9.2-py3-none-any.whl
Algorithm	Hash digest
SHA256	`6d5a8c096d30820eaf20f85b4a931af6f890e89aa969bcc874e4b11a8a178fba`
MD5	`374c0d7e96286fd5d119dfa6a15e154d`
BLAKE2b-256	`25351db047d8ffb6aba49fbe2cd4a388f15eaaf1534f21f046f4b883c2f5cf79`
