Lumen: Intelligently prepares your codebase context for any LLM, solving context window limits with smart retrieval and providing deep project understanding.

💡 Lumen - Illuminate Your Codebase for AI


The Context Challenge: Bridging Code and AI Understanding

Large Language Models (LLMs) offer transformative potential for software development – from debugging and refactoring to documentation and architectural analysis. However, their effectiveness is fundamentally limited by the context window: the amount of information they can process at one time.

Providing an LLM with the necessary context for a complex query about your project is a significant challenge:

  • Manual Effort: Copying and pasting file structures, code snippets, and dependencies for a large codebase is time-consuming and prone to errors.
  • Context Limits: Even with large context models, providing the entire codebase is often impossible, expensive, or leads to the "lost in the middle" problem where relevant information is overlooked amidst noise.
  • Lack of Structure: Simply dumping files doesn't help the AI understand the relationships between different parts of your project.

Introducing Lumen: Intelligent Code Context for Any LLM

Lumen is a command-line tool designed to solve the AI context problem. It scans your project, understands its structure, and intelligently selects and formats the most relevant code context for a given natural language query.

Unlike tools that aim to replace your coding environment or fully automate tasks, Lumen focuses on perfecting the input you provide to the AI. It empowers you to use any LLM (public APIs like Gemini, Claude, ChatGPT, or local models) with a comprehensive, yet focused, understanding of your specific codebase, enabling more accurate and insightful AI responses.

Stop struggling with context windows. Give your AI the precise information it needs, powered by Lumen.


Key Features

  • Intelligent Context Retrieval: Uses advanced techniques (like vector embeddings) to find and include only the code chunks most relevant to your specific question, overcoming context window limitations for large projects.
  • Clear Project Structure: Generates a JSON representation of your directory tree, providing the AI with essential architectural context.
  • Full or Focused Context Modes: Choose between providing the entire project content (for smaller projects or general overview) or using intelligent search for query-specific context.
  • Highly Customizable: Configure which folders and files are included or skipped, control output formatting, and adjust indexing parameters.
  • Private & Secure: Operates 100% locally on your machine for local projects. Your code content is never sent to external services during context generation or indexing.
  • Flexible Output: Copies the generated prompt to your clipboard or saves it to a text file in your project directory.
  • GitHub Repository Support: Analyze public GitHub repositories directly by providing a URL. Lumen handles temporary cloning and cleanup.
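
As an illustration of the project-structure feature, here is a minimal sketch of how a directory tree could be turned into JSON while skipping unwanted folders. It is not Lumen's actual implementation or output schema, since this README specifies neither. The names `build_tree`, `tree_as_json`, and `SKIPPED_FOLDERS`, and the nested-dict layout, are assumptions made for the example.

```python
import json
import os

# Hypothetical skip list, modeled on the folder names this README mentions.
SKIPPED_FOLDERS = {".git", "__pycache__", "node_modules"}


def build_tree(root: str) -> dict:
    """Return a nested dict: folders map to dicts, files map to None."""
    tree = {}
    for entry in sorted(os.listdir(root)):
        path = os.path.join(root, entry)
        # Treat only real directories as folders; symlinks are listed as leaves.
        if os.path.isdir(path) and not os.path.islink(path):
            if entry in SKIPPED_FOLDERS:
                continue  # ignore the folder and everything below it
            tree[entry] = build_tree(path)
        else:
            tree[entry] = None
    return tree


def tree_as_json(root: str) -> str:
    """Serialize the tree so it can be pasted into a prompt."""
    return json.dumps(build_tree(root), indent=2)
```

Calling `tree_as_json(".")` from a project root returns the structure as a JSON string; a nested layout like this keeps parent-child relationships explicit for the model.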

Prerequisites

Before installing Lumen, ensure you have the following installed and correctly configured on your system. Lumen is a Python tool and relies on standard development environments.

  1. Python (3.7 or higher):

    • How to Check: Open your terminal or command prompt and type python --version or python3 --version.
    • Installation & Environment Setup:
      • Windows: Download the installer from python.org. Crucially, during installation, ensure you check the box that says "Add Python to PATH". This makes python and pip commands available from any terminal window. If you missed this, you might need to reinstall or manually add Python to your system's Environment Variables.
      • macOS: Python 3 is often pre-installed or easily available via Homebrew (brew install python). Ensure the Homebrew bin directory is in your PATH (usually set up automatically). You can verify Python and Pip availability by opening a new terminal window after installation.
      • Linux (Debian/Ubuntu):
        sudo apt update
        sudo apt install python3 python3-pip
        
      • Linux (Fedora/CentOS/RHEL):
        sudo dnf install python3 python3-pip
        # or
        sudo yum install python3 python3-pip
        
      • Ensure python3 and pip3 (or symlinks like python and pip) are in your PATH. Installing via package managers typically handles this.
    • Pip: Python's package installer. It's usually installed with Python 3.7+.
      • How to Check: Type pip --version or pip3 --version.
      • How to Upgrade (Recommended): python -m pip install --upgrade pip or python3 -m pip install --upgrade pip.
  2. Git (required only if you plan to use the GitHub repository feature, i.e. the -g flag):

    • How to Check: Type git --version.
    • Installation:
      • Windows: Download from git-scm.com. Follow the installer steps, ensuring Git is added to your PATH (a default option).
      • macOS: Easiest via Homebrew: brew install git. Or download from git-scm.com. Command Line Tools for Xcode also include Git.
      • Linux: Use your distribution's package manager (as shown for Python, but replace python with git).
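
The checks above can also be scripted. The following is a small sketch, not part of Lumen, that reports whether the prerequisites are visible on the current PATH. The function name `check_prerequisites` is hypothetical.

```python
import shutil
import sys
from typing import Dict, Tuple


def check_prerequisites(min_python: Tuple[int, int] = (3, 7)) -> Dict[str, bool]:
    """Report which prerequisites are available in this environment."""
    return {
        # The interpreter running this script must meet the minimum version.
        "python": sys.version_info >= min_python,
        # pip may be exposed as either `pip` or `pip3`.
        "pip": shutil.which("pip") is not None or shutil.which("pip3") is not None,
        # Git is only needed for the -g (GitHub repository) feature.
        "git": shutil.which("git") is not None,
    }


if __name__ == "__main__":
    for tool, available in check_prerequisites().items():
        print(f"{tool}: {'found' if available else 'missing'}")
```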

Installation

Install Lumen easily using pip:

pip install pylumen


Usage

Lumen is primarily a command-line tool (lum).

1. Generate Full Context for Current Directory (Output to Clipboard): Navigate to your project's root directory in your terminal and run:

lum

This is the default behavior. The complete prompt, including the project structure and file contents, is copied to your clipboard. It is suitable for smaller projects or a general overview.

2. Generate Full Context for a Specific Project Path:

lum /path/to/your/project

3. Generate Intelligent, Query-Specific Context (coming soon; currently in development): For larger projects, provide a natural language query to get only the most relevant code chunks:

lum --query "Explain how users are authenticated"

This triggers the embedding-based retrieval.

  • Indexing: The first time you use --query on a project, Lumen will build a local index of your codebase (chunking files, generating embeddings). This takes time depending on project size and your hardware. Subsequent queries are fast.
  • Prompt: The generated prompt will include your query, the project structure, and the content of the top relevant code chunks found.
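
To make the retrieval step concrete, here is a minimal sketch of top-k selection by cosine similarity. It is not Lumen's implementation: a real system would use an embedding model, whereas this example swaps in a toy bag-of-words vector so it runs with the standard library alone. The names `embed`, `cosine`, and `top_k_chunks` are assumptions made for the example.

```python
import math
from collections import Counter
from typing import List


def embed(text: str) -> Counter:
    """Toy stand-in for an embedding model: a bag-of-words count vector."""
    return Counter(text.lower().split())


def cosine(a: Counter, b: Counter) -> float:
    """Cosine similarity between two sparse count vectors."""
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm_a = math.sqrt(sum(v * v for v in a.values()))
    norm_b = math.sqrt(sum(v * v for v in b.values()))
    return dot / (norm_a * norm_b) if norm_a and norm_b else 0.0


def top_k_chunks(query: str, chunks: List[str], k: int) -> List[str]:
    """Return the k chunks most similar to the query, best first."""
    query_vec = embed(query)
    ranked = sorted(chunks, key=lambda c: cosine(query_vec, embed(c)), reverse=True)
    return ranked[:k]
```

The ranking logic stays the same if `embed` is replaced by a real embedding model; only the vector representation changes.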

4. Force Re-indexing for Queries (also in development): If your code changes significantly after indexing, you might want to rebuild the index:

lum --query "check payment processing logic" --reindex

This clears the old index for the project and builds a new one.

5. Control Number of Relevant Chunks in Query Mode (also in development): Specify how many top relevant code chunks to include in the prompt:

lum --query "What are the main API endpoints?" --top-k 20

This overrides the default set in the configuration.

6. Save Prompt to a Text File: Creates a .txt file in the analyzed project's root directory.

lum -t my_project_prompt

This creates my_project_prompt.txt.

7. Analyze a Public GitHub Repository: (Requires Git to be installed!)

lum -g https://github.com/user/public-repository-name

Lumen clones the repository temporarily, generates the prompt (a full dump, or query-based if --query is also used), and then removes the clone.
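
The clone-then-clean-up pattern can be sketched as follows. This is an assumption about how such a workflow might look, not Lumen's code: the helper names `clone_command` and `with_temporary_clone` are hypothetical, and the shallow clone (`--depth 1`) is a choice made here to keep the download small.

```python
import subprocess
import tempfile
from typing import Callable, List


def clone_command(url: str, dest: str) -> List[str]:
    """Build a shallow `git clone` command (shallow cloning is an assumption)."""
    return ["git", "clone", "--depth", "1", url, dest]


def with_temporary_clone(url: str, action: Callable[[str], str]) -> str:
    """Clone `url` into a temporary directory, run `action` on it, then delete it."""
    with tempfile.TemporaryDirectory() as workdir:
        # check=True raises CalledProcessError if git fails (bad URL, no network).
        subprocess.run(clone_command(url, workdir), check=True)
        # The directory is removed when the `with` block exits, even on error.
        return action(workdir)
```

Using a context manager for the temporary directory means the clone is removed even if prompt generation raises an exception.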

8. Customize Output (Hide Elements):

  • Hide the default introduction text:

lum -hd intro

  • Hide the --- File: path/to/file.py --- titles (not recommended, as it can confuse the AI; titles are hidden automatically in --query mode):

lum -hd title

  • Hide both:

lum -hd intro,title
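
The effect of the hide options can be illustrated with a small sketch of prompt assembly. This is not Lumen's code. The intro string and the function names `parse_hide` and `assemble_prompt` are assumptions; the title format is the one quoted in this README.

```python
from typing import Dict, Iterable, Set

DEFAULT_INTRO = "Here is my project:"   # hypothetical intro text
DEFAULT_TITLE = "--- File: {file} ---"  # title format quoted in this README


def parse_hide(value: str) -> Set[str]:
    """Turn a comma-separated value such as 'intro,title' into a set of names."""
    return {item.strip() for item in value.split(",") if item.strip()}


def assemble_prompt(files: Dict[str, str], hide: Iterable[str] = ()) -> str:
    """Join intro, per-file titles, and file contents, omitting hidden elements."""
    hidden = set(hide)
    parts = []
    if "intro" not in hidden:
        parts.append(DEFAULT_INTRO)
    for path, content in files.items():
        if "title" not in hidden:
            parts.append(DEFAULT_TITLE.format(file=path))
        parts.append(content)
    return "\n".join(parts)
```

For example, `assemble_prompt({"a.py": "print(1)"}, parse_hide("intro"))` keeps the file title but drops the intro line.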

9. Configure Lumen:

  • Open the configuration file (config.json) for editing:

lum -c

This opens the file in your default editor. The file lives in your user configuration directory, e.g., ~/.ptap/config.json. This path might change to ~/.lumen/config.json in a future rename.

  • Reset the configuration file to its default settings:

lum -r


Configuration (~/.ptap/config.json or ~/.lumen/config.json)

You can customize Lumen's behavior by editing its configuration file (use lum -c to open it). Key options include:

  • intro_text: The default text prepended to every prompt. Modify it to suit your needs.
  • show_intro: true or false to always show/hide the intro text by default.
  • title_text: The format string for file/chunk titles (e.g., --- File: {file} ---). {file} is the placeholder for the relative path.
  • show_title: true or false to always show/hide file titles by default (ignored in --query mode for individual chunks).
  • skipped_folders: A list of folder names to completely ignore during scanning (e.g., .git, __pycache__, node_modules).
  • embedding_model_name: The name of the Sentence Transformer model to use for embeddings (e.g., "BAAI/bge-base-en-v1.5"). Choose a model suitable for code embeddings, often listed on the MTEB leaderboard.
  • max_chunk_tokens: The maximum number of tokens per code chunk during indexing. Ensure this is less than the chosen embedding model's maximum sequence length.
  • overlap_tokens: The number of tokens to overlap between consecutive chunks during indexing. This is crucial for ensuring relevant information isn't split between chunks.
  • search_top_k: The default number of top relevant chunks to retrieve when using --query.

Note: This list runs ahead of the released version. The last four options (embedding_model_name, max_chunk_tokens, overlap_tokens, and search_top_k) belong to the upcoming query mode and are not yet in effect, so ignore them for now.
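
To show how max_chunk_tokens and overlap_tokens interact, here is a minimal sketch of fixed-size chunking with overlap. It only illustrates the idea behind these planned options and is not Lumen's indexing code. The function name `chunk_tokens` and the use of a plain token list are assumptions.

```python
from typing import List


def chunk_tokens(tokens: List[str], max_chunk_tokens: int, overlap_tokens: int) -> List[List[str]]:
    """Split tokens into windows of at most max_chunk_tokens, sharing overlap_tokens."""
    if max_chunk_tokens <= 0:
        raise ValueError("max_chunk_tokens must be positive")
    if not 0 <= overlap_tokens < max_chunk_tokens:
        raise ValueError("overlap_tokens must be >= 0 and < max_chunk_tokens")

    step = max_chunk_tokens - overlap_tokens  # how far each window advances
    chunks = []
    for start in range(0, len(tokens), step):
        chunks.append(tokens[start:start + max_chunk_tokens])
        if start + max_chunk_tokens >= len(tokens):
            break  # the last window already reaches the end
    return chunks
```

With ten tokens, max_chunk_tokens=4, and overlap_tokens=1, each chunk begins with the last token of the previous one, so a statement that straddles a boundary appears in both chunks.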

Future Objectives (Roadmap)

Lumen is under active development. Key areas for future focus include:

  • Advanced Chunking: Exploring language-aware or more intelligent chunking strategies beyond fixed-size blocks to better capture semantic units like functions and classes.
  • IDE Integrations: Developing extensions for popular IDEs (VS Code, JetBrains) to provide a more seamless workflow for generating context directly from your development environment.
  • Enhanced Relevance Tuning: Improving the embedding and retrieval process with techniques specifically tailored for code, potentially including code-specific embedding models.
  • Context Summarization: Exploring ways to provide high-level summaries of less critical files or sections to include alongside detailed code context.
  • Performance Optimizations: Continuously improving indexing and search speed, especially for very large repositories.
  • Broader Language Support: Ensuring robust handling of file types and structures across an even wider range of programming languages.
  • Team Features & Collaboration: Investigating features for teams to manage project profiles, share context, and collaborate on AI-assisted tasks.

Limitations

  • AI Interpretation: The quality of the AI's response still ultimately depends on the capabilities of the LLM you use.
  • Very Large Projects: While embeddings will help, extremely massive projects (millions of lines) may still present challenges in balancing context size and relevance.
  • File Types: Primarily designed for text-based source code and configuration files. Binary files or unusual encodings are not supported.

Contributing

Contributions, issues, and feature requests are welcome! Feel free to check the issues page or submit a pull request. Adherence to code quality and project goals is appreciated.


License

This project is licensed under the MIT License - see the LICENSE file for details.


Author

Developed by Far3k


Empower your AI with the context it needs. Install Lumen today.
