🧠 GroupIt

AI-powered Git commit grouping tool for semantic code organization

GroupIt analyzes your code changes and automatically groups related modifications into logical, semantic commits. It combines LLM-based analysis with structural clustering to turn a sprawling set of changes into a clean commit history that tells the story of how your code evolved.

Python 3.12+ · License: MIT

🚀 Features

  • 🎯 Intelligent Grouping: AI-powered semantic analysis groups related changes across multiple files
  • 🔍 Multi-Language Support: Comprehensive support for 50+ programming languages and file types
  • 🤖 LLM Integration: Compatible with OpenAI GPT-4, Google Gemini, and local Ollama models
  • 📊 Structural Analysis: DBSCAN clustering with architectural pattern recognition
  • 💬 Conventional Commits: Automatic generation of conventional commit messages
  • 🌊 4-Stage Pipeline: Primary grouping → Summary generation → Semantic grouping → Message generation
  • 🎨 Beautiful CLI: Rich terminal interface with colored output and progress indicators
  • ⚡ Performance Optimized: Caching, parallel processing, and batch operations

🛠️ Installation

From PyPI

pip install groupit

🔧 Configuration

LLM Provider Setup

GroupIt supports multiple LLM providers. Choose one based on your needs:

OpenAI (Recommended)

export OPENAI_API_KEY="your-api-key-here"

Google Gemini

export GEMINI_API_KEY="your-api-key-here"

Ollama (Local/Free)

# Install Ollama first: https://ollama.ai
ollama pull llama3.2  # or your preferred model
# No API key required for local models

Environment Variables

# LLM Configuration
export GROUPIT_LLM_PROVIDER="openai"  # openai, gemini, ollama
export GROUPIT_LLM_TEMPERATURE="0.3"

# Clustering Parameters
export GROUPIT_CLUSTERING_EPS="0.35"
export GROUPIT_CLUSTERING_MIN_SAMPLES="2"

# Performance Settings
export GROUPIT_ENABLE_CACHING="true"
export GROUPIT_MAX_WORKERS="4"

# Logging
export GROUPIT_LOG_LEVEL="INFO"
export GROUPIT_DEBUG="false"
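
To make the variable names and value types above concrete, here is a minimal Python sketch of how such variables could be parsed. The `Settings` class, the `load_settings` helper, and the defaults (copied from the example values above) are assumptions for illustration, not GroupIt's actual loader.

```python
import os
from dataclasses import dataclass


@dataclass(frozen=True)
class Settings:
    """Hypothetical container; defaults mirror the example values above."""
    llm_provider: str = "openai"
    llm_temperature: float = 0.3
    clustering_eps: float = 0.35
    clustering_min_samples: int = 2
    enable_caching: bool = True
    max_workers: int = 4
    log_level: str = "INFO"
    debug: bool = False


def _as_bool(value: str) -> bool:
    # Accept the common spellings of "true"; everything else is False.
    return value.strip().lower() in {"1", "true", "yes", "on"}


def load_settings(env=None) -> Settings:
    """Build Settings from a mapping (os.environ when none is given)."""
    env = os.environ if env is None else env
    d = Settings()
    return Settings(
        llm_provider=env.get("GROUPIT_LLM_PROVIDER", d.llm_provider),
        llm_temperature=float(env.get("GROUPIT_LLM_TEMPERATURE", d.llm_temperature)),
        clustering_eps=float(env.get("GROUPIT_CLUSTERING_EPS", d.clustering_eps)),
        clustering_min_samples=int(
            env.get("GROUPIT_CLUSTERING_MIN_SAMPLES", d.clustering_min_samples)
        ),
        enable_caching=_as_bool(env.get("GROUPIT_ENABLE_CACHING", str(d.enable_caching))),
        max_workers=int(env.get("GROUPIT_MAX_WORKERS", d.max_workers)),
        log_level=env.get("GROUPIT_LOG_LEVEL", d.log_level),
        debug=_as_bool(env.get("GROUPIT_DEBUG", str(d.debug))),
    )
```

Passing a plain dict instead of `os.environ` keeps the parsing easy to check in isolation, for example `load_settings({"GROUPIT_LLM_PROVIDER": "ollama"})`.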

📖 Usage

Quick Start

# Analyze staged changes with OpenAI
groupit analyze --staged --llm openai

# Analyze working directory with Gemini
groupit analyze --llm gemini --output results.json

# Use local Ollama (no API key needed)
groupit analyze --llm ollama --model llama3.2

# Create commits from analysis results
groupit commit results.json --execute

Command Reference

Analyze Changes

groupit analyze [OPTIONS]

Options:
  --staged              Analyze only staged changes
  --llm PROVIDER        LLM provider (openai, gemini, ollama)
  --api-key KEY         API key for LLM provider
  --model MODEL         Specific model to use
  --temperature TEMP    LLM temperature (0.0-2.0)
  --eps FLOAT           DBSCAN clustering epsilon
  --min-samples INT     DBSCAN minimum samples
  --output FILE         Save results to JSON file
  --verbose, -v         Enable verbose output
  --quiet, -q           Suppress non-essential output

Create Commits

groupit commit results.json [OPTIONS]

Options:
  --execute             Actually create commits (default: dry-run)
  --auto-confirm        Don't ask for confirmation
  --force               Force creation even if repo is dirty

Check Status

groupit status [OPTIONS]

Options:
  --json                Output in JSON format
  --detailed            Show detailed information

Validate Configuration

groupit validate [OPTIONS]

Options:
  --llm-provider PROVIDER  Validate specific provider
  --api-key KEY           API key to validate
  --fix                   Attempt to fix issues

🔄 How It Works

GroupIt uses a 4-stage pipeline to turn a set of unorganized code changes into clean, semantic commits:

graph TB
    A[Code Changes] --> B[Stage 1: Primary Grouping]
    B --> C[Stage 2: Summary Generation]
    C --> D[Stage 3: Semantic Grouping]
    D --> E[Stage 4: Message Generation]
    E --> F[Final Commit Groups]

    B1[DBSCAN Clustering<br/>+ Structural Analysis<br/>+ Graph-based Similarity]
    C1[LLM-generated<br/>Natural Language<br/>Summaries]
    D1[AI-powered Semantic<br/>Analysis & Merging<br/>Data Flow Detection]
    E1[Conventional Commit<br/>Message Generation<br/>with Scope & Type]

    B -.-> B1
    C -.-> C1
    D -.-> D1
    E -.-> E1

    style A fill:#e1f5fe
    style F fill:#e8f5e8
    style B fill:#fff3e0
    style C fill:#f3e5f5
    style D fill:#e0f2f1
    style E fill:#fce4ec

Stage 1: Primary Grouping

  • DBSCAN clustering on code similarity vectors
  • Structural analysis of file relationships and imports
  • Architectural pattern recognition (page-component relationships, data flow)
  • Graph-based similarity using NetworkX for dependency analysis

Stage 2: Summary Generation

  • Natural language summaries of each group using LLM
  • Context extraction from file types, directories, and patterns
  • Fallback heuristics for robust operation without LLM

Stage 3: Semantic Grouping

  • Advanced LLM analysis for data flow patterns
  • Cross-group relationship detection
  • User journey mapping to identify related features
  • Intelligent merging of semantically related groups
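
The merging step can be pictured as joining any groups that are linked by a "related" judgment, directly or through a chain. The sketch below does this with a union-find structure. In GroupIt the related pairs would come from the LLM analysis; here they are hard-coded, and `merge_groups` is a hypothetical name.

```python
def merge_groups(groups, related_pairs):
    """Merge groups connected by any chain of related pairs.

    `groups` is a list of file lists; `related_pairs` holds (index, index)
    tuples naming groups judged to belong together.
    """
    parent = list(range(len(groups)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path halving keeps lookups short
            i = parent[i]
        return i

    for a, b in related_pairs:
        root_a, root_b = find(a), find(b)
        if root_a != root_b:
            # Attach the higher root to the lower one so results stay ordered.
            parent[max(root_a, root_b)] = min(root_a, root_b)

    merged = {}
    for index, files in enumerate(groups):
        merged.setdefault(find(index), []).extend(files)
    return list(merged.values())


groups = [["login.tsx"], ["auth.service.ts", "user.model.ts"], ["auth.test.ts"]]
print(merge_groups(groups, [(0, 1)]))
```

With the pair `(0, 1)`, the login form joins the service and model files while the test file stays in its own group.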

Stage 4: Message Generation

  • Conventional commit message generation
  • Automatic scope detection from file paths and types
  • Type classification (feat, fix, refactor, docs, etc.)
  • Multi-line messages with detailed descriptions

Language Support

GroupIt supports 50+ programming languages and file types:

| Category | Languages/Types |
| --- | --- |
| Frontend | JavaScript, TypeScript, React (JSX/TSX), Vue, Svelte, HTML, CSS, SCSS/Sass |
| Backend | Python, Java, Kotlin, Scala, C/C++, C#, Go, Rust, PHP, Ruby, Swift |
| Mobile | iOS (Swift, Objective-C), Android (Java, Kotlin), React Native, Flutter |
| Infrastructure | Docker, Kubernetes, Terraform, CI/CD (GitHub Actions, GitLab CI) |
| Data & Config | JSON, YAML, TOML, XML, SQL, CSV, Environment files |
| Build Systems | Maven, Gradle, npm/yarn, pip, Cargo, CMake, Makefile |
| Game Development | Unity (C#), Unreal Engine, Godot (GDScript) |
| Blockchain | Solidity, Vyper |

🎯 Examples

Example 1: Feature Development

# You've been working on a user authentication feature
# Modified: login.tsx, auth.service.ts, user.model.ts, auth.test.ts

groupit analyze --staged --llm openai

Result:

Group 1: feat(auth): implement user login functionality
├── login.tsx - Login form component
├── auth.service.ts - Authentication service
└── user.model.ts - User data model

Group 2: test(auth): add authentication tests
└── auth.test.ts - Test suite for auth functionality

Example 2: Refactoring

# Refactored database layer across multiple files
groupit analyze --llm gemini --temperature 0.2

Result:

Group 1: refactor(database): modernize connection handling
├── db/connection.py - Connection pool implementation
├── db/models.py - Model base classes
└── config/database.yml - Database configuration

Group 2: refactor(database): optimize query performance
├── repositories/user_repo.py - User queries
└── repositories/product_repo.py - Product queries

🔧 Advanced Configuration

Configuration File

Create groupit.json in your project root:

{
  "llm": {
    "provider": "openai",
    "model": "gpt-4",
    "temperature": 0.3,
    "timeout": 30
  },
  "clustering": {
    "eps": 0.35,
    "min_samples": 2,
    "alpha": 0.4,
    "max_iterations": 2
  },
  "performance": {
    "enable_caching": true,
    "max_workers": 4,
    "batch_size": 5
  },
  "logging": {
    "level": "INFO",
    "enable_file": false
  }
}

Use with:

groupit analyze --config groupit.json
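
If you generate or check this file from a script, a partial config can be layered over defaults as in the sketch below. Whether GroupIt itself fills in missing keys this way is an assumption; `DEFAULTS` simply reuses some values from the example file, and `parse_config` is a hypothetical helper.

```python
import copy
import json

# Assumed defaults, taken from the example groupit.json above.
DEFAULTS = {
    "llm": {"provider": "openai", "temperature": 0.3},
    "clustering": {"eps": 0.35, "min_samples": 2},
    "performance": {"enable_caching": True, "max_workers": 4},
}


def deep_merge(base, override):
    """Return a copy of `base` with `override` applied section by section."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = value
    return merged


def parse_config(text):
    """Parse JSON config text and fill unspecified keys from DEFAULTS."""
    return deep_merge(DEFAULTS, json.loads(text))


config = parse_config('{"clustering": {"eps": 0.5}}')
print(config["clustering"])
```

Only `eps` is overridden here, so `min_samples` and the other sections keep their default values.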

Performance Tuning

For large repositories:

# Increase batch size for better throughput
groupit analyze --batch-size 10 --max-workers 8

# Reduce clustering sensitivity for fewer groups
groupit analyze --eps 0.5 --min-samples 3

# Keep caching enabled for repeated analysis
export GROUPIT_ENABLE_CACHING="true"
groupit analyze

🤝 Contributing

We welcome contributions! Please see our Contributing Guide for details.

Development Setup

git clone https://github.com/jarry3369/groupit.git
cd groupit
uv sync --dev
uv run pytest

Running Tests

# Run all tests
uv run pytest

# Run with coverage
uv run pytest --cov=groupit --cov-report=html

# Run specific test category
uv run pytest tests/unit/
uv run pytest tests/integration/

📊 Performance

GroupIt is optimized for real-world development workflows:

  • Processing Speed: ~100-500 changes per minute
  • Memory Usage: < 512MB for typical repositories
  • API Efficiency: Batch processing minimizes LLM API calls
  • Caching: Intelligent caching reduces repeated analysis

Benchmarks on common scenarios:

  • Small changes (1-10 files): < 10 seconds
  • Medium refactoring (10-50 files): 30-60 seconds
  • Large feature (50+ files): 2-5 minutes

🐛 Troubleshooting

Common Issues

LLM API Key Issues:

# Verify your API key is set
groupit validate --llm-provider openai

# Test with a different provider
groupit analyze --llm ollama  # No API key needed

Clustering Problems:

# Adjust clustering sensitivity
groupit analyze --eps 0.5 --min-samples 1

# Enable debug mode for detailed logs
groupit analyze --debug

Performance Issues:

# Disable caching temporarily
groupit analyze --no-caching

# Reduce batch size
groupit analyze --batch-size 2

📄 License

This project is licensed under the MIT License - see the LICENSE file for details.


Made with ❤️ by jarry3369

Transform your commit history from chaos to clarity with GroupIt.
