An empirical Python library for Mining Software Repositories (MSR) in Green IT research

These details have not been verified by PyPI

Project links

Project description

greenmining

An empirical Python library for Mining Software Repositories (MSR) in Green IT research.

Overview

greenmining is a research-grade Python library designed for empirical Mining Software Repositories (MSR) studies in Green IT. It enables researchers and practitioners to:

Mine repositories at scale - Fetch and analyze GitHub repositories via GraphQL API with configurable filters
Batch analysis with parallelism - Analyze multiple repositories concurrently with configurable worker pools
Classify green commits - Detect 122 sustainability patterns from the Green Software Foundation (GSF) catalog
Analyze any repository by URL - Direct PyDriller-based analysis with support for private repositories
Measure energy consumption - RAPL, CodeCarbon, and CPU Energy Meter backends for power profiling
Carbon footprint reporting - CO2 emissions calculation with 20+ country profiles and cloud region support
Power regression detection - Identify commits that increased energy consumption
Method-level analysis - Per-method complexity and metrics via Lizard integration
Version power comparison - Compare power consumption across software versions
Generate research datasets - Statistical analysis, temporal trends, and publication-ready reports
Web dashboard - Flask-based interactive visualization of analysis results

Whether you're conducting MSR research, analyzing green software adoption, or measuring the energy footprint of codebases, GreenMining provides the empirical toolkit you need.

Installation

Via pip

pip install greenmining

From source

git clone https://github.com/adam-bouafia/greenmining.git
cd greenmining
pip install -e .

With Docker

docker pull adambouafia/greenmining:latest

Quick Start

Python API

Basic Pattern Detection

from greenmining import GSF_PATTERNS, is_green_aware, get_pattern_by_keywords

# Check available patterns
print(f"Total patterns: {len(GSF_PATTERNS)}")  # 122 patterns across 15 categories

# Detect green awareness in commit messages
commit_msg = "Optimize Redis caching to reduce energy consumption"
if is_green_aware(commit_msg):
    patterns = get_pattern_by_keywords(commit_msg)
    print(f"Matched patterns: {patterns}")
    # Output: ['Cache Static Data', 'Use Efficient Cache Strategies']

Fetch Repositories with Custom Keywords

from greenmining import fetch_repositories

# Fetch repositories with custom search keywords
repos = fetch_repositories(
    github_token="your_github_token",  # Required: GitHub personal access token
    max_repos=50,                      # Maximum number of repositories to fetch
    min_stars=500,                     # Minimum star count filter
    keywords="kubernetes cloud-native", # Search keywords (space-separated)
    languages=["Python", "Go"],        # Programming language filters
    created_after="2020-01-01",        # Filter by creation date (YYYY-MM-DD)
    created_before="2024-12-31",       # Filter by creation date (YYYY-MM-DD)
    pushed_after="2023-01-01",         # Filter by last push date (YYYY-MM-DD)
    pushed_before="2024-12-31"         # Filter by last push date (YYYY-MM-DD)
)

print(f"Found {len(repos)} repositories")
for repo in repos[:5]:
    print(f"- {repo.full_name} ({repo.stars} stars)")

Parameters:

github_token (str, required): GitHub personal access token for API authentication
max_repos (int, default=100): Maximum number of repositories to fetch
min_stars (int, default=100): Minimum GitHub stars filter
keywords (str, default="microservices"): Space-separated search keywords
languages (list[str], optional): Programming language filters (e.g., ["Python", "Go", "Java"])
created_after (str, optional): Filter repos created after date (format: "YYYY-MM-DD")
created_before (str, optional): Filter repos created before date (format: "YYYY-MM-DD")
pushed_after (str, optional): Filter repos pushed after date (format: "YYYY-MM-DD")
pushed_before (str, optional): Filter repos pushed before date (format: "YYYY-MM-DD")

Analyze Repository Commits

from greenmining.services.commit_extractor import CommitExtractor
from greenmining.services.data_analyzer import DataAnalyzer
from greenmining import fetch_repositories

# Fetch repositories with custom keywords
repos = fetch_repositories(
    github_token="your_token",
    max_repos=10,
    keywords="serverless edge-computing"
)

# Initialize commit extractor with parameters
extractor = CommitExtractor(
    exclude_merge_commits=True,      # Skip merge commits (default: True)
    exclude_bot_commits=True,        # Skip bot commits (default: True)
    min_message_length=10            # Minimum commit message length (default: 10)
)

# Initialize analyzer with advanced features
analyzer = DataAnalyzer(
    enable_diff_analysis=False,      # Enable code diff analysis (slower but more accurate)
    patterns=None,                   # Custom pattern dict (default: GSF_PATTERNS)
    batch_size=10                    # Batch processing size (default: 10)
)

# Extract commits from first repo
commits = extractor.extract_commits(
    repository=repos[0],             # PyGithub Repository object
    max_commits=50,                  # Maximum commits to extract per repository
    since=None,                      # Start date filter (datetime object, optional)
    until=None                       # End date filter (datetime object, optional)
)

**CommitExtractor Parameters:**
- `exclude_merge_commits` (bool, default=True): Skip merge commits during extraction
- `exclude_bot_commits` (bool, default=True): Skip commits from bot accounts
- `min_message_length` (int, default=10): Minimum length for commit message to be included

**DataAnalyzer Parameters:**
- `enable_diff_analysis` (bool, default=False): Enable code diff analysis (slower)
- `patterns` (dict, optional): Custom pattern dictionary (default: GSF_PATTERNS)
- `batch_size` (int, default=10): Number of commits to process in each batch

# Analyze commits for green patterns
results = []
for commit in commits:
    result = analyzer.analyze_commit(commit)
    if result['green_aware']:
        results.append(result)
        print(f"Green commit found: {commit.message[:50]}...")
        print(f"  Patterns: {result['known_pattern']}")

Access Sustainability Patterns Data

from greenmining import GSF_PATTERNS

# Get all patterns by category
cloud_patterns = {
    pid: pattern for pid, pattern in GSF_PATTERNS.items()
    if pattern['category'] == 'cloud'
}
print(f"Cloud patterns: {len(cloud_patterns)}")  # 40 patterns

ai_patterns = {
    pid: pattern for pid, pattern in GSF_PATTERNS.items()
    if pattern['category'] == 'ai'
}
print(f"AI/ML patterns: {len(ai_patterns)}")  # 19 patterns

# Get pattern details
cache_pattern = GSF_PATTERNS['gsf_001']
print(f"Pattern: {cache_pattern['name']}")
print(f"Category: {cache_pattern['category']}")
print(f"Keywords: {cache_pattern['keywords']}")
print(f"Impact: {cache_pattern['sci_impact']}")

# List all available categories
categories = set(p['category'] for p in GSF_PATTERNS.values())
print(f"Available categories: {sorted(categories)}")
# Output: ['ai', 'async', 'caching', 'cloud', 'code', 'data', 
#          'database', 'general', 'infrastructure', 'microservices',
#          'monitoring', 'network', 'networking', 'resource', 'web']

Advanced Analysis: Temporal Trends

from greenmining.services.data_aggregator import DataAggregator
from greenmining.analyzers.temporal_analyzer import TemporalAnalyzer
from greenmining.analyzers.qualitative_analyzer import QualitativeAnalyzer

# Initialize aggregator with all advanced features
aggregator = DataAggregator(
    config=None,                        # Config object (optional)
    enable_stats=True,                  # Enable statistical analysis (correlations, trends)
    enable_temporal=True,               # Enable temporal trend analysis
    temporal_granularity="quarter"      # Time granularity: day/week/month/quarter/year
)

# Optional: Configure temporal analyzer separately
temporal_analyzer = TemporalAnalyzer(
    granularity="quarter"               # Time period granularity for grouping commits
)

# Optional: Configure qualitative analyzer for validation sampling
qualitative_analyzer = QualitativeAnalyzer(
    sample_size=30,                     # Number of samples for manual validation
    stratify_by="pattern"               # Stratification method: pattern/repository/time/random
)

# Aggregate results with temporal insights
aggregated = aggregator.aggregate(
    analysis_results=analysis_results,  # List of analysis result dictionaries
    repositories=repositories           # List of PyGithub repository objects
)

**DataAggregator Parameters:**
- `config` (Config, optional): Configuration object
- `enable_stats` (bool, default=False): Enable pattern correlations and effect size analysis
- `enable_temporal` (bool, default=False): Enable temporal trend analysis over time
- `temporal_granularity` (str, default="quarter"): Time granularity (day/week/month/quarter/year)

**TemporalAnalyzer Parameters:**
- `granularity` (str, default="quarter"): Time period for grouping (day/week/month/quarter/year)

**QualitativeAnalyzer Parameters:**
- `sample_size` (int, default=30): Number of commits to sample for validation
- `stratify_by` (str, default="pattern"): Stratification method (pattern/repository/time/random)

# Access temporal analysis results
temporal = aggregated['temporal_analysis']
print(f"Time periods analyzed: {len(temporal['periods'])}")

# View pattern adoption trends over time
for period_data in temporal['periods']:
    print(f"{period_data['period']}: {period_data['commit_count']} commits, "
          f"{period_data['green_awareness_rate']:.1%} green awareness")

# Access pattern evolution insights
evolution = temporal.get('pattern_evolution', {})
print(f"Emerging patterns: {evolution.get('emerging', [])}")
print(f"Stable patterns: {evolution.get('stable', [])}")

Generate Custom Reports

from greenmining.services.data_aggregator import DataAggregator
from greenmining.config import Config

config = Config()
aggregator = DataAggregator(config)

# Load analysis results
results = aggregator.load_analysis_results()

# Generate statistics
stats = aggregator.calculate_statistics(results)
print(f"Total commits analyzed: {stats['total_commits']}")
print(f"Green-aware commits: {stats['green_aware_count']}")
print(f"Top patterns: {stats['top_patterns'][:5]}")

# Export to CSV
aggregator.export_to_csv(results, "output.csv")

URL-Based Repository Analysis

from greenmining.services.local_repo_analyzer import LocalRepoAnalyzer

analyzer = LocalRepoAnalyzer(
    max_commits=200,
    cleanup_after=True,
)

result = analyzer.analyze_repository("https://github.com/pallets/flask")

print(f"Repository: {result.name}")
print(f"Commits analyzed: {result.total_commits}")
print(f"Green-aware: {result.green_commits} ({result.green_commit_rate:.1%})")

for commit in result.commits[:5]:
    if commit.green_aware:
        print(f"  {commit.message[:60]}...")

Batch Analysis with Parallelism

from greenmining import analyze_repositories

results = analyze_repositories(
    urls=[
        "https://github.com/kubernetes/kubernetes",
        "https://github.com/istio/istio",
        "https://github.com/envoyproxy/envoy",
    ],
    max_commits=100,
    parallel_workers=3,
    energy_tracking=True,
    energy_backend="auto",
)

for result in results:
    print(f"{result.name}: {result.green_commit_rate:.1%} green")

Private Repository Analysis

from greenmining.services.local_repo_analyzer import LocalRepoAnalyzer

# HTTPS with token
analyzer = LocalRepoAnalyzer(github_token="ghp_xxxx")
result = analyzer.analyze_repository("https://github.com/company/private-repo")

# SSH with key
analyzer = LocalRepoAnalyzer(ssh_key_path="~/.ssh/id_rsa")
result = analyzer.analyze_repository("git@github.com:company/private-repo.git")

Power Regression Detection

from greenmining.analyzers import PowerRegressionDetector

detector = PowerRegressionDetector(
    test_command="pytest tests/ -x",
    energy_backend="rapl",
    threshold_percent=5.0,
    iterations=5,
)

regressions = detector.detect(
    repo_path="/path/to/repo",
    baseline_commit="v1.0.0",
    target_commit="HEAD",
)

for regression in regressions:
    print(f"Commit {regression.sha[:8]}: +{regression.power_increase:.1f}%")

Version Power Comparison

from greenmining.analyzers import VersionPowerAnalyzer

analyzer = VersionPowerAnalyzer(
    test_command="pytest tests/",
    energy_backend="rapl",
    iterations=10,
    warmup_iterations=2,
)

report = analyzer.analyze_versions(
    repo_path="/path/to/repo",
    versions=["v1.0", "v1.1", "v1.2", "v2.0"],
)

print(report.summary())
print(f"Trend: {report.trend}")
print(f"Most efficient: {report.most_efficient}")

Metrics-to-Power Correlation

from greenmining.analyzers import MetricsPowerCorrelator

correlator = MetricsPowerCorrelator()
correlator.fit(
    metrics=["complexity", "nloc", "code_churn"],
    metrics_values={
        "complexity": [10, 20, 30, 40],
        "nloc": [100, 200, 300, 400],
        "code_churn": [50, 100, 150, 200],
    },
    power_measurements=[5.0, 8.0, 12.0, 15.0],
)

print(f"Pearson: {correlator.pearson}")
print(f"Spearman: {correlator.spearman}")
print(f"Feature importance: {correlator.feature_importance}")

Web Dashboard

from greenmining.dashboard import run_dashboard

# Launch interactive dashboard (requires pip install greenmining[dashboard])
run_dashboard(data_dir="./data", host="127.0.0.1", port=5000)

Pipeline Batch Analysis

from greenmining.controllers.repository_controller import RepositoryController
from greenmining.config import Config

config = Config()
controller = RepositoryController(config)

# Run full pipeline programmatically
controller.fetch_repositories(max_repos=50)
controller.extract_commits(max_commits=100)
controller.analyze_commits()
controller.aggregate_results()
controller.generate_report()

print("Analysis complete! Check data/ directory for results.")

Complete Working Example: Full Pipeline

This is a complete, production-ready example that demonstrates the entire analysis pipeline. This example successfully analyzed 100 repositories with 30,543 commits in our testing.

import os
from pathlib import Path
from dotenv import load_dotenv

# Load environment variables
load_dotenv()

# Import from greenmining package
from greenmining import fetch_repositories
from greenmining.services.commit_extractor import CommitExtractor
from greenmining.services.data_analyzer import DataAnalyzer
from greenmining.services.data_aggregator import DataAggregator

# Configuration
token = os.getenv("GITHUB_TOKEN")
output_dir = Path("results")
output_dir.mkdir(exist_ok=True)

# STAGE 1: Fetch Repositories
print("Fetching repositories...")
repositories = fetch_repositories(
    github_token=token,
    max_repos=100,
    min_stars=10,
    keywords="software engineering",
)
print(f"Fetched {len(repositories)} repositories")

# STAGE 2: Extract Commits
print("\nExtracting commits...")
extractor = CommitExtractor(
    github_token=token,
    max_commits=1000,
    skip_merges=True,
    days_back=730,
    timeout=120,
)
all_commits = extractor.extract_from_repositories(repositories)
print(f"Extracted {len(all_commits)} commits")

# Save commits
extractor.save_results(
    all_commits,
    output_dir / "commits.json",
    len(repositories)
)

# STAGE 3: Analyze Commits
print("\nAnalyzing commits...")
analyzer = DataAnalyzer(
    enable_diff_analysis=False,  # Set to True for detailed code analysis (slower)
)
analyzed_commits = analyzer.analyze_commits(all_commits)

# Count green-aware commits
green_count = sum(1 for c in analyzed_commits if c.get("green_aware", False))
green_percentage = (green_count / len(analyzed_commits) * 100) if analyzed_commits else 0
print(f"Analyzed {len(analyzed_commits)} commits")
print(f"Green-aware: {green_count} ({green_percentage:.1f}%)")

# Save analysis
analyzer.save_results(analyzed_commits, output_dir / "analyzed.json")

# STAGE 4: Aggregate Results
print("\nAggregating results...")
aggregator = DataAggregator(
    enable_stats=True,
    enable_temporal=True,
    temporal_granularity="quarter",
)
results = aggregator.aggregate(analyzed_commits, repositories)

# STAGE 5: Save Results
print("\nSaving results...")
aggregator.save_results(
    results,
    output_dir / "aggregated.json",
    output_dir / "aggregated.csv",
    analyzed_commits
)

# Print summary
print("\n" + "="*80)
print("ANALYSIS COMPLETE")
print("="*80)
aggregator.print_summary(results)
print(f"\nResults saved in: {output_dir.absolute()}")

What this example does:

Fetches repositories from GitHub based on keywords and filters
Extracts commits from each repository (up to 1000 per repo)
Analyzes commits for green software patterns
Aggregates results with temporal analysis and statistics
Saves results to JSON and CSV files for further analysis

Expected output files:

commits.json - All extracted commits with metadata
analyzed.json - Commits analyzed for green patterns
aggregated.json - Summary statistics and pattern distributions
aggregated.csv - Tabular format for spreadsheet analysis
metadata.json - Experiment configuration and timing

Performance: This pipeline successfully processed 100 repositories (30,543 commits) in approximately 6.4 hours, identifying 7,600 green-aware commits (24.9%).

Docker Usage

# Interactive shell with Python
docker run -it -v $(pwd)/data:/app/data \
           adambouafia/greenmining:latest python

# Run Python script
docker run -v $(pwd)/data:/app/data \
           adambouafia/greenmining:latest python your_script.py

Configuration

Environment Variables

Create a .env file or set environment variables:

# Required
GITHUB_TOKEN=your_github_personal_access_token

# Optional - Repository Fetching
MAX_REPOS=100
MIN_STARS=100
SUPPORTED_LANGUAGES=Python,Java,Go,JavaScript,TypeScript
SEARCH_KEYWORDS=microservices

# Optional - Commit Extraction
COMMITS_PER_REPO=50
EXCLUDE_MERGE_COMMITS=true
EXCLUDE_BOT_COMMITS=true

# Optional - Analysis Features
ENABLE_DIFF_ANALYSIS=false
BATCH_SIZE=10

# Optional - Temporal Analysis
ENABLE_TEMPORAL=true
TEMPORAL_GRANULARITY=quarter
ENABLE_STATS=true

# Optional - Output
OUTPUT_DIR=./data
REPORT_FORMAT=markdown

Config Object Parameters

from greenmining.config import Config

config = Config(
    # GitHub API
    github_token="your_token",              # GitHub personal access token (required)
    
    # Repository Fetching
    max_repos=100,                          # Maximum repositories to fetch
    min_stars=100,                          # Minimum star threshold
    supported_languages=["Python", "Go"],   # Language filters
    search_keywords="microservices",        # Default search keywords
    
    # Commit Extraction
    max_commits=50,                         # Commits per repository
    exclude_merge_commits=True,             # Skip merge commits
    exclude_bot_commits=True,               # Skip bot commits
    min_message_length=10,                  # Minimum commit message length
    
    # Analysis Options
    enable_diff_analysis=False,             # Enable code diff analysis
    batch_size=10,                          # Batch processing size
    
    # Temporal Analysis
    enable_temporal=True,                   # Enable temporal trend analysis
    temporal_granularity="quarter",         # day/week/month/quarter/year
    enable_stats=True,                      # Enable statistical analysis
    
    # Output Configuration
    output_dir="./data",                    # Output directory path
    repos_file="repositories.json",         # Repositories filename
    commits_file="commits.json",            # Commits filename
    analysis_file="analysis_results.json",  # Analysis results filename
    stats_file="aggregated_statistics.json", # Statistics filename
    report_file="green_analysis.md"         # Report filename
)

Features

Core Capabilities

Pattern Detection: 122 sustainability patterns across 15 categories from the GSF catalog
Keyword Analysis: 321 green software detection keywords
Repository Fetching: GraphQL API with date, star, and language filters
URL-Based Analysis: Direct PyDriller analysis from GitHub URLs (HTTPS and SSH)
Batch Processing: Parallel analysis of multiple repositories with configurable workers
Private Repository Support: Authentication via SSH keys or GitHub tokens
Energy Measurement: RAPL, CodeCarbon, and CPU Energy Meter backends
Carbon Footprint Reporting: CO2 emissions with 20+ country profiles and cloud region support (AWS, GCP, Azure)
Power Regression Detection: Identify commits that increased energy consumption
Metrics-to-Power Correlation: Pearson and Spearman analysis between code metrics and power
Version Power Comparison: Compare power consumption across software versions with trend detection
Method-Level Analysis: Per-method complexity metrics via Lizard integration
Source Code Access: Before/after source code for refactoring detection
Full Process Metrics: All 8 PyDriller process metrics (ChangeSet, CodeChurn, CommitsCount, ContributorsCount, ContributorsExperience, HistoryComplexity, HunksCount, LinesCount)
Statistical Analysis: Correlations, effect sizes, and temporal trends
Multi-format Output: Markdown reports, CSV exports, JSON data
Web Dashboard: Flask-based interactive visualization (pip install greenmining[dashboard])
Docker Support: Pre-built images for containerized analysis

Energy Measurement

greenmining includes built-in energy measurement capabilities for tracking the carbon footprint of your analysis:

Backend Options

Backend	Platform	Metrics	Requirements
RAPL	Linux (Intel/AMD)	CPU/RAM energy (Joules)	`/sys/class/powercap/` access
CodeCarbon	Cross-platform	Energy + Carbon emissions (gCO2)	`pip install codecarbon`
CPU Meter	All platforms	Estimated CPU energy (Joules)	Optional: `pip install psutil`
Auto	All platforms	Best available backend	Automatic detection

Python API

from greenmining.energy import RAPLEnergyMeter, CPUEnergyMeter, get_energy_meter

# Auto-detect best backend
meter = get_energy_meter("auto")
meter.start()
# ... run analysis ...
result = meter.stop()
print(f"Energy: {result.joules:.2f} J")
print(f"Power: {result.watts_avg:.2f} W")

# Integrated energy tracking during analysis
from greenmining.services.local_repo_analyzer import LocalRepoAnalyzer

analyzer = LocalRepoAnalyzer(energy_tracking=True, energy_backend="auto")
result = analyzer.analyze_repository("https://github.com/pallets/flask")
print(f"Analysis energy: {result.energy_metrics['joules']:.2f} J")

Carbon Footprint Reporting

from greenmining.energy import CarbonReporter

reporter = CarbonReporter(
    country_iso="USA",
    cloud_provider="aws",
    region="us-east-1",
)
report = reporter.generate_report(total_joules=3600.0)
print(f"CO2: {report.total_emissions_kg * 1000:.4f} grams")
print(f"Equivalent: {report.tree_months:.2f} tree-months to offset")

Pattern Database

122 green software patterns based on:

Green Software Foundation (GSF) Patterns Catalog
VU Amsterdam 2024 research on ML system sustainability
ICSE 2024 conference papers on sustainable software

Detection Performance

Coverage: 67% of patterns actively detect in real-world commits
Accuracy: 100% true positive rate for green-aware commits
Categories: 15 distinct sustainability domains covered
Keywords: 321 detection terms across all patterns

GSF Pattern Categories

122 patterns across 15 categories:

1. Cloud (40 patterns)

Auto-scaling, serverless computing, right-sizing instances, region selection for renewable energy, spot instances, idle resource detection, cloud-native architectures

2. Web (17 patterns)

CDN usage, caching strategies, lazy loading, asset compression, image optimization, minification, code splitting, tree shaking, prefetching

3. AI/ML (19 patterns)

Model optimization, pruning, quantization, edge inference, batch optimization, efficient training, model compression, hardware acceleration, green ML pipelines

4. Database (5 patterns)

Indexing strategies, query optimization, connection pooling, prepared statements, database views, denormalization for efficiency

5. Networking (8 patterns)

Protocol optimization, connection reuse, HTTP/2, gRPC, efficient serialization, compression, persistent connections

6. Network (6 patterns)

Request batching, GraphQL optimization, API gateway patterns, circuit breakers, rate limiting, request deduplication

7. Caching (2 patterns)

Multi-level caching, cache invalidation strategies, data deduplication, distributed caching

8. Resource (2 patterns)

Resource limits, dynamic allocation, memory management, CPU throttling

9. Data (3 patterns)

Efficient serialization formats, pagination, streaming, data compression

10. Async (3 patterns)

Event-driven architecture, reactive streams, polling elimination, non-blocking I/O

11. Code (4 patterns)

Algorithm optimization, code efficiency, garbage collection tuning, memory profiling

12. Monitoring (3 patterns)

Energy monitoring, performance profiling, APM tools, observability patterns

13. Microservices (4 patterns)

Service decomposition, colocation strategies, graceful shutdown, service mesh optimization

14. Infrastructure (4 patterns)

Alpine containers, Infrastructure as Code, renewable energy regions, container optimization

15. General (8 patterns)

Feature flags, incremental processing, precomputation, background jobs, workflow optimization

Output Files

All outputs are saved to the data/ directory:

repositories.json - Repository metadata
commits.json - Extracted commit data
analysis_results.json - Pattern analysis results
aggregated_statistics.json - Summary statistics
green_analysis_results.csv - CSV export for spreadsheets
green_microservices_analysis.md - Final report

Development

# Clone repository
git clone https://github.com/adam-bouafia/greenmining.git
cd greenmining

# Install development dependencies
pip install -e ".[dev]"

# Run tests
pytest tests/

# Run with coverage
pytest --cov=greenmining tests/

# Format code
black greenmining/ tests/
ruff check greenmining/ tests/

Requirements

Python 3.9+
PyGithub >= 2.1.1
PyDriller >= 2.5
pandas >= 2.2.0

Optional dependencies:

pip install greenmining[energy]      # psutil, codecarbon (energy measurement)
pip install greenmining[dashboard]   # flask (web dashboard)
pip install greenmining[dev]         # pytest, black, ruff, mypy (development)

License

MIT License - See LICENSE for details.

Contributing

Contributions are welcome! Please open an issue or submit a pull request.

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

1.2.9

Feb 3, 2026

1.2.8

Feb 3, 2026

1.2.7

Feb 3, 2026

1.2.6

Feb 3, 2026

1.2.5

Feb 1, 2026

1.2.4

Feb 1, 2026

1.2.3

Feb 1, 2026

1.2.2

Feb 1, 2026

1.2.1

Feb 1, 2026

1.2.0

Jan 31, 2026

1.1.9

Jan 31, 2026

1.1.8

Jan 31, 2026

1.1.7

Jan 31, 2026

1.1.6

Jan 31, 2026

1.1.5

Jan 30, 2026

1.1.4

Jan 30, 2026

This version

1.1.3

Jan 30, 2026

1.0.9

Jan 30, 2026

1.0.8

Jan 30, 2026

1.0.7

Jan 30, 2026

1.0.6

Jan 30, 2026

1.0.5

Jan 29, 2026

1.0.4

Jan 29, 2026

1.0.3

Dec 9, 2025

1.0.2

Dec 8, 2025

1.0.1

Dec 8, 2025

0.1.12

Dec 3, 2025

0.1.11

Dec 2, 2025

0.1.10

Dec 2, 2025

0.1.9

Dec 2, 2025

0.1.8

Dec 2, 2025

0.1.7

Dec 2, 2025

0.1.6

Dec 2, 2025

0.1.4

Dec 2, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

greenmining-1.1.3.tar.gz (95.6 kB view details)

Uploaded Jan 30, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

greenmining-1.1.3-py3-none-any.whl (98.2 kB view details)

Uploaded Jan 30, 2026 Python 3

File details

Details for the file greenmining-1.1.3.tar.gz.

File metadata

Download URL: greenmining-1.1.3.tar.gz
Upload date: Jan 30, 2026
Size: 95.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for greenmining-1.1.3.tar.gz
Algorithm	Hash digest
SHA256	`dc1a7270edc3be485aed4ee954be1526e3414c0289c6ec535d707e8a916826c4`
MD5	`4ac20953ce549a8f0ff42ac499af88b9`
BLAKE2b-256	`2fb6f98d55a6dd1e567e572fbc580aa4dac87c34b0088a7640e2c3b223fee5be`

See more details on using hashes here.

File details

Details for the file greenmining-1.1.3-py3-none-any.whl.

File metadata

Download URL: greenmining-1.1.3-py3-none-any.whl
Upload date: Jan 30, 2026
Size: 98.2 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.13.5

File hashes

Hashes for greenmining-1.1.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`3c3056cbb2dfe9fa7bc63689c2630ce52f61796aaa502b4bb064f24f3e004c6c`
MD5	`f882f9619d98406d342218285240e048`
BLAKE2b-256	`af45ce7f3b9ae02c822ad5ed19810e5cfff0b47e235dfe68b477a1d9efb463cd`

See more details on using hashes here.

greenmining 1.1.3

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

greenmining

Overview

Installation

Via pip

From source

With Docker

Quick Start

Python API

Basic Pattern Detection

Fetch Repositories with Custom Keywords

Analyze Repository Commits

Access Sustainability Patterns Data

Advanced Analysis: Temporal Trends

Generate Custom Reports

URL-Based Repository Analysis

Batch Analysis with Parallelism

Private Repository Analysis

Power Regression Detection

Version Power Comparison

Metrics-to-Power Correlation

Web Dashboard

Pipeline Batch Analysis

Complete Working Example: Full Pipeline

Docker Usage

Configuration

Environment Variables

Config Object Parameters

Features

Core Capabilities

Energy Measurement

Backend Options

Python API

Carbon Footprint Reporting

Pattern Database

Detection Performance

GSF Pattern Categories

1. Cloud (40 patterns)

2. Web (17 patterns)

3. AI/ML (19 patterns)

4. Database (5 patterns)

5. Networking (8 patterns)

6. Network (6 patterns)

7. Caching (2 patterns)

8. Resource (2 patterns)

9. Data (3 patterns)

10. Async (3 patterns)

11. Code (4 patterns)

12. Monitoring (3 patterns)

13. Microservices (4 patterns)

14. Infrastructure (4 patterns)

15. General (8 patterns)

Output Files

Development

Requirements

License

Contributing

Links

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes