A Python library to perform clustering, modeling, sensitivity analysis and scenario-based evaluations on buildings data

These details have not been verified by PyPI

Project links

Project description

PyBuildingCluster - (Geo)clustering, sensitivity analysis and scenario-based evaluations

pyBuildingCluster Logo

A Python library to perform clustering, modeling, sensitivity analysis and refurbishment scenario-based evaluations of buildings.

Overview

This library provides comprehensive tools for analyzing building energy performance data through clustering, regression modeling, and sensitivity analysis. It was developed by the Energy Efficient Buildings group at EURAC Research as part of the MODERATE project (Horizon Europe grant agreement No 101069834).

Key Features

Clustering Analysis: K-means clustering with automatic cluster number determination using elbow method or silhouette analysis, also DBSCAN and hierarchical clustering are supported.
Regression Modeling: Support for Random Forest, XGBoost, and LightGBM models with automatic model selection
Sensitivity Analysis: Scenario-based analysis to understand parameter impacts on clusters
Parameter Optimization: Optuna-based optimization for finding optimal parameter combinations
CLI Interface: Command-line tools for easy integration into workflows
Extensible Design: Modular architecture for easy customization and extension

Installation

From PyPI (recommended)

pip install pybuildingcluster

From Source

git clone https://github.com/EURAC-EEBgroup/pybuildingcluster.git
cd pybuildingcluster
pip install -e .

Development Installation

git clone https://github.com/EURAC-EEBgroup/pybuildingcluster.git
cd pybuildingcluster
pip install -e ".[dev]"

Quick Start

Example

Here an Example <https://github.com/EURAC-EEBgroup/pyBuildingCluster/tree/master/examples> of pybuildingcluster application, using Energy Performance Certifcate dataset

The example use Synthetic dataset of Energy Performance Certificates of public buildings in Piedmont Region, Italy.

The process of synthesis was carried out using the library provided by MOSTLY AI https://github.com/mostly-ai/mostlyai

MOSTLY AI Logo

All within the synthetic folder it is possible to view the report relative to the generation of the synthetic dataset.

How to use the synthetic dataset and the library.

The synthesized dataset, in addition to preserving the same statistical characteristics as the original data, represents a very useful resource for evaluating potential energy efficiency improvements across a building stock where only some buildings' performance data is known. In fact, by generating synthetic data, more robust assessments can be made, since the analysis can be based on a larger number of buildings that closely resemble those present in the actual territory.

Here Report <https://github.com/EURAC-EEBgroup/pybuildingcluster/tree/master/synthetization/EPC_tabular.html> of the synthetic dataset of EPC.

Once the data is generated, it can be divided into different clusters based on specific properties. In the example provided, the clustering is done using the QHnd property (heating energy demand of the building) and heating degree days.

Elbow Method

Each cluster is then analyzed through a sensitivity analysis of selected parameters.

In this case, the average thermal transmittance of opaque components and the average thermal transmittance of transparent components are used.

Sensitivity Analysis

The analysis shows how varying these parameters can lead to significant reductions in energy consumption for the selected cluster. For instance, the map illustrates that the dark blue areas correspond to the greatest reductions in consumption, as they represent combinations of low values for both selected parameters. However, this may not always represent the best performance-to-cost ratio. In fact, considerable savings can also be achieved by slightly improving these parameters, which requires a lower investment.

Moreover, specific retrofit scenarios can be identified.

Scenarios Analysis

In the example, 10 scenarios were analyzed. Not all of them necessarily lead to benefits—only a few may contribute positively to energy consumption reduction.

To support better decision-making, an HTML report <https://github.com/EURAC-EEBgroup/pybuildingcluster/blob/master/src/pybuildingcluster/examples/report/scenario_analysis_report.html> is generated that allows users to identify the most effective solution applied

Python API

import pandas as pd
import pybuildingcluster as pbui
import numpy as np
import os
from pathlib import Path
from dotenv import load_dotenv
load_dotenv()
# Example .env
'''
Configurazione Paths
CLUSTERING_CSV_PATH=.../data/clustering.csv
DATA_DIR=.../pybuildingcluster/data
RESULTS_DIR=.../pybuildingcluster/results
MODELS_DIR=.../pybuildingcluster/models
'''

# Esplora il dataset
building_data = explore_dataset(building_data)
# Feature columns for clustering
feature_columns = ['QHnd', 'degree_days']
# Feature columns for regression
feature_columns_regression_ = feature_columns_regression(building_data)

#%%
df_cluster = building_data[feature_columns]

# Get optimal number of clusters
optimal_k_elbow  = pbui.ClusteringAnalyzer().determine_optimal_clusters(df_cluster, method="elbow", k_range=(2, 15), plot=True)
optimal_k_silhouette = pbui.ClusteringAnalyzer().determine_optimal_clusters(df_cluster, method="silhouette", k_range=(2, 15), plot=True)


clustering_analyzer = pbui.ClusteringAnalyzer(random_state=42)
clusters = clustering_analyzer.fit_predict(
    data=building_data,
    feature_columns=feature_columns,
    method="silhouette",
    n_clusters=optimal_k_silhouette,
    algorithm="kmeans",
    save_clusters=True,
    output_dir="../examples/example_results"
)

results = pbui.ClusteringAnalyzer(random_state=42).fit_predict(
            data=df,
            feature_columns=feature_columns,
            method="silhouette",
            algorithm="kmeans",
            save_clusters=True,
            output_dir="../examples/example_results"
        )

#  Evaluate metrics
stats = pbui.ClusteringAnalyzer(random_state=42).get_cluster_statistics(
    results['data_with_clusters'], 
    feature_columns
)
print(stats)

# ======= Regression Models =======

models = pbui.RegressionModelBuilder(random_state=42, problem_type="regression").build_models(
    data=building_data,
    clusters=clusters,
    target_column='QHnd',
    feature_columns=feature_columns_regression_,
    models_to_train=['random_forest','xgboost','lightgbm'],
    hyperparameter_tuning="none",
    save_models=False,
    user_features=['average_opaque_surface_transmittance','average_glazed_surface_transmittance']
)

# ======= Sensitivity Analysis =======
# From cluster 1 get limits of average opaque surface transmittance and average glazed surface transmittance
cluster_1 = clusters['data_with_clusters'][clusters['data_with_clusters']['cluster'] == 1]
cluster_1_limits = {
    'average_opaque_surface_transmittance': {'min': float(cluster_1['average_opaque_surface_transmittance'].min()), 'max': float(cluster_1['average_opaque_surface_transmittance'].max())},
    'average_glazed_surface_transmittance': {'min': float(cluster_1['average_glazed_surface_transmittance'].min()), 'max': float(cluster_1['average_glazed_surface_transmittance'].max())}
}

sensitivity_analyzer = pbui.SensitivityAnalyzer(random_state=42)
data_with_clusters = clusters['data_with_clusters']

oat_results = sensitivity_analyzer.sensitivity_analysis(
    cluster_df=data_with_clusters,
    sensitivity_vars=['average_opaque_surface_transmittance', 'average_glazed_surface_transmittance'],
    target='QHnd',
    modello=models[1]['best_model'],
    n_points=20,
    normalize_=True,
    plot_3d=False,
    cluster_id=None,
    feature_columns=models[1]['feature_columns']
)


# # ======= Scenario Analysis =======
# Get scenarios generated by generate_scenario.py

import numpy as np
import pandas as pd

# Extract limits for better readability
opaque_min = cluster_1_limits['average_opaque_surface_transmittance']['min']
opaque_max = cluster_1_limits['average_opaque_surface_transmittance']['max']
glazed_min = cluster_1_limits['average_glazed_surface_transmittance']['min']
glazed_max = cluster_1_limits['average_glazed_surface_transmittance']['max']

print(f"Range opaque transmittance: {opaque_min:.3f} - {opaque_max:.3f}")
print(f"Range glazed transmittance: {glazed_min:.3f} - {glazed_max:.3f}")

# Generate 10 strategic scenarios for energy efficiency
list_dict_scenarios = [
    # 1. OPTIMAL SCENARIO - Maximum efficiency (minimum values)
    {
        'name': 'High Efficiency Optimal', 
        'parameters': {
            'average_opaque_surface_transmittance': opaque_min,
            'average_glazed_surface_transmittance': glazed_min
        }
    },
    
    # 2. WORST SCENARIO - Minimum efficiency (maximum values)
    {
        'name': 'Low Efficiency Worst', 
        'parameters': {
            'average_opaque_surface_transmittance': opaque_max,
            'average_glazed_surface_transmittance': glazed_max
        }
    },
    
    # 3. BALANCED SCENARIO - Intermediate values
    {
        'name': 'Balanced Performance', 
        'parameters': {
            'average_opaque_surface_transmittance': (opaque_min + opaque_max) / 2,
            'average_glazed_surface_transmittance': (glazed_min + glazed_max) / 2
        }
    },
    
    # 4. ENVELOPE OPTIMIZED - Efficient envelope, standard glazed
    {
        'name': 'Optimized Envelope', 
        'parameters': {
            'average_opaque_surface_transmittance': opaque_min + (opaque_max - opaque_min) * 0.2,
            'average_glazed_surface_transmittance': glazed_min + (glazed_max - glazed_min) * 0.4
        }
    },
    
    # 5. GLAZED ADVANCED - Efficient glazed, standard envelope  
    {
        'name': 'Advanced Glazing', 
        'parameters': {
            'average_opaque_surface_transmittance': opaque_min + (opaque_max - opaque_min) * 0.4,
            'average_glazed_surface_transmittance': glazed_min + (glazed_max - glazed_min) * 0.2
        }
    },
    
    # 6. CONSERVATIVE SCENARIO - Moderate improvement
    {
        'name': 'Conservative Upgrade', 
        'parameters': {
            'average_opaque_surface_transmittance': opaque_min + (opaque_max - opaque_min) * 0.3,
            'average_glazed_surface_transmittance': glazed_min + (glazed_max - glazed_min) * 0.3
        }
    },
    
    # 7. AGGRESSIVE SCENARIO - High efficiency
    {
        'name': 'Aggressive Efficiency', 
        'parameters': {
            'average_opaque_surface_transmittance': opaque_min + (opaque_max - opaque_min) * 0.1,
            'average_glazed_surface_transmittance': glazed_min + (glazed_max - glazed_min) * 0.15
        }
    },
    
    # 8. CURRENT MARKET STANDARD - Typical market performance
    {
        'name': 'Current Market Standard', 
        'parameters': {
            'average_opaque_surface_transmittance': opaque_min + (opaque_max - opaque_min) * 0.6,
            'average_glazed_surface_transmittance': glazed_min + (glazed_max - glazed_min) * 0.5
        }
    },
    
    # 9. ECONOMIC RETROFIT SCENARIO - Cost-effective improvement
    {
        'name': 'Economic Retrofit', 
        'parameters': {
            'average_opaque_surface_transmittance': opaque_min + (opaque_max - opaque_min) * 0.45,
            'average_glazed_surface_transmittance': glazed_min + (glazed_max - glazed_min) * 0.35
        }
    },
    
    # 10. HIGH PERFORMANCE SCENARIO - Quasi ottimale ma realistico
    {
        'name': 'High Performance Realistic', 
        'parameters': {
            'average_opaque_surface_transmittance': opaque_min + (opaque_max - opaque_min) * 0.15,
            'average_glazed_surface_transmittance': glazed_min + (glazed_max - glazed_min) * 0.25
        }
    }
]

scenario_results = sensitivity_analyzer.compare_scenarios(
    cluster_df=data_with_clusters,
    scenarios=list_dict_scenarios,
    target='QHnd',
    feature_columns=models[0]['feature_columns'],
    modello=models[0]['best_model']
)
# EVALUATE SCENARIOS AND CREATE REPORT 
sensitivity_analyzer._plot_scenario_results(scenario_results, 'QHnd')
sensitivity_analyzer.create_scenario_report_html(
    scenario_results, 
    list_dict_scenarios, 
    'QHnd', 
    models[0]['feature_columns'],
    output_path = "../examples/example_results/scenario_analysis_report.html")

Web Application

An example of web app that uses pyBuildingCluster is available at: https://tools.eeb.eurac.edu/epc_clustering/piemonte/synthetic_epc

Geoclustering tool

In this case it is also possible to filter the data according to 12 types of tax filters defined, which take into account minimum conditions so that an EPC can be considered good or not.

N.B. Some parts of the app need to be updated and finalized.

Acknowledgment

This work was carried out within European projects: Moderate - Horizon Europe research and innovation programme under grant agreement No 101069834, with the aim of contributing to the development of open products useful for defining plausible scenarios for the decarbonization of the built environment

Moderate Logo

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

1.0.8

Jun 30, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pybuildingcluster-1.0.8.tar.gz (99.8 kB view details)

Uploaded Jun 30, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

pybuildingcluster-1.0.8-py3-none-any.whl (83.3 kB view details)

Uploaded Jun 30, 2025 Python 3

File details

Details for the file pybuildingcluster-1.0.8.tar.gz.

File metadata

Download URL: pybuildingcluster-1.0.8.tar.gz
Upload date: Jun 30, 2025
Size: 99.8 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for pybuildingcluster-1.0.8.tar.gz
Algorithm	Hash digest
SHA256	`557bd4b50744b389a0386d6e23c9bd029fe697a46016fb76dc8e05c0c034c02d`
MD5	`70ba5523bab249ed924f1f12f1312664`
BLAKE2b-256	`f4bcab4d6eb95f4374d5ef3d3eaa7bb97bb0cd4af48254882c0429d7ffc3c43e`

See more details on using hashes here.

File details

Details for the file pybuildingcluster-1.0.8-py3-none-any.whl.

File metadata

Download URL: pybuildingcluster-1.0.8-py3-none-any.whl
Upload date: Jun 30, 2025
Size: 83.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.12.6

File hashes

Hashes for pybuildingcluster-1.0.8-py3-none-any.whl
Algorithm	Hash digest
SHA256	`090a5fe8eb22a813028c1e1c914a4d8c038df8dbc34f4bd1358ba59d4e65c666`
MD5	`a1a0781408902de03c99836fcf9d2b84`
BLAKE2b-256	`b755267626a790111ae6896835332609b371cca3ba9b687b093e2ccf1e0d20ef`

See more details on using hashes here.

pybuildingcluster 1.0.8

Navigation

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Project description

PyBuildingCluster - (Geo)clustering, sensitivity analysis and scenario-based evaluations

Overview

Key Features

Installation

From PyPI (recommended)

From Source

Development Installation

Quick Start

Example

How to use the synthetic dataset and the library.

Python API

Web Application

Acknowledgment

Project details

Verified details

Maintainers

Meta

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes