Write a Python function to calculate your metric(s) and run it over all clusters in your data to find the cuts of the data driving your metrics.
Project description
hot-spot-analysis
Installation
pip install hot-spot-analysis
Python Import
from hot_spot_analysis.hot_spot_analysis import HotSpotAnalyzer
HSA = HotSpotAnalyzer(...)
Quickstart
Short Theoretical Demonstration:
Suppose we have 3 columns [a, b, c] and want to know how each column, and each interaction between columns, affects our metric of interest. To do that, we have to group the data by every non-empty combination of those columns. The problem grows quickly as columns are added: n columns produce 2^n - 1 combinations.
Interacting 3 columns: [a, b, c] -> 7 valid data cuts
- @ depth = 1: [a,b,c] <- 3 data cuts
- @ depth = 2: [ab,ac,bc] <- 3 data cuts
- @ depth = 3: [abc] <- 1 data cut
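The counts above can be checked with a short sketch that uses only the standard library. The helper name `enumerate_cuts` is hypothetical and is not part of the package; it only illustrates how the cuts at each depth are formed.

```python
from itertools import combinations


def enumerate_cuts(columns):
    """Return every non-empty combination of columns, keyed by depth."""
    cuts_by_depth = {}
    for depth in range(1, len(columns) + 1):
        cuts_by_depth[depth] = list(combinations(columns, depth))
    return cuts_by_depth


cuts = enumerate_cuts(["a", "b", "c"])
for depth, combos in cuts.items():
    print(depth, combos)

# Total number of cuts for 3 columns: 3 + 3 + 1 = 7 = 2**3 - 1
total = sum(len(combos) for combos in cuts.values())
print(total)
```

With 5 columns the same helper returns 31 cuts, and with 10 columns 1,023, which is why enumerating them by hand quickly becomes impractical.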
A simple example of Hot Spot Analysis (HSA)
Example - Input Data
column1 | column2 | Value |
---|---|---|
A | X | 10 |
A | Y | 20 |
B | X | 30 |
B | Y | 40 |
C | X | 50 |
C | Y | 60 |
Example - Simple metric function
# Metric function
def metric_function(group):
    return {
        'sum_value': group['Value'].sum()
    }
Example - Run HSA
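import pandas as pd
from hot_spot_analysis.hot_spot_analysis import HotSpotAnalyzer

# Input data from the table above (assumed here to be a pandas DataFrame)
example_data = pd.DataFrame({
    "column1": ["A", "A", "B", "B", "C", "C"],
    "column2": ["X", "Y", "X", "Y", "X", "Y"],
    "Value": [10, 20, 30, 40, 50, 60],
})

HSA = HotSpotAnalyzer(
    data=example_data,
    target_cols=["column1", "column2"],
    objective_function=metric_function,  # See above
)
HSA.run_hsa()
hsa_data = HSA.export_hsa_output_df()
print(hsa_data.head(10))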
Below is a simplified example of the HSA output
group | n_rows | sum_value |
---|---|---|
{'column1': 'A'} | 2 | 30 |
{'column1': 'B'} | 2 | 70 |
{'column1': 'C'} | 2 | 110 |
{'column2': 'X'} | 3 | 90 |
{'column2': 'Y'} | 3 | 120 |
{'column1': 'A', 'column2': 'X'} | 1 | 10 |
{'column1': 'A', 'column2': 'Y'} | 1 | 20 |
{'column1': 'B', 'column2': 'X'} | 1 | 30 |
{'column1': 'B', 'column2': 'Y'} | 1 | 40 |
{'column1': 'C', 'column2': 'X'} | 1 | 50 |
{'column1': 'C', 'column2': 'Y'} | 1 | 60 |
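To make the table above easier to follow, here is a minimal pure-Python sketch that reproduces the same rows from the example input. It is not the package's implementation: `naive_hot_spot_analysis` and `sum_value_metric` are hypothetical names, and groups are plain lists of dicts rather than DataFrames. It only shows the idea of grouping by every combination of target columns and applying the metric function to each group.

```python
from collections import defaultdict
from itertools import combinations

# Example input data, one dict per row of the input table
rows = [
    {"column1": "A", "column2": "X", "Value": 10},
    {"column1": "A", "column2": "Y", "Value": 20},
    {"column1": "B", "column2": "X", "Value": 30},
    {"column1": "B", "column2": "Y", "Value": 40},
    {"column1": "C", "column2": "X", "Value": 50},
    {"column1": "C", "column2": "Y", "Value": 60},
]


def sum_value_metric(group):
    # Here a group is a list of row dicts, not a DataFrame
    return {"sum_value": sum(row["Value"] for row in group)}


def naive_hot_spot_analysis(rows, target_cols, objective_function):
    """Apply objective_function to every group of every column combination."""
    results = []
    for depth in range(1, len(target_cols) + 1):
        for cols in combinations(target_cols, depth):
            # Bucket rows by their values in the current column combination
            groups = defaultdict(list)
            for row in rows:
                key = tuple(row[col] for col in cols)
                groups[key].append(row)
            for key, group in sorted(groups.items()):
                results.append({
                    "group": dict(zip(cols, key)),
                    "n_rows": len(group),
                    **objective_function(group),
                })
    return results


output = naive_hot_spot_analysis(rows, ["column1", "column2"], sum_value_metric)
for record in output:
    print(record)
```

Each printed record corresponds to one row of the output table: three cuts on `column1`, two on `column2`, and six on the pair.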