
An extensible toolkit for Goal Oriented Analysis of Data


GOAD🐐 is the GOAT - Goal Oriented Analysis of Data


GOAD🐐 - When your data analysis is so fire🔥 it's got rizz✨

GOAD🐐

GOAD🐐 is a flexible Python package for analyzing, transforming, and visualizing data with an emphasis on statistical distribution fitting and modular visualization components.

📊 Features

  • Composable & extendable plotting system - Build complex visualizations by combining simple components. You can extend the existing components with your own.
  • Statistical distribution fitting - Automatically fit and compare distributions to your data. The distribution registry is extendable with additional distributions.
  • Extendable data transformation pipelines - Chain and reuse data transformations into pipelines. Again, extendable with custom transformation components.

Before GOAD🐐: mid data
After GOAD🐐: data got infinity aura

🚀 Quick Start

Installation

Using uv:

uv add goad-toolkit

Or, if you prefer your dependencies to be installed 100x slower, with pip:

pip install goad-toolkit

📋 Demo: Linear Model Analysis

GOAD🐐 includes a comprehensive demo that shows how to use its components together.

Main capabilities

In the demo/linear.py file you can see a showcase of the main capabilities of GOAD🐐:

  • create a data processing pipeline
  • components are extendable, so you can easily add your own steps to a pipeline
  • create visualisations by stacking components. BasePlot will handle boilerplate.
  • the DistributionFitter will try to fit a few common distributions, and add statistical tests for you
  • The results work together with the visualizer.PlotFits class to show the results

The main strength of this module is not that these elements are there (even though they are very useful). Its superpower is that everything is extendable, so you can use it as a starting point and extend it with your own visualisations and analytics.

POV: Your data just got GOADed🐐 and now it's giving main character energy

📚 Core Components

🔄 Extendable Data Transforms

GOAD🐐 provides a pipeline approach to transform your data:

from goad_toolkit.datatransforms import Pipeline, ShiftValues, ZScaler

# Create a pipeline
pipeline = Pipeline()

# Add transformations
pipeline.add(ShiftValues, name="shift_deaths", column="deaths", period=-14)
pipeline.add(ZScaler, name="scale_tests", column="positivetests", rename=True)

# Apply all transformations
result = pipeline.apply(data)

Available transforms include:

  • ShiftValues - Shift values in a column by a specified period
  • DiffValues - Calculate the difference between consecutive values
  • SelectDataRange - Select rows within a specified date range
  • RollingAvg - Calculate rolling average of a column
  • ZScaler - Standardize values in a column
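To make the chaining idea concrete, here is a minimal, library-independent sketch of the pattern. MiniPipeline, Shift, and Diff are hypothetical stand-ins written for this illustration; they are not GOAD🐐's classes. The real transforms work on pandas DataFrames and may treat edge cases and column naming differently.

```python
Table = dict[str, list]  # column name -> values (stand-in for a DataFrame)


class Shift:
    """Hypothetical step: move values by `period` rows, padding with None."""

    def transform(self, data: Table, column: str, period: int) -> Table:
        values = data[column]
        n = len(values)
        # Row i receives the value from row i - period, if that row exists.
        shifted = [
            values[i - period] if 0 <= i - period < n else None for i in range(n)
        ]
        return {**data, f"{column}_shifted": shifted}


class Diff:
    """Hypothetical step: difference between consecutive values."""

    def transform(self, data: Table, column: str) -> Table:
        values = data[column]
        diffs = [None] + [b - a for a, b in zip(values, values[1:])]
        return {**data, f"{column}_diff": diffs}


class MiniPipeline:
    """Stores (name, step, kwargs) and applies the steps in insertion order."""

    def __init__(self) -> None:
        self.steps: list[tuple[str, object, dict]] = []

    def add(self, step_cls: type, name: str, **kwargs) -> None:
        self.steps.append((name, step_cls(), kwargs))

    def apply(self, data: Table) -> Table:
        for _name, step, kwargs in self.steps:
            data = step.transform(data, **kwargs)
        return data


pipeline = MiniPipeline()
pipeline.add(Shift, name="shift_deaths", column="deaths", period=-1)
pipeline.add(Diff, name="diff_tests", column="tests")
result = pipeline.apply({"deaths": [1, 2, 4], "tests": [10, 15, 30]})
# result["deaths_shifted"] == [2, 4, None]
# result["tests_diff"] == [None, 5, 15]
```

Each step only needs a transform method that takes the data plus its own keyword arguments, which is what makes steps easy to reuse and reorder.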

You can extend the pipeline with your own transformations by subclassing the base transform class and implementing a transform method. ZScaler, which subclasses TransformBase, is implemented as follows:

class ZScaler(TransformBase):
    """Standardize the values in a column."""
    def transform(
        self, data: pd.DataFrame, column: str, rename: bool = False
    ) -> pd.DataFrame:
        """Standardize the values in a column."""
        if rename:
            colname = f"{column}_zscore"
        else:
            colname = column
        data[colname] = (data[column] - data[column].mean()) / data[column].std()
        return data
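As a quick check of the arithmetic, the same formula can be worked through with the standard library alone. This is a sketch with made-up numbers, not part of GOAD🐐. It uses statistics.stdev because, like the default of pandas' std(), it is the sample standard deviation (n - 1 in the denominator).

```python
from statistics import mean, stdev

values = [2.0, 4.0, 4.0, 6.0]  # made-up example column

mu = mean(values)      # 4.0
sigma = stdev(values)  # sample standard deviation (n - 1 in the denominator)

# z = (x - mean) / std, the same expression as in ZScaler.transform
zscores = [(v - mu) / sigma for v in values]

# After scaling, the column has mean 0 and sample standard deviation 1.
print(round(abs(mean(zscores)), 12), round(stdev(zscores), 12))
```

Note that the transform shown above assigns the new column directly on the DataFrame it receives, so the caller's DataFrame is modified as well. Pass data.copy() if you need to keep the original untouched.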

📊 Visualization System

GOAD🐐's visualization system is built on a composable architecture that lets you build complex plots by combining simpler components:

import matplotlib.pyplot as plt
import pandas as pd
import seaborn as sns

from goad_toolkit.visualizer import BasePlot, PlotSettings

# Create plot settings
plotsettings = PlotSettings(
    xlabel="date",
    ylabel="normalized values",
    title="Z-Scores of Deaths and Positive Tests",
)

class LinePlot(BasePlot):
    """Plot a line plot using seaborn."""
    def build(self, data: pd.DataFrame, **kwargs):
        sns.lineplot(data=data, ax=self.ax, **kwargs)
        return self.fig, self.ax


class ComparePlot(BasePlot):
    def build(self, data: pd.DataFrame, x: str, y1: str, y2: str, **kwargs):
        compare = LinePlot(self.settings)
        self.plot_on(compare, data=data, x=x, y=y1, label=y1, **kwargs)
        self.plot_on(compare, data=data, x=x, y=y2, label=y2, **kwargs)
        plt.xticks(rotation=45)

        return self.fig, self.ax

compareplot = ComparePlot(plotsettings)
compareplot.plot(
    data=data, x="date", y1="deaths_shifted_zscore", y2="positivetests_zscore"
)

This extendable strategy lets BasePlot handle the boilerplate, so you can focus on creating the visualizations you need. It also makes it easier to reuse components in different contexts.
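The division of labour can be sketched without any plotting library. In the sketch below, MiniBasePlot, MiniLine, and MiniCompare are hypothetical names, and a plain list stands in for the matplotlib axes. It only illustrates the pattern (the base class owns the setup, subclasses implement build, and plot_on lets one component draw on another's canvas); how GOAD🐐 implements this internally is not shown here.

```python
class MiniBasePlot:
    """Hypothetical base class: owns the shared setup."""

    def __init__(self, title: str) -> None:
        self.title = title
        self.canvas: list[str] = []  # stand-in for a matplotlib Axes

    def build(self, **kwargs) -> None:
        raise NotImplementedError  # subclasses only implement this method

    def plot_on(self, component: "MiniBasePlot", **kwargs) -> None:
        # Let another component draw on this plot's canvas.
        component.canvas = self.canvas
        component.build(**kwargs)

    def plot(self, **kwargs) -> list[str]:
        self.canvas.append(f"title: {self.title}")  # boilerplate, written once
        self.build(**kwargs)
        return self.canvas


class MiniLine(MiniBasePlot):
    def build(self, y: str) -> None:
        self.canvas.append(f"line: {y}")


class MiniCompare(MiniBasePlot):
    def build(self, y1: str, y2: str) -> None:
        line = MiniLine(self.title)
        self.plot_on(line, y=y1)  # both lines end up on the same canvas
        self.plot_on(line, y=y2)


drawn = MiniCompare("Z-scores").plot(y1="deaths", y2="positivetests")
# drawn == ["title: Z-scores", "line: deaths", "line: positivetests"]
```

Because MiniLine knows nothing about MiniCompare, the same component can be dropped into any other composite plot.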

📈 Distribution Fitting

GOAD🐐 includes tools for fitting statistical distributions to your data:

from goad_toolkit.analytics import DistributionFitter
from goad_toolkit.visualizer import PlotSettings, FitPlotSettings, PlotFits

fitter = DistributionFitter()
fits = fitter.fit(data["residual"], discrete=False) # we have to decide if the data is discrete or not
best = fitter.best(fits)
settings = PlotSettings(
    figsize=(12, 6), title="Residuals", xlabel="error", ylabel="probability"
)
fitplotsettings = FitPlotSettings(bins=30, max_fits=3)
fitplotter = PlotFits(settings)
fig = fitplotter.plot(
    data=data["residual"], fit_results=fits, fitplotsettings=fitplotsettings
)

For the Kolmogorov-Smirnov test (kstest), the null hypothesis is that the data follow the candidate distribution. In this example, the p-values are all below 0.05, so we reject the null hypothesis for every candidate and conclude that the data do not follow any of them.
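To show what such a test measures, here is a small standard-library sketch of a one-sample Kolmogorov-Smirnov test against a normal distribution. It is not GOAD🐐's implementation. The helper names ks_statistic and ks_pvalue are made up for this example, and the p-value uses the large-sample (asymptotic) Kolmogorov series, so treat it as an approximation. In practice you would rely on scipy.stats.kstest.

```python
from math import exp, sqrt
from statistics import NormalDist
from typing import Callable


def ks_statistic(sample: list[float], cdf: Callable[[float], float]) -> float:
    """Largest gap between the empirical CDF of `sample` and `cdf`."""
    xs = sorted(sample)
    n = len(xs)
    return max(
        max((i + 1) / n - cdf(x), cdf(x) - i / n) for i, x in enumerate(xs)
    )


def ks_pvalue(d: float, n: int, terms: int = 100) -> float:
    """Approximate p-value from the asymptotic Kolmogorov distribution."""
    lam = sqrt(n) * d
    if lam < 0.2:
        # The series converges slowly here; the true value is within
        # rounding error of 1.
        return 1.0
    total = 2 * sum(
        (-1) ** (k - 1) * exp(-2 * (k * lam) ** 2) for k in range(1, terms + 1)
    )
    return min(1.0, max(0.0, total))


reference = NormalDist(mu=0.0, sigma=1.0)
n = 200
# Evenly spaced normal quantiles: follows the reference by construction.
normal_like = [reference.inv_cdf((i + 0.5) / n) for i in range(n)]
# Evenly spaced values on [0, 1): clearly not standard normal.
uniform_like = [i / n for i in range(n)]

p_normal = ks_pvalue(ks_statistic(normal_like, reference.cdf), n)
p_uniform = ks_pvalue(ks_statistic(uniform_like, reference.cdf), n)
print(p_normal > 0.05, p_uniform < 0.05)
```

A p-value above 0.05 means the test found no evidence against the candidate, while a p-value below 0.05 is the rejection case described above.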

The plots are sorted by log-likelihood. Log-likelihood only ranks the candidates against each other; combined with the test results above, even the top-ranked candidates are not a good fit in this case.
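Ranking by log-likelihood can be sketched with the standard library. The data and the three candidate distributions below are made up for illustration, and the sketch assumes that a higher log-likelihood ranks first. It does not reproduce GOAD🐐's own fitting code.

```python
from math import log
from statistics import NormalDist, mean, pstdev

data = [-1.2, -0.4, 0.1, 0.3, 0.8, 1.5]  # made-up residuals

candidates = {
    # mean and population std are the maximum-likelihood estimates for a normal
    "fitted normal": NormalDist(mean(data), pstdev(data)),
    "standard normal": NormalDist(0.0, 1.0),
    "shifted normal": NormalDist(5.0, 1.0),
}


def log_likelihood(dist: NormalDist, sample: list[float]) -> float:
    """Sum of log densities: higher means the sample is more plausible."""
    return sum(log(dist.pdf(x)) for x in sample)


ranking = sorted(
    candidates,
    key=lambda name: log_likelihood(candidates[name], data),
    reverse=True,  # best candidate first
)
# ranking == ["fitted normal", "standard normal", "shifted normal"]
```

The ranking is relative: the first entry is only the best of the candidates that were tried, which is why a goodness-of-fit test is still needed.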

🧩 Extending with Custom Distributions

You can easily register new distributions:

from goad_toolkit.distributions import DistributionRegistry
from scipy import stats

# Create registry
registry = DistributionRegistry()

# Register a new distribution
registry.register_distribution(
    name="negative_binomial",
    dist=stats.nbinom,
    is_discrete=True,
    num_params=2
)

# Now it will be used automatically in the DistributionFitter for discrete fits
from goad_toolkit.analytics import DistributionFitter
fitter = DistributionFitter()
print(fitter.registry) # shows all registered distributions
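The snippet above registers a distribution on one DistributionRegistry instance and then expects a freshly created DistributionFitter to see it, which suggests that registry state is shared between instances. The following sketch shows one way such sharing can work, using a class-level dictionary. MiniRegistry and Entry are hypothetical names, the placeholder strings stand in for scipy.stats objects, and whether GOAD🐐 shares state this way is an assumption.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Entry:
    name: str
    dist: object  # e.g. a scipy.stats distribution in real use
    is_discrete: bool
    num_params: int


class MiniRegistry:
    """Hypothetical registry whose entries are shared by all instances."""

    _entries: dict[str, Entry] = {}  # class attribute: one dict for everyone

    def register_distribution(
        self, name: str, dist: object, is_discrete: bool, num_params: int
    ) -> None:
        self._entries[name] = Entry(name, dist, is_discrete, num_params)

    def names(self, discrete: bool) -> list[str]:
        return [e.name for e in self._entries.values() if e.is_discrete == discrete]


registry = MiniRegistry()
registry.register_distribution(
    "normal", dist="norm-placeholder", is_discrete=False, num_params=2
)
registry.register_distribution(
    "negative_binomial", dist="nbinom-placeholder", is_discrete=True, num_params=2
)

# A brand-new instance sees the same registrations.
discrete_candidates = MiniRegistry().names(discrete=True)
# discrete_candidates == ["negative_binomial"]
```

Filtering on is_discrete is what lets a fitter pick only the candidates that make sense for the data at hand.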

🔧 Advanced Usage: Composing Plots

GOAD🐐 has a powerful plotting system that allows you to combine plot elements:

import pandas as pd

from goad_toolkit.visualizer import BasePlot, LinePlot, BarWithDates, VerticalDate

# Use a base plot to create a composite
class MyCompositePlot(BasePlot):
    def build(self, data: pd.DataFrame, x: str, y1: str, y2: str, special_date: str):
        # Plot the first component - a line plot
        line_plot = LinePlot(self.settings)
        self.plot_on(line_plot, data=data, x=x, y=y1, label=y1)

        # Plot the second component - a bar chart
        bar_plot = BarWithDates(self.settings)
        self.plot_on(bar_plot, data=data, x=x, y=y2)

        # Add a vertical line
        vline = VerticalDate(self.settings)
        self.plot_on(vline, date=special_date, label="Important Event")
        return self.fig, self.ax

🤝 Contributing

Contributions are welcome! Please feel free to submit a Pull Request.


