Skip to main content

xleda is a Microsoft Excel powered EDA tool for Python data

Project description

License: MIT PyPI - Version PyPI - Python Version Downloads Buy Me A Coffee

xleda is a Microsoft Excel powered EDA tool for Python data.

  • Produces a Microsoft Excel workbook from a pandas dataframe that is highly optimized to both perform and document the activity of Exploratory Data Analysis .

  • Visually explore your data, navigate with your keyboard, take field or record notes, create lists of fields/records for editing, round-trip your edits/analysis back into python, share your workbook with other contributors.

  • There are some amazing EDA tools for Python. You shouldn't have to start from scratch to include Microsoft Excel among them.

  • xleda provides a good start to a robust EDA.

  • See some example xleda workbooks.


Example Top View
An xleda workbook made with diamond data.



Requirements/Compatibility

  • Requires Microsoft Excel to create the workbook.

  • It has been tested on Windows though it should also work on MacOS.

  • xleda workbooks should work in anything that reads Microsoft Excel workbooks.

Quick Start

from xleda import FieldAnalysis
import seaborn as sns

# < your dataframe goes here >
df = sns.load_dataset("titanic")
 

# Configure xleda
xleda = FieldAnalysis(input_df=df,
                      name="Titanic")

# Create workbook
xleda.create_workbook()



Basic xleda Components

Field Metadata

Basic Metadata
Basic Metadata.

Overview

  • The Overview worksheet rotates the field metadata 90 degrees so that you can sort/filter fields by their name, metadata, or notes/definitions/etc.

Composition Table
Sorting fields from MLB data by memory usage.

Per Field Charts

Two charts are produced for each column in your dataframe.

  1. A composition table showing the top 5 values per column and their percentages.
  2. A histogram/KDE showing min/max, distribution, and mean

Default Charts
Default charts for MLB player height.

Source Data Table

  • A copy of your source data is included as an Excel table so you can visually inspect it, sort/filter it, etc.

  • Includes a way to make lists of individual records. More on that below.

  • Includes a way to round-trip your source data back into Python so that you can use Excel to replace values, delete records/columns/etc. More on that below.

Source Data
Source Data Table for MLB player data.

Blanks

  • A pivot table configured to drill down into where blank values are.
  • The first 10 fields of the source data are added by default.

Blanks
Blanks with Titanic survivor data.

Pivot

  • A bare-bones pivot table ready to be configured.
  • The first 10 fields of the source data are added by default.

Pivot
Pivot using Titanic survivor data.


xleda Configuration

input_df | Mandatory

  • A pandas dataframe of any size

name | Mandatory

  • Name of the dataset. Also used for the file name of the created workbook

theme_color | Optional

  • Sets the primary color of the charts and the color of the headings in the workbook to a hex color of your choice. theme_color="random" sets a random theme
  • Defaults to a neutral color
# Configure xleda
xleda = FieldAnalysis(input_df=df,
                      name=Theme Example Name,
                      theme_color="#031835")

Theme Colors
theme_color affects the workbooks and default charts.



add_plots | Optional

  • add_plots={'plotname': Figure, ...} will add additional worksheets with plots of your choosing.

    • No styling/sizing of additional plots is performed.

    • The example below adds two additional plot worksheets, one from seaborn and another from missingno. The workbook can be found here.

from xleda import FieldAnalysis

import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
  

# < your dataframe goes here >
df = penguins = sns.load_dataset("penguins")
  

# Style the additional plots | optional
plt.style.use("dark_background")


# Create additional plots
pair_plots = sns.pairplot(df, hue="species").figure
null_matrix = msno.matrix(df).get_figure()


# Resize the null matrix  | optional
null_matrix.set_size_inches(9.35, 4.5)


# Configures xleda
xleda = FieldAnalysis(input_df=df,
                      name="Penguins",
                      theme_color="#4C4C4C",
                      add_plots={'Pair Plots': pair_plots,
                                 'Null Matrix': null_matrix})

# Creates the workbook
xleda.create_workbook()

wb_path | Optional

  • Sets the target folder for the xleda workbook.
  • Uses a pathlib Path object.
  • Defaults to current working directory
from pathlib import Path

xleda = FieldAnalysis(input_df=df,
                      name="Penguins",
                      wb_path=Path(r"c:\my_target_folder"))

# Creates "Penguins.xlsm" in the "c:\my_target_folder" directory
xleda.create_workbook()

overwrite | Optional

  • Whether to overwrite existing workbooks of the same name.
  • Defaults to False

large_report | Optional

  • Raises the default dataframe size limits of 100,000 rows/100 columns to Excel's limits of 1,000,000 rows and 16,000 columns.
  • The closer your are to this limit, the longer it will take to produce.
  • See additional details in the performance section below.
  • Defaults to False

no_vba | Optional

  • Creates the workbook as an xlsx file without VBA.
  • Defaults to False

Usage Notes

Field/Record Lists

  • The Field Lists section helps you create lists of the fields in your data.

    • Anything not marked as False will be included in each list.

    • The Record List field added to your source data works the same way except it creates a list of records instead of a list of fields. More on that below.

    • You can see your lists in the Compiled Lists section.

    • You can rename Record List or any Field List to Anything You Want and the list will be renamed to anything_you_want.

    • The Compiled Lists section formats your lists as python lists for easy copy/pasting.

    • You can use export_analysis() to get your lists, and other things, back into Python. See details below.

    • Adding/deleting rows or columns on the Field Analysis worksheet may offset the formulas which compile your lists. Spot check them before using them if you have.


Field Lists
Easily create lists of fields in your data.



Record List Details

  • Two additional columns are added to your data to support being able to create a list of records for further processing.

    • Record Hash: Uses a built-in pandas feature hash_pandas_object to uniquely identify records. If two records share all column values they also share a Record Hash.

    • Record List: Used to create a list of Record Hash values.

Exporting back into Python

  • At some point, you'll likely need to get some of your analysis back into Python. export_analysis() exports your definitions, notes, lists, and more back into Python.

Completed Field Analysis
A completed xleda workbook of Titanic passenger showing definitions, notes, lists, etc.



Export Details
  • All exported data comes from the Field Analysis worksheet.

  • The export dictionary includes:

    • description: Dataframe description if you've added one

    • definitions: Any field definitions you've added.

    • notes: Any field notes you've added

    • lists: Any lists showing in the compiled lists section

    • source_data: A copy of your unaltered source data that includes Record Hash/Record List columns.

    • altered_source_data: Source Data Table from the workbook that includes any manual edits you've made such as removing records, renaming fields, replacing values, etc. **

      ** Note that data types will likely change in the round-trip translation. **

Example Export Code
from xleda import FieldAnalysis

# After editing your workbook


# If you created your workbook somewhere else, configure xleda before export
xleda = FieldAnalysis(input_df=df,
                      name="Titanic")

# Export notes, definitions, data, etc from an xleda workbook
xleda_dict = xleda.export_analysis()

Example Export Dictionary

Export Dict
An example export from a completed field analysis on Titanic passenger data.



Exporting Cautions
Spot Check your Lists
  • Most of the few formulas in an xleda workbook are used to compile your lists.

  • If you've added/deleted rows/columns, double-check that your lists show correctly in the Compiled Lists section before using them downstream.

Round-Tripping Source Data Edits
  • altered_source_data will likely change the field data types vs what was in the original dataframe.

  • The Source Data table shouldn't be renamed in Excel. Any other edits such as deleting rows/columns, editing values, etc. are fine.

Performance

xleda creates workbooks for most data sets less than 20 seconds.

Create Example
The Penguin example from above that includes extra plots took only 7 seconds to create.



Limits with Large Data Sets

  • To ensure workbooks are created quickly, defaults limit data to the first 100 columns and a random sample of 100,000 records. You'll see a warning banner if you hit a limit.

Create Example


  • One of the more complex data sets tested was a 600 column/1,200 row dataframe.

    • It took ~5 minutes to create, in part because most values are unique for all 600 columns and xleda give you a top 5 members composition chart per column.

    • It is still snappy to use even though it has 1,200 charts on a single worksheet. That example is here.

VBA Code

  • There is a small amount of VBA code in the template that makes the sections expand/collapse when you select them as pictured above. If you can't or don't want to enable VBA, use row groupings as pictured below.

Row Groupings
Use row groupings to navigate if you can't use VBA.



Extensible

  • Because it's an ordinary workbook, you can use any tool that works with Microsoft Excel workbooks to do more. xlwings is recommended if you need more.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xleda-0.8.180.tar.gz (59.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

xleda-0.8.180-py3-none-any.whl (58.7 kB view details)

Uploaded Python 3

File details

Details for the file xleda-0.8.180.tar.gz.

File metadata

  • Download URL: xleda-0.8.180.tar.gz
  • Upload date:
  • Size: 59.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.16 {"installer":{"name":"uv","version":"0.11.16","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for xleda-0.8.180.tar.gz
Algorithm Hash digest
SHA256 593a559d3342de75f98fcbded13a3c22619a493e68f97897169fc6fb654fe69b
MD5 6079c5f2a26762b7f7a005053342c18a
BLAKE2b-256 8ff229311a931c6d800ce0382993d8a146f07bda89819ff335dc989ec255098f

See more details on using hashes here.

File details

Details for the file xleda-0.8.180-py3-none-any.whl.

File metadata

  • Download URL: xleda-0.8.180-py3-none-any.whl
  • Upload date:
  • Size: 58.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.16 {"installer":{"name":"uv","version":"0.11.16","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for xleda-0.8.180-py3-none-any.whl
Algorithm Hash digest
SHA256 2199061046c42cfb95527568b7d06589787f5f5adf8e647c86e3895979ca5cd5
MD5 c176b8a6b2b2270f7cd4adabd3a70db7
BLAKE2b-256 6e3be2b4ee0f5f3b32fc4c1a0280930bf680bd4e13d077d749eba9017a3ddf8c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page