Skip to main content

xleda is a Microsoft Excel powered EDA tool for Python data

Project description

License: MIT PyPI - Version PyPI - Python Version Downloads Buy Me A Coffee

xleda is a Microsoft Excel powered EDA tool for Python data.

  • Produces a Microsoft Excel workbook from a pandas dataframe that is highly optimized to both perform and document the activity of Exploratory Data Analysis .

  • Visually explore your data, navigate with your keyboard, take field or record notes, create lists of fields/records for editing, round-trip your edits/analysis back into python, share your workbook with other contributors.

  • There are some amazing EDA tools for Python. You shouldn't have to start from scratch to include Microsoft Excel among them.

  • xleda provides a good start to a robust EDA.

  • See some example xleda workbooks.


Example Top View
An xleda workbook made with diamond data.



Requirements/Compatibility

  • Requires Microsoft Excel to create the workbook.

  • It has been tested on Windows though it should also work on MacOS.

  • xleda workbooks should work in anything that reads Microsoft Excel workbooks.

Quick Start

from xleda import FieldAnalysis
import seaborn as sns

# < your dataframe goes here >
df = sns.load_dataset("titanic")
 

# Configure xleda
xleda = FieldAnalysis(input_df=df,
                      name="Titanic")

# Create workbook
xleda.create_workbook()




Basic Metadata

Basic Metadata
Basic Metadata.

Overview

  • The overview worksheet rotates the basic metadata 90 degrees so that you can sort/filter fields by their name, metadata, or notes/definitions/etc.

Composition Table
Sorting fields from MLB data by memory usage.

xleda Configuration

input_df | Mandatory

  • A pandas dataframe of any size

name | Mandatory

  • Name of the dataset. Also used for the file name of the created workbook

theme_color | Optional

  • Sets the primary color of the charts and the color of the headings in the workbook to a hex color of your choice. theme_color="random" sets a random theme
  • Defaults to a neutral color
# Configure xleda
xleda = FieldAnalysis(input_df=df,
                      name=Theme Example Name,
                      theme_color="#031835")

Theme Colors
theme_color affects the workbooks and default charts.



add_plots | Optional

  • add_plots={'plotname': Figure, ...} will add additional worksheets with plots of your choosing.

    • No styling/sizing of additional plots is performed.

    • The example below adds two additional plot worksheets, one from seaborn and another from missingno. The workbook can be found here.

from xleda import FieldAnalysis

import matplotlib.pyplot as plt
import seaborn as sns
import missingno as msno
  

# < your dataframe goes here >
df = penguins = sns.load_dataset("penguins")
  

# Style the additional plots | optional
plt.style.use("dark_background")


# Create additional plots
pair_plots = sns.pairplot(df, hue="species").figure
null_matrix = msno.matrix(df).get_figure()


# Resize the null matrix  | optional
null_matrix.set_size_inches(9.35, 4.5)


# Configures xleda
xleda = FieldAnalysis(input_df=df,
                      name="Penguins",
                      theme_color="#4C4C4C",
                      add_plots={'Pair Plots': pair_plots,
                                 'Null Matrix': null_matrix})

# Creates the workbook
xleda.create_workbook()

wb_path | Optional

  • Sets the target folder for the xleda workbook.
  • Uses a pathlib Path object.
  • Defaults to current working directory
from pathlib import Path

xleda = FieldAnalysis(input_df=df,
                      name="Penguins",
                      wb_path=Path(r"c:\my_target_folder"))

# Creates "Penguins.xlsm" in the "c:\my_target_folder" directory
xleda.create_workbook()

overwrite | Optional

  • Whether to overwrite existing workbooks of the same name.
  • Defaults to False

large_report | Optional

  • Raises the default dataframe size limits of 100,000 rows/100 columns to Excel's limits of 1,000,000 rows and 16,000 columns.
  • The closer your are to this limit, the longer it will take to produce.
  • See additional details in the performance section below.
  • Defaults to False

Usage Notes

Field/Record Lists

  • The Field Lists section helps you create lists of the fields in your data.

    • Anything not marked as False will be included in each list.

    • The Record List field added to your source data works the same way except it creates a list of records instead of a list of fields. More on that below.

    • You can see your lists in the Compiled Lists section.

    • You can rename Record List or any Field List to Anything You Want and the list will be renamed to anything_you_want.

    • The Compiled Lists section formats your lists as python lists for easy copy/pasting.

    • You can use export_analysis() to get your lists, and other things, back into Python. See details below.

    • Adding/deleting rows or columns on the Field Analysis worksheet may offset the formulas which compile your lists. Spot check them before using them if you have.


Field Lists
Easily create lists of fields in your data.



Record List Details

  • Two additional columns are added to your data to support being able to create a list of records for further processing.

    • Record Hash: Uses a built-in pandas feature hash_pandas_object to uniquely identify records. If two records share all column values they also share a Record Hash.

    • Record List: Used to create a list of Record Hash values.

Exporting back into Python

  • At some point, you'll likely need to get some of your analysis back into Python. export_analysis() exports your notes, definitions, lists (all pictured below) and more back into Python.

Completed Field Analysis
A completed xleda workbook of Titanic passenger showing definitions, notes, lists, etc.



Export Details
  • All exported data comes from the Field Analysis worksheet.

  • It is assumed you haven't altered the structure of your workbook such as adding/deleting rows or columns.

  • The dictionary includes:

    • description: Dataframe description if you've added one

    • definitions: Any field definitions you've added.

    • notes: Any field notes you've added

    • lists: Any lists showing in the compiled lists section

    • source_data: A copy of your unaltered source data that includes Record Hash/Record List columns.

    • altered_source_data: source data from the workbook that includes any manual edits you've made such as removing records, renaming fields, etc.

      ** Note that data types will likely change in the round-trip translation. **

Example Export Code
from xleda import FieldAnalysis

# After editing your workbook


# If you created your workbook somewhere else, configure xleda before export
xleda = FieldAnalysis(input_df=df,
                      name="Titanic")

# Export notes, definitions, data, etc from an xleda workbook
xleda_dict = xleda.export_analysis()

Example Export Dictionary

Export Dict
An example export from a completed field analysis on Titanic passenger data.



Performance

xleda creates workbooks for most data sets less than 20 seconds.

Create Example
The Penguin example from above that includes extra plots took only 7 seconds to create.



Limits with Large Data Sets

  • To ensure workbooks are created quickly, defaults limit data to the first 100 columns and a random sample of 100,000 records. You'll see a warning banner if you hit a limit.

Create Example


  • One of the more complex data sets tested was a 600 column/1,200 row dataframe.

    • It took ~12 minutes to create, in part because most values are unique for all 600 columns and xleda give you a top 5 members composition chart per column.

    • It is still snappy to use even though it has 1,200 charts on a single worksheet. That example is here.

VBA Code

  • There is a small amount of VBA code in the template that makes the sections expand/collapse when you select them as pictured above. If you can't or don't want to enable VBA, use row groupings as pictured below.

Row Groupings
Use row groupings to navigate if you can't use VBA.



Extensible

  • Because it's an ordinary workbook, you can use any tool that works with Microsoft Excel workbooks to do more. xlwings is recommended if you need more.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xleda-0.8.172.tar.gz (154.0 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

xleda-0.8.172-py3-none-any.whl (153.9 kB view details)

Uploaded Python 3

File details

Details for the file xleda-0.8.172.tar.gz.

File metadata

  • Download URL: xleda-0.8.172.tar.gz
  • Upload date:
  • Size: 154.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.15 {"installer":{"name":"uv","version":"0.11.15","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for xleda-0.8.172.tar.gz
Algorithm Hash digest
SHA256 0cff9b3fa440b7c592ff237b8e2d882bd4494281186f38f2582a8ec3370272d1
MD5 59152b3bb14aab4118584115e560cc08
BLAKE2b-256 aafed3093ed8c03d1b433b50e29f0669cb93fa683c2dc40ac987a35a6dbfb613

See more details on using hashes here.

File details

Details for the file xleda-0.8.172-py3-none-any.whl.

File metadata

  • Download URL: xleda-0.8.172-py3-none-any.whl
  • Upload date:
  • Size: 153.9 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? Yes
  • Uploaded via: uv/0.11.15 {"installer":{"name":"uv","version":"0.11.15","subcommand":["publish"]},"python":null,"implementation":{"name":null,"version":null},"distro":{"name":"Ubuntu","version":"24.04","id":"noble","libc":null},"system":{"name":null,"release":null},"cpu":null,"openssl_version":null,"setuptools_version":null,"rustc_version":null,"ci":true}

File hashes

Hashes for xleda-0.8.172-py3-none-any.whl
Algorithm Hash digest
SHA256 50035ee4f385326e40b0bc834182d28241c7be50968d308d054c6480e112f4cc
MD5 c5e29f5cc3275b33faf883e734ff3abb
BLAKE2b-256 82fe7cd7b5cbb2e86c859a0fe9840a4e9090c542a62bf92e0792442026939e0c

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page