Advanced Excel data processing and manipulation library for Python

These details have not been verified by PyPI

Project links

Project description

AdvancedExcel: Simplifying Excel Data Processing in Python

v2.0+ note: The public API now uses PEP 8 snake_case method names (e.g. get_sheet, remove_empty_rows, get_dataframe_blocks_by_key_name). This is a breaking change from v1.x.

The AdvancedExcel class provides a powerful and convenient way to interact with Excel files in Python. It streamlines common Excel data processing tasks, from reading and extracting data to cleaning, transforming, and manipulating it using pandas DataFrames. This class aims to reduce the boilerplate code often associated with Excel handling, making your data analysis workflows more efficient and readable.

Key Strengths:

Effortless Data Extraction: Easily read data from Excel sheets, including handling complexities like merged cells, multiple tables within a sheet, and various data types. Target specific tables using row and column indices or header values.
Comprehensive DataFrame Manipulation: A rich set of methods for cleaning, transforming, and manipulating pandas DataFrames. Perform operations like removing rows/columns based on criteria, filling missing values using various strategies (forward fill, left fill), merging and splitting columns, changing data types, and more.
Flexible Header and Index Management: Gain fine-grained control over headers and indices. Set rows as headers, add headers as rows, reset indices, and drop unnecessary index levels.
Streamlined String and Value Processing: Simplify string manipulation with methods for stripping whitespace, replacing values (including using regular expressions), and changing the case of strings (uppercase, lowercase, title case).
Convenient Utility Functions: Includes helper functions for common tasks such as transposing DataFrames, merging multiple DataFrames, extracting filenames from paths, and getting sheet names without fully opening the Excel file.
Seamless Integration with sspipe: Designed for smooth integration with the sspipe library (https://github.com/sspipe/sspipe). Chain method calls in a readable and functional style, creating elegant and concise data pipelines.

Why use AdvancedExcel?

Working with Excel files in Python often involves repetitive and verbose code. AdvancedExcel encapsulates these common tasks into reusable methods, promoting code reusability, readability, and maintainability. It handles many of the edge cases and complexities of Excel data, allowing you to focus on the core logic of your data analysis. The integration with sspipe further enhances code clarity by enabling a functional programming paradigm.

Getting Started

Create a Virtual Environment.

py -m venv venv

Activate the virtual environment

venv\Scripts\activate.bat (in Linux source ./venv/bin/activate )

Update PIP

py -m pip install -U pip

Installation

pip install advanced-excel

For the functional piping examples using sspipe:

pip install "advanced-excel[sspipe]"

For development:

pip install -e ".[dev,sspipe]"

Dependencies

Core dependencies (pandas, openpyxl, etc.) are declared in pyproject.toml. Optional extras are available for sspipe support and development.

Usage

Basic usage

from advanced_excel import AdvancedExcel

data = AdvancedExcel("simple_excel.xlsx")
sheet = data.get_sheet("Data")
result = data.get_next_value(sheet, "Study Number")
print(result)

Note: The recommended import is from advanced_excel import AdvancedExcel (this is the stable API as of v2.0).

Example using sspipe for complex chaining (requires `pip install "advanced-excel[sspipe]"`):

from sspipe import p
import pandas as pd
from advanced_excel import AdvancedExcel

excel_processor = AdvancedExcel("complex_excel.xlsx")

raw = excel_processor.get_sheet("Raw data")
sheet1 = excel_processor.get_sheet("Sheet1")
sheet2 = excel_processor.get_sheet("Sheet2")

final_df_chained = (
    raw
    | p(excel_processor.remove_empty_rows)
    | p(excel_processor.set_column_to_numeric, columnName="Quantity")
    | p(pd.merge,
        sheet1
        | p(excel_processor.remove_empty_rows)
        | p(excel_processor.rename_headers, {"OldName": "CategoryName"}),
        on="ProductID", how="left")
    | p(pd.merge,
        sheet2
        | p(excel_processor.remove_empty_rows)
        | p(excel_processor.strip_all)
        | p(excel_processor.remove_duplicates),
        left_index=True, right_index=True)
)

Another example of usage on a complex Excel (multiple tables per sheet):

from advanced_excel import AdvancedExcel
from sspipe import p   # optional, for nice chaining

data = AdvancedExcel("studies_excel.xlsx")
results = data.get_sheet("Raw data") | p(data.get_dataframe_blocks_by_key_name, "Batch Number")

all_tables = []
for dblock in results:
    study_num = data.get_next_value(dblock, "Study No")
    plan_num = data.get_next_value(dblock, "Plan No")

    table = (
        dblock
        | p(data.get_dataframe_blocks_by_key_name, "Batch Number")
        | (lambda blocks: blocks[0] if blocks else pd.DataFrame())
        | p(data.replace_nan_strings)
        | p(data.remove_empty_rows)
        | p(data.merge_rows, 0, 2)
        | p(data.set_row_as_header, 0)
        | p(data.remove_rows_if_unique_value_on_bottom, 2)
        | p(data.merge_columns, 1, 2, '_', 'SAMPLE_REPLICATE')
        | p(data.remove_column_by_name, "Sample #_Replicat #")
    )
    all_tables.append(table)

from advanced_excel import AdvancedExcel
from sspipe import p
import pandas as pd

data = AdvancedExcel("studies_excel.xlsx")

columns_to_rename = {"PLATEFORM": "PLATFORM"}
columns_not_to_clean = ["PARAMETER"]

results = (
    data.get_sheet("data_raw")
    | p(data.case_headers)
    | p(data.rename_headers, columns_to_rename)
    | p(data.strip_all)
    | p(data.remove_column_by_name, r"^T\d+$")
    | p(data.remove_column_by_name, r"^T\d+\.\d+$")
    | p(data.remove_all_crlf, ";")
    | p(data.replace_nan_strings)
    | p(data.replace_spaces_by_separator, ";", [], columns_not_to_clean)
    | p(data.replace_all, ",;", ";", [], columns_not_to_clean)
    | p(data.replace_all, "tbd", "TBD", [], [])
    # | p(data.replace_by_dictionary, unit_map, "UNIT")
    | p(data.case_column_values, "PARAMETER", title=True)
    # | p(data.replace_by_dictionary, variable_map, "PARAMETER")
)

Project details

These details have not been verified by PyPI

Project links

Release history Release notifications | RSS feed

This version

2.0.0

Jun 9, 2026

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

advanced_excel-2.0.0.tar.gz (34.6 kB view details)

Uploaded Jun 9, 2026 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

advanced_excel-2.0.0-py3-none-any.whl (29.3 kB view details)

Uploaded Jun 9, 2026 Python 3

File details

Details for the file advanced_excel-2.0.0.tar.gz.

File metadata

Download URL: advanced_excel-2.0.0.tar.gz
Upload date: Jun 9, 2026
Size: 34.6 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for advanced_excel-2.0.0.tar.gz
Algorithm	Hash digest
SHA256	`c70e8ba5d1055a934daf08bc9e98161a043d9bfade0bd2b7b7692ae96be8489d`
MD5	`43f4de2002d9983c88f86acf7f3dd644`
BLAKE2b-256	`a7c42aa005607a06eedba2ce1c082e1f40717be59f495f3c867ef3a2acaf71fd`

See more details on using hashes here.

File details

Details for the file advanced_excel-2.0.0-py3-none-any.whl.

File metadata

Download URL: advanced_excel-2.0.0-py3-none-any.whl
Upload date: Jun 9, 2026
Size: 29.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.1.0 CPython/3.13.13

File hashes

Hashes for advanced_excel-2.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`2c495591eb9e9003adfa45c69bf3d45d3cbfd4b3e915d8fe87a3907b257fee3d`
MD5	`7cc58cc3873e8f9aa63f0175fc0242e0`
BLAKE2b-256	`c239e384e0d7ff31d0eddad8371f9b7729b44dce550b5364882937bf67a27fe6`

See more details on using hashes here.

advanced-excel 2.0.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

AdvancedExcel: Simplifying Excel Data Processing in Python

Getting Started

Installation

Dependencies

Usage

Basic usage

Example using sspipe for complex chaining (requires `pip install "advanced-excel[sspipe]"`):

Another example of usage on a complex Excel (multiple tables per sheet):

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

advanced-excel 2.0.0

Navigation

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Project description

AdvancedExcel: Simplifying Excel Data Processing in Python

Getting Started

Installation

Dependencies

Usage

Basic usage

Example using sspipe for complex chaining (requires pip install "advanced-excel[sspipe]"):

Another example of usage on a complex Excel (multiple tables per sheet):

Project details

Verified details

Maintainers

Unverified details

Project links

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

Example using sspipe for complex chaining (requires `pip install "advanced-excel[sspipe]"`):