
A package that loads spreadsheet data for processing with LLMs


🔭📊 Spreadsheet Intelligence

License: Apache 2.0

⚡ Quick Install

With pip:

pip install spreadsheet_intelligence

🤔 What is Spreadsheet Intelligence?

Spreadsheet Intelligence parses the XML inside Excel files to extract various kinds of data, with the aim of improving LLM-based RAG performance on Excel files.

Currently, it supports converting system configuration diagrams built from autoshapes in Excel. Our paper reports it as a way to overcome the limitations of VLMs in diagram interpretation.

The paper is available on arXiv.


🚀 Quick Start

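from spreadsheet_intelligence.core.excel_autoshape_loader import ExcelAutoshapeLoader

# Point the loader at the workbook that contains the autoshape diagram
loader = ExcelAutoshapeLoader(file_path="path/to/your/excel/file.xlsx")
# Run the loading flow on the file
loader.load()
# Export the autoshape information as JSON
autoshape_info_json = loader.export2json()
print(autoshape_info_json)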

The output looks like this:

{
    "connectors": [
        {
            "type": "straightConnector1",
            "arrowType": "bidirectional",
            "color": "#000000",
            "startX": "8.47",
            "startY": "8.77",
            "endX": "18.30",
            "endY": "8.77"
        },
        {
            "type": "bentConnector3",
            "arrowType": "unidirectional",
            "color": "#000000",
            "startX": "14.75",
            "startY": "4.74",
            "StartArrowHeadDirection": "left",
            "endX": "21.59",
            "endY": "6.00",
            "EndArrowHeadDirection": "right"
        }
        ...
    ],
    "shapes": [
        {
            "shapeType": "round_rect",
            "fillColor": "#156082",
            "borderColor": "#0E2841",
            "left": "1.41",
            "top": "5.52",
            "right": "39.13",
            "bottom": "23.40",
            "text": null
        },
        {
            "shapeType": "rect",
            "fillColor": "#000000",
            "borderColor": "#000000",
            "left": "5.17",
            "top": "19.07",
            "right": "9.27",
            "bottom": "19.87",
            "text": {
                "content": "Azure Cognitive Search",
                "fontColor": null,
                "fontSize": null,
                "alignment": null
            }
        },
        ...
    ]
}
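To work with this output programmatically, a minimal sketch follows. It assumes that `export2json()` returns a JSON string with the keys shown above (if it returns a dict instead, skip `json.loads`). The sample literal and the helper name `labelled_shapes` are illustrative and not part of the package.

```python
import json

# Trimmed sample shaped like the output above (illustrative data, not real loader output).
sample = """
{
    "connectors": [
        {"type": "straightConnector1", "arrowType": "bidirectional",
         "color": "#000000", "startX": "8.47", "startY": "8.77",
         "endX": "18.30", "endY": "8.77"}
    ],
    "shapes": [
        {"shapeType": "round_rect", "fillColor": "#156082", "borderColor": "#0E2841",
         "left": "1.41", "top": "5.52", "right": "39.13", "bottom": "23.40",
         "text": null},
        {"shapeType": "rect", "fillColor": "#000000", "borderColor": "#000000",
         "left": "5.17", "top": "19.07", "right": "9.27", "bottom": "19.87",
         "text": {"content": "Azure Cognitive Search", "fontColor": null,
                  "fontSize": null, "alignment": null}}
    ]
}
"""


def labelled_shapes(info):
    """Return (label, width, height) for every shape that carries text."""
    result = []
    for shape in info["shapes"]:
        text = shape.get("text")
        if not text or not text.get("content"):
            continue  # skip unlabelled shapes such as background containers
        # Coordinates are serialized as strings, so convert before doing arithmetic.
        width = float(shape["right"]) - float(shape["left"])
        height = float(shape["bottom"]) - float(shape["top"])
        result.append((text["content"], round(width, 2), round(height, 2)))
    return result


info = json.loads(sample)
labels = labelled_shapes(info)
print(labels)
```

Note that the coordinates arrive as strings and that `text` can be `null`, so both cases need handling before any geometry is computed.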

🗂️ Project Structure

This package consists mainly of five subpackages: core, models, parsers, converters, and formatters.

spreadsheet_intelligence/
├── core/
│   ├── excel_autoshape_loader.py
├── models/
│   ├── converted/
│   ├── raw/
├── parsers/
├── converters/
├── formatters/
├── ...

Basic Processing Flow

An Excel file, loaded as XML, is processed in three stages:

  1. The parsers read the XML in a nearly raw state and store it in Raw models.
  2. The converters transform that XML representation into a structure that is easy for humans (and LLMs) to understand and store it in Converted models.
  3. The formatters turn the Converted models into JSON data that can be used directly in LLM prompts.

The ExcelAutoshapeLoader class in the core package wraps this entire flow, so in most cases calling it is all you need.
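The three stages can be sketched roughly as follows. This is not the package's actual code: the class names `RawShape` and `ConvertedShape`, the functions `parse`, `convert`, and `format_json`, and the simplified XML are all hypothetical stand-ins. The conversion to centimetres is also an assumption. DrawingML stores lengths in EMU (360,000 EMU per centimetre), but the README does not state which unit the output uses.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass, asdict

EMU_PER_CM = 360000  # DrawingML lengths are in EMU; 360,000 EMU = 1 cm


@dataclass
class RawShape:
    """Hypothetical Raw model: values kept as they appear in the XML."""
    prst: str
    x: int
    y: int
    cx: int
    cy: int


@dataclass
class ConvertedShape:
    """Hypothetical Converted model: bounding box in human-friendly units."""
    shapeType: str
    left: float
    top: float
    right: float
    bottom: float


def parse(xml_text):
    """Stage 1: read attributes almost verbatim into Raw models."""
    root = ET.fromstring(xml_text)
    return [
        RawShape(el.get("prst"), int(el.get("x")), int(el.get("y")),
                 int(el.get("cx")), int(el.get("cy")))
        for el in root.iter("shape")
    ]


def convert(raw_shapes):
    """Stage 2: turn offset/extent in EMU into a left/top/right/bottom box."""
    return [
        ConvertedShape(r.prst, r.x / EMU_PER_CM, r.y / EMU_PER_CM,
                       (r.x + r.cx) / EMU_PER_CM, (r.y + r.cy) / EMU_PER_CM)
        for r in raw_shapes
    ]


def format_json(converted_shapes):
    """Stage 3: serialize the Converted models as prompt-ready JSON."""
    return json.dumps({"shapes": [asdict(c) for c in converted_shapes]}, indent=4)


# Simplified, made-up XML; real drawing XML is namespaced and more deeply nested.
xml_text = '<drawing><shape prst="rect" x="360000" y="720000" cx="1080000" cy="360000"/></drawing>'
output = format_json(convert(parse(xml_text)))
print(output)
```

Keeping the Raw and Converted models separate means a change in how the XML is read does not force a change in how the result is presented, and vice versa.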

Customizability

The package can be extended mainly in the following ways:

  • To retrieve more data from the XML, inherit from the parsers.
  • To change how the data is converted, inherit from the converters.
  • To change the output data format, inherit from the formatters.
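As an illustration of the inheritance pattern, here is a minimal sketch of a custom formatter. `BaseFormatter` is a hypothetical stand-in defined inside the example, not a class taken from the package; check the formatters subpackage for the real base classes and method names before adapting this.

```python
import json


class BaseFormatter:
    """Hypothetical stand-in for a base class in the formatters subpackage."""

    def format_shape(self, shape):
        # Default behaviour: emit the shape's fields unchanged.
        return dict(shape)

    def format(self, shapes):
        return json.dumps({"shapes": [self.format_shape(s) for s in shapes]})


class SizeFormatter(BaseFormatter):
    """Adds width and height to each shape and leaves the rest of the output as is."""

    def format_shape(self, shape):
        out = super().format_shape(shape)
        # Coordinates are strings in the documented output, so convert them first.
        out["width"] = round(float(shape["right"]) - float(shape["left"]), 2)
        out["height"] = round(float(shape["bottom"]) - float(shape["top"]), 2)
        return out


shapes = [{"shapeType": "rect", "left": "5.17", "top": "19.07",
           "right": "9.27", "bottom": "19.87"}]
formatted = SizeFormatter().format(shapes)
print(formatted)
```

Overriding a single per-shape hook keeps the subclass small and lets the base class continue to own the overall output layout.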
