A package that loads spreadsheet data for processing with LLMs

🔭📊 Spreadsheet Intelligence

License: Apache 2.0

⚡ Quick Install

With pip:

pip install spreadsheet_intelligence

🤔 What is Spreadsheet Intelligence?

Spreadsheet Intelligence parses the underlying XML of Excel files to extract their contents, improving the performance of LLM-based retrieval-augmented generation (RAG) over Excel files.
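
For context on what "the XML of Excel files" means: an .xlsx workbook is a ZIP archive of Office Open XML parts, and autoshapes placed on a sheet live in DrawingML parts such as xl/drawings/drawing1.xml, as xdr:sp (shape) and xdr:cxnSp (connector) elements whose a:prstGeom/@prst attribute names the preset geometry. The stdlib-only sketch below lists those presets. It is an illustration of the starting point, not the package's own parser; list_autoshapes is a hypothetical helper.

```python
import io
import zipfile
import xml.etree.ElementTree as ET

# Namespaces used inside xl/drawings/drawingN.xml
NS = {
    "xdr": "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}


def list_autoshapes(xlsx_bytes: bytes) -> list[tuple[str, str]]:
    """Return (kind, preset) pairs for the shapes and connectors in every drawing part."""
    found = []
    with zipfile.ZipFile(io.BytesIO(xlsx_bytes)) as zf:
        for name in sorted(zf.namelist()):
            # Only DrawingML parts hold autoshapes; skip worksheets, styles, .rels files, etc.
            if not (name.startswith("xl/drawings/") and name.endswith(".xml")):
                continue
            root = ET.fromstring(zf.read(name))
            # xdr:sp is an ordinary shape, xdr:cxnSp a connector
            for tag, kind in (("xdr:sp", "shape"), ("xdr:cxnSp", "connector")):
                for el in root.iterfind(f".//{tag}", NS):
                    geom = el.find(".//a:prstGeom", NS)
                    # Freeform shapes use a:custGeom instead of a preset
                    found.append((kind, geom.get("prst") if geom is not None else "custom"))
    return found


# Demo: an in-memory archive containing only a minimal drawing part
DRAWING = f"""<xdr:wsDr xmlns:xdr="{NS['xdr']}" xmlns:a="{NS['a']}">
  <xdr:twoCellAnchor>
    <xdr:sp><xdr:spPr><a:prstGeom prst="roundRect"/></xdr:spPr></xdr:sp>
  </xdr:twoCellAnchor>
  <xdr:twoCellAnchor>
    <xdr:cxnSp><xdr:spPr><a:prstGeom prst="straightConnector1"/></xdr:spPr></xdr:cxnSp>
  </xdr:twoCellAnchor>
</xdr:wsDr>"""

buf = io.BytesIO()
with zipfile.ZipFile(buf, "w") as zf:
    zf.writestr("xl/drawings/drawing1.xml", DRAWING)
print(list_autoshapes(buf.getvalue()))
# [('shape', 'roundRect'), ('connector', 'straightConnector1')]
```

For a real workbook, pass the file's bytes instead, e.g. list_autoshapes(pathlib.Path("file.xlsx").read_bytes()). Note that the raw preset name (roundRect) differs from the name in the package's JSON output (round_rect), which is the kind of translation the converters handle.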

Currently, it supports converting system configuration diagrams built from Excel autoshapes. As reported in our paper, this approach helps overcome the limitations of vision-language models (VLMs) in interpreting such diagrams.

The paper is available on arXiv.

🚀 Quick Start

from spreadsheet_intelligence.core.excel_autoshape_loader import ExcelAutoshapeLoader

loader = ExcelAutoshapeLoader(file_path="path/to/your/excel/file.xlsx")
loader.load()
autoshape_info_json = loader.export2json()
print(autoshape_info_json)

The output looks like this (abridged; ... marks omitted entries):

{
    "connectors": [
        {
            "type": "straightConnector1",
            "arrowType": "bidirectional",
            "color": "#000000",
            "startX": "8.47",
            "startY": "8.77",
            "endX": "18.30",
            "endY": "8.77"
        },
        {
            "type": "bentConnector3",
            "arrowType": "unidirectional",
            "color": "#000000",
            "startX": "14.75",
            "startY": "4.74",
            "StartArrowHeadDirection": "left",
            "endX": "21.59",
            "endY": "6.00",
            "EndArrowHeadDirection": "right"
        }
        ...
    ],
    "shapes": [
        {
            "shapeType": "round_rect",
            "fillColor": "#156082",
            "borderColor": "#0E2841",
            "left": "1.41",
            "top": "5.52",
            "right": "39.13",
            "bottom": "23.40",
            "text": null
        },
        {
            "shapeType": "rect",
            "fillColor": "#000000",
            "borderColor": "#000000",
            "left": "5.17",
            "top": "19.07",
            "right": "9.27",
            "bottom": "19.87",
            "text": {
                "content": "Azure Cognitive Search",
                "fontColor": null,
                "fontSize": null,
                "alignment": null
            }
        },
        ...
    ]
}
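
In the sample output, coordinates are serialized as strings (e.g. "8.47"), and their unit is not stated on this page. A consumer that wants to do arithmetic on them has to convert them first. The sketch below does this with the standard json module on a trimmed copy of the sample; load_autoshapes and shape_labels are hypothetical helpers, not part of the package.

```python
import json

# The sample output above, trimmed to one connector and one shape and with the
# "..." placeholders removed so that it is valid JSON
SAMPLE = """
{
  "connectors": [
    {"type": "straightConnector1", "arrowType": "bidirectional", "color": "#000000",
     "startX": "8.47", "startY": "8.77", "endX": "18.30", "endY": "8.77"}
  ],
  "shapes": [
    {"shapeType": "rect", "fillColor": "#000000", "borderColor": "#000000",
     "left": "5.17", "top": "19.07", "right": "9.27", "bottom": "19.87",
     "text": {"content": "Azure Cognitive Search", "fontColor": null,
              "fontSize": null, "alignment": null}}
  ]
}
"""

# Fields that the sample serializes as numeric strings
COORD_KEYS = {"startX", "startY", "endX", "endY", "left", "top", "right", "bottom"}


def load_autoshapes(raw: str) -> dict:
    """Parse the exported JSON and turn coordinate strings into floats."""
    data = json.loads(raw)
    for item in data.get("connectors", []) + data.get("shapes", []):
        # keys() & set builds a new set, so updating values below is safe
        for key in item.keys() & COORD_KEYS:
            item[key] = float(item[key])
    return data


def shape_labels(data: dict) -> list[str]:
    """Text of every shape that has a text body ("text": null shapes are skipped)."""
    return [s["text"]["content"] for s in data.get("shapes", []) if s.get("text")]


data = load_autoshapes(SAMPLE)
print(shape_labels(data))  # ['Azure Cognitive Search']
```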

🗂️ Project Structure

The package consists of five main subpackages: core, models, parsers, converters, and formatters.

spreadsheet_intelligence/
├── core/
│   ├── excel_autoshape_loader.py
├── models/
│   ├── converted/
│   ├── raw/
├── parsers/
├── converters/
├── formatters/
├── ...

Basic Processing Flow

The Excel file, read as XML, is processed in three stages:

  1. Parsers read the XML and store its contents, in a nearly raw state, in Raw models.
  2. Converters transform the XML representation into a structure that is easy for humans (and LLMs) to understand, and store it in Converted models.
  3. Formatters turn the Converted models into JSON that can be used directly in LLM prompts.
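
The three stages can be sketched end to end for a single shape. Everything in this sketch is hypothetical (RawShape, ConvertedShape, parse, convert, format_json and the renaming table), and two points are assumptions rather than documented behaviour: that the coordinates come from the a:xfrm offset and extent (DrawingML lengths are in EMU, 914,400 per inch), and that the output unit is centimetres.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass

A = "{http://schemas.openxmlformats.org/drawingml/2006/main}"  # DrawingML main namespace
EMU_PER_CM = 360_000  # 914,400 EMU per inch / 2.54 cm per inch (cm output is an assumption)
PRESET_NAMES = {"roundRect": "round_rect"}  # hypothetical renaming; other presets pass through


@dataclass
class RawShape:
    """Stage 1 model: values copied from the XML almost verbatim."""
    prst: str    # a:prstGeom/@prst
    off_x: int   # a:xfrm/a:off/@x, in EMU
    off_y: int   # a:xfrm/a:off/@y, in EMU
    ext_cx: int  # a:xfrm/a:ext/@cx (width), in EMU
    ext_cy: int  # a:xfrm/a:ext/@cy (height), in EMU
    fill: str    # a:solidFill/a:srgbClr/@val, hex without '#'


@dataclass
class ConvertedShape:
    """Stage 2 model: readable names, colours and units."""
    shapeType: str
    fillColor: str
    left: float
    top: float
    right: float
    bottom: float


def parse(sp_pr: ET.Element) -> RawShape:
    """Parser: read an spPr element with no interpretation beyond int()."""
    off = sp_pr.find(f"{A}xfrm/{A}off")
    ext = sp_pr.find(f"{A}xfrm/{A}ext")
    return RawShape(
        prst=sp_pr.find(f"{A}prstGeom").get("prst"),
        off_x=int(off.get("x")),
        off_y=int(off.get("y")),
        ext_cx=int(ext.get("cx")),
        ext_cy=int(ext.get("cy")),
        fill=sp_pr.find(f"{A}solidFill/{A}srgbClr").get("val"),
    )


def convert(raw: RawShape) -> ConvertedShape:
    """Converter: EMU offset/extent -> bounding box in cm, hex -> '#RRGGBB'."""
    def cm(emu: int) -> float:
        return emu / EMU_PER_CM

    return ConvertedShape(
        shapeType=PRESET_NAMES.get(raw.prst, raw.prst),
        fillColor="#" + raw.fill.upper(),
        left=cm(raw.off_x),
        top=cm(raw.off_y),
        right=cm(raw.off_x + raw.ext_cx),
        bottom=cm(raw.off_y + raw.ext_cy),
    )


def format_json(shapes: list[ConvertedShape]) -> str:
    """Formatter: prompt-ready JSON with two-decimal coordinate strings."""
    rows = []
    for shape in shapes:
        row = asdict(shape)
        for key in ("left", "top", "right", "bottom"):
            row[key] = f"{row[key]:.2f}"
        rows.append(row)
    return json.dumps({"shapes": rows}, indent=4)


SP_PR = f"""<spPr xmlns:a="{A[1:-1]}">
  <a:xfrm><a:off x="720000" y="360000"/><a:ext cx="1800000" cy="540000"/></a:xfrm>
  <a:prstGeom prst="roundRect"/>
  <a:solidFill><a:srgbClr val="156082"/></a:solidFill>
</spPr>"""

print(format_json([convert(parse(ET.fromstring(SP_PR)))]))
```

Keeping the raw stage free of interpretation means a change of unit or naming only touches the converter, which is the separation the three-stage flow is meant to provide.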

In most cases you do not call these stages yourself: the ExcelAutoshapeLoader class in the core package wraps the whole flow.
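
Once the loader has run the whole flow, its JSON export is meant to be placed in an LLM prompt. A minimal sketch, assuming export2json() returns a JSON string (as the Quick Start's print call suggests); build_prompt and its wording are hypothetical, not part of the package.

```python
import json


def build_prompt(autoshape_json: str, question: str) -> str:
    """Wrap an autoshape JSON export and a question into one prompt string.

    Hypothetical helper: the wording is only an example, not part of the package.
    """
    json.loads(autoshape_json)  # fail fast on a malformed export, before any LLM call
    return (
        "The JSON below describes a system configuration diagram drawn with Excel autoshapes.\n"
        "Shapes carry bounding boxes and text; connectors carry start/end points and arrow types.\n"
        "<diagram>\n"
        f"{autoshape_json}\n"
        "</diagram>\n"
        f"Question: {question}"
    )
```

With the Quick Start's loader, usage would look like prompt = build_prompt(loader.export2json(), "Which components connect to Azure Cognitive Search?"), after which the prompt can be sent to any LLM client.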

Customizability

The package is designed to be extended in three main ways:

  • To extract additional data from the XML, subclass a parser.
  • To change how data is converted, subclass a converter.
  • To change the output format, subclass a formatter.
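
The formatter case can be sketched as follows. The package's actual base classes and method signatures are not shown on this page, so BaseFormatter and its format(data) method are hypothetical stand-ins; the point is only the pattern of overriding one method to change the output format, here from JSON to a Markdown table.

```python
import json
from abc import ABC, abstractmethod


class BaseFormatter(ABC):
    """Hypothetical stand-in for the package's formatter base class."""

    @abstractmethod
    def format(self, data: dict) -> str:
        """Turn converted autoshape data into prompt-ready text."""


class JsonFormatter(BaseFormatter):
    """Default-style output: a single JSON document."""

    def format(self, data: dict) -> str:
        return json.dumps(data, indent=4)


class MarkdownFormatter(BaseFormatter):
    """Example extension: a Markdown table of shapes plus a list of connectors."""

    def format(self, data: dict) -> str:
        lines = [
            "| shapeType | text | left | top | right | bottom |",
            "|---|---|---|---|---|---|",
        ]
        for s in data.get("shapes", []):
            # "text" may be null in the JSON (None here); fall back to an empty cell
            text = (s.get("text") or {}).get("content") or ""
            lines.append(
                f"| {s['shapeType']} | {text} | {s['left']} | {s['top']} "
                f"| {s['right']} | {s['bottom']} |"
            )
        lines.append("")
        for c in data.get("connectors", []):
            lines.append(
                f"- {c['type']} ({c['arrowType']}): "
                f"({c['startX']}, {c['startY']}) -> ({c['endX']}, {c['endY']})"
            )
        return "\n".join(lines)
```

Because both subclasses share one interface, the caller can swap MarkdownFormatter().format(data) for JsonFormatter().format(data) without touching the parsing or conversion stages.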


Download files

Download the file for your platform.

Source Distribution

gg_research_community_playbook-0.1.0.tar.gz (34.3 kB)

Uploaded Source

Built Distribution

gg_research_community_playbook-0.1.0-py3-none-any.whl (47.3 kB)

Uploaded Python 3

File details

Details for the file gg_research_community_playbook-0.1.0.tar.gz.

File hashes

Hashes for gg_research_community_playbook-0.1.0.tar.gz
Algorithm     Hash digest
SHA256        f9858e471a3f743603195941d4121bacf4771668ce7f245ad75052bf1638c7d8
MD5           b710a0d977e3eba6ba253eaa06983db7
BLAKE2b-256   36f500743c2767c2f1a9231ab67295bd8f56a5b2139fb792133491372b25cba2

Provenance

The following attestation bundles were made for gg_research_community_playbook-0.1.0.tar.gz:

Publisher: publish-pypi.yml on galirage/gg-research-community-playbook

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.

File details

Details for the file gg_research_community_playbook-0.1.0-py3-none-any.whl.

File hashes

Hashes for gg_research_community_playbook-0.1.0-py3-none-any.whl
Algorithm     Hash digest
SHA256        0fb8affbe020293d362e79521ee439f810ddaddb7cdd3fb583063bd2586a9a8e
MD5           d4fe60a39daf8c16c724de44ea11d784
BLAKE2b-256   4effeeff68db2760c8bbed92b0aba44f4d5c311470784f2057227f0afb3f86fc

Provenance

The following attestation bundles were made for gg_research_community_playbook-0.1.0-py3-none-any.whl:

Publisher: publish-pypi.yml on galirage/gg-research-community-playbook

Attestations: Values shown here reflect the state when the release was signed and may no longer be current.
