
A Python package that loads spreadsheet data in a form an LLM can process


🔭📊 Spreadsheet Intelligence

License: Apache 2.0

⚡ Quick Install

With pip:

pip install spreadsheet_intelligence

🤔 What is Spreadsheet Intelligence?

Spreadsheet Intelligence parses the XML inside Excel files to extract their contents, with the goal of improving LLM-based RAG over Excel documents.

It currently supports converting system configuration diagrams built from Excel autoshapes. Our paper reports it as a way to overcome the limitations of VLMs in diagram interpretation.

The paper is available on arXiv.


🚀 Quick Start

from spreadsheet_intelligence.core.excel_autoshape_loader import ExcelAutoshapeLoader

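# Point the loader at the workbook that contains the diagram.
loader = ExcelAutoshapeLoader(file_path="path/to/your/excel/file.xlsx")
# Read and process the file.
loader.load()
# Export the extracted autoshape information as JSON and print it.
autoshape_info_json = loader.export2json()
print(autoshape_info_json)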

The output looks like this:

{
    "connectors": [
        {
            "type": "straightConnector1",
            "arrowType": "bidirectional",
            "color": "#000000",
            "startX": "8.47",
            "startY": "8.77",
            "endX": "18.30",
            "endY": "8.77"
        },
        {
            "type": "bentConnector3",
            "arrowType": "unidirectional",
            "color": "#000000",
            "startX": "14.75",
            "startY": "4.74",
            "StartArrowHeadDirection": "left",
            "endX": "21.59",
            "endY": "6.00",
            "EndArrowHeadDirection": "right"
        },
        ...
    ],
    "shapes": [
        {
            "shapeType": "round_rect",
            "fillColor": "#156082",
            "borderColor": "#0E2841",
            "left": "1.41",
            "top": "5.52",
            "right": "39.13",
            "bottom": "23.40",
            "text": null
        },
        {
            "shapeType": "rect",
            "fillColor": "#000000",
            "borderColor": "#000000",
            "left": "5.17",
            "top": "19.07",
            "right": "9.27",
            "bottom": "19.87",
            "text": {
                "content": "Azure Cognitive Search",
                "fontColor": null,
                "fontSize": null,
                "alignment": null
            }
        },
        ...
    ]
}
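To show how this JSON might be consumed downstream, here is a minimal sketch that uses only the standard library. It works on a trimmed copy of the sample above (the elided "..." entries are left out). The helper names `summarize` and `shape_size` are made up for illustration and are not part of the package. Note that the coordinates are strings in the sample, so they are converted to floats before any arithmetic.

```python
import json
from collections import Counter

# Trimmed copy of the sample output; the elided "..." entries are omitted.
SAMPLE = """
{
  "connectors": [
    {"type": "straightConnector1", "arrowType": "bidirectional", "color": "#000000",
     "startX": "8.47", "startY": "8.77", "endX": "18.30", "endY": "8.77"},
    {"type": "bentConnector3", "arrowType": "unidirectional", "color": "#000000",
     "startX": "14.75", "startY": "4.74", "StartArrowHeadDirection": "left",
     "endX": "21.59", "endY": "6.00", "EndArrowHeadDirection": "right"}
  ],
  "shapes": [
    {"shapeType": "round_rect", "fillColor": "#156082", "borderColor": "#0E2841",
     "left": "1.41", "top": "5.52", "right": "39.13", "bottom": "23.40", "text": null},
    {"shapeType": "rect", "fillColor": "#000000", "borderColor": "#000000",
     "left": "5.17", "top": "19.07", "right": "9.27", "bottom": "19.87",
     "text": {"content": "Azure Cognitive Search", "fontColor": null,
              "fontSize": null, "alignment": null}}
  ]
}
"""


def summarize(diagram):
    """Collect the shape labels and count connectors by arrow type."""
    # "text" is null for unlabeled shapes, so skip those.
    labels = [s["text"]["content"] for s in diagram["shapes"] if s.get("text")]
    arrow_counts = Counter(c["arrowType"] for c in diagram["connectors"])
    return {"labels": labels, "arrow_counts": dict(arrow_counts)}


def shape_size(shape):
    """Return (width, height) of a shape, converting string coordinates to floats."""
    width = float(shape["right"]) - float(shape["left"])
    height = float(shape["bottom"]) - float(shape["top"])
    return round(width, 2), round(height, 2)


diagram = json.loads(SAMPLE)
summary = summarize(diagram)
print(summary)
print(shape_size(diagram["shapes"][1]))
```

A summary like this can be a compact alternative to pasting the full JSON into a prompt when only labels and connection types matter.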

🗂️ Project Structure

The package consists mainly of five subpackages: core, models, parsers, converters, and formatters.

spreadsheet_intelligence/
├── core/
│   ├── excel_autoshape_loader.py
├── models/
│   ├── converted/
│   ├── raw/
├── parsers/
├── converters/
├── formatters/
├── ...

Basic Processing Flow

An Excel file, loaded as XML, is processed in three stages:

  1. Parsers read the XML in a nearly raw state and store it in Raw models.
  2. Converters turn the XML representation into a structure that is easy for humans and LLMs to understand, and store it in Converted models.
  3. Formatters turn the Converted models into JSON that can be used directly in LLM prompts.

In most cases, the ExcelAutoshapeLoader class in the core package wraps this flow and runs it end to end.
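The parse, convert, and format stages can be illustrated with a small standalone sketch. This is not the package's implementation: `RawShape`, `ConvertedShape`, `parse`, `convert`, and `format_json` are hypothetical names, the XML is a heavily simplified DrawingML fragment, and treating the output coordinates as centimetres is an assumption. The only external fact relied on is that DrawingML lengths are stored in EMUs, with 360,000 EMU per centimetre.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import dataclass
from typing import List

NS = {
    "xdr": "http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing",
    "a": "http://schemas.openxmlformats.org/drawingml/2006/main",
}
EMU_PER_CM = 360_000  # DrawingML stores lengths in EMUs

# Simplified fragment for illustration; real drawing parts carry much more.
DRAWING_XML = """
<xdr:wsDr xmlns:xdr="http://schemas.openxmlformats.org/drawingml/2006/spreadsheetDrawing"
          xmlns:a="http://schemas.openxmlformats.org/drawingml/2006/main">
  <xdr:sp>
    <xdr:spPr>
      <a:xfrm><a:off x="1800000" y="3600000"/><a:ext cx="1440000" cy="720000"/></a:xfrm>
      <a:prstGeom prst="rect"/>
    </xdr:spPr>
  </xdr:sp>
</xdr:wsDr>
"""


@dataclass
class RawShape:  # hypothetical "Raw model": values exactly as found in the XML
    prst: str
    x: int
    y: int
    cx: int
    cy: int


@dataclass
class ConvertedShape:  # hypothetical "Converted model": edges in cm (assumed unit)
    shape_type: str
    left: float
    top: float
    right: float
    bottom: float


def parse(xml_text: str) -> List[RawShape]:
    """Stage 1: read offset, extent, and preset geometry with no interpretation."""
    root = ET.fromstring(xml_text.strip())
    shapes = []
    for sp in root.iter("{%s}sp" % NS["xdr"]):
        off = sp.find("xdr:spPr/a:xfrm/a:off", NS)
        ext = sp.find("xdr:spPr/a:xfrm/a:ext", NS)
        geom = sp.find("xdr:spPr/a:prstGeom", NS)
        shapes.append(RawShape(geom.get("prst"), int(off.get("x")), int(off.get("y")),
                               int(ext.get("cx")), int(ext.get("cy"))))
    return shapes


def convert(raw: RawShape) -> ConvertedShape:
    """Stage 2: turn offset + extent in EMUs into bounding-box edges in cm."""
    return ConvertedShape(raw.prst, raw.x / EMU_PER_CM, raw.y / EMU_PER_CM,
                          (raw.x + raw.cx) / EMU_PER_CM, (raw.y + raw.cy) / EMU_PER_CM)


def format_json(shapes: List[ConvertedShape]) -> str:
    """Stage 3: emit prompt-ready JSON with two-decimal string coordinates."""
    items = [{"shapeType": s.shape_type, "left": f"{s.left:.2f}", "top": f"{s.top:.2f}",
              "right": f"{s.right:.2f}", "bottom": f"{s.bottom:.2f}"} for s in shapes]
    return json.dumps({"shapes": items}, indent=4)


output = format_json([convert(r) for r in parse(DRAWING_XML)])
print(output)
```

Keeping the raw and converted representations separate means a change in unit handling or geometry logic touches only the convert stage, while the parse stage stays a faithful mirror of the XML.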

Customizability

The package can be extended mainly in the following ways:

  • To extract more data from the XML, subclass the parsers.
  • To change how the data is converted, subclass the converters.
  • To change the output data format, subclass the formatters.
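As a rough illustration of the inheritance pattern described above, the sketch below subclasses a formatter to add derived fields. `BaseJsonFormatter` is a hypothetical stand-in written for this example; the package's real base classes and method names may differ, so check the formatters subpackage before adapting it.

```python
import json


class BaseJsonFormatter:
    """Hypothetical stand-in for a formatter base class (not the package's real class)."""

    def shape_to_dict(self, shape):
        # Default per-shape output: type and label only.
        return {"shapeType": shape["shapeType"], "text": shape.get("text")}

    def format(self, shapes):
        # Serialize every shape through the overridable per-shape hook.
        return json.dumps({"shapes": [self.shape_to_dict(s) for s in shapes]})


class SizeAwareFormatter(BaseJsonFormatter):
    """Extends the output format by adding width and height to each shape."""

    def shape_to_dict(self, shape):
        out = super().shape_to_dict(shape)  # keep the base fields
        out["width"] = round(shape["right"] - shape["left"], 2)
        out["height"] = round(shape["bottom"] - shape["top"], 2)
        return out


shapes = [{"shapeType": "rect", "left": 5.17, "top": 19.07,
           "right": 9.27, "bottom": 19.87, "text": "Azure Cognitive Search"}]

formatted = json.loads(SizeAwareFormatter().format(shapes))
print(formatted)
```

Overriding a single per-item hook, rather than the whole `format` method, keeps the subclass small and leaves serialization in one place. The same idea applies to parsers and converters.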
