A spreadsheet data loader package that prepares Excel content for processing with LLMs
🔭📊 Spreadsheet Intelligence
⚡ Quick Install
With pip:
pip install spreadsheet_intelligence
🤔 What is Spreadsheet Intelligence?
Spreadsheet Intelligence parses the underlying XML of Excel files to extract their contents and improve retrieval-augmented generation (RAG) over Excel files with LLMs.
It currently supports converting system configuration diagrams built from Excel autoshapes into structured data. Our paper reports this approach as a way to overcome the limitations of vision-language models (VLMs) in diagram interpretation.
The paper is available on arXiv.
🚀 Quick Start
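from spreadsheet_intelligence.core.excel_autoshape_loader import ExcelAutoshapeLoader

# Point the loader at an .xlsx file that contains autoshapes
loader = ExcelAutoshapeLoader(file_path="path/to/your/excel/file.xlsx")
# Run the parse -> convert flow on the workbook
loader.load()
# Export the loaded shapes and connectors as JSON
autoshape_info_json = loader.export2json()
print(autoshape_info_json)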
The output looks like this:
{
"connectors": [
{
"type": "straightConnector1",
"arrowType": "bidirectional",
"color": "#000000",
"startX": "8.47",
"startY": "8.77",
"endX": "18.30",
"endY": "8.77"
},
{
"type": "bentConnector3",
"arrowType": "unidirectional",
"color": "#000000",
"startX": "14.75",
"startY": "4.74",
"StartArrowHeadDirection": "left",
"endX": "21.59",
"endY": "6.00",
"EndArrowHeadDirection": "right"
}
...
],
"shapes": [
{
"shapeType": "round_rect",
"fillColor": "#156082",
"borderColor": "#0E2841",
"left": "1.41",
"top": "5.52",
"right": "39.13",
"bottom": "23.40",
"text": null
},
{
"shapeType": "rect",
"fillColor": "#000000",
"borderColor": "#000000",
"left": "5.17",
"top": "19.07",
"right": "9.27",
"bottom": "19.87",
"text": {
"content": "Azure Cognitive Search",
"fontColor": null,
"fontSize": null,
"alignment": null
}
},
...
]
}
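Once exported, the JSON can be post-processed before it goes into a prompt. The following is a minimal sketch, not part of the package: the `summarize` helper is hypothetical, and it assumes `export2json()` yields either a JSON string or an already-parsed dict with the keys shown in the sample output above. Note that coordinates appear as strings in the sample, so they are cast to `float` here.

```python
import json

# A trimmed copy of the sample output above, used as stand-in data.
SAMPLE = """
{
  "connectors": [
    {"type": "straightConnector1", "arrowType": "bidirectional",
     "color": "#000000", "startX": "8.47", "startY": "8.77",
     "endX": "18.30", "endY": "8.77"}
  ],
  "shapes": [
    {"shapeType": "round_rect", "fillColor": "#156082",
     "borderColor": "#0E2841", "left": "1.41", "top": "5.52",
     "right": "39.13", "bottom": "23.40", "text": null},
    {"shapeType": "rect", "fillColor": "#000000",
     "borderColor": "#000000", "left": "5.17", "top": "19.07",
     "right": "9.27", "bottom": "19.87",
     "text": {"content": "Azure Cognitive Search", "fontColor": null,
              "fontSize": null, "alignment": null}}
  ]
}
"""


def summarize(autoshape_info):
    """Hypothetical helper: collect shape labels and connector lengths."""
    # Accept either a JSON string or a dict, since the return type is assumed.
    data = json.loads(autoshape_info) if isinstance(autoshape_info, str) else autoshape_info

    # Shapes without text have "text": null, so guard before reading "content".
    labels = [
        shape["text"]["content"]
        for shape in data.get("shapes", [])
        if shape.get("text") and shape["text"].get("content")
    ]

    # Straight-line distance between each connector's start and end points.
    lengths = []
    for conn in data.get("connectors", []):
        dx = float(conn["endX"]) - float(conn["startX"])
        dy = float(conn["endY"]) - float(conn["startY"])
        lengths.append(round((dx ** 2 + dy ** 2) ** 0.5, 2))

    return {"labels": labels, "connector_lengths": lengths}


summary = summarize(SAMPLE)
print(summary)
```

A summary like this can be placed ahead of the full JSON in a prompt, or used to filter out unlabeled shapes when the diagram is large.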
🗂️ Project Structure
This package consists mainly of five subpackages: core, models, parsers, converters, and formatters.
spreadsheet_intelligence/
├── core/
│ ├── excel_autoshape_loader.py
├── models/
│ ├── converted/
│ ├── raw/
├── parsers/
├── converters/
├── formatters/
├── ...
Basic Processing Flow
The Excel file loaded as XML is processed in the following flow:
- It is parsed by parsers in a nearly raw state and stored in Raw models.
- It is converted by converters from the XML representation to a structure that is easy for humans (LLM) to understand and stored in Converted models.
- It is converted by formatters from the Converted models to JSON format data that can be directly used in LLM prompts.
In most cases, the ExcelAutoshapeLoader class in the core package wraps this whole flow and runs it for you.
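The three stages can be pictured with a small, self-contained sketch. Everything here is illustrative rather than the package's actual code: `RawShape`, `ConvertedShape`, `parse_shape`, `convert_shape`, and `format_shape` are hypothetical names, the XML fragment is heavily simplified (real Excel drawing XML is namespaced and richer), and converting offsets from EMU to centimetres is an assumption made for the example.

```python
import json
import xml.etree.ElementTree as ET
from dataclasses import asdict, dataclass

EMU_PER_CM = 360000  # English Metric Units per centimetre

# Simplified, hypothetical stand-in for one shape in drawing XML.
SAMPLE_XML = '<shape kind="rect" x="360000" y="720000" cx="1800000" cy="360000"/>'


@dataclass
class RawShape:
    """Raw model: values kept close to how they appear in the XML."""
    kind: str
    x: int
    y: int
    cx: int
    cy: int


@dataclass
class ConvertedShape:
    """Converted model: a bounding box that is easier to reason about."""
    shapeType: str
    left: float
    top: float
    right: float
    bottom: float


def parse_shape(xml_text: str) -> RawShape:
    """Parser stage: read attributes without interpreting them."""
    el = ET.fromstring(xml_text)
    return RawShape(
        kind=el.get("kind"),
        x=int(el.get("x")),
        y=int(el.get("y")),
        cx=int(el.get("cx")),
        cy=int(el.get("cy")),
    )


def convert_shape(raw: RawShape) -> ConvertedShape:
    """Converter stage: turn offset + extent into left/top/right/bottom."""
    def to_cm(emu: int) -> float:
        return round(emu / EMU_PER_CM, 2)

    return ConvertedShape(
        shapeType=raw.kind,
        left=to_cm(raw.x),
        top=to_cm(raw.y),
        right=to_cm(raw.x + raw.cx),
        bottom=to_cm(raw.y + raw.cy),
    )


def format_shape(shape: ConvertedShape) -> str:
    """Formatter stage: serialize the converted model to JSON."""
    return json.dumps(asdict(shape))


prompt_json = format_shape(convert_shape(parse_shape(SAMPLE_XML)))
print(prompt_json)
```

Keeping the raw and converted models separate means a change in how values are interpreted only touches the converter stage, while the parser stays a thin reader of the XML.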
Customizability
The package can be extended mainly in the following ways:
- Extend the data retrieved from XML -> Extend by inheriting from parsers
- Extend the data conversion methods -> Extend by inheriting from converters
- Extend the output data format -> Extend by inheriting from formatters
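As an illustration of the inheritance pattern described in the list above, here is a minimal sketch of customizing the output format. The base class below is a hypothetical stand-in written for this example; the real base classes in the formatters subpackage may have different names and method signatures, so check the source before subclassing.

```python
import json
from typing import Dict, List


class BaseFormatter:
    """Hypothetical stand-in for a formatter base class (not the package's API)."""

    def format_shape(self, shape: Dict) -> Dict:
        # Default behavior: pass every field through unchanged.
        return dict(shape)

    def format(self, shapes: List[Dict]) -> str:
        return json.dumps([self.format_shape(s) for s in shapes])


class LabelOnlyFormatter(BaseFormatter):
    """Custom formatter: keep only the shape type and its text label."""

    def format_shape(self, shape: Dict) -> Dict:
        # "text" may be null for unlabeled shapes, so fall back to an empty dict.
        text = shape.get("text") or {}
        return {"shapeType": shape.get("shapeType"), "label": text.get("content")}


shapes = [
    {"shapeType": "rect", "fillColor": "#000000",
     "text": {"content": "Azure Cognitive Search"}},
    {"shapeType": "round_rect", "fillColor": "#156082", "text": None},
]

compact = LabelOnlyFormatter().format(shapes)
print(compact)
```

Overriding a single per-shape hook keeps the subclass small. The same approach would apply to parsers and converters, assuming they expose comparable extension points.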