Convert XLSX files to CSV with visual formatting preserved as inline flags
Project description
Flagged CSV
Convert XLSX files to CSV while preserving visual formatting information as inline flags.
Traditional XLSX → CSV conversion will cause the loss of format such as background cell colors or cell merge:
The new Flagged CSV will attach important flags to the original cell value, keeping the colors/cell-merge with {} flags
Video Tutorial
Why Flagged CSV is Critical for AI
Traditional AI models cannot process XLSX files directly. When users convert XLSX to CSV for AI processing, critical visual information is lost - colors that indicate categories, merged cells that show date ranges, and formatting that conveys meaning. This forces users to manually add context that was already present visually.
Real-World Example: AI's Limitation with Standard CSV
Consider this financial data with colored categories and merged cells for date ranges:
Original Excel file has:
- Different colors for each expense category
- Merged cells showing the time span for each expense
- My color1 and My color4 share the same green color (#84E291)
After standard CSV conversion, an AI assistant fails to answer basic questions:
User: "Which two items have the same color?"
AI: "The provided CSV file does not contain any information about the colors of the cells..."
User: "What is the time range for each color's spend?"
AI: "I cannot determine time ranges from the CSV data..."
The Flagged CSV Solution
With Flagged CSV format, the same AI can now answer correctly:
Name,Color,Value,JUL{#0E2841},AUG{#0E2841},SEP{#0E2841},OCT{#0E2841},NOV{#0E2841},DEC{#0E2841}
My color1,{#84E291},30,$500{#84E291}{MG:881261},{MG:881261},,,,
My color2,{#E49EDD},32,$600{#E49EDD}{MG:316508},{MG:316508},{MG:316508},,,
My color3,{#F6C6AC},34,$700{#F6C6AC}{MG:353471},{MG:353471},{MG:353471},{MG:353471},,
My color4,{#84E291},36,$800{#84E291}{MG:860393},{MG:860393},{MG:860393},{MG:860393},{MG:860393},{MG:860393}
Now the AI can answer:
Q: Which two items have the same color?
A: My color1 and My color4 have the same color (#84E291).
Q: What is the time range and spend for each item?
A:
- My color1: Time range JUL-AUG with a spend of $500
- My color2: Time range JUL-SEP with a spend of $600
- My color3: Time range JUL-OCT with a spend of $700
- My color4: Time range JUL-DEC with a spend of $800
This enables AI to understand:
- Color patterns - Which items belong to the same category
- Time ranges - Merged cells indicate duration
- Relationships - Visual cues that humans naturally understand
Overview
Flagged CSV is a Python library and command-line tool that converts Excel (XLSX) files to CSV format while preserving important visual information that would normally be lost in conversion:
- Cell background colors - Preserved as
{#RRGGBB}or{bc:#RRGGBB}flags - Cell foreground colors - Preserved as
{fc:#RRGGBB}flags - Merged cells - Marked with
{MG:XXXXXX}flags where XXXXXX is a unique identifier - Cell formatting - Currency symbols, number formats, dates preserved as displayed in Excel
- Cell locations - Original Excel coordinates preserved as
{l:CellRef}flags
Installation
Via uv (Recommended)
# Clone the repository
git clone https://github.com/yourusername/flagged-csv.git
cd flagged-csv
# Sync dependencies with uv
uv sync
# Install in development mode
uv pip install -e .
Via pip
pip install flagged-csv
Quick Start
Command Line Usage
# Basic conversion
flagged-csv input.xlsx -t Sheet1 > output.csv
# Include all colors (foreground and background) with merge information
flagged-csv input.xlsx -t Sheet1 --include-colors --signal-merge -o output.csv
# Include background colors only, preserve formatting, ignore white backgrounds
flagged-csv input.xlsx -t Sheet1 --preserve-formats --include-bg-colors --ignore-colors "#FFFFFF"
# Include foreground colors only (font colors)
flagged-csv input.xlsx -t Sheet1 --include-fg-colors -o output.csv
# Include cell locations and keep empty rows for structure preservation
flagged-csv input.xlsx -t Sheet1 --add-location --keep-empty-lines -o output.csv
# Process with size limits
flagged-csv input.xlsx -t Sheet1 --max-rows 1000 --max-columns 200 -o output.csv
Python Library Usage
# Run Python scripts with uv
uv run python your_script.py
# your_script.py
from flagged_csv import XlsxConverter
# Create converter instance
converter = XlsxConverter()
# Convert with all formatting options
csv_content = converter.convert_to_csv(
'data.xlsx',
tab_name='Sheet1',
include_colors=True,
signal_merge=True,
preserve_formats=True,
ignore_colors='#FFFFFF'
)
# Save to file
with open('output.csv', 'w') as f:
f.write(csv_content)
Flag Format Specification
Color Flags
Background Color
- Format:
{#RRGGBB}(backward-compatible) or{bc:#RRGGBB}(explicit) - Example:
Sales{#FF0000}orSales{bc:#FF0000}- "Sales" with red background
Foreground Color
- Format:
{fc:#RRGGBB} - Example:
Text{fc:#0000FF}- "Text" with blue font color
Combined Colors
- Multiple flags can be combined:
100{#FFFF00}{fc:#FF0000}{MG:123456}{l:B5} - This represents: Yellow background, red text, part of merge group 123456, from cell B5
Merge Flags
- Format:
{MG:XXXXXX}where XXXXXX is a 6-digit identifier - All cells in a merged range share the same ID
- The first cell contains the actual value
- Subsequent cells contain only the merge flag
Location Flags
- Format:
{l:CellRef}where CellRef is the Excel cell coordinate - Example:
Value{l:A5}- "Value" from cell A5 - Useful for preserving cell position information
Example Output
Given an Excel file with:
- Cell A1: "Total Sales" with blue background (#0000FF) and white text (#FFFFFF)
- Cells B1-D1: Merged cell containing "$1,000" with green background (#00FF00)
- Cell A2: "Profit" with red text (#FF0000)
The CSV output with --include-colors would be:
Total Sales{#0000FF}{fc:#FFFFFF},$1000{#00FF00}{MG:384756},{MG:384756},{MG:384756}
Profit{fc:#FF0000},,
Configuration Options
CLI Options
-t, --tab-name: Sheet name to convert (required)-o, --output: Output file path (default: stdout)--format: Output format: csv, html, or markdown (default: csv)--include-colors: Include both foreground and background colors--include-bg-colors: Include background colors only as {#RRGGBB} flags--include-fg-colors: Include foreground colors only as {fc:#RRGGBB} flags--signal-merge: Include merged cell information as {MG:XXXXXX} flags--preserve-formats: Preserve number/date formatting as displayed in Excel--ignore-colors: Comma-separated hex colors to ignore (e.g., "#FFFFFF,#000000")--add-location: Add cell coordinates {l:A5} to non-empty cells--keep-empty-lines: Preserve empty rows to maintain original row positions--max-rows: Maximum number of rows to process (default: 300)--max-columns: Maximum number of columns to process (default: 100)--no-header: Exclude DataFrame column headers (A, B, C...) from output--keep-na: Keep NA values instead of converting to empty strings
Python API Options
from flagged_csv import XlsxConverter, XlsxConverterConfig
# Create converter with custom configuration
config = XlsxConverterConfig(
keep_default_na=False, # Convert NA to empty strings
index=False, # Don't include row index
header=False, # Don't include DataFrame column headers (default)
keep_empty_lines=False, # Remove empty rows (default)
add_location=False # Don't add cell coordinates (default)
)
converter = XlsxConverter(config)
# Convert with additional options
csv_content = converter.convert_to_csv(
'data.xlsx',
tab_name='Sheet1',
include_colors=True, # Include both fg and bg colors
# OR use specific color options:
# include_bg_colors=True, # Background colors only
# include_fg_colors=True, # Foreground colors only
signal_merge=True,
preserve_formats=True,
add_location=True, # Add {l:A5} cell coordinates
keep_empty_lines=True, # Keep empty rows
max_rows=500, # Process up to 500 rows
max_columns=50 # Process up to 50 columns
)
Advanced Usage
Processing Multiple Sheets
Save this as process_sheets.py:
from flagged_csv import XlsxConverter
import pandas as pd
converter = XlsxConverter()
# Process all sheets in a workbook
xl_file = pd.ExcelFile('multi_sheet.xlsx')
for sheet_name in xl_file.sheet_names:
csv_content = converter.convert_to_csv(
'multi_sheet.xlsx',
tab_name=sheet_name,
include_colors=True,
signal_merge=True
)
with open(f'{sheet_name}.csv', 'w') as f:
f.write(csv_content)
print(f'Converted {sheet_name} -> {sheet_name}.csv')
Run with:
uv run python process_sheets.py
Parsing Flagged CSV
Save this as parse_flagged.py:
import re
import pandas as pd
def parse_flagged_csv(file_path):
"""Parse a flagged CSV file and extract values and formatting."""
df = pd.read_csv(file_path, header=None)
# Regular expressions for parsing flags
color_pattern = r'{#([0-9A-Fa-f]{6})}'
merge_pattern = r'{MG:(\d{6})}'
location_pattern = r'{l:([A-Z]+\d+)}'
# Extract clean values and formatting info
for row_idx in range(len(df)):
for col_idx in range(len(df.columns)):
cell = str(df.iloc[row_idx, col_idx])
# Extract color
color_match = re.search(color_pattern, cell)
if color_match:
color = color_match.group(1)
print(f"Cell ({row_idx},{col_idx}) has color #{color}")
# Extract merge ID
merge_match = re.search(merge_pattern, cell)
if merge_match:
merge_id = merge_match.group(1)
print(f"Cell ({row_idx},{col_idx}) is part of merge group {merge_id}")
# Extract location
location_match = re.search(location_pattern, cell)
if location_match:
location = location_match.group(1)
print(f"Cell ({row_idx},{col_idx}) originally from {location}")
# Get clean value (remove all flags)
clean_value = re.sub(r'{[^}]+}', '', cell)
df.iloc[row_idx, col_idx] = clean_value
return df
# Example usage
if __name__ == "__main__":
df = parse_flagged_csv('output.csv')
print("\nCleaned data:")
print(df)
Run with:
uv run python parse_flagged.py
Working with Merged Cells
def reconstruct_merged_cells(df):
"""Reconstruct merged cell ranges from flagged CSV."""
merge_groups = {}
for row_idx in range(len(df)):
for col_idx in range(len(df.columns)):
cell = str(df.iloc[row_idx, col_idx])
# Find merge ID
match = re.search(r'{MG:(\d{6})}', cell)
if match:
merge_id = match.group(1)
if merge_id not in merge_groups:
merge_groups[merge_id] = []
merge_groups[merge_id].append((row_idx, col_idx))
# merge_groups now contains all cells belonging to each merge
for merge_id, cells in merge_groups.items():
print(f"Merge {merge_id}: {cells}")
Output Formats
CSV (Default)
Standard CSV format with flags appended to cell values.
HTML
html_output = converter.convert_to_csv(
'data.xlsx',
tab_name='Sheet1',
output_format='html',
include_colors=True
)
Markdown
markdown_output = converter.convert_to_csv(
'data.xlsx',
tab_name='Sheet1',
output_format='markdown',
include_colors=True
)
Error Handling
The library handles various error cases gracefully:
try:
csv_content = converter.convert_to_csv('data.xlsx', tab_name='InvalidSheet')
except ValueError as e:
print(f"Sheet not found: {e}")
except FileNotFoundError as e:
print(f"File not found: {e}")
Performance Considerations
- The library uses multiple fallback engines (calamine, openpyxl, xlrd) for maximum compatibility
- Large files are processed efficiently with streaming where possible
- Color extraction uses caching to avoid repeated theme color lookups
- Default limits of 300 rows and 100 columns can be adjusted using
--max-rowsand--max-columns - Empty rows are automatically removed by default (use
--keep-empty-linesto preserve them) - Trailing empty content is always trimmed to reduce file size
Testing
Run the test suite using uv:
# Run all tests
uv run pytest tests/test_converter.py
# Run tests with verbose output
uv run pytest tests/test_converter.py -v
# Run a specific test
uv run pytest tests/test_converter.py::TestXlsxConverter::test_color_extraction -v
Development
# Set up development environment
uv sync
# Run the example script
uv run python example.py
# Run the CLI tool in development
uv run flagged-csv --help
Requirements
- Python 3.11+
- pandas >= 2.0.0, < 2.2.0
- numpy < 2.0 (for compatibility)
- pydantic >= 2.0.0 (for configuration models)
- openpyxl >= 3.1.0
- python-calamine >= 0.2.0 (for robust Excel reading)
- xlrd >= 2.0.0 (for older Excel format support)
- click >= 8.0.0 (for CLI)
Contributing
Contributions are welcome! Please feel free to submit a Pull Request.
- Fork the repository
- Create your feature branch (
git checkout -b feature/amazing-feature) - Run tests to ensure everything works (
uv run pytest tests/) - Commit your changes (
git commit -m 'Add amazing feature') - Push to the branch (
git push origin feature/amazing-feature) - Open a Pull Request
License
This project is licensed under the MIT License - see the LICENSE file for details.
AI Integration Guide
When using Flagged CSV with AI systems (ChatGPT, Claude, etc.), you have two options:
Option 1: Use the provided prompt file
Simply include the contents of flagged-csv.prompt.md in your conversation with the AI. This file contains the complete specification for the Flagged CSV format.
Option 2: Use this condensed prompt
The CSV data uses Flagged CSV format where:
- {#RRGGBB} indicates cell background color (e.g., {#FF0000} is red)
- {MG:XXXXXX} indicates merged cells (same ID = same merged group)
- {l:CellRef} indicates the original Excel cell location (e.g., {l:A5} is cell A5)
- The value in the merged cell should be applied to the entire range as a whole.
- Multiple flags can appear together: value{#color}{MG:id}{l:cell}
This simple context enables AI to answer questions about:
- Color-based categorization
- Time ranges from merged cells
- Visual patterns and relationships
- Conditional formatting meanings
Acknowledgments
This library is inspired by the need to preserve Excel's visual information during data processing pipelines, particularly for financial and business reporting applications where cell colors and merged cells convey important meaning. It bridges the gap between human-readable Excel files and AI-processable data formats.
The code for this project was primarily written using Claude Code, demonstrating how AI can help create tools that make data more accessible to AI systems.
Project details
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Built Distribution
Filter files by name, interpreter, ABI, and platform.
If you're not sure about the file name format, learn more about wheel file names.
Copy a direct link to the current filters
File details
Details for the file flagged_csv-0.1.4.tar.gz.
File metadata
- Download URL: flagged_csv-0.1.4.tar.gz
- Upload date:
- Size: 25.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
1c4e162eb449f7d4376b4204a01530e4e40a0c77720125ce7ee083c3740ee80f
|
|
| MD5 |
b9f0eef573fc104f57b6158e03d6651d
|
|
| BLAKE2b-256 |
f170bf21f5e49218e91f8434a5419a79cc612b677f8ae1613668e5e26b5005ec
|
File details
Details for the file flagged_csv-0.1.4-py3-none-any.whl.
File metadata
- Download URL: flagged_csv-0.1.4-py3-none-any.whl
- Upload date:
- Size: 19.2 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/6.1.0 CPython/3.11.5
File hashes
| Algorithm | Hash digest | |
|---|---|---|
| SHA256 |
9e8570dcce7f903e290d2f2a700d8b3a6775f8f024e3b6fa36a0cb1983916c01
|
|
| MD5 |
b532d8015a7c6590c676c096a27bc95d
|
|
| BLAKE2b-256 |
7e7010f0507aefbb60ee01d84947c157a2eb712256d7fb673865d3aafeb57a03
|