Skip to main content

Convert XLSX files to CSV with visual formatting preserved as inline flags

Project description

Flagged CSV

Convert XLSX files to CSV while preserving visual formatting information as inline flags.

Traditional XLSX → CSV conversion will cause the loss of format such as background cell colors or cell merge: ShowExcelConversionOriginalCsv_ManimCE_v0 19 0

The new Flagged CSV will attach important flags to the original cell value, keeping the colors/cell-merge with {} flags ShowExcelConversionFlaggedCsv_ManimCE_v0 19 0

Video Tutorial

Flagged CSV Component Demo

Why Flagged CSV is Critical for AI

Traditional AI models cannot process XLSX files directly. When users convert XLSX to CSV for AI processing, critical visual information is lost - colors that indicate categories, merged cells that show date ranges, and formatting that conveys meaning. This forces users to manually add context that was already present visually.

Real-World Example: AI's Limitation with Standard CSV

Consider this financial data with colored categories and merged cells for date ranges:

Original Excel file has:

  • Different colors for each expense category
  • Merged cells showing the time span for each expense
  • My color1 and My color4 share the same green color (#84E291)

After standard CSV conversion, an AI assistant fails to answer basic questions:

User: "Which two items have the same color?"

AI: "The provided CSV file does not contain any information about the colors of the cells..."

User: "What is the time range for each color's spend?"

AI: "I cannot determine time ranges from the CSV data..."

The Flagged CSV Solution

With Flagged CSV format, the same AI can now answer correctly:

Name,Color,Value,JUL{#0E2841},AUG{#0E2841},SEP{#0E2841},OCT{#0E2841},NOV{#0E2841},DEC{#0E2841}
My color1,{#84E291},30,$500{#84E291}{MG:881261},{MG:881261},,,,
My color2,{#E49EDD},32,$600{#E49EDD}{MG:316508},{MG:316508},{MG:316508},,,
My color3,{#F6C6AC},34,$700{#F6C6AC}{MG:353471},{MG:353471},{MG:353471},{MG:353471},,
My color4,{#84E291},36,$800{#84E291}{MG:860393},{MG:860393},{MG:860393},{MG:860393},{MG:860393},{MG:860393}

Now the AI can answer:

Q: Which two items have the same color?

A: My color1 and My color4 have the same color (#84E291).

Q: What is the time range and spend for each item?

A:

  • My color1: Time range JUL-AUG with a spend of $500
  • My color2: Time range JUL-SEP with a spend of $600
  • My color3: Time range JUL-OCT with a spend of $700
  • My color4: Time range JUL-DEC with a spend of $800

This enables AI to understand:

  • Color patterns - Which items belong to the same category
  • Time ranges - Merged cells indicate duration
  • Relationships - Visual cues that humans naturally understand

Overview

Flagged CSV is a Python library and command-line tool that converts Excel (XLSX) files to CSV format while preserving important visual information that would normally be lost in conversion:

  • Cell background colors - Preserved as {#RRGGBB} or {bc:#RRGGBB} flags
  • Cell foreground colors - Preserved as {fc:#RRGGBB} flags
  • Merged cells - Marked with {MG:XXXXXX} flags where XXXXXX is a unique identifier
  • Cell formatting - Currency symbols, number formats, dates preserved as displayed in Excel
  • Cell locations - Original Excel coordinates preserved as {l:CellRef} flags

Installation

Via uv (Recommended)

# Clone the repository
git clone https://github.com/yourusername/flagged-csv.git
cd flagged-csv

# Sync dependencies with uv
uv sync

# Install in development mode
uv pip install -e .

Via pip

pip install flagged-csv

Quick Start

Command Line Usage

# Basic conversion
flagged-csv input.xlsx -t Sheet1 > output.csv

# Include all colors (foreground and background) with merge information
flagged-csv input.xlsx -t Sheet1 --include-colors --signal-merge -o output.csv

# Include background colors only, preserve formatting, ignore white backgrounds
flagged-csv input.xlsx -t Sheet1 --preserve-formats --include-bg-colors --ignore-colors "#FFFFFF"

# Include foreground colors only (font colors)
flagged-csv input.xlsx -t Sheet1 --include-fg-colors -o output.csv

# Include cell locations and keep empty rows for structure preservation
flagged-csv input.xlsx -t Sheet1 --add-location --keep-empty-lines -o output.csv

# Process with size limits
flagged-csv input.xlsx -t Sheet1 --max-rows 1000 --max-columns 200 -o output.csv

Python Library Usage

# Run Python scripts with uv
uv run python your_script.py
# your_script.py
from flagged_csv import XlsxConverter

# Create converter instance
converter = XlsxConverter()

# Convert with all formatting options
csv_content = converter.convert_to_csv(
    'data.xlsx',
    tab_name='Sheet1',
    include_colors=True,
    signal_merge=True,
    preserve_formats=True,
    ignore_colors='#FFFFFF'
)

# Save to file
with open('output.csv', 'w') as f:
    f.write(csv_content)

Flag Format Specification

Color Flags

Background Color

  • Format: {#RRGGBB} (backward-compatible) or {bc:#RRGGBB} (explicit)
  • Example: Sales{#FF0000} or Sales{bc:#FF0000} - "Sales" with red background

Foreground Color

  • Format: {fc:#RRGGBB}
  • Example: Text{fc:#0000FF} - "Text" with blue font color

Combined Colors

  • Multiple flags can be combined: 100{#FFFF00}{fc:#FF0000}{MG:123456}{l:B5}
  • This represents: Yellow background, red text, part of merge group 123456, from cell B5

Merge Flags

  • Format: {MG:XXXXXX} where XXXXXX is a 6-digit identifier
  • All cells in a merged range share the same ID
  • The first cell contains the actual value
  • Subsequent cells contain only the merge flag

Location Flags

  • Format: {l:CellRef} where CellRef is the Excel cell coordinate
  • Example: Value{l:A5} - "Value" from cell A5
  • Useful for preserving cell position information

Example Output

Given an Excel file with:

  • Cell A1: "Total Sales" with blue background (#0000FF) and white text (#FFFFFF)
  • Cells B1-D1: Merged cell containing "$1,000" with green background (#00FF00)
  • Cell A2: "Profit" with red text (#FF0000)

The CSV output with --include-colors would be:

Total Sales{#0000FF}{fc:#FFFFFF},$1000{#00FF00}{MG:384756},{MG:384756},{MG:384756}
Profit{fc:#FF0000},,

Configuration Options

CLI Options

  • -t, --tab-name: Sheet name to convert (required)
  • -o, --output: Output file path (default: stdout)
  • --format: Output format: csv, html, or markdown (default: csv)
  • --include-colors: Include both foreground and background colors
  • --include-bg-colors: Include background colors only as {#RRGGBB} flags
  • --include-fg-colors: Include foreground colors only as {fc:#RRGGBB} flags
  • --signal-merge: Include merged cell information as {MG:XXXXXX} flags
  • --preserve-formats: Preserve number/date formatting as displayed in Excel
  • --ignore-colors: Comma-separated hex colors to ignore (e.g., "#FFFFFF,#000000")
  • --add-location: Add cell coordinates {l:A5} to non-empty cells
  • --keep-empty-lines: Preserve empty rows to maintain original row positions
  • --max-rows: Maximum number of rows to process (default: 300)
  • --max-columns: Maximum number of columns to process (default: 100)
  • --no-header: Exclude DataFrame column headers (A, B, C...) from output
  • --keep-na: Keep NA values instead of converting to empty strings

Python API Options

from flagged_csv import XlsxConverter, XlsxConverterConfig

# Create converter with custom configuration
config = XlsxConverterConfig(
    keep_default_na=False,    # Convert NA to empty strings
    index=False,              # Don't include row index
    header=False,             # Don't include DataFrame column headers (default)
    keep_empty_lines=False,   # Remove empty rows (default)
    add_location=False        # Don't add cell coordinates (default)
)

converter = XlsxConverter(config)

# Convert with additional options
csv_content = converter.convert_to_csv(
    'data.xlsx',
    tab_name='Sheet1',
    include_colors=True,      # Include both fg and bg colors
    # OR use specific color options:
    # include_bg_colors=True,  # Background colors only
    # include_fg_colors=True,  # Foreground colors only
    signal_merge=True,
    preserve_formats=True,
    add_location=True,        # Add {l:A5} cell coordinates
    keep_empty_lines=True,    # Keep empty rows
    max_rows=500,            # Process up to 500 rows
    max_columns=50           # Process up to 50 columns
)

Advanced Usage

Processing Multiple Sheets

Save this as process_sheets.py:

from flagged_csv import XlsxConverter
import pandas as pd

converter = XlsxConverter()

# Process all sheets in a workbook
xl_file = pd.ExcelFile('multi_sheet.xlsx')
for sheet_name in xl_file.sheet_names:
    csv_content = converter.convert_to_csv(
        'multi_sheet.xlsx',
        tab_name=sheet_name,
        include_colors=True,
        signal_merge=True
    )
    
    with open(f'{sheet_name}.csv', 'w') as f:
        f.write(csv_content)
    print(f'Converted {sheet_name} -> {sheet_name}.csv')

Run with:

uv run python process_sheets.py

Parsing Flagged CSV

Save this as parse_flagged.py:

import re
import pandas as pd

def parse_flagged_csv(file_path):
    """Parse a flagged CSV file and extract values and formatting."""
    df = pd.read_csv(file_path, header=None)
    
    # Regular expressions for parsing flags
    color_pattern = r'{#([0-9A-Fa-f]{6})}'
    merge_pattern = r'{MG:(\d{6})}'
    location_pattern = r'{l:([A-Z]+\d+)}'
    
    # Extract clean values and formatting info
    for row_idx in range(len(df)):
        for col_idx in range(len(df.columns)):
            cell = str(df.iloc[row_idx, col_idx])
            
            # Extract color
            color_match = re.search(color_pattern, cell)
            if color_match:
                color = color_match.group(1)
                print(f"Cell ({row_idx},{col_idx}) has color #{color}")
            
            # Extract merge ID
            merge_match = re.search(merge_pattern, cell)
            if merge_match:
                merge_id = merge_match.group(1)
                print(f"Cell ({row_idx},{col_idx}) is part of merge group {merge_id}")
            
            # Extract location
            location_match = re.search(location_pattern, cell)
            if location_match:
                location = location_match.group(1)
                print(f"Cell ({row_idx},{col_idx}) originally from {location}")
            
            # Get clean value (remove all flags)
            clean_value = re.sub(r'{[^}]+}', '', cell)
            df.iloc[row_idx, col_idx] = clean_value
    
    return df

# Example usage
if __name__ == "__main__":
    df = parse_flagged_csv('output.csv')
    print("\nCleaned data:")
    print(df)

Run with:

uv run python parse_flagged.py

Working with Merged Cells

def reconstruct_merged_cells(df):
    """Reconstruct merged cell ranges from flagged CSV."""
    merge_groups = {}
    
    for row_idx in range(len(df)):
        for col_idx in range(len(df.columns)):
            cell = str(df.iloc[row_idx, col_idx])
            
            # Find merge ID
            match = re.search(r'{MG:(\d{6})}', cell)
            if match:
                merge_id = match.group(1)
                if merge_id not in merge_groups:
                    merge_groups[merge_id] = []
                merge_groups[merge_id].append((row_idx, col_idx))
    
    # merge_groups now contains all cells belonging to each merge
    for merge_id, cells in merge_groups.items():
        print(f"Merge {merge_id}: {cells}")

Output Formats

CSV (Default)

Standard CSV format with flags appended to cell values.

HTML

html_output = converter.convert_to_csv(
    'data.xlsx',
    tab_name='Sheet1',
    output_format='html',
    include_colors=True
)

Markdown

markdown_output = converter.convert_to_csv(
    'data.xlsx', 
    tab_name='Sheet1',
    output_format='markdown',
    include_colors=True
)

Error Handling

The library handles various error cases gracefully:

try:
    csv_content = converter.convert_to_csv('data.xlsx', tab_name='InvalidSheet')
except ValueError as e:
    print(f"Sheet not found: {e}")
except FileNotFoundError as e:
    print(f"File not found: {e}")

Performance Considerations

  • The library uses multiple fallback engines (calamine, openpyxl, xlrd) for maximum compatibility
  • Large files are processed efficiently with streaming where possible
  • Color extraction uses caching to avoid repeated theme color lookups
  • Default limits of 300 rows and 100 columns can be adjusted using --max-rows and --max-columns
  • Empty rows are automatically removed by default (use --keep-empty-lines to preserve them)
  • Trailing empty content is always trimmed to reduce file size

Testing

Run the test suite using uv:

# Run all tests
uv run pytest tests/test_converter.py

# Run tests with verbose output
uv run pytest tests/test_converter.py -v

# Run a specific test
uv run pytest tests/test_converter.py::TestXlsxConverter::test_color_extraction -v

Development

# Set up development environment
uv sync

# Run the example script
uv run python example.py

# Run the CLI tool in development
uv run flagged-csv --help

Requirements

  • Python 3.11+
  • pandas >= 2.0.0, < 2.2.0
  • numpy < 2.0 (for compatibility)
  • pydantic >= 2.0.0 (for configuration models)
  • openpyxl >= 3.1.0
  • python-calamine >= 0.2.0 (for robust Excel reading)
  • xlrd >= 2.0.0 (for older Excel format support)
  • click >= 8.0.0 (for CLI)

Contributing

Contributions are welcome! Please feel free to submit a Pull Request.

  1. Fork the repository
  2. Create your feature branch (git checkout -b feature/amazing-feature)
  3. Run tests to ensure everything works (uv run pytest tests/)
  4. Commit your changes (git commit -m 'Add amazing feature')
  5. Push to the branch (git push origin feature/amazing-feature)
  6. Open a Pull Request

License

This project is licensed under the MIT License - see the LICENSE file for details.

AI Integration Guide

When using Flagged CSV with AI systems (ChatGPT, Claude, etc.), you have two options:

Option 1: Use the provided prompt file

Simply include the contents of flagged-csv.prompt.md in your conversation with the AI. This file contains the complete specification for the Flagged CSV format.

Option 2: Use this condensed prompt

The CSV data uses Flagged CSV format where:
- {#RRGGBB} indicates cell background color (e.g., {#FF0000} is red)
- {MG:XXXXXX} indicates merged cells (same ID = same merged group)
- {l:CellRef} indicates the original Excel cell location (e.g., {l:A5} is cell A5)
- The value in the merged cell should be applied to the entire range as a whole.
- Multiple flags can appear together: value{#color}{MG:id}{l:cell}

This simple context enables AI to answer questions about:

  • Color-based categorization
  • Time ranges from merged cells
  • Visual patterns and relationships
  • Conditional formatting meanings

Acknowledgments

This library is inspired by the need to preserve Excel's visual information during data processing pipelines, particularly for financial and business reporting applications where cell colors and merged cells convey important meaning. It bridges the gap between human-readable Excel files and AI-processable data formats.

The code for this project was primarily written using Claude Code, demonstrating how AI can help create tools that make data more accessible to AI systems.

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

flagged_csv-0.1.5.tar.gz (26.2 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

flagged_csv-0.1.5-py3-none-any.whl (19.7 kB view details)

Uploaded Python 3

File details

Details for the file flagged_csv-0.1.5.tar.gz.

File metadata

  • Download URL: flagged_csv-0.1.5.tar.gz
  • Upload date:
  • Size: 26.2 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for flagged_csv-0.1.5.tar.gz
Algorithm Hash digest
SHA256 1f3045e07ba3e1b4c35f0844a13ca17a5822d41f76026a5479b6fe8cd57d14bb
MD5 f7f9201d617d9962158f4e35049297c2
BLAKE2b-256 d83b3454632c42fcd75624fdc238803adb81f2d6f17f7e7b31c57e35cc57d5d6

See more details on using hashes here.

File details

Details for the file flagged_csv-0.1.5-py3-none-any.whl.

File metadata

  • Download URL: flagged_csv-0.1.5-py3-none-any.whl
  • Upload date:
  • Size: 19.7 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.1.0 CPython/3.11.5

File hashes

Hashes for flagged_csv-0.1.5-py3-none-any.whl
Algorithm Hash digest
SHA256 7e9754d9b0090a0e2b16ac08ea67153ddd0fa5d2cfb857b42e74231e6dcd1f3b
MD5 36aa3092698793add7056e660473a4cf
BLAKE2b-256 fa6abf38c2e17c0bba969661e7ddab164a320f3880abcf2ec2356af5374dc7f0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page