Stream an Excel sheet into SQLite or Excel with forward-fill padding, preserved Excel row numbers, and a stable row hash.

These details have not been verified by PyPI

Project description

xlfilldown

Stream an Excel sheet into SQLite or a new Excel sheet in constant memory. Forward-fill (pad) selected columns by header name, preserve original Excel row numbers, and compute a stable SHA-256 row hash.

Ingests only columns with non-empty headers (from --header-row).
Stores all non-empty values as TEXT strings (numbers/dates canonicalized to stable text; strings are stripped; whitespace-only cells become NULL).
Adds optional excel_row and row_hash columns.
Streams rows; suitable for large sheets.

Install

# From the project root:
pip install .
# or, for an isolated CLI:
pipx install .

Python ≥ 3.9. Depends on openpyxl.

CLI

xlfilldown has two subcommands. They share the same input options, and differ only in output destination.

db → write to SQLite
xlsx → write to Excel

Common input options

--infile (required): Path to input .xlsx file.
--insheet (required): Sheet name to read.
--header-row (required, 1-based): Row number containing the headers.
--pad-cols: JSON array of header names to forward-fill.
Example: '["tier1","tier2","tier,4"]'.
--pad-cols-letters: Alternative to --pad-cols.
Provide Excel column letters (A B C AE etc.). These are resolved to header names using --header-row.
If a referenced column’s header cell is empty (None, whitespace, or “nan”), the command will error.
Mutually exclusive with --pad-cols.
--pad-mode (default: hierarchical): Fill-down strategy.
- hierarchical → default. Higher-tier changes reset lower-tier carries.
- independent → legacy/pandas-style ffill. Each padded column carries independently.
--drop-blank-rows: Drop rows where all padded columns are empty after padding (treat as spacer rows).
--require-non-null: JSON array of headers; drop the row if any are null/blank after padding.
--row-hash: Include a row_hash column. In DB mode this also creates a non-unique index on row_hash.
--excel-row-numbers: Include original Excel row numbers in column excel_row (1-based).
--if-exists (default: fail): fail | replace | append.

`db` subcommand (SQLite output)

Additional options:

--db (required): SQLite database file (created if missing).
--table: SQLite table name (default: derived from input sheet name).
--batch-size (default: 1000): Rows per executemany() batch.

Create/append semantics

Table columns are: [row_hash?] [excel_row?] + headers… (all TEXT, including excel_row).
If --if-exists append, the existing table schema must exactly match the expected column order.
Helpful indexes are created automatically when enabled: excel_row and row_hash.

Examples

By header names:

xlfilldown db \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --pad-cols '["columnname1","columnname2","anothercolumn,3"]' \
  --db out.db

By column letters:

xlfilldown db \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --pad-cols-letters A C AE \
  --db out.db

`xlsx` subcommand (Excel output)

Additional options:

--outfile (required): Output .xlsx file.
--outsheet: Output sheet name (default: derived from input sheet name).

Sheet-level --if-exists

fail: error if target sheet exists.
replace: recreate target sheet fresh.
append: append below existing rows; the destination header row must match the expected header list (including excel_row and/or row_hash if enabled).

Examples

By header names:

xlfilldown xlsx \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --pad-cols '["columnname1","columnname2","anothercolumn,3"]' \
  --outfile out.xlsx \
  --outsheet Processed

By column letters:

xlfilldown xlsx \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --pad-cols-letters A D \
  --outfile out.xlsx \
  --outsheet Processed

Behavior details

Headers

Only columns with non-empty header cells on --header-row are ingested.
Empty or duplicate headers after normalization are rejected.

Forward-fill (padding)

Hierarchical (default): Higher-tier changes reset lower-tier carries. Example:

Tier1   Tier2   Tier3
apple
       red     sour
potato
       fried   yellow

→ produces:

apple   red    sour
potato  None   None
potato  fried  yellow

Independent (legacy): Each padded column carries independently (pandas-style ffill). Same input produces:
```
apple   red    sour
potato  red    sour
potato  fried  yellow
```
Completely empty rows (all headers blank) are preserved as empty without applying fill-down; the carry persists past them for later rows.
Whitespace-only cells are treated as blank.

Dropping rows

--drop-blank-rows: drops rows where all --pad-cols are blank (often spacer rows).
--require-non-null [A,B,…]: drops rows where any of those headers are blank after padding.

Row hash

--row-hash adds a SHA-256 hex digest over all ingested columns (in header order) after padding for non-empty rows.
For completely empty rows, the hash reflects all-empty values (no padding is applied by design).
SQLite mode creates a non-unique index on row_hash for faster lookups.
Numeric cells are normalized for hashing (e.g., 1, 1.0 → 1; no scientific notation).

Excel row numbers

--excel-row-numbers includes the original Excel row number (1-based) in column excel_row.

Python API

from xlfilldown.core import ingest_excel_to_sqlite, ingest_excel_to_excel

# → SQLite
summary = ingest_excel_to_sqlite(
    file="data.xlsx",
    sheet="Sheet1",
    header_row=1,
    pad_cols=["columnname1","columnname2","anothercolumn,3"],
    db="out.db",
    table=None,
    drop_blank_rows=True,
    require_non_null=["columnname1","columnname2"],
    row_hash=True,
    excel_row_numbers=True,
    if_exists="replace",
    batch_size=1000,
    pad_hierarchical=True,   # default
)

# → Excel
summary = ingest_excel_to_excel(
    file="data.xlsx",
    sheet="Sheet1",
    header_row=1,
    pad_cols=["columnname1","columnname2","anothercolumn,3"],
    outfile="out.xlsx",
    outsheet=None,
    drop_blank_rows=True,
    require_non_null=["columnname1","columnname2"],
    row_hash=True,
    excel_row_numbers=True,
    if_exists="replace",
    pad_hierarchical=False,  # use independent fill
)

Return fields

SQLite: { table, columns, rows_ingested, row_hash, excel_row_numbers }
Excel: { workbook, sheet, columns, rows_written, row_hash, excel_row_numbers }

Notes

All destination columns are written as TEXT (including excel_row). Values are stored as canonical strings; hashing uses the same canonicalization.
The input workbook is opened with read_only=True, data_only=True (formulas are evaluated to cached values).

License

MIT © RexBytes

Project details

These details have not been verified by PyPI

Release history Release notifications | RSS feed

1.0.3

Oct 8, 2025

1.0.2

Oct 8, 2025

1.0.1

Sep 30, 2025

1.0.0

Sep 29, 2025

0.4.0

Sep 28, 2025

0.3.1

Sep 27, 2025

0.3.0

Sep 27, 2025

0.2.0

Sep 22, 2025

0.1.1

Sep 17, 2025

This version

0.1.0

Sep 17, 2025

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xlfilldown-0.1.0.tar.gz (23.2 kB view details)

Uploaded Sep 17, 2025 Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

The dropdown lists show the available interpreters, ABIs, and platforms. Enable javascript to be able to filter the list of wheel files.

xlfilldown-0.1.0-py3-none-any.whl (14.8 kB view details)

Uploaded Sep 17, 2025 Python 3

File details

Details for the file xlfilldown-0.1.0.tar.gz.

File metadata

Download URL: xlfilldown-0.1.0.tar.gz
Upload date: Sep 17, 2025
Size: 23.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for xlfilldown-0.1.0.tar.gz
Algorithm	Hash digest
SHA256	`cdf02b51324a9f8a9a77759a718be1227a57dfca1ab13f54031687d33a2b6c53`
MD5	`151a930bd3420ccfa06c670139182c4f`
BLAKE2b-256	`d9318f2995314ed8e888f27fd918b168869b73f60a17eba5897f59ad9d05d84a`

See more details on using hashes here.

File details

Details for the file xlfilldown-0.1.0-py3-none-any.whl.

File metadata

Download URL: xlfilldown-0.1.0-py3-none-any.whl
Upload date: Sep 17, 2025
Size: 14.8 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for xlfilldown-0.1.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`ecfcc31d60db69d331ad4c2bcffe771803877b0bf9a86b32a593ca4aa4e56cb2`
MD5	`606f7bd49636eb9ff68c646079310f28`
BLAKE2b-256	`aa8673b83f89565b4aa0c6367079cb8f44d83df4d9e194b1cb45546bd9f865c7`

See more details on using hashes here.

xlfilldown 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

xlfilldown

Install

CLI

Common input options

`db` subcommand (SQLite output)

`xlsx` subcommand (Excel output)

Behavior details

Headers

Forward-fill (padding)

Dropping rows

Row hash

Excel row numbers

Python API

Notes

License

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

xlfilldown 0.1.0

Navigation

Verified details

Maintainers

Unverified details

Meta

Classifiers

Project description

xlfilldown

Install

CLI

Common input options

db subcommand (SQLite output)

xlsx subcommand (Excel output)

Behavior details

Headers

Forward-fill (padding)

Dropping rows

Row hash

Excel row numbers

Python API

Notes

License

Project details

Verified details

Maintainers

Unverified details

Meta

Classifiers

Release history Release notifications | RSS feed

Download files

Source Distribution

Built Distribution

File details

File metadata

File hashes

File details

File metadata

File hashes

`db` subcommand (SQLite output)

`xlsx` subcommand (Excel output)