Skip to main content

Stream an Excel sheet into SQLite or Excel with forward-fill padding, preserved Excel row numbers, and a stable row hash.

Project description

xlfilldown

PyPI version Python versions License

Stream an Excel sheet into SQLite or a new Excel sheet in constant memory. Forward-fill selected columns by header name, preserve original Excel row numbers, and compute a stable SHA-256 row hash.

  • Ingests only columns with non-empty headers (from --header-row).
  • Stores all non-empty values as TEXT strings (numbers/dates canonicalized to stable text; strings are stripped; whitespace-only cells become NULL).
  • Optional excel_row and row_hash columns.
  • Streams rows; suitable for large sheets.

Install

From PyPI (recommended)

pip install xlfilldown
# or
pipx install xlfilldown

Python ≥ 3.9. Depends on openpyxl.


CLI

xlfilldown has two subcommands that share the same input options and differ only in the output destination:

  • db → write to SQLite
  • xlsx → write to Excel

Common input options

  • --infile (required): Path to input .xlsx file.

  • --insheet (required): Sheet name to read.

  • --header-row (required, 1-based): Row number containing the headers.

  • --fill-cols: JSON array of header names to forward-fill. Example: '["columnname1","columnname2","anothercolumn,3"]'.

  • --fill-cols-letters: Alternative to --fill-cols. Provide Excel column letters (A B C AE etc.). These are resolved to header names using --header-row. If a referenced column’s header cell is empty (None, whitespace, or “nan”), the command errors. Mutually exclusive with --fill-cols.

  • --fill-mode (default: hierarchical): Fill strategy.

    • hierarchical → Higher-tier column changes reset lower-tier carries.
    • independent → Pandas-style ffill, each listed column carries independently.
  • --drop-blank-rows: Drop rows where all fill columns are empty after filling (treat as spacer rows).

  • --require-non-null: JSON array of headers; drop the row if any are null/blank after fill.

  • --require-non-null-letters: Excel column letters; resolved to headers and merged with --require-non-null.

  • --row-hash: Include a row_hash column. In DB mode this also creates a non-unique index on row_hash.

  • --excel-row-numbers: Include original Excel row numbers in column excel_row (1-based).

  • --if-exists (default: fail): fail | replace | append.

Header matching: After normalization (trim; case preserved; 'nan' → blank), names must match exactly.


db subcommand (SQLite output)

Additional options:

  • --db (required): SQLite database file (created if missing).
  • --table: SQLite table name (default: derived from input sheet name).
  • --batch-size (default: 1000): Rows per executemany() batch.

Create/append semantics

  • Table columns are: [row_hash?] [excel_row?] + headers… (all TEXT, including excel_row).
  • If --if-exists append, the existing table schema must exactly match the expected column order.
  • Helpful indexes are created automatically when enabled: excel_row and row_hash.

Examples

By header names:

xlfilldown db \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols '["columnname1","columnname2","anothercolumn,3"]' \
  --db out.db

By column letters:

xlfilldown db \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols-letters A C AE \
  --db out.db

xlsx subcommand (Excel output)

Additional options:

  • --outfile (required): Output .xlsx file.
  • --outsheet: Output sheet name (default: derived from input sheet name).

Sheet-level --if-exists

  • fail: error if target sheet exists.
  • replace: recreate target sheet fresh.
  • append: append below existing rows; the destination header row must match the expected header list (including excel_row and/or row_hash if enabled).

Examples

By header names:

xlfilldown xlsx \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols '["columnname1","columnname2","anothercolumn,3"]' \
  --outfile out.xlsx \
  --outsheet Processed

By column letters:

xlfilldown xlsx \
  --infile data.xlsx \
  --insheet "Sheet1" \
  --header-row 1 \
  --fill-cols-letters A D \
  --outfile out.xlsx \
  --outsheet Processed

Behavior details

Headers

  • Only columns with non-empty header cells on --header-row are ingested.
  • Empty or duplicate headers after normalization are rejected.

Forward-fill (filling)

  • Hierarchical (default): order matters The hierarchy is the order you pass the columns. The leftmost is the highest tier.

    • Names: --fill-cols '["Region","Country","City"]' ⇒ Region > Country > City
    • Letters: --fill-cols-letters A C B ⇒ Column A > Column C > Column B When a higher-tier value appears on a row, all lower-tier carries reset for that row.
  • Independent (pandas-style ffill) Each listed column forward-fills independently. Order of columns does not matter. Columns do not reset each other.

  • Completely empty rows (all headers blank) are preserved as empty without applying fill; the carry persists past them for later rows.

  • Whitespace-only cells are treated as blank.

Illustration

Input:

columnname1   columnname2   anothercolumn,3
apple
       red     sour
potato
       fried   yellow

Hierarchical output:

apple   red    sour
potato  None   None
potato  fried  yellow

Independent output:

apple   red    sour
potato  red    sour
potato  fried  yellow

Dropping rows

  • --drop-blank-rows: drops rows where all --fill-cols are blank (often spacer rows).
  • --require-non-null [A,B,…] / --require-non-null-letters: drops rows where any of those headers are blank after filling.

Row hash

  • --row-hash adds a SHA-256 hex digest over all ingested columns (in header order) after filling for non-empty rows.
  • For completely empty rows, the hash reflects all-empty values (no filling is applied by design).
  • SQLite mode creates a non-unique index on row_hash for faster lookups.
  • Numeric cells are normalized for hashing (e.g., 1, 1.01; no scientific notation).

Excel row numbers

  • --excel-row-numbers includes the original Excel row number (1-based) in column excel_row.

Python API

from xlfilldown.api import ingest_excel_to_sqlite, ingest_excel_to_excel

# → SQLite
summary = ingest_excel_to_sqlite(
    file="data.xlsx",
    sheet="Sheet1",
    header_row=1,
    # choose one:
    fill_cols=["columnname1", "columnname2", "anothercolumn,3"],   # by header names
    # fill_cols_letters=["A", "B", "C"],                           # or by Excel letters
    db="out.db",
    table=None,
    drop_blank_rows=True,
    # choose one (or both, merged & de-duped):
    require_non_null=["columnname1", "columnname2"],               # by header names
    # require_non_null_letters=["A", "B"],                         # or by Excel letters
    row_hash=True,
    excel_row_numbers=True,
    if_exists="replace",
    batch_size=1000,
    fill_mode="hierarchical",    # default hierarchical fill
    # fill_mode="independent",   # independent (pandas-style) fill
)

# → Excel
summary = ingest_excel_to_excel(
    file="data.xlsx",
    sheet="Sheet1",
    header_row=1,
    fill_cols=["columnname1", "columnname2", "anothercolumn,3"],
    # or: fill_cols_letters=["A", "B", "C"],
    outfile="out.xlsx",
    outsheet=None,
    drop_blank_rows=True,
    require_non_null=["columnname1", "columnname2"],
    # or: require_non_null_letters=["A", "B"],
    row_hash=True,
    excel_row_numbers=True,
    if_exists="replace",
    fill_mode="independent",     # independent (pandas-style) fill
    # fill_mode="hierarchical",  # hierarchical (default)
)

Return fields

  • SQLite: { table, columns, rows_ingested, row_hash, excel_row_numbers }
  • Excel: { workbook, sheet, columns, rows_written, row_hash, excel_row_numbers }

Raw Ingest Examples

Sometimes you may want to ingest a sheet exactly as-is, without applying any fill-down logic.
This is useful if you just want to load the data into SQLite or Excel with audit columns (row_hash, excel_row) and handle nulls/blanks later.


CLI: Raw ingest to SQLite

xlfilldown db \
  --infile data.xlsx --insheet Sheet1 --header-row 1 \
  --db out.db --table raw_ingest \
  --ingest-mode raw \
  --row-hash --excel-row-numbers --if-exists replace

CLI: Raw ingest to Excel

xlfilldown xlsx \
  --infile data.xlsx --insheet Sheet1 --header-row 1 \
  --outfile out.xlsx --outsheet RawSheet \
  --ingest-mode raw \
  --row-hash --excel-row-numbers --if-exists replace

Python API: Raw ingest to SQLite

from xlfilldown.api import ingest_excel_to_sqlite

summary = ingest_excel_to_sqlite(
    file="data.xlsx", sheet="Sheet1", header_row=1,
    ingest_mode="raw",                      # skip fill-down
    db="out.db", table="raw_ingest",
    row_hash=True, excel_row_numbers=True,
    if_exists="replace",
)
print(summary)

Python API: Raw ingest to Excel

from xlfilldown.api import ingest_excel_to_excel

summary = ingest_excel_to_excel(
    file="data.xlsx", sheet="Sheet1", header_row=1,
    ingest_mode="raw",                      # skip fill-down
    outfile="out.xlsx", outsheet="RawSheet",
    row_hash=True, excel_row_numbers=True,
    if_exists="replace",
)
print(summary)

Notes

  • All destination columns are written as TEXT (including excel_row). Values are stored as canonical strings; hashing uses the same canonicalization.
  • The input workbook is opened with read_only=True, data_only=True (formulas use cached values).

License

MIT © RexBytes

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

xlfilldown-0.4.0.tar.gz (26.1 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

xlfilldown-0.4.0-py3-none-any.whl (18.3 kB view details)

Uploaded Python 3

File details

Details for the file xlfilldown-0.4.0.tar.gz.

File metadata

  • Download URL: xlfilldown-0.4.0.tar.gz
  • Upload date:
  • Size: 26.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for xlfilldown-0.4.0.tar.gz
Algorithm Hash digest
SHA256 d56d11cb1555ef0c1ef3f6c3beff035048ca45ef8c6c2288cc891de4db434d13
MD5 3e54edd78d853f22cc127586d0cddc43
BLAKE2b-256 0f9b1f239a73ef30ea4611fdf8320dfdd53c8cb104416952ea6cd7862d13840a

See more details on using hashes here.

File details

Details for the file xlfilldown-0.4.0-py3-none-any.whl.

File metadata

  • Download URL: xlfilldown-0.4.0-py3-none-any.whl
  • Upload date:
  • Size: 18.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.2.0 CPython/3.12.3

File hashes

Hashes for xlfilldown-0.4.0-py3-none-any.whl
Algorithm Hash digest
SHA256 3930b860a59a481f5384d4ff00deb81844cf63614d7966665552ee9377eab19a
MD5 5a3691857352240843f877406a15f165
BLAKE2b-256 9f1d97585902eff7ae66d8cda5589e09628ddd92e6df543a3ca51e5e281827b0

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page