tabxlsx

tabXLSX reads and writes simple Excel xlsx files. It does not depend on other libraries.

Project description

TABXLSX - minimal Excel support

TabXLSX reads and writes Excel xlsx files. It is a single file implementation that does not depend on other libraries. The output defaults to a markdown table and csv-like output is available as well. This allows piping the data into other scripts.

A number of output format options are available but less than the tabtotext.py module. The export to xlsx was orginally written with openpyx1 for tabtotext but it is possible to write simple xlsx tables just with Python's zipfile and xml.etree builtin modules. That is also faster.

import tabxlsx

The tabxlsx.py script can be used as a library.

use readFromXLSX("file.xlsx") -> data to get data out of an Excel file
use tabtextfileXLSX("file.xlsx") -> (data, headers) to get data and header info
use tabtoXLSX("file.xlsx", data, headers, selected) to write data into an Excel file

and there are generic function that allow to write CSV and Markdown tables. These will run the xlsx output/input when the filename endswith ".xlsx" or ".xls".

use tabtextfile("file.csv") -> (data, headers) to get data and header from csv files
use tabtextfile("file.md") -> (data, headers) to get it for markdown tables in text
use print_tabtotext("file.csv", data, headers, selected) to write a csv file
use print_tabtotext("file.md", data, headers, selected) for markdown tables
use print_tabtotext("", data, headers, selected, defaultformat="csv") to stdout

The code itself mimics that of openpyx1.

use load_workbook(filename) to get a Workbook data frame from a file
use make_workbook(data, headers)to create a Workbook from the provided data
use Workbook.save(filename) to save the data to an xlsx file
and workbook.create_sheet().cell(1,1).alignment = Alignment(horizontal="right")
and of course workbook.active.cell(1,2).value = 1 with Python's basic data types

The headers arguments defines the default order and formatting of the provided input data, which is List[Dict[str, CellValue]], so each row does not have an implicit order. Using ["b:.2f", "a"] shows the "b" values first formatted with two digits after the decimal point. Then the "a" column follows, and then the rest in alphabetic order.

run tabxlsx.py

Use tabxlsx.py --help for the latest options when running the script as a command line tool. The first argument is usually some data.xlsx but it can also be .csv or .md file - the input parser gets selected from the file extension automatically.

Additional arguments are the columns to be selected for output - being a subset of the data from the input file. Just like with headers each column can be formatted in the style of Python's string.format(). The selected columns fall back to known formatting if not provided - including Date/Time columns which are generally recognized in all library parts.

Use "@csv" to ensure output as CSV instead of the default markdown tables. For the markdown tables, the columns of each row have the same width mich makes the data easier to read. Alternative @-formats are available as well, e.g. "@wide" or "@data" with the latter being tab-seperated CSV.

use ./tabxlsx.py data.xlsx -o data.csv # to convert from xlsx to csv
use ./tabxlsx.py data.csv -o data.xlsx # to convert from csv to xslx
use ./tabxlsx.py data.xlsx @csv # to show the xlsx data as csv lines
use ./tabxlsx.py data.xlsx a b @csv # but only the input columns a and b
use ./tabxlsx.py data.xlsx b:2.f a # format the b number for a 2-column table
use ./tabxlsx.py data.xlsx a --unique # get one column out, remove duplicates

development

The code is just a fraction of the "tabtotext.py" formatting engine. The main channel for distribution of that single "tabxlsx.py"script is via pypi.org. You can use pip download tabxlsx to download the latest script to any target system.

https://pypi.org/project/tabxlsx/

As the script does not have any dependencies, it can be copied around as is. Feel free to integrate it into your own Python project. Note that there is also a unittest-based "tabxlsx.tests.py" code that can ensure backward-compatibility if you start extending the tabxlsx code.

The original implementation in tabtoxlsx was based on openpyx1. The resulting xlsx files were inspected how to write them with just Python's internal zipfile. The xlsx reader is using zipfile and Python's internal xml.etree. This should be portable to JPython and IronPython as well. And tests showed tabxlsx to be 10x faster than openpyx1 for small datasets.

Have fun!

Project details

Release history Release notifications | RSS feed

1.1.3365

Sep 5, 2024

1.1.3361

Sep 1, 2024

1.1.3352

Aug 27, 2024

This version

1.1.3321

Aug 1, 2024

1.1.3313

Jul 30, 2024

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tabxlsx-1.1.3321.tar.gz (18.2 kB view details)

Uploaded Aug 1, 2024 Source

File details

Details for the file tabxlsx-1.1.3321.tar.gz.

File metadata

Download URL: tabxlsx-1.1.3321.tar.gz
Upload date: Aug 1, 2024
Size: 18.2 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.25.1 setuptools/44.1.1 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.15

File hashes

Hashes for tabxlsx-1.1.3321.tar.gz
Algorithm	Hash digest
SHA256	`c6ec9ff49ca4ff8144f56b1b76b772087f4d72ea46412e21f36720d01c0084c5`
MD5	`4d5f7c3572699e6459e442e8542ce93d`
BLAKE2b-256	`893606186776b8821dacdfb6fb5ff3addab7ba473c465138d565dec53fdf154b`