Skip to main content

tabXLSX reads and writes simple Excel xlsx files. Single script not depending on other libraries.

Project description

TABXLSX - minimal Excel support

TabXLSX reads and writes Excel xlsx files. It is a single file implementation that does not depend on other libraries. The output defaults to a markdown table and csv-like output is available as well. This allows piping the data into other scripts.

A number of output format options are available but less than the tabtotext.py module. The export to xlsx was orginally written with openpyx1 for tabtotext but it is possible to write simple xlsx tables just with Python's zipfile and xml.etree builtin modules. That is also faster.

import tabxlsx

The tabxlsx.py script can be used as a library.

  • use readFromXLSX("file.xlsx") -> data to get data out of an Excel file
  • use tabtextfileXLSX("file.xlsx") -> (data, headers) to get data and header info
  • use tabtoXLSX("file.xlsx", data, headers, selected) to write data into an Excel file

and there are generic function that allow to write CSV and Markdown tables. These will run the xlsx output/input when the filename endswith ".xlsx" or ".xls".

  • use tabtextfile("file.csv") -> (data, headers) to get data and header from csv files
  • use tabtextfile("file.md") -> (data, headers) to get it for markdown tables in text
  • use print_tabtotext("file.csv", data, headers, selected) to write a csv file
  • use print_tabtotext("file.md", data, headers, selected) for markdown tables
  • use print_tabtotext("", data, headers, selected, defaultformat="csv") to stdout

The code itself mimics that of openpyx1.

  • use load_workbook(filename) to get a Workbook data frame from a file
  • use make_workbook(data, headers)to create a Workbook from the provided data
  • use Workbook.save(filename) to save the data to an xlsx file
  • and workbook.create_sheet().cell(1,1).alignment = Alignment(horizontal="right")
  • and of course workbook.active.cell(1,2).value = 1 with Python's basic data types

The headers arguments defines the default order and formatting of the provided input data, which is List[Dict[str, CellValue]], so each row does not have an implicit order. Using ["b:.2f", "a"] shows the "b" values first formatted with two digits after the decimal point. Then the "a" column follows, and then the rest in alphabetic order.

run tabxlsx.py

Use tabxlsx.py --help for the latest options when running the script as a command line tool. The first argument is usually some data.xlsx but it can also be .csv or .md file - the input parser gets selected from the file extension automatically.

Additional arguments are the columns to be selected for output - being a subset of the data from the input file. Just like with headers each column can be formatted in the style of Python's string.format(). The selected columns fall back to known formatting if not provided - including Date/Time columns which are generally recognized in all library parts.

Use "@csv" to ensure output as CSV instead of the default markdown tables. For the markdown tables, the columns of each row have the same width mich makes the data easier to read. Alternative @-formats are available as well, e.g. "@wide" or "@data" with the latter being tab-seperated CSV.

  • use ./tabxlsx.py data.xlsx -o data.csv # to convert from xlsx to csv
  • use ./tabxlsx.py data.csv -o data.xlsx # to convert from csv to xslx
  • use ./tabxlsx.py data.xlsx @csv # to show the xlsx data as csv lines
  • use ./tabxlsx.py data.xlsx a b @csv # but only the input columns a and b
  • use ./tabxlsx.py data.xlsx b:2.f a # format the b number for a 2-column table
  • use ./tabxlsx.py data.xlsx a --unique # get one column out, remove duplicates

Converting to and from "@json" is supported as well but it spoils the column order.

development

The code is just a fraction of the "tabtotext.py" formatting engine. The main channel for distribution of that single "tabxlsx.py"script is via pypi.org. You can use pip download tabxlsx to download the latest script to any target system.

As the script does not have any dependencies, it can be copied around as is. Feel free to integrate it into your own Python project. Note that there is also a unittest-based "tabxlsx.tests.py" code that can ensure backward-compatibility if you start extending the tabxlsx code.

The original implementation in tabtoxlsx was based on openpyx1. The resulting xlsx files were inspected how to write them with just Python's internal zipfile. The xlsx reader is using zipfile and Python's internal xml.etree. This should be portable to JPython and IronPython as well. And tests showed tabxlsx to be 10x faster than openpyx1 for small datasets.

Have fun!

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

tabxlsx-1.1.3361.tar.gz (29.0 kB view details)

Uploaded Source

File details

Details for the file tabxlsx-1.1.3361.tar.gz.

File metadata

  • Download URL: tabxlsx-1.1.3361.tar.gz
  • Upload date:
  • Size: 29.0 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.15.0 pkginfo/1.5.0.1 requests/2.25.1 setuptools/44.1.1 requests-toolbelt/0.9.1 tqdm/4.45.0 CPython/3.6.15

File hashes

Hashes for tabxlsx-1.1.3361.tar.gz
Algorithm Hash digest
SHA256 8b7e04eca9e58404495567f3639475478b77c9717bced73422fe32a77591fdba
MD5 d1c57ef85c131e448e0698f2453632f5
BLAKE2b-256 6cb1e65ba6bfcafefe1e3f10d93e4971873066af37de1d3c39331af44f55ef7b

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page