Skip to main content

Save large amounts of data to small xlsx files

Project description

Yet another Python XLSX writer

... yes, this is reinventing the wheel again and again, but ...

So I decided to write my own xlsx export for two reasons:

First and foremost, the two existing engines I use (openpyxl, xlsxwriter) available in pandas do not store large files efficiently. The problem is when I must load large number of records, up to Excel limit (2^20 = 1048576) and then send it over email (this is quite often the easies way to share data...). The files get way big.

Secondly, I just want to understand xlsx internals and use the simples possible code to handle files. As a side effect, it is simpler and faster than using some other libraries.

As a simple benchmark consider a sample file of 700+k records and 18 columns. Standard pandas creates files of about 40MB. The simple_xls_writer's file is as small as 8MB which makes it more email friendly.

(Of course when saving modified file it gets much bigger but this not the point).

Usage

The project consists of submodules:

  • writer
  • oracle_handler.

writer

This generic module exposes function(s) to write raw data (array of arrays) into Excel file.

This should be clear when reading the helper function:

def write_dummy(base_path: str, target_name: str) -> None:
    data = [["A", "B", "C"], ["TEST", 1.23, "2024-10-01 12:34:56"], ["TEST", 200, "2024-10-01 12:34:56"]]
    write_raw_data(base_path, target_name, data)

Note that the only supported data types are: str, int and float, which relates to the way data is saved in xlsx file.

So you may have to prepare the input array yourself or use other submodules (see below).

There's a helper function write_dummy that saves predefined tiny file under given name.

oracle_handler

If you use Oracle database you can use helper method that reads query result into required structure.

First of all you may wish to verify connection. I prefer to do it this way:

print("db time: "+oracle_handler.get_sysdate(username,password,dh_url).strftime("%Y-%m-%d %H:%M:%S"))

To save query results simply run:

oracle_handler.write_oracle_query(query,base_path, "all_tables",username,password,dh_url)

Example

See: main.py

...    

username = input("username: ")
password = getpass.getpass()
dh_url = input("DSN: ")

# verify connection
print("db time: "+oracle_handler.get_sysdate(username,password,dh_url).strftime("%Y-%m-%d %H:%M:%S"))

# fetch all tables' metadata
query = "select * from all_tables"
base_path = os.path.dirname(__file__)
oracle_handler.write_oracle_query(query,base_path, "all_tables",username,password,dh_url)

...

Installation

Install package using pip:

pip install simple-xlsx-writer

If you wish to use Oracle connectivity, add option:

pip install simple-xlsx-writer[oracle]

To verify installation run:

import os
from simple_xlsx_writer import writer

base_path = os.path.dirname(__file__) # or provide explicit path in interactive mode
writer.write_dummy(base_path, "dummy01")

You should find dummy01.xlsx file in a given containig:

A B C
TEST 1,23 2024-10-01 12:34:56
TEST 200 2024-10-01 12:34:56

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

simple_xlsx_writer-0.1.0.tar.gz (7.5 kB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

simple_xlsx_writer-0.1.0-py3-none-any.whl (8.3 kB view details)

Uploaded Python 3

File details

Details for the file simple_xlsx_writer-0.1.0.tar.gz.

File metadata

  • Download URL: simple_xlsx_writer-0.1.0.tar.gz
  • Upload date:
  • Size: 7.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/6.0.1 CPython/3.11.9

File hashes

Hashes for simple_xlsx_writer-0.1.0.tar.gz
Algorithm Hash digest
SHA256 9f5b136961f782b6803c00fc0642e645e1a814cf49926da34337543755918c1d
MD5 ed232d90acf94f8409b0bc48ad53717b
BLAKE2b-256 72d27af21dec3466cc737184c5e887d5b7e1876e396ffe722d2e2ace68ba3cef

See more details on using hashes here.

File details

Details for the file simple_xlsx_writer-0.1.0-py3-none-any.whl.

File metadata

File hashes

Hashes for simple_xlsx_writer-0.1.0-py3-none-any.whl
Algorithm Hash digest
SHA256 03d1fd13ddc52412a97319c4b677c3f7ea9ae02cd55142e506ebcb61c1ce8000
MD5 e5464807fa0f40c63d13c707ea01fdce
BLAKE2b-256 78a4df4c76744135237bd9258ca4b97decd38ab7af57afde905e705320fee659

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page