
stream-write-ods


Python function to construct an ODS (OpenDocument Spreadsheet) on the fly - without having to store the entire file in memory or disk.

Can be used to convert CSV, SQLite, or JSON to ODS format.

Installation

pip install stream-write-ods

Usage

In general, pass a nested iterable to stream_write_ods and it will return an iterable of the bytes of the ODS file, as follows.

from stream_write_ods import stream_write_ods

def get_sheets():
    def get_rows_of_sheet_1():
        yield 'Value A', 'Value B'
        yield 'Value C', 'Value D'

    yield 'Sheet 1 name', ('col_1_name', 'col_2_name'), get_rows_of_sheet_1()

    def get_rows_of_sheet_2():
        yield 'col_1_value',

    yield 'Sheet 2 name', ('col_1_name',), get_rows_of_sheet_2()

ods_chunks = stream_write_ods(get_sheets())
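
The returned iterable is lazy: bytes are only produced as it's iterated over. For example, to write the ODS file to disk, the chunks can be consumed as follows (the output filename is just for illustration):

with open('my.ods', 'wb') as f:
    for chunk in ods_chunks:
        f.write(chunk)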

Usage: Convert CSV file to ODS

The following recipe converts a local CSV file to ODS.

import csv
from stream_write_ods import stream_write_ods

def get_sheets(sheet_name, csv_reader):
    yield sheet_name, next(csv_reader), csv_reader

with open('my.csv', 'r', encoding='utf-8', newline='') as f:
    csv_reader = csv.reader(f, quoting=csv.QUOTE_NONNUMERIC)
    ods_chunks = stream_write_ods(get_sheets('Sheet 1', csv_reader))

Usage: Convert CSV bytes to ODS

The following recipe converts an iterable yielding the bytes of a CSV to ODS.

import csv
from io import IOBase, TextIOWrapper
from stream_write_ods import stream_write_ods

# Any iterable that yields the bytes of a CSV file
# Hard coded for the purposes of this example
bytes_iter = (
    b'col_1,col_2\n',
    b'1,"value"\n',
)

def to_str_lines(iterable):
    # Based on the answer at https://stackoverflow.com/a/70639580/1319998
    chunk = b''
    offset = 0
    it = iter(iterable)

    def up_to_iter(size):
        nonlocal chunk, offset

        while size:
            if offset == len(chunk):
                try:
                    chunk = next(it)
                except StopIteration:
                    break
                else:
                    offset = 0
            to_yield = min(size, len(chunk) - offset)
            offset = offset + to_yield
            size -= to_yield
            yield chunk[offset - to_yield:offset]

    class FileLikeObj(IOBase):
        def readable(self):
            return True
        def read(self, size=-1):
            return b''.join(up_to_iter(float('inf') if size is None or size < 0 else size))

    yield from TextIOWrapper(FileLikeObj(), encoding='utf-8', newline='')

def get_sheets(sheet_name, csv_reader):
    yield sheet_name, next(csv_reader), csv_reader

lines_iter = to_str_lines(bytes_iter)
csv_reader = csv.reader(lines_iter, quoting=csv.QUOTE_NONNUMERIC)
ods_chunks = stream_write_ods(get_sheets('Sheet 1', csv_reader))

Usage: Convert large/chunked pandas dataframe to ODS

from io import BytesIO
from itertools import chain
import pandas as pd
from stream_write_ods import stream_write_ods

# Hard coded for the purposes of this example,
# but could be any file-like object
csv_file = BytesIO((
    b'col_1,col_2\n' +
    b'1,"value"\n'
    b'2,"other value"\n'
))

def get_sheets(reader):
    columns = None

    def get_rows():
        nonlocal columns

        for chunk in reader:
            if columns is None:
                columns = tuple(chunk.columns.tolist())
            yield from (row for index, row in chunk.iterrows())

    rows = get_rows()
    first_row = next(rows)

    yield 'Sheet 1', columns, chain((first_row,), rows)

# Directly saves the chunked dataframe as ODS for the purposes
# of this example, but could include calculations / manipulations
with pd.read_csv(csv_file, chunksize=1024) as reader:
    ods_chunks = stream_write_ods(get_sheets(reader))
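
Note that both the chunked reader and the returned ODS chunks are lazy, so the chunks should be consumed while the reader is still open, i.e. inside the with block. For example, the final lines of the example above could instead be written as follows (the output filename is just for illustration):

with pd.read_csv(csv_file, chunksize=1024) as reader:
    ods_chunks = stream_write_ods(get_sheets(reader))
    with open('my.ods', 'wb') as f:
        for chunk in ods_chunks:
            f.write(chunk)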

Usage: Convert JSON to ODS

Using ijson to stream-parse a JSON file, it's possible to convert JSON data to ODS on the fly:

import ijson
import itertools
from stream_write_ods import stream_write_ods

# Any iterable that yields the bytes of a JSON file
# Hard coded for the purposes of this example
json_bytes_iter = (b'''{
  "data": [
      {"id": 1, "name": "Foo"},
      {"id": 2, "name": "Bar"}
  ]
}''',)

# ijson requires a file-like object
def to_file_like_obj(bytes_iter):
    chunk = b''
    offset = 0
    it = iter(bytes_iter)

    def up_to_iter(size):
        nonlocal chunk, offset

        while size:
            if offset == len(chunk):
                try:
                    chunk = next(it)
                except StopIteration:
                    break
                else:
                    offset = 0
            to_yield = min(size, len(chunk) - offset)
            offset = offset + to_yield
            size -= to_yield
            yield chunk[offset - to_yield:offset]

    class FileLikeObj:
        def read(self, size=-1):
            return b''.join(up_to_iter(float('inf') if size is None or size < 0 else size))

    return FileLikeObj()

def get_sheets(json_file):
    columns = None

    def rows():
        nonlocal columns
        for item in ijson.items(json_file, 'data.item'):
            if columns is None:
                columns = list(item.keys())
            yield tuple(item[column] for column in columns)

    # Ensure columns populated
    rows_it = rows()
    first_row = next(rows_it)

    yield 'Sheet 1', columns, itertools.chain((first_row,), rows_it)

json_file = to_file_like_obj(json_bytes_iter)
ods_chunks = stream_write_ods(get_sheets(json_file))

Usage: Convert SQLite to ODS

SQLite isn't particularly streaming-friendly, since you typically need random access to the file. But it's still possible to use stream-write-ods to convert SQLite to ODS.

import contextlib
import sqlite3
import tempfile
from stream_write_ods import stream_write_ods

@contextlib.contextmanager
def get_db():
    # Hard coded in memory database for the purposes of this example
    with sqlite3.connect(':memory:') as con:
        cur = con.cursor()
        cur.execute("CREATE TABLE my_table_a (my_col text);")
        cur.execute("CREATE TABLE my_table_b (my_col text);")
        cur.execute("INSERT INTO my_table_a VALUES ('Value A')")
        cur.execute("INSERT INTO my_table_b VALUES ('Value B')")
        yield con

def quote_identifier(value):
    return '"' + value.replace('"', '""') + '"'

def get_sheets(db):
    cur_table = db.cursor()
    cur_table.execute('''
        SELECT name FROM sqlite_master
        WHERE type = 'table' AND name NOT LIKE 'sqlite\\_%' ESCAPE '\\'
    ''')
    cur_data = db.cursor()
    for table, in cur_table:
        cur_data.execute(f'SELECT * FROM {quote_identifier(table)} ORDER BY rowid')
        yield table, tuple(col[0] for col in cur_data.description), cur_data

with get_db() as db:
    ods_chunks = stream_write_ods(get_sheets(db))

Types

There are 8 possible data types in an OpenDocument Spreadsheet: boolean, currency, date, float, percentage, string, time, and void. 4 of these can be output by stream-write-ods, chosen automatically according to the following table.

Python type    ODS type
bool           boolean
date           date - without time component
datetime       date - with time component
int            float
float          float
str            string
bytes          string - base64 encoded
NoneType       no type - empty cell
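
For example, a row containing each of these Python types produces cells of the corresponding ODS types (a minimal sketch):

from datetime import date, datetime
from stream_write_ods import stream_write_ods

def get_sheets():
    def get_rows():
        # bool, date, datetime, int, float, str, bytes and None cells
        yield True, date(2021, 1, 1), datetime(2021, 1, 1, 12, 0), 1, 1.5, 'a string', b'some bytes', None

    yield 'Sheet 1 name', tuple(f'col_{i}_name' for i in range(1, 9)), get_rows()

ods_chunks = stream_write_ods(get_sheets())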

It is possible to change how each type is encoded by overriding the encoders parameter of the stream_write_ods function. See stream_write_ods.py for the default implementation.

Modified at

ODS files are ZIP files, and as such require a "modified at" time for each member file. This defaults to datetime.now, but can be overridden by the get_modified_at parameter of the stream_write_ods function. See stream_write_ods.py for the default implementation.

This is useful if you want to make sure generated ODS files are byte-for-byte identical to a fixed reference, for example in automated tests.
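
As a minimal sketch, assuming get_modified_at is a zero-argument callable returning the datetime to use (as the datetime.now default suggests), a fixed timestamp could be passed as follows:

from datetime import datetime
from stream_write_ods import stream_write_ods

def get_sheets():
    yield 'Sheet 1 name', ('col_1_name',), iter((('Value A',),))

# Assumption: get_modified_at is called with no arguments and returns the
# datetime used as the "modified at" time of each member file
ods_chunks = stream_write_ods(
    get_sheets(),
    get_modified_at=lambda: datetime(2021, 1, 1, 12, 0, 0),
)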

Large ODS files

ODS spreadsheets are essentially ZIP archives containing several member files. By default, stream-write-ods creates ZIP files that are limited to 4 GiB (gibibytes) of data for compatibility reasons, for example to support older versions of LibreOffice. If you attempt to store more than this limit, a ZipOverflow exception will be raised.

To avoid this exception and store more data, you can pass use_zip_64=True as an argument to stream_write_ods. This results in a more recent ZIP format being used that allows up to 16 EiB (exbibytes) of data to be stored, with the downside that older versions of LibreOffice will not be able to open the resulting ODS file.
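
For example (a minimal sketch reusing the get_sheets pattern from the usage examples above):

from stream_write_ods import stream_write_ods

def get_sheets():
    yield 'Sheet 1 name', ('col_1_name',), iter((('Value A',),))

# Zip64 allows more than 4 GiB of data to be stored, but older versions of
# LibreOffice may not be able to open the resulting file
ods_chunks = stream_write_ods(get_sheets(), use_zip_64=True)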
