Python function to extract data from an ODS spreadsheet on the fly - without having to store the entire file in memory or disk

stream-read-ods

To construct ODS spreadsheets on the fly, try stream-write-ods.

Installation

pip install stream-read-ods

Usage

To extract rows, call the stream_read_ods function with an iterable of bytes instances. It returns an iterable of (sheet_name, sheet_rows) pairs.

from stream_read_ods import stream_read_ods
import httpx

def ods_chunks():
    # Iterable that yields the bytes of an ODS file
    with httpx.stream('GET', 'https://www.example.com/my.ods') as r:
        yield from r.iter_bytes(chunk_size=65536)

for sheet_name, sheet_rows in stream_read_ods(ods_chunks()):
    for sheet_row in sheet_rows:
        print(sheet_row)  # Tuple of cells
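
The source of bytes does not have to be an HTTP response. As a minimal sketch, assuming the ODS file is on local disk, the generator below yields it in fixed-size chunks. The name file_chunks and its chunk_size parameter are hypothetical and not part of stream-read-ods.

```python
def file_chunks(path, chunk_size=65536):
    # Hypothetical helper: yields the bytes of a local file chunk by chunk,
    # so the whole file is never held in memory at once
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk
```

The resulting generator is an iterable of bytes instances, so it could be passed to stream_read_ods in place of ods_chunks() above.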

If the spreadsheet is of a fairly simple structure, then the sheet_rows from above can be passed to the simple_table function to extract the names of the columns and the rows of the table.

from stream_read_ods import stream_read_ods, simple_table

for sheet_name, sheet_rows in stream_read_ods(ods_chunks()):
    columns, rows = simple_table(sheet_rows, skip_rows=2)
    for row in rows:
        print(row)  # Tuple of cells
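
To illustrate the general idea only, the sketch below skips a number of leading rows, treats the next row as the column names, and leaves the remaining rows as a lazy iterator. It is a stand-in written for this explanation, not the library's implementation; the name simple_table_sketch is hypothetical, and how the real simple_table handles edge cases is not described here.

```python
from itertools import islice

def simple_table_sketch(sheet_rows, skip_rows=0):
    # Illustrative stand-in, NOT the library's simple_table
    rows = iter(sheet_rows)
    # Discard leading rows, such as a title or notes above the table
    for _ in islice(rows, skip_rows):
        pass
    # Treat the next row as the column names; empty tuple if there is none
    columns = next(rows, ())
    # The remaining rows are returned lazily, without loading them all
    return columns, rows
```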

This can then be used to construct a Pandas dataframe from the ODS file (although this would store the entire sheet in memory).

import pandas as pd
from stream_read_ods import stream_read_ods, simple_table

for sheet_name, sheet_rows in stream_read_ods(ods_chunks()):
    columns, rows = simple_table(sheet_rows, skip_rows=2)
    df = pd.DataFrame(rows, columns=columns)
    print(df)
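
If holding a whole sheet in memory is a concern, one option is to consume the rows in fixed-size batches and process each batch separately. The helper below is a minimal sketch of that idea; the name batches is hypothetical and not part of stream-read-ods.

```python
from itertools import islice

def batches(rows, size):
    # Hypothetical helper: groups any iterable of rows into lists of at most
    # `size` rows, pulling from the source only as each batch is requested
    rows = iter(rows)
    while True:
        batch = list(islice(rows, size))
        if not batch:
            return
        yield batch
```

Each batch could then be passed to pd.DataFrame(batch, columns=columns), so that only one batch of rows is in memory at a time.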

Types

There are 8 possible data types in an Open Document Spreadsheet: boolean, currency, date, float, percentage, string, time, and void. These are converted to Python types according to the following table.

ODS type      Python type
boolean       bool
currency      stream_read_ods.Currency
date          date or datetime
float         Decimal
percentage    stream_read_ods.Percentage
string        str
time          stream_read_ods.Time
void          NoneType
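
Because several of these Python types are not JSON-serializable by default, code that exports cells may need to dispatch on type. The function below is a minimal sketch of such a dispatch, relying only on the table above and on Currency and Percentage being Decimal subclasses and Time being a namedtuple. The name to_json_value and the chosen output formats are assumptions made for illustration.

```python
from datetime import date
from decimal import Decimal

def to_json_value(cell):
    # Hypothetical helper: maps a cell value to something json.dumps accepts
    if cell is None or isinstance(cell, (bool, int, str)):
        return cell
    if isinstance(cell, Decimal):
        # Also covers Decimal subclasses; a string avoids float rounding
        return str(cell)
    if isinstance(cell, date):
        # datetime is a subclass of date, so both are handled here
        return cell.isoformat()
    if isinstance(cell, tuple):
        # A namedtuple is a tuple; convert each member in turn
        return [to_json_value(member) for member in cell]
    raise TypeError(f'Unexpected cell type: {type(cell).__name__}')
```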

stream_read_ods.Currency

A subclass of Decimal with an additional attribute code that contains the currency code, for example the string GBP. This can be None if the ODS file does not specify a code.
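
To show what a Decimal subclass with an extra attribute can look like, here is a minimal stand-in. It is not the library's own class; the name CurrencySketch and its constructor signature are assumptions made for illustration.

```python
from decimal import Decimal

class CurrencySketch(Decimal):
    # Illustrative stand-in, NOT stream_read_ods.Currency
    def __new__(cls, value, code=None):
        # Decimal is immutable, so the value is set in __new__
        obj = super().__new__(cls, value)
        # The currency code, such as 'GBP', or None if not specified
        obj.code = code
        return obj

price = CurrencySketch('9.99', code='GBP')
```

Such a value compares and computes like any other Decimal, while the code attribute is available when the currency matters.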

stream_read_ods.Percentage

A subclass of Decimal.

stream_read_ods.Time

The Python built-in timedelta type is not used because it cannot store intervals of years or months other than by converting them to days, which would lose information.

Instead, a namedtuple is defined, stream_read_ods.Time, with members:

Member     Type
sign       str
years      int
months     int
days       int
hours      int
minutes    int
seconds    Decimal
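
When a value has no years or months component, converting it to a timedelta loses nothing. The sketch below uses a stand-in namedtuple with the same members so that it runs without the library. The names TimeSketch and to_timedelta are hypothetical, and treating a sign of '-' as negative is an assumption, since the possible values of sign are not listed here.

```python
from collections import namedtuple
from datetime import timedelta
from decimal import Decimal

# Stand-in with the same members as stream_read_ods.Time
TimeSketch = namedtuple('TimeSketch', [
    'sign', 'years', 'months', 'days', 'hours', 'minutes', 'seconds',
])

def to_timedelta(t):
    # Years and months have no fixed length, so refuse to guess
    if t.years or t.months:
        raise ValueError('Cannot convert years or months to a timedelta')
    delta = timedelta(
        days=t.days, hours=t.hours, minutes=t.minutes,
        seconds=float(t.seconds),  # timedelta does not accept Decimal
    )
    # Assumption: '-' marks a negative interval
    return -delta if t.sign == '-' else delta
```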

Running tests

pip install -r requirements-dev.txt
pytest

Exceptions

Exceptions raised by the source iterable are passed through stream_read_ods unchanged. Other exceptions are in the stream_read_ods module, and derive from its StreamReadODSError.

Exception hierarchy

  • StreamReadODSError

    Base class for all explicitly raised exceptions

    • InvalidOperationError

      • UnfinishedIterationError

        The rows iterator of a sheet has not been iterated to completion

    • InvalidODSFileError (also inherits from the ValueError built-in)

      Base class for errors relating to the bytes of the ODS file not being parsable. Several errors relate to the fact that ODS files are ZIP archives that require specific members and contents.

      • UnzipError

        The ODS file does not appear to be a valid ZIP file. More detail is in the __cause__ member of the raised exception, which is an exception that derives from UnzipValueError in stream-unzip.

      • MissingMIMETypeError

        The MIME type of the file was not present. In ZIP terms, this means that the first file of the ZIP archive is not named mimetype.

      • IncorrectMIMETypeError

        The MIME type was present, but does not match application/vnd.oasis.opendocument.spreadsheet. This can happen if a file such as an Open Document Text (ODT) file is passed rather than an ODS file.

      • MissingContentXMLError

        The file claims to be an ODS file according to its MIME type, but does not contain the required content.xml file that contains the sheet data.

      • InvalidContentXMLError

        The file claims to be an ODS file according to its MIME type and contains a content.xml file, but that file doesn't appear to contain valid XML. More detail is in the __cause__ member of the raised exception, which is an exception that derives from lxml.etree.LxmlError.

        This exception may also be raised in cases where the underlying XML requires a large amount of memory to parse.

      • InvalidODSXMLError

        The file has valid content as XML, but there is some aspect of the XML that makes it not parseable as a spreadsheet.

        • InvalidTypeError

          The data type of a cell is not one of the 8 ODS data types

        • InvalidValueError

          The value of a cell cannot be parsed as its declared type. More detail may be in the __cause__ member of the raised exception.

          • InvalidBooleanValueError

          • InvalidCurrencyValueError

          • InvalidDateValueError

          • InvalidFloatValueError

          • InvalidPercentageValueError

          • InvalidTimeValueError

    • SizeError

      The file appears valid as an ODS file so far, but processing has hit a size-related limit. These limits are in place to avoid unexpectedly high memory use.

      • TooManyColumnsError

        More columns than the max_columns argument to stream_read_ods have been encountered. The default limit is 65536.

      • StringTooLongError

        A cell with a string value that's longer than the max_string_length argument to stream_read_ods has been encountered. The default limit is 65536.
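
Because the exceptions form a hierarchy, an except clause for a base class also catches its subclasses, so handlers can be ordered from specific to general. The sketch below defines local stand-in classes that mirror a few of the names above so that it runs without the library. With the real library these would be imported from the stream_read_ods module instead. The classify function is hypothetical.

```python
# Local stand-ins mirroring part of the hierarchy above
class StreamReadODSError(Exception): pass
class InvalidODSFileError(StreamReadODSError, ValueError): pass
class UnzipError(InvalidODSFileError): pass
class SizeError(StreamReadODSError): pass
class TooManyColumnsError(SizeError): pass

def classify(exc):
    # Hypothetical helper: shows which handler a given exception reaches
    try:
        raise exc
    except SizeError:
        return 'too large'
    except InvalidODSFileError:
        return 'invalid file'
    except StreamReadODSError:
        return 'other library error'
```

Since InvalidODSFileError also inherits from ValueError, existing code that catches ValueError would handle those errors too.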
