stream-read-ods
Python function to extract data from an ODS spreadsheet on the fly - without having to store the entire file in memory or disk
To construct ODS spreadsheets on the fly, try stream-write-ods.
Installation
pip install stream-read-ods
Usage
To extract rows, call the stream_read_ods function with an iterable of bytes instances. It returns an iterable of (sheet_name, sheet_rows) pairs.
from stream_read_ods import stream_read_ods
import httpx

def ods_chunks():
    # Iterable that yields the bytes of an ODS file
    with httpx.stream('GET', 'https://www.example.com/my.ods') as r:
        yield from r.iter_bytes(chunk_size=65536)

for sheet_name, sheet_rows in stream_read_ods(ods_chunks()):
    for sheet_row in sheet_rows:
        print(sheet_row)  # Tuple of cells
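Since stream_read_ods only needs an iterable of bytes, the source does not have to be an HTTP response. As a minimal sketch, assuming the ODS file is on local disk, the chunks could come from a generator like the one below. The names file_chunks and print_local_ods are hypothetical helpers, not part of the library.

```python
def file_chunks(path, chunk_size=65536):
    """Yield the contents of a local file as a sequence of bytes chunks."""
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                break
            yield chunk


def print_local_ods(path):
    # Hypothetical helper: imported lazily so file_chunks works on its own
    from stream_read_ods import stream_read_ods

    for sheet_name, sheet_rows in stream_read_ods(file_chunks(path)):
        print(sheet_name)
        # Iterate each sheet's rows to completion before moving to the next sheet
        for sheet_row in sheet_rows:
            print(sheet_row)  # Tuple of cells
```

Only one chunk of the file is held by the generator at a time, which keeps with the streaming approach of the library.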
If the spreadsheet is of a fairly simple structure, then the sheet_rows from above can be passed to the simple_table function to extract the names of the columns and the rows of the table.
from stream_read_ods import stream_read_ods, simple_table

for sheet_name, sheet_rows in stream_read_ods(ods_chunks()):
    columns, rows = simple_table(sheet_rows, skip_rows=2)
    for row in rows:
        print(row)  # Tuple of cells
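To make the idea concrete, the following is a simplified stand-in for this kind of helper, not the library's implementation. It assumes that skip_rows leading rows are discarded, that the next row holds the column names, and that all remaining rows are data. The real simple_table may behave differently in details such as how it handles empty rows or cells. The name simple_table_sketch is hypothetical.

```python
from itertools import islice


def simple_table_sketch(rows, skip_rows=0):
    """Split an iterable of row tuples into (columns, remaining rows)."""
    rows_it = iter(rows)
    # Discard the leading rows, for example a title and a blank line
    for _ in islice(rows_it, skip_rows):
        pass
    # Assumption: the first remaining row holds the column names
    columns = next(rows_it, ())
    # The rest of the iterator is returned lazily, so rows are not buffered
    return columns, rows_it


example_rows = [('Report',), (), ('id', 'name'), (1, 'a'), (2, 'b')]
columns, data_rows = simple_table_sketch(example_rows, skip_rows=2)
print(columns)
for row in data_rows:
    print(row)
```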
This can then be used to construct a pandas DataFrame from a sheet of the ODS file, although doing so stores the entire sheet in memory.
import pandas as pd
from stream_read_ods import stream_read_ods, simple_table

for sheet_name, sheet_rows in stream_read_ods(ods_chunks()):
    columns, rows = simple_table(sheet_rows, skip_rows=2)
    df = pd.DataFrame(rows, columns=columns)
    print(df)
Types
There are 8 possible data types in an Open Document Spreadsheet: boolean, currency, date, float, percentage, string, time, and void. These are converted to Python types according to the following table.
ODS type | Python type |
---|---|
boolean | bool |
currency | stream_read_ods.Currency |
date | date or datetime |
float | Decimal |
percentage | stream_read_ods.Percentage |
string | str |
time | stream_read_ods.Time |
void | NoneType |
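Because cells arrive as a mix of these Python types, downstream code often needs to branch on the type. The sketch below shows one possible way to turn a cell into a JSON-friendly value using only standard-library checks. The function name to_plain and the choice of output formats are assumptions for illustration.

```python
from datetime import date, datetime
from decimal import Decimal


def to_plain(value):
    """Convert a cell value to a JSON-friendly Python value."""
    # void, boolean and string cells can be used as they are
    if value is None or isinstance(value, (bool, str)):
        return value
    # datetime is a subclass of date, so one check covers both
    if isinstance(value, date):
        return value.isoformat()
    # float, currency and percentage cells are all Decimal instances
    if isinstance(value, Decimal):
        return str(value)
    # Anything else, such as a time cell, falls back to its string form
    return str(value)


print(to_plain(Decimal('1.50')))
print(to_plain(datetime(2020, 1, 2, 3, 4, 5)))
```

Converting Decimal to str rather than float avoids losing precision, at the cost of the consumer having to parse it back.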
stream_read_ods.Currency
A subclass of Decimal with an additional attribute code that contains the currency code, for example the string GBP. This can be None if the ODS file does not specify a code.
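To illustrate how a value of this shape behaves, here is a hypothetical stand-in class, CurrencySketch, which mimics the description above. It is not the library's own class, and format_amount is likewise an illustrative helper.

```python
from decimal import Decimal


class CurrencySketch(Decimal):
    """Stand-in for a Decimal subclass that carries a currency code."""

    def __new__(cls, value, code=None):
        obj = super().__new__(cls, value)
        obj.code = code  # For example 'GBP', or None if no code is known
        return obj


def format_amount(value):
    # getattr keeps this working for plain Decimal values too
    code = getattr(value, 'code', None)
    text = f'{value:.2f}'
    return f'{text} {code}' if code is not None else text


price = CurrencySketch('9.99', code='GBP')
print(format_amount(price))
# Arithmetic on a Decimal subclass returns a plain Decimal,
# so read the code before doing calculations
total = price + 1
print(type(total).__name__, format_amount(total))
```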
stream_read_ods.Percentage
A subclass of Decimal.
stream_read_ods.Time
The Python built-in timedelta type is not used because it cannot store intervals of years or months other than by converting them to days, which would lose information.
Instead, a namedtuple is defined, stream_read_ods.Time, with members:
Member | Type |
---|---|
sign | str |
years | int |
months | int |
days | int |
hours | int |
minutes | int |
seconds | Decimal |
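When a value has no years or months component, it can still be converted to a timedelta without losing information. The sketch below uses a stand-in namedtuple, TimeSketch, with the same members as the table above. It assumes that sign is the string '-' for negative durations, which the table does not state, and to_timedelta is a hypothetical helper.

```python
from collections import namedtuple
from datetime import timedelta
from decimal import Decimal

# Stand-in with the same members as stream_read_ods.Time
TimeSketch = namedtuple(
    'TimeSketch',
    ['sign', 'years', 'months', 'days', 'hours', 'minutes', 'seconds'],
)


def to_timedelta(t):
    """Convert to timedelta, refusing values that would lose information."""
    if t.years or t.months:
        raise ValueError('years and months have no fixed length in days')
    delta = timedelta(
        days=t.days,
        hours=t.hours,
        minutes=t.minutes,
        seconds=float(t.seconds),  # timedelta does not accept Decimal
    )
    # Assumption: sign is '-' for negative durations
    return -delta if t.sign == '-' else delta


print(to_timedelta(TimeSketch('+', 0, 0, 1, 2, 3, Decimal('4.5'))))
```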
Running tests
pip install -r requirements-dev.txt
pytest
Exceptions
Exceptions raised by the source iterable are passed through stream_read_ods unchanged. Other exceptions are in the stream_read_ods module, and derive from its StreamReadODSError.
Exception hierarchy
- StreamReadODSError
  Base class for all explicitly-thrown exceptions
  - InvalidOperationError
    - UnfinishedIterationError
      The rows iterator of a sheet has not been iterated to completion
  - InvalidODSFileError (also inherits from the ValueError built-in)
    Base class for errors relating to the bytes of the ODS file not being parsable. Several errors relate to the fact that ODS files are ZIP archives that require specific members and contents.
    - UnzipError
      The ODS file does not appear to be a valid ZIP file. More detail is in the __cause__ member of the raised exception, which is an exception that derives from UnzipValueError in stream-unzip.
    - MissingMIMETypeError
      The MIME type of the file was not present. In ZIP terms, this means that the first file of the ZIP archive is not named mimetype.
    - IncorrectMIMETypeError
      The MIME type was present, but does not match application/vnd.oasis.opendocument.spreadsheet. This can happen if a file such as an Open Document Text (ODT) file is passed rather than an ODS file.
    - MissingContentXMLError
      The file claims to be an ODS file according to its MIME type, but does not contain the required content.xml file that contains the sheet data.
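The hierarchy above suggests catching errors at whichever level suits the caller. The sketch below defines stand-in classes that mirror the documented hierarchy so that it runs without the library installed. In real code these would be imported from stream_read_ods instead. The classify function and its labels are hypothetical.

```python
# Stand-ins mirroring the documented hierarchy. In real code, import
# these from stream_read_ods rather than defining them.
class StreamReadODSError(Exception): pass
class InvalidOperationError(StreamReadODSError): pass
class UnfinishedIterationError(InvalidOperationError): pass
class InvalidODSFileError(StreamReadODSError, ValueError): pass
class UnzipError(InvalidODSFileError): pass
class MissingMIMETypeError(InvalidODSFileError): pass
class IncorrectMIMETypeError(InvalidODSFileError): pass
class MissingContentXMLError(InvalidODSFileError): pass


def classify(exc):
    """Map an exception to a coarse category, most specific base first."""
    if isinstance(exc, InvalidODSFileError):
        return 'bad file'      # The bytes are not a parsable ODS file
    if isinstance(exc, InvalidOperationError):
        return 'misuse'        # For example, a sheet's rows were not exhausted
    if isinstance(exc, StreamReadODSError):
        return 'other library error'
    return 'source error'      # Raised by the source iterable, passed through


try:
    raise IncorrectMIMETypeError('not a spreadsheet')
except ValueError as e:
    # InvalidODSFileError also inherits from ValueError, so this matches
    print(classify(e))
```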