stream-read-ods
Python function to extract data from an ODS spreadsheet on the fly, without having to store the entire file in memory or on disk.
To construct ODS spreadsheets on the fly, try stream-write-ods.
Installation
pip install stream-read-ods
Usage
To extract the rows, call the stream_read_ods function with an iterable of bytes instances. It returns an iterable of (sheet_name, sheet_rows) pairs.
from stream_read_ods import stream_read_ods
import httpx

def ods_chunks():
    # Iterable that yields the bytes of an ODS file
    with httpx.stream('GET', 'https://www.example.com/my.ods') as r:
        yield from r.iter_bytes(chunk_size=65536)

for sheet_name, sheet_rows in stream_read_ods(ods_chunks()):
    for sheet_row in sheet_rows:
        print(sheet_row)  # Tuple of cells
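The input does not have to come from HTTP, since the function takes an iterable of bytes instances. As a minimal sketch, assuming the ODS file is already on local disk, a generator like the following yields it in fixed-size chunks. The name file_chunks and its default chunk size are illustrative choices, not part of stream-read-ods.

```python
def file_chunks(path, chunk_size=65536):
    # Hypothetical helper (not part of stream-read-ods): yields the
    # contents of a local file as a sequence of bytes objects
    with open(path, 'rb') as f:
        while True:
            chunk = f.read(chunk_size)
            if not chunk:
                # An empty read means the end of the file was reached
                break
            yield chunk
```

The result could then be passed in place of ods_chunks(), for example stream_read_ods(file_chunks('my.ods')).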
If the spreadsheet has a fairly simple structure, the sheet_rows from above can be passed to the simple_table function to extract the column names and the rows of the table.
from stream_read_ods import stream_read_ods, simple_table

for sheet_name, sheet_rows in stream_read_ods(ods_chunks()):
    columns, rows = simple_table(sheet_rows, skip_rows=2)
    for row in rows:
        print(row)  # Tuple of cells
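To make the idea of a "simple" table concrete, here is a rough sketch of splitting an iterable of rows into a header row and data rows. This is not the library's implementation of simple_table: the helper name header_and_rows is hypothetical, and it assumes that skip_rows counts rows to discard before the header row.

```python
from itertools import islice

def header_and_rows(sheet_rows, skip_rows=0):
    # Illustration only, NOT the library's simple_table: assumes the first
    # row after the skipped ones holds the column names
    rows = iter(sheet_rows)
    # Discard the leading rows (e.g. titles or notes above the table)
    for _ in islice(rows, skip_rows):
        pass
    columns = next(rows)
    # The remaining rows are returned lazily, so nothing is buffered
    return columns, rows

sheet = [('Report',), (None,), ('id', 'name'), (1, 'a'), (2, 'b')]
columns, rows = header_and_rows(sheet, skip_rows=2)
print(columns)     # ('id', 'name')
print(list(rows))  # [(1, 'a'), (2, 'b')]
```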
This can then be used to construct a pandas DataFrame from the ODS file, although doing so stores the entire sheet in memory.
import pandas as pd
from stream_read_ods import stream_read_ods, simple_table

for sheet_name, sheet_rows in stream_read_ods(ods_chunks()):
    columns, rows = simple_table(sheet_rows, skip_rows=2)
    df = pd.DataFrame(rows, columns=columns)
    print(df)
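If holding a whole sheet in memory is a concern, one option is to group the rows into fixed-size batches and process each batch separately. The following is a sketch under that assumption; in_batches is a hypothetical helper, not something stream-read-ods provides.

```python
from itertools import islice

def in_batches(rows, batch_size):
    # Hypothetical helper: groups any iterable of rows into lists of at
    # most batch_size rows, holding only one batch in memory at a time
    it = iter(rows)
    while True:
        batch = list(islice(it, batch_size))
        if not batch:
            return
        yield batch

for batch in in_batches([(1, 'a'), (2, 'b'), (3, 'c')], batch_size=2):
    print(batch)
```

Each batch could then be turned into its own smaller dataframe, for example with pd.DataFrame(batch, columns=columns).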
Types
There are 8 possible data types in an Open Document Spreadsheet: boolean, currency, date, float, percentage, string, time, and void. These are converted to Python types according to the following table.
ODS type | Python type |
---|---|
boolean | bool |
currency | stream_read_ods.Currency |
date | date or datetime |
float | Decimal |
percentage | stream_read_ods.Percentage |
string | str |
time | stream_read_ods.Time |
void | NoneType |
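Because numeric cells arrive as Decimal rather than float, they may need converting before being passed to code that expects built-in types only. As a hedged sketch, the hypothetical helper to_jsonable below prepares some of the types in the table for json.dumps. It uses plain Decimal and date values so that it runs without the library installed, and it does not handle stream_read_ods.Time cells.

```python
import json
from datetime import date, datetime
from decimal import Decimal

def to_jsonable(cell):
    # Decimal (and its subclasses) become strings to avoid float rounding
    if isinstance(cell, Decimal):
        return str(cell)
    # datetime is a subclass of date, so this covers both
    if isinstance(cell, date):
        return cell.isoformat()
    # bool, str and None are already JSON-compatible
    return cell

row = (True, Decimal('1.10'), date(2023, 1, 31), 'text', None)
print(json.dumps([to_jsonable(cell) for cell in row]))
# [true, "1.10", "2023-01-31", "text", null]
```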
stream_read_ods.Currency
A subclass of Decimal.
stream_read_ods.Percentage
A subclass of Decimal.
stream_read_ods.Time
The Python built-in timedelta type is not used, since timedelta cannot store intervals of years or months other than by converting them to days, which would lose information.
Instead, a namedtuple, stream_read_ods.Time, is defined with the following members:
Member | Type |
---|---|
sign | str |
years | int |
months | int |
days | int |
hours | int |
minutes | int |
seconds | Decimal |
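When an interval has no years or months, converting it to a timedelta loses nothing about the calendar units. The sketch below shows one way this might be done. It defines a stand-in namedtuple with the same members as the table above so that it runs without the library, and it assumes that a sign of '-' marks a negative interval; to_timedelta is a hypothetical helper.

```python
from collections import namedtuple
from datetime import timedelta
from decimal import Decimal

# Stand-in with the same fields as stream_read_ods.Time, so that this
# sketch runs without the library installed
Time = namedtuple('Time', ['sign', 'years', 'months', 'days', 'hours', 'minutes', 'seconds'])

def to_timedelta(t):
    # Years and months have no fixed length, so refuse to guess
    if t.years or t.months:
        raise ValueError('cannot convert years or months to a timedelta')
    delta = timedelta(
        days=t.days, hours=t.hours, minutes=t.minutes,
        seconds=float(t.seconds),  # timedelta does not accept Decimal
    )
    # Assumption: a sign of '-' marks a negative interval
    return -delta if t.sign == '-' else delta

print(to_timedelta(Time('+', 0, 0, 1, 2, 30, Decimal('15.5'))))
```

Note that timedelta has microsecond resolution, so any finer precision in the Decimal seconds value is dropped by this conversion.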
Running tests
pip install -r requirements-dev.txt
pytest