
The Stingray Schema-Based File Reader


Spreadsheet-format files are the lingua franca of data processing. CSV, Tab, XLS, XLSX, and ODS files are used widely. Python’s csv module handles two common formats. Add-on packages are required for the variety of other physical file formats.

The problem is that each add-on package has a unique view of the underlying data.

The Stingray Schema-Based File Reader offers several features to help process files in spreadsheet formats.

  1. It wraps format-specific modules with a unified “workbook” Facade so that applications can work with any of the physical formats.

  2. It extends the workbook concept to include non-delimited files, including COBOL files encoded in any of the Unicode encodings, as well as ASCII and EBCDIC.

  3. It provides a uniform way to load and use schema information based on JSONSchema. A schema can be as small as header rows in the individual sheets of a workbook, or it can be separate schema information in another spreadsheet, a JSONSchema document, or COBOL “copybook” data definitions.

  4. It provides a suite of data conversions that cover the most common cases.
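The Facade idea in the first two items can be illustrated with a minimal sketch. This is not Stingray's API: the names `RowSource`, `CSVSource`, `FixedWidthSource`, and `total` are hypothetical, chosen only to show how a delimited and a non-delimited format can sit behind one interface so the application code does not change.

```python
import csv
import io
from collections.abc import Iterator
from typing import Any, Protocol, TextIO


class RowSource(Protocol):
    """Hypothetical common interface: every format yields rows as dicts."""

    def rows(self) -> Iterator[dict[str, Any]]: ...


class CSVSource:
    """Delimited format: the header row supplies the field names."""

    def __init__(self, file: TextIO) -> None:
        self.file = file

    def rows(self) -> Iterator[dict[str, Any]]:
        yield from csv.DictReader(self.file)


class FixedWidthSource:
    """Non-delimited format: an external layout supplies names and positions."""

    def __init__(self, file: TextIO, layout: dict[str, slice]) -> None:
        self.file = file
        self.layout = layout

    def rows(self) -> Iterator[dict[str, Any]]:
        for line in self.file:
            line = line.rstrip("\n")
            if line:
                yield {name: line[span].strip() for name, span in self.layout.items()}


def total(source: RowSource, field: str) -> float:
    """Application logic that never sees the physical format."""
    return sum(float(row[field]) for row in source.rows())


csv_file = io.StringIO("name,amount\na,1.5\nb,2.5\n")
fixed_file = io.StringIO("a    1.5\nb    2.5\n")
layout = {"name": slice(0, 5), "amount": slice(5, 8)}

csv_total = total(CSVSource(csv_file), "amount")
fixed_total = total(FixedWidthSource(fixed_file, layout), "amount")
```

In this sketch the fixed-width layout plays the role that a schema plays for files without header rows; the real project derives such information from JSONSchema or COBOL copybooks.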

Additionally, the Stingray Reader provides some guidance on how to structure file-processing applications so that they are testable and composable.
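One common way to get testable, composable file processing is to separate row-to-object conversion from the processing loop. The sketch below is a generic illustration of that idea and is not taken from the Stingray documentation; `Reading`, `build_reading`, `readings`, and `mean_depth` are hypothetical names.

```python
from collections.abc import Iterable, Iterator
from dataclasses import dataclass
from typing import Any


@dataclass(frozen=True)
class Reading:
    """Application object built from one row."""

    station: str
    depth: float


def build_reading(row: dict[str, Any]) -> Reading:
    """Convert one raw row; testable with a plain dict, no file needed."""
    return Reading(station=row["station"].strip(), depth=float(row["depth"]))


def readings(rows: Iterable[dict[str, Any]]) -> Iterator[Reading]:
    """Compose the builder with any iterable of rows, whatever its source."""
    return map(build_reading, rows)


def mean_depth(items: Iterable[Reading]) -> float:
    """Processing step that depends only on application objects."""
    depths = [item.depth for item in items]
    return sum(depths) / len(depths)


sample_rows = [
    {"station": " A ", "depth": "2.0"},
    {"station": "B", "depth": "4.0"},
]
result = mean_depth(readings(sample_rows))
```

Because each function accepts plain iterables, unit tests can pass in lists of dicts, and the same functions can later be fed rows from any workbook format.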

Stingray 5.1 requires Python >= 3.12. The code is fully annotated with type hints.

Stingray depends on additional projects to read .XLS, .XLSX, .ODS, and .NUMBERS files.

  • CSV files are built-in using the csv module.

  • COBOL files are built-in using the estruct and cobol_parser modules.

  • NDJSON or JSON Newline files are JSON with an extra provision that each document must be complete on one physical line. These use the built-in json module.

  • XLS files are read via the xlrd project: http://www.lexicon.net/sjmachin/xlrd.htm

  • XLSX files are read via two projects: https://openpyxl.readthedocs.io/en/stable/

  • Numbers files (v13 and higher) use protobuf and snappy compression. See https://pypi.org/project/numbers-parser/.

  • YAML files can be a sequence of documents, permitting a direct mapping to a Workbook with a single Sheet.

  • TOML files are – in effect – giant dictionaries with flexible syntax and can be described by a JSONSchema.

  • XML files can be wrapped in a Workbook. There’s no automated translation from XSD to JSONSchema here. A sample is provided, but this may not solve very many problems in general.

  • ODS files are read via http://docs.pyexcel.org/. Note: ODS file processing currently has problems with the 0.7.0 release.
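The NDJSON rule in the list above (one complete JSON document per physical line) can be sketched with only the standard json module. This is an illustration of the format, not Stingray's reader; `ndjson_documents` is a hypothetical helper name, and skipping blank lines is an assumption of this sketch.

```python
import io
import json
from collections.abc import Iterator
from typing import Any, TextIO


def ndjson_documents(source: TextIO) -> Iterator[Any]:
    """Yield one parsed document per non-blank line."""
    for number, line in enumerate(source, start=1):
        if not line.strip():
            continue  # assumption: blank lines are ignored
        try:
            yield json.loads(line)
        except json.JSONDecodeError as error:
            # A document split across lines fails here, which enforces the rule.
            raise ValueError(f"line {number}: not a complete JSON document") from error


sample = io.StringIO('{"a": 1}\n\n{"a": 2}\n')
documents = list(ndjson_documents(sample))
```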

A file-suffix registry is used to map a suffix to a Workbook subclass that handles the physical format. A decorator is used to add or replace file suffix mappings, permitting an application to fold in extensions.
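A decorator-driven suffix registry of this kind might look like the following sketch. The names `file_suffix`, `book_class_for`, `CSVBook`, and `NDJSONBook` are hypothetical stand-ins, not the project's actual decorator or classes; the sketch only shows the mechanism of adding and replacing mappings.

```python
from pathlib import Path
from typing import Callable

# Hypothetical registry: lower-cased suffix -> class that handles the format.
_registry: dict[str, type] = {}


def file_suffix(*suffixes: str) -> Callable[[type], type]:
    """Class decorator that adds or replaces suffix mappings."""

    def decorator(cls: type) -> type:
        for suffix in suffixes:
            _registry[suffix.lower()] = cls
        return cls

    return decorator


@file_suffix(".csv")
class CSVBook:
    """Placeholder for a CSV handler."""


@file_suffix(".json", ".ndjson")
class NDJSONBook:
    """Placeholder for an NDJSON handler."""


def book_class_for(path: str) -> type:
    """Look up the handler class for a file name by its suffix."""
    suffix = Path(path).suffix.lower()
    try:
        return _registry[suffix]
    except KeyError:
        raise ValueError(f"no handler registered for {suffix!r}") from None


before = book_class_for("data/Sample.CSV")


# An application can fold in its own extension by registering the same suffix again.
@file_suffix(".csv")
class CustomCSVBook(CSVBook):
    """Placeholder for an application-specific replacement."""


after = book_class_for("data/Sample.CSV")
```

Registering at class-definition time keeps the mapping next to the class it describes, and a later registration for the same suffix simply overrides the earlier one.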

Installation

python -m pip install stingray-reader

Or, using uv:

uv add stingray-reader

Note that there’s a tall stack of dependencies.


