Read & write to gzip compressed CSV files.
Project description
Compressed Spreadsheets
Compressed Spreadsheets is a simple Python library for reading & writing to gzip
compressed CSV files using a similar API as the builtin csv.DictReader
and csv.DictWriter
.
Priorities
- Simplicity
- Speed
- Ergonomics
- Compatibility with the API of
DictReader
andDictWriter
(though not the file format)
- Compatibility with the API of
Caveats
This code assumes that each row has to correct number of elements, in order to avoid imposing checks on each row.
The goal of the implementation is to be reasonably fast with a simple implementation. The CSVs it generates won't be compatible with any other library, because of the (simple, easy) way special characters are escaped.
If we were to use StringIO
to create a buffer that a real DictWriter
instance
would write to, and then shuffle this into the compressed file, then we'd have
compatiblity without sacrificing simplicity; however, speed was more important
than compatiblity for my purposes, so I opted for this implementation.
The library does not behave well on sheets with 0 columns.
Installation
From GitHub
Simply download the project and place compressed_spreadsheets.py
into your
project directory; it has no external requirements.
From PyPI
pip install compressed-spreadsheets
Examples
These examples enumerate common use cases. See the docstrings for full documentation.
Writing to a spreadsheet
sheet = CompressedDictWriter.open("my_sheet.csv.gz", ("Column A", "Column B"))
sheet.writeheader()
sheet.writerow({"Column A": "Value 1", "Column B": "Value 2"})
sheet.writerows((
{"Column A": "foo", "Column B": "bar"},
{"Column A": "baz", "Column B": "snafu"}
))
sheet.close()
Reading from a spreadsheet
Calling CompressedDictReader.open(filename)
returns an object we can iterate over
to retrieve our rows.
sheet = CompressedDictReader.open("my_sheet.csv.gz")
# If the optional fieldnames argument is omitted, it is assumed the first line is a header row
next(sheet) # {"Column A": "Value 1", "Column B": "Value 2"}
for row in sheet:
process(row)
Specifying types for fields
The fieldtypes
argument allows you to automatically convert values into their proper types.
write_sheet = CompressedDictWriter.open("my_numbers_sheet.csv.gz", fieldnames=("Column A", "Column B"))
write_sheet.writeheader()
write_sheet.writerow({"Column A": 10, "Column B": 5.1})
write_sheet.close()
read_sheet = CompressedDictReader.open("my_numbers_sheet.csv.gz", fieldtypes={"Column A": int, "Column B": float})
next(read_sheet) # {"Column A": 10, "Column B": 5.1})
Context managers
Both CompressedDictReader
and CompressedDictWriter
can be used as context
managers. This will ensure the file is closed properly.
with CompressedDictWriter.open("my_sheet.csv.gz") as sheet:
for row in data:
sheet.writerow(row)
Contributing
I'm open to contributions, and especially open to bug reports. Please open an issue for any bugs, and please include unit tests & docstrings for any pull requests.
Use pip -r development.txt
to install the testing dependencies. Run tests with
pytest
. If you've made a very significant change or you'd like to hear for computer
fan, you can use pytest --hypothesis-profile hammer
to generate 1000 testcases
for each test.
License
Compressed Spreadsheets is distributed under the MIT license. See LICENSE.txt for the full terms of the license.
Project details
Release history Release notifications | RSS feed
Download files
Download the file for your platform. If you're not sure which to choose, learn more about installing packages.
Source Distribution
Hashes for compressed_spreadsheets-1.0.1.tar.gz
Algorithm | Hash digest | |
---|---|---|
SHA256 | fa7352a0c04caa34bc17b0f32f9bc04ab3538bb5bce2acb9077acb04a7fe005d |
|
MD5 | 382116c09578a76c2fb64f7e3f56ac6f |
|
BLAKE2b-256 | 4f2e36043a2c937827b4bed5cce5f4cbceeebcd6e16f7f71b8a7b222d17afcd7 |