
Read from and write to gzip-compressed CSV files.

Project description

Compressed Spreadsheets

Compressed Spreadsheets is a simple Python library for reading from and writing to gzip-compressed CSV files, with an API similar to that of the built-in csv.DictReader and csv.DictWriter.

Priorities

  • Simplicity
  • Speed
  • Ergonomics
    • Compatibility with the API of DictReader and DictWriter (though not the file format)

Caveats

This code assumes that each row has the correct number of elements; rows are not checked, to avoid the cost of validating every row.
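Since rows aren't checked, the library won't flag a row with missing or extra fields. If you want that safety for data you don't control, you can validate rows yourself before writing them. Below is a minimal sketch; check_row is a hypothetical helper, not part of compressed_spreadsheets, and it assumes rows are dicts keyed by the sheet's fieldnames.

```python
from typing import Mapping, Sequence


def check_row(row: Mapping[str, object], fieldnames: Sequence[str]) -> None:
    """Raise ValueError unless `row` has exactly the keys in `fieldnames`.

    Hypothetical helper: compressed_spreadsheets skips this check for speed,
    so callers who want it have to run it themselves.
    """
    expected = set(fieldnames)
    actual = set(row)
    missing = expected - actual
    extra = actual - expected
    if missing or extra:
        raise ValueError(
            f"missing fields {sorted(missing)}, unexpected fields {sorted(extra)}"
        )
```

Running it on every row reintroduces exactly the per-row cost the library avoids, so it may be worth applying only to untrusted input.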

The goal is to be reasonably fast while keeping the implementation simple. The CSVs it generates won't be compatible with any other library, because of the (simple, easy) way special characters are escaped.
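The exact escaping rules aren't described here. As a hypothetical illustration of this kind of scheme (not the library's actual format), the sketch below replaces backslashes, commas, and line breaks with short backslash sequences, so a row can be rebuilt with a plain str.split(",") instead of a full CSV parser. A standard CSV reader would leave sequences like \c untouched, which is the sort of incompatibility described above.

```python
import re

# Hypothetical escape table: each special character becomes a two-character
# backslash sequence containing no comma and no line break.
_ESCAPES = {"\\": "\\\\", ",": "\\c", "\n": "\\n", "\r": "\\r"}
# Reverse lookup keyed by the character after the backslash.
_UNESCAPES = {v[1]: k for k, v in _ESCAPES.items()}
_ESCAPE_RE = re.compile(r"[\\,\n\r]")
_UNESCAPE_RE = re.compile(r"\\(.)")


def encode_line(values):
    """Escape each string value and join with commas into a single line."""
    return ",".join(_ESCAPE_RE.sub(lambda m: _ESCAPES[m.group(0)], v) for v in values)


def decode_line(line):
    """Split on raw commas, then undo the escapes field by field."""
    return [
        _UNESCAPE_RE.sub(lambda m: _UNESCAPES[m.group(1)], field)
        for field in line.split(",")
    ]
```

Note that in this sketch decode_line(encode_line([])) returns [''] rather than [], so a split-based format like this cannot distinguish zero columns from one empty column.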

If we used StringIO to create a buffer for a real DictWriter instance to write to, and then copied the buffer's contents into the compressed file, we'd get compatibility without sacrificing simplicity. However, speed was more important than compatibility for my purposes, so I opted for this implementation.
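The StringIO approach described above might look something like the following standard-library sketch. BufferedGzipDictWriter is a hypothetical name, not part of compressed_spreadsheets: each row is formatted by a real csv.DictWriter into an io.StringIO buffer, and the buffer's text is then written to a gzip file opened in text mode.

```python
import csv
import gzip
import io


class BufferedGzipDictWriter:
    """Hypothetical sketch: csv.DictWriter -> StringIO buffer -> gzip file."""

    def __init__(self, path, fieldnames):
        # newline="" stops the text layer from translating the csv module's
        # "\r\n" line terminators.
        self._gz = gzip.open(path, "wt", encoding="utf-8", newline="")
        self._buffer = io.StringIO()
        self._writer = csv.DictWriter(self._buffer, fieldnames=fieldnames)

    def _flush(self):
        # Copy what the DictWriter produced into the compressed file,
        # then empty the buffer for the next call.
        self._gz.write(self._buffer.getvalue())
        self._buffer.seek(0)
        self._buffer.truncate(0)

    def writeheader(self):
        self._writer.writeheader()
        self._flush()

    def writerow(self, row):
        self._writer.writerow(row)
        self._flush()

    def writerows(self, rows):
        self._writer.writerows(rows)
        self._flush()

    def close(self):
        self._gz.close()
```

The output is ordinary CSV, so it can be read back with csv.DictReader over gzip.open(path, "rt", newline=""). The extra copy through the buffer and the csv module's quoting on every row are the kind of per-row overhead traded away here for speed. (A text-mode gzip.open file could also be handed to csv.DictWriter directly; the explicit buffer just mirrors the approach described.)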

The library does not behave well on sheets with 0 columns.

Installation

From GitHub

Simply download the project and place compressed_spreadsheets.py into your project directory; it has no external requirements.

From PyPI

pip install compressed-spreadsheets

Examples

These examples enumerate common use cases. See the docstrings for full documentation.

Writing to a spreadsheet

from compressed_spreadsheets import CompressedDictWriter
sheet = CompressedDictWriter.open("my_sheet.csv.gz", ("Column A", "Column B"))
sheet.writeheader()
sheet.writerow({"Column A": "Value 1", "Column B": "Value 2"})
sheet.writerows((
    {"Column A": "foo", "Column B": "bar"},
    {"Column A": "baz", "Column B": "snafu"}
))
sheet.close()

Reading from a spreadsheet

Calling CompressedDictReader.open(filename) returns an object we can iterate over to retrieve our rows.

from compressed_spreadsheets import CompressedDictReader
sheet = CompressedDictReader.open("my_sheet.csv.gz")
# If the optional fieldnames argument is omitted, the first line is assumed to be a header row
next(sheet)  # {"Column A": "Value 1", "Column B": "Value 2"}
for row in sheet:
    process(row)  # replace process() with your own handling

Specifying types for fields

The fieldtypes argument allows you to automatically convert values into their proper types.

from compressed_spreadsheets import CompressedDictReader, CompressedDictWriter
write_sheet = CompressedDictWriter.open("my_numbers_sheet.csv.gz", fieldnames=("Column A", "Column B"))
write_sheet.writeheader()
write_sheet.writerow({"Column A": 10, "Column B": 5.1})
write_sheet.close()

read_sheet = CompressedDictReader.open("my_numbers_sheet.csv.gz", fieldtypes={"Column A": int, "Column B": float})
next(read_sheet)  # {"Column A": 10, "Column B": 5.1}
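Conceptually, fieldtypes maps a column name to a callable that is applied to that column's string value. The sketch below shows the idea with a hypothetical apply_fieldtypes helper; it is not necessarily how the library implements the conversion, and it leaves columns without an entry as strings.

```python
from typing import Any, Callable, Dict, Mapping


def apply_fieldtypes(
    row: Mapping[str, str],
    fieldtypes: Mapping[str, Callable[[str], Any]],
) -> Dict[str, Any]:
    """Return a copy of `row` with each listed column passed through its converter.

    Hypothetical helper; columns missing from `fieldtypes` go through str,
    which leaves string values unchanged.
    """
    return {name: fieldtypes.get(name, str)(value) for name, value in row.items()}


converted = apply_fieldtypes(
    {"Column A": "10", "Column B": "5.1"},
    {"Column A": int, "Column B": float},
)
print(converted)  # {'Column A': 10, 'Column B': 5.1}
```

One thing to watch when converters are plain callables, as in this sketch: bool("False") is True, because any non-empty string is truthy, so a small custom parser is safer than bool for boolean columns.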

Context managers

Both CompressedDictReader and CompressedDictWriter can be used as context managers. This will ensure the file is closed properly.

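from compressed_spreadsheets import CompressedDictWriter
data = [{"Column A": "foo", "Column B": "bar"}]
with CompressedDictWriter.open("my_sheet.csv.gz", ("Column A", "Column B")) as sheet:
    sheet.writeheader()
    for row in data:
        sheet.writerow(row)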
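Context manager support comes from two methods: __enter__, which returns the object bound by as, and __exit__, which runs when the with block ends, whether normally or through an exception. The sketch below shows that pattern on a hypothetical GzipLineWriter class; it is not the library's code.

```python
import gzip


class GzipLineWriter:
    """Hypothetical minimal writer illustrating `with` support."""

    def __init__(self, path):
        self._file = gzip.open(path, "wt", encoding="utf-8")

    def writeline(self, text):
        self._file.write(text + "\n")

    def close(self):
        self._file.close()

    def __enter__(self):
        # The value returned here is what `as` binds to.
        return self

    def __exit__(self, exc_type, exc, tb):
        # Always close the file, even if the with-block raised.
        self.close()
        return False  # do not suppress exceptions from the with-block
```

For an object that only has a close() method, contextlib.closing(obj) gives the same guarantee without writing these two methods.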

Contributing

I'm open to contributions, and especially open to bug reports. Please open an issue for any bugs, and please include unit tests & docstrings for any pull requests.

Use pip install -r development.txt to install the testing dependencies, and run the tests with pytest. If you've made a very significant change, or you'd like to hear your computer fan, you can use pytest --hypothesis-profile hammer to generate 1000 test cases for each test.

License

Compressed Spreadsheets is distributed under the MIT license. See LICENSE.txt for the full terms of the license.


Download files


Source Distribution

compressed_spreadsheets-1.0.1.tar.gz (4.9 kB)

File details

Details for the file compressed_spreadsheets-1.0.1.tar.gz.

File metadata

  • Download URL: compressed_spreadsheets-1.0.1.tar.gz
  • Size: 4.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.3.0 pkginfo/1.6.1 requests/2.25.1 setuptools/49.2.1 requests-toolbelt/0.9.1 tqdm/4.55.1 CPython/3.9.1

File hashes

Hashes for compressed_spreadsheets-1.0.1.tar.gz

  • SHA256: fa7352a0c04caa34bc17b0f32f9bc04ab3538bb5bce2acb9077acb04a7fe005d
  • MD5: 382116c09578a76c2fb64f7e3f56ac6f
  • BLAKE2b-256: 4f2e36043a2c937827b4bed5cce5f4cbceeebcd6e16f7f71b8a7b222d17afcd7
