Utils for streaming large files (S3, HDFS, gzip, bz2...)
What?
smart_open is a Python library for efficient streaming of (very large) files from/to S3. It is well tested (using moto), well documented and has a dead simple API:
FIXME EXAMPLES
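The examples are still a FIXME above, so here is only a hedged sketch of the kind of one-call, line-by-line streaming interface the description promises — a hypothetical `open_uri` helper covering just the local file:// scheme, not the library's actual API (which also dispatches on s3:// and friends):

```python
from urllib.parse import urlparse  # the urlparse module on Python 2

def open_uri(uri, mode="rb"):
    """Open a URI for streaming (hypothetical helper, sketch only).

    Only the file:// scheme and bare local paths are handled here;
    the real library's point is doing the same for remote schemes.
    """
    parsed = urlparse(uri)
    if parsed.scheme in ("", "file"):
        return open(parsed.path or uri, mode)
    raise NotImplementedError("scheme %r is not part of this sketch" % parsed.scheme)

# Iterating the returned file object streams line by line,
# so memory use stays constant however large the file is:
#
#   for line in open_uri("file:///tmp/example.txt"):
#       process(line)
```

The design point is that the caller never touches scheme-specific plumbing: one call returns a file-like object, and everything after that is ordinary Python iteration.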
Why?
boto, the standard Python library for Amazon Web Services, contains all the necessary building blocks for streaming, but its interface is clumsy. There are nasty hidden gotchas when you want to stream large files from/to S3 (as opposed to simple in-memory read/write with key.set_contents_from_string() and key.get_contents_as_string()).
smart_open shields you from that, offering a cleaner API. The result is less code for you to write and fewer bugs to make.
Installation
The module has no dependencies beyond Python itself (2.6 <= Python < 3.0) and boto:
pip install smart_open
Or, if you prefer to install from the source tar.gz:

python setup.py test     # run unit tests
python setup.py install
To run the unit tests (optional), you’ll also need to install mock and moto.
Todo
improve smart_open support for HDFS (streaming from/to Hadoop File System)
migrate smart_open streaming of gzip/bz2 files from gensim
better document support for the default file:// scheme
add py3k support
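The gzip/bz2 streaming slated for migration from gensim can be sketched with the standard library alone: pick a decompressing opener from the file extension, then iterate line by line. This is a hypothetical `stream_lines` helper illustrating the technique, not the code being migrated:

```python
import bz2
import gzip
import os

# Map compressed extensions to stream-capable openers; anything else
# falls back to plain open(). Decompression happens incrementally as
# lines are consumed, so memory use stays constant regardless of size.
_OPENERS = {".gz": gzip.open, ".bz2": bz2.open}

def stream_lines(path):
    """Yield raw lines from a possibly gzip/bz2-compressed file."""
    _, ext = os.path.splitext(path)
    opener = _OPENERS.get(ext, open)
    with opener(path, "rb") as fh:
        for line in fh:
            yield line
```

Because the result is a generator, callers can process arbitrarily large compressed files with the same `for line in ...` loop they would use on a plain text file.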
Documentation
FIXME TODO help()
Comments, bug reports
smart_open lives on GitHub. You can file issues or pull requests there.