Utils for streaming large files (S3, HDFS, gzip, bz2...)
smart_open is a Python library for efficient streaming of (very large) files from/to S3. It is well tested (using moto), well documented, and has a dead simple API:
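The API mirrors Python's built-in file iteration: you open a URI and iterate over it line by line, never holding the whole object in memory. A minimal sketch of that pattern, using a local file and only the standard library so it runs without AWS credentials (the `smart_open.smart_open('s3://...')` call shape shown in the comment is an assumption based on this README; check `help(smart_open)` for the exact signature):

```python
import os
import tempfile

# With smart_open, streaming from S3 looks like ordinary file iteration:
#   for line in smart_open.smart_open('s3://mybucket/mykey.txt'):
#       process(line)
# The same pattern against a local file, stdlib only:
path = os.path.join(tempfile.mkdtemp(), 'demo.txt')
with open(path, 'wb') as f:
    f.write(b'first line\nsecond line\n')

lines = []
with open(path, 'rb') as f:
    for line in f:  # reads one line at a time; memory use stays constant
        lines.append(line)
print(len(lines))  # 2
```

The point is that the consumer code is identical whether the bytes come from local disk or from S3; only the URI changes.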
Amazon's standard Python library, boto, contains all the necessary building blocks for streaming, but has a really clumsy interface. There are nasty hidden gotchas when you want to stream large files from/to S3 (as opposed to simple in-memory read/write with key.set_contents_from_string() and key.get_contents_as_string()).
smart_open shields you from that, offering a cleaner API. The result is less code for you to write and fewer bugs to make.
The module has no dependencies beyond 2.6 <= Python < 3.0 and boto:
pip install smart_open
Or, if you prefer to install from the source tar.gz:

python setup.py test  # run unit tests
python setup.py install
TODO:

- improve smart_open support for HDFS (streaming from/to the Hadoop Distributed File System)
- migrate smart_open streaming of gzip/bz2 files from gensim
- better document support for the default file:// scheme
- add py3k support
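For the planned gzip/bz2 streaming, the standard library already supports incremental decompression, so compressed files can be iterated line by line without ever materializing the decompressed data on disk or in memory. A stdlib-only sketch of that technique (independent of gensim's implementation, which this TODO refers to):

```python
import gzip
import os
import tempfile

# Write a small gzip file to stream from.
path = os.path.join(tempfile.mkdtemp(), 'demo.txt.gz')
with gzip.open(path, 'wb') as f:
    f.write(b'alpha\nbeta\ngamma\n')

# Iterate over the compressed file line by line; gzip decompresses
# incrementally, so memory use stays constant regardless of file size.
count = 0
with gzip.open(path, 'rb') as f:
    for line in f:
        count += 1
print(count)  # 3
```

The bz2 module offers the same pattern via bz2.BZ2File, so the smart_open wrapper can dispatch on file extension and hand back a uniform line iterator.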