
Utils for streaming large files (S3, HDFS, gzip, bz2...)

Project description



smart_open is a Python library for efficient streaming of (very large) files from/to S3. It is well tested (using moto), well documented, and has a dead-simple API.

Amazon’s standard Python library, boto, contains all the necessary building blocks for streaming, but has a really clumsy interface. There are nasty hidden gotchas when you want to stream large files from/to S3 (as opposed to a simple in-memory read/write with key.set_contents_from_string() and key.get_contents_as_string()).

smart_open shields you from that, offering a cleaner API. The result is less code for you to write and fewer bugs to make.


The module has no dependencies beyond 2.6 <= Python < 3.0 and boto:

pip install smart_open

Or, if you prefer to install from the source tar.gz:

python setup.py test  # run unit tests
python setup.py install

To run the unit tests (optional), you’ll also need to install mock and moto.


TODO

  • improve smart_open support for HDFS (streaming from/to the Hadoop Distributed File System)
  • migrate smart_open streaming of gzip/bz2 files from gensim
  • better document support for the default file:// scheme
  • add py3k support



Comments, bug reports

smart_open lives on GitHub. You can file issues or pull requests there.


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Filename                  Size     File type  Python version
smart_open-0.1.1.tar.gz   11.2 kB  Source     None
