
Utils for streaming large files (S3, HDFS, gzip, bz2...)

Project description


What?

smart_open is a Python library for efficient streaming of (very large) files from/to S3. It is well tested (using moto), well documented, and comes with a dead simple API:

FIXME: EXAMPLES
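
Until those examples are written, here is a minimal sketch of the kind of usage the description implies. The smart_open.smart_open() entry point, the s3:// URI form, and the read/write semantics shown are assumptions, not documented API; the bucket and key names are made up:

import smart_open

# stream the lines of a (possibly very large) S3 object,
# never holding the whole object in memory
for line in smart_open.smart_open('s3://mybucket/mykey.txt'):
    print line

# stream data out to S3 the same way, writing incrementally
with smart_open.smart_open('s3://mybucket/output.txt', 'wb') as fout:
    for line in ['first line', 'second line', 'third line']:
        fout.write(line + '\n')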

Why?

Amazon’s standard Python library, boto, contains all the necessary building blocks for streaming, but its interface is really clumsy. There are nasty hidden gotchas when you want to stream large files from/to S3, as opposed to simple in-memory reads/writes with key.set_contents_from_string() and key.get_contents_as_string().

smart_open shields you from that, offering a cleaner API. The result is less code for you to write and fewer bugs to make.
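
For contrast, the plain-boto, in-memory pattern mentioned above looks roughly like this. boto.connect_s3(), get_bucket(), get_key() and the two key methods are real boto 2.x calls, but the bucket and key names are made up; the point is that everything passes through memory in one piece:

import boto

conn = boto.connect_s3()
bucket = conn.get_bucket('mybucket')
key = bucket.get_key('mykey.txt')

# both calls materialise the entire object in RAM --
# fine for a few MB, painful for multi-gigabyte files
data = key.get_contents_as_string()
key.set_contents_from_string('replacement contents')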

Installation

The module has no dependencies beyond 2.6 <= Python < 3.0 and boto:

pip install smart_open

Or, if you prefer to install from the source tar.gz:

python setup.py test # run unit tests
python setup.py install

To run the unit tests (optional), you’ll also need to install mock and moto.

Todo

  • improve smart_open support for HDFS (streaming from/to Hadoop File System)

  • migrate smart_open streaming of gzip/bz2 files from gensim

  • better document support for the default file:// scheme (a usage sketch follows this list)

  • add py3k support
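
The default file:// scheme mentioned in the Todo above suggests the same entry point also streams plain local files. A minimal sketch, assuming an ordinary path is accepted as-is:

import smart_open

# local files go through the default file:// scheme;
# plain paths are assumed to work here
line_count = 0
for line in smart_open.smart_open('./some_local_file.txt'):
    line_count += 1
print line_count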

Documentation

FIXME TODO help()
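
Until the documentation is filled in, the built-in docstrings are the closest thing to a reference; a minimal sketch, where the smart_open.smart_open attribute name is an assumption:

import smart_open

help(smart_open)              # module-level overview from the docstrings
help(smart_open.smart_open)   # main entry point (attribute name assumed)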

Comments, bug reports

smart_open lives on GitHub. You can file issues or pull requests there.
