Utils for streaming large files (S3, HDFS, GCS, Azure Blob Storage, gzip, bz2...)
Python framework for fast Vector Space Modelling
Persistent dict in Python, backed by sqlite3 and pickle, multithread-safe.
Counter for large datasets
GNU cat over the network with autocompletion
Joins large dataframes together
Fast & simple summary for large CSV files
Geographical queries made easy.
Derives type annotations from Sphinx comments in Python source
Tools for indexing gzip files to support random-like access.