Counter for large datasets
Utils for streaming large files (S3, HDFS, GCS, Azure Blob Storage, gzip, bz2, etc.)
Python framework for fast Vector Space Modelling
Persistent dict in Python, backed by sqlite3 and pickle, multithread-safe
GNU cat over the network with autocompletion
Joins large dataframes together
Fast & simple summary for large CSV files
Geographical queries made easy
Derives type annotations from Sphinx comments in Python source
UNIX cat with read support for S3, SSH, etc.
Tools for indexing gzip files to support near-random access
Uploads videos to liveleak.com
Performs Elasticsearch bulk and scroll tasks