11 projects
brozzler
Distributed web crawling with browsers
warcprox
WARC-writing MITM HTTP/S proxy
arklet
An unassuming ARK minter, binder, and resolver
warctools
Command-line tools and libraries for handling and manipulating WARC files (and HTTP contents)
doublethink
RethinkDB Python library
snakebite-py3
Pure Python HDFS client
Trough
surt
Sort-friendly URI Reordering Transform (SURT) Python package
urlcanon
URL canonicalization library for Python and Java
ujson-ia
Ultra fast JSON encoder and decoder for Python (Internet Archive fork)
rethinkstuff
Rudimentary RethinkDB Python library with some smarts, perhaps some dumbs