GraphQL API and Insights engine for Apache TinkerPop-supported graph databases.
Simple, extendable HTML and XML data extraction engine using YAML configurations and, sometimes, Pythonic functions.
Logging module that writes log data to Elasticsearch, with async support.
A web spider framework that transforms websites into datasets using a Crawl, Transform and Index workflow.
A library for transforming JSON with parsers.
A lightweight Python module that generates suggested topics for a given topic from sources such as Google and Bing.
Reads a webpage and extracts information from it based on HTML5 tags and classes.
A data-gathering framework to search for and retrieve information from web sources.
The I/O transforms missing from Python that already exist in the Java SDK, based on https://beam.apache.org/documentation/io/built-in/
A micro-framework for crawling web pages using crawler configurations. It can use MongoDB, Elasticsearch and Solr to cache and store the extracted data.
The easiest way to create thumbnails for your images with Django. Works with any storage backend.