11 projects
invana-engine
GraphQL API and Insights engine for Apache TinkerPop-supported graph databases.
web-parsers
Simple, extensible HTML and XML data extraction engine using YAML configurations and, sometimes, Pythonic functions.
cf-loggers
Logging module that logs data into Elasticsearch. Also supports async logging.
invana-bot
A web spider framework that can transform websites into datasets with a Crawl, Transform and Index workflow.
invana-transformers
A library for transforming JSON with parsers.
topic-suggestor
A lightweight Python module that generates suggested topics for a given topic from sources such as Google and Bing.
webpage-reader
Reads a webpage and extracts information from it based on HTML5 tags and classes.
trawler
A data-gathering framework for searching and retrieving information from web sources.
apache-beam-io-extras
I/O transforms that are missing from the Python SDK but already exist in the Java SDK, based on https://beam.apache.org/documentation/io/built-in/
web-crawler-plus
A micro-framework for crawling web pages using crawler configs. It can use MongoDB, Elasticsearch, and Solr databases to cache and save the extracted data.
django-thumbs-v2
The easiest way to create thumbnails for your images with Django. Works with any storage backend.