Profile of rrmerugu

invana-engine

Last released Jun 2, 2021

GraphQL API and insights engine for graph databases supported by Apache TinkerPop.

web-parsers

Last released Mar 20, 2020

Simple, extendable HTML and XML data extraction engine using YAML configurations and, sometimes, Pythonic functions.

cf-loggers

Last released Oct 17, 2019

Logging module that logs data into Elasticsearch. Supports async too.

invana-bot

Last released Aug 14, 2019

A web spider framework that can transform websites into datasets with a Crawl, Transform and Index workflow.

invana-transformers

Last released Mar 11, 2019

A library for transforming JSON with parsers.

topic-suggestor

Last released Oct 30, 2018

A lightweight Python module that generates suggested topics for a given topic from sources like Google and Bing.

webpage-reader

Last released Sep 17, 2018

Reads a webpage and extracts information from it based on its HTML5 tags and classes.

trawler

Last released Sep 6, 2018

A data-gathering framework to search for and retrieve information from web sources.

apache-beam-io-extras

Last released Aug 28, 2018

The missing I/O transforms for Python that already exist in the Java SDK, based on https://beam.apache.org/documentation/io/built-in/

web-crawler-plus

Last released Apr 1, 2018

A micro-framework to crawl web pages with crawler configs. It can use MongoDB, Elasticsearch, and Solr databases to cache and save the extracted data.

django-thumbs-v2

Last released Mar 18, 2018

The easiest way to create thumbnails for your images with Django. Works with any storage backend.

11 projects