Observe PoI text data from the various sources, segment it and then inform about it
Obsei: OBserve, SEgment and Inform
Note: Major breaking changes are on the way. Please use a released version instead of the master branch. To track the progress of the next release, refer to Release Progress.
Obsei is intended to be a workflow automation tool for text segmentation needs. Obsei consists of -
- OBserver, which observes platforms like Twitter, Facebook, App Stores, Google reviews, and Amazon reviews and feeds that information to,
- SEgmenter, which performs text classification and sentiment analysis and feeds that information to,
- Informer, which sends it to a ticketing system, data store, or other places for further action and analysis.
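The three stages above can be pictured as a small pipeline. The following is a minimal sketch in plain Python and not Obsei's actual API: the names `observe`, `segment`, `run_pipeline`, and `SegmentedText`, as well as the keyword-based scoring, are assumptions made purely for illustration.

```python
from dataclasses import dataclass
from typing import Callable, Iterable, List

# Hypothetical word lists; a real segmenter would use a trained model.
NEGATIVE_WORDS = {"crash", "broken", "slow", "refund"}
POSITIVE_WORDS = {"great", "love", "fast", "thanks"}


@dataclass
class SegmentedText:
    text: str
    sentiment: str  # "positive", "negative" or "neutral"


def observe(raw_items: Iterable[str]) -> List[str]:
    """Observer stage: collect non-empty texts from a source."""
    return [item.strip() for item in raw_items if item.strip()]


def segment(text: str) -> SegmentedText:
    """Segmenter stage: toy keyword-based sentiment labelling."""
    words = set(text.lower().split())
    score = len(words & POSITIVE_WORDS) - len(words & NEGATIVE_WORDS)
    if score > 0:
        label = "positive"
    elif score < 0:
        label = "negative"
    else:
        label = "neutral"
    return SegmentedText(text=text, sentiment=label)


def run_pipeline(raw_items: Iterable[str],
                 inform: Callable[[SegmentedText], None]) -> None:
    """Feed observed texts through the segmenter and on to the informer."""
    for text in observe(raw_items):
        inform(segment(text))


if __name__ == "__main__":
    # Informer stage: here it only prints; a real one would call a sink.
    run_pipeline(["I love the new update", "  ", "App is slow and broken"], print)
```

Passing the informer as a callable keeps the sketch independent of any particular sink; swapping `print` for a function that posts to a ticketing system would not change the other two stages.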
Current flow -
A future concept (Coming Soon!)
Release Progress
The following releases are on the way -
- v0.0.5: Documentation focused release
- v0.1.0: DAG support, CI improvements and a few more (suggestions are welcome)
Installation
To use as SDK
Install via PyPI:
pip install obsei
Install from master branch (if you want to try the latest features):
git clone https://github.com/lalitpagaria/obsei.git
cd obsei
pip install --editable .
To update your installation, just do a git pull. The --editable flag will update changes immediately.
To use as REST interface
Start docker with default configuration file:
docker run -d --name obsei -p 9898:9898 lalitpagaria/obsei:latest
Start docker with custom configuration file (assuming you have the config file config.yaml at /home/user/obsei/config on the host machine):
docker run -d --name obsei -v "/home/user/obsei/config:/home/user/config" -e "OBSEI_CONFIG_PATH=/home/user/config" -e "OBSEI_CONFIG_FILENAME=config.yaml" -p 9898:9898 lalitpagaria/obsei:latest
Start docker locally with docker-compose:
docker-compose up --build
The following environment variables are useful to customize various parameters -
- OBSEI_CONFIG_PATH: Configuration file path (default: ../config)
- OBSEI_CONFIG_FILENAME: Configuration file name (default: rest.yaml)
- OBSEI_NUM_OF_WORKERS: Number of workers for the REST API server (default: 1)
- OBSEI_WORKER_TIMEOUT: Worker idle timeout in seconds (default: 180)
- OBSEI_SERVER_PORT: REST API server port (default: 9898)
- OBSEI_WORKER_TYPE: Gunicorn worker type (default: uvicorn.workers.UvicornWorker)
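To make the defaults listed above concrete, here is a minimal sketch of how a launcher script could resolve these variables. It only illustrates the documented names and defaults; the helper `resolve_settings` is hypothetical and this is not necessarily how Obsei itself reads its environment.

```python
import os
from typing import Any, Dict, Mapping, Optional

# Variable names and defaults as documented in the list above.
DEFAULTS = {
    "OBSEI_CONFIG_PATH": "../config",
    "OBSEI_CONFIG_FILENAME": "rest.yaml",
    "OBSEI_NUM_OF_WORKERS": "1",
    "OBSEI_WORKER_TIMEOUT": "180",
    "OBSEI_SERVER_PORT": "9898",
    "OBSEI_WORKER_TYPE": "uvicorn.workers.UvicornWorker",
}

# Variables whose values are whole numbers.
INTEGER_SETTINGS = (
    "OBSEI_NUM_OF_WORKERS",
    "OBSEI_WORKER_TIMEOUT",
    "OBSEI_SERVER_PORT",
)


def resolve_settings(env: Optional[Mapping[str, str]] = None) -> Dict[str, Any]:
    """Hypothetical helper: merge the environment with the documented defaults."""
    source = os.environ if env is None else env
    settings: Dict[str, Any] = {
        name: source.get(name, default) for name, default in DEFAULTS.items()
    }
    for name in INTEGER_SETTINGS:
        settings[name] = int(settings[name])
    return settings


if __name__ == "__main__":
    for name, value in resolve_settings().items():
        print(f"{name}={value}")
```

Accepting an explicit mapping instead of always reading `os.environ` makes the helper easy to exercise without touching the real environment.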
Use cases
Obsei use cases include, but are not limited to -
- Automatic customer issue ticketing based on sentiment analysis
- Proper tagging of tickets, such as login issue, signup issue, delivery issue etc., for faster disposal
- Checking the effectiveness of social media marketing campaigns
- Extraction of deeper insights from feedback on various platforms
- Research purposes
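As an illustration of the ticket-tagging use case, the sketch below assigns tags such as "login issue" by simple keyword matching. This is a stand-in for a real text classifier; the keyword table `TAG_KEYWORDS` and the function `tag_ticket` are hypothetical and not part of Obsei.

```python
from typing import Dict, List

# Hypothetical keyword table; a trained classifier would replace this.
TAG_KEYWORDS: Dict[str, List[str]] = {
    "login issue": ["login", "log in", "password"],
    "signup issue": ["signup", "sign up", "register"],
    "delivery issue": ["delivery", "shipping", "courier"],
}


def tag_ticket(text: str) -> List[str]:
    """Return every tag whose keywords appear in the text (case-insensitive)."""
    lowered = text.lower()
    return [
        tag
        for tag, keywords in TAG_KEYWORDS.items()
        if any(keyword in lowered for keyword in keywords)
    ]


if __name__ == "__main__":
    print(tag_ticket("Cannot log in after password reset"))
    print(tag_ticket("My delivery is late and signup failed"))
```

Substring matching is deliberately naive and will produce false positives on real feedback; it is only meant to show where tagging sits in the flow.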
Components and Integrations
- Source/Observer: Twitter, Play Store Reviews, Apple App Store Reviews (Facebook, Instagram, Google reviews, Amazon reviews, Slack, Microsoft Teams, chat-bots, etc. planned in future)
- Analyzer/Segmenter: Sentiment and text classification (QA, Natural Search, FAQ, Summarization, etc. planned in future)
- Sink/Informer: HTTP API, ElasticSearch, DailyGet, and Jira (Salesforce, Zendesk, Hubspot, Slack, Microsoft Teams, etc. planned in future)
- Processor/WorkflowEngine: Simple integration between Source, Analyzer and Sink (rich workflows using a rule engine planned in future)
- Convertor: A very important part, which converts data from the analyzer format to the format the sink understands. It is very helpful for customizations; refer to dailyget_sink.py and jira_sink.py.
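To show what a convertor does in principle, here is a minimal sketch that maps an analyzer-style result onto a ticket-style payload. Both shapes are assumptions made for illustration: the input keys (`text`, `sentiment`, `labels`), the output fields, and the function `to_ticket_payload` are hypothetical and do not reproduce the code in dailyget_sink.py or jira_sink.py.

```python
from typing import Any, Dict

MAX_SUMMARY_LENGTH = 60  # assumed limit, chosen only for this example


def to_ticket_payload(analyzer_output: Dict[str, Any],
                      project_key: str) -> Dict[str, Any]:
    """Hypothetical convertor: analyzer result -> ticket-like payload."""
    text = analyzer_output["text"]
    sentiment = analyzer_output.get("sentiment", "unknown")
    labels = analyzer_output.get("labels", [])

    # Use the text as summary, truncated with an ellipsis when too long.
    if len(text) <= MAX_SUMMARY_LENGTH:
        summary = text
    else:
        summary = text[:MAX_SUMMARY_LENGTH - 3] + "..."

    return {
        "project": {"key": project_key},
        "summary": summary,
        "description": f"{text}\n\nSentiment: {sentiment}",
        # Replace spaces so each label is a single token.
        "labels": [label.replace(" ", "-") for label in labels],
    }


if __name__ == "__main__":
    result = {
        "text": "Login fails after update",
        "sentiment": "negative",
        "labels": ["login issue"],
    }
    print(to_ticket_payload(result, project_key="SUP"))
```

Keeping this mapping in one small function is what makes sinks easy to customize: a different sink only needs a different convertor, while the analyzer output stays the same.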
Note: In order to use some integrations you need credentials; refer to the following list -
- Twitter: To make authorized API calls, get access from the dev portal. Read about the search API for more details.
- Play Store: To make authorized API calls, get a service account's credentials. Read about the review API for more details.
Examples and Screenshots
Refer to the example and config folders for obsei usage and configurations.
Jira
Attribution
This would not have been possible without the following open source software -
- searchtweets-v2: For Twitter's API v2 wrapper
- vaderSentiment: For rule-based sentiment analysis
- transformers: For text-classification pipeline
- tweet-preprocessor: For tweet preprocessing and cleaning
- atlassian-python-api: To interact with Jira
- elasticsearch: To interact with Elasticsearch
- hydra: For elegantly configuring Obsei
- apscheduler: To schedule tasks to execute the desired workflow
- pydantic: For data validation
- sqlalchemy: As SQL toolkit to access DB storage
- fastapi & gunicorn: For HTTP server and API interface
- feedparser: To parse RSS feeds to fetch app store reviews
- google-play-scraper: To fetch Google Play Store reviews without authentication
Contribution
Currently, we are not accepting any pull requests. If you want a feature or something doesn't work, please create an issue.
Changelog
Citing Obsei
If you use obsei in your research, please use the following BibTeX entry:
@Misc{Pagaria2020Obsei,
author = {Lalit Pagaria},
title = {Obsei - A workflow automation tool for text segmentation need},
howpublished = {Github},
year = {2020},
url = {https://github.com/lalitpagaria/obsei}
}
Acknowledgement
We would like to thank DailyGet for continuous support and encouragement. Please check DailyGet out; it is a platform that can easily be configured to solve any business process automation requirement.