ChatDB is a toolkit for easily storing chat messages in a database.

ChatDB for NLP

ChatDB is a toolkit for easily storing conversations, such as chat messages, in a database. You can use ChatDB to store text while you are collecting data for NLP.

DBMS: Neo4j

Installation

Choose either option A or option B.

A. Using Neo4j Desktop

If you work on the host OS and use Neo4j Desktop, installing ChatDB from PyPI is recommended:

pip install chatdb

Download Neo4j Desktop from https://neo4j.com/download/

B. Using Neo4j in a Docker container

Clone the repository from GitHub with Git:

git clone https://github.com/A03ki/chatdb.git
cd chatdb

If you work on the host OS:

pip install -e .
docker-compose up -d db

If you work inside a Docker container:

docker-compose up -d
docker-compose exec app /bin/sh -c "[ -e /bin/bash ] && /bin/bash || /bin/sh"
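After `docker-compose up -d db`, the database may need a few seconds before it accepts connections. The following is a minimal sketch, not part of ChatDB, that waits until a TCP port is reachable. The function name `wait_for_port` is hypothetical, and port 7687 is taken from the Bolt URLs used in the Usage section. An open port only shows that something is listening; it does not prove that Neo4j has finished starting.

```python
import socket
import time


def wait_for_port(host: str, port: int, timeout: float = 30.0,
                  interval: float = 0.5) -> bool:
    """Return True once a TCP connection to host:port succeeds.

    Returns False if no connection could be made before `timeout` seconds.
    """
    deadline = time.monotonic() + timeout
    while True:
        try:
            # create_connection raises OSError if the port is closed or
            # unreachable; on success the socket is closed right away.
            with socket.create_connection((host, port), timeout=interval):
                return True
        except OSError:
            if time.monotonic() >= deadline:
                return False
            time.sleep(interval)
```

On the host OS you might call `wait_for_port("localhost", 7687)` before creating the `Graph` handler.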

Usage

First, store the text data in a database.

from chatdb import Graph, Status

# Create Status
s1 = Status(text="How are you today?")
s2 = Status(text="I’m okay, thanks. And you?")
s3 = Status(text="I’m awesome.")

# Construct a relationship between Statuses
s1.reply_from(s2)  # s2.reply_to(s1)
s2.reply_from(s3)  # s3.reply_to(s2)

# Create the handler for Neo4j
# When working inside a Docker container
graph = Graph("bolt://db:7687", password="your_password")

# When working on the host OS
# graph = Graph("bolt://localhost:7687", password="your_password")

# Store data
graph.merge(s2)

Next, extract the text from the database.

from chatdb import Graph, TextOutputer, Status

graph = Graph("bolt://db:7687", password="your_password")
# graph = Graph("bolt://localhost:7687", password="your_password")

outputer = TextOutputer(graph)

print(outputer.match([Status]).extract_text())

print(outputer.match([Status]*2).extract_text())

print(outputer.match([Status]*3).extract_text())

Output:

[['I’m okay, thanks. And you?'], ['How are you today?'], ['I’m awesome.']]
[['I’m okay, thanks. And you?', 'I’m awesome.'], ['How are you today?', 'I’m okay, thanks. And you?']]
[['How are you today?', 'I’m okay, thanks. And you?', 'I’m awesome.']]
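The three results above have the shape of "every reply chain of length 1, 2, and 3". The sketch below reproduces that shape in plain Python, without Neo4j, to make the pattern easier to see. It is an illustration only and not how ChatDB is implemented: the function `reply_chains` and the dictionary layout are assumptions made for this example, and the order of the returned chains may differ from the order ChatDB prints.

```python
from typing import Dict, List


def reply_chains(texts: Dict[str, str], replies: Dict[str, List[str]],
                 length: int) -> List[List[str]]:
    """Return the texts of every reply chain with exactly `length` statuses.

    texts   maps a status id to its text.
    replies maps a status id to the ids of the statuses that reply to it.
    """
    if length < 1:
        raise ValueError("length must be at least 1")

    def walk(node: str, remaining: int) -> List[List[str]]:
        # A chain of one status is just the status itself.
        if remaining == 1:
            return [[texts[node]]]
        # Otherwise extend the chain through each direct reply.
        return [[texts[node]] + tail
                for child in replies.get(node, [])
                for tail in walk(child, remaining - 1)]

    chains: List[List[str]] = []
    for node in texts:
        chains.extend(walk(node, length))
    return chains


# The same conversation as in the example above.
texts = {
    "s1": "How are you today?",
    "s2": "I’m okay, thanks. And you?",
    "s3": "I’m awesome.",
}
replies = {"s1": ["s2"], "s2": ["s3"]}

for n in (1, 2, 3):
    print(reply_chains(texts, replies, n))
```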

You can also inspect the data with the Neo4j Browser.

Open http://localhost:7474 in your web browser and run the query MATCH (n:Status) RETURN n.

To delete all data, run the query MATCH (n:Status) DETACH DELETE n

For more information on how to use Neo4j Browser, see https://neo4j.com/developer/neo4j-browser/.
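The same kind of check can be run from Python instead of the browser. The sketch below counts the nodes with the `Status` label, using the official `neo4j` Python driver (`pip install neo4j`), which is a separate package and not part of ChatDB's API. The function names are hypothetical, and the user name `neo4j` is an assumption based on Neo4j's default; adjust it to your setup.

```python
def count_statuses(session) -> int:
    """Count nodes labelled Status.

    `session` is any object whose run(query) returns a result with a
    single() method, such as a session from the neo4j driver.
    """
    record = session.run("MATCH (n:Status) RETURN count(n) AS total").single()
    return record["total"]


def count_statuses_over_bolt(uri: str, password: str,
                             user: str = "neo4j") -> int:
    """Connect over Bolt with the official driver and count Status nodes."""
    # Imported here so that count_statuses can be used without the driver.
    from neo4j import GraphDatabase

    driver = GraphDatabase.driver(uri, auth=(user, password))
    try:
        with driver.session() as session:
            return count_statuses(session)
    finally:
        driver.close()
```

For example, `count_statuses_over_bolt("bolt://localhost:7687", "your_password")` should return 3 after the Usage example has been run against an empty database.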

Support for collecting Tweet data

pip install tweepy

This example stores the timeline of the Twitter, Inc. account together with the tweets that this account is replying to.

import tweepy
from chatdb import Graph, SimpleTweetStatus
from chatdb.tools import TweetArchiver

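# consumer_key, consumer_secret, access_token and access_token_secret
# are your own Twitter API credentials; define them before this point.
auth = tweepy.OAuthHandler(consumer_key, consumer_secret)
auth.set_access_token(access_token, access_token_secret)
# wait_on_rate_limit_notify is a Tweepy 3.x parameter; Tweepy 4.0 removed it.
api = tweepy.API(auth, wait_on_rate_limit=True,
                 wait_on_rate_limit_notify=True)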

graph = Graph("bolt://db:7687", password="your_password")
# graph = Graph("bolt://localhost:7687", password="your_password")

archiver = TweetArchiver(graph, SimpleTweetStatus)

statuses = api.user_timeline(screen_name="Twitter")
for status in statuses:
    in_reply_to_status_id_str = status.in_reply_to_status_id_str
    if in_reply_to_status_id_str:
        in_reply_to_status = api.get_status(in_reply_to_status_id_str)
        archiver.add_status(**in_reply_to_status._json)
    archiver.add_status(**status._json)

For more information on how to use Tweepy, see the Tweepy documentation.
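If several tweets on a timeline reply to the same tweet, the loop above fetches that parent once per reply. The sketch below shows one possible variation that fetches each parent only once. It is written against plain dictionaries and callables so that it can be tried without API credentials; the function `archive_with_parents` and its parameters are hypothetical and not part of ChatDB or Tweepy.

```python
from typing import Any, Callable, Dict, Iterable, Set


def archive_with_parents(statuses: Iterable[Dict[str, Any]],
                         fetch_status: Callable[[str], Dict[str, Any]],
                         add_status: Callable[..., Any]) -> int:
    """Store each status and, once per parent, the status it replies to.

    statuses     yields tweet JSON dictionaries.
    fetch_status takes a status id string and returns that tweet's JSON.
    add_status   receives the JSON fields as keyword arguments.
    Returns the number of add_status calls made.
    """
    fetched: Set[str] = set()
    stored = 0
    for status in statuses:
        parent_id = status.get("in_reply_to_status_id_str")
        # Fetch and store the parent first, but only the first time it is seen.
        if parent_id and parent_id not in fetched:
            fetched.add(parent_id)
            add_status(**fetch_status(parent_id))
            stored += 1
        add_status(**status)
        stored += 1
    return stored
```

With the objects from the example above, this could be called as `archive_with_parents((s._json for s in statuses), lambda i: api.get_status(i)._json, archiver.add_status)`, assuming the Tweepy 3.x call style used in that example.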
