Conversational dataset from the BYU PCCL Chit-Chat Challenge.
chitchat-dataset
Conversational dataset from the Chit-Chat Challenge.
download
curl -L git.io/ccc-dataset-json -o dataset.json
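Once downloaded, the file can be read with Python's standard json module. The following is a minimal sketch, assuming the file was saved as dataset.json as in the command above. The helper name load_dataset is hypothetical and is not part of the chitchat-dataset package.

```python
import json
from pathlib import Path


def load_dataset(path="dataset.json"):
    """Read the downloaded JSON file into a dict keyed by conversation UUID."""
    # Hypothetical helper for illustration; not an API of the package.
    with Path(path).open(encoding="utf-8") as f:
        return json.load(f)
```

Calling load_dataset() with no arguments reads dataset.json from the current working directory.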
stats
- 7,168 conversations
- 258,145 utterances
- 1,315 unique participants
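Counts like these can be recomputed from a loaded dataset. The sketch below assumes the structure described in the format section (messages as lists of utterances, each with a sender). Whether the published figures were produced this way is an assumption, and dataset_stats is a hypothetical helper name.

```python
def dataset_stats(dataset):
    """Count conversations, utterances, and distinct senders."""
    utterances = 0
    senders = set()
    for convo in dataset.values():
        for message in convo["messages"]:
            for utterance in message:
                utterances += 1
                senders.add(utterance["sender"])
    return {
        "conversations": len(dataset),
        "utterances": utterances,
        "participants": len(senders),
    }


# Tiny in-memory sample in the documented shape (made-up data, not from the dataset).
sample = {
    "conv-1": {
        "messages": [
            [{"text": "Hello", "sender": "a"}],
            [{"text": "Hi", "sender": "b"}, {"text": "How are you?", "sender": "b"}],
        ]
    }
}
print(dataset_stats(sample))
# {'conversations': 1, 'utterances': 3, 'participants': 2}
```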
format
The dataset is a mapping from conversation UUID to a conversation:
{
  "prompt": "What's the most interesting thing you've learned recently?",
  "ratings": { "witty": "1", "int": 5, "upbeat": 5 },
  "start": "2018-04-20T01:57:41",
  "messages": [
    [
      {
        "text": "Hello",
        "timestamp": "2018-04-19T19:57:51",
        "sender": "22578ac2-6317-44d5-8052-0a59076e0b96"
      }
    ],
    [
      {
        "text": "I learned that the Queen of England's last corgi died",
        "timestamp": "2018-04-19T19:58:14",
        "sender": "bebad07e-15df-48c3-a04f-67db828503e3"
      }
    ],
    [
      {
        "text": "Wow that sounds so sad",
        "timestamp": "2018-04-19T19:58:18",
        "sender": "22578ac2-6317-44d5-8052-0a59076e0b96"
      },
      {
        "text": "was it a cardigan welsh corgi",
        "timestamp": "2018-04-19T19:58:22",
        "sender": "22578ac2-6317-44d5-8052-0a59076e0b96"
      },
      {
        "text": "?",
        "timestamp": "2018-04-19T19:58:24",
        "sender": "22578ac2-6317-44d5-8052-0a59076e0b96"
      }
    ]
  ]
}
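In the sample above, each inner list under "messages" groups consecutive utterances from one sender. Assuming that holds across the dataset (the sample suggests it, but it is not stated explicitly), a conversation can be collapsed into one turn per message. The helper name conversation_turns is hypothetical.

```python
def conversation_turns(convo):
    """Return one (sender, text) pair per message, joining its utterances."""
    turns = []
    for message in convo["messages"]:
        if not message:
            continue  # skip empty messages defensively
        # Assumption: all utterances in a message share the first one's sender.
        sender = message[0]["sender"]
        text = " ".join(utterance["text"] for utterance in message)
        turns.append((sender, text))
    return turns


# Abbreviated version of the sample conversation, with shortened sender IDs.
example = {
    "messages": [
        [{"text": "Hello", "sender": "a"}],
        [{"text": "Wow that sounds so sad", "sender": "a"},
         {"text": "?", "sender": "a"}],
    ]
}
for sender, text in conversation_turns(example):
    print(sender, text)
```

Joining with a space is one simple choice; a newline or a dedicated separator may suit some models better.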
examples
A Python example using the Requests library:
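import requests

# The dataset maps each conversation UUID to a conversation.
for _id, convo in requests.get("https://git.io/ccc-dataset-json").json().items():
    # Each message is a list of utterances.
    for message in convo["messages"]:
        for utterance in message:
            print(utterance["text"])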
For more examples, see examples/.
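The timestamp fields shown in the format section can also be parsed with the standard library. The sketch below measures how long a conversation lasted, using utterance timestamps only: in the sample, the "start" value does not line up with the message timestamps (possibly a time zone difference), so it is left out here. The helper name conversation_duration_seconds is hypothetical.

```python
from datetime import datetime


def conversation_duration_seconds(convo):
    """Seconds between the earliest and latest utterance timestamps."""
    stamps = [
        datetime.fromisoformat(utterance["timestamp"])
        for message in convo["messages"]
        for utterance in message
    ]
    if not stamps:
        return 0.0
    return (max(stamps) - min(stamps)).total_seconds()


# First and last timestamps from the sample conversation.
example = {
    "messages": [
        [{"timestamp": "2018-04-19T19:57:51"}],
        [{"timestamp": "2018-04-19T19:58:24"}],
    ]
}
print(conversation_duration_seconds(example))  # 33.0
```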