A library to analyze reply trees in forums and social media
Project description
Delab Trees
A library to analyze conversation trees.
Installation
pip install delab_trees
Get started
Example data for Reddit and Twitter is available at https://github.com/juliandehne/delab-trees/raw/main/delab_trees/data/dataset_[reddit|twitter]_no_text.pkl (choose either reddit or twitter for the bracketed part). The data contains structure only: IDs, text, links, and any other information that would breach the confidentiality of the academic access have been omitted.
The trees are loaded from tables like this:
| | tree_id | post_id | parent_id | author_id | text | created_at |
|---|---|---|---|---|---|---|
| 0 | 1 | 1 | nan | james | I am James | 2017-01-01 01:00:00 |
| 1 | 1 | 2 | 1 | mark | I am Mark | 2017-01-01 02:00:00 |
| 2 | 1 | 3 | 2 | steven | I am Steven | 2017-01-01 03:00:00 |
| 3 | 1 | 4 | 1 | john | I am John | 2017-01-01 04:00:00 |
| 4 | 2 | 1 | nan | james | I am James | 2017-01-01 01:00:00 |
| 5 | 2 | 2 | 1 | mark | I am Mark | 2017-01-01 02:00:00 |
| 6 | 2 | 3 | 2 | steven | I am Steven | 2017-01-01 03:00:00 |
| 7 | 2 | 4 | 3 | john | I am John | 2017-01-01 04:00:00 |
This dataset contains two conversational trees with four posts each.
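Before handing such a table to the library, it can help to check that it is structurally consistent. The following is a minimal sketch in plain Python, independent of delab_trees; the helper name check_reply_table is hypothetical, and the rule "exactly one root per tree" is an assumption drawn from the example table rather than a documented requirement of the library.

```python
from collections import Counter

# Rows mirror the example table: (tree_id, post_id, parent_id, author_id).
# None stands in for the missing parent (nan) of a root post.
rows = [
    (1, 1, None, "james"),
    (1, 2, 1, "mark"),
    (1, 3, 2, "steven"),
    (1, 4, 1, "john"),
    (2, 1, None, "james"),
    (2, 2, 1, "mark"),
    (2, 3, 2, "steven"),
    (2, 4, 3, "john"),
]


def check_reply_table(rows):
    """Return a list of problems; an empty list means the table looks consistent."""
    problems = []
    post_ids = {}  # tree_id -> set of post_ids in that tree
    for tree_id, post_id, _, _ in rows:
        post_ids.setdefault(tree_id, set()).add(post_id)

    roots = Counter()  # tree_id -> number of posts without a parent
    for tree_id, post_id, parent_id, _ in rows:
        if parent_id is None:
            roots[tree_id] += 1
        elif parent_id not in post_ids[tree_id]:
            problems.append(
                f"tree {tree_id}: post {post_id} has unknown parent {parent_id}"
            )

    for tree_id in post_ids:
        if roots[tree_id] != 1:
            problems.append(
                f"tree {tree_id}: expected 1 root, found {roots[tree_id]}"
            )
    return problems


posts_per_tree = Counter(tree_id for tree_id, *_ in rows)
print(dict(posts_per_tree))     # {1: 4, 2: 4}
print(check_reply_table(rows))  # []
```

Note that post_id values repeat across the two trees, so the check looks up parents within the same tree_id only.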
Currently, conversation tables must be imported as a pandas DataFrame, like this:
import pandas as pd
from delab_trees import TreeManager
d = {'tree_id': [1] * 4,
     'post_id': [1, 2, 3, 4],
     'parent_id': [None, 1, 2, 1],
     'author_id': ["james", "mark", "steven", "john"],
     'text': ["I am James", "I am Mark", "I am Steven", "I am John"],
     "created_at": [pd.Timestamp('2017-01-01T01'),
                    pd.Timestamp('2017-01-01T02'),
                    pd.Timestamp('2017-01-01T03'),
                    pd.Timestamp('2017-01-01T04')]}
df = pd.DataFrame(data=d)
manager = TreeManager(df)  # creates one tree
test_tree = manager.random()
Note that the tree structure is based on each row's parent_id matching another row's post_id.
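To make the parent_id/post_id relationship concrete, here is a small sketch that rebuilds the reply structure of the single-tree example by hand. It uses plain Python only and is not how delab_trees constructs its trees internally; the children map and the depth helper are illustrative names.

```python
from collections import defaultdict

# (post_id, parent_id) pairs of the single-tree example; None marks the root.
posts = [(1, None), (2, 1), (3, 2), (4, 1)]

children = defaultdict(list)  # parent post_id -> list of reply post_ids
root = None
for post_id, parent_id in posts:
    if parent_id is None:
        root = post_id
    else:
        children[parent_id].append(post_id)


def depth(post_id):
    """Number of posts on the longest reply chain starting at post_id."""
    replies = children.get(post_id, [])
    return 1 + max((depth(reply) for reply in replies), default=0)


print(root)            # 1
print(dict(children))  # {1: [2, 4], 2: [3]}
print(depth(root))     # 3
```

Post 1 is the root with two direct replies (posts 2 and 4), and post 3 replies to post 2, which gives the longest chain 1 → 2 → 3.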
You can now analyze the reply tree's basic metrics:
from delab_trees.main import get_test_tree
from delab_trees.delab_tree import DelabTree
test_tree: DelabTree = get_test_tree()
assert test_tree.total_number_of_posts() == 4
assert test_tree.average_branching_factor() > 0
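The assertions above only state that the branching factor is positive; this page does not spell out how the library defines it. As a rough intuition, the sketch below computes one common definition (average number of replies per post that has at least one reply) for the four-post example. The function name sketch_average_branching is hypothetical, and the library's own value may differ.

```python
from collections import Counter


def sketch_average_branching(parent_ids):
    """Average number of replies per post that received at least one reply.

    This is an assumed definition for illustration, not necessarily the one
    used by delab_trees.
    """
    # Count replies per parent, ignoring the root's missing parent.
    replies_per_post = Counter(p for p in parent_ids if p is not None)
    if not replies_per_post:
        return 0.0  # a tree with only a root post has no branching
    return sum(replies_per_post.values()) / len(replies_per_post)


parent_ids = [None, 1, 2, 1]  # parent_id column of the four-post example tree

print(len(parent_ids))                       # 4
print(sketch_average_branching(parent_ids))  # 1.5
```

Here post 1 has two replies and post 2 has one, so the average over the two replied-to posts is 1.5.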
A summary of basic per-author metrics can be obtained by calling:
from delab_trees.main import get_test_tree
from delab_trees.delab_tree import DelabTree
test_tree: DelabTree = get_test_tree()
print(test_tree.get_author_metrics())
# >>> removed [] and changed {} (merging subsequent posts of the same author)
# >>>{'james': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5496110>, 'steven': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5497dc0>, 'john': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5497a00>, 'mark': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5497bb0>}
More complex metrics, which use the full dataset for training, can be obtained from the manager:
import pandas as pd
from delab_trees import TreeManager
d = {'tree_id': [1] * 4,
     'post_id': [1, 2, 3, 4],
     'parent_id': [None, 1, 2, 1],
     'author_id': ["james", "mark", "steven", "john"],
     'text': ["I am James", "I am Mark", "I am Steven", "I am John"],
     "created_at": [pd.Timestamp('2017-01-01T01'),
                    pd.Timestamp('2017-01-01T02'),
                    pd.Timestamp('2017-01-01T03'),
                    pd.Timestamp('2017-01-01T04')]}
df = pd.DataFrame(data=d)
manager = TreeManager(df) # creates one tree
rb_vision_dictionary : dict["tree_id", dict["author_id", "vision_metric"]] = manager.get_rb_vision()
The following two complex metrics are implemented:
from delab_trees.main import get_test_manager
manager = get_test_manager()
rb_vision_dictionary = manager.get_rb_vision() # predict an author having seen a post
pb_vision_dictionary = manager.get_pb_vision() # predict an author to write the next post
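According to the type hint shown earlier, the result is a nested dictionary keyed by tree_id and then by author_id. The sketch below shows one way to flatten and aggregate such a structure. The numeric values are made-up placeholders chosen only to show the shape, and treating the metric as a float is an assumption; nothing here is actual output of delab_trees.

```python
# Placeholder values only: these numbers are invented to illustrate the
# dict[tree_id, dict[author_id, metric]] shape, not produced by delab_trees.
vision = {
    1: {"james": 0.5, "mark": 0.25},
    2: {"james": 0.75},
}

# Flatten into (tree_id, author_id, value) rows, e.g. for building a table.
rows = [
    (tree_id, author_id, value)
    for tree_id, per_author in vision.items()
    for author_id, value in per_author.items()
]

# Average value per author across all trees the author appears in.
values_by_author = {}
for _, author_id, value in rows:
    values_by_author.setdefault(author_id, []).append(value)
averages = {
    author: sum(values) / len(values)
    for author, values in values_by_author.items()
}

print(rows)      # [(1, 'james', 0.5), (1, 'mark', 0.25), (2, 'james', 0.75)]
print(averages)  # {'james': 0.625, 'mark': 0.25}
```

The flattened rows can be passed to pandas.DataFrame with matching column names if a tabular view is preferred.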
How to cite
@article{dehne_dtrees_23,
author = {Dehne, Julian},
title = {Delab-Trees: measuring deliberation in online conversations},
url = {https://github.com/juliandehne/delab-trees},
year = {2023},
}
Download files
File details
Details for the file delab-trees-0.3.0.tar.gz.
File metadata
- Download URL: delab-trees-0.3.0.tar.gz
- Upload date:
- Size: 24.0 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | f6a7aa0bd393064364214ea6a1f1a58ca4386fab2dde6d453c27e9b3e0d162a9 |
| MD5 | fe10325afa55d28adf4ff672e34a310b |
| BLAKE2b-256 | 62e019a3b05d73cd1846824d776e2be904d7fef9344192c10e34d7c1afcff50c |
File details
Details for the file delab_trees-0.3.0-py3-none-any.whl.
File metadata
- Download URL: delab_trees-0.3.0-py3-none-any.whl
- Upload date:
- Size: 24.2 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.2 CPython/3.10.6
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | b1282471407b24c605d5c8682f388532ca51cddae3cad65cf529773016b66d2a |
| MD5 | 7022cb04926d4511bc4a0acd26a93dd0 |
| BLAKE2b-256 | a14da39c94a8af6420e9c83a6d0957a1bc0ef4006599395abe2d8afcaca4bbf3 |