Skip to main content

a library to analyse reply trees in forums and social media

Project description

Delab Trees

A library to analyze conversation trees.

Installation

pip install delab_trees

Get started

Example data for Reddit and Twitter are available here https://github.com/juliandehne/delab-trees/raw/main/delab_trees/data/dataset_[reddit|twitter]_no_text.pkl. The data is structure only. Ids, text, links, or other information that would break confidentiality of the academic access have been omitted.

The trees are loaded from tables like this:

tree_id post_id parent_id author_id text created_at
0 1 1 nan james I am James 2017-01-01 01:00:00
1 1 2 1 mark I am Mark 2017-01-01 02:00:00
2 1 3 2 steven I am Steven 2017-01-01 03:00:00
3 1 4 1 john I am John 2017-01-01 04:00:00
4 2 1 nan james I am James 2017-01-01 01:00:00
5 2 2 1 mark I am Mark 2017-01-01 02:00:00
6 2 3 2 steven I am Steven 2017-01-01 03:00:00
7 2 4 3 john I am John 2017-01-01 04:00:00

This dataset contains two conversational trees with four posts each.

Currently, you need to import conversational tables as a pandas dataframe like this:

import pandas as pd
from delab_trees import TreeManager

d = {'tree_id': [1] * 4,
     'post_id': [1, 2, 3, 4],
     'parent_id': [None, 1, 2, 1],
     'author_id': ["james", "mark", "steven", "john"],
     'text': ["I am James", "I am Mark", " I am Steven", "I am John"],
     "created_at": [pd.Timestamp('2017-01-01T01'),
                    pd.Timestamp('2017-01-01T02'),
                    pd.Timestamp('2017-01-01T03'),
                    pd.Timestamp('2017-01-01T04')]}
df = pd.DataFrame(data=d)
manager = TreeManager(df) 
# creates one tree
test_tree = manager.random()

Note that the tree structure is based on the parent_id matching another rows post_id.

You can now analyze the reply trees basic metrics:

from delab_trees.main import get_test_tree
from delab_trees.delab_tree import DelabTree

test_tree : DelabTree = get_test_tree()
assert test_tree.total_number_of_posts() == 4
assert test_tree.average_branching_factor() > 0

A summary of basic metrics can be attained by calling

from delab_trees.main import get_test_tree
from delab_trees.delab_tree import DelabTree

test_tree : DelabTree = get_test_tree()
print(test_tree.get_author_metrics())

# >>> removed [] and changed {} (merging subsequent posts of the same author)
# >>>{'james': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5496110>, 'steven': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5497dc0>, 'john': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5497a00>, 'mark': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5497bb0>}

More complex metrics that use the full dataset for training can be gotten by the manager:

import pandas as pd
from delab_trees import TreeManager

d = {'tree_id': [1] * 4,
     'post_id': [1, 2, 3, 4],
     'parent_id': [None, 1, 2, 1],
     'author_id': ["james", "mark", "steven", "john"],
     'text': ["I am James", "I am Mark", " I am Steven", "I am John"],
     "created_at": [pd.Timestamp('2017-01-01T01'),
                    pd.Timestamp('2017-01-01T02'),
                    pd.Timestamp('2017-01-01T03'),
                    pd.Timestamp('2017-01-01T04')]}
df = pd.DataFrame(data=d)
manager = TreeManager(df) # creates one tree
rb_vision_dictionary : dict["tree_id", dict["author_id", "vision_metric"]] = manager.get_rb_vision()

The following two complex metrics are implemented:

from delab_trees.main import get_test_manager

manager = get_test_manager()
rb_vision_dictionary = manager.get_rb_vision() # predict an author having seen a post
pb_vision_dictionary = manager.get_pb_vision() # predict an author to write the next post

How to cite

    @article{dehne_dtrees_23,
    author    = {Dehne, Julian},
    title     = {Delab-Trees: measuring deliberation in online conversations},        
    url = {https://github.com/juliandehne/delab-trees}     
    year      = {2023},
}

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

delab-trees-0.3.0.tar.gz (24.0 MB view details)

Uploaded Source

Built Distribution

If you're not sure about the file name format, learn more about wheel file names.

delab_trees-0.3.0-py3-none-any.whl (24.2 MB view details)

Uploaded Python 3

File details

Details for the file delab-trees-0.3.0.tar.gz.

File metadata

  • Download URL: delab-trees-0.3.0.tar.gz
  • Upload date:
  • Size: 24.0 MB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.6

File hashes

Hashes for delab-trees-0.3.0.tar.gz
Algorithm Hash digest
SHA256 f6a7aa0bd393064364214ea6a1f1a58ca4386fab2dde6d453c27e9b3e0d162a9
MD5 fe10325afa55d28adf4ff672e34a310b
BLAKE2b-256 62e019a3b05d73cd1846824d776e2be904d7fef9344192c10e34d7c1afcff50c

See more details on using hashes here.

File details

Details for the file delab_trees-0.3.0-py3-none-any.whl.

File metadata

  • Download URL: delab_trees-0.3.0-py3-none-any.whl
  • Upload date:
  • Size: 24.2 MB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/4.0.2 CPython/3.10.6

File hashes

Hashes for delab_trees-0.3.0-py3-none-any.whl
Algorithm Hash digest
SHA256 b1282471407b24c605d5c8682f388532ca51cddae3cad65cf529773016b66d2a
MD5 7022cb04926d4511bc4a0acd26a93dd0
BLAKE2b-256 a14da39c94a8af6420e9c83a6d0957a1bc0ef4006599395abe2d8afcaca4bbf3

See more details on using hashes here.

Supported by

AWS Cloud computing and Security Sponsor Datadog Monitoring Depot Continuous Integration Fastly CDN Google Download Analytics Pingdom Monitoring Sentry Error logging StatusPage Status page