
A library to analyze reply trees in forums and social media


Delab Trees

A library to analyze conversation trees.

Installation

pip install delab_trees

Get started

Example data for Reddit and Twitter is available at https://github.com/juliandehne/delab-trees/raw/main/delab_trees/data/dataset_[reddit|twitter]_no_text.pkl (choose either reddit or twitter in the file name).

The data contains structure only. IDs, text, links, and any other information that would breach the confidentiality of the academic access have been omitted.
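As a minimal sketch, the bracketed pattern in the URL above can be expanded per platform as follows. The helper names `example_data_url` and `load_example_data` are hypothetical and not part of delab_trees, and the sketch assumes the `.pkl` files are pickled pandas objects, which this page does not state explicitly.

```python
import pandas as pd

BASE_URL = "https://github.com/juliandehne/delab-trees/raw/main/delab_trees/data"


def example_data_url(platform: str) -> str:
    """Return the URL of the structure-only example dataset for one platform."""
    if platform not in ("reddit", "twitter"):
        raise ValueError(f"unknown platform: {platform!r}")
    return f"{BASE_URL}/dataset_{platform}_no_text.pkl"


def load_example_data(platform: str):
    """Download and unpickle the example data.

    Assumption: the file is a pickled pandas object.
    Only unpickle files from sources you trust.
    """
    return pd.read_pickle(example_data_url(platform))


print(example_data_url("reddit"))
```

Calling `load_example_data("twitter")` would then fetch the Twitter file over the network.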

The trees are loaded from tables like this:

|    | tree_id | post_id | parent_id | author_id | text        | created_at          |
|---:|--------:|--------:|----------:|:----------|:------------|:--------------------|
|  0 |       1 |       1 |       nan | james     | I am James  | 2017-01-01 01:00:00 |
|  1 |       1 |       2 |         1 | mark      | I am Mark   | 2017-01-01 02:00:00 |
|  2 |       1 |       3 |         2 | steven    | I am Steven | 2017-01-01 03:00:00 |
|  3 |       1 |       4 |         1 | john      | I am John   | 2017-01-01 04:00:00 |
|  4 |       2 |       1 |       nan | james     | I am James  | 2017-01-01 01:00:00 |
|  5 |       2 |       2 |         1 | mark      | I am Mark   | 2017-01-01 02:00:00 |
|  6 |       2 |       3 |         2 | steven    | I am Steven | 2017-01-01 03:00:00 |
|  7 |       2 |       4 |         3 | john      | I am John   | 2017-01-01 04:00:00 |

This dataset contains two conversational trees with four posts each.
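To make the "two trees with four posts each" reading concrete, the sketch below rebuilds the example table in plain pandas (without the text and created_at columns) and groups it by tree_id. It uses no delab_trees calls.

```python
import pandas as pd

# The example table from above, without the text and created_at columns.
table = pd.DataFrame({
    "tree_id":   [1, 1, 1, 1, 2, 2, 2, 2],
    "post_id":   [1, 2, 3, 4, 1, 2, 3, 4],
    "parent_id": [None, 1, 2, 1, None, 1, 2, 3],
    "author_id": ["james", "mark", "steven", "john"] * 2,
})

# Number of posts in each tree.
posts_per_tree = table.groupby("tree_id")["post_id"].count().to_dict()
print(posts_per_tree)  # {1: 4, 2: 4}

# The two trees differ only in the parent of post 4:
# it replies to post 1 in tree 1 and to post 3 in tree 2.
parent_of_post_4 = (
    table[table["post_id"] == 4].set_index("tree_id")["parent_id"].to_dict()
)
print(parent_of_post_4)
```

Note that post_id values repeat across trees, so a post is only identified by the pair (tree_id, post_id).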

Currently, conversation tables must be imported as a pandas DataFrame, like this:

import pandas as pd
from delab_trees import TreeManager

# one conversation (tree_id 1) with four posts
d = {'tree_id': [1] * 4,
     'post_id': [1, 2, 3, 4],
     'parent_id': [None, 1, 2, 1],
     'author_id': ["james", "mark", "steven", "john"],
     'text': ["I am James", "I am Mark", "I am Steven", "I am John"],
     "created_at": [pd.Timestamp('2017-01-01T01'),
                    pd.Timestamp('2017-01-01T02'),
                    pd.Timestamp('2017-01-01T03'),
                    pd.Timestamp('2017-01-01T04')]}
df = pd.DataFrame(data=d)
manager = TreeManager(df)  # creates one tree
test_tree = manager.random()  # picks one of the manager's trees (here there is only one)

Note that the tree structure is based on each row's parent_id matching the post_id of another row; the root post has no parent_id.

You can now compute basic metrics of the reply tree:
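Because the structure depends on parent_id matching a post_id, it can help to check a table before building trees from it. The following is a minimal sketch in plain pandas; `find_orphans` and `roots_per_tree` are hypothetical helpers, not delab_trees functions, and this page does not say whether TreeManager performs a similar check itself.

```python
import pandas as pd


def find_orphans(df: pd.DataFrame) -> pd.DataFrame:
    """Return replies whose parent_id matches no post_id in the same tree."""
    known = set(zip(df["tree_id"], df["post_id"]))
    replies = df[df["parent_id"].notna()]
    mask = pd.Series(
        [(tree, parent) not in known
         for tree, parent in zip(replies["tree_id"], replies["parent_id"])],
        index=replies.index,
        dtype=bool,
    )
    return replies[mask]


def roots_per_tree(df: pd.DataFrame) -> pd.Series:
    """Count the posts without a parent (roots) in each tree."""
    return df[df["parent_id"].isna()].groupby("tree_id").size()


good = pd.DataFrame({"tree_id": [1] * 4,
                     "post_id": [1, 2, 3, 4],
                     "parent_id": [None, 1, 2, 1]})
broken = good.assign(parent_id=[None, 1, 2, 9])  # post 4 points at a missing post

print(len(find_orphans(good)))                   # 0
print(find_orphans(broken)["post_id"].tolist())  # [4]
print(roots_per_tree(good).to_dict())            # {1: 1}
```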

from delab_trees.main import get_test_tree
from delab_trees.delab_tree import DelabTree

test_tree: DelabTree = get_test_tree()
assert test_tree.total_number_of_posts() == 4
assert test_tree.average_branching_factor() > 0
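For intuition, here is a plain-Python sketch of the two metrics on the four-post example tree. It assumes one common definition of the average branching factor, namely the mean number of direct replies over all posts that have at least one reply; the definition used inside delab_trees may differ.

```python
from collections import defaultdict

# (post_id, parent_id) pairs of the example tree; None marks the root.
posts = [(1, None), (2, 1), (3, 2), (4, 1)]

# Map each parent to the list of its direct replies.
children = defaultdict(list)
for post_id, parent_id in posts:
    if parent_id is not None:
        children[parent_id].append(post_id)

total_posts = len(posts)

# Assumed definition: average number of replies per post that was replied to.
avg_branching = (
    sum(len(replies) for replies in children.values()) / len(children)
    if children else 0.0
)

print(total_posts, avg_branching)  # 4 1.5
```

Post 1 has two replies and post 2 has one, which gives (2 + 1) / 2 = 1.5 under this definition.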

A summary of the metrics per author can be obtained by calling:

from delab_trees.test_data_manager import get_test_tree
from delab_trees.delab_tree import DelabTree

test_tree: DelabTree = get_test_tree()
print(test_tree.get_author_metrics())

# Example output (object addresses will differ between runs):
# >>> removed [] and changed {} (merging subsequent posts of the same author)
# >>>{'james': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5496110>, 'steven': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5497dc0>, 'john': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5497a00>, 'mark': <delab_trees.delab_author_metric.AuthorMetric object at 0x7fa9c5497bb0>}

More complex metrics that use the full dataset for training can be obtained from the manager:

import pandas as pd
from delab_trees import TreeManager

d = {'tree_id': [1] * 4,
     'post_id': [1, 2, 3, 4],
     'parent_id': [None, 1, 2, 1],
     'author_id': ["james", "mark", "steven", "john"],
     'text': ["I am James", "I am Mark", "I am Steven", "I am John"],
     "created_at": [pd.Timestamp('2017-01-01T01'),
                    pd.Timestamp('2017-01-01T02'),
                    pd.Timestamp('2017-01-01T03'),
                    pd.Timestamp('2017-01-01T04')]}
df = pd.DataFrame(data=d)
manager = TreeManager(df)  # creates one tree
# the annotation only describes the nested shape: tree_id -> author_id -> vision metric
rb_vision_dictionary: dict["tree_id", dict["author_id", "vision_metric"]] = manager.get_rb_vision()
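A nested dictionary of the shape tree_id -> author_id -> metric, as annotated above, is often easier to inspect as a flat table. The sketch below shows one way to flatten it with pandas. The dictionary `nested` holds placeholder values for illustration only; they are not real library output, and the actual type of the metric values is an assumption.

```python
import pandas as pd

# Placeholder data in the annotated shape; NOT real output of get_rb_vision().
nested = {
    1: {"james": 0.0, "mark": 0.0},
    2: {"james": 0.0},
}

# One row per (tree, author) pair.
rows = [
    {"tree_id": tree_id, "author_id": author_id, "vision_metric": value}
    for tree_id, per_author in nested.items()
    for author_id, value in per_author.items()
]
flat = pd.DataFrame(rows, columns=["tree_id", "author_id", "vision_metric"])

print(flat)
```

The flat table can then be filtered or grouped by tree_id or author_id with the usual pandas operations.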

The following two complex metrics are implemented:

from delab_trees.test_data_manager import get_test_manager

manager = get_test_manager()
rb_vision_dictionary = manager.get_rb_vision()  # predict an author having seen a post
pb_vision_dictionary = manager.get_pb_vision()  # predict an author to write the next post

How to cite

    @article{dehne_dtrees_23,
        author = {Dehne, Julian},
        title  = {Delab-Trees: measuring deliberation in online conversations},
        url    = {https://github.com/juliandehne/delab-trees},
        year   = {2023},
    }
