
Algorithms for Pillar. Currently includes "mini" algorithms, nothing too sophisticated.


Table of Contents

  1. Build and Publish
  2. Background
    1. Algorithms
    2. Datasets
  3. Current Goal
  4. Long Term Goal

Build and Publish

To build and publish this package we use the Poetry Python packaging tool. It takes care of packaging details that led to mistakes in the past when done by hand.

Folder structure:

|-- pypi
    |-- pillaralgos
        |-- helpers
            |-- __init__.py
            |-- data_handler.py
            |-- graph_helpers.py
            |-- sanity_checks.py
        |-- __init__.py  # must include version number
        |-- algoXX.py
    |-- LICENSE
    |-- README.md
    |-- pyproject.toml  # must include version number

To publish, update the version numbers as needed (in pyproject.toml and pillaralgos/__init__.py), then run poetry publish --build.

Background

Pillar is creating an innovative way to automatically select and splice clips from Twitch videos for streamers. This repo focuses on the algorithm side. Three main algorithms are being tested.

Algorithms

  1. Algorithm 1: Finds the best moments in clips based on where the most users participated. "Most" is defined as the ratio of unique users during a 2-minute section to unique users across the entire session.
  2. Algorithm 2: Finds the best moments in clips based on when the rate of messages per user peaked. This answers the question "at which 2-minute segment do the most users send the most messages?" If users X, Y, and Z each send 60% of their messages within timestamp range delta, that range might qualify as a "best moment".
    1. NOTE: Currently answers the question "at which 2-minute segment do users send their messages fastest?"
  3. Algorithm 3 (WIP): Weighs each user by their chat rate, account age, etc. Heavier users are predicted to chat more often at "best moment" timestamps.
    1. STATUS: current weight determined by (num_words_of_user/num_words_of_top_user)
    2. Algorithm 3.5: Finds the best moments in clips based on the most words/emojis/both used in chat
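As a concrete illustration of Algorithm 1's metric, the sketch below computes the unique-user ratio per 2-minute window with pandas. The function name unique_user_ratio and the column names timestamp/user are illustrative assumptions, not identifiers from this repo:

```python
import pandas as pd

def unique_user_ratio(df, window="2min"):
    # Ratio of unique chatters in each `window` to unique chatters
    # across the whole session (Algorithm 1's "most users" metric).
    total_users = df["user"].nunique()
    per_window = df.resample(window, on="timestamp")["user"].nunique()
    return (per_window / total_users).sort_values(ascending=False)

# Toy chat log: three users chat early on, then only one returns
chat = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-01-01 00:00:30", "2021-01-01 00:01:10",
        "2021-01-01 00:01:50", "2021-01-01 00:02:40",
    ]),
    "user": ["ann", "bob", "cat", "ann"],
})
ratios = unique_user_ratio(chat)  # first window: 3/3, second window: 1/3
```

Windows whose ratio approaches 1.0 mean nearly every session participant chatted there, making them candidate "best moments".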

Datasets

  1. Preliminary data prelim_df: 545 rows representing one 3 hour, 35 minute, 26 second Twitch stream chat of Hearthstone by LiiHS
    • Used to create the initial json import and the resulting df clean/merge function organize_twitch_chat
  2. Big data big_df: 2409 rows representing one 7 hour, 37 minute Twitch stream chat of Hearthstone by LiiHS
    • Used to create all algorithms
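The repo's organize_twitch_chat function is referenced but not shown here; below is a minimal sketch of what such a clean/merge step might look like, assuming the raw export is a list of message dicts. The field names "created_at", "commenter", and "message" are guesses for illustration, not the confirmed Twitch export schema:

```python
import pandas as pd

def organize_twitch_chat(raw_messages):
    # Flatten a raw chat export into a tidy DataFrame.
    # NOTE: the keys "created_at", "commenter", "message" are assumed
    # field names for illustration, not the actual export schema.
    df = pd.DataFrame(
        {
            "timestamp": msg["created_at"],
            "user": msg["commenter"],
            "body": msg["message"],
        }
        for msg in raw_messages
    )
    df["timestamp"] = pd.to_datetime(df["timestamp"])
    return df.sort_values("timestamp").reset_index(drop=True)

sample = [
    {"created_at": "2021-01-01T00:00:05Z", "commenter": "ann", "message": "PogChamp"},
    {"created_at": "2021-01-01T00:00:09Z", "commenter": "bob", "message": "gg"},
]
df = organize_twitch_chat(sample)
```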

Current Goal

To create one overarching algorithm that will find the most "interesting" clips in a Twitch VOD. This will be created through the following steps:

  1. Creation of various algorithms that isolate min_-minute chunks (2 minutes by default). The basic workflow:
    1. Create a variable (ex: num_words, the number of words in the body of a chat message)
    2. Group the df into min_-minute chunks, then average/sum/etc. num_words for each chunk
    3. Sort the new df by num_words, from highest value to lowest
    4. Return this new df as json (example)
  2. Users rate clips provided by each algorithm
  3. Useless algorithms thrown away
  4. Rest of the algorithms merged into one overarching algorithm, with weights distributed based on user ratings
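The basic per-algorithm workflow in step 1 can be sketched end to end as below. The helper name top_chunks and the column names are illustrative; the actual algoXX modules may differ:

```python
import pandas as pd

def top_chunks(df, min_=2):
    # 1. Create the variable: number of words in each message body
    df = df.assign(num_words=df["body"].str.split().str.len())
    # 2. Group into min_-minute chunks and sum num_words per chunk
    chunked = df.resample(f"{min_}min", on="timestamp")["num_words"].sum()
    # 3. Sort from highest value to lowest
    ranked = chunked.sort_values(ascending=False)
    # 4. Return the sorted result as json
    return ranked.to_json(date_format="iso")

chat = pd.DataFrame({
    "timestamp": pd.to_datetime([
        "2021-01-01 00:00:10", "2021-01-01 00:01:00", "2021-01-01 00:02:30",
    ]),
    "body": ["hello there", "one two three", "hi"],
})
result = top_chunks(chat)  # first chunk totals 5 words, second totals 1
```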

Long Term Goal

  • New objective measure: community created clips (ccc) for a given VOD id with start/end timestamps for each clip
  • Assumption: ccc are interesting and can be used to create a narrative for each VOD. We can test this by cross-referencing them with /r/livestreamfails posts (upvotes/comments)
  • Hypothesis: if we can predict where ccc would be created, those are potentially good clips to show the user
    • Short term test: Create a model to predict where ccc would be created using variables such as word count, chat rate, emoji usage, chat semantic analysis. We can do this by finding timestamps of ccc and correlating them with chat stats
    • Medium term test: Use top 100 streamers as training data. What similarities do their ccc and reddit most upvoted of that VOD share? (chat rate etc)
      1. Get the transcript for these top 100
      2. Get the top 100's YT posted 15-30min story content for the 8 hour VOD
      3. Get the transcript for that story content
      4. Semantic analysis and correlations, etc.
    • Long term test: what percentage of clips do our streamers actually end up using
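The short-term test (correlating ccc timestamps with chat stats) could start as simply as comparing chat rate inside vs. outside ccc windows. Everything below — the function name, the window alignment assumption, the toy data — is a hypothetical sketch, not repo code:

```python
import pandas as pd

def ccc_windows_chattier(msg_times, ccc_starts, window="2min"):
    # Count messages per window, then compare the mean chat rate in
    # windows where a ccc starts vs. all other windows.
    # Assumes ccc_starts are aligned to window boundaries.
    rate = pd.Series(1, index=pd.to_datetime(msg_times)).resample(window).sum()
    ccc_idx = pd.to_datetime(ccc_starts)
    inside = rate[rate.index.isin(ccc_idx)]
    outside = rate[~rate.index.isin(ccc_idx)]
    return inside.mean() > outside.mean()

# Toy data: a burst of chat in the first window, where a ccc starts
msgs = [
    "2021-01-01 00:00:10", "2021-01-01 00:00:40",
    "2021-01-01 00:01:10", "2021-01-01 00:01:50",
    "2021-01-01 00:02:30", "2021-01-01 00:04:10",
]
chattier = ccc_windows_chattier(msgs, ["2021-01-01 00:00:00"])
```

A real test would replace the boolean comparison with a proper correlation or significance test over many VODs, and fold in the other variables mentioned above (word count, emoji usage, semantics).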
