Algorithms for Pillar. Currently includes "mini" algorithms, nothing too sophisticated.
Build
We use the Poetry Python packager to build and publish this package. It handles packaging details that led to mistakes in the past.
Folder structure:
|-- pypi
    |-- pillaralgos
        |-- helpers
            |-- __init__.py
            |-- data_handler.py
            |-- graph_helpers.py
            |-- sanity_checks.py
        |-- __init__.py  # must include version number
        |-- algoXX.py
    |-- LICENSE
    |-- README.md
    |-- pyproject.toml  # must include version number
To publish, update the version numbers in pillaralgos/__init__.py and pyproject.toml as needed, then run the poetry publish --build command.
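Because the version number lives in two files, a small pre-publish check can catch a mismatch before poetry publish --build runs. This is a hypothetical helper, not part of the package: the function names are made up, and it assumes pillaralgos/__init__.py defines `__version__ = "x.y.z"` and that pyproject.toml has a `version = "x.y.z"` line under `[tool.poetry]`.

```python
import re
from pathlib import Path

def init_version(text):
    """Return the __version__ string from __init__.py source (assumed format)."""
    match = re.search(r'^__version__\s*=\s*["\']([^"\']+)["\']', text, re.MULTILINE)
    return match.group(1) if match else None

def pyproject_version(text):
    """Return the `version` key of the [tool.poetry] table, using a simple line scan."""
    section = None
    for line in text.splitlines():
        stripped = line.strip()
        if stripped.startswith("[") and stripped.endswith("]"):
            # Track which TOML table the following keys belong to
            section = stripped.strip("[]").strip()
            continue
        if section == "tool.poetry":
            match = re.match(r'version\s*=\s*["\']([^"\']+)["\']', stripped)
            if match:
                return match.group(1)
    return None

def check_versions(root="."):
    """Exit with an error if the two version numbers disagree; otherwise return the version."""
    root = Path(root)
    v_init = init_version((root / "pillaralgos" / "__init__.py").read_text())
    v_toml = pyproject_version((root / "pyproject.toml").read_text())
    if v_init is None or v_init != v_toml:
        raise SystemExit(f"Version mismatch: __init__.py={v_init!r}, pyproject.toml={v_toml!r}")
    return v_init
```

Calling check_versions() from the pypi folder before publishing either returns the shared version or stops with a message naming both values. The line scan is deliberately simple; a full TOML parser would be more robust.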
Background
Pillar is creating an innovative way to automatically select and splice clips from Twitch videos for streamers. This repo focuses on the algorithms. Three main algorithms are being tested.
Algorithms
- Algorithm 1: Find the best moments in clips based on where the most users participated. "Most" is defined as the ratio of unique users during a 2 min section to unique users for the entire session.
- Algorithm 2: Find the best moments in clips based on when the rate of messages per user peaked. This involves answering the question "at which 2 min segment do the most users send the most messages?" If users X, Y, and Z all send 60% of their messages within timestamp range delta, then that range might qualify as a "best moment".
    - NOTE: Currently answers the question "at which 2 min segment do users send the most messages fastest?"
- Algorithm 3 (WIP): Weigh each user by their chat rate, account age, etc. Heavier users are predicted to chat more often at "best moment" timestamps.
    - STATUS: current weight determined by num_words_of_user / num_words_of_top_user
- Algorithm 3.5: Find the best moments in clips based on the most words/emojis/both used in chat
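The definitions of Algorithm 1 and of the current Algorithm 3 weight are concrete enough to sketch. The pandas code below is a minimal illustration, not the package's implementation: it assumes a chat DataFrame with hypothetical columns created_at (message timestamp), user, and body (message text), counts words by splitting on whitespace, and the function names are made up for this example.

```python
import pandas as pd

def algo1_user_ratio(df: pd.DataFrame, min_: int = 2) -> pd.Series:
    """Ratio of unique users in each min_-minute chunk to unique users in the whole session."""
    chunk = df["created_at"].dt.floor(f"{min_}min")  # start time of each message's chunk
    total_users = df["user"].nunique()
    per_chunk = df.groupby(chunk)["user"].nunique()
    # Highest ratio first: the top rows are candidate "best moments"
    return (per_chunk / total_users).rename("user_ratio").sort_values(ascending=False)

def algo3_user_weights(df: pd.DataFrame) -> pd.Series:
    """Weight per user = num_words_of_user / num_words_of_top_user."""
    num_words = df["body"].str.split().str.len()  # words per message
    words_per_user = num_words.groupby(df["user"]).sum()
    return (words_per_user / words_per_user.max()).rename("weight")
```

Both functions return pandas Series: algo1_user_ratio is indexed by chunk start time, algo3_user_weights by user, with the most active user weighted 1.0.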
Timeit results
Results as of 4/11/21 12:21am EST, run on big_df with 1,039,228 rows and 11 columns.
algo1 | algo2 | algo3_0 | algo3_5 |
---|---|---|---|
3.4 sec | 3 min 14 sec | 39.4 sec | 28 sec |
Current Goal
To create one overarching algorithm that finds the most "interesting" clips in a Twitch VOD. This will be built through the following steps:
- Create various algorithms that isolate min_-minute chunks (2 by default). The basic workflow:
    - Create a variable (e.g. num_words, the number of words in the body of a chat message)
    - Group the df by min_ chunks, then average/sum/etc. num_words for each chunk
    - Sort the new df by num_words, from highest "value" to lowest "value"
    - Return this new df as JSON (example)
- Users rate the clips provided by each algorithm
- Useless algorithms are thrown away
- The remaining algorithms are merged into one overarching algorithm, with weights distributed based on user ratings
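The basic workflow can be sketched in a few lines of pandas. This is a minimal illustration under assumptions: the column names created_at and body, the chunk_start column, and the function name are hypothetical, and agg accepts any pandas aggregation name ("sum", "mean", ...) to stand in for the workflow's average/sum/etc. choice.

```python
import pandas as pd

def num_words_workflow(df: pd.DataFrame, min_: int = 2, agg: str = "sum") -> str:
    """Hypothetical helper: rank min_-minute chunks by word count and return JSON."""
    df = df.copy()
    # Step 1: create the variable (words per chat message, split on whitespace)
    df["num_words"] = df["body"].str.split().str.len()
    # Step 2: bucket messages into min_-minute chunks and aggregate num_words per chunk
    df["chunk_start"] = df["created_at"].dt.floor(f"{min_}min")
    chunks = df.groupby("chunk_start")["num_words"].agg(agg).reset_index()
    # Step 3: sort from highest "value" to lowest "value"
    chunks = chunks.sort_values("num_words", ascending=False)
    # Step 4: return the new df as JSON, one record per chunk
    return chunks.to_json(orient="records", date_format="iso")
```

Swapping num_words for another per-message variable (for example an emoji count) while keeping steps 2 to 4 unchanged gives a new candidate algorithm with the same output shape.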
Long Term Goal
- New objective measure: community-created clips (ccc) for a given VOD id, with start/end timestamps for each clip
    - Assumption: ccc are interesting and can be used to create a narrative for each VOD. We can test this by cross-referencing them with the upvotes/comments on /r/livestreamfails posts
    - Hypothesis: if we can predict where ccc would be created, those are potentially good clips to show the user
        - Short-term test: create a model that predicts where ccc would be created using variables such as word count, chat rate, emoji usage, and chat semantic analysis. We can do this by finding the timestamps of ccc and correlating them with chat stats
        - Medium-term test: use the top 100 streamers as training data. What similarities (chat rate, etc.) do their ccc and the most upvoted Reddit posts for that VOD share?
            - Get the transcripts for these top 100 streamers
            - Get the 15-30 min story content the top 100 post to YouTube for an 8-hour VOD
            - Get the transcript for that story content
            - Run semantic analysis, correlations, etc.
        - Long-term test: what percentage of clips do our streamers actually end up using?
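For the short-term test, one simple starting point is to flag each chunk that overlaps a ccc and correlate that flag with a chat stat. This is a sketch under assumptions: ccc arrive as (start, end) timestamp pairs, chunk-level stats sit in a DataFrame with a hypothetical chunk_start column (such as the output of the grouping step in the workflow under Current Goal), and a plain Pearson correlation against a 0/1 flag (the point-biserial correlation) stands in for the predictive model, which the plan leaves open.

```python
import pandas as pd

def ccc_correlation(chunks: pd.DataFrame, clips, stat: str = "num_words", min_: int = 2):
    """Hypothetical helper: correlate a per-chunk chat stat with ccc presence.

    chunks: one row per min_-minute chunk, with a datetime 'chunk_start' column.
    clips:  iterable of (clip_start, clip_end) pd.Timestamp pairs for ccc.
    Returns (correlation, has_ccc), where has_ccc flags chunks overlapping any clip.
    """
    width = pd.Timedelta(minutes=min_)
    starts = chunks["chunk_start"]
    has_ccc = pd.Series(False, index=chunks.index)
    for clip_start, clip_end in clips:
        # Chunk [start, start + width) overlaps clip [clip_start, clip_end]
        has_ccc |= (starts < clip_end) & (starts + width > clip_start)
    # Pearson correlation with a 0/1 flag is the point-biserial correlation
    corr = chunks[stat].corr(has_ccc.astype(float))
    return corr, has_ccc
```

A strongly positive correlation for a stat would suggest it is a useful feature; fitting an actual model over several stats at once would be the next step.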