Clark University, Package for YouTube crawler and cleaning data
Project description
clarku-youtube-crawler
Clark University YouTube crawler and JSON decoder for YouTube JSON data.
Versions 0.0.1 to 0.0.3
These are untested beta releases, since Python packaging is a pain. Please don't install these versions.
Version 0.0.5
Finally figured out testing. It works okay. More documentation to come. To install:
pip install clarku-youtube-crawler
Version 0.0.6
Stable release, but only for the RawCrawler feature.
Version 1.0.0
Version 1.0.1
I think this might be our first full stable release.
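The notes above flag 0.0.1 to 0.0.3 as versions not to install. Below is a minimal standard-library sketch for checking which version you have. The helper names (`installed_version`, `parse_release`, `is_untested_beta`) are hypothetical and not part of clarku-youtube-crawler. The sketch assumes the distribution name `clarku-youtube-crawler` from the pip command above and plain `X.Y.Z` version strings.

```python
from importlib.metadata import PackageNotFoundError, version
from typing import Optional, Tuple

# Per the release notes, versions 0.0.1 to 0.0.3 are untested betas.
UNTESTED_BETAS = {(0, 0, 1), (0, 0, 2), (0, 0, 3)}


def parse_release(ver: str) -> Tuple[int, ...]:
    """Turn a plain 'X.Y.Z' string into a tuple of ints (no pre-release tags)."""
    return tuple(int(part) for part in ver.split("."))


def installed_version(dist_name: str = "clarku-youtube-crawler") -> Optional[str]:
    """Return the installed distribution's version, or None if it isn't installed."""
    try:
        return version(dist_name)
    except PackageNotFoundError:
        return None


def is_untested_beta(ver: str) -> bool:
    """True if `ver` is one of the beta releases the notes advise against."""
    return parse_release(ver) in UNTESTED_BETAS


if __name__ == "__main__":
    ver = installed_version()
    if ver is None:
        print("clarku-youtube-crawler is not installed")
    elif is_untested_beta(ver):
        print(f"{ver} is an untested beta; run: pip install -U clarku-youtube-crawler")
    else:
        print(f"clarku-youtube-crawler {ver} is installed")
```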
Example usage
First, run just the import to generate config.ini:
from clarku_youtube_crawler import *
or
from clarku_youtube_crawler import RawCrawler, ChannelCrawler, JSONDecoder
After running the import, open config.ini to configure file paths. Make sure DEVELOPER_KEY.txt (or your key file, if you set a different filename in config.ini) is in the same folder. Then run:
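Before starting a crawl, it can help to confirm that the setup described above is in place. Below is a minimal sketch of such a pre-flight check. `check_setup` is a hypothetical helper, not a package function. It assumes that "the same folder" means the folder containing config.ini. It only lists config.ini's sections, because the real section and key names are defined by the package.

```python
import configparser
from pathlib import Path


def check_setup(config_path: str = "config.ini", key_file: str = "DEVELOPER_KEY.txt") -> bool:
    """Hypothetical pre-flight check before crawling (not part of the package)."""
    config_file = Path(config_path)
    if not config_file.is_file():
        print(f"{config_file} not found; run the import first to generate it")
        return False
    parser = configparser.ConfigParser()
    try:
        parser.read(config_file)
    except configparser.Error as exc:
        print(f"could not parse {config_file}: {exc}")
        return False
    # Only list the sections: the real section and key names come from the package.
    print("config.ini sections:", parser.sections())
    # Assumption: the key file lives in the same folder as config.ini.
    key_path = config_file.parent / key_file
    if not key_path.is_file() or not key_path.read_text().strip():
        print(f"{key_path} is missing or empty; add your YouTube API key there")
        return False
    return True


if __name__ == "__main__":
    print("ready to crawl" if check_setup() else "fix the setup above first")
```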
from clarku_youtube_crawler import RawCrawler, ChannelCrawler, JSONDecoder
# Keyword crawl for "food" (start 1 Dec 2020, day_count=1)
test = RawCrawler.RawCrawler()
test.__build__()
test.crawl("food", start_date=1, start_month=12, start_year=2020, day_count=1)
test.crawl_videos_in_list(comment_page_count=1)
test.merge_all()
# Channel crawl
channel = ChannelCrawler.ChannelCrawler()
channel.__build__()
channel.setup_channel(subscriber_cutoff=1, keyword="")
channel.crawl()
channel.crawl_videos_in_list(comment_page_count=1)
channel.merge_all()
# Decode the merged channel output
jsonn = JSONDecoder.JSONDecoder()
jsonn.load_json("FINAL_channel_merged.json")
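The `crawl` call above takes a start day (`start_date`), month, and year plus a `day_count`. The package does not confirm how these are used. A plausible reading is that they define a UTC time window, like the RFC 3339 `publishedAfter`/`publishedBefore` parameters of the YouTube Data API's `search.list`. Under that assumption, the following sketch computes the window for the example call. `crawl_window` is a hypothetical helper, not part of the package.

```python
from datetime import date, datetime, time, timedelta, timezone
from typing import Tuple


def crawl_window(start_date: int, start_month: int, start_year: int, day_count: int) -> Tuple[str, str]:
    """Hypothetical: map crawl()'s date arguments to an RFC 3339 [after, before) pair."""
    # start_date is the day of the month; the window starts at midnight UTC.
    start = datetime.combine(date(start_year, start_month, start_date), time(0, 0), tzinfo=timezone.utc)
    end = start + timedelta(days=day_count)
    fmt = "%Y-%m-%dT%H:%M:%SZ"
    return start.strftime(fmt), end.strftime(fmt)


print(crawl_window(start_date=1, start_month=12, start_year=2020, day_count=1))
# ('2020-12-01T00:00:00Z', '2020-12-02T00:00:00Z')
```

Using `timedelta` for the end of the window means month and year rollovers (e.g. starting on 31 December) are handled without special cases.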
If any requirements are missing (all dependencies are already included, so this shouldn't happen), download requirements.txt from this repo and run:
$ pip install -r requirements.txt
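To see which entries in requirements.txt are actually missing before reinstalling, here is a minimal standard-library sketch. `missing_requirements` is a hypothetical helper. It assumes plain `name[extras] specifier` lines. It skips comments and pip options such as `-r`, and it does not handle URL or VCS requirements. Names are looked up via `importlib.metadata`, so a name spelled differently from the installed distribution may be reported as missing on older Python versions.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path
from typing import List

# A distribution name at the start of a requirement line, e.g. "requests" in "requests>=2.0".
NAME_RE = re.compile(r"[A-Za-z0-9][A-Za-z0-9._-]*")


def missing_requirements(path: str = "requirements.txt") -> List[str]:
    """Return names listed in `path` that importlib.metadata cannot find (pins ignored)."""
    missing = []
    for raw in Path(path).read_text().splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments
        if not line or line.startswith("-"):  # skip blanks and pip options like -r/-e
            continue
        match = NAME_RE.match(line)
        if match is None:
            continue
        try:
            version(match.group(0))
        except PackageNotFoundError:
            missing.append(match.group(0))
    return missing


if __name__ == "__main__":
    if Path("requirements.txt").is_file():
        print(missing_requirements() or "all listed requirements are installed")
```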
Download files
Download the file for your platform.
Source Distribution
Hashes for clarku_youtube_crawler-1.0.1.tar.gz
Algorithm | Hash digest
---|---
SHA256 | baaa1ca53e5a6a5fe59942f359a7ecb84b8aa2141ed1440c6851f65bbdd00bd9
MD5 | 764a5ff67e45cb4cf5e741669bb2f2e4
BLAKE2b-256 | 7ba3158bc66cf8a5ceba38cac8f44ca7611b8bbf579f9f3a805b94dfc40abcf8

Built Distribution
Hashes for clarku_youtube_crawler-1.0.1-py3-none-any.whl
Algorithm | Hash digest
---|---
SHA256 | a7b979b48593aec662785b5f57b3f0ba4b25e49ec4e5f1b513993c0f4437fde2
MD5 | f6e4606a8d8e36d3709b884eb6757e16
BLAKE2b-256 | 59191791f032697a7690bba6592b6a979e32801e1ade4d8d3f780aaeab8c9c5b
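To check a downloaded 1.0.1 file against the SHA256 digests listed above, here is a minimal sketch using `hashlib`. `sha256_of` and `verify` are illustrative helper names. The file is hashed in chunks, so large downloads are never read into memory all at once. As an alternative, pip can enforce digests itself in hash-checking mode, via `--hash=sha256:...` entries in a requirements file.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the 1.0.1 hash tables above.
EXPECTED_SHA256 = {
    "clarku_youtube_crawler-1.0.1.tar.gz": "baaa1ca53e5a6a5fe59942f359a7ecb84b8aa2141ed1440c6851f65bbdd00bd9",
    "clarku_youtube_crawler-1.0.1-py3-none-any.whl": "a7b979b48593aec662785b5f57b3f0ba4b25e49ec4e5f1b513993c0f4437fde2",
}


def sha256_of(path: Path, chunk_size: int = 1 << 16) -> str:
    """Hash the file in fixed-size chunks and return the hex digest."""
    digest = hashlib.sha256()
    with path.open("rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path: str, expected: str) -> bool:
    """Compare the file's SHA256 with an expected hex digest (case-insensitive)."""
    return sha256_of(Path(path)) == expected.strip().lower()


if __name__ == "__main__":
    # Check whichever of the two release files are present in the current folder.
    for name, expected in EXPECTED_SHA256.items():
        if Path(name).is_file():
            print(name, "OK" if verify(name, expected) else "MISMATCH")
```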