Clark University package for crawling YouTube and cleaning the data
clarku-youtube-crawler
Clark University YouTube crawler and JSON decoder for YouTube JSON data. Please read the documentation in DOCS.
Installing
To install, run:
pip install clarku-youtube-crawler
The crawler depends on several other packages. All dependencies are already declared in the package, so pip should install them automatically. If any requirements are still missing, download requirements.txt, navigate to the folder that contains it, and run:
pip install -r requirements.txt
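If you are unsure which requirements are missing, one option is to compare requirements.txt against the distributions installed in your environment. The sketch below is a hypothetical helper (`requirement_names` and `missing_requirements` are not part of clarku-youtube-crawler). It uses the standard library's `importlib.metadata` (Python 3.8+) and a deliberately simplified line parser that ignores version pins, so it only reports packages that are absent, not ones installed at the wrong version.

```python
import re
from importlib.metadata import PackageNotFoundError, version
from pathlib import Path


def requirement_names(path):
    """Hypothetical helper: return the distribution names listed in a requirements file.

    Simplified parser: skips blank lines, comments and pip options (lines
    starting with "-"), and drops version specifiers, extras and markers.
    """
    names = []
    for raw in Path(path).read_text(encoding="utf-8").splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        # The distribution name is the leading run of letters, digits, ".", "_" and "-".
        match = re.match(r"[A-Za-z0-9][A-Za-z0-9._-]*", line)
        if match:
            names.append(match.group(0))
    return names


def missing_requirements(path):
    """Hypothetical helper: return the listed names that are not installed here."""
    missing = []
    for name in requirement_names(path):
        try:
            version(name)  # raises PackageNotFoundError if the distribution is absent
        except PackageNotFoundError:
            missing.append(name)
    return missing


if __name__ == "__main__":
    requirements = Path("requirements.txt")
    if requirements.exists():
        print(missing_requirements(requirements))
    else:
        print("requirements.txt not found in the current folder")
```

An empty list means every listed package is present; anything printed can be installed with the `pip install -r requirements.txt` command above.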
Example usage
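The example below runs a keyword crawl with RawCrawler, a channel crawl with ChannelCrawler, and loads the merged output with JSONDecoder: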
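# your_script.py
import clarku_youtube_crawler as cu

# RawCrawler: search for videos matching a keyword over a date window
# (here starting 14 December 2020, with day_count=2), fetch the listed
# videos with one page of comments each, then merge the results
test = cu.RawCrawler()
test.__build__()
test.crawl("searchkey", start_date=14, start_month=12, start_year=2020, day_count=2)
test.crawl_videos_in_list(comment_page_count=1)
test.merge_all()

# ChannelCrawler: set up channels (subscriber_cutoff=1000, empty keyword),
# crawl them and their videos, then merge the results
channel = cu.ChannelCrawler()
channel.__build__()
channel.setup_channel(subscriber_cutoff=1000, keyword="")
channel.crawl()
channel.crawl_videos_in_list(comment_page_count=1)
channel.merge_all()

# JSONDecoder: load a merged raw JSON file produced by a crawl
jsonn = cu.JSONDecoder()
jsonn.load_json("YouTube_RAW_20201221/FINAL_raw_merged.json")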
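`RawCrawler.crawl()` takes the start date as three separate integers plus a `day_count`. Judging only from the parameter names, the window presumably covers `day_count` days starting on the given date; the sketch below makes that interpretation explicit. It is an assumption, not the package's documented behavior: the helper names (`crawl_window`, `to_rfc3339`), the midnight-UTC start and the exclusive end are all hypothetical. The timestamp format matches what the YouTube Data API v3 `search.list` method accepts in its `publishedAfter` and `publishedBefore` parameters.

```python
from datetime import datetime, timedelta, timezone


def crawl_window(start_date, start_month, start_year, day_count):
    """Hypothetical helper: return (start, end) of the assumed crawl window.

    ASSUMPTION: the window starts at midnight UTC on the given day and spans
    `day_count` whole days, with `end` exclusive. The package may differ.
    """
    if day_count < 1:
        raise ValueError("day_count must be at least 1")
    start = datetime(start_year, start_month, start_date, tzinfo=timezone.utc)
    end = start + timedelta(days=day_count)  # timedelta handles month/year rollover
    return start, end


def to_rfc3339(moment):
    """Format a datetime as an RFC 3339 UTC timestamp, e.g. 2020-12-14T00:00:00Z."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    start, end = crawl_window(start_date=14, start_month=12, start_year=2020, day_count=2)
    print(to_rfc3339(start), to_rfc3339(end))
    # prints: 2020-12-14T00:00:00Z 2020-12-16T00:00:00Z
```

Using `timedelta` rather than incrementing the day number by hand keeps windows that cross a month or year boundary (for example 31 December plus 2 days) correct.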
Changelog
Version 0.0.1->0.0.3
These are untested beta releases, since Python packaging is a pain. Please don't install these versions.
Version 0.0.5
Finally figured out testing. It works okay. More documentation to come.
Version 0.0.6
Stable release for the RawCrawler feature only.
Version 1.0.0
Version 1.0.1
I think this might be our first full stable release.
Version 1.0.1.dev
Pre-release
Added different file types for ChannelCrawler. Added documentation.
Download files
Hashes for clarku_youtube_crawler-1.1.8.tar.gz (source distribution)
Algorithm | Hash digest
---|---
SHA256 | 1f9b2d768aa3551edc4613f2fb2331d5e08e28a9fc7bc856e354d98d96c654f1
MD5 | 77192af4c0a118ada2cb7427beb2c19f
BLAKE2b-256 | ac1f4ffb11e6b318f9c35f5ea5f4aeac8f3c459d15b6df17857b657b6d38d1d0

Hashes for clarku_youtube_crawler-1.1.8-py3-none-any.whl (built distribution)
Algorithm | Hash digest
---|---
SHA256 | 8b876dcdad20597ecbd422e9f485a4471a68219ca3ba13627479b547916ea615
MD5 | 244a8c2b5c0c6bf58e26f5d1facbc112
BLAKE2b-256 | d115051eafeb44948d6f2e924fa8aa29d7d5e53641df54867860587f7f214d03
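To check a downloaded file against the published SHA256 digests, you can compute the digest locally and compare. A minimal sketch using Python's `hashlib`, assuming the wheel was downloaded into the current folder under its published name (for example with `pip download clarku-youtube-crawler==1.1.8 --no-deps`); `sha256_of` and `verify` are hypothetical helper names, not part of the package.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=1 << 20):
    """Compute the SHA256 hex digest of a file, reading it in 1 MiB chunks."""
    digest = hashlib.sha256()
    with Path(path).open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path, expected_sha256):
    """Return True if the file's SHA256 matches the published digest (case-insensitive)."""
    return sha256_of(path) == expected_sha256.strip().lower()


if __name__ == "__main__":
    wheel = Path("clarku_youtube_crawler-1.1.8-py3-none-any.whl")
    expected = "8b876dcdad20597ecbd422e9f485a4471a68219ca3ba13627479b547916ea615"
    if wheel.exists():
        print("OK" if verify(wheel, expected) else "MISMATCH")
    else:
        print(f"{wheel} not found; download it first")
```

Prefer the SHA256 or BLAKE2b-256 digest for this check; MD5 is no longer collision-resistant.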