Clark University package for crawling YouTube and cleaning the resulting data
Project description
clarku-youtube-crawler
Clark University YouTube crawler and decoder for the JSON that the crawler produces. Please read the documentation in DOCS.
Installing
To install, run:
pip install clarku-youtube-crawler
The crawler depends on several other packages to function.
They are declared as dependencies of the package, so pip should install them automatically. If any are missing, download requirements.txt,
navigate to the folder that contains it, and run:
pip install -r requirements.txt
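If it is unclear whether a dependency is actually missing, a quick check can be sketched with the Python standard library. This is a minimal sketch and not part of the package: the helper name missing_requirements is hypothetical, and it assumes requirements.txt lists one requirement per line with simple version specifiers.

```python
import re
from importlib import metadata


def missing_requirements(lines):
    """Return the requirement names in `lines` that are not installed.

    Hypothetical helper for illustration; not part of clarku-youtube-crawler.
    """
    missing = []
    for line in lines:
        line = line.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line or line.startswith("-"):  # skip blanks and pip options
            continue
        # Keep only the distribution name; drop extras and version specifiers.
        name = re.split(r"[<>=!~;\[\s]", line, maxsplit=1)[0]
        if not name:
            continue
        try:
            metadata.version(name)
        except metadata.PackageNotFoundError:
            missing.append(name)
    return missing
```

Passing the open requirements.txt file object to missing_requirements returns a list of the names that pip would still need to install; an empty list suggests nothing is missing.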
Example usage
To initialize and run the crawlers:
# your_script.py
import clarku_youtube_crawler as cu

# Crawl by search key: here starting 14 December 2020, for day_count days
test = cu.RawCrawler()
test.__build__()
test.crawl("searchkey", start_date=14, start_month=12, start_year=2020, day_count=2)
test.crawl_videos_in_list(comment_page_count=1)
test.merge_all()

# Crawl by channel, filtering on subscriber count and an optional keyword
channel = cu.ChannelCrawler()
channel.__build__()
channel.setup_channel(subscriber_cutoff=1000, keyword="")
channel.crawl()
channel.crawl_videos_in_list(comment_page_count=1)
channel.merge_all()

# Load the merged raw JSON into the decoder
jsonn = cu.JSONDecoder()
jsonn.load_json("YouTube_RAW_20201221/FINAL_raw_merged.json")
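Before decoding, it can help to check what the crawler wrote. The merged file can be inspected with Python's standard json module. This is a minimal sketch under assumptions: the helper summarize_json is hypothetical and not part of clarku_youtube_crawler, and nothing is assumed about the file's schema beyond it being valid JSON.

```python
import json


def summarize_json(path):
    """Load a JSON file and report its top-level type and size.

    Hypothetical helper for illustration; it makes no assumption about
    the structure the crawler actually writes.
    """
    with open(path, encoding="utf-8") as fh:
        data = json.load(fh)
    summary = {"type": type(data).__name__}
    if isinstance(data, (list, dict)):
        summary["length"] = len(data)
    if isinstance(data, dict):
        summary["keys"] = sorted(data)[:10]  # at most ten keys, alphabetically
    return summary
```

Calling summarize_json on the path passed to load_json above shows whether the file is a list or an object and how many entries it holds, which is a cheap sanity check after a long crawl.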
Changelog
Versions 0.0.1 to 0.0.3
These are untested beta releases, since Python packaging is a pain. Please don't install these versions.
Version 0.0.5
Finally figured out testing. It works okay. More documentation to come.
Version 0.0.6
Stable release, for the RawCrawler feature only.
Version 1.0.0, Version 1.0.1
I think this might be our first full stable release.
Version 1.0.1.dev (pre-release)
Added different file types for ChannelCrawler. Added documentation.
Download files
File details
Details for the file clarku_youtube_crawler-1.1.13.tar.gz.
File metadata
- Download URL: clarku_youtube_crawler-1.1.13.tar.gz
- Upload date:
- Size: 13.5 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 34044fc8c5c6b1fc2a27ad50b034c05c411f59fed89465713109fb49a64026c8 |
| MD5 | dda6552c0f5c7a577cf56e74aad83c66 |
| BLAKE2b-256 | 38315861ff23213e1fdd61304b4ebddffd7dac98189d1912a0e88b1f10eb2550 |
File details
Details for the file clarku_youtube_crawler-1.1.13-py3-none-any.whl.
File metadata
- Download URL: clarku_youtube_crawler-1.1.13-py3-none-any.whl
- Upload date:
- Size: 18.0 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/41.2.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.8.1
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 8639d76deb9628fc6bc742b4adc67f313ad17093edffabe161ab25d4d7b3ff49 |
| MD5 | c2fe0fb3ca9f8ad73d9c7c3eefc6c872 |
| BLAKE2b-256 | 9e1739446c3307ec15f8f10d44aea7559ef03cdae1ee533a19f2882622df2c14 |
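A downloaded file can be checked against the SHA256 digests listed in the File hashes tables. The following is a minimal sketch using Python's hashlib; the function names sha256_of and matches are hypothetical, and the local path is whatever location the file was saved to.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # Read in chunks so large files need not fit in memory.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches(path, expected_hex):
    """True if the file's SHA256 digest equals `expected_hex`."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches("clarku_youtube_crawler-1.1.13.tar.gz", "34044fc8c5c6b1fc2a27ad50b034c05c411f59fed89465713109fb49a64026c8") should be true for an intact copy of the source distribution.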