
Clark University package for crawling YouTube and cleaning the collected data

Project description

clarku-youtube-crawler

A YouTube crawler from Clark University, with a JSON decoder for YouTube JSON data.

Version 0.0.1->0.0.3

These are untested beta releases, since Python packaging is a pain. Please don't install these versions.

Version 0.0.5

Finally figured out testing. It works okay. More documentation to come. To install:

pip install clarku-youtube-crawler

Version 0.0.6

Stable release for the RawCrawler feature only.

Version 1.0.0

I think this might be our first full stable release.

Example usage

First, run only the import to generate config.ini:

from clarku_youtube_crawler import *

or

from clarku_youtube_crawler import RawCrawler, ChannelCrawler, JSONDecoder
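
To confirm that the import actually produced config.ini, a quick check along these lines may help. This is only a sketch: it assumes config.ini uses standard INI syntax with section headers (suggested by the extension, not confirmed here), and `list_config_sections` is a hypothetical helper, not part of the package.

```python
import configparser
from pathlib import Path


def list_config_sections(path="config.ini"):
    """Return the section names found in an INI file, or None if it is missing.

    Hypothetical helper for illustration; assumes standard INI syntax.
    """
    config_path = Path(path)
    if not config_path.is_file():
        return None
    parser = configparser.ConfigParser()
    parser.read(config_path, encoding="utf-8")
    return parser.sections()


if __name__ == "__main__":
    sections = list_config_sections()
    if sections is None:
        print("config.ini not found; run the import from this folder first")
    else:
        print("config.ini sections:", sections)
```

Run it from the same folder in which you ran the import, since the path is relative.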

After running the import, open config.ini and configure the file paths. Make sure DEVELOPER_KEY.txt is in the same folder (if your key file has a different name, set that name in config.ini as well). Then run:

test = RawCrawler.RawCrawler()
test.__build__()
test.crawl("food", start_date=1, start_month=12, start_year=2020, day_count=1)
test.crawl_videos_in_list(comment_page_count=1)
test.merge_all()

channel = ChannelCrawler.ChannelCrawler()
channel.__build__()
channel.setup_channel(subscriber_cutoff=1, keyword="")
channel.crawl()
channel.crawl_videos_in_list(comment_page_count=1)
channel.merge_all()

jsonn = JSONDecoder.JSONDecoder()
jsonn.load_json("FINAL_channel_merged.json")
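
The layout of FINAL_channel_merged.json is not described here, so before decoding it can be useful to peek at the file with the standard library alone. The sketch below only reports the top-level JSON type and how many entries it holds; `summarize_json` is a hypothetical helper, and nothing about the file's structure is assumed beyond it being valid JSON.

```python
import json
from pathlib import Path


def summarize_json(path):
    """Load a JSON file and return (top-level type name, number of entries).

    Hypothetical helper for illustration; scalars count as one entry.
    """
    with Path(path).open(encoding="utf-8") as handle:
        data = json.load(handle)
    size = len(data) if isinstance(data, (list, dict)) else 1
    return type(data).__name__, size


if __name__ == "__main__":
    merged = Path("FINAL_channel_merged.json")
    if merged.is_file():
        kind, size = summarize_json(merged)
        print(f"{merged.name}: top-level {kind} with {size} entries")
    else:
        print(f"{merged.name} not found; run the crawl and merge steps first")
```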

If any requirements are missing (all dependencies are already included, so this shouldn't happen), download requirements.txt from this repo and run:

$ pip install -r requirements.txt
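
If you are unsure whether the package itself is installed in the current environment, the standard library can tell you. This is a minimal sketch: it needs Python 3.8 or newer for `importlib.metadata`, and `installed_version` is a hypothetical helper name. The contents of requirements.txt are not listed here, so only the package itself is checked.

```python
from importlib import metadata


def installed_version(distribution):
    """Return the installed version of a distribution, or None if it is absent."""
    try:
        return metadata.version(distribution)
    except metadata.PackageNotFoundError:
        return None


if __name__ == "__main__":
    name = "clarku-youtube-crawler"
    version = installed_version(name)
    if version is None:
        print(f"{name} is not installed in this environment")
    else:
        print(f"{name} {version} is installed")
```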

Download files


Source Distribution

clarku_youtube_crawler-1.0.0.tar.gz (10.1 kB view details)

Uploaded Source

Built Distribution


clarku_youtube_crawler-1.0.0-py3-none-any.whl (17.8 kB view details)

Uploaded Python 3

File details

Details for the file clarku_youtube_crawler-1.0.0.tar.gz.

File metadata

  • Download URL: clarku_youtube_crawler-1.0.0.tar.gz
  • Size: 10.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6

File hashes

Hashes for clarku_youtube_crawler-1.0.0.tar.gz
Algorithm      Hash digest
SHA256         843aceb92882ea513687d7201bed1250e31f9bd26863e5c088acb66f707ca763
MD5            fbc6e0f1cb2a8b6fd675140881ac6159
BLAKE2b-256    a33f6cba36f4b0db2b84670715f2c5a010ab34a039dce9c212273dd5ad3e1ce9


File details

Details for the file clarku_youtube_crawler-1.0.0-py3-none-any.whl.

File metadata

  • Download URL: clarku_youtube_crawler-1.0.0-py3-none-any.whl
  • Size: 17.8 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.2.0 pkginfo/1.5.0.1 requests/2.22.0 setuptools/51.0.0 requests-toolbelt/0.9.1 tqdm/4.42.1 CPython/3.7.6

File hashes

Hashes for clarku_youtube_crawler-1.0.0-py3-none-any.whl
Algorithm      Hash digest
SHA256         04a39c5a411c92bec3b3685304c812372b2f4b052a420224cd614a2f92bf8dce
MD5            4846074c4b27d4c6f18286d5a3ce09db
BLAKE2b-256    5d7c45d687de69f77cf75c0abff96e1f7ad9129d9acd1b76d074cf73611f4461

