Archives YouTube community posts.
yt-community-post-archiver
Archives YouTube community posts. Will try and grab the post's text content, images at as large of a resolution as possible, polls, and some other various metadata.
Note this was written really quickly, and might not work every time (my Python is also a bit shit). It is also a bit fragile, and YT updates might break it. Feel free to let me know if it's broken, and if I have the bandwidth I'll try and fix it.
Usage
From the wheel
This is probably what you're going to want. Install a wheel from Releases using pip.
- Download one of the .whl files from Releases.
- Install the wheel file. For example, if the file you downloaded is called yt_community_post_archiver-0.1.0-py3-none-any.whl:
  pip install yt_community_post_archiver-0.1.0-py3-none-any.whl
- Run yt-community-post-archiver. For example:
  yt-community-post-archiver "https://www.youtube.com/@PomuRainpuff/community"
  This will spawn a headless Chrome instance (that is, you won't see a Chrome window) and download all posts it can find from the provided page, and save text metadata + images in an automatically created folder called archive-output in the same directory the program was called in. Note this will take a while!
  For info on the options you can use, run with --help:
  yt-community-post-archiver --help
From the repo
- Clone the repo.
- (Optional) Create and source a venv:
  python3 -m venv venv
  source venv/bin/activate
- (Optional) Install hatch if you do not already have it:
  pip3 install hatch
- Make sure the computer you're running this on has Chrome or Firefox, as it uses a browser to grab posts.
- Run the archiver using hatch run yt-community-post-archiver. For example:
  hatch run yt-community-post-archiver "https://www.youtube.com/@PomuRainpuff/community"
  This will spawn a headless Chrome instance (that is, you won't see a Chrome window) and download all posts it can find from the provided page, and save text metadata + images in an automatically created folder called archive-output in the same directory the program was called in. Note this will take a while!
  For info on the options you can use, run with --help:
  hatch run yt-community-post-archiver --help
Example
For example, let's say I ran:
hatch run yt-community-post-archiver "https://www.youtube.com/@IRyS/community" -o "output/testing" -m 1
This runs the archiver against https://www.youtube.com/@IRyS/community, saves to output/testing, and fetches a maximum of one post.
At the time of writing, this gives me two files - post.json:
{
    "url": "https://www.youtube.com/post/Ugkxbg1AcEsx5spUWRjgtF8cvXDDgUIW1SFo",
    "text": "Carbonated Love Wallpaper for those who love the thumbnail :D Courtesy of kanauru! Stream the song if you haven't yet!!\n\n⬇️FULL MV⬇️\nhttps://youtu.be/DjNNpw2x2dU?si=B0heA...",
    "images": [
        "https://yt3.ggpht.com/KfLmUOa22rydRozKY34zopeHP39EN0u_X5qLplQiKQd1i2rxxidrcG4RxH5s3ceGY9ql8VfIQgdA=s3840"
    ],
    "links": [
        "https://www.youtube.com/post/Ugkxbg1AcEsx5spUWRjgtF8cvXDDgUIW1SFo",
        "https://www.youtube.com/watch?v=DjNNpw2x2dU&t=0s"
    ],
    "is_members": false,
    "relative_date": "3 months ago",
    "approximate_num_comments": "111",
    "num_comments": "111",
    "num_thumbs_up": "7.3K",
    "poll": null,
    "when_archived": "2024-10-16 05:20:18.045639+00:00"
}
and an image file (Ugkxbg1AcEsx5spUWRjgtF8cvXDDgUIW1SFo-0).
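If you want to post-process the archive, something like the following could work as a starting point. It is only a sketch based on the sample post.json above: the helper names (parse_count, summarize_post, iter_posts) are made up for illustration, the folder layout is not assumed (it simply searches for every post.json under a root folder), and the handling of abbreviated counts such as "7.3K" is a guess at the format rather than anything the tool guarantees.

```python
import json
from datetime import datetime
from decimal import Decimal
from pathlib import Path

# Multipliers for abbreviated counts such as "7.3K".
# Assumption: only K/M/B suffixes show up; other formats are not handled.
_SUFFIXES = {"K": 1_000, "M": 1_000_000, "B": 1_000_000_000}


def parse_count(raw):
    """Turn a count string like "111" or "7.3K" into an approximate integer."""
    if raw is None:
        return None
    text = raw.strip().upper().replace(",", "")
    if not text:
        return None
    multiplier = 1
    if text[-1] in _SUFFIXES:
        multiplier = _SUFFIXES[text[-1]]
        text = text[:-1]
    # Decimal avoids float rounding surprises when scaling "7.3" by 1000.
    return int(Decimal(text) * multiplier)


def summarize_post(post):
    """Pick a few fields out of one parsed post dictionary."""
    return {
        "url": post["url"],
        "thumbs_up": parse_count(post.get("num_thumbs_up")),
        "comments": parse_count(post.get("num_comments")),
        "image_count": len(post.get("images") or []),
        # The sample timestamp looks like str(datetime), which fromisoformat accepts.
        "archived_at": datetime.fromisoformat(post["when_archived"]),
    }


def iter_posts(root):
    """Yield every parsed post.json found anywhere under root."""
    for path in sorted(Path(root).rglob("post.json")):
        with path.open(encoding="utf-8") as handle:
            yield json.load(handle)
```

For example, `[summarize_post(p) for p in iter_posts("output/testing")]` would give one small summary per archived post. Keep in mind the counts are approximate by nature, since "7.3K" has already been rounded by YouTube.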
Set save location
If you want to set the save location, then use -o:
hatch run yt-community-post-archiver "https://www.youtube.com/@IRyS/community" -o "/home/me/my_save"
Logging in
You may want to provide a logged-in instance to this tool as this is the only way to get membership posts or certain details like poll vote percentages. The tool supports two methods:
Use browser profile
I've found this way works a bit better from personal experience. You can re-use an existing browser profile that is
logged into your YouTube account to grab membership posts with the -p flag, passing the directory where your user
profiles are stored (for example, in Chrome, you can find this with chrome://version). For example:
venv/bin/python archiver.py -o output/ -p ~/.config/chromium/ "https://www.youtube.com/@WatsonAmelia/membership"
By default this will use the default profile name; if you need to override this then use -n as well.
Use cookies file
Alternatively, if you have a Netscape-format cookies file, you can pass its path with -c:
hatch run yt-community-post-archiver "https://www.youtube.com/@WatsonAmelia/community" -c "/home/me/my_cookies_file.txt"
Note that I've personally found this much flakier, and it occasionally fails in certain situations. It should
work fine if you just want to get a few posts though, and already have a cookie file for things like
ytarchive.
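Since the cookies route can be flaky, it may help to sanity-check the file before a long run. The sketch below is not part of this tool; it uses Python's standard http.cookiejar module to confirm the file parses as Netscape format and contains at least one youtube.com cookie. The function names are hypothetical, and passing this check does not prove the cookies are still valid for a logged-in session.

```python
from http.cookiejar import LoadError, MozillaCookieJar


def youtube_cookie_names(cookie_path):
    """Return the names of cookies in a Netscape-format file that apply to youtube.com."""
    jar = MozillaCookieJar(cookie_path)
    # Keep session and expired cookies so the file is reported as-is.
    jar.load(ignore_discard=True, ignore_expires=True)
    names = []
    for cookie in jar:
        host = cookie.domain.lstrip(".")
        if host == "youtube.com" or host.endswith(".youtube.com"):
            names.append(cookie.name)
    return sorted(names)


def check_cookie_file(cookie_path):
    """Return (ok, message) describing whether the file looks usable."""
    try:
        names = youtube_cookie_names(cookie_path)
    except (LoadError, OSError) as error:
        return False, f"could not read cookies file: {error}"
    if not names:
        return False, "no youtube.com cookies found"
    return True, f"found {len(names)} youtube.com cookie(s)"
```

Calling `check_cookie_file("/home/me/my_cookies_file.txt")` before passing the same path to -c gives a quick yes/no on whether the file is at least well-formed.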
Use Firefox instead of Chrome as the driver
The default driver is Chrome, but Firefox should work as well.
hatch run yt-community-post-archiver "https://www.youtube.com/@PomuRainpuff/community" -d "firefox"
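If you want to drive the archiver from a script (say, to go through several channels), a small wrapper along these lines might do. This is a sketch, not something shipped with the tool: build_command and run_archiver are made-up names, and it only uses the flags mentioned in this README (-o, -m, -c, -p, -n, -d). It assumes the yt-community-post-archiver command from the wheel install is on your PATH; if you run from the repo, you would prepend "hatch", "run" instead.

```python
import shlex
import subprocess


def build_command(url, output=None, max_posts=None, cookies=None,
                  profile_dir=None, profile_name=None, driver=None):
    """Assemble an argument list using only the flags shown in this README."""
    command = ["yt-community-post-archiver", url]
    options = [
        ("-o", output),        # save location
        ("-m", max_posts),     # maximum number of posts
        ("-c", cookies),       # Netscape-format cookies file
        ("-p", profile_dir),   # browser profile directory
        ("-n", profile_name),  # profile name override
        ("-d", driver),        # for example "firefox"
    ]
    for flag, value in options:
        if value is not None:
            command += [flag, str(value)]
    return command


def run_archiver(url, **options):
    """Run the archiver once and return its exit code."""
    command = build_command(url, **options)
    print("Running:", shlex.join(command))
    return subprocess.run(command, check=False).returncode
```

For instance, `run_archiver("https://www.youtube.com/@PomuRainpuff/community", output="output/pomu", driver="firefox")` would be equivalent to typing the same command with -o and -d by hand. Run --help to confirm the options available in your installed version.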
Notes
- Poll vote percentages can only be grabbed if you are logged in, because vote results are only shown to users who have voted.
- If you have not voted on the poll before, the tool will temporarily vote for you to grab vote percentages, then try to undo the vote to avoid messing with anything. This isn't perfect!
Other
How does this work?
This is just a typical Selenium/BeautifulSoup program, that's it. As such, it's simulating being a user and manually copying + formatting all the data via a browser window. This is very evident if you disable headless mode, and see all the action.