Makes a complete hostable archive of imageboard threads including images, HTML, and JSON.
archive-chan
archive-chan archives threads from 4chan and other imageboards, including images and/or thumbnails, thread HTML, JSON if available, and produces a list of referenced external links.
To archive a thread and host it using a simple webserver:
pip3 install -U archive-chan
npm i -g http-server
archive-chan https://boards.4channel.org/vg/thread/338253176 --runonce
cd archive/4chan
http-server -p 1234 -c-1
Then open http://localhost:1234/vg/thread/338253176 in your browser. All thumbnail images, JavaScript, CSS, etc. should work properly.
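If you'd rather not install Node just to preview the archive, something like the following Python sketch might work as a stand-in for http-server. It is not part of archive-chan. NoCacheHandler, make_server, and serve_in_background are hypothetical names, the no-store header is only a rough analogue of -c-1, and the ".html" fallback assumes thread pages may be stored on disk with an .html extension, which I have not verified.

```python
import functools
import http.server
import os
import threading


class NoCacheHandler(http.server.SimpleHTTPRequestHandler):
    """Static file handler that asks browsers not to cache responses."""

    def end_headers(self):
        # Rough analogue of http-server's -c-1 flag (caching disabled).
        self.send_header("Cache-Control", "no-store")
        super().end_headers()

    def translate_path(self, path):
        local = super().translate_path(path)
        # Assumption: if /vg/thread/123 does not exist on disk but
        # /vg/thread/123.html does, serve the .html file instead.
        if not os.path.exists(local) and os.path.isfile(local + ".html"):
            return local + ".html"
        return local

    def log_message(self, format, *args):
        pass  # keep the console quiet


def make_server(root="archive/4chan", port=1234):
    """Build (but do not start) a server whose web root is `root`."""
    handler = functools.partial(NoCacheHandler, directory=root)
    return http.server.ThreadingHTTPServer(("127.0.0.1", port), handler)


def serve_in_background(server):
    """Run `server` on a daemon thread and return that thread."""
    thread = threading.Thread(target=server.serve_forever, daemon=True)
    thread.start()
    return thread
```

Calling make_server().serve_forever() from the directory that contains archive/ should then serve the archive on port 1234, with each board at the root of the server.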
Once you’ve verified it works fine locally, you can rsync the archive to your webserver.
Here’s an example of how you might serve 4chan threads in production. Suppose you own www.example.com, and you set up nginx so that /path/to/www/foo.html on your server ends up being served at www.example.com/foo.html. You can host 4chan threads like this:
rsync -Pa archive/4chan/ you@www.example.com:/path/to/www/
NOTE: You must ensure each board is at the root of your static server, or else 4chan’s JS won’t work properly! In other words, make sure that the threads are served at www.example.com/vg/thread/338253176, not www.example.com/archive/4chan/vg/thread/338253176. (In the second URL, /vg/ doesn’t come directly after .com; that’s bad. The board name needs to be the first path segment after your domain name, at the root, or else you won’t be able to follow replies, since 4chan’s JS reply parser breaks for some reason.)
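A quick way to sanity-check a deployed URL against the rule above is to test whether the board name is the first path segment. This is only a sketch: board_is_at_root is a hypothetical helper, and the pattern assumes board names are short lowercase alphanumeric strings and thread IDs are decimal numbers, as in /vg/thread/338253176.

```python
import re
from urllib.parse import urlsplit

# Assumption: paths look like /<board>/thread/<id>, optionally followed
# by more path segments.
_ROOT_THREAD_PATH = re.compile(r"^/([a-z0-9]+)/thread/(\d+)(?:/.*)?$")


def board_is_at_root(url):
    """Return True when the board name is the first path segment of `url`.

    `url` must include a scheme (e.g. https://), otherwise urlsplit
    treats the host name as part of the path.
    """
    return _ROOT_THREAD_PATH.match(urlsplit(url).path) is not None


good = "https://www.example.com/vg/thread/338253176"
bad = "https://www.example.com/archive/4chan/vg/thread/338253176"
print(board_is_at_root(good), board_is_at_root(bad))
```

The first URL passes and the second does not, because /archive/4chan/ sits between the domain and the board name.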
You can host the files using any webserver you like. Personally, I use nginx + Cloudflare.
archive-chan was forked from BASC Archiver, which appeared to have been unmaintained since 2018. It wasn’t able to properly save 4chan threads (because 4chan’s image CDN subdomain had changed, which broke the regex), nor did it save threads in a format that could easily be hosted. So I created this quick fix for my needs in 2021, and released it as archive-chan so others could use it too.
If you have questions or want to report a bug, DM me on Twitter! I’m @theshawwn; always happy to say hello.
(Or you can file a GitHub issue here.)
The original BASC Archiver README from 2018 appears verbatim below:
Introduction
The BASC Archiver is a Python library (packaged with the thread-archiver script) used to archive imageboard threads. It uses the 4chan API with the py4chan wrapper. Developers are free to use the BASC-Archiver library for some interesting third-party applications, as it is licensed under the LGPLv3.
It comes with a CLI interface for archiving threads, the thread-archiver, with a GUI interface under development.
The thread-archiver is designed to archive all content from a 4chan thread:
Download all images and/or thumbnails in given threads.
Download all child threads (threads referred to in a post).
Download a JSON dump of thread comments using the 4chan API.
Download the HTML page.
Convert links in HTML to use the downloaded images.
Download CSS/JS and convert HTML to use them.
Keep downloading until 404 (with a user-set delay).
Can be restarted at any time.
Threaded downloading to download multiple files at the same time.
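The "keep downloading until 404" behaviour in the list above can be pictured as a simple polling loop. The sketch below is not the archiver's actual code: fetch_status and poll_until_404 are hypothetical names, the default delay of 90 seconds just mirrors the documented --thread-check-delay default, and a real archiver would download new posts and files on each pass instead of only reporting the status.

```python
import time
import urllib.error
import urllib.request


def fetch_status(url):
    """Return the HTTP status code for `url` (404 once the thread is gone).

    Network failures other than HTTP errors (urllib.error.URLError) are
    not handled here and will propagate to the caller.
    """
    try:
        with urllib.request.urlopen(url) as response:
            return response.status
    except urllib.error.HTTPError as err:
        return err.code


def poll_until_404(url, delay=90.0, fetch=fetch_status,
                   sleep=time.sleep, on_update=None):
    """Re-check `url` every `delay` seconds until it returns 404.

    Returns the number of checks that did not return 404.
    """
    checks = 0
    while True:
        status = fetch(url)
        if status == 404:
            return checks
        checks += 1
        if on_update is not None:
            on_update(status)  # placeholder for "download what is new"
        sleep(delay)
```

Passing fetch and sleep as parameters keeps the loop easy to test without touching the network. Because no state lives in the loop itself, it can also be stopped and restarted at any time.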
The thread-archiver replaces the typical “Right-click Save As, Web Page Complete” action, which does not save full-sized images or JSON. It works as a guerrilla, static HTML alternative to Fuuka.
Usage
Usage:
  thread-archiver <url>... [options]
  thread-archiver -h | --help
  thread-archiver -v | --version

Options:
  --path=<string>                Path to folder where archives will be saved [default: ./archive]
  --runonce                      Downloads the thread as it is presently, then exits
  --thread-check-delay=<float>   Delay between checks of the same thread [default: 90]
  --delay=<float>                Delay between file downloads [default: 0]
  --poll-delay=<float>           Delay between thread checks [default: 20]
  --dl-threads-per-site=<int>    Download threads to use per site [default: 5]
  --dl-thread-wait=<float>       Seconds to wait between downloads on each thread [default: 0.1]
  --nothumbs                     Don't download thumbnails
  --thumbsonly                   Download thumbnails, no images
  --nojs                         Don't download javascript
  --nocss                        Don't download css
  --ssl                          Download using HTTPS
  --follow-children              Follow threads linked in downloaded threads
  --follow-to-other-boards       Follow linked threads, even if from other boards
  --silent                       Suppresses mundane printouts, prints what's important
  -v --verbose                   Printout more information than normal
  -h --help                      Show help
  -V --version                   Show version
Example
thread-archiver http://boards.4chan.org/b/res/423861837 --delay 5 --thumbsonly
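If you want to drive the command line from a Python script, one option is to assemble the argument list and hand it to subprocess. This sketch only uses flags listed in the usage text. build_command and run_archiver are hypothetical helpers, not part of the BASC-Archiver library, and only a handful of the options are covered.

```python
import subprocess


def build_command(urls, executable="thread-archiver", path=None,
                  runonce=False, delay=None, thumbsonly=False):
    """Assemble an argument list from a few of the documented options."""
    cmd = [executable, *urls]
    if path is not None:
        cmd.append(f"--path={path}")
    if runonce:
        cmd.append("--runonce")
    if delay is not None:
        cmd.append(f"--delay={delay}")
    if thumbsonly:
        cmd.append("--thumbsonly")
    return cmd


def run_archiver(urls, **options):
    """Run the archiver; raises CalledProcessError on a non-zero exit."""
    return subprocess.run(build_command(urls, **options), check=True)


# Same invocation as the example above, expressed as an argument list.
print(build_command(["http://boards.4chan.org/b/res/423861837"],
                    delay=5, thumbsonly=True))
```

Passing a list to subprocess.run avoids shell quoting problems with thread URLs. Calling run_archiver requires thread-archiver to be installed and on your PATH.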
Installation
The BASC-Archiver is designed for Python 3.x, and can be installed on Windows, Linux, or Mac OS X.
(Python 2 has intractable ASCII-to-Unicode conversion errors, whereas Python 3.x stores all strings as Unicode, so we strongly recommend using 3.x.)
New stable releases can be found on our Releases page, or installed with the PyPI package BASC-Archiver.
Linux and OSX
Make sure you have Python 3 and pip3 installed. On Debian/Ubuntu and Fedora/Red Hat/CentOS, install the packages python3 and python3-pip. Here’s a Mac OS X Installation Guide.
Run pip3 install basc-archiver
Linux users must run this command as root, or prefix the command with sudo.
Run thread-archiver http://boards.4chan.org/etc/thread/12345
Threads will be saved in ./archive, but you can change that by supplying a directory with the --path= argument.
Windows
Download the latest release from our Releases page.
Open up a command prompt window (cmd.exe), and move to the directory with thread-archiver.exe.
Run thread-archiver.exe http://boards.4chan.org/etc/thread/12345
Using the Windows version will become simpler once we finish writing the GUI.
Android (CLI)
Note: This is a temporary solution until we put together some kind of Android GUI app.
Thanks to the QPython interpreter, you can effortlessly run the BASC-Archiver on your Android phone.
Install the QPython app from Google Play.
Open the QPython app, and swipe left to reach the menu.
Tap Package Index. Then scroll down and tap Pip Console.
Run the following commands (after starting the pip_install.py script):
pip install requests
pip install basc-archiver
Now you can just open QPython, tap My QPython, tap pip_console, and run the following command with your own thread URL:
thread-archiver --path=/sdcard/ http://boards.4chan.org/qa/thread/23839
To run the script in the background, press the back button, and tap OK at the Run in Background prompt. You can stop the script anytime using Vol Down + C.
Note: On Android (CLI), it is important to set the path to /sdcard/, so the thread dump can be accessed from the /sdcard/archives/4chan/ folder.
Note: To update the BASC-Archiver on Android (CLI), you must open QPython, press the 3-dot menu button, scroll down and tap Reset Private Space. Then just reinstall the BASC-Archiver.
License
Bibliotheca Anonoma Imageboard Thread Archiver (BASC Archiver)
Copyright (C) 2014 Antonizoon Overtwater, Daniel Oaks. Licensed under the GNU Lesser General Public License v3.
Source Distribution
File details
Details for the file archive-chan-0.1.4.tar.gz.
File metadata
- Download URL: archive-chan-0.1.4.tar.gz
- Upload date:
- Size: 16.7 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/3.4.1 importlib_metadata/4.5.0 pkginfo/1.7.0 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.59.0 CPython/3.9.2
File hashes
Algorithm | Hash digest
---|---
SHA256 | 4498f4a9533fd559ef234f27461ec4b78d8aaca6dc1e58f6dfa4eb67b8790ed7
MD5 | 95c0b45f9fde1f1f635af9ce535e848a
BLAKE2b-256 | 8626e1a57e7aebdd50ddf8f7ba76c32f8d4b072fd5035cad5a4ba841a80bb4d3
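If you download the source distribution by hand, you may want to compare it against the SHA256 digest in the table above. A minimal sketch using only the standard library follows. sha256_of and matches_published_digest are hypothetical helper names, and the file path you pass in is wherever you saved the download.

```python
import hashlib

# SHA256 digest copied from the table above.
EXPECTED_SHA256 = "4498f4a9533fd559ef234f27461ec4b78d8aaca6dc1e58f6dfa4eb67b8790ed7"


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at `path`."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read in chunks so large files do not need to fit in memory.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected=EXPECTED_SHA256):
    """Return True when the file's SHA256 digest equals `expected`."""
    return sha256_of(path) == expected.lower()
```

For example, matches_published_digest("archive-chan-0.1.4.tar.gz") should return True for an intact copy of the file described above.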