Turn StackExchange dumps into ZIM files for offline usage
Project description
Sotoki
sotoki (stackoverflow to kiwix) is an OpenZIM scraper that creates offline versions of Stack Exchange websites such as Stack Overflow.
It is based on Stack Exchange's Data Dumps hosted by The Internet Archive.
⚠️ Warning
sotoki is undergoing a major rewrite to use libzim7 and its Python binding in order to bypass filesystem limitations seen in version 1.x. Use a tagged version until this warning is removed, as the current master is not functional.
Usage
sotoki works off a domain that you must provide: the domain name of the Stack Exchange website you want to scrape. Run sotoki --list-all to get a list of those domains.
Note: when running off the git repository, you'll need to download a few external dependencies that we pack in Python releases. Just run python src/sotoki/dependencies.py
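Because the tool expects a bare domain name rather than a full URL, it can help to normalize whatever the user pasted before handing it over. The following is only a minimal sketch: the helper name to_domain is hypothetical and is not part of sotoki.

```python
from urllib.parse import urlsplit


def to_domain(value: str) -> str:
    """Return the bare, lowercase domain name from a URL or domain string.

    Hypothetical helper for illustration; not part of sotoki.
    """
    value = value.strip()
    # urlsplit only fills in `hostname` when a network location is marked,
    # so prepend "//" for inputs such as "stackoverflow.com/questions".
    if "://" not in value:
        value = "//" + value
    host = urlsplit(value).hostname
    if not host:
        raise ValueError(f"no domain found in {value!r}")
    return host
```

For instance, both a pasted question URL and a plain domain string resolve to the same domain name, which can then be compared against the output of sotoki --list-all.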
Docker
docker run -v my_dir:/output openzim/sotoki sotoki --help
Virtualenv
sotoki is a Python 3 program. If you are not using the Docker image, you are advised to run it in a virtual environment to avoid installing its dependencies system-wide.
python3 -m venv env # Create virtualenv
source env/bin/activate # Activate the virtualenv
pip3 install sotoki # Install sotoki and its dependencies
sotoki --help # Display sotoki help
Call deactivate to quit the virtual environment.
See requirements.txt for the list of Python dependencies.
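To confirm that a virtual environment is actually active before installing anything, the interpreter itself can be asked. This is a small sketch using only the standard library; the function name in_virtualenv is illustrative.

```python
import sys


def in_virtualenv() -> bool:
    """Return True when the running interpreter belongs to a venv.

    Inside an environment created with `python3 -m venv`, sys.prefix points
    at the environment while sys.base_prefix points at the base install.
    """
    return sys.prefix != sys.base_prefix


if __name__ == "__main__":
    if in_virtualenv():
        print(f"virtual environment active: {sys.prefix}")
    else:
        print("no virtual environment active; packages would install system-wide")
```

Running this script after activating the environment, and again after calling deactivate, should report the two different states.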
File details
Details for the file sotoki-2.0.0.tar.gz.
File metadata
- Download URL: sotoki-2.0.0.tar.gz
- Upload date:
- Size: 1.6 MB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 73758f53d5d20197ecc31235ed3ec92234c1292c66de7bc6d7e64fef9749dab7
MD5 | ef454914512cb69cb8d0dafff9637c2e
BLAKE2b-256 | 283332829978351917ec15bddcc9886d648d6b33fff4733497a8cfd019e334fc
File details
Details for the file sotoki-2.0.0-py3-none-any.whl.
File metadata
- Download URL: sotoki-2.0.0-py3-none-any.whl
- Upload date:
- Size: 1.6 MB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.0 CPython/3.8.6
File hashes
Algorithm | Hash digest
---|---
SHA256 | 983f0d2b17bbcb872910855f1eff6a7afd4130633fbe63943a86e118fe79d289
MD5 | 0c94c49153ca2f8f7a57686947f9e98e
BLAKE2b-256 | 130cb1ede2f96b1b559c3ec922db7962ca6e76d2db0595d350c78414145023b9
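A downloaded file can be checked against the digests listed under File hashes. The sketch below uses only Python's hashlib; the function names are illustrative, and it assumes that the BLAKE2b-256 entry corresponds to BLAKE2b with a 32-byte digest.

```python
import hashlib
from pathlib import Path


def file_digest(path, algorithm="sha256", chunk_size=1 << 20):
    """Return the hex digest of a file, reading it in chunks."""
    if algorithm == "blake2b-256":
        # Assumption: "BLAKE2b-256" means BLAKE2b with a 32-byte digest.
        hasher = hashlib.blake2b(digest_size=32)
    else:
        hasher = hashlib.new(algorithm)
    with Path(path).open("rb") as fh:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            hasher.update(chunk)
    return hasher.hexdigest()


def matches(path, expected_hex, algorithm="sha256"):
    """Compare a file's digest with an expected hex string."""
    return file_digest(path, algorithm) == expected_hex.strip().lower()


# Example, assuming the source distribution was saved in the current directory:
# matches(
#     "sotoki-2.0.0.tar.gz",
#     "73758f53d5d20197ecc31235ed3ec92234c1292c66de7bc6d7e64fef9749dab7",
# )
```

A False result means the local file differs from the published one and should be downloaded again.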