A Python library for interacting with 4chan in a programmatically friendly way.
pychan
Overview
pychan is a Python client for interacting with 4chan. 4chan does not have an official API, and attempts to implement one have been less maintained than desired, so this library instead provides abstractions over interacting with (scraping) 4chan directly. pychan is object-oriented, and its implementation is lazy wherever possible (using Python generators) in order to optimize performance and minimize unnecessary blocking I/O operations.
Usage
Setup
from pychan.api import FourChan
from pychan.logger import LogLevel, PychanLogger
# With all logging disabled (default)
fourchan = FourChan()
# Configure logging explicitly
logger = PychanLogger(LogLevel.INFO)
fourchan = FourChan(logger)
Iterating
For all thread-level iteration, the generators returned by this library maintain internal state tracking which page of 4chan they are currently on. Threads are fetched one page at a time, up to page 10 (the highest page at which 4chan renders threads for any given board); after page 10, the generator stops yielding threads.
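The page-by-page behavior described above follows the standard lazy-generator pattern. The sketch below illustrates that pattern only; it is not pychan's implementation. `iter_board_threads` and `fetch_page` are hypothetical names, and `fetch_page` stands in for the HTTP request and HTML parsing by returning the thread numbers found on one board page.

```python
from itertools import islice
from typing import Callable, Iterator, List

MAX_PAGE = 10  # highest page 4chan renders threads on, per the text above


def iter_board_threads(
    board: str,
    fetch_page: Callable[[str, int], List[int]],  # hypothetical fetcher
) -> Iterator[int]:
    """Lazily yield thread numbers for one board, one page at a time."""
    for page in range(1, MAX_PAGE + 1):
        # A page is requested only once the caller has consumed every
        # thread from the previous page.
        yield from fetch_page(board, page)
    # Returning after page 10 ends the generator.


# Usage with a fake fetcher that records which pages were requested.
requested: List[int] = []


def fake_fetch(board: str, page: int) -> List[int]:
    requested.append(page)
    return [page * 100 + 1, page * 100 + 2]


first_three = list(islice(iter_board_threads("b", fake_fetch), 3))
print(first_three)  # [101, 102, 201]
print(requested)    # [1, 2]
```

Because the caller stopped after three threads, only pages 1 and 2 were ever requested; pages 3 through 10 cost nothing.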
Single Board
# Iterate over all threads in /b/ lazily (Python Generator)
for thread in fourchan.get_threads_for_board("b"):
    # Iterate over all posts in each thread
    for post in fourchan.get_posts(thread):
        # Do stuff with the post
        print(post.text)
All Boards
Boards are visited in random order, while each board's pages are still visited in ascending order. For example, get_all_threads() may perform the following sequence of requests:
- Query page 1 of /b/
- Query page 1 of /pol/
- Query page 2 of /b/ (because page 1 was visited already)
- Query page 1 of /int/
- (and so on)
# Iterate over all threads across all boards lazily (Python Generator)
for thread in fourchan.get_all_threads():
    # Iterate over all posts in each thread
    for post in fourchan.get_posts(thread):
        # Do stuff with the post
        print(post.text)
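The random board order described above can be modeled as a per-board page cursor plus a random choice among boards that still have pages left. This is a minimal sketch of that idea, not pychan's actual code: `iter_all_threads` and `fetch_page` are hypothetical, and the 10-page limit is assumed to apply per board as in single-board iteration.

```python
import random
from typing import Callable, Iterator, List, Optional, Tuple

MAX_PAGE = 10  # assumed per-board page limit, as for single-board iteration


def iter_all_threads(
    boards: List[str],
    fetch_page: Callable[[str, int], List[int]],  # hypothetical fetcher
    rng: Optional[random.Random] = None,
) -> Iterator[Tuple[str, int]]:
    """Yield (board, thread_number) pairs, choosing a random board per page."""
    if rng is None:
        rng = random.Random()
    # Next page to request for each board that still has pages left.
    next_page = {board: 1 for board in boards}
    while next_page:
        board = rng.choice(list(next_page))
        page = next_page[board]
        for thread_number in fetch_page(board, page):
            yield board, thread_number
        if page == MAX_PAGE:
            del next_page[board]  # board exhausted; never choose it again
        else:
            next_page[board] = page + 1


# Usage: a seeded RNG makes the visiting order reproducible.
visits: List[Tuple[str, int]] = []


def fake_fetch(board: str, page: int) -> List[int]:
    visits.append((board, page))
    return [page]


threads = list(iter_all_threads(["b", "pol", "int"], fake_fetch, random.Random(0)))
# The interleaving of boards depends on the seed, but each board's pages
# are always requested as 1, 2, 3, ... 10.
print(visits[:4])
```

Keeping the cursor in a dict means a board that has been visited once simply resumes at its next page when it is chosen again, which matches the "page 2 of /b/ because page 1 was visited already" step in the list above.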
Data Available on Threads and Posts
The following table enumerates all the kinds of data that are available on the various models used by this library.
Entity | Field | Example Value(s) |
---|---|---|
pychan.models.Thread | thread.board | "b", "int" |
pychan.models.Thread | thread.number | 882774935, 168484869 |
pychan.models.Thread | thread.title | None, "YLYL thread" |
pychan.models.Post | post.thread | pychan.models.Thread |
pychan.models.Post | post.number | 882774935, 882774974 |
pychan.models.Post | post.is_original_post | True, False |
pychan.models.Post | post.poster_id | None, "BYagKQXI" |
pychan.models.Post | post.file | None, pychan.models.File |
pychan.models.File | file.url | "https://i.4cdn.org/pol/1658892700380132.jpg" |
pychan.models.File | file.name | "wojak.jpg", "i feel alone.jpg" |
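As an illustration of consuming these fields, the helper below collects the file URL of every post that has an attachment. It relies only on `post.number`, `post.file`, and `file.url` from the table. `file_urls` is a hypothetical helper, and `StubPost` / `StubFile` are stand-ins used so the sketch runs offline; they are not pychan's model classes.

```python
from dataclasses import dataclass
from typing import Iterable, List, Optional, Tuple


def file_urls(posts: Iterable) -> List[Tuple[int, str]]:
    """Return (post.number, post.file.url) for each post that has a file."""
    # post.file is None for posts without an attachment (see the table above).
    return [(post.number, post.file.url) for post in posts if post.file is not None]


# Hypothetical stand-ins mirroring only the table's fields; they are not
# pychan.models.Post / pychan.models.File.
@dataclass
class StubFile:
    url: str
    name: str


@dataclass
class StubPost:
    number: int
    file: Optional[StubFile]


sample = [
    StubPost(882774935, StubFile("https://i.4cdn.org/pol/1658892700380132.jpg", "wojak.jpg")),
    StubPost(882774974, None),
]
print(file_urls(sample))
# [(882774935, 'https://i.4cdn.org/pol/1658892700380132.jpg')]
```

With the real library, the same helper would be called as `file_urls(fourchan.get_posts(thread))`.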
Other Features
Get All Boards
This function fetches the list of boards dynamically from 4chan; it is not a hard-coded list within pychan.
boards = fourchan.get_boards()
# Sample return value:
# ['a', 'b', 'c', 'd', 'e', 'g', 'gif', 'h', 'hr', 'k', 'm', 'o', 'p', 'r', 's', 't', 'u', 'v', 'vg', 'vm', 'vmg', 'vr', 'vrpg', 'vst', 'w', 'wg', 'i', 'ic', 'r9k', 's4s', 'vip', 'qa', 'cm', 'hm', 'lgbt', 'y', '3', 'aco', 'adv', 'an', 'bant', 'biz', 'cgl', 'ck', 'co', 'diy', 'fa', 'fit', 'gd', 'hc', 'his', 'int', 'jp', 'lit', 'mlp', 'mu', 'n', 'news', 'out', 'po', 'pol', 'pw', 'qst', 'sci', 'soc', 'sp', 'tg', 'toy', 'trv', 'tv', 'vp', 'vt', 'wsg', 'wsr', 'x', 'xs']
Fetch Posts for a Specific Thread
Warning: this will NOT work once a thread has gone "stale" on 4chan and entered an "archived" state, which happens to almost every thread after it has been inactive long enough. It is therefore recommended to use the iteration-based functionality shown above rather than the approach shown below.
from pychan.models import Thread
# Instantiate a Thread instance with which to query for posts
thread = Thread("int", 168484869)
# Note: if the thread has a title, the thread field of each returned post will include it,
# even though no title was provided above; pychan "auto-discovers" the title and
# includes it in the post models
posts = fourchan.get_posts(thread)
Installation
If you have Python >=3.10 and <4.0 installed, pychan can be installed from PyPI using something like:
pip install pychan
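To confirm that an interpreter falls within the supported range before installing, a version-tuple comparison is enough. `supports_pychan` is a hypothetical helper for illustration, not part of pychan or pip; it simply encodes the >=3.10, <4.0 range stated above.

```python
import sys
from typing import Tuple


def supports_pychan(version: Tuple[int, ...]) -> bool:
    """True if `version` is within the stated range: >=3.10 and <4.0."""
    # Tuples compare element by element, so (3, 12, 1) sorts between (3, 10) and (4, 0).
    return (3, 10) <= version < (4, 0)


print(supports_pychan(tuple(sys.version_info[:2])))
```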
Contributing
See CONTRIBUTING.md for developer-oriented information.
File details
Details for the file pychan-0.1.1.tar.gz.
File metadata
- Download URL: pychan-0.1.1.tar.gz
- Upload date:
- Size: 58.4 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | dbcdd236c24cbb8895a643aec9485a910ce9b473f15fb2e35a964c5213814243 |
MD5 | 674dfaccb2a177d452967177643eaeb9 |
BLAKE2b-256 | ae862236a8576b712c1e6297d639e51d688659257c98850dbba7df6c76a14d3b |
File details
Details for the file pychan-0.1.1-py3-none-any.whl.
File metadata
- Download URL: pychan-0.1.1-py3-none-any.whl
- Upload date:
- Size: 20.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/4.0.1 CPython/3.9.13
File hashes
Algorithm | Hash digest |
---|---|
SHA256 | 0e1b6340d8ce85af953d0e665cd14c42d79c10d0cf060881e910f48965b96f25 |
MD5 | de7a2a69838b8f151490a2e3a6b62bc0 |
BLAKE2b-256 | a108c6e252b03e5d1b3b1aef270eaaec1048c9eeacf6195c84fb396f71ac5d47 |