
A Python library for interacting with 4chan in a programmatically-friendly way.

  1. Overview
  2. Usage
    1. Setup
    2. Iterating
      1. Single Board
      2. All Boards
    3. Data Available on Threads and Posts
    4. Other Features
      1. Get All Boards
      2. Fetch Posts for a Specific Thread
  3. Installation
  4. Contributing

Overview

pychan is a Python client for interacting with 4chan. 4chan does not have an official API, and attempts to implement one have been less maintained than desired, so this library instead provides abstractions over scraping 4chan directly. pychan is object-oriented, and its implementation is lazy wherever possible (using Python generators) to improve performance and avoid unnecessary blocking I/O.

Usage

Setup

from pychan.api import FourChan
from pychan.logger import LogLevel, PychanLogger

# With all logging disabled (default)
fourchan = FourChan()

# Configure logging explicitly
logger = PychanLogger(LogLevel.INFO)
fourchan = FourChan(logger)

Iterating

For all thread-level iteration, the generators this library returns maintain internal state about which page of 4chan you are currently on. Threads are fetched one page at a time, up to page 10 (the highest page at which 4chan renders threads for any given board). Once the generator reaches page 10 internally, it stops returning threads.
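The paging behaviour described above can be sketched with a plain generator. This is an illustration only, not pychan's actual implementation: `iter_threads`, `fake_fetch_page`, and `MAX_PAGE` are hypothetical names, the fake fetcher stands in for a real HTTP request, and the sketch assumes page 10 itself is included.

```python
from itertools import islice
from typing import Callable, Iterator

MAX_PAGE = 10  # highest page rendered per board, per the text above


def iter_threads(
    board: str, fetch_page: Callable[[str, int], list[str]]
) -> Iterator[str]:
    """Yield threads one page at a time; a page is fetched only when needed."""
    for page in range(1, MAX_PAGE + 1):
        # The loop variable is the generator's internal "current page" state.
        yield from fetch_page(board, page)


# Stand-in for a network call (hypothetical): records which pages were requested.
requested_pages: list[int] = []


def fake_fetch_page(board: str, page: int) -> list[str]:
    requested_pages.append(page)
    return [f"/{board}/ page {page} thread {i}" for i in range(3)]


# Taking only four threads requests just the first two pages.
first_four = list(islice(iter_threads("b", fake_fetch_page), 4))
```

Because pages are requested only as the consumer asks for more threads, stopping early (here with `itertools.islice`) avoids the remaining requests entirely.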

Single Board

# Iterate over all threads in /b/ lazily (Python Generator)
for thread in fourchan.get_threads_for_board("b"):
    # Iterate over all posts in each thread
    for post in fourchan.get_posts(thread):
        # Do stuff with the post
        print(post.text)

All Boards

Boards are visited in random order. For example, get_all_threads() may perform the following sequence of operations:

  1. Query page 1 of /b/
  2. Query page 1 of /pol/
  3. Query page 2 of /b/ (because page 1 was visited already)
  4. Query page 1 of /int/
  5. (and so on)

# Iterate over all threads across all boards lazily (Python Generator)
for thread in fourchan.get_all_threads():
    # Iterate over all posts in each thread
    for post in fourchan.get_posts(thread):
        # Do stuff with the post
        print(post.text)
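One way to picture the random visiting order is a generator that keeps a "next page" counter per board and picks a board at random on every step. The sketch below is an assumption about how such interleaving could work, not pychan's real code: `iter_all_threads`, `fake_fetch_page`, and `MAX_PAGE` are hypothetical names, and no network access is involved.

```python
import random
from typing import Callable, Iterator

MAX_PAGE = 10  # assumed page limit, as described in the Iterating section


def iter_all_threads(
    boards: list[str],
    fetch_page: Callable[[str, int], list[str]],
    rng: random.Random,
) -> Iterator[str]:
    """Pick a random board on each step and fetch its next unvisited page."""
    next_page = {board: 1 for board in boards}  # per-board page state
    while next_page:
        board = rng.choice(list(next_page))
        page = next_page[board]
        yield from fetch_page(board, page)
        if page == MAX_PAGE:
            del next_page[board]  # board exhausted; never pick it again
        else:
            next_page[board] = page + 1


# Stand-in for a network call (hypothetical): records each (board, page) visit.
visits: list[tuple[str, int]] = []


def fake_fetch_page(board: str, page: int) -> list[str]:
    visits.append((board, page))
    return [f"/{board}/ page {page}"]


threads = list(
    iter_all_threads(["b", "pol", "int"], fake_fetch_page, random.Random(0))
)
```

Whatever order the boards come up in, each board's pages are still visited in ascending order, which matches the "page 2 of /b/ (because page 1 was visited already)" step above.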

Data Available on Threads and Posts

The following table enumerates all the kinds of data that are available on the various models used by this library.

Entity                 Field                  Example Value(s)
pychan.models.Thread   thread.board           "b", "int"
pychan.models.Thread   thread.number          882774935, 168484869
pychan.models.Thread   thread.title           None, "YLYL thread"
pychan.models.Post     post.thread            pychan.models.Thread
pychan.models.Post     post.number            882774935, 882774974
pychan.models.Post     post.is_original_post  True, False
pychan.models.Post     post.poster_id         None, "BYagKQXI"
pychan.models.Post     post.file              None, pychan.models.File
pychan.models.File     file.url               "https://i.4cdn.org/pol/1658892700380132.jpg"
pychan.models.File     file.name              "wojak.jpg", "i feel alone.jpg"
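To make the relationships in the table concrete, the models can be approximated with dataclasses. These are stand-ins written for illustration and are not pychan's real class definitions: only the fields listed in the table are mirrored, the optional defaults are an assumption, and `file_urls` is a hypothetical helper.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class Thread:
    board: str
    number: int
    title: Optional[str] = None  # None when the thread has no title


@dataclass(frozen=True)
class File:
    url: str
    name: str


@dataclass(frozen=True)
class Post:
    thread: Thread
    number: int
    is_original_post: bool
    poster_id: Optional[str] = None  # None when no poster ID is shown
    file: Optional[File] = None  # None when the post has no attachment


def file_urls(posts: list[Post]) -> list[str]:
    """Collect the URLs of attached files, skipping posts without one."""
    return [post.file.url for post in posts if post.file is not None]


# Sample values taken from the table above.
thread = Thread("b", 882774935, "YLYL thread")
posts = [
    Post(
        thread,
        882774935,
        True,
        file=File("https://i.4cdn.org/pol/1658892700380132.jpg", "wojak.jpg"),
    ),
    Post(thread, 882774974, False, poster_id="BYagKQXI"),
]
urls = file_urls(posts)
```

Since `post.file` and `post.poster_id` may be None, code that consumes posts should check for None before reading `file.url` or `file.name`.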

Other Features

Get All Boards

The list of boards is fetched dynamically from 4chan; it is not hard-coded within pychan.

boards = fourchan.get_boards()
# Sample return value:
# ['a', 'b', 'c', 'd', 'e', 'g', 'gif', 'h', 'hr', 'k', 'm', 'o', 'p', 'r', 's', 't', 'u', 'v', 'vg', 'vm', 'vmg', 'vr', 'vrpg', 'vst', 'w', 'wg', 'i', 'ic', 'r9k', 's4s', 'vip', 'qa', 'cm', 'hm', 'lgbt', 'y', '3', 'aco', 'adv', 'an', 'bant', 'biz', 'cgl', 'ck', 'co', 'diy', 'fa', 'fit', 'gd', 'hc', 'his', 'int', 'jp', 'lit', 'mlp', 'mu', 'n', 'news', 'out', 'po', 'pol', 'pw', 'qst', 'sci', 'soc', 'sp', 'tg', 'toy', 'trv', 'tv', 'vp', 'vt', 'wsg', 'wsr', 'x', 'xs']

Fetch Posts for a Specific Thread

Warning: this will NOT work if the thread has gone "stale" on 4chan and entered an "archived" state, which happens to almost all threads once they have been inactive long enough. It is therefore recommended to use the iteration-based functionality shown above instead of the approach shown below.

from pychan.models import Thread

# Instantiate a Thread instance with which to query for posts
thread = Thread("int", 168484869)

# Note: the thread field in the returned posts will have a title if the thread had a title,
# regardless of whether you provided the title above - pychan will "auto-discover" the title and
# include it in the post models
posts = fourchan.get_posts(thread)

Installation

If you have Python >=3.10 and <4.0 installed, pychan can be installed from PyPI with:

pip install pychan

Contributing

See CONTRIBUTING.md for developer-oriented information.
