A Python library for interacting with 4chan in a programmatically-friendly way.

  1. Overview
  2. Installation
  3. Usage
    1. General Notes
    2. Setup
    3. Fetch Board Names
    4. Fetch Threads
    5. Fetch Archived Threads
    6. Search 4chan
    7. Fetch Posts for a Specific Thread
  4. pychan Models
    1. Threads
    2. Posts
      1. A Note About Replies
    3. Posters
    4. Files
  5. Contributing

Overview

pychan is a Python client for interacting with 4chan. 4chan does not have an official API, and third-party attempts to implement one have tended to languish, so this library instead provides abstractions over interacting with (scraping) 4chan directly. pychan is object-oriented, and its implementation is lazy where reasonable (using Python generators) to improve performance and avoid superfluous blocking I/O operations.

Installation

If you have Python >=3.9 and <4.0 installed, pychan can be installed from PyPI with:

pip install pychan

Usage

General Notes

All 4chan interactions are throttled internally by sleeping the executing thread. If you call pychan from multiple threads at once, you lose the benefit of this throttling, and pychan takes no responsibility for the consequences of excessive HTTP requests in such cases.
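
If you do need several threads, one option is to route every request through a single shared gate so that calls stay spaced out across the whole process. The following is only a minimal sketch under stated assumptions: SerializedThrottle is a hypothetical helper, not part of pychan, and the interval you pass is your own choice rather than a value pychan defines.

```python
import threading
import time
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


class SerializedThrottle:
    """Hypothetical helper (not part of pychan): one shared gate for all threads."""

    def __init__(
        self,
        min_interval: float,
        clock: Callable[[], float] = time.monotonic,
        sleep: Callable[[float], None] = time.sleep,
    ) -> None:
        self._min_interval = min_interval
        self._clock = clock
        self._sleep = sleep
        self._lock = threading.Lock()
        self._last_call: Optional[float] = None

    def call(self, func: Callable[[], T]) -> T:
        # The lock is held while sleeping and while func runs, so callers
        # queue up and the spacing applies across threads, not per thread.
        with self._lock:
            if self._last_call is not None:
                remaining = self._last_call + self._min_interval - self._clock()
                if remaining > 0:
                    self._sleep(remaining)
            self._last_call = self._clock()
            return func()
```

The callable passed to call() has to be the step that actually performs the request. For a lazy generator, that is the step that advances the generator, not the call that creates it.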

For all thread-level iteration shown below, the generators returned will maintain internal state about which page of 4chan you are currently on. Threads are fetched one page at a time up to page 10 (which is the highest page at which 4chan renders threads for any given board). Once page 11 is reached internally by the generator, it stops returning threads.
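
The page-at-a-time behavior described above can be pictured with a small generator. This is an illustrative sketch, not pychan's actual implementation: fetch_page is a hypothetical stand-in for the HTTP request that loads one page of a board, and the items are plain strings rather than Thread objects.

```python
from typing import Callable, Iterator, List

MAX_PAGE = 10  # per the text above, no threads are rendered beyond page 10


def iter_threads(fetch_page: Callable[[int], List[str]]) -> Iterator[str]:
    """Yield items one page at a time, stopping once page 10 has been consumed."""
    page = 1
    while page <= MAX_PAGE:
        # A page is only requested after the consumer has exhausted the
        # previous one, so stopping early skips the remaining requests.
        for item in fetch_page(page):
            yield item
        page += 1


def fake_fetch_page(page: int) -> List[str]:
    """Hypothetical fetcher returning two placeholder items per page."""
    return [f"thread-{page}-{i}" for i in range(2)]


if __name__ == "__main__":
    for name in iter_threads(fake_fetch_page):
        print(name)
```

Because the iteration is lazy, breaking out of the loop early (or wrapping the generator in itertools.islice) should avoid fetching the later pages at all.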

Setup

from pychan import FourChan, LogLevel, PychanLogger

# With all logging disabled (default)
fourchan = FourChan()

# Tell pychan to gracefully ignore HTTP exceptions, if any, within its internal logic
fourchan = FourChan(raise_http_exceptions=False)

# Tell pychan to gracefully ignore parsing exceptions, if any, within its internal logic
fourchan = FourChan(raise_parsing_exceptions=False)

# Configure logging explicitly
logger = PychanLogger(LogLevel.INFO)
fourchan = FourChan(logger=logger)

# Use all of the above settings at once
logger = PychanLogger(LogLevel.INFO)
fourchan = FourChan(logger=logger, raise_http_exceptions=True, raise_parsing_exceptions=True)

The rest of the examples in this README assume that you have already created an instance of the FourChan class as shown above.

Fetch Board Names

This function dynamically fetches boards from 4chan at call time.

Note: boards which are not compatible with pychan are not returned in this list.

boards = fourchan.get_boards()
# Sample return value:
# ['a', 'b', 'c', 'd', 'e', 'g', 'gif', 'h', 'hr', 'k', 'm', 'o', 'p', 'r', 's', 't', 'u', 'v', 'vg', 'vm', 'vmg', 'vr', 'vrpg', 'vst', 'w', 'wg', 'i', 'ic', 'r9k', 's4s', 'vip', 'qa', 'cm', 'hm', 'lgbt', 'y', '3', 'aco', 'adv', 'an', 'bant', 'biz', 'cgl', 'ck', 'co', 'diy', 'fa', 'fit', 'gd', 'hc', 'his', 'int', 'jp', 'lit', 'mlp', 'mu', 'n', 'news', 'out', 'po', 'pol', 'pw', 'qst', 'sci', 'soc', 'sp', 'tg', 'toy', 'trv', 'tv', 'vp', 'vt', 'wsg', 'wsr', 'x', 'xs']

Fetch Threads

# Iterate over all threads in /b/ lazily (Python Generator)
for thread in fourchan.get_threads("b"):
    # Do stuff with the thread
    print(thread.title)
    # You can also iterate over all the posts in the thread
    for post in fourchan.get_posts(thread):
        # Do stuff with the post - refer to the model documentation in pychan's README for details
        print(post.text)

Fetch Archived Threads

Note: some boards do not have an archive (e.g. /b/). For such boards, get_archived_threads() will either return an empty list or raise an exception, depending on how you have configured your FourChan instance.

The threads returned by this function will always have a title field containing the text shown in 4chan's interface under the "Excerpt" column header. This text can be either the thread's real title or a preview of the original post's text. Passing any of the threads returned by this method to the get_posts() method will automatically correct the title field (if necessary) on the thread that gets attached to the returned posts. See Fetch Posts for a Specific Thread for more details.

Technically, pychan could address the title behavior described above by issuing an additional HTTP request for each thread to get its real title, but in the spirit of making the smallest number of HTTP requests possible, pychan directly uses the excerpt instead.

# Unlike get_threads(), the get_archived_threads() method returns a list instead of a Python Generator
for thread in fourchan.get_archived_threads("pol"):
    # Do stuff with the thread
    print(thread.title)
    # You can also iterate over all the posts in the thread
    for post in fourchan.get_posts(thread):
        # Do stuff with the post - refer to the model documentation in pychan's README for details
        print(post.text)

Search 4chan

Note: closed/stickied/archived threads are never returned in search results.

# Iterate over all threads returned in the search results lazily (Python Generator)
for thread in fourchan.search(board="b", text="ylyl"):
    # The thread object is the same class as the one returned by get_threads()
    for post in fourchan.get_posts(thread):
        # Do stuff with the post - refer to the model documentation in pychan's README for details
        print(post.text)

Fetch Posts for a Specific Thread

from pychan.models import Thread

# Instantiate a Thread instance with which to query for posts
thread = Thread("int", 168484869)

# Note: the thread contained within the returned posts will have all applicable metadata (such as
# title and sticky status), regardless of whether you provided such data above - pychan will
# "auto-discover" all metadata and include it in the post models' copy of the thread
posts = fourchan.get_posts(thread)

pychan Models

The following tables summarize all the kinds of data that are available on the various models used by this library.

Also note that all model classes in pychan implement the following methods:

  • __repr__
  • __str__
  • __hash__
  • __eq__
  • __copy__
  • __deepcopy__

Threads

The table below corresponds to the pychan.models.Thread class.

Field | Type | Example Value(s)
thread.board | str | "b", "int"
thread.number | int | 882774935, 168484869
thread.title | Optional[str] | None, "YLYL thread"
thread.is_stickied | bool | True, False
thread.is_closed | bool | True, False
thread.is_archived | bool | True, False
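
The board and number fields are enough to identify a thread, so they can also be turned back into a link. The helper below is a hypothetical convenience, not part of pychan, and the URL pattern is 4chan's public thread URL format rather than anything this README specifies.

```python
def thread_url(board: str, number: int) -> str:
    """Build the public URL for a thread (hypothetical helper, not part of pychan)."""
    # Accept both "int" and "/int/" style board names
    return f"https://boards.4chan.org/{board.strip('/')}/thread/{number}"


if __name__ == "__main__":
    print(thread_url("int", 168484869))
```

With a pychan thread, the call would presumably look like thread_url(thread.board, thread.number).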

Posts

The table below corresponds to the pychan.models.Post class.

Field | Type | Example Value(s)
post.thread | Thread | pychan.models.Thread
post.number | int | 882774935, 882774974
post.timestamp | datetime.datetime | datetime.datetime
post.poster | Poster | pychan.models.Poster
post.text | str | ">be me\n>be bored\n>write pychan\n>somehow it works"
post.is_original_post | bool | True, False
post.file | Optional[File] | None, pychan.models.File
post.replies | list[Post] | [], [pychan.models.Post, pychan.models.Post]

A Note About Replies

The replies field shown above is purely a convenience feature pychan provides for accessing all posts within a thread that use the >> operator to "reply" to a post earlier in the thread. If you were to iterate over all posts in a thread via get_posts(), you would obtain all posts and their replies (in the order they were posted) as a single list. You do not have to access the replies field to access all the posts in a given thread.
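
To make the relationship between the flat post list and the replies field concrete, here is a minimal sketch of how such links could be derived from post text. It is not pychan's implementation: SketchPost is a hypothetical stand-in for the Post model, and the sketch assumes that posts arrive in posting order and that a reply is any later post whose text contains >> followed by an earlier post's number.

```python
import re
from dataclasses import dataclass, field
from typing import Dict, List

# Matches quote links such as ">>882774935"; greentext like ">be me" does not match
QUOTE_LINK = re.compile(r">>(\d+)")


@dataclass
class SketchPost:
    """Hypothetical stand-in for a post; not pychan's Post class."""

    number: int
    text: str
    replies: List["SketchPost"] = field(default_factory=list)


def link_replies(posts: List[SketchPost]) -> None:
    """Attach each post to the replies list of every earlier post it quotes."""
    seen: Dict[int, SketchPost] = {}
    for post in posts:  # assumed to be in posting order
        # A set avoids attaching the same reply twice when a post is quoted repeatedly
        quoted = {int(n) for n in QUOTE_LINK.findall(post.text)}
        for number in sorted(quoted):
            target = seen.get(number)
            if target is not None:
                target.replies.append(post)
        seen[post.number] = post
```

The flat list is left untouched by link_replies(): every post still appears in it exactly once, and the replies lists only add cross-references between those same objects.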

Posters

The table below corresponds to the pychan.models.Poster class.

Field | Type | Example Value(s)
poster.name | str | "Anonymous"
poster.is_moderator | bool | True, False
poster.id | Optional[str] | None, "BYagKQXI"
poster.flag | Optional[str] | None, "United States", "Canada"

Files

The table below corresponds to the pychan.models.File class.

Field | Type | Example Value(s)
file.url | str | "https://i.4cdn.org/pol/1658892700380132.jpg"
file.name | str | "wojak.jpg", "i feel alone.jpg"
file.size | str | "601 KB"
file.dimensions | tuple[int, int] | (1920, 1080), (800, 600)
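
Because file.size is a display string rather than a number, comparing or summing file sizes requires parsing it first. The helper below is a hypothetical sketch, not part of pychan. It assumes the string always looks like the example above (a number, a space, and one of B, KB or MB) and that the units are binary multiples of 1024, neither of which this README guarantees.

```python
import re

# Assumption: units are binary multiples (1 KB = 1024 bytes)
_UNITS = {"B": 1, "KB": 1024, "MB": 1024 ** 2}
_SIZE = re.compile(r"^\s*(\d+(?:\.\d+)?)\s*(B|KB|MB)\s*$", re.IGNORECASE)


def size_in_bytes(size: str) -> int:
    """Convert a display string such as "601 KB" to an approximate byte count."""
    match = _SIZE.match(size)
    if match is None:
        raise ValueError(f"unrecognized file size: {size!r}")
    value, unit = match.groups()
    return round(float(value) * _UNITS[unit.upper()])


if __name__ == "__main__":
    print(size_in_bytes("601 KB"))
```

The result is only as precise as the rounded display string, so treat it as an estimate rather than the exact size of the file.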

Contributing

See CONTRIBUTING.md for developer-oriented information.
