A simple Python library that extracts page and post information from https://www.ptt.cc/bbs/ into JSON format.

Project description

Ptt2Json

>>> from ptt2json import PttPage
>>> ptt = PttPage(boardname="Gossiping")
>>> print(ptt.posts)

[{'url': '/bbs/Gossiping/M.1560591164.A.B9C.html',
  'post_id': 'M.1560591164.A.B9C',
  'timestamp': '1560591164',
  'title': '[新聞] 暴動!財經女神訪歐曬日光浴 白皙長腿惹',
  'nrec': '',
  'author': 'cycling',
  'mark': ''},
 {'url': '/bbs/Gossiping/M.1560591174.A.B05.html',
  'post_id': 'M.1560591174.A.B05',
  'timestamp': '1560591174',
  'title': '[新聞] 韓國瑜造勢到底多少人? 椅子精算師四叉貓算給你',
  'nrec': '',
  'author': 'sweat992001',
  'mark': ''},
 {'url': '/bbs/Gossiping/M.1560591182.A.50D.html',
  'post_id': 'M.1560591182.A.50D',
  'timestamp': '1560591182',
  'title': 'Re: [新聞] 大烏龍!攝影師砸30萬修MacBook 最後發現',
  'nrec': '',
  'author': 'YHOTV4096',
  'mark': ''},
  ...]
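
Because the result is a plain list of dictionaries, it can be written out with the standard `json` module. The following is a minimal sketch rather than part of the library: the sample entry is copied from the output above, while `to_json`, `absolute_url`, and `BASE_URL` are hypothetical names introduced only for illustration.

```python
import json

# Sample entry copied from the output above; in real use this would be ptt.posts.
posts = [
    {
        "url": "/bbs/Gossiping/M.1560591164.A.B9C.html",
        "post_id": "M.1560591164.A.B9C",
        "timestamp": "1560591164",
        "title": "[新聞] 暴動!財經女神訪歐曬日光浴 白皙長腿惹",
        "nrec": "",
        "author": "cycling",
        "mark": "",
    },
]

# Assumption: the relative "url" values resolve against the PTT web host.
BASE_URL = "https://www.ptt.cc"


def to_json(posts):
    """Serialize the posts to a JSON string, keeping non-ASCII titles readable."""
    return json.dumps(posts, ensure_ascii=False, indent=2)


def absolute_url(post):
    """Hypothetical helper: prefix a post's relative URL with the site root."""
    return BASE_URL + post["url"]


print(to_json(posts))
print(absolute_url(posts[0]))
```

Passing `ensure_ascii=False` keeps the Chinese titles as-is instead of escaping them to `\uXXXX` sequences.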

PttPage

[
    {
        "url": str,
        "post_id": str,
        "timestamp": str,  # Unix time
        "title": str,
        "nrec": str,       # combined total of pushes (推) and boos (噓)
        "author": str,
        "mark": str        # post mark
    },
    ...
]
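
Since `timestamp` is a Unix time stored as a string, it needs converting before any date arithmetic. The sketch below assumes only the field names listed above; `PostSummary`, `posted_at`, and `timestamp_from_post_id` are hypothetical names, not part of ptt2json. The second helper relies on an observation from the sample output (the second dot-separated component of `post_id` equals `timestamp`), which may not hold for every post.

```python
from datetime import datetime, timezone
from typing import TypedDict


class PostSummary(TypedDict):
    """Hypothetical type mirroring one entry of the PttPage schema above."""
    url: str
    post_id: str
    timestamp: str  # Unix time, stored as a string
    title: str
    nrec: str
    author: str
    mark: str


def posted_at(post: PostSummary) -> datetime:
    """Convert the string timestamp into a timezone-aware UTC datetime."""
    return datetime.fromtimestamp(int(post["timestamp"]), tz=timezone.utc)


def timestamp_from_post_id(post_id: str) -> str:
    """Observation from the sample output only: 'M.<timestamp>.A.<suffix>'."""
    return post_id.split(".")[1]


# Sample entry copied from the example output earlier in this document.
post: PostSummary = {
    "url": "/bbs/Gossiping/M.1560591164.A.B9C.html",
    "post_id": "M.1560591164.A.B9C",
    "timestamp": "1560591164",
    "title": "[新聞] 暴動!財經女神訪歐曬日光浴 白皙長腿惹",
    "nrec": "",
    "author": "cycling",
    "mark": "",
}

print(posted_at(post).isoformat())
print(timestamp_from_post_id(post["post_id"]) == post["timestamp"])
```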

PttPost

{
    "article_id": str,
    "article_title": str,
    "author": str,
    "board": str,
    "content": str,
    "timestamp": int,
    "ip": str,           # IPv4 address
    "ip_country": str,   # IP-to-country mapping
    "message_count": {
        "all": str,      # total of pushes, boos, and arrows
        "boo": str,      # boo comments
        "count": str,    # pushes minus boos
        "neutral": str,  # arrows (neutral comments)
        "push": str      # push comments
    },
    "messages": [
        {
            "push_tag": str,        # comment symbol
            "push_userid": str,
            "push_content": str,
            "push_ipdatetime": str  # IP and time (no date)
        }
    ],
    "url": str,
    "is_404": bool       # whether the post has been deleted
}
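
The relationship between `messages` and `message_count` described in the comments above can be sketched as a small tally. This is an illustration, not the library's implementation. It assumes `push_tag` holds the usual PTT comment symbols ("推", "噓", "→"), possibly padded with whitespace, and it counts any unrecognized tag as neutral. `tally_messages` and the sample messages are hypothetical, and the sketch returns integers even though the schema lists the counts as `str`.

```python
from collections import Counter

# Assumption: these are the tag strings found in "push_tag".
TAG_TO_KIND = {"推": "push", "噓": "boo", "→": "neutral"}


def tally_messages(messages):
    """Recompute the message_count fields from a list of message dicts."""
    kinds = Counter(
        # Unknown tags fall back to "neutral" (a design choice of this sketch).
        TAG_TO_KIND.get(message["push_tag"].strip(), "neutral")
        for message in messages
    )
    push, boo, neutral = kinds["push"], kinds["boo"], kinds["neutral"]
    return {
        "all": push + boo + neutral,  # pushes + boos + arrows
        "boo": boo,
        "count": push - boo,          # pushes minus boos
        "neutral": neutral,
        "push": push,
    }


# Hypothetical sample data in the shape of the "messages" list above.
sample_messages = [
    {"push_tag": "推", "push_userid": "user_a", "push_content": "", "push_ipdatetime": ""},
    {"push_tag": "推", "push_userid": "user_b", "push_content": "", "push_ipdatetime": ""},
    {"push_tag": "噓", "push_userid": "user_c", "push_content": "", "push_ipdatetime": ""},
    {"push_tag": "→", "push_userid": "user_d", "push_content": "", "push_ipdatetime": ""},
]

print(tally_messages(sample_messages))
```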

Download files

Source Distribution

ptt2json-0.1.1.tar.gz (4.3 kB)

Uploaded Source

Built Distribution

ptt2json-0.1.1-py3-none-any.whl (4.1 kB)

Uploaded Python 3

File details

Details for the file ptt2json-0.1.1.tar.gz.

File metadata

  • Download URL: ptt2json-0.1.1.tar.gz
  • Upload date:
  • Size: 4.3 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.16 CPython/3.7.3 Linux/5.1.8-200.fc29.x86_64

File hashes

Hashes for ptt2json-0.1.1.tar.gz

Algorithm    Hash digest
SHA256       bd6d4c856fbc71b103122134650f1962cf6a1dea22360f5e8e9f30cec53e51a4
MD5          25d44f4782f25916dbf736922c7963fc
BLAKE2b-256  529cf2061efa8b24b8c27f2a6edf97989a2c684bc20b386ec7f35acef6347aca

File details

Details for the file ptt2json-0.1.1-py3-none-any.whl.

File metadata

  • Download URL: ptt2json-0.1.1-py3-none-any.whl
  • Upload date:
  • Size: 4.1 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: poetry/0.12.16 CPython/3.7.3 Linux/5.1.8-200.fc29.x86_64

File hashes

Hashes for ptt2json-0.1.1-py3-none-any.whl

Algorithm    Hash digest
SHA256       4108b6af536f8cd5206189441c534f40f496034d891012f64c1fd0a3e2edd26a
MD5          085ff09cade37587943f405f1e85141b
BLAKE2b-256  b1e20d8d4e13397db8e04fb9c60d9a8b35d086c12b6deba740c00a1b74afd2d6
