A simple Python library that extracts page and post information from https://www.ptt.cc/bbs/ into JSON format.
Project description
Ptt2Json
>>> from ptt2json import PttPage
>>> ptt = PttPage(boardname="Gossiping")
>>> print(ptt.posts)
[{'url': '/bbs/Gossiping/M.1560591164.A.B9C.html',
'post_id': 'M.1560591164.A.B9C',
'timestamp': '1560591164',
'title': '[新聞] 暴動!財經女神訪歐曬日光浴 白皙長腿惹',
'nrec': '',
'author': 'cycling',
'mark': ''},
{'url': '/bbs/Gossiping/M.1560591174.A.B05.html',
'post_id': 'M.1560591174.A.B05',
'timestamp': '1560591174',
'title': '[新聞] 韓國瑜造勢到底多少人? 椅子精算師四叉貓算給你',
'nrec': '',
'author': 'sweat992001',
'mark': ''},
{'url': '/bbs/Gossiping/M.1560591182.A.50D.html',
'post_id': 'M.1560591182.A.50D',
'timestamp': '1560591182',
'title': 'Re: [新聞] 大烏龍!攝影師砸30萬修MacBook 最後發現',
'nrec': '',
'author': 'YHOTV4096',
'mark': ''},
...]
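Since `ptt.posts` is a plain list of dictionaries, it can be handed straight to the standard `json` module. The following is a minimal sketch under that assumption: it uses two entries copied from the output above in place of a live request, and `posts_to_json` is a hypothetical helper, not part of ptt2json.

```python
import json

# Two entries copied from the sample output above (a stand-in for ptt.posts).
sample_posts = [
    {
        "url": "/bbs/Gossiping/M.1560591164.A.B9C.html",
        "post_id": "M.1560591164.A.B9C",
        "timestamp": "1560591164",
        "title": "[新聞] 暴動!財經女神訪歐曬日光浴 白皙長腿惹",
        "nrec": "",
        "author": "cycling",
        "mark": "",
    },
    {
        "url": "/bbs/Gossiping/M.1560591174.A.B05.html",
        "post_id": "M.1560591174.A.B05",
        "timestamp": "1560591174",
        "title": "[新聞] 韓國瑜造勢到底多少人? 椅子精算師四叉貓算給你",
        "nrec": "",
        "author": "sweat992001",
        "mark": "",
    },
]


def posts_to_json(posts):
    """Serialize a list of post dicts (hypothetical helper).

    ensure_ascii=False keeps the Chinese titles readable instead of
    escaping them as \\uXXXX sequences.
    """
    return json.dumps(posts, ensure_ascii=False, indent=2)


text = posts_to_json(sample_posts)
print(text)
```

When writing the result to a file, open it with `encoding="utf-8"` so the unescaped titles are stored correctly.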
PttPage
[
    {
        "url": str,
        "post_id": str,
        "timestamp": str,  # Unix time
        "title": str,
        "nrec": str,  # combined total of push (推) and boo (噓) comments
        "author": str,
        "mark": str,  # post mark
    },
    ...
]
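Both `timestamp` (Unix time stored as a string) and `url` (a path relative to the site root) usually need a small conversion before use. The sketch below shows one way to do that; `enrich_post`, `PTT_BASE`, and the added keys `full_url` and `posted_at` are hypothetical names chosen for illustration, not part of ptt2json.

```python
from datetime import datetime, timezone
from urllib.parse import urljoin

# Assumed site root, taken from the URL in the project description.
PTT_BASE = "https://www.ptt.cc"


def enrich_post(post):
    """Return a copy of a post dict with two derived fields (hypothetical helper)."""
    enriched = dict(post)
    # "url" starts with "/", so urljoin replaces the path on the base URL.
    enriched["full_url"] = urljoin(PTT_BASE, post["url"])
    # "timestamp" is a string of Unix seconds; convert it to an aware UTC datetime.
    enriched["posted_at"] = datetime.fromtimestamp(
        int(post["timestamp"]), tz=timezone.utc
    )
    return enriched


post = {
    "url": "/bbs/Gossiping/M.1560591164.A.B9C.html",
    "post_id": "M.1560591164.A.B9C",
    "timestamp": "1560591164",
}
result = enrich_post(post)
print(result["full_url"])
print(result["posted_at"].isoformat())
```

The datetime is kept in UTC here; converting to Taipei time for display is left to the caller.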
PttPost
{
    "article_id": str,
    "article_title": str,
    "author": str,
    "board": str,
    "content": str,
    "timestamp": int,
    "ip": str,  # IPv4 address
    "ip_country": str,  # IP-to-country mapping
    "message_count": {
        "all": str,  # total of pushes, boos, and arrows
        "boo": str,  # boo comments (噓)
        "count": str,  # pushes minus boos
        "neutral": str,  # arrow comments (→)
        "push": str,  # push comments (推)
    },
    "messages": [
        {
            "push_tag": str,  # comment tag symbol
            "push_userid": str,
            "push_content": str,
            "push_ipdatetime": str,  # IP and time (no date)
        }
    ],
    "url": str,
    "is_404": bool,  # whether the post has been deleted
}
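The relationship between `messages` and `message_count` can be illustrated by recomputing the totals from the comment list. This is only a sketch: it assumes `push_tag` holds the tag characters PTT displays (`推`, `噓`, `→`), possibly padded with whitespace, and `count_messages` and the sample comments are hypothetical. It returns integers, whereas the schema above lists the counts as `str`.

```python
from collections import Counter

# Assumed tag strings: PTT displays 推 (push), 噓 (boo), and → (neutral arrow).
TAG_KINDS = {"推": "push", "噓": "boo", "→": "neutral"}


def count_messages(messages):
    """Tally comments by kind and derive the totals (hypothetical helper)."""
    counts = Counter()
    for message in messages:
        kind = TAG_KINDS.get(message["push_tag"].strip())
        if kind is not None:  # ignore tags this sketch does not recognize
            counts[kind] += 1
    push, boo, neutral = counts["push"], counts["boo"], counts["neutral"]
    return {
        "all": push + boo + neutral,  # every recognized comment
        "boo": boo,
        "count": push - boo,  # pushes minus boos
        "neutral": neutral,
        "push": push,
    }


# Made-up comments, for illustration only.
messages = [
    {"push_tag": "推", "push_userid": "user_a", "push_content": "nice"},
    {"push_tag": "推", "push_userid": "user_b", "push_content": "agreed"},
    {"push_tag": "噓", "push_userid": "user_c", "push_content": "no"},
    {"push_tag": "→", "push_userid": "user_d", "push_content": "hmm"},
]
summary = count_messages(messages)
print(summary)
```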
Download files
Source Distribution
ptt2json-0.1.1.tar.gz (4.3 kB)
Built Distribution
ptt2json-0.1.1-py3-none-any.whl (4.1 kB)
File details
Details for the file ptt2json-0.1.1.tar.gz.
File metadata
- Download URL: ptt2json-0.1.1.tar.gz
- Upload date:
- Size: 4.3 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/0.12.16 CPython/3.7.3 Linux/5.1.8-200.fc29.x86_64
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | bd6d4c856fbc71b103122134650f1962cf6a1dea22360f5e8e9f30cec53e51a4 |
| MD5 | 25d44f4782f25916dbf736922c7963fc |
| BLAKE2b-256 | 529cf2061efa8b24b8c27f2a6edf97989a2c684bc20b386ec7f35acef6347aca |
File details
Details for the file ptt2json-0.1.1-py3-none-any.whl.
File metadata
- Download URL: ptt2json-0.1.1-py3-none-any.whl
- Upload date:
- Size: 4.1 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: poetry/0.12.16 CPython/3.7.3 Linux/5.1.8-200.fc29.x86_64
File hashes
| Algorithm | Hash digest |
|---|---|
| SHA256 | 4108b6af536f8cd5206189441c534f40f496034d891012f64c1fd0a3e2edd26a |
| MD5 | 085ff09cade37587943f405f1e85141b |
| BLAKE2b-256 | b1e20d8d4e13397db8e04fb9c60d9a8b35d086c12b6deba740c00a1b74afd2d6 |
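A downloaded file can be checked against the SHA256 digests listed above with the standard `hashlib` module. The following is a minimal sketch: `sha256_of_file` and `matches_expected` are hypothetical helpers, and the local path of the archive is an assumption.

```python
import hashlib
from pathlib import Path

# SHA256 digest listed above for ptt2json-0.1.1.tar.gz.
EXPECTED_SHA256 = "bd6d4c856fbc71b103122134650f1962cf6a1dea22360f5e8e9f30cec53e51a4"


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # iter() with a sentinel stops when read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Compare a file's digest with the expected hex string, ignoring case."""
    return sha256_of_file(path) == expected.lower()


if __name__ == "__main__":
    archive = Path("ptt2json-0.1.1.tar.gz")  # assumed download location
    if archive.exists():
        print("OK" if matches_expected(archive) else "MISMATCH")
```

To check the wheel instead, pass its SHA256 digest from the second table as `expected`.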