PTT crawler using asyncio
Project description
AioPTTCrawler (a crawler for the web version of PTT)
This is a Python package for crawling PTT article data using asyncio.
Documentation
PyPi Page
pip install AioPTTCrawler
from AioPTTCrawler import AioPTTCrawler
ptt_crawler = AioPTTCrawler()
Usage
get data from PTT
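# Crawl the latest pages of a board; page_count sets how many
ptt_crawler = AioPTTCrawler()
BOARD = "Gossiping"
ptt_data = ptt_crawler.get_board_latest_articles(board=BOARD, page_count=10)
# Crawl a fixed index range of a board, from start_index to end_index
ptt_crawler = AioPTTCrawler()
BOARD = "Gossiping"
ptt_data = ptt_crawler.get_board_articles(board=BOARD, start_index=100, end_index=200)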
ptt_data is a PTTData object. To extract the data, call one of its accessor methods, such as get_article_dict(), get_article_dataframe(), or get_article_list().
get dict from PTTData
article_dict = ptt_data.get_article_dict()
comment_dict = ptt_data.get_comment_dict()
Article dict format (a list with one dict per article)
[
{
"article" : "Article's ID. ex:M.1663144920.A.A6E",
"article_title" : "Article's title. ex:[公告] 批踢踢27週年活動宣導公告更新",
"user_id" : "Author's ID. ex: ubcs",
"user_name" : "Author's name. ex:(覺★青年超冒險蓋)",
"board" : "BBS Board ex: Gossiping",
"datetime" : "Post time. ex: Wed Sep 14 16:41:58 2022.",
"context" : "Context of article. ex: PTT 27 周年活動開始囉,本篇為置底宣導,詳情參閱下面資料...",
"ip_address" : "IP address. ex: 59.120.192.119",
"comment_list" : [
{"comment_dict"},
{"comment_dict"},
]
}, {"..."}
]
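As a rough illustration of working with this format, the sketch below parses the post time and counts the comments of each article. The `articles` list is hand-written sample data shaped like the format above, not real crawler output, and `summarize` is a hypothetical helper, not part of AioPTTCrawler. It also assumes the `datetime` string carries no trailing period.

```python
from datetime import datetime

# Hypothetical sample in the article dict format documented above;
# in real use this list would come from ptt_data.get_article_dict().
articles = [
    {
        "article": "M.1663144920.A.A6E",
        "article_title": "[公告] 批踢踢27週年活動宣導公告更新",
        "user_id": "ubcs",
        "board": "Gossiping",
        "datetime": "Wed Sep 14 16:41:58 2022",
        "comment_list": [
            {"tag": "推", "user_id": "bill403777", "context": "錢"},
        ],
    },
]


def summarize(article):
    """Return (article ID, parsed post time, number of comments)."""
    # Assumes the post time looks like "Wed Sep 14 16:41:58 2022".
    posted = datetime.strptime(article["datetime"], "%a %b %d %H:%M:%S %Y")
    return article["article"], posted, len(article["comment_list"])


summaries = [summarize(a) for a in articles]
```

Parsing the timestamp once up front makes it easier to sort or filter articles by post time later.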
Comment dict format (a list with one dict per comment)
[
{
"article_id" : "Article's ID. ex:M.1663144920.A.A6E",
"tag" : "comment's reaction. ex: 推 噓 →",
"user_id" : "User's ID. ex: bill403777",
"comment_order" : "order of comment. ex: 1",
"context" : "Context of comment. ex: 錢",
"datetime" : "Post time. ex: 09/14 16:42",
"ip_address" : "IP address. ex: 27.53.96.42"
}, {"..."}
]
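One possible use of this format is tallying reactions per article. The sketch below is only an illustration: the `comments` list is hand-written sample data in the format above (the second and third user IDs are made up), and `tag_counts` is a hypothetical helper, not part of AioPTTCrawler.

```python
from collections import Counter

# Hypothetical sample in the comment dict format documented above;
# in real use this list would come from ptt_data.get_comment_dict().
comments = [
    {"article_id": "M.1663144920.A.A6E", "tag": "推",
     "user_id": "bill403777", "comment_order": 1, "context": "錢"},
    {"article_id": "M.1663144920.A.A6E", "tag": "噓",
     "user_id": "user_a", "comment_order": 2, "context": "..."},
    {"article_id": "M.1663144920.A.A6E", "tag": "推",
     "user_id": "user_b", "comment_order": 3, "context": "..."},
]


def tag_counts(comment_list):
    """Count how many comments carry each reaction tag."""
    return Counter(c["tag"] for c in comment_list)


counts = tag_counts(comments)
```

A `Counter` returns 0 for tags that never appear, so `counts["→"]` is safe to read even when no neutral comments exist.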
The examples above are taken from this article.
Comparison
Time taken by the normal (synchronous) method versus the async method
(unit: seconds)
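To give a feel for where such a difference can come from, here is a minimal sketch that does not use AioPTTCrawler or the network at all. It assumes each page request costs a fixed delay, simulated with `asyncio.sleep`, and compares awaiting requests one at a time with running them together via `asyncio.gather`. All names here (`fetch_page`, `crawl_sequential`, `crawl_concurrent`, `timed`, `DELAY`) are hypothetical, and the timings it produces are not the package's measured results.

```python
import asyncio
import time

DELAY = 0.05  # simulated latency per page request, in seconds (an assumption)


async def fetch_page(index):
    """Stand-in for one HTTP request; sleeps instead of touching the network."""
    await asyncio.sleep(DELAY)
    return index


async def crawl_sequential(indices):
    """Await each request before starting the next one."""
    return [await fetch_page(i) for i in indices]


async def crawl_concurrent(indices):
    """Start all requests at once and wait for them together."""
    return await asyncio.gather(*(fetch_page(i) for i in indices))


def timed(coro_func, indices):
    """Run a crawl function and return (results, elapsed seconds)."""
    start = time.perf_counter()
    result = asyncio.run(coro_func(indices))
    return result, time.perf_counter() - start


if __name__ == "__main__":
    pages = range(10)
    _, sequential_time = timed(crawl_sequential, pages)
    _, concurrent_time = timed(crawl_concurrent, pages)
    print(f"sequential: {sequential_time:.2f}s, concurrent: {concurrent_time:.2f}s")
```

In this simulation the sequential total grows with the number of pages, while the concurrent total stays close to a single delay. A real crawl would also be bounded by server rate limits and parsing time.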
Support
You may report bugs, ask for help, and discuss other issues on the issues page.