A simple yet powerful Twitter crawler that collects users, posts, comments, and more

easy_twitter_crawler

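A Twitter crawler that collects users, posts, and comments. I hope it proves useful. If you would like to contribute a good code snippet, please email the code together with a description to xinkonghan@gmail.com. The code style follows my own preferences, so please point out any shortcomings.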

Installation

pip install easy_twitter_crawler

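Main features

search_crawler: keyword search (supports top, user, latest, video, and photo results; supports filter conditions)
user_crawler: user collection (supports user info, user posts, and user replies)
common_crawler: general collection (supports posts and comments)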

Quick start

Set the proxy and cookie

proxy = {
    'http': 'http://127.0.0.1:10808',
    'https': 'http://127.0.0.1:10808'
}
cookie = 'auth_token=686fa28f49400698820d0a3c344c51e3e44af73a; ct0=5bed99b7faad9dcc742eda564ddbcf37777f8794abd6d4d736919234440be2172da1e9a9fc48bb068db1951d1748ba5467db2bc3e768f122794265da0a9fa6135b4ef40763e7fd91f730d0bb1298136b'
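The examples below pass the cookie string through cookie_to_dic from easy_spider_tool before handing it to set_cookie. That helper's implementation is not shown here, so the following is only a minimal stand-alone sketch of the same idea: splitting a `name=value; name=value` string into a dict. The function name parse_cookie_string is hypothetical and is not part of either library.

```python
def parse_cookie_string(cookie: str) -> dict:
    """Split a 'name=value; name=value' cookie string into a dict.

    Hypothetical helper for illustration; not the library's cookie_to_dic.
    """
    result = {}
    for pair in cookie.split(';'):
        pair = pair.strip()
        if not pair or '=' not in pair:
            continue  # skip empty or malformed fragments
        # Split on the first '=' only, because values may contain '='.
        name, value = pair.split('=', 1)
        result[name.strip()] = value.strip()
    return result


parsed = parse_cookie_string('auth_token=abc123; ct0=def456')
print(parsed)
```

Placeholder values are used here on purpose; a real auth_token and ct0 pair is a login credential and is better read from an environment variable or a local file than committed with the code.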

Keyword search example (collect 10 results for a keyword with the given filters)

from easy_spider_tool import cookie_to_dic, format_json
from easy_twitter_crawler import set_proxy, set_cookie, search_crawler, TwitterFilter

key_word = 'elonmusk'

twitter_filter = TwitterFilter(key_word)
twitter_filter.word_category(lang='en')
twitter_filter.account_category(filter_from='', to='', at='')
twitter_filter.filter_category(only_replies=None, only_links=None, exclude_replies=None, exclude_links=None)
twitter_filter.interact_category(min_replies='', min_faves='', min_retweets='')
twitter_filter.date_category(since='', until='')
key_word = twitter_filter.filter_join()

set_proxy(proxy)
set_cookie(cookie_to_dic(cookie))

for info in search_crawler(
        key_word,
        data_type='Top',
        count=10,
):
    set_proxy(proxy)
    set_cookie(cookie_to_dic(cookie))
    print(format_json(info))
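TwitterFilter turns the keyword and its filter settings into a single search string through filter_join. Its exact output is not documented here, so the sketch below is an assumption about the general shape: it joins the keyword with Twitter's advanced search operators (`lang:`, `from:`, `to:`, `min_faves:`, `since:`, and so on). The function build_query and its parameter names are hypothetical and may differ from what the library produces.

```python
def build_query(keyword, lang='', from_user='', to_user='', mention='',
                min_replies='', min_faves='', min_retweets='',
                since='', until='', exclude_replies=False):
    """Join a keyword with Twitter advanced-search operators.

    Hypothetical helper for illustration; not the library's filter_join.
    """
    parts = [keyword]
    operators = [
        ('lang:', lang),
        ('from:', from_user),
        ('to:', to_user),
        ('@', mention),
        ('min_replies:', min_replies),
        ('min_faves:', min_faves),
        ('min_retweets:', min_retweets),
        ('since:', since),
        ('until:', until),
    ]
    for prefix, value in operators:
        # Empty strings and None mean "no condition", as in the example above.
        if value not in ('', None):
            parts.append(f'{prefix}{value}')
    if exclude_replies:
        parts.append('-filter:replies')
    return ' '.join(parts)


print(build_query('elonmusk', lang='en', min_faves=100, since='2023-07-20'))
# prints: elonmusk lang:en min_faves:100 since:2023-07-20
```

Leaving a condition empty simply omits its operator, which mirrors how the example above passes empty strings and None for filters it does not use.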

User collection example (collect the user's profile plus 10 posts, 10 replies, 10 followers, and 10 followed accounts)

from easy_spider_tool import cookie_to_dic, format_json
from easy_twitter_crawler import set_proxy, set_cookie, user_crawler

set_proxy(proxy)
set_cookie(cookie_to_dic(cookie))

for info in user_crawler(
        'elonmusk',
        article_count=10,
        reply_count=10,
        following_count=10,
        followers_count=10,
        # start_time='2023-07-20 00:00:00',
        # end_time='2023-07-27 00:00:00',
):
    set_proxy(proxy)
    set_cookie(cookie_to_dic(cookie))
    print(format_json(info))
    print(f"Posts: {len(info.get('article', []))}")
    print(f"Followers: {len(info.get('followers', []))}")
    print(f"Following: {len(info.get('following', []))}")
    print(f"Replies: {len(info.get('reply', []))}")
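The four counts printed above can be gathered in one place. The sketch below assumes each yielded record is a dict whose 'article', 'reply', 'followers', and 'following' keys hold lists, as the example implies. The function summarize_user_record is hypothetical, and treating a None value as an empty list is an added safeguard rather than documented library behavior.

```python
def summarize_user_record(info: dict) -> dict:
    """Count the items collected under each list-valued key of a user record.

    Hypothetical helper; the key names are taken from the example above.
    """
    keys = ('article', 'reply', 'followers', 'following')
    # `or []` covers both a missing key and an explicit None value.
    return {key: len(info.get(key) or []) for key in keys}


sample = {'article': [{'id': '1'}, {'id': '2'}], 'reply': [], 'followers': None}
print(summarize_user_record(sample))
# prints: {'article': 2, 'reply': 0, 'followers': 0, 'following': 0}
```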

General collection example (collect a post's details given its ID)

from easy_spider_tool import cookie_to_dic, format_json
from easy_twitter_crawler import set_proxy, set_cookie, common_crawler

set_proxy(proxy)
set_cookie(cookie_to_dic(cookie))

for info in common_crawler(
        '1684447438864785409',
        data_type='article',
):
    set_proxy(proxy)
    set_cookie(cookie_to_dic(cookie))
    print(format_json(info))
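common_crawler takes the post ID as a string. When only a post URL is at hand, the ID is the numeric segment after `/status/`. The sketch below extracts it with a regular expression; extract_status_id is a hypothetical helper, not part of easy_twitter_crawler, and the username in the sample URL is a placeholder.

```python
import re

# Matches the numeric ID that follows /status/ (or the older /statuses/).
_STATUS_ID = re.compile(r'/status(?:es)?/(\d+)')


def extract_status_id(url_or_id: str) -> str:
    """Return the post ID from a post URL, or the input itself if it is already an ID."""
    text = url_or_id.strip()
    if text.isdigit():
        return text
    match = _STATUS_ID.search(text)
    if match is None:
        raise ValueError(f'no status ID found in {url_or_id!r}')
    return match.group(1)


print(extract_status_id('https://twitter.com/someuser/status/1684447438864785409'))
# prints: 1684447438864785409
```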

General collection example (collect 10 comments under a post given its ID)

from easy_spider_tool import cookie_to_dic, format_json
from easy_twitter_crawler import set_proxy, set_cookie, common_crawler

set_proxy(proxy)
set_cookie(cookie_to_dic(cookie))

for info in common_crawler(
        '1684447438864785409',
        data_type='comment',
        comment_count=10,
):
    set_proxy(proxy)
    set_cookie(cookie_to_dic(cookie))
    print(format_json(info))
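All the examples print each record as it arrives. To keep the results instead, one option is to write them to a JSON Lines file, one record per line. The sketch below assumes each yielded record is a JSON-serializable dict, which the use of format_json suggests but does not guarantee. write_jsonl is a hypothetical helper that accepts any iterable of dicts.

```python
import json
from typing import Iterable


def write_jsonl(records: Iterable[dict], path: str) -> int:
    """Write each record as one JSON line and return how many were written."""
    count = 0
    with open(path, 'w', encoding='utf-8') as handle:
        for record in records:
            # ensure_ascii=False keeps non-ASCII text readable in the file.
            handle.write(json.dumps(record, ensure_ascii=False) + '\n')
            count += 1
    return count
```

Because the helper consumes records lazily, a crawler generator such as the common_crawler call above could be passed in directly in place of a list, under the same serializability assumption.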

Links

GitHub: https://github.com/hanxinkong/easy_twitter_crawler

Online documentation: https://easy_twitter_crawler.xink.top/

Contributors

Release history

1.0.4: Sep 9, 2023
1.0.3: Aug 12, 2023
1.0.2: Aug 12, 2023
1.0.1: Aug 12, 2023
1.0.0: Aug 12, 2023 (this version)

Download files

Source distribution: easy_twitter_crawler-1.0.0.tar.gz (21.3 kB), uploaded Aug 12, 2023
Built distribution: easy_twitter_crawler-1.0.0-py3-none-any.whl (24.9 kB), uploaded Aug 12, 2023, Python 3

Hashes for easy_twitter_crawler-1.0.0.tar.gz
Algorithm	Hash digest
SHA256	`3ee2514552dccfcebdc4a35da8aa6f664a02b321c9b3d772646813748401ba36`
MD5	`0ce3eaf84ca206174fccd11cb17f1250`
BLAKE2b-256	`bdd81fd553807e4be7e67b2303372e9da5d84569e4aa5e48cbc1f88912ebbad0`

Hashes for easy_twitter_crawler-1.0.0-py3-none-any.whl
Algorithm	Hash digest
SHA256	`9d674ebb7edc40944802f71badb79917b4d404172c52bbf58d49ad44d90b0e58`
MD5	`3319509e205e0c6953d26d59805bbf88`
BLAKE2b-256	`0b599bea96d19727060c026833df660a62f841a4e5d4144bcff0912adf502ce6`