
This is the dvsnier sniffer.


Python Network Sniffer


1. Configuration

1.1. Script configuration

Open the script .\scripts\crawler_script.py and configure it as follows:

1.1.1. Sniffing a single page

To sniff a single page of data:

if __name__ == "__main__":
    # Main entry point
    page_size = 1
    crawler = Crawler_Bbs_TianYa()
    # crawler.set_flag(False)
    crawler.set_range(page_size).run()

1.1.2. Sniffing multiple pages

To sniff the pages in the half-open range [1, 5), that is, pages 1 through 4:

Option 1:

if __name__ == "__main__":
    # Main entry point
    page_size = 5
    crawler = Crawler_Bbs_TianYa()
    # crawler.set_flag(False)
    crawler.set_range(page_size).run()

Option 2:

if __name__ == "__main__":
    # Main entry point
    crawler = Crawler_Bbs_TianYa()
    # crawler.set_flag(False)
    crawler.set_range(page_start=1, page_stop=5).run()
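
The chained call `set_range(...).run()` and the half-open range [1, 5) can be illustrated with a small stand-in. This is only a sketch: `DemoCrawler` is a hypothetical class written for this example, not the project's `Crawler_Bbs_TianYa`, and its `run()` simply returns the page numbers it would visit instead of fetching anything.

```python
class DemoCrawler:
    """Hypothetical stand-in for illustration; not the project's crawler class."""

    def __init__(self):
        self.page_start = 1
        self.page_stop = 2

    def set_range(self, page_start=1, page_stop=2):
        # Store the half-open range [page_start, page_stop).
        self.page_start = page_start
        self.page_stop = page_stop
        # Returning self is what allows .set_range(...).run() to be chained.
        return self

    def run(self):
        # A real crawler would fetch each page; here we only list the page numbers.
        return list(range(self.page_start, self.page_stop))


pages = DemoCrawler().set_range(page_start=1, page_stop=5).run()
print(pages)  # [1, 2, 3, 4]
```

Python's built-in `range` uses the same half-open convention, so the stop page (5) is not visited.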

1.2. File configuration

The configuration file entries are as follows:

# the version information
version_name = v0.0.1.dev1
version_code = 1
version_info = 0.0.1.dev1


# CRAWLER URL PREFIX
article-alias = 'crawler_alias'
sn-url-prefix = ['http://xxx.yyy.zzz/post-xxx-yyy-{}.shtml']


# REGION_INCLUSIVE_EXCLUSIVE = 0
# REGION_EXCLUSIVE_INCLUSIVE = 1
# REGION_EXCLUSIVE_EXCLUSIVE = 2
# REGION_INCLUSIVE_INCLUSIVE = 3
page-start = 1
page-stop = 0
page-flag = 0
# page-flag = 1
# page-flag = 2
# page-flag = 3
# False: pull all pages first, then translate; True: pull and translate one page at a time
article-flag = True
# article-flag = False


# True: multi media resources are stored locally, otherwise they are not
# article-multi-media-persistence = True
# article-multi-media-persistence = False

# True: multi media resources are high quality, otherwise they are not
# article-multi-media-quality = True
# article-multi-media-quality = False


User-Agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/512.36 (KHTML, like Gecko) Chrome/92.0.1235.131 Safari/277.36'
# User-Agent = 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 12_10_3) AppleWebKit/512.36 (KHTML, like Gecko) Chrome/92.0.1235.131 Safari/277.36'


# output-directory = "..."
# output-uuid-encryptor = True
# output-uuid-encryptor = False

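The entries are described below:

  • version_name: configuration file version name;
  • version_code: configuration file version code;
  • version_info: configuration file version information;
  • article-alias: article alias (OPTIONAL); the generated directory name has the format [bbs|json]_YYmmdd[_alias];
  • sn-url-prefix: the URL format of the article, typically http://xxx.yyy.zzz/post-xxx-yyy-{}.shtml;
  • page-start: the first page;
  • page-stop: the last page (OPTIONAL);
  • page-flag: page range flag (OPTIONAL); four types are supported: REGION_INCLUSIVE_EXCLUSIVE, REGION_EXCLUSIVE_INCLUSIVE, REGION_EXCLUSIVE_EXCLUSIVE, REGION_INCLUSIVE_INCLUSIVE;
  • article-flag: the style in which the article data stream is generated (OPTIONAL);
  • article-multi-media-persistence: whether media resources linked from the article are persisted locally (OPTIONAL, RECOMMENDED);
  • article-multi-media-quality: the quality of the media resources linked from the article (OPTIONAL);
  • User-Agent: the user agent used for article requests (OPTIONAL, RECOMMENDED);
  • output-directory: write the article output to the specified directory (OPTIONAL, RECOMMENDED);
  • output-uuid-encryptor: whether the article output id is encrypted (OPTIONAL);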
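
The entries above follow a plain `key = value` layout with `#` comments. As a rough sketch of how such a file could be read (the project's own loader is not shown here, so this is not its implementation), the standard library's `configparser` can parse it once a section header is prepended. The section name `sniffer` and the helper `load_settings` are assumptions made for this example.

```python
import ast
import configparser

# A shortened copy of the listing above, used as sample input.
SAMPLE = """\
# the version information
version_name = v0.0.1.dev1
version_code = 1

article-alias = 'crawler_alias'
sn-url-prefix = ['http://xxx.yyy.zzz/post-xxx-yyy-{}.shtml']
page-start = 1
page-stop = 0
page-flag = 0
article-flag = True
"""


def load_settings(text):
    """Parse section-less 'key = value' text into a dict of Python values."""
    parser = configparser.ConfigParser(interpolation=None)
    parser.optionxform = str  # keep key case, e.g. 'User-Agent'
    # configparser needs a section header; 'sniffer' is a made-up name.
    parser.read_string("[sniffer]\n" + text)
    settings = {}
    for key, raw in parser["sniffer"].items():
        try:
            # Quoted strings, lists, numbers and booleans become Python values.
            settings[key] = ast.literal_eval(raw)
        except (ValueError, SyntaxError):
            # Unquoted text such as v0.0.1.dev1 is kept as a plain string.
            settings[key] = raw
    return settings


settings = load_settings(SAMPLE)
print(settings["sn-url-prefix"][0].format(3))
```

Here `format(3)` fills the `{}` placeholder of the first URL prefix with a page number, which is how the placeholder in sn-url-prefix is presumably meant to be used.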

1.3. CLI configuration

$ dvs-sniffer -h
usage: dvs-sniffer [-h] [-V] [-amp] [-amq] [-a2 [article-alias]]
                   [-ad [article-describe]] [-af] [-rs [region-start]]
                   [-re [region-end]] [-rm [region-mask]] [-due]
                   [-ua [User-Agent]]
                   sn-url [destination-directory]

    this is a dvs network sniffer execution program.

    the sniffer destination url format must conform to the following continuous URLs:

        eg:

            1. http://bbs.xxx.cn/list-xyz-1.shtml
            2. http://bbs.xxx.cn/list-xyz-2.shtml
            3. http://bbs.xxx.cn/list-xyz-3.shtml
            4. ...
            5. http://bbs.xxx.cn/list-xyz-{}.shtml


positional arguments:
  sn-url                the sniffer destination url.
  destination-directory
                        the sniffer destination directory.

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         the show version and exit.
  -amp, --article-multi-media-persistence
                        if True: multi media resources are stored locally,
                        otherwise they are not, and the default value is True.
  -amq, --article-multi-media-quality
                        if True: multi media resources are high quality,
                        otherwise they are not, and the default value is
                        False.
  -a2 [article-alias], --article-alias [article-alias]
                        a short text article alias of the sniffer to be.
  -ad [article-describe], --article-describe [article-describe]
                        a short text article description of the sniffer to be.
  -af, --article-flag   if False: first pull, second translate True: one pull
                        after another translate, and the default value is
                        True.
  -rs [region-start], --region-start [region-start]
                        a briefly describe the range start to be sniffed
                        mathematically.
  -re [region-end], --region-end [region-end]
                        a briefly describe the range end to be sniffed
                        mathematically.
  -rm [region-mask], --region-mask [region-mask]
                        The olfactory spatial range of the sniffer can only be
                        the following values: REGION_INCLUSIVE_EXCLUSIVE = 0,
                        REGION_EXCLUSIVE_INCLUSIVE = 1,
                        REGION_EXCLUSIVE_EXCLUSIVE = 2,
                        REGION_INCLUSIVE_INCLUSIVE = 3, and the default value
                        is REGION_INCLUSIVE_EXCLUSIVE.
  -due, --destination-uuid-encryptor
                        the sniffer destination uuid encryptor, and the
                        default value is True.
  -ua [User-Agent], --user-agent [User-Agent]
                        the user agent flag of set sniffer for default network
                        access, which is the macintosh system identifier by
                        default.

the copyright belongs to dvs that reserve the right of final interpretation.
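
The four region-mask values in the help text can be read as interval notations. The sketch below assumes that the first word of each constant name applies to the start page and the second to the end page, so REGION_INCLUSIVE_EXCLUSIVE means [start, end). The helpers `pages_in_region` and `expand_urls` are hypothetical names for this illustration and are not part of dvs-sniffer.

```python
REGION_INCLUSIVE_EXCLUSIVE = 0  # [start, end)
REGION_EXCLUSIVE_INCLUSIVE = 1  # (start, end]
REGION_EXCLUSIVE_EXCLUSIVE = 2  # (start, end)
REGION_INCLUSIVE_INCLUSIVE = 3  # [start, end]


def pages_in_region(start, end, mask=REGION_INCLUSIVE_EXCLUSIVE):
    """Return the page numbers covered by the given region (assumed semantics)."""
    if mask not in (0, 1, 2, 3):
        raise ValueError("region mask must be 0, 1, 2 or 3")
    include_start = mask in (REGION_INCLUSIVE_EXCLUSIVE, REGION_INCLUSIVE_INCLUSIVE)
    include_end = mask in (REGION_EXCLUSIVE_INCLUSIVE, REGION_INCLUSIVE_INCLUSIVE)
    first = start if include_start else start + 1
    last = end if include_end else end - 1
    return list(range(first, last + 1))


def expand_urls(template, pages):
    """Fill the {} placeholder of a continuous-URL template with each page number."""
    return [template.format(page) for page in pages]


pages = pages_in_region(1, 5)  # default mask: [1, 5)
print(pages)  # [1, 2, 3, 4]
print(expand_urls("http://bbs.xxx.cn/list-xyz-{}.shtml", pages_in_region(1, 3, REGION_INCLUSIVE_INCLUSIVE)))
```

With the default mask, `-rs 1 -re 2` would therefore cover page 1 only.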

2. Running

2.1. Running the script

Run the script as follows:

# Windows
python .\scripts\crawler_script.py
# Macintosh
python ./scripts/crawler_script.py

2.2. Running the CLI

# Windows and Macintosh
# region: [1, 2)
dvs-sniffer -amq -rs 1 -re 2 http://xxx.yyy.zzz/post-1.html # the default destination directory
dvs-sniffer -amq -rs 1 -re 2 http://xxx.yyy.zzz/post-1.html \var\...\dvs-sniffer\ # a specific destination directory
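
The same invocations can be assembled from Python, for example when sniffing several URLs in a loop. This is a minimal sketch that uses only the flags shown in the help output (`-amq`, `-rs`, `-re`) and assumes `dvs-sniffer` is installed and on the PATH; `build_command` and `run_sniffer` are hypothetical helper names.

```python
import subprocess


def build_command(sn_url, region_start, region_end, destination=None, high_quality=False):
    """Build the dvs-sniffer argument list for one run."""
    command = ["dvs-sniffer"]
    if high_quality:
        command.append("-amq")  # request high-quality media resources
    command += ["-rs", str(region_start), "-re", str(region_end), sn_url]
    if destination is not None:
        command.append(destination)  # optional destination directory
    return command


def run_sniffer(command):
    """Run the command; raises CalledProcessError on a non-zero exit code."""
    return subprocess.run(command, check=True)


command = build_command("http://xxx.yyy.zzz/post-1.html", 1, 2, high_quality=True)
print(" ".join(command))
```

Passing the arguments as a list avoids shell quoting problems with backslashes or spaces in the destination directory.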
