This is the dvsnier sniffer.

Project description

Python Network Sniffer

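1. Configuration

1.1. Script configuration

Open the .\scripts\crawler_script.py script and configure it as follows:

1.1.1. Sniffing a single page

Sniff a single page of data as follows: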

if __name__ == "__main__":
    # Main entry point
    page_size = 1
    crawler = Crawler_Bbs_TianYa()
    # crawler.set_flag(False)
    crawler.set_range(page_size).run()

1.1.2. Sniffing multiple pages

Sniff the pages in the range [1, 5) as follows:

Option 1:

if __name__ == "__main__":
    # Main entry point
    page_size = 5
    crawler = Crawler_Bbs_TianYa()
    # crawler.set_flag(False)
    crawler.set_range(page_size).run()

Option 2:

if __name__ == "__main__":
    # Main entry point
    crawler = Crawler_Bbs_TianYa()
    # crawler.set_flag(False)
    crawler.set_range(page_start=1, page_stop=5).run()
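Both options chain set_range(...) directly into run(), which suggests that set_range returns the crawler object itself. The following is a minimal sketch of that fluent pattern, assuming a bare page_size is shorthand for page_start=1, page_stop=page_size (as the two [1, 5) variants imply). SketchCrawler and its attributes are hypothetical stand-ins, not the real Crawler_Bbs_TianYa, and the real class may handle cases such as page_size = 1 differently.

```python
class SketchCrawler:
    """Hypothetical stand-in illustrating the chained set_range(...).run() style."""

    def __init__(self):
        self.page_start = 1
        self.page_stop = 0

    def set_range(self, page_size=None, page_start=1, page_stop=None):
        # Assumption: a bare page_size means "start at 1, stop at page_size".
        self.page_start = page_start
        self.page_stop = page_stop if page_stop is not None else (page_size or 0)
        return self  # returning self is what makes the chained call possible

    def run(self):
        # The real crawler would fetch pages here; this sketch only reports the range.
        return (self.page_start, self.page_stop)


if __name__ == "__main__":
    option_one = SketchCrawler().set_range(5).run()
    option_two = SketchCrawler().set_range(page_start=1, page_stop=5).run()
    print(option_one, option_two)
```

Under this assumption both options produce the same start and stop values, which matches the way the two variants are presented as equivalent.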

1.2. File configuration

The file configuration options are listed below:

# the version information
version_name = v0.0.1.dev1
version_code = 1
version_info = 0.0.1.dev1


# CRAWLER URL PREFIX
article-alias = 'crawler_alias'
sn-url-prefix = ['http://xxx.yyy.zzz/post-xxx-yyy-{}.shtml']


# REGION_INCLUSIVE_EXCLUSIVE = 0
# REGION_EXCLUSIVE_INCLUSIVE = 1
# REGION_EXCLUSIVE_EXCLUSIVE = 2
# REGION_INCLUSIVE_INCLUSIVE = 3
page-start = 1
page-stop = 0
page-flag = 0
# page-flag = 1
# page-flag = 2
# page-flag = 3
# False: pull first, then translate; True: translate after each pull
article-flag = True
# article-flag = False


# True: multi media resources are stored locally, otherwise they are not
# article-multi-media-persistence = True
# article-multi-media-persistence = False

# True: multi media resources are high quality, otherwise they are not
# article-multi-media-quality = True
# article-multi-media-quality = False


User-Agent = 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/512.36 (KHTML, like Gecko) Chrome/92.0.1235.131 Safari/277.36'
# User-Agent = 'User-Agent: Mozilla/5.0 (Macintosh; Intel Mac OS X 12_10_3) AppleWebKit/512.36 (KHTML, like Gecko) Chrome/92.0.1235.131 Safari/277.36'


# output-directory = "..."
# output-uuid-encryptor = True
# output-uuid-encryptor = False

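Notes:

  • version_name: configuration file version name;
  • version_code: configuration file version code;
  • version_info: configuration file version information;
  • article-alias: article alias (OPTIONAL); the generated directory is named [bbs|json]_YYmmdd[_alias];
  • sn-url-prefix: URL format of the article, typically http://xxx.yyy.zzz/post-xxx-yyy-{}.shtml;
  • page-start: start page of the range;
  • page-stop: end page of the range (OPTIONAL);
  • page-flag: range flag (OPTIONAL); four types are supported: REGION_INCLUSIVE_EXCLUSIVE, REGION_EXCLUSIVE_INCLUSIVE, REGION_EXCLUSIVE_EXCLUSIVE, REGION_INCLUSIVE_INCLUSIVE;
  • article-flag: article data stream generation style (OPTIONAL);
  • article-multi-media-persistence: whether media resources referenced by the article are persisted locally (OPTIONAL, RECOMMENDED);
  • article-multi-media-quality: quality of the media resources referenced by the article (OPTIONAL);
  • User-Agent: user agent used for article requests (OPTIONAL, RECOMMENDED);
  • output-directory: directory the article is written to (OPTIONAL, RECOMMENDED);
  • output-uuid-encryptor: whether the article output id is encrypted (OPTIONAL);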
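To make the interaction between page-start, page-stop, page-flag and the {} placeholder in sn-url-prefix concrete, here is a minimal sketch. The constant names and values come from the configuration file above; the functions page_numbers and expand_urls are hypothetical helpers, and the reading that the first word of each flag name applies to the start bound and the second to the stop bound is an assumption. The special value page-stop = 0 is not modelled.

```python
REGION_INCLUSIVE_EXCLUSIVE = 0  # assumed to mean [start, stop)
REGION_EXCLUSIVE_INCLUSIVE = 1  # assumed to mean (start, stop]
REGION_EXCLUSIVE_EXCLUSIVE = 2  # assumed to mean (start, stop)
REGION_INCLUSIVE_INCLUSIVE = 3  # assumed to mean [start, stop]


def page_numbers(page_start, page_stop, page_flag=REGION_INCLUSIVE_EXCLUSIVE):
    """Return the page numbers covered by the range under the given flag."""
    include_start = page_flag in (REGION_INCLUSIVE_EXCLUSIVE, REGION_INCLUSIVE_INCLUSIVE)
    include_stop = page_flag in (REGION_EXCLUSIVE_INCLUSIVE, REGION_INCLUSIVE_INCLUSIVE)
    first = page_start if include_start else page_start + 1
    last = page_stop if include_stop else page_stop - 1
    return list(range(first, last + 1))


def expand_urls(url_prefix, page_start, page_stop, page_flag=REGION_INCLUSIVE_EXCLUSIVE):
    """Fill the {} placeholder of the URL template with each page number."""
    return [url_prefix.format(page) for page in page_numbers(page_start, page_stop, page_flag)]


if __name__ == "__main__":
    template = "http://xxx.yyy.zzz/post-xxx-yyy-{}.shtml"
    for url in expand_urls(template, 1, 5):
        print(url)
```

With the default flag, the range [1, 5) expands to four URLs, for pages 1 through 4.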

1.3. CLI configuration

$ dvs-sniffer -h
usage: dvs-sniffer [-h] [-V] [-amp] [-amq] [-a2 [article-alias]]
                   [-ad [article-describe]] [-af] [-rs [region-start]]
                   [-re [region-end]] [-rm [region-mask]] [-due]
                   [-ua [User-Agent]]
                   sn-url [destination-directory]

    this is a dvs network sniffer execution program.

    the sniffer destination url format must conform to the following continuous URLs:

        eg:

            1. http://bbs.xxx.cn/list-xyz-1.shtml
            2. http://bbs.xxx.cn/list-xyz-2.shtml
            3. http://bbs.xxx.cn/list-xyz-3.shtml
            4. ...
            5. http://bbs.xxx.cn/list-xyz-{}.shtml


positional arguments:
  sn-url                the sniffer destination url.
  destination-directory
                        the sniffer destination directory.

optional arguments:
  -h, --help            show this help message and exit
  -V, --version         the show version and exit.
  -amp, --article-multi-media-persistence
                        if True: multi media resources are stored locally,
                        otherwise they are not, and the default value is True.
  -amq, --article-multi-media-quality
                        if True: multi media resources are high quality,
                        otherwise they are not, and the default value is
                        False.
  -a2 [article-alias], --article-alias [article-alias]
                        a short text article alias of the sniffer to be.
  -ad [article-describe], --article-describe [article-describe]
                        a short text article description of the sniffer to be.
  -af, --article-flag   if False: first pull, second translate True: one pull
                        after another translate, and the default value is
                        True.
  -rs [region-start], --region-start [region-start]
                        a briefly describe the range start to be sniffed
                        mathematically.
  -re [region-end], --region-end [region-end]
                        a briefly describe the range end to be sniffed
                        mathematically.
  -rm [region-mask], --region-mask [region-mask]
                        The olfactory spatial range of the sniffer can only be
                        the following values: REGION_INCLUSIVE_EXCLUSIVE = 0,
                        REGION_EXCLUSIVE_INCLUSIVE = 1,
                        REGION_EXCLUSIVE_EXCLUSIVE = 2,
                        REGION_INCLUSIVE_INCLUSIVE = 3, and the default value
                        is REGION_INCLUSIVE_EXCLUSIVE.
  -due, --destination-uuid-encryptor
                        the sniffer destination uuid encryptor, and the
                        default value is True.
  -ua [User-Agent], --user-agent [User-Agent]
                        the user agent flag of set sniffer for default network
                        access, which is the macintosh system identifier by
                        default.

the copyright belongs to dvs that reserve the right of final interpretation.
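The usage text above has the shape that Python's argparse module produces. As an illustration only, the sketch below declares a small subset of those options with argparse; it is not the actual dvs-sniffer source. The program name dvs-sniffer-sketch, the function build_parser, and the defaults for -rs and -re are assumptions (the help text states defaults only for -amq and -rm).

```python
import argparse


def build_parser():
    """Build a parser for a subset of the options shown in the help text."""
    parser = argparse.ArgumentParser(prog="dvs-sniffer-sketch")
    # Boolean switch; the help text says the default is False.
    parser.add_argument("-amq", "--article-multi-media-quality", action="store_true")
    # nargs="?" mirrors the "[-rs [region-start]]" form in the usage line.
    parser.add_argument("-rs", "--region-start", nargs="?", type=int, default=1)  # default assumed
    parser.add_argument("-re", "--region-end", nargs="?", type=int, default=0)  # default assumed
    parser.add_argument("-rm", "--region-mask", nargs="?", type=int,
                        choices=[0, 1, 2, 3], default=0)
    parser.add_argument("sn_url", metavar="sn-url")
    parser.add_argument("destination_directory", metavar="destination-directory", nargs="?")
    return parser


if __name__ == "__main__":
    args = build_parser().parse_args(
        ["-amq", "-rs", "1", "-re", "2", "http://xxx.yyy.zzz/post-1.html"]
    )
    print(args.region_start, args.region_end, args.region_mask, args.sn_url)
```

Because destination-directory is declared with nargs="?", omitting it leaves the value as None, which a real program could map to its default output directory.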

2. Running

2.1. Running the script

Run the script as follows:

# Windows
python .\scripts\crawler_script.py
# Macintosh
python ./scripts/crawler_script.py

2.2. Running the CLI

# Windows and Macintosh
# region: [1, 2)
dvs-sniffer -amq -rs 1 -re 2 http://xxx.yyy.zzz/post-1.html # the default destination directory
dvs-sniffer -amq -rs 1 -re 2 http://xxx.yyy.zzz/post-1.html \var\...\dvs-sniffer\ # the special destination directory
