A web spider for the Chinese graduate entrance examination catalogue.

License
- OSI Approved :: MIT License
Operating System
- OS Independent
Programming Language
- Python :: 3

Project description

yzw

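Uses Scrapy to scrape graduate entrance examination major information from 研招网 (yz.chsi.com.cn).

The data includes exam scope, majors, and related fields, and can be written to Excel or MySQL.

The release page provides pre-scraped data that can be opened directly in Excel or MySQL.

Once you have the data, it is easy to filter.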

Installation:

Install directly with pip:

pip install --upgrade yzwspider
python -m yzwspider

Or clone the repository and run it locally:

git clone https://github.com/Hthing/yzw.git
cd yzw
pip install -r requirements.txt
python -m yzwspider

Requirements:

Python 3.7 or later

Usage

When writing to MySQL, the database must be created in advance.
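The database has to exist before the spider runs, so a minimal sketch of the statement that creates it may help. The helper name `create_database_sql` and the `utf8mb4` character set are assumptions made for illustration (chosen because the scraped text is Chinese); `yanzhao` is the default database name listed in the mysql arguments.

```python
import re


def create_database_sql(name: str = "yanzhao") -> str:
    """Return a CREATE DATABASE statement for the given database name.

    Hypothetical helper, not part of yzwspider. The name is restricted to
    letters, digits and underscores so it can be quoted safely.
    """
    if not re.fullmatch(r"[A-Za-z0-9_]+", name):
        raise ValueError(f"unsupported database name: {name!r}")
    # utf8mb4 is an assumption; it stores the full range of Chinese text.
    return f"CREATE DATABASE IF NOT EXISTS `{name}` DEFAULT CHARACTER SET utf8mb4;"


if __name__ == "__main__":
    print(create_database_sql())
```

The printed statement can be run in the mysql client, or through any MySQL driver, before starting the spider.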

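Province/city codes, discipline categories, and first-level discipline codes (discipline classes) can be looked up on 研招网. For example, Computer Science and Technology is 0812.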

https://yz.chsi.com.cn/zsml/queryAction.do

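python -m yzwspider [-h] [-ssdm] [-mldm] [-yjxk] [--all] [--log] OUTPUT_TARGET [other arguments]

yzwspider arguments (default values in parentheses):

-ssdm: province/city code (11). Chinese names such as 北京 or 上海 are also accepted; 0 means nationwide.

-mldm: discipline category code (01). Chinese names such as 理学 or 工学 are also accepted.

-yjxk: first-level discipline code (0101)

--all: scrape all major information; output is supported only to MySQL

--log: save a log file to the current directory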

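Arguments for the "excel" command:

-o: output path for the .xls file; defaults to the current directory

Arguments for the "mysql" command:

-host: host address (localhost)

-port: port number (3306)

-u: user name (root)

-p: password ('')

-db: database name (yanzhao)

-table: table name (major)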
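When the spider is driven from another script, the arguments listed above can be assembled programmatically. The following is a minimal sketch, not part of yzwspider: `build_command` is a hypothetical helper that uses only the flags documented here and enforces the stated rule that `--all` works only with the mysql target.

```python
import sys
from typing import List, Optional


def build_command(target: str,
                  ssdm: Optional[str] = None,
                  mldm: Optional[str] = None,
                  yjxk: Optional[str] = None,
                  scrape_all: bool = False,
                  log: bool = False,
                  target_args: Optional[List[str]] = None) -> List[str]:
    """Build an argv list for `python -m yzwspider` (hypothetical helper)."""
    if target not in ("excel", "mysql"):
        raise ValueError("target must be 'excel' or 'mysql'")
    if scrape_all and target != "mysql":
        raise ValueError("--all can only output to mysql")

    cmd = [sys.executable, "-m", "yzwspider"]
    # Spider-level options come before the output target.
    for flag, value in (("-ssdm", ssdm), ("-mldm", mldm), ("-yjxk", yjxk)):
        if value is not None:
            cmd += [flag, str(value)]
    if scrape_all:
        cmd.append("--all")
    if log:
        cmd.append("--log")

    cmd.append(target)
    # Target-specific options (for example -o, or -u/-p/-host) follow it.
    cmd += list(target_args or [])
    return cmd


if __name__ == "__main__":
    print(build_command("excel", ssdm="11", yjxk="0812"))
```

The resulting list can be passed to `subprocess.run(cmd, check=True)`. Passing a list rather than a shell string avoids quoting problems with Chinese values such as 北京.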

For example, to fetch Computer Science and Technology (0812) majors in Beijing (11) and write them to an Excel file:

 python -m yzwspider -ssdm 11 -yjxk 0812 excel

In the command above, "-ssdm 11" can be replaced with "-ssdm 北京" to the same effect.

The run ends with output like the following:

2019-12-04 15:13:57 [scrapy.core.engine] INFO: Closing spider (finished)
2019-12-04 15:13:57 [YzwPipeline] INFO: excel文件已存储于 /home/研招网专业信息.xls
2019-12-04 15:13:57 [yzwspider.yzw.collector] INFO: 数据抓取完成, 共计 691 条数据，
                    程序开始时间 2019-12-04 15:13:44 , 结束时间 2019-12-04 15:13:57, 耗时 0 分钟
2019-12-04 15:13:57 [scrapy.core.engine] INFO: Spider closed (finished)
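The summary line reports the elapsed time in whole minutes, so a short run shows as 0 minutes. As a rough sketch, the record count and the elapsed seconds can be recovered from the start and end timestamps in the log. The function name `parse_summary` and the regular expression are assumptions based only on the sample output above; a different yzwspider version may format the line differently.

```python
import re
from datetime import datetime

# Pattern derived from the sample log above; it is an assumption, not an API.
SUMMARY_RE = re.compile(
    r"共计\s*(?P<count>\d+)\s*条数据.*?"
    r"程序开始时间\s*(?P<start>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})\s*,\s*"
    r"结束时间\s*(?P<end>\d{4}-\d{2}-\d{2} \d{2}:\d{2}:\d{2})",
    re.DOTALL,  # the summary is wrapped across two lines
)


def parse_summary(log_text):
    """Return (record_count, elapsed_seconds), or None if no summary is found."""
    match = SUMMARY_RE.search(log_text)
    if match is None:
        return None
    fmt = "%Y-%m-%d %H:%M:%S"
    start = datetime.strptime(match["start"], fmt)
    end = datetime.strptime(match["end"], fmt)
    return int(match["count"]), (end - start).total_seconds()


if __name__ == "__main__":
    sample = (
        "2019-12-04 15:13:57 [yzwspider.yzw.collector] INFO: 数据抓取完成, 共计 691 条数据，\n"
        "                    程序开始时间 2019-12-04 15:13:44 , 结束时间 2019-12-04 15:13:57, 耗时 0 分钟\n"
    )
    print(parse_summary(sample))  # (691, 13.0)
```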

To write to MySQL instead (arguments with default values may be omitted):

python -m yzwspider -ssdm 11 -yjxk 0812 mysql -u root -p **** -host ******* -table test

Scrape all data (MySQL only):

python -m yzwspider --all mysql -u root -p **** -host ******* -port *** -db DATABASE_NAME -table TABLE_NAME

Scrape one discipline nationwide:

python -m yzwspider  -ssdm 0 -yjxk 0812 mysql -u *** -p ***

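The output is similar to the excel case. To save a log file, add --log.

Scraped pages

Each data row's id is composed of the id of the scraped page and the position of the exam scope on that page.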

Release history

- 0.1.6.3: Dec 15, 2023 (this version)
- 0.1.6.2: Sep 19, 2023
- 0.1.6.1: Sep 19, 2023
- 0.1.5.2: Sep 20, 2021
- 0.1.5.1: Sep 19, 2021
- 0.1.5.0: Sep 19, 2021
- 0.1.4.4: Feb 23, 2021
- 0.1.4.2: Apr 20, 2020
- 0.1.4.1: Apr 15, 2020
- 0.1.4: Dec 16, 2019
- 0.1.3: Dec 4, 2019
- 0.1.2: Dec 4, 2019

Download files

Source Distribution

yzwspider-0.1.6.3.tar.gz (14.5 kB)

Uploaded Dec 15, 2023 Source

Built Distribution

yzwspider-0.1.6.3-py3-none-any.whl (16.3 kB)

Uploaded Dec 15, 2023 Python 3

File details

Details for the file yzwspider-0.1.6.3.tar.gz.

File metadata

Download URL: yzwspider-0.1.6.3.tar.gz
Upload date: Dec 15, 2023
Size: 14.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.8.12

File hashes

Hashes for yzwspider-0.1.6.3.tar.gz
Algorithm	Hash digest
SHA256	`1d54bd01c011191702bc0610784faff886361d78a64397516df03a074779a426`
MD5	`ee78e89de3478056da8cfc3cb3aa8f7f`
BLAKE2b-256	`27d34bc8c24e7da616b7ba1d01bd10a9f10eb9300737ccf119b8f630979d5ab4`

File details

Details for the file yzwspider-0.1.6.3-py3-none-any.whl.

File metadata

Download URL: yzwspider-0.1.6.3-py3-none-any.whl
Upload date: Dec 15, 2023
Size: 16.3 kB
Tags: Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/4.0.2 CPython/3.8.12

File hashes

Hashes for yzwspider-0.1.6.3-py3-none-any.whl
Algorithm	Hash digest
SHA256	`239755c0d20c06ff35e4fa6a69ab1163f2a1e6f4a4128d54b031dce03a946c6f`
MD5	`fae5ef2cea3b592127cbde3910c7f9af`
BLAKE2b-256	`806abd87ab091f377adbe408d15dbed2abc21ff59895bf1369b365ab66cc4480`
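The SHA256 digests listed above can be used to check a downloaded file before installing it. The sketch below uses only the standard library. The function names `sha256_of` and `verify` are illustrative, and the script assumes the downloaded files keep the names shown in the file metadata.

```python
import hashlib
from pathlib import Path

# SHA256 digests copied from the file hash tables above.
EXPECTED_SHA256 = {
    "yzwspider-0.1.6.3.tar.gz":
        "1d54bd01c011191702bc0610784faff886361d78a64397516df03a074779a426",
    "yzwspider-0.1.6.3-py3-none-any.whl":
        "239755c0d20c06ff35e4fa6a69ab1163f2a1e6f4a4128d54b031dce03a946c6f",
}


def sha256_of(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def verify(path):
    """Return True if the file's digest matches the listed one for its name."""
    path = Path(path)
    expected = EXPECTED_SHA256.get(path.name)
    if expected is None:
        raise KeyError(f"no listed hash for {path.name}")
    return sha256_of(path) == expected


if __name__ == "__main__":
    for name in EXPECTED_SHA256:
        if Path(name).exists():
            print(name, "OK" if verify(name) else "MISMATCH")
```

pip can perform the same check itself when a requirements file pins hashes and is installed with `--require-hashes`.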
