simple Spider
=============
| |python -> 3.4+|
| |coverage -> 37%|
| |build -> passing|
::
_ _ ___ _ _
___<_>._ _ _ ___ | | ___ / __> ___ <_> _| | ___ _ _
<_-<| || ' ' || . \| |/ ._>\__ \| . \| |/ . |/ ._>| '_>
/__/|_||_|_|_|| _/|_|\___.<___/| _/|_|\___|\___.|_|
|_| |_|
`中文 <./Readme-zh.md>`__
Overview
--------
A simple web crawling framework. See the
`documentation <https://duiliuliu.github.io/simple-spiders/>`__ for details.
Getting Started
---------------
``pip install sspider``
Create a ``project.py`` to suit your needs:
::

    >>> from sspider import Spider, Request
    >>> # Create the request object
    >>> request = Request('get', 'https://movie.douban.com/subject/27202819/reviews')
    >>> # Create the spider object
    >>> spider = Spider()
    >>> # Run the spider
    >>> spider.run(request)
    ...
    >>> # Save the crawl results
    >>> spider.write('test.txt')

Run the script with ``python project.py`` and press ``Ctrl-C`` to stop it.
Referenced Libraries
--------------------
- `requests <https://github.com/requests/requests>`__ as the
  htmlDownloader
- `lxml <https://github.com/lxml/lxml>`__ as the default htmlParser
- `csv <http://www.python-csv.org>`__ to export results as CSV files
- `xlwt <http://www.python-excel.org/>`__ to export results as Excel
  (``.xls``) files
- `xlsxwriter <https://xlsxwriter.readthedocs.io>`__ to export results
  as Excel (``.xlsx``) files
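
To give a rough idea of what CSV export involves, here is a minimal sketch using only Python's standard ``csv`` module. It is not sspider's own implementation: the helper ``rows_to_csv``, the sample rows, and the field names are all hypothetical and exist only for illustration.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text.

    Hypothetical helper for illustration; not part of sspider.
    """
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()      # first line: the column names
    writer.writerows(rows)    # one line per crawled record
    return buffer.getvalue()


# Made-up crawl results; a real crawl would supply its own fields.
rows = [
    {"title": "First review", "rating": "5"},
    {"title": "Second review", "rating": "4"},
]
csv_text = rows_to_csv(rows, ["title", "rating"])
```

Writing to an in-memory buffer keeps the sketch free of side effects; to produce a file instead, the same ``DictWriter`` can be given a file opened with ``open(path, "w", newline="")``.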
License
-------
This project is published as open source under the |license| license.
If you modify it, please keep your version open source and credit the
original author. Thank you for your respect.
If you need to use this project for commercial purposes, please
contact me (`@pengr <https://github.com/duiliuliu>`__) separately to
obtain commercial authorization.
.. |python -> 3.4+| image:: ./images/python-3.4+-green.svg
.. |coverage -> 37%| image:: https://img.shields.io/badge/coverage-37%25-yellowgreen.svg
.. |build -> passing| image:: ./images/build-passing-orange.svg
.. |license| image:: ./images/license-LGPL--3.0-orange.svg