A simple web crawling framework.


simple Spider
=============

| |python -> 3.4+|
| |coverage -> 37%|
| |build -> passing|


`Chinese version <./Readme-zh.md>`__

Overview
--------

A simple web crawling framework. See the
`documentation <https://duiliuliu.github.io/simple-spiders/>`__ for details.

Getting Started
---------------

``pip install sspider``

Write a ``project.py`` script to suit your needs, for example:

::

    from sspider import Spider, Request

    # Create a Request object
    request = Request('get', 'https://movie.douban.com/subject/27202819/reviews')
    # Create the spider
    spider = Spider()
    # Run the spider
    spider.run(request)
    # Save the crawled results
    spider.write('test.txt')

Run it with ``python project.py``, and press ``Ctrl-C`` to stop the crawl.
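
Conceptually, a crawler repeats three steps: download a page, parse it for data and further links, and store the result. The following is a hypothetical, standard-library-only sketch of that loop, not sspider's actual implementation: ``crawl``, ``LinkParser``, ``extract_links`` and the injected ``fetch`` callable are names invented for this example, and ``html.parser`` stands in for lxml. Passing ``fetch`` in as a parameter keeps network access out of the loop, and catching ``KeyboardInterrupt`` shows one way a Ctrl-C stop can keep the pages collected so far.

```python
from collections import deque
from html.parser import HTMLParser
from typing import Callable, Dict, List
from urllib.parse import urldefrag, urljoin


class LinkParser(HTMLParser):
    """Collect the href of every <a> tag (a stand-in for an lxml-based parser)."""

    def __init__(self):
        super().__init__()
        self.links = []  # raw href values, in document order

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(base_url: str, html: str) -> List[str]:
    """Return the absolute, fragment-free URLs linked from `html`."""
    parser = LinkParser()
    parser.feed(html)
    parser.close()
    return [urldefrag(urljoin(base_url, href))[0] for href in parser.links]


def crawl(start_url: str, fetch: Callable[[str], str], max_pages: int = 10) -> Dict[str, str]:
    """Breadth-first crawl from start_url; return {url: html} for fetched pages.

    `fetch` is a hypothetical callable that downloads a URL and returns its HTML;
    in a real crawler it might wrap requests.get(url).text.
    """
    pages = {}                # url -> html, in the order pages were fetched
    queue = deque([start_url])
    seen = {start_url}        # URLs already queued, so each is fetched at most once
    try:
        while queue and len(pages) < max_pages:
            url = queue.popleft()
            html = fetch(url)                       # download step
            pages[url] = html                       # store step
            for link in extract_links(url, html):   # parse step
                if link not in seen:
                    seen.add(link)
                    queue.append(link)
    except KeyboardInterrupt:
        # Ctrl-C: stop crawling but keep the pages collected so far.
        pass
    return pages


if __name__ == "__main__":
    # An in-memory "web" so the example runs without network access.
    site = {
        "http://example.com/": '<a href="/a">A</a> <a href="/b#top">B</a>',
        "http://example.com/a": '<a href="/">home</a>',
        "http://example.com/b": "",
    }
    print(list(crawl("http://example.com/", site.__getitem__)))
    # ['http://example.com/', 'http://example.com/a', 'http://example.com/b']
```

The ``seen`` set is what prevents the crawler from looping forever when pages link back to each other, as ``/a`` does here.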

Referenced Libraries
--------------------

- Uses `requests <https://github.com/requests/requests>`__ as the HTML
  downloader (htmlDownloader).
- Uses `lxml <https://github.com/lxml/lxml>`__ as the default HTML parser
  (htmlParser).
- Uses `csv <http://www.python-csv.org>`__ to export results as CSV files.
- Uses `xlwt <http://www.python-excel.org/>`__ to export results as Excel
  ``.xls`` files.
- Uses `xlsxwriter <https://xlsxwriter.readthedocs.io>`__ to export results
  as Excel ``.xlsx`` files.
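
CSV export of crawled records can be illustrated with Python's standard-library ``csv`` module alone. This is a hypothetical sketch, not sspider's export API: the ``export_csv`` function and the record layout (a list of dicts sharing the same keys) are assumptions made for illustration. ``csv.DictWriter`` writes a header row from the field names, and the file is opened with ``newline=''`` as the ``csv`` documentation recommends, so the module controls line endings itself.

```python
import csv
from typing import Dict, Iterable, List


def export_csv(records: Iterable[Dict[str, str]], path: str, fieldnames: List[str]) -> int:
    """Write records to `path` as CSV with a header row; return the number of rows written."""
    count = 0
    # newline='' stops the file object from translating the csv module's line endings.
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=fieldnames)
        writer.writeheader()
        for record in records:
            writer.writerow(record)
            count += 1
    return count
```

For example, ``export_csv(rows, 'results.csv', ['url', 'title'])`` writes a ``url,title`` header followed by one line per record; a field that contains a comma is quoted automatically.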



License
-------

This project is released as open source under the |license| license.
If you modify it, please keep the modified version open source and credit
the original author. Thank you for your respect.

If you want to use this project for commercial purposes, please contact me
(`@pengr <https://github.com/duiliuliu>`__) separately to obtain commercial
authorization.

.. |python -> 3.4+| image:: ./images/python-3.4+-green.svg
.. |coverage -> 37%| image:: https://img.shields.io/badge/coverage-37%25-yellowgreen.svg
.. |build -> passing| image:: ./images/build-passing-orange.svg
.. |license| image:: ./images/license-LGPL--3.0-orange.svg
