simple Spider
=============

| |python -> 3.4+|
| |coverage -> 37%|
| |build -> passing|

::

    _ _ ___ _ _
    ___<_>._ _ _ ___ | | ___ / __> ___ <_> _| | ___ _ _
    <_-<| || ' ' || . \| |/ ._>\__ \| . \| |/ . |/ ._>| '_>
    /__/|_||_|_|_|| _/|_|\___.<___/| _/|_|\___|\___.|_|
    |_| |_|


`Chinese version <./Readme-zh.md>`__

Overview
--------

A simple web crawling framework. See the
`documentation <https://duiliuliu.github.io/simple-spiders/>`__ for details.

Getting Started
---------------

``pip install sspider``

Create a ``project.py`` that suits your needs:

::

    >>> from sspider import Spider, Request
    >>> # Create the request object
    >>> request = Request('get', 'https://movie.douban.com/subject/27202819/reviews')
    >>> # Create the spider object
    >>> spider = Spider()
    >>> # Run the spider
    >>> spider.run(request)
    ...
    >>> # Save the crawl results
    >>> spider.write('test.txt')

Run the crawler:

``python project.py``

Press ``Ctrl-C`` to stop.
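
One way ``project.py`` could handle ``Ctrl-C`` is sketched below. It assumes that ``spider.run`` lets ``KeyboardInterrupt`` propagate and that ``spider.write`` can still be called afterwards; neither behaviour is confirmed by this README. ``crawl_and_save`` is a hypothetical helper, not part of sspider, and it relies only on the ``run`` and ``write`` methods shown above.

```python
def crawl_and_save(spider, request, out_path):
    """Run the crawl, then save results; return False if interrupted.

    Hypothetical helper: `spider` is any object exposing the
    run(request) and write(path) methods used in the example above.
    """
    completed = True
    try:
        spider.run(request)
    except KeyboardInterrupt:
        # Assumption: Ctrl-C surfaces here as KeyboardInterrupt.
        completed = False
    # Assumption: results gathered before the interrupt can still be written.
    spider.write(out_path)
    return completed


# Intended use in project.py (requires sspider to be installed):
#   from sspider import Spider, Request
#   request = Request('get', 'https://movie.douban.com/subject/27202819/reviews')
#   crawl_and_save(Spider(), request, 'test.txt')
```

Passing the spider in as an argument keeps the helper independent of sspider itself, so it can be exercised with a stand-in object.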

Referenced Libraries
--------------------

- Uses `requests <https://github.com/requests/requests>`__ as the HTML
  downloader
- Uses `lxml <https://github.com/lxml/lxml>`__ as the default HTML parser
- Uses `csv <http://www.python-csv.org>`__ to export results as CSV
  files
- Uses `xlwt <http://www.python-excel.org/>`__ to export results as
  Excel (``.xls``) files
- Uses `xlsxwriter <https://xlsxwriter.readthedocs.io>`__ to export
  results as Excel (``.xlsx``) files
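
To illustrate the CSV export step in isolation, here is a minimal sketch using only Python's standard ``csv`` module. It is not sspider's own implementation; ``rows_to_csv`` and the sample ``reviews`` data are made up for the example.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text with a header row."""
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical crawl results: one dict per scraped item.
reviews = [
    {"title": "Great", "rating": 5},
    {"title": "Too long, but fun", "rating": 3},
]
csv_text = rows_to_csv(reviews, ["title", "rating"])
print(csv_text)
```

Fields that contain a comma are quoted automatically by ``csv.DictWriter``, which is the main reason to prefer the module over joining strings by hand.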



License
-------

This project is published as open source under the |license| agreement.
If you modify it, please keep your version open source and credit the
original author. Thank you for your respect.

If you need to use this project for commercial purposes, please
contact me (`@pengr <https://github.com/duiliuliu>`__) separately to
obtain commercial authorization.

.. |python -> 3.4+| image:: ./images/python-3.4+-green.svg
.. |coverage -> 37%| image:: https://img.shields.io/badge/coverage-37%25-yellowgreen.svg
.. |build -> passing| image:: ./images/build-passing-orange.svg
.. |license| image:: ./images/license-LGPL--3.0-orange.svg
