A simple web crawling framework.

python -> 3.4+
coverage -> 37%
build -> passing
     _                 _         _____       _     _
    (_)               | |       / ____|     (_)   | |
 ___ _ _ __ ___  _ __ | | ___  | (___  _ __  _  __| | ___ _ __
/ __| | '_ ` _ \| '_ \| |/ _ \  \___ \| '_ \| |/ _` |/ _ \ '__|
\__ \ | | | | | | |_) | |  __/  ____) | |_) | | (_| |  __/ |
|___/_|_| |_| |_| .__/|_|\___| |_____/| .__/|_|\__,_|\___|_|
                | |                   | |
                |_|                   |_|

Chinese version (中文)

Overview

A simple web crawling framework. Documentation

Getting Started

pip install simple-spiders

Write a project.py to suit your needs:

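from crawler.spider import Spider
from crawler.writter import DataWriter  # not used in this minimal example

# Seed URL: a Douban movie comments page.
spider = Spider(
    'https://movie.douban.com/subject/26810318/comments?start=0&limit=20&sort=new_score&status=P')
# Start crawling from the seed URL.
spider.start_crawl()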

python project.py

Press Ctrl-C to stop the crawl.

Referenced Libraries

  • requests, used as the htmlDownloader

  • lxml, used as the default htmlParser

  • csv, used to export data as CSV files

  • xlwt, used to export data as Excel (.xls) files

  • xlsxwriter, used to export data as Excel (.xlsx) files
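
The CSV export listed above builds on Python's standard csv module. The following is a minimal sketch of that idea only. It does not use simple-spiders' own DataWriter, whose interface is not shown in this README. The function name rows_to_csv and the sample records are hypothetical.

```python
import csv
import io


def rows_to_csv(rows, fieldnames):
    """Serialize a list of dicts to CSV text, header row first."""
    # newline="" keeps the csv module's own line endings untouched.
    buffer = io.StringIO(newline="")
    writer = csv.DictWriter(buffer, fieldnames=fieldnames)
    writer.writeheader()
    writer.writerows(rows)
    return buffer.getvalue()


# Hypothetical records standing in for parsed crawl results.
sample_rows = [
    {"user": "alice", "comment": "Great film, would watch again"},
    {"user": "bob", "comment": 'He said "wow"'},
]

csv_text = rows_to_csv(sample_rows, ["user", "comment"])
print(csv_text)
```

Fields that contain commas or quotes are quoted automatically by the csv module, so scraped free text such as comments can be written without manual escaping.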

Usage

Project structure

- crawler/
    - __init__.py
    - test/
      - htmlDownloder_test
      - htmlParser_test
      - requestManager_test
      - writter_test
      - logger_test
      - spider_test

    - htmlDownloder
    - htmlParser
    - requestManager
    - writter
    - logger
    - spider

- main.py
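
The module names above suggest a conventional crawler pipeline: a request manager hands out URLs, a downloader fetches them, a parser extracts data, and a writer exports it. As an illustration of the first stage only, here is a minimal sketch of the URL bookkeeping a request manager typically does. The class UrlQueue and its methods are hypothetical and are not taken from simple-spiders' requestManager.

```python
from collections import deque


class UrlQueue:
    """Hypothetical request manager: FIFO queue of URLs with de-duplication."""

    def __init__(self):
        self._pending = deque()  # URLs waiting to be crawled, in arrival order
        self._seen = set()       # every URL ever queued, to skip repeats

    def add(self, url):
        """Queue a URL unless it was queued before; return True if added."""
        if url in self._seen:
            return False
        self._seen.add(url)
        self._pending.append(url)
        return True

    def has_next(self):
        """Return True while there are URLs left to crawl."""
        return bool(self._pending)

    def next(self):
        """Remove and return the oldest pending URL."""
        return self._pending.popleft()


queue = UrlQueue()
queue.add("https://example.com/page/1")
queue.add("https://example.com/page/2")
queue.add("https://example.com/page/1")  # duplicate, ignored

visited = []
while queue.has_next():
    visited.append(queue.next())
print(visited)
```

Tracking seen URLs separately from pending ones keeps a crawl from looping when pages link back to each other.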

License

This project is published as open source under the [license] agreement. If you modify it, please keep your release open source and credit the original author. Thank you for your respect.

If you want to use this project for commercial purposes, please contact me (@pengr) separately to obtain commercial authorization.

