pyspider·PyPI

A Powerful Spider System in Python

These details have not been verified by PyPI

Project description

pyspider [![Build Status]][Travis CI] [![Coverage Status]][Coverage] [![Try]][Demo]
========

A Powerful Spider (Web Crawler) System in Python. **[TRY IT NOW!][Demo]**

- Write scripts in Python
- Powerful WebUI with script editor, task monitor, project manager and result viewer
- [MySQL](https://www.mysql.com/), [MongoDB](https://www.mongodb.org/), [Redis](http://redis.io/), [SQLite](https://www.sqlite.org/), [Elasticsearch](https://www.elastic.co/products/elasticsearch), and [PostgreSQL](http://www.postgresql.org/) (via [SQLAlchemy](http://www.sqlalchemy.org/)) as database backends
- [RabbitMQ](http://www.rabbitmq.com/), [Beanstalk](http://kr.github.com/beanstalkd/), [Redis](http://redis.io/) and [Kombu](http://kombu.readthedocs.org/) as message queues
- Task priorities, retries, periodic crawls, re-crawling by age, and more
- Distributed architecture, JavaScript page crawling, support for Python 2.{6,7} and 3.{3,4,5,6}, and more

Tutorial: [http://docs.pyspider.org/en/latest/tutorial/](http://docs.pyspider.org/en/latest/tutorial/)
Documentation: [http://docs.pyspider.org/](http://docs.pyspider.org/)
Release notes: [https://github.com/binux/pyspider/releases](https://github.com/binux/pyspider/releases)

Sample Code
-----------

```python
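from pyspider.libs.base_handler import *


class Handler(BaseHandler):
    # Project-wide defaults applied to every self.crawl() call
    crawl_config = {
    }

    # Ask the scheduler to call on_start once a day
    @every(minutes=24 * 60)
    def on_start(self):
        self.crawl('http://scrapy.org/', callback=self.index_page)

    # Treat a fetched index page as valid for 10 days before re-crawling
    @config(age=10 * 24 * 60 * 60)
    def index_page(self, response):
        # Follow every absolute http(s) link found on the page
        for each in response.doc('a[href^="http"]').items():
            self.crawl(each.attr.href, callback=self.detail_page)

    def detail_page(self, response):
        # The returned dict is stored as the result for this URL
        return {
            "url": response.url,
            "title": response.doc('title').text(),
        }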
```
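
To make the `age` setting in the sample above more concrete, here is a minimal sketch of the idea behind "recrawl by age". It is plain Python written for illustration only and is not pyspider's scheduler code; the function name `should_recrawl` and its parameters are hypothetical.

```python
import time

# Same value as @config(age=10 * 24 * 60 * 60) in the sample: 10 days in seconds
TEN_DAYS = 10 * 24 * 60 * 60


def should_recrawl(last_crawl_time, age, now=None):
    """Return True if a URL should be fetched again.

    last_crawl_time: Unix timestamp of the previous fetch, or None if never fetched.
    age: validity period in seconds.
    now: current Unix timestamp (defaults to time.time()).
    """
    if last_crawl_time is None:
        # Never crawled before, so it always needs a fetch
        return True
    if now is None:
        now = time.time()
    # The earlier result is considered stale once it is older than `age`
    return now - last_crawl_time > age
```

Under this sketch, a page fetched nine days ago with `age=TEN_DAYS` would be skipped, while one fetched eleven days ago would be fetched again.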

[![Demo][Demo Img]][Demo]

Installation
------------

* `pip install pyspider`
* Run the `pyspider` command, then visit [http://localhost:5000/](http://localhost:5000/)

**WARNING:** The WebUI is open to the public by default and can be used to execute arbitrary commands that may harm your system. Run it only on an internal network, or [enable `need-auth` for webui](http://docs.pyspider.org/en/latest/Command-Line/#-config).
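
As a rough sketch of what enabling authentication might look like, the snippet below writes a JSON config file with a `webui` section. The key names (`webui`, `username`, `password`, `need-auth`) are an assumption based on the configuration format described in the linked documentation, and the helper names `build_config` and `write_config` are hypothetical; verify the exact keys against the docs for your pyspider version before relying on this.

```python
import json


def build_config(username, password):
    """Build a config dict that asks the WebUI to require a login.

    The key names are assumed from the pyspider docs; check them before use.
    """
    return {
        "webui": {
            "username": username,
            "password": password,
            "need-auth": True,
        }
    }


def write_config(path, username, password):
    """Serialize the config to a JSON file that can be passed to pyspider."""
    with open(path, "w", encoding="utf-8") as fh:
        json.dump(build_config(username, password), fh, indent=2)
```

After calling something like `write_config("config.json", "some_name", "some_passwd")`, the file would be passed on startup with `pyspider -c config.json`.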

Quickstart: [http://docs.pyspider.org/en/latest/Quickstart/](http://docs.pyspider.org/en/latest/Quickstart/)

Contribute
----------

* Use It
* Open an [Issue] or send a PR
* [User Group]
* [Chinese Q&A](http://segmentfault.com/t/pyspider)

TODO
----

### v0.4.0

- [ ] a visual scraping interface like [portia](https://github.com/scrapinghub/portia)

License
-------
Licensed under the Apache License, Version 2.0

[Build Status]: https://img.shields.io/travis/binux/pyspider/master.svg?style=flat
[Travis CI]: https://travis-ci.org/binux/pyspider
[Coverage Status]: https://img.shields.io/coveralls/binux/pyspider.svg?branch=master&style=flat
[Coverage]: https://coveralls.io/r/binux/pyspider
[Try]: https://img.shields.io/badge/try-pyspider-blue.svg?style=flat
[Demo]: http://demo.pyspider.org/
[Demo Img]: https://github.com/binux/pyspider/blob/master/docs/imgs/demo.png
[Issue]: https://github.com/binux/pyspider/issues
[User Group]: https://groups.google.com/group/pyspider-users

Project details

Release history

| Version | Released | Notes |
| --- | --- | --- |
| 0.3.10 | Apr 18, 2018 | This version |
| 0.3.9 | Mar 18, 2017 | |
| 0.3.8 | Aug 18, 2016 | |
| 0.3.7 | Apr 20, 2016 | |
| 0.3.6 | Nov 10, 2015 | |
| 0.3.5 | May 22, 2015 | |
| 0.3.4 | Apr 21, 2015 | |
| 0.3.3 | Mar 8, 2015 | |
| 0.3.2 | Feb 11, 2015 | |
| 0.3.1 | Jan 22, 2015 | |
| 0.3.0 | Jan 11, 2015 | |
| 0.3.0b1 | Jan 6, 2015 | pre-release |
| 0.3.0a1 | Dec 27, 2014 | pre-release |
| 0.3.0.dev7 | Dec 24, 2014 | pre-release |
| 0.3.0.dev6 | Dec 5, 2014 | pre-release |
| 0.3.0.dev5 | Nov 24, 2014 | pre-release |
| 0.3.0.dev4 | Nov 24, 2014 | pre-release |

Download files

Source Distribution: `pyspider-0.3.10.tar.gz` (110.9 kB), uploaded Apr 18, 2018

File metadata for `pyspider-0.3.10.tar.gz`:

- Download URL: pyspider-0.3.10.tar.gz
- Upload date: Apr 18, 2018
- Size: 110.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No

File hashes for `pyspider-0.3.10.tar.gz`:

| Algorithm | Hash digest |
| --- | --- |
| SHA256 | `0148e5db79e743c8a096d260ae279e77cf4bf28736b593702ef006b9bb471cbb` |
| MD5 | `bc064430ba86f117cb3b418f4b6e9f2b` |
| BLAKE2b-256 | `d097d6062c928f53d899ff2a8538fed11d4d425ba3d27c96248a2c601c1c9fef` |
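
The published digests can be used to check a downloaded archive. The following is a minimal sketch using only the standard library; the helper names `sha256_of_file` and `matches_expected` are hypothetical, and the local file path is whatever location you saved the download to.

```python
import hashlib

# SHA256 digest listed for pyspider-0.3.10.tar.gz
EXPECTED_SHA256 = "0148e5db79e743c8a096d260ae279e77cf4bf28736b593702ef006b9bb471cbb"


def sha256_of_file(path, chunk_size=65536):
    """Compute the SHA256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as fh:
        # iter() with a sentinel stops when read() returns empty bytes (EOF)
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_expected(path, expected=EXPECTED_SHA256):
    """Return True if the file's SHA256 digest equals the expected value."""
    return sha256_of_file(path) == expected.lower()
```

For example, `matches_expected("pyspider-0.3.10.tar.gz")` should return `True` only for an archive whose contents match the digest listed above.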
