crawl-me

中文README (README in Chinese).

Crawl-me is a lightweight, fast, plugin-based web picture crawler. If a website is supported, you can download your favorite pictures from it through the corresponding plugin. For now, the available plugins are gamersky and pixiv. If you want to contribute, please feel free to contact me.

Fork me on GitHub :) https://github.com/nyankosama/crawl-me

Features

  • The crawl-me core supports multi-threaded downloading using HTTP Range headers, so it’s very fast.
  • It’s plugin based, so you are free to add any plugin you want.
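The Range-header idea behind the multi-threaded download can be sketched independently of crawl-me's internals. The helper names below (`split_ranges`, `range_download`) are hypothetical, and the sketch uses Python 3's standard library for brevity even though crawl-me itself targets Python 2.7:

```python
import threading
import urllib.request

def split_ranges(total_size, n_parts):
    """Split [0, total_size) into n contiguous (start, end) byte ranges,
    end inclusive, as used in an HTTP 'Range: bytes=start-end' header."""
    part = total_size // n_parts
    ranges = []
    for i in range(n_parts):
        start = i * part
        end = total_size - 1 if i == n_parts - 1 else start + part - 1
        ranges.append((start, end))
    return ranges

def fetch_part(url, start, end, buf, index):
    """Fetch one byte range of the file into its slot in buf."""
    req = urllib.request.Request(url, headers={"Range": "bytes=%d-%d" % (start, end)})
    buf[index] = urllib.request.urlopen(req).read()

def range_download(url, total_size, n_threads=4):
    """Download url in n_threads parallel chunks and return the joined bytes."""
    buf = [None] * n_threads
    threads = []
    for i, (start, end) in enumerate(split_ranges(total_size, n_threads)):
        t = threading.Thread(target=fetch_part, args=(url, start, end, buf, i))
        t.start()
        threads.append(t)
    for t in threads:
        t.join()
    return b"".join(buf)
```

A real downloader would first check that the server honours Range requests (crawl-me's changelog mentions an auto-check for exactly this) and fall back to a single-threaded download otherwise.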

Available plugins

  • pixiv : This plugin downloads all of a given author’s paintings from the pixiv site.
  • gamersky : This plugin downloads all pictures in a given topic from the gamersky site.
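A plugin-based crawler like this usually keeps a mapping from plugin name to handler object, so the core stays plugin agnostic. Crawl-me's actual plugin API is not documented here, so the class and function names below are purely illustrative:

```python
class GamerskyPlugin(object):
    """Illustrative stand-in for a real plugin (hypothetical API)."""
    name = "gamersky"

    def crawl(self, url, save_path, begin_page, end_page):
        # A real plugin would fetch pages and save images here.
        return "crawl %s pages %d-%d into %s" % (url, begin_page, end_page, save_path)

# The core only needs a name -> plugin lookup; adding a plugin
# means writing one class and registering it here.
PLUGINS = {cls.name: cls for cls in (GamerskyPlugin,)}

def run(plugin_name, *args):
    """Dispatch the command-line arguments to the chosen plugin."""
    return PLUGINS[plugin_name]().crawl(*args)
```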

Installation

install via pip

Make sure you have already installed Python 2.7 and pip.

Because this package depends on lxml, on Linux please make sure you have installed libxslt-devel and libxml2-devel. On Windows, please select a suitable lxml installer and install it.

And then:

$ pip install crawl-me

On Windows, please add {$python-home}/Scripts/ to your system PATH.

install via git

1. Ubuntu

Install the prerequisite library first:

sudo apt-get install libxml2-dev
sudo apt-get install libxslt1-dev

Then install setuptools so that you can run the setup.py file:

sudo apt-get install python-setuptools

Finally, git clone the source, and install:

$ git clone https://github.com/nyankosama/crawl-me.git
$ cd crawl-me/
$ sudo python setup.py install

2. Windows

Make sure you have already installed Python 2.7 and pip.

You can install Python 2.7 via the Windows installer, and pip by downloading get-pip.py and running it with Python:

python get-pip.py

Then install the prerequisite library lxml: please select a suitable lxml installer and install it.

Finally, git clone the source and install:

$ git clone https://github.com/nyankosama/crawl-me.git
$ cd crawl-me/
$ python setup.py install

Please add {$python-home}/Scripts/ to your system PATH.

Usage

Examples

  1. Download the pictures from pages 1 to 10 of http://www.gamersky.com/ent/201404/352055.shtml on the gamersky site, and store them in a local directory.

    crawl-me gamersky http://www.gamersky.com/ent/201404/352055.shtml ./gamersky-crawl 1 10
    
  2. Download all the paintings of 藤原 (Fujiwara, pixiv ID = 27517), and store them in a local directory.

    crawl-me pixiv 27517 ./pixiv-crawl <your pixiv loginid> <your password>
    

Command line options

  1. general help

    $ crawl-me -h
    
    usage: crawl-me [-h] plugin
    
    positional arguments:
        plugin      plugin the crawler uses
    
    optional arguments:
        -h, --help  show this help message and exit
    
    available plugins:
    ----gamersky
    ----pixiv
    
  2. gamersky

    $ crawl-me gamersky -h
    
    usage: crawl-me [-h] plugin url savePath beginPage endPage
    
    positional arguments:
        plugin      plugin the crawler uses
        url         your url to crawl
        savePath    the path where the imgs are saved
        beginPage   the page where we start crawling
        endPage     the page where we end crawling
    
    optional arguments:
        -h, --help  show this help message and exit
    
  3. pixiv

    $ crawl-me pixiv -h
    
    usage: crawl-me [-h] plugin authorId savePath pixivId password
    
    positional arguments:
        plugin      plugin the crawler uses
        authorId    the author id you want to crawl
        savePath    the path where the imgs are saved
        pixivId     your pixiv login id
        password    your pixiv login password
    
    optional arguments:
        -h, --help  show this help message and exit
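The two-level help shown above (a generic parser, then per-plugin positional arguments) can be reproduced with Python's argparse. This is a hypothetical sketch, not crawl-me's actual source:

```python
import argparse

def build_parser(plugin=None):
    """Build the generic parser, then add the chosen plugin's arguments."""
    parser = argparse.ArgumentParser(prog="crawl-me")
    parser.add_argument("plugin", help="plugin the crawler uses")
    if plugin == "gamersky":
        parser.add_argument("url", help="your url to crawl")
        parser.add_argument("savePath", help="the path where the imgs are saved")
        parser.add_argument("beginPage", type=int, help="the page where we start crawling")
        parser.add_argument("endPage", type=int, help="the page where we end crawling")
    elif plugin == "pixiv":
        parser.add_argument("authorId", help="the author id you want to crawl")
        parser.add_argument("savePath", help="the path where the imgs are saved")
        parser.add_argument("pixivId", help="your pixiv login id")
        parser.add_argument("password", help="your pixiv login password")
    return parser
```

Peeking at the first positional argument to pick the plugin, then re-parsing with the plugin-specific parser, yields exactly the kind of per-plugin `-h` output listed above.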
    

TODO

  • Functions:
    • support breakpoint resume
  • Plugins:
    • weibo
    • qq zone
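The planned breakpoint resume fits naturally with the Range headers the core already uses: resume from however many bytes are already on disk. A hypothetical sketch (these helpers are not part of crawl-me):

```python
import os

def resume_offset(path):
    """Bytes already downloaded: the partial file's size, or 0 if absent."""
    return os.path.getsize(path) if os.path.exists(path) else 0

def resume_header(path):
    """HTTP header asking the server for everything after the saved bytes."""
    offset = resume_offset(path)
    return {"Range": "bytes=%d-" % offset} if offset else {}
```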

License

MIT

ChangeLog

0.1.9dev-20140617-1

Date: 2014-06-17

  • add projconf.py to the crawl_me package
  • bug fix: pixiv plugin gets page size <= 9

0.1.8

Date: 2014-06-15

  • add English README

0.1.8dev-20140615

Date: 2014-06-15

  • bug fix: the -v/--version option failed to load project.json

0.1.8dev-20140612

Date: 2014-06-12

  • add a -v/--version option to the main executable to show the package version

0.1.7

Date: 2014-06-11

  • add auto-detection of HTTP Range header support

0.1.6

Date: 2014-06-11

  • bug fix: terminals without colour support don’t display the syslog prefix

0.1.5

Date: 2014-06-11

  • bug fix: pip install bug on the Windows platform

0.1.5dev-20140611

Date: 2014-06-11

  • bug fix: PyPI data_files

0.1.4

Date: 2014-06-11

  • the latest release

0.1.4dev-20140611

Date: 2014-06-11

  • modify README.md; it is now in rst format so that it displays correctly on PyPI

0.1.4dev-20140610

Date: 2014-06-10

  • add support for installing from pip

0.1.4dev3

Date: 2014-06-10

  • bug fix: the binary write problem on the Windows platform

0.1.4dev2

Date: 2014-06-10

  • add setuptools install support

0.1.4dev1

Date: 2014-06-09

  • bug fix: rangedownloader: HTTP Range headers may not be supported by the server

0.1.3

Date: 2014-06-07

  • some refactoring
  • add conf dictionary

0.1.2

Date: 2014-06-06

  • add plugin
  • pixiv

0.1.1

Date: 2014-06-05

  • add plugin
  • gamersky

0.0.1

Date: 2014-06-05

  • init the project
