This is a plugin-based web picture crawler. The pixiv and gamersky plugins are now available!
crawl-me
Crawl-me is a lightweight, fast, plugin-based web picture crawler. You can download your favorite pictures through a plugin if the website is supported. For now, the plugins include gamersky and pixiv. If you want to contribute, please feel free to contact me.
Fork me on Github :) https://github.com/nyankosama/crawl-me
Features
The crawl-me core supports multi-threaded downloading using HTTP Range headers, so downloads are fast.
It's plugin-based, so you are free to add any plugin you want.
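To illustrate the idea behind multi-threaded Range downloading, here is a minimal sketch of how a file of known size could be split into byte ranges, one per worker thread. The function names `split_ranges` and `range_header` are hypothetical and are not taken from the crawl-me source; the actual implementation may differ.

```python
def split_ranges(total_size, parts):
    """Split [0, total_size) into at most `parts` contiguous byte ranges.

    Returns a list of (start, end) tuples with inclusive ends, matching
    the semantics of the HTTP Range header.
    """
    if total_size <= 0 or parts <= 0:
        return []
    # Never create more ranges than there are bytes.
    parts = min(parts, total_size)
    base, extra = divmod(total_size, parts)
    ranges = []
    start = 0
    for i in range(parts):
        # The first `extra` ranges take one additional byte each.
        length = base + (1 if i < extra else 0)
        end = start + length - 1
        ranges.append((start, end))
        start = end + 1
    return ranges


def range_header(start, end):
    """Build the request header asking for bytes start..end (inclusive)."""
    return {"Range": "bytes=%d-%d" % (start, end)}
```

Each worker would send its own header with the request and write the response body at offset `start` of the output file. This only works if the server honors Range requests, so a downloader needs a single-request fallback for servers that do not.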
Available plugins
Installation
install via pip
Make sure you have already installed Python 2.7 and pip.
Because the package relies on lxml, on Linux please make sure the libxslt-devel and libxml2-devel libraries are installed. On Windows, please select a suitable lxml installer and install it.
And then:
$ pip install crawl-me
On Windows, please add {$python-home}/Scripts/ to the system PATH.
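If the installation fails or crawl-me cannot start, it may help to confirm that lxml is importable from the Python interpreter you are using. The helper below is a small sketch for that check; the name `is_importable` is hypothetical and not part of crawl-me.

```python
import importlib


def is_importable(module_name):
    """Return True if `module_name` can be imported by this interpreter."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if __name__ == "__main__":
    # lxml is the prerequisite named in the installation notes above.
    print("lxml available: %s" % is_importable("lxml"))
```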
install via git
1. Ubuntu
Install the prerequisite libraries first:
sudo apt-get install libxml2-dev
sudo apt-get install libxslt1-dev
Then install setuptools in order to run the setup.py file:
sudo apt-get install python-setuptools
Finally, git clone the source, and install:
$ git clone https://github.com/nyankosama/crawl-me.git
$ cd crawl-me/
$ sudo python setup.py install
2. Windows
Make sure you have already installed Python 2.7 and pip.
You can install Python 2.7 with the Windows installer. You can install pip by downloading get-pip.py and running it with python:
python get-pip.py
Then install the prerequisite library lxml. Please select a suitable lxml installer and install it.
Finally, git clone the source, and install:
$ git clone https://github.com/nyankosama/crawl-me.git
$ cd crawl-me/
$ python setup.py install
On Windows, please add {$python-home}/Scripts/ to the system PATH.
Usage
Examples
Download 10 pages of pictures from http://www.gamersky.com/ent/201404/352055.shtml on the gamersky site, and store the pictures in a local directory.
crawl-me gamersky http://www.gamersky.com/ent/201404/352055.shtml ./gamersky-crawl 1 10
Download all the paintings of 藤原 (Fujiwara, Pixiv ID=27517), and store them in a local directory.
crawl-me pixiv 27517 ./pixiv-crawl <your pixiv loginid> <your password>
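The same gamersky example can also be driven from a Python script. This is a hedged sketch that assumes `crawl-me` is installed and on the PATH; the helper names `gamersky_command` and `run` are hypothetical. The argument order follows the documented usage `plugin url savePath beginPage endPage`.

```python
import subprocess


def gamersky_command(url, save_path, begin_page, end_page):
    """Build the argument list for the documented gamersky invocation."""
    return ["crawl-me", "gamersky", url, save_path,
            str(begin_page), str(end_page)]


def run(command):
    """Run the command and return its exit code."""
    return subprocess.call(command)


if __name__ == "__main__":
    cmd = gamersky_command(
        "http://www.gamersky.com/ent/201404/352055.shtml",
        "./gamersky-crawl", 1, 10)
    run(cmd)
```

Passing the arguments as a list avoids shell quoting problems with URLs and paths.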
Command line options
general help
$ crawl-me -h
usage: crawl-me [-h] plugin

positional arguments:
  plugin      plugin the crawler uses

optional arguments:
  -h, --help  show this help message and exit

available plugins:
----gamersky
----pixiv
gamersky
$ crawl-me gamersky -h
usage: crawl-me [-h] plugin url savePath beginPage endPage

positional arguments:
  plugin      plugin the crawler uses
  url         your url to crawl
  savePath    the path where the imgs ars saved
  beginPage   the page where we start crawling
  endPage     the page where we end crawling

optional arguments:
  -h, --help  show this help message and exit
pixiv
$ crawl-me pixiv -h
usage: crawl-me [-h] plugin authorId savePath pixivId password

positional arguments:
  plugin      plugin the crawler uses
  authorId    the author id you want to crawl
  savePath    the path where the imgs ars saved
  pixivId     your pixiv login id
  password    your pixiv login password

optional arguments:
  -h, --help  show this help message and exit
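The help output above suggests that the positional arguments depend on the chosen plugin. One way such a command line could be built with argparse is sketched below. This is an illustration only, not the crawl-me source: the `PLUGIN_ARGS` table and the `build_parser` and `parse` helpers are hypothetical, and all values are left as strings.

```python
import argparse

# Hypothetical table: plugin name -> list of (argument name, help text).
PLUGIN_ARGS = {
    "gamersky": [
        ("url", "your url to crawl"),
        ("savePath", "the path where the images are saved"),
        ("beginPage", "the page where we start crawling"),
        ("endPage", "the page where we end crawling"),
    ],
    "pixiv": [
        ("authorId", "the author id you want to crawl"),
        ("savePath", "the path where the images are saved"),
        ("pixivId", "your pixiv login id"),
        ("password", "your pixiv login password"),
    ],
}


def build_parser(plugin=None):
    """Build a parser whose positionals depend on the selected plugin."""
    parser = argparse.ArgumentParser(prog="crawl-me")
    parser.add_argument("plugin", help="plugin the crawler uses")
    for name, help_text in PLUGIN_ARGS.get(plugin, []):
        parser.add_argument(name, help=help_text)
    return parser


def parse(argv):
    """Peek at the first non-option token to pick the plugin, then parse."""
    plugin = next((a for a in argv if not a.startswith("-")), None)
    return build_parser(plugin).parse_args(argv)
```

For example, `parse(["gamersky", "http://example.com/a.shtml", "./out", "1", "10"])` yields a namespace with `url`, `savePath`, `beginPage`, and `endPage` attributes.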
TODO
Functions:
support breakpoint resume
Plugins:
weibo
qq zone
Licenses
ChangeLog
0.1.9dev-20140617-1
Date: 2014-06-17
add the projconf.py into crawl_me package
bug fix: pixiv plugin gets page size <= 9
0.1.8
Date: 2014-06-15
add English README
0.1.8dev-20140615
Date: 2014-06-15
bug fix: the -v/--version option fails to load project.json
0.1.8dev-20140612
Date: 2014-06-12
add the -v/--version option to the main runnable file to show the package version
0.1.7
Date: 2014-06-11
add an automatic check for HTTP Range header support
0.1.6
Date: 2014-06-11
bug fix: terminals without colour support don't display the syslog prefix
0.1.5
Date: 2014-06-11
bug fix: pip install bug on the Windows platform
0.1.5dev-20140611
Date: 2014-06-11
bug fix: PyPI data_files
0.1.4
Date: 2014-06-11
the latest release
0.1.4dev-20140611
Date: 2014-06-11
modify README.md so that it is consistent with the rst format for display on PyPI
0.1.4dev-20140610
Date: 2014-06-10
add support for installing from pip
0.1.4dev3
Date: 2014-06-10
bug fix
fix the binary write problem on the Windows platform
0.1.4dev2
Date: 2014-06-10
add setuptools install support
0.1.4dev1
Date: 2014-06-09
bug fix
rangedownloader: HTTP Range headers may not be supported
0.1.3
Date: 2014-06-07
do some refactoring
add conf dictionary
0.1.2
Date: 2014-06-06
add plugin
pixiv
0.1.1
Date: 2014-06-05
add plugin
gamersky
0.0.1
Date: 2014-06-05
init the project