# Creepy
Dead simple web crawler for Python
There are already a lot of web crawlers for Python, such as Scrapy. Creepy is
yet another one, which aims to provide a simple and lightweight way to write
web crawlers.
## Example usage
```python
from creepy import Crawler


class MyCrawler(Crawler):
    def process_document(self, doc):
        if doc.status == 200:
            # print() with a single argument works on both Python 2 and 3
            print('[%d] %s' % (doc.status, doc.url))
            # Do something with doc.text (the content of the page)
        else:
            pass


crawler = MyCrawler()
crawler.set_follow_mode(Crawler.F_SAME_HOST)
# Raw string so the backslash reaches the regex engine unchanged
crawler.add_url_filter(r'\.(jpg|jpeg|gif|png|js|css|swf)$')
crawler.crawl('http://www.example.com/')
```
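To make the two settings in the example more concrete, here is a minimal standalone sketch that uses only the standard library, not Creepy itself. It assumes that `F_SAME_HOST` means "follow only links on the same host as the start URL" and that URLs matching a filter pattern are skipped; both are guesses based on the names, and Creepy's actual matching rules may differ. The helpers `same_host` and `is_filtered` are hypothetical names introduced for illustration.

```python
import re

try:
    from urllib.parse import urlparse  # Python 3
except ImportError:
    from urlparse import urlparse  # Python 2

# Same pattern as in the example above, written as a raw string.
SKIP_PATTERN = re.compile(r'\.(jpg|jpeg|gif|png|js|css|swf)$')


def same_host(start_url, link):
    """Hypothetical helper: True if link is on the same host as start_url."""
    return urlparse(start_url).netloc == urlparse(link).netloc


def is_filtered(url):
    """Hypothetical helper: True if the URL ends with a filtered extension."""
    return SKIP_PATTERN.search(url) is not None


start = 'http://www.example.com/'
links = [
    'http://www.example.com/about.html',
    'http://www.example.com/logo.png',
    'http://other.example.org/index.html',
]
for link in links:
    # Assumed rule: follow a link only if it is on the same host
    # and does not match the filter pattern.
    follow = same_host(start, link) and not is_filtered(link)
    print('%-40s follow=%s' % (link, follow))
```

Because the pattern is anchored with `$` and is case-sensitive, a URL such as `logo.png?v=2` or `LOGO.PNG` would not match it in this sketch.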
## Installation
1. Install from PyPI:
   `pip install creepy`
2. Arch Linux users can install it from the AUR, for example with [Yaourt](https://wiki.archlinux.org/index.php/Yaourt):
   `yaourt -S python2-creepy-git`
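After either installation route, a quick way to check that the package is visible to your interpreter is to try importing it. The sketch below uses only the standard library; `is_importable` is a hypothetical helper name, and the check only covers `ImportError`, so it says nothing about whether the installed version is compatible with your Python version.

```python
import importlib


def is_importable(module_name):
    """Hypothetical helper: True if the module can be imported."""
    try:
        importlib.import_module(module_name)
    except ImportError:
        return False
    return True


if is_importable('creepy'):
    print('creepy is installed')
else:
    print('creepy is not installed for this interpreter')
```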
## Bugs
* Please report bugs to the GitHub issue tracker.
## Contributing
1. Fork it
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new pull request
## Download files

Source distribution: `creepy-0.1.5.tar.gz` (22.2 kB, not uploaded using Trusted Publishing)

| Algorithm | Hash digest |
|---|---|
| SHA256 | 7225fc30271370b603973167c6a822add3f7c1fbab2930c17056905529ca5c13 |
| MD5 | 74bed9a977f85936dc55d5835ddbc050 |
| BLAKE2b-256 | fcc0f000fd575d6ccd642b18e2d1fbf9302d196544e3a96ebee4544f3f63ac4b |