Django application for collecting online content following user-defined instructions
Project description
django-scraper is a Django application that collects online content by following user-defined instructions.
Features
Extracts content from given websites or pages and stores it as JSON data
Crawls and extracts content across multiple pages, up to a given depth
Can download media files present in a page
Optionally stores data in a ZIP file
Supports the standard file system and AWS S3 storage
Customisable crawling requests for different scenarios
Can be started from a Django management command (suitable for cron jobs) or from Python code
Supports extracting multiple kinds of content (text, HTML, images, binary files) from the same page
Provides content refinement (replacement) rules and black-word filtering
Supports custom proxy servers and user agents
Supports Django 1.6, 1.7, and 1.8
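To make the crawl depth, black-word filtering, and ZIP storage features more concrete, here is a minimal sketch of the general idea. It is not django-scraper's API: every name in it (`crawl`, `to_zip_bytes`, `LinkAndTitleParser`, the `fetch` callable) is hypothetical, and it uses only the Python standard library rather than lxml and requests so that it runs on its own.

```python
import io
import json
import zipfile
from html.parser import HTMLParser
from urllib.parse import urljoin


class LinkAndTitleParser(HTMLParser):
    """Collects the <title> text and the href of every <a> tag."""

    def __init__(self):
        super().__init__()
        self.title = ""
        self.links = []
        self._in_title = False

    def handle_starttag(self, tag, attrs):
        if tag == "title":
            self._in_title = True
        elif tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(href)

    def handle_endtag(self, tag):
        if tag == "title":
            self._in_title = False

    def handle_data(self, data):
        if self._in_title:
            self.title += data


def crawl(start_url, fetch, depth=1, black_words=()):
    """Breadth-first crawl up to `depth` link hops from `start_url`.

    `fetch` is a hypothetical callable mapping a URL to an HTML string.
    Pages whose title contains a black word are left out of the results,
    but their links are still followed.
    """
    seen = {start_url}
    frontier = [start_url]
    results = []
    for level in range(depth + 1):
        next_frontier = []
        for url in frontier:
            parser = LinkAndTitleParser()
            parser.feed(fetch(url))
            parser.close()
            title = parser.title.strip()
            if not any(word.lower() in title.lower() for word in black_words):
                results.append({"url": url, "title": title, "depth": level})
            for href in parser.links:
                absolute = urljoin(url, href)
                if absolute not in seen:
                    seen.add(absolute)
                    next_frontier.append(absolute)
        frontier = next_frontier
    return results


def to_zip_bytes(results, name="data.json"):
    """Serialise the results as JSON and wrap them in an in-memory ZIP file."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr(name, json.dumps(results, indent=2))
    return buffer.getvalue()


if __name__ == "__main__":
    # In-memory pages stand in for real HTTP responses.
    pages = {
        "http://example.test/": '<title>Home</title><a href="/a">A</a>',
        "http://example.test/a": "<title>Page A</title>",
    }
    found = crawl("http://example.test/", lambda u: pages.get(u, ""), depth=1)
    print(json.dumps(found, indent=2))
```

Passing `fetch` as a parameter keeps the sketch independent of any HTTP client; in a real setup it could wrap an HTTP request with a custom proxy or user agent.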
Samples
Below is a sample result from scraping https://news.ycombinator.com/ask
JSON result via a renderer:
Installation
This application requires the following packages to be installed first:
lxml
requests
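A quick way to confirm that these prerequisites are present before installing is sketched below. The helper name `missing_packages` is hypothetical and not part of django-scraper; it relies only on the standard library's `importlib.util.find_spec`.

```python
import importlib.util


def missing_packages(names):
    """Return the top-level package names that cannot be found for import."""
    return [name for name in names if importlib.util.find_spec(name) is None]


if __name__ == "__main__":
    missing = missing_packages(["lxml", "requests"])
    if missing:
        print("Missing prerequisites: " + ", ".join(missing))
    else:
        print("All prerequisites are installed.")
```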
django-scraper itself can be installed using pip:
pip install django-scraper
For more details and the latest information about configuration and usage, please visit the GitHub repository: https://github.com/zniper/django-scraper
Support
If you have any questions about this application, please email me@zniper.net
Source Distribution
File details
Details for the file django-scraper-0.3.8.tar.gz.
File metadata
- Download URL: django-scraper-0.3.8.tar.gz
- Upload date:
- Size: 62.8 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
File hashes
Algorithm | Hash digest
---|---
SHA256 | dcac254b2493e73f491875dcec1444f6048559d5e0b4d3132626eee3ca10e208
MD5 | cf7a41e58d474a93b86acf35d08726bd
BLAKE2b-256 | e05669acca1b5acefee25ef7d96237bb5ccc2dc43fec845577abf3bc131c03f5