A Django application that crawls and downloads online content according to given instructions.
Project description
Features
Extracts content from given websites/pages using XPath queries.
Can be started from the command line (for example, as a cron job) or from inside Django code.
Automatically browses and downloads content from related pages, up to a given depth.
Supports metadata extraction along with other content.
Provides content refinement rules and black-word filtering.
Stores downloaded content and prevents duplicates.
Allows changing the User-Agent.
Supports proxy servers.
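To make a few of these features concrete, here is a minimal sketch of XPath extraction, black-word filtering, and duplicate prevention. It is not django-scraper's actual API. The sample page and the names `extract_posts`, `is_clean`, `deduplicate`, and `BLACK_WORDS` are all hypothetical. It uses only the Python standard library, whose `xml.etree.ElementTree` module supports a limited subset of XPath and requires well-formed markup, so real-world HTML would likely need a more tolerant parser.

```python
import hashlib
import xml.etree.ElementTree as ET

# Hypothetical sample page; a real crawler would download this over HTTP.
PAGE = """<html><body>
<div class="post"><h1>First post</h1><p>Hello world</p></div>
<div class="post"><h1>Buy now</h1><p>Spam offer</p></div>
<div class="post"><h1>First post</h1><p>Hello world</p></div>
</body></html>"""

# Hypothetical black-word list (assumption, not taken from the project).
BLACK_WORDS = {"spam"}


def extract_posts(markup, xpath=".//div[@class='post']"):
    """Return (title, body) pairs for every element matching the XPath query."""
    root = ET.fromstring(markup)
    return [
        (node.findtext("h1", ""), node.findtext("p", ""))
        for node in root.findall(xpath)
    ]


def is_clean(text, black_words=BLACK_WORDS):
    """Return True if the text contains none of the black words."""
    lowered = text.lower()
    return not any(word in lowered for word in black_words)


def deduplicate(items):
    """Keep the first occurrence of each item, compared by content hash."""
    seen = set()
    unique = []
    for item in items:
        digest = hashlib.sha256("\n".join(item).encode("utf-8")).hexdigest()
        if digest not in seen:
            seen.add(digest)
            unique.append(item)
    return unique


posts = extract_posts(PAGE)
kept = deduplicate([post for post in posts if is_clean(" ".join(post))])
print(kept)
```

In this sketch, the second post is dropped by the black-word filter and the third is dropped as a duplicate of the first. Hashing the content rather than comparing URLs is just one possible design choice: it also catches identical content served from different addresses.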
Documentation
The full documentation is not ready yet. For notes on installation and usage, see https://github.com/zniper/django-scraper
Support
If you have any questions about this application, please send an email to me[at]zniper.net