Django application which crawls and downloads online content following instructions
django-scraper is a Django application which crawls and downloads online content following configurable instructions.
- Extract content from given websites/pages using XPath queries.
- Automatically follow links and download content from related pages, up to a given depth.
- Extract metadata along with other content.
- Apply content refinement rules and black-word filtering.
- Store downloaded content and prevent duplicates.
- Support HTTP and HTTPS proxies.
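To show how the crawling features listed above can fit together, here is a minimal sketch in plain Python. It does not use django-scraper's own API. The in-memory `PAGES` site, the `crawl()` function and its parameters, and the `BLACK_WORDS` set are all hypothetical. The sketch assumes well-formed XHTML so that the standard-library `xml.etree.ElementTree` can be used, and ElementTree supports only a limited subset of XPath. A real crawler would fetch pages over HTTP(S), optionally through a proxy, and would use a full HTML/XPath parser such as lxml.

```python
import hashlib
import xml.etree.ElementTree as ET
from collections import deque

# Hypothetical in-memory "website": URL -> well-formed XHTML page.
# A real crawler would download these pages over HTTP(S).
PAGES = {
    "/index": "<html><body><h1>Home</h1><a href='/a'>A</a><a href='/b'>B</a></body></html>",
    "/a": "<html><body><h1>Page A</h1><a href='/c'>C</a><a href='/d'>D</a></body></html>",
    "/b": "<html><body><h1>Home</h1></body></html>",  # duplicates the /index title
    "/c": "<html><body><h1>Page C</h1></body></html>",
    "/d": "<html><body><h1>Cheap spam here</h1></body></html>",
}

BLACK_WORDS = {"spam"}  # hypothetical black-word list


def crawl(start, max_depth, content_xpath=".//h1", link_xpath=".//a"):
    """Breadth-first crawl from `start`, following links up to `max_depth`.

    Returns a list of (url, text) pairs for extracted content that passed
    the black-word filter and was not a duplicate of earlier content.
    """
    seen_urls = {start}
    seen_hashes = set()
    results = []
    queue = deque([(start, 0)])
    while queue:
        url, depth = queue.popleft()
        page = PAGES.get(url)
        if page is None:
            continue  # unknown URL (a real crawler would log a fetch error)
        root = ET.fromstring(page)
        # Content extraction with a (limited) XPath query.
        for el in root.findall(content_xpath):
            text = (el.text or "").strip()
            # Black-word filtering: drop items containing any listed word.
            if any(word in text.lower().split() for word in BLACK_WORDS):
                continue
            # Duplicate prevention: skip content whose hash is already stored.
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()
            if digest in seen_hashes:
                continue
            seen_hashes.add(digest)
            results.append((url, text))
        # Follow links only while below the depth limit.
        if depth < max_depth:
            for link in root.findall(link_xpath):
                href = link.get("href")
                if href and href not in seen_urls:
                    seen_urls.add(href)
                    queue.append((href, depth + 1))
    return results


print(crawl("/index", 2))
```

With `max_depth=2` the sketch returns `[('/index', 'Home'), ('/a', 'Page A'), ('/c', 'Page C')]`: `/b` is skipped because its content duplicates the home page, and `/d` is dropped by the black-word filter. Breadth-first order makes the depth limit straightforward to enforce, since every queued URL carries its own depth.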
The full documentation is not ready yet; for notes on installation and usage, please see: https://github.com/zniper/django-scraper
If you have any questions or ideas about this application, please email me[at]zniper.net