Project description
Lazy Crawler is a Python package that simplifies web scraping tasks. Built upon the powerful Scrapy framework, it provides additional utilities and features for easier data extraction. With Lazy Crawler, you can quickly set up and deploy web scraping projects, saving time and effort.
Features
- Simplified Setup: Streamlines the process of setting up and configuring web scraping projects.
- Predefined Library: Comes with a library of functions and utilities for common web scraping tasks, reducing the need for manual coding.
- Easy Data Extraction: Simplifies extracting and processing data from websites, allowing you to focus on analysis and insights.
- Versatile Utilities: Includes tools for finding emails, numbers, mentions, hashtags, links, and more (a short illustration follows this list).
- Flexible Data Storage: Provides a pipeline for storing data in various formats such as CSV, JSON, Google Sheets, and Excel.
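The exact names of these helper functions aren't documented on this page, so the sketch below uses plain Python regular expressions to show the kind of extraction they are meant to cover; treat it as an illustration rather than Lazy Crawler's actual API.

import re

# Text as it might come back from response.text inside a spider.
text = "Contact us at info@example.com, follow #scrapy, or visit https://example.com/blog"

# Plain-regex stand-ins: Lazy Crawler bundles helpers for these tasks,
# but their import paths aren't listed here, so regexes illustrate the idea.
emails = re.findall(r"[\w.+-]+@[\w-]+\.[\w.-]+", text)
hashtags = re.findall(r"#\w+", text)
links = re.findall(r"https?://\S+", text)

print(emails)    # ['info@example.com']
print(hashtags)  # ['#scrapy']
print(links)     # ['https://example.com/blog']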
Getting Started
To get started with Lazy Crawler:
- Install: Ensure Python and Scrapy are installed. Then, install Lazy Crawler via pip:
pip install lazy-crawler
- Create a Project: Create a Python file for your project (e.g., scrapy_example.py) and start coding.
Example Usage
Here's an example of how to use Lazy Crawler in a project:
import os

import scrapy
from scrapy.crawler import CrawlerProcess
from scrapy.utils.project import get_project_settings

from lazy_crawler.crawler.spiders.base_crawler import LazyBaseCrawler
from lazy_crawler.lib.user_agent import get_user_agent


class LazyCrawler(LazyBaseCrawler):
    name = "example"

    # Per-spider overrides of the global Scrapy settings.
    custom_settings = {
        'DOWNLOAD_DELAY': 0.5,
        'CONCURRENT_REQUESTS': 32,
    }

    # Pick a random User-Agent string for outgoing requests.
    headers = get_user_agent('random')

    def start_requests(self):
        url = 'https://example.com'
        yield scrapy.Request(url, self.parse)

    def parse(self, response):
        title = response.xpath('//title/text()').get()
        yield {'Title': title}


# Point Scrapy at Lazy Crawler's bundled settings module.
settings_file_path = 'lazy_crawler.crawler.settings'
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', settings_file_path)

# get_project_settings() reads SCRAPY_SETTINGS_MODULE, so the module set
# above is actually loaded into the crawl.
process = CrawlerProcess(get_project_settings())
process.crawl(LazyCrawler)
process.start()
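Running the file directly (for example, python scrapy_example.py) starts the crawl and logs the scraped items. Lazy Crawler's own pipeline handles CSV, JSON, Google Sheets, and Excel storage, but its class paths aren't listed on this page, so the sketch below falls back on Scrapy's built-in FEEDS export, which any spider (including one derived from LazyBaseCrawler) can use to write items to CSV and JSON.

import scrapy
from scrapy.crawler import CrawlerProcess

# Minimal spider used only to demonstrate feed exports.
class TitleSpider(scrapy.Spider):
    name = "titles"
    start_urls = ['https://example.com']

    def parse(self, response):
        yield {'Title': response.xpath('//title/text()').get()}

process = CrawlerProcess(settings={
    # FEEDS is a standard Scrapy setting (Scrapy 2.1+): each key is an output
    # path, each value selects the serialization format.
    'FEEDS': {
        'titles.csv': {'format': 'csv'},
        'titles.json': {'format': 'json'},
    },
})
process.crawl(TitleSpider)
process.start()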
Further Resources
For more information and examples of how to use Lazy Crawler, see the project documentation.
Credits
Lazy Crawler was created by Pradip P.
License
Lazy Crawler is released under the MIT License.
Project details
Download files
Download the file for your platform.
- Source Distribution: lazy_crawler-0.15.tar.gz (21.9 kB)
- Built Distribution: lazy_crawler-0.15-py3-none-any.whl (25.3 kB)
File details
Details for the file lazy_crawler-0.15.tar.gz.
File metadata
- Download URL: lazy_crawler-0.15.tar.gz
- Upload date:
- Size: 21.9 kB
- Tags: Source
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | 89066be7b3f519f368375e75645eeded281fb9d3f622f7729940f061dda1b732
MD5 | a363c553d60fcb310c201f2e27bffcf5
BLAKE2b-256 | de16d55222ee4930676ba22c9a4d4cb1be6d5d60f2c085fd28414e33643e9fb5
File details
Details for the file lazy_crawler-0.15-py3-none-any.whl.
File metadata
- Download URL: lazy_crawler-0.15-py3-none-any.whl
- Upload date:
- Size: 25.3 kB
- Tags: Python 3
- Uploaded using Trusted Publishing? No
- Uploaded via: twine/5.1.1 CPython/3.10.12
File hashes
Algorithm | Hash digest
---|---
SHA256 | f3e83f6c2393cda06c69c8afa75bddca085ff1d7beaf3f2f67338b9b15c98284
MD5 | 07c61f24f2cf817ab9946db5ee5539d7
BLAKE2b-256 | 72ef9673416585b7c69d761278a9fb38f38e693603ce09aee8617ac9c0a0eddd