
Project description

Lazy Py Crawler

Simplify your web scraping tasks.

Scrape smarter, not harder.


Lazy Crawler is a Python package that simplifies web scraping tasks. Built upon the powerful Scrapy framework, it provides additional utilities and features for easier data extraction. With Lazy Crawler, you can quickly set up and deploy web scraping projects, saving time and effort.

Features

  • Simplified Setup: Streamlines the process of setting up and configuring web scraping projects.
  • Predefined Library: Comes with a library of functions and utilities for common web scraping tasks, reducing the need for manual coding.
  • Easy Data Extraction: Simplifies extracting and processing data from websites, allowing you to focus on analysis and insights.
  • Versatile Utilities: Includes tools for finding emails, numbers, mentions, hashtags, links, and more (see the extraction sketch after this list).
  • Flexible Data Storage: Provides a pipeline for storing data in various formats such as CSV, JSON, Google Sheets, and Excel (a minimal export sketch follows the example code under Example Usage).
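
Lazy Crawler's own helper names for these utilities are not shown on this page, so the sketch below is a rough stand-in rather than the package's API: it uses plain regular expressions and Scrapy's built-in LinkExtractor to illustrate the kind of extraction the utility library performs. The names find_emails and find_hashtags are illustrative only.

import re
from scrapy.linkextractors import LinkExtractor

# Deliberately simple patterns; real-world email/hashtag matching is messier.
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+\.[\w.-]+")
HASHTAG_RE = re.compile(r"#\w+")

def find_emails(text):
    # Return every email-like substring in the given text.
    return EMAIL_RE.findall(text)

def find_hashtags(text):
    # Return every #hashtag-like substring in the given text.
    return HASHTAG_RE.findall(text)

# Inside a spider's parse() method, links can be collected with Scrapy's
# built-in extractor instead of a hand-rolled pattern:
#     links = LinkExtractor().extract_links(response)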

Getting Started

To get started with Lazy Crawler:

  1. Install: Ensure Python and Scrapy are installed. Then, install Lazy Crawler via pip:
    pip install lazy-crawler
    
  2. Create a Project: Create a Python file for your project (e.g., scrapy_example.py) and start coding.

Example Usage

Here's an example of how to use Lazy Crawler in a project:

import os
import scrapy
from scrapy.crawler import CrawlerProcess
from lazy_crawler.crawler.spiders.base_crawler import LazyBaseCrawler
from lazy_crawler.lib.user_agent import get_user_agent

class LazyCrawler(LazyBaseCrawler):
    name = "example"

    # Be polite: pause 0.5 s between requests, with up to 32 requests in flight.
    custom_settings = {
        'DOWNLOAD_DELAY': 0.5,
        'CONCURRENT_REQUESTS': 32,
    }

    # A randomly chosen User-Agent from Lazy Crawler's user-agent library.
    headers = get_user_agent('random')

    def start_requests(self):
        url = 'https://example.com'
        yield scrapy.Request(url, self.parse)

    def parse(self, response):
        # Extract the page <title> and yield it as a scraped item.
        title = response.xpath('//title/text()').get()
        yield {'Title': title}

# Point Scrapy at Lazy Crawler's bundled settings module, then run the spider.
settings_file_path = 'lazy_crawler.crawler.settings'
os.environ.setdefault('SCRAPY_SETTINGS_MODULE', settings_file_path)

process = CrawlerProcess()
process.crawl(LazyCrawler)
process.start()
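
Lazy Crawler's own storage pipeline (CSV, JSON, Google Sheets, Excel) is not configured in this example, and its settings are not shown on this page. As a minimal stand-in, the sketch below uses Scrapy's standard FEEDS setting (available in Scrapy 2.1+) to write the scraped titles to CSV and JSON files; the output file names are arbitrary.

# Same spider as above, but with Scrapy's built-in feed exports enabled.
process = CrawlerProcess(settings={
    'FEEDS': {
        'titles.csv': {'format': 'csv'},    # one row per yielded item
        'titles.json': {'format': 'json'},  # a JSON array of items
    },
})
process.crawl(LazyCrawler)
process.start()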

Further Resources

For more information and examples of how to use Lazy Crawler, see the project documentation.

Credits

Lazy Crawler was created by Pradip P.

License

Lazy Crawler is released under the MIT License.

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

lazy_crawler-0.15.tar.gz (21.9 kB)

Built Distribution

lazy_crawler-0.15-py3-none-any.whl (25.3 kB)

File details

Details for the file lazy_crawler-0.15.tar.gz.

File metadata

  • Download URL: lazy_crawler-0.15.tar.gz
  • Upload date:
  • Size: 21.9 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for lazy_crawler-0.15.tar.gz:

  • SHA256: 89066be7b3f519f368375e75645eeded281fb9d3f622f7729940f061dda1b732
  • MD5: a363c553d60fcb310c201f2e27bffcf5
  • BLAKE2b-256: de16d55222ee4930676ba22c9a4d4cb1be6d5d60f2c085fd28414e33643e9fb5

See the PyPI documentation for more details on using hashes.
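
For example, a downloaded sdist can be checked against the SHA256 digest above with Python's standard hashlib module (the file name and expected digest are taken from this page):

import hashlib

EXPECTED_SHA256 = "89066be7b3f519f368375e75645eeded281fb9d3f622f7729940f061dda1b732"

with open("lazy_crawler-0.15.tar.gz", "rb") as f:
    digest = hashlib.sha256(f.read()).hexdigest()

# A mismatch means the file is corrupted or is not the published artifact.
assert digest == EXPECTED_SHA256, "SHA256 mismatch!"
print("SHA256 verified")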

File details

Details for the file lazy_crawler-0.15-py3-none-any.whl.

File metadata

  • Download URL: lazy_crawler-0.15-py3-none-any.whl
  • Upload date:
  • Size: 25.3 kB
  • Tags: Python 3
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/5.1.1 CPython/3.10.12

File hashes

Hashes for lazy_crawler-0.15-py3-none-any.whl:

  • SHA256: f3e83f6c2393cda06c69c8afa75bddca085ff1d7beaf3f2f67338b9b15c98284
  • MD5: 07c61f24f2cf817ab9946db5ee5539d7
  • BLAKE2b-256: 72ef9673416585b7c69d761278a9fb38f38e693603ce09aee8617ac9c0a0eddd

See the PyPI documentation for more details on using hashes.
