Project description

Tiny Web Crawler

A simple and efficient web crawler in Python.

Features

  • Crawl web pages and extract links
  • Handle relative and absolute URLs (see the sketch after this list)
  • Save crawl results to a JSON file
  • Easy to use and extend
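
Relative links are resolved against the page they were found on, while absolute links pass through unchanged. As a rough illustration of that resolution rule (using only Python's standard urllib.parse, not the crawler's own code):

from urllib.parse import urljoin

page_url = 'http://example.com/docs/index.html'

# An absolute href passes through unchanged.
print(urljoin(page_url, 'https://githubuniverse.com/'))  # https://githubuniverse.com/

# A relative href is resolved against the page URL.
print(urljoin(page_url, '../about.html'))  # http://example.com/about.html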

Installation

Install using pip:

pip install tiny-web-crawler

Usage

from tiny_web_crawler.crawler import Spider

root_url = 'http://example.com'  # page to start crawling from
max_links = 2                    # maximum number of links to collect

spider = Spider(root_url, max_links)
spider.start()  # begin crawling from root_url

Output Format

Crawled output sample for https://github.com

{
    "http://github.com": {
        "urls": [
            "http://github.com/",
            "https://githubuniverse.com/",
            ...
        ]
    },
    "https://github.com/solutions/ci-cd": {
        "urls": [
            "https://github.com/solutions/ci-cd/",
            "https://githubuniverse.com/",
            ...
        ]
    }
}
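
Each top-level key is a crawled page, and its "urls" list contains the links extracted from that page. As a minimal sketch (assuming the results were saved to a file named crawl.json, an illustrative name rather than a library default), the output can be read back with the standard json module:

import json

# Hypothetical file name; use whatever path you saved the crawl results to.
with open('crawl.json', encoding='utf-8') as f:
    results = json.load(f)

# Walk the crawled pages and print the links found on each one.
for page, data in results.items():
    print(page)
    for link in data['urls']:
        print('  ->', link)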

License

This project is licensed under the GNU GPLv3 License - see the LICENSE file for details.

Download files

Download the file for your platform.

Source Distribution

tiny_web_crawler-0.1.1.tar.gz (4.7 kB)

Uploaded Source

Built Distribution

tiny_web_crawler-0.1.1-py3-none-any.whl (6.1 kB)

Uploaded Python 3
