Project description

Tiny Web Crawler

A simple and efficient web crawler in Python.

Features

  • Crawl web pages and extract links
  • Handle relative and absolute URLs (see the sketch after this list)
  • Save crawl results to a JSON file
  • Easy to use and extend
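
Relative links are resolved against the page they were found on, while absolute links pass through unchanged. As a rough illustration of that resolution rule (using only Python's standard urllib.parse, not the crawler's own code):

from urllib.parse import urljoin

page_url = 'http://example.com/docs/index.html'

# An absolute href passes through unchanged.
print(urljoin(page_url, 'https://githubuniverse.com/'))  # https://githubuniverse.com/

# A relative href is resolved against the page URL.
print(urljoin(page_url, '../about.html'))  # http://example.com/about.html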

Installation

Install using pip:

pip install tiny-web-crawler

Usage

from tiny_web_crawler.crawler import Spider

root_url = 'http://example.com'  # page to start crawling from
max_links = 2                    # maximum number of links to collect

spider = Spider(root_url, max_links)
spider.start()  # begin crawling from root_url

Output Format

Crawled output sample for https://github.com

{
    "http://github.com": {
        "urls": [
            "http://github.com/",
            "https://githubuniverse.com/",
            ...
        ]
    },
    "https://github.com/solutions/ci-cd": {
        "urls": [
            "https://github.com/solutions/ci-cd/",
            "https://githubuniverse.com/",
            ...
        ]
    }
}
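
Each top-level key is a crawled page, and its "urls" list contains the links extracted from that page. As a minimal sketch (assuming the results were saved to a file named crawl.json, an illustrative name rather than a library default), the output can be read back with the standard json module:

import json

# Hypothetical file name; use whatever path you saved the crawl results to.
with open('crawl.json', encoding='utf-8') as f:
    results = json.load(f)

# Walk the crawled pages and print the links found on each one.
for page, data in results.items():
    print(page)
    for link in data['urls']:
        print('  ->', link)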

License

This project is licensed under the GNU GPLv3 License - see the LICENSE file for details.

Download files

Download the file for your platform.

Source Distribution

tiny_web_crawler-0.1.1.tar.gz (4.7 kB)

Uploaded Source

Built Distribution

tiny_web_crawler-0.1.1-py3-none-any.whl (6.1 kB)

Uploaded Python 3
