sitecrawl

Simple Python 3 module to crawl a website and extract URLs

Project description

Simple Python module to crawl a website and extract URLs.

Installation

Using pip:

pip3 install sitecrawl

sitecrawl --help

Or build from sources:

# Clone project
git clone https://github.com/gabfl/sitecrawl && cd sitecrawl

# Installation
pip3 install .

Usage

CLI

sitecrawl --url https://www.yahoo.com/ --depth 2 --max 4 --verbose

* Found 4 internal URLs
  https://www.yahoo.com
  https://www.yahoo.com/entertainment
  https://www.yahoo.com/lifestyle
  https://www.yahoo.com/plus

* Found 5 external URLs
  https://mail.yahoo.com/
  https://news.yahoo.com/
  https://finance.yahoo.com/
  https://sports.yahoo.com/
  https://shopping.yahoo.com/

* Skipped 0 URLs
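In the output above, subdomains such as mail.yahoo.com are reported as external while www.yahoo.com paths are internal. One way to reproduce that split is an exact hostname comparison. This is only a sketch using the standard library. The function name split_by_host is hypothetical, and this is not necessarily how sitecrawl itself classifies URLs.

```python
from urllib.parse import urlsplit


def split_by_host(base_url, urls):
    """Partition urls into (internal, external) by comparing hostnames.

    Hypothetical helper for illustration; not part of sitecrawl.
    """
    base_host = urlsplit(base_url).hostname
    internal, external = [], []
    for url in urls:
        # A URL counts as internal only when its hostname matches exactly,
        # so a subdomain like mail.yahoo.com lands in the external list.
        if urlsplit(url).hostname == base_host:
            internal.append(url)
        else:
            external.append(url)
    return internal, external


internal, external = split_by_host(
    "https://www.yahoo.com",
    ["https://www.yahoo.com/plus", "https://mail.yahoo.com/"],
)
print("Internal:", internal)
print("External:", external)
```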

As a module

Basic example:

from sitecrawl import crawl

# Set the starting URL, then crawl with the same depth option as the CLI
crawl.base_url = 'https://www.yahoo.com'
crawl.deep_crawl(depth=2)

# Print the URLs collected during the crawl
print('Internal URLs:', crawl.get_internal_urls())
print('External URLs:', crawl.get_external_urls())
print('Skipped URLs:', crawl.get_skipped_urls())

A more detailed example is available in example.py.
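When using the module, the collected URLs can be printed in a layout similar to the CLI report. The helper below is a sketch, assuming the three getters return iterables of URL strings (the return types are not documented here). format_report is a hypothetical name, and sorting the URLs is a choice made for this example.

```python
def format_report(internal, external, skipped):
    """Build a CLI-style summary from three iterables of URL strings.

    Hypothetical helper for illustration; not part of sitecrawl.
    """
    lines = []
    for label, urls in (("internal", internal), ("external", external)):
        urls = sorted(urls)  # stable, readable ordering for the report
        lines.append(f"* Found {len(urls)} {label} URLs")
        lines.extend(f"  {url}" for url in urls)
        lines.append("")  # blank line between sections
    lines.append(f"* Skipped {len(list(skipped))} URLs")
    return "\n".join(lines)


print(format_report(
    ["https://www.yahoo.com/plus", "https://www.yahoo.com"],
    ["https://mail.yahoo.com/"],
    [],
))
```

With the basic example, the arguments would be crawl.get_internal_urls(), crawl.get_external_urls() and crawl.get_skipped_urls().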

Release history

1.0.5 (this version): Jan 28, 2022
1.0.4: Jan 25, 2022
1.0.3: Jan 25, 2022
1.0.2: Jan 25, 2022
1.0.1: Jan 25, 2022
1.0: Jan 24, 2022

Download files

Source Distribution

sitecrawl-1.0.5.tar.gz (5.4 kB)

Uploaded Jan 28, 2022 Source

Built Distribution

sitecrawl-1.0.5-py2.py3-none-any.whl (6.0 kB)

Uploaded Jan 28, 2022 Python 2, Python 3

File details

Details for the file sitecrawl-1.0.5.tar.gz.

File metadata

Download URL: sitecrawl-1.0.5.tar.gz
Upload date: Jan 28, 2022
Size: 5.4 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.3

File hashes

Hashes for sitecrawl-1.0.5.tar.gz
Algorithm	Hash digest
SHA256	`203417c73038f3beb7ad185985ffe7fe0aef2c6d0b88a6bfbcab886e4350eb07`
MD5	`7b46609d564f7aafe48d19210b407364`
BLAKE2b-256	`99d6cf003181dc0a933c51e8274ca964adf7d6a63b2cbe1c2c42819e4aaf0d5e`

File details

Details for the file sitecrawl-1.0.5-py2.py3-none-any.whl.

File metadata

Download URL: sitecrawl-1.0.5-py2.py3-none-any.whl
Upload date: Jan 28, 2022
Size: 6.0 kB
Tags: Python 2, Python 3
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.1.1 pkginfo/1.5.0.1 requests/2.23.0 setuptools/40.8.0 requests-toolbelt/0.9.1 tqdm/4.46.0 CPython/3.7.3

File hashes

Hashes for sitecrawl-1.0.5-py2.py3-none-any.whl
Algorithm	Hash digest
SHA256	`39d8e219c7e75252395ef91666626c8c4fb72657f632481af3f4e6db019f1e60`
MD5	`3afd6f6ebf96483946368b2bac7ca4f3`
BLAKE2b-256	`aaf556d45dffb05dd630a0dc062859088e99f0178e3c39bcd3332814d9ff6523`
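
The SHA256 digests listed for each file can be checked against a downloaded copy before installing it. The following is a minimal sketch using the standard hashlib module. The function names sha256_of and matches_published_digest are hypothetical, and the file is assumed to be in the current directory.

```python
import hashlib


def sha256_of(path, chunk_size=65536):
    """Return the hex SHA256 digest of the file at path, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_digest(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of(path) == expected_hex.strip().lower()
```

For example, matches_published_digest("sitecrawl-1.0.5.tar.gz", "203417c73038f3beb7ad185985ffe7fe0aef2c6d0b88a6bfbcab886e4350eb07") should return True for an unmodified copy of the source distribution.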
