django-web-crawler·PyPI

A Django app to gather the links of a website.

These details have not been verified by PyPI

Project links

Homepage

Project description

Crawler is a Django app that connects to a website and gathers as many links as you want.

Quick start

Add “gatherlinks” to your INSTALLED_APPS setting like this:
```
INSTALLED_APPS = [
    ...
    'gatherlinks',
]
```
Import the “main” module like this:
```
from gatherlinks.crawler import main
```

Initialize the StartPoint class like this:

crawler = main.StartPoint("https://example.com", max_crawl=50, number_of_threads=10)

The StartPoint class accepts three arguments:
1. homepage (positional): the URL of the website whose links you want to gather.
2. max_crawl: the maximum number of links to gather from the website. Default is 50.
3. number_of_threads: the number of threads that do the work simultaneously. Default is 10.

After initializing the class, call the “start” method like this:
```
crawler.start()
```
Once the crawler has finished gathering links, you can access them like this:
```
crawler.result
```

The result attribute is a set that holds every link the crawler gathered. You can loop through “crawler.result” and process each link as needed, for example by writing it to a file or saving it to a database.
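The quick start steps can be combined into a short sketch like the one below. The helper names `gather_links` and `save_links` are hypothetical and not part of the package. The sketch also assumes that `start()` returns only after crawling has finished, which the description above does not state explicitly.

```python
from pathlib import Path


def gather_links(homepage, max_crawl=50, number_of_threads=10):
    """Run the crawler once and return the gathered links as a set."""
    # Imported lazily so this module loads even where the package is missing.
    from gatherlinks.crawler import main

    crawler = main.StartPoint(
        homepage, max_crawl=max_crawl, number_of_threads=number_of_threads
    )
    # Assumption: start() blocks until the crawl is complete.
    crawler.start()
    return set(crawler.result)


def save_links(links, path):
    """Write the links to a text file, one per line, in sorted order."""
    ordered = sorted(links)
    text = "".join(f"{link}\n" for link in ordered)
    Path(path).write_text(text, encoding="utf-8")
    return len(ordered)


# Hypothetical usage:
#   links = gather_links("https://example.com", max_crawl=20)
#   save_links(links, "links.txt")
```

Sorting before writing keeps the output stable between runs, since a set has no defined order.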

Project details

Release history

0.9 (this version): Mar 1, 2022
0.8: Mar 1, 2022
0.7: Feb 5, 2022
0.6: Feb 2, 2022
0.5: Feb 2, 2022
0.4: Jan 31, 2022
0.3: Jan 31, 2022

Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

django-web-crawler-0.9.tar.gz (4.5 kB)

Uploaded Mar 1, 2022 Source

File details

Details for the file django-web-crawler-0.9.tar.gz.

File metadata

Download URL: django-web-crawler-0.9.tar.gz
Upload date: Mar 1, 2022
Size: 4.5 kB
Tags: Source
Uploaded using Trusted Publishing? No
Uploaded via: twine/3.7.1 importlib_metadata/4.6.4 pkginfo/1.8.2 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.9.7

File hashes

Hashes for django-web-crawler-0.9.tar.gz
Algorithm	Hash digest
SHA256	`718c2b15b86f701be27cfe22104f7f299e14351222a7e5cae47ea6dff3dca489`
MD5	`dd55293661415621f034ed57d47d0a67`
BLAKE2b-256	`248cd2de600b731cd3e20093eaba925c29810a4c6e851aca9de4d0f476d22b75`
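
A downloaded archive can be checked against the SHA256 digest listed above. The following is a minimal sketch using only the standard library. The function names and the local file path in the usage comment are assumptions for illustration.

```python
import hashlib


def sha256_of_file(path, chunk_size=65536):
    """Return the hex SHA256 digest of a file, read in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        # Read fixed-size chunks until read() returns an empty bytes object.
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def matches_published_hash(path, expected_hex):
    """Compare a file's SHA256 digest with a published hex digest."""
    return sha256_of_file(path) == expected_hex.strip().lower()


# SHA256 digest copied from the table above.
PUBLISHED_SHA256 = "718c2b15b86f701be27cfe22104f7f299e14351222a7e5cae47ea6dff3dca489"

# Hypothetical usage, assuming the archive sits in the current directory:
#   matches_published_hash("django-web-crawler-0.9.tar.gz", PUBLISHED_SHA256)
```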
