
A Django app to gather the links of a website.

Project description

Crawler is a Django app that connects to a website and gathers as many of its links as you want.

Quick start

  1. Add “gatherlinks” to your INSTALLED_APPS setting like this:

    INSTALLED_APPS = [
        ...
        'gatherlinks',
    ]
  2. Import the “main” module like this:

    from gatherlinks.crawler import main
  3. Initialize the StartPoint class like this:

    crawler = main.StartPoint("https://example.com", max_crawl=50, number_of_threads=10)
  4. The StartPoint class accepts three arguments:
    1. homepage (positional argument: the URL of the website whose links you want to gather.)

    2. max_crawl (maximum number of links to gather from the website. Default is 50)

    3. number_of_threads (number of threads that crawl the website concurrently. Default is 10)

  5. After initializing the class, call the “start” method like this:

    crawler.start()
  6. Once the crawler has finished gathering links, you can access them like this:

    crawler.result
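The README does not describe how StartPoint works internally. As a hypothetical, network-free illustration of how limits like max_crawl and number_of_threads typically bound a crawl, here is a breadth-first sketch that fetches each layer of pages on a thread pool and stops once max_crawl links are collected. The function bounded_crawl and its get_links callback are illustrative names, not part of gatherlinks; a real crawler would download and parse each page inside get_links.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import Callable, Iterable


def bounded_crawl(
    homepage: str,
    get_links: Callable[[str], Iterable[str]],
    max_crawl: int = 50,
    number_of_threads: int = 10,
) -> set[str]:
    """Hypothetical sketch, not the gatherlinks implementation.

    get_links(url) returns the links found on that page. Counting the
    homepage toward max_crawl is an assumption made for this example.
    """
    result: set[str] = {homepage}
    frontier = [homepage]
    with ThreadPoolExecutor(max_workers=number_of_threads) as pool:
        while frontier and len(result) < max_crawl:
            next_frontier = []
            # Pages in the current frontier are fetched concurrently;
            # results are merged on this thread, so no lock is needed.
            for links in pool.map(get_links, frontier):
                for link in links:
                    if len(result) >= max_crawl:
                        break  # stop collecting once the limit is hit
                    if link not in result:
                        result.add(link)
                        next_frontier.append(link)
            frontier = next_frontier
    return result
```

For example, with an in-memory link graph `site = {"/": ["/a", "/b"], "/a": ["/c"]}`, calling `bounded_crawl("/", lambda url: site.get(url, []), max_crawl=3)` returns the set `{"/", "/a", "/b"}`.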

The result attribute is a set containing every link the crawler gathered. You can loop through crawler.result and do whatever you need with the links, such as writing them to a file or saving them to a database.
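A minimal sketch of the two uses mentioned above, assuming only that crawler.result is a set of URL strings. The helper names write_links_to_file and save_links_to_db are hypothetical, and the SQLite layout (a single links table keyed by url) is an arbitrary choice; both helpers use only the Python standard library.

```python
import sqlite3
from pathlib import Path


def write_links_to_file(links: set[str], path: str) -> None:
    """Write one link per line, sorted so the file is stable between runs."""
    Path(path).write_text("".join(f"{link}\n" for link in sorted(links)), encoding="utf-8")


def save_links_to_db(links: set[str], db_path: str) -> int:
    """Insert links into a SQLite table, skipping duplicates; return the row count."""
    conn = sqlite3.connect(db_path)
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute("CREATE TABLE IF NOT EXISTS links (url TEXT PRIMARY KEY)")
            conn.executemany(
                "INSERT OR IGNORE INTO links (url) VALUES (?)",
                ((link,) for link in links),
            )
        (count,) = conn.execute("SELECT COUNT(*) FROM links").fetchone()
    finally:
        conn.close()
    return count


# Hypothetical usage after crawler.start() has finished:
# write_links_to_file(crawler.result, "links.txt")
# save_links_to_db(crawler.result, "links.sqlite3")
```

Sorting before writing matters because set iteration order is not guaranteed, and the PRIMARY KEY combined with INSERT OR IGNORE makes saving the same links twice harmless.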

Download files


Source Distribution

django-web-crawler-0.9.tar.gz (4.5 kB)


File details

Details for the file django-web-crawler-0.9.tar.gz.

File metadata

  • Download URL: django-web-crawler-0.9.tar.gz
  • Upload date:
  • Size: 4.5 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/3.7.1 importlib_metadata/4.6.4 pkginfo/1.8.2 requests/2.25.1 requests-toolbelt/0.9.1 tqdm/4.57.0 CPython/3.9.7

File hashes

Hashes for django-web-crawler-0.9.tar.gz

  Algorithm     Hash digest
  SHA256        718c2b15b86f701be27cfe22104f7f299e14351222a7e5cae47ea6dff3dca489
  MD5           dd55293661415621f034ed57d47d0a67
  BLAKE2b-256   248cd2de600b731cd3e20093eaba925c29810a4c6e851aca9de4d0f476d22b75
