
A Django app to gather links from a website.

Project description

Crawler is a Django app that connects to a website and gathers as many links as you want.

Quick start

  1. Add “gatherlinks” to your INSTALLED_APPS setting like this:

    INSTALLED_APPS = [
        ...
        'gatherlinks',
    ]
  2. Import the “main” module like this:

    from gatherlinks.crawler import main
  3. Initialize the StartPoint class like this:

    crawler = main.StartPoint('https://example.com', max_crawl=50, number_of_threads=10)
  4. The StartPoint class can be initialized with three arguments.
    1. homepage (a positional argument: the URL of the website whose links you want to gather.)

    2. max_crawl (maximum number of links to gather from the website. Default is 50)

    3. number_of_threads (number of threads doing the work simultaneously. Default is 10)

  5. After initializing the class, you can then call the “start” method like this:

    crawler.start()
  6. Once the crawler has finished gathering the links, you can access them like this:

    crawler.result

The result attribute is a “set” that holds all the links the crawler could gather. You can then loop through “crawler.result” and do whatever you want with it (write it to a file or save it to a database), as in the sketch below.
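As a rough end-to-end sketch, assuming the StartPoint API described above (the domain, the output file name, and the commented-out GatheredLink model are placeholders, not part of this package):

    from gatherlinks.crawler import main

    # Gather up to 100 links from the target site using 10 worker threads.
    crawler = main.StartPoint('https://example.com', max_crawl=100, number_of_threads=10)
    crawler.start()

    # crawler.result is a set of link strings; write them to a text file.
    with open('links.txt', 'w') as output_file:
        for link in sorted(crawler.result):
            output_file.write(link + '\n')

    # Alternatively, save each link through a hypothetical Django model:
    # from myapp.models import GatheredLink
    # for link in crawler.result:
    #     GatheredLink.objects.get_or_create(url=link)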

