
A package to extract assets from webpages


Assets Crawler

Structure for app

Python

Installing the dependencies

Install the dependencies with pip by running:

    $ pip install -r requeriments.txt

Set up a site to crawl

In run.py, set the site to crawl (as in the example around line 50 of the file), together with the output filename and format. The relevant snippet is:

    # example
    # Assumed imports, based on the project layout: Crawler lives in
    # crawler.py, and save_json / plot_map are helpers in utils.py.
    from crawler import Crawler
    from utils import plot_map, save_json

    # URL of the site the crawler should visit
    url_to_scrape = 'https://elixir-lang.org/'

    # create the crawler for that site
    new_crawler = Crawler(url_to_scrape)

    # To save the crawl result as tables, uncomment the line below
    # new_crawler.storage_data('my_scraped_assets.txt', 'my_scraped_relations.txt')

    # To run the crawler and plot the graph instead, uncomment the three
    # lines below (and keep the storage_data line above commented out);
    # the result is then shown as a network map.
    # get_relations = new_crawler.run()
    # json_file = save_json(get_relations, 'data.json')
    # plot_map(json_file)
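The snippet above does not show how `Crawler` decides what counts as an asset. As a rough illustration only, the sketch below collects image, script and `<link>` URLs from a page with Python's standard `html.parser`. The `extract_assets` function, the `AssetParser` class and the tag list are hypothetical and are not taken from crawler.py.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

# Hypothetical choice of which tag/attribute pairs count as "assets".
ASSET_ATTRS = {
    "img": "src",
    "script": "src",
    "link": "href",
}


class AssetParser(HTMLParser):
    """Collects absolute URLs of images, scripts and linked resources."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.assets = []

    def handle_starttag(self, tag, attrs):
        wanted = ASSET_ATTRS.get(tag)
        if wanted is None:
            return  # not an asset tag (e.g. <a>, <div>)
        for name, value in attrs:
            if name == wanted and value:
                # Resolve relative paths against the page URL.
                self.assets.append(urljoin(self.base_url, value))


def extract_assets(base_url, html):
    """Return asset URLs in the order they appear in the HTML."""
    parser = AssetParser(base_url)
    parser.feed(html)
    parser.close()
    return parser.assets
```

For example, feeding it `<img src="logo.png">` with the base URL `https://elixir-lang.org/` yields `https://elixir-lang.org/logo.png`. Anchor (`<a>`) links are deliberately left out, since they point to other pages rather than to assets.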

Running

Make sure all the dependencies are installed, then run the file:

    $ python run.py
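Running run.py crawls the configured site. The project's own crawl loop is not shown in this README, so the following is only a minimal, hypothetical sketch of a same-site breadth-first crawl. It builds a mapping from each page to the pages it links to, which is one plausible shape for the relations returned by `new_crawler.run()`. The `crawl` function and its `fetch_links` callback are assumptions; network access is left out so the logic can be tested offline.

```python
from collections import deque
from urllib.parse import urljoin, urlparse


def crawl(start_url, fetch_links, max_pages=50):
    """Breadth-first crawl restricted to the start URL's host.

    fetch_links(url) is a hypothetical callable returning the raw href
    values found on a page; a real version would fetch and parse HTML.
    Returns {page_url: sorted list of same-site URLs it links to}.
    """
    host = urlparse(start_url).netloc
    relations = {}
    queue = deque([start_url])
    seen = {start_url}

    while queue and len(relations) < max_pages:
        page = queue.popleft()
        targets = set()
        for href in fetch_links(page):
            # Resolve relative links and drop any #fragment.
            url = urljoin(page, href).split("#", 1)[0]
            if urlparse(url).netloc != host:
                continue  # skip links to external sites
            targets.add(url)
            if url not in seen:
                seen.add(url)
                queue.append(url)
        relations[page] = sorted(targets)
    return relations
```

Keeping the fetch step behind a callback keeps the traversal logic (host filtering, de-duplication, the `max_pages` limit) separate from HTTP details.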

Result

If everything goes well, your folder will look like this:

    .
    ├── .gitignore                  # Files ignored by Git
    ├── crawler.py                  # Crawler module
    ├── data.json                   # Scraped data as JSON (only if you chose to plot the graph)
    ├── README.md                   # Instructions on how to use the crawler
    ├── requeriments.txt            # Dependencies file
    ├── my_scraped_assets.txt       # Scraped assets, as tables (only if you saved tables)
    ├── my_scraped_relations.txt    # Scraped relations, as tables (only if you saved tables)
    ├── run.py                      # Script that runs the crawler
    └── utils.py                    # Helper functions
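The exact format of my_scraped_assets.txt, my_scraped_relations.txt and data.json is not documented here. Purely as an illustration, the helpers below write tab-separated tables and a nodes/links JSON graph. `save_tables` and `save_graph_json` are hypothetical stand-ins for `storage_data` and `save_json`, and both file layouts are assumptions.

```python
import json


def save_tables(relations, assets, assets_path, relations_path):
    """Write assets and relations as tab-separated tables (assumed format)."""
    with open(assets_path, "w", encoding="utf-8") as f:
        f.write("page\tasset\n")
        for page, urls in assets.items():
            for url in urls:
                f.write(f"{page}\t{url}\n")
    with open(relations_path, "w", encoding="utf-8") as f:
        f.write("source\ttarget\n")
        for source, targets in relations.items():
            for target in targets:
                f.write(f"{source}\t{target}\n")


def save_graph_json(relations, path):
    """Write relations as a nodes/links graph and return the file path."""
    # Every page that appears as a source or a target becomes a node.
    nodes = sorted(set(relations) | {t for ts in relations.values() for t in ts})
    graph = {
        "nodes": [{"id": n} for n in nodes],
        "links": [{"source": s, "target": t}
                  for s, ts in relations.items() for t in ts],
    }
    with open(path, "w", encoding="utf-8") as f:
        json.dump(graph, f, indent=2)
    return path
```

Returning the path mirrors how the snippet passes the result of `save_json` straight to `plot_map`, but whether the project's helper actually returns a path is an assumption.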


This version: 0.1

Download files


Files for Assets-crawler, version 0.1:

    Filename                               Size     File type   Python version
    Assets_crawler-0.1-py3-none-any.whl    3.1 kB   Wheel       py3
