# default-scraper

A Python web scraper that collects keyword search results from Instagram.

## Features

  • Scrapes all search results for a keyword passed as an argument.

  • Saves results as .csv or .json.

  • Also collects profile data for the users who uploaded the content in the search results.
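
As a rough illustration of the .csv/.json output feature, the sketch below writes a list of records to either format based on the file extension. It is not the package's actual implementation: `save_records` is a hypothetical helper, and it assumes each record is a flat dictionary.

```python
import csv
import json


def save_records(records, path):
    """Write a list of flat dicts to `path` as .json or .csv (hypothetical helper)."""
    if path.endswith(".json"):
        with open(path, "w", encoding="utf-8") as f:
            json.dump(records, f, ensure_ascii=False, indent=2)
    elif path.endswith(".csv"):
        # Collect every key in first-seen order so rows with missing
        # fields still fit under one header.
        fieldnames = list(dict.fromkeys(key for row in records for key in row))
        with open(path, "w", encoding="utf-8", newline="") as f:
            writer = csv.DictWriter(f, fieldnames=fieldnames, restval="")
            writer.writeheader()
            writer.writerows(records)
    else:
        raise ValueError(f"Unsupported extension: {path}")
```

Taking the union of keys for the CSV header means a record that lacks a field is written with an empty cell instead of raising an error.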

## Usage

### Install

```bash
pip install git+https://github.com/Seongbuming/crawler.git
```

### Scrape Instagram content in a Python script

```python
from default_scraper.instagram.parser import InstagramParser

# Instagram login credentials and the keyword to search for.
USERNAME = ""
PASSWORD = ""
KEYWORD = ""

parser = InstagramParser(USERNAME, PASSWORD, KEYWORD, False)
parser.run()  # start scraping
```
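
To avoid hard-coding credentials in a script, one option is to read them from environment variables. The sketch below is only a suggestion: `load_credentials` and the variable names `INSTAGRAM_USERNAME` and `INSTAGRAM_PASSWORD` are hypothetical and not part of the package.

```python
import os


def load_credentials(env=os.environ):
    """Return (username, password) read from environment variables.

    The variable names are assumptions made for this sketch.
    """
    try:
        return env["INSTAGRAM_USERNAME"], env["INSTAGRAM_PASSWORD"]
    except KeyError as exc:
        raise RuntimeError(f"Missing environment variable: {exc.args[0]}") from exc


# Usage (requires the package to be installed):
# from default_scraper.instagram.parser import InstagramParser
# username, password = load_credentials()
# parser = InstagramParser(username, password, "some keyword", False)
# parser.run()
```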

### Scrape Instagram content from the command line

Run the following command to scrape content from Instagram:

```bash
python main.py --platform instagram --keyword {KEYWORD} [--output_file OUTPUT_FILE] [--all]
```

Use the `--all` or `-a` option to also scrape unstructured fields.
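
For readers who want to see how these flags fit together, here is a minimal `argparse` sketch that accepts the same options. It is an assumption about how such a command line could be parsed, not the actual contents of `main.py`; the helper name `build_arg_parser`, the help strings, and the choice of required versus optional flags are inferred from the usage line above.

```python
import argparse


def build_arg_parser():
    """Build a parser mirroring the documented flags (illustrative only)."""
    parser = argparse.ArgumentParser(description="Scrape search results for a keyword.")
    parser.add_argument("--platform", required=True, choices=["instagram"],
                        help="platform to scrape")
    parser.add_argument("--keyword", required=True,
                        help="search keyword")
    parser.add_argument("--output_file", default=None,
                        help="optional path for the saved results")
    parser.add_argument("--all", "-a", action="store_true",
                        help="also scrape unstructured fields")
    return parser


# Parse an example command line instead of sys.argv.
args = build_arg_parser().parse_args(
    ["--platform", "instagram", "--keyword", "coffee", "-a"]
)
```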

## Data description

### Instagram

  • Structured fields
      - pk
      - id
      - taken_at
      - media_type
      - code
      - comment_count
      - user
      - like_count
      - caption
      - accessibility_caption
      - original_width
      - original_height
      - images

  • Some fields may be missing depending on Instagram’s response data.
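
Because any of these fields can be absent, code that consumes the output should not assume every key exists. The sketch below shows one defensive way to handle that, assuming each scraped item is a dictionary; `extract_structured` is a hypothetical helper, not part of the package.

```python
# Field names taken from the "Structured fields" list above.
STRUCTURED_FIELDS = [
    "pk", "id", "taken_at", "media_type", "code", "comment_count", "user",
    "like_count", "caption", "accessibility_caption", "original_width",
    "original_height", "images",
]


def extract_structured(item):
    """Keep only the structured fields, using None for any that are missing."""
    return {field: item.get(field) for field in STRUCTURED_FIELDS}
```

Filling missing fields with `None` gives every record the same set of keys, which keeps CSV export and later analysis simple.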

## Future work

  • Support for scraping from more platforms is planned.
