Python-based code to scrape and download data from the Quora website: questions related to certain topics, answers given to certain questions, and user profile data

Project description

Quora-scraper

Quora-scraper simulates a browser environment to let you scrape Quora's rich textual data. You can use one of its three scraping modules to find questions on certain topics (such as Finance, Politics, Tesla or Donald-Trump), scrape the answers to certain questions, or scrape user profiles.

Install

To install the scraper, run:

$ pip install quora-scraper

To update quora-scraper:

$ pip install quora-scraper --upgrade

Usage

quora-scraper has three scraping modules: questions, answers, and users.

1) Scraping question URLs:

You can scrape questions related to certain topics using the questions command. This module takes a list of topic keywords as input. The output is a questions_URL file containing the links to each topic's questions.

A topic's questions can be scraped in one of two ways:

  • a) Use the -l parameter followed by a list of topic keywords.

    $ quora-scraper questions -l [finance,politics,Donald-Trump]
    
  • b) Use the -f parameter followed by the path to a topic keywords file (keywords must be line-separated inside the file):

    $ quora-scraper questions -f topics_file.txt
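
A keywords file for the -f option can be prepared with a few lines of Python. This is a minimal sketch, not part of quora-scraper: the helper name `write_topics_file` is hypothetical, and the only format requirement taken from the text above is one keyword per line.

```python
from pathlib import Path


def write_topics_file(topics, path="topics_file.txt"):
    """Write one topic keyword per line, skipping blanks and duplicates.

    Hypothetical helper: quora-scraper itself does not provide it.
    """
    kept = []
    for topic in topics:
        keyword = topic.strip()
        if keyword and keyword not in kept:
            kept.append(keyword)
    # One keyword per line, as the -f option expects.
    Path(path).write_text("\n".join(kept) + "\n", encoding="utf-8")
    return kept
```

The resulting file can then be passed to the questions command with -f.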
    

2) Scraping answers:

Quora answers are scraped using the answers command. This module takes a list of question URLs as input. The output is a file of scraped answers (answers.txt). Each answer consists of:

Quest-ID | AnswerDate | AnswerAuthor-ID | Quest-tags | Answer-Text

To scrape answers, use one of the following methods:

  • a) Use the -l parameter followed by a list of question URLs.

    $ quora-scraper answers -l [https://www.quora.com/Is-milk-good,https://www.quora.com/Was-Einstein-a-fake-and-a-plagiarist]
    
  • b) Use the -f parameter followed by the path to a question URLs file:

    $ quora-scraper answers -f questions_url.txt
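
Once answers.txt exists, its records can be loaded for analysis. The sketch below assumes one answer per line with the five tab-separated fields listed above. That layout is inferred from this README and has not been checked against the scraper's actual output. The `Answer` class and both function names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Answer:
    question_id: str
    date: str
    author_id: str
    tags: str
    text: str


def parse_answer_line(line):
    """Split one line into the five documented fields, or return None."""
    # maxsplit=4 keeps any stray tabs inside the answer text itself.
    parts = line.rstrip("\n").split("\t", 4)
    if len(parts) != 5:
        return None  # assumption: malformed lines are simply skipped
    return Answer(*parts)


def load_answers(path="answers.txt"):
    """Read every well-formed answer record from the output file."""
    with open(path, encoding="utf-8") as handle:
        parsed = (parse_answer_line(line) for line in handle)
        return [answer for answer in parsed if answer is not None]
```

If answer text turns out to span several lines in practice, the line-by-line split would need to be replaced with a parser that tracks record boundaries.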
    

3) Scraping Quora user profiles:

You can scrape Quora user profiles using the users command. The users module takes a list of Quora user IDs as input. The output is a UserProfile file containing:

First line: UserID | ProfileDescription | ProfileBio | Location | TotalViews | NBAnswers | NBQuestions | NBFollowers | NBFollowing

Remaining lines (User's answers): AnswerDate | QuestionID | AnswerText

User profiles can be scraped as follows:

  • a) Use the -l parameter followed by a list of user IDs.

    $ quora-scraper users -l [Albert-Einstein-195,Jackie-Chan-8]
    
  • b) Use the -f parameter followed by the path to a user IDs file.

    $ quora-scraper users -f quora_username_file.txt
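
The two-part layout of a UserProfile file (one profile line, then one line per answer) can be read as sketched below. This assumes tab-separated fields in the order listed above and one record per line, which is inferred from this README and not verified against real output. The function name `parse_user_profile` is hypothetical.

```python
PROFILE_FIELDS = [
    "UserID", "ProfileDescription", "ProfileBio", "Location", "TotalViews",
    "NBAnswers", "NBQuestions", "NBFollowers", "NBFollowing",
]
ANSWER_FIELDS = ["AnswerDate", "QuestionID", "AnswerText"]


def parse_user_profile(lines):
    """Return (profile_dict, list_of_answer_dicts) from the file's lines."""
    rows = [line.rstrip("\n") for line in lines if line.strip()]
    if not rows:
        return {}, []
    # First line: the profile summary fields.
    profile = dict(zip(PROFILE_FIELDS, rows[0].split("\t")))
    # Remaining lines: one answer each; maxsplit keeps tabs in the text.
    answers = [dict(zip(ANSWER_FIELDS, row.split("\t", 2))) for row in rows[1:]]
    return profile, answers
```

Numeric fields such as NBFollowers are left as strings here, since the README does not say how counts are formatted.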
    

Notes

a) Output file fields are tab-separated.

b) You can add a list/line index parameter (-i) to start scraping from that index. The command below starts scraping from the "physics" keyword:

   $ quora-scraper questions -l [finance,politics,tech,physics,life,sports] -i 3
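
The example implies that the index is zero-based, since index 3 selects "physics", the fourth keyword. The Python slice below illustrates that reading. It mirrors the documented behavior only and is not taken from the scraper's source code.

```python
topics = ["finance", "politics", "tech", "physics", "life", "sports"]
start_index = 3  # same value as the -i argument in the example

# Zero-based slicing: the first three keywords are skipped.
remaining = topics[start_index:]
print(remaining)  # ['physics', 'life', 'sports']
```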

c) Quora-scraper is a command-line application written in Python that scrapes Quora data. It uses XPath expressions to locate Quora webpage elements. Since Quora's HTML structure changes frequently, the code may need modification from time to time. Please feel free to update and contribute to the source code to keep the scraper up to date.

d) Quora limits the number of questions accessible on a topic page. Thus, even if a topic has a large number of questions (e.g. 100k), the number of scraped question links will not exceed 2k or 3k.

e) For more help, use:

   $ quora-scraper --help

License

This project uses the following license: MIT

Download files

Source Distribution

quora-scraper-1.0.8.tar.gz (5.2 MB)

Uploaded Source

Built Distribution

quora_scraper-1.0.8-py3-none-any.whl (5.2 MB)

Uploaded Python 3
