
Package to retrieve comments on New York Times articles. It also serves as an API wrapper for the NYT article search and covers the functionality of the now deprecated NYT Community API.

Project description

The package provides three main functions, each for a distinct task, that retrieve New York Times articles and comments as ready-to-use datasets for data science and machine learning projects:

  1. The main function get_dataset returns two dataframes, one for the articles and one for the comments on them. The retrieval can be customized through optional parameters: a date range for the articles; search keywords; filter queries on fields such as the day of the week, the word count of the articles, or the source; a maximum number of comments, articles, or both; sorting of the articles from newest or from oldest; suppressing or enabling the output log; and saving the data as two csv files. The function returns only the articles that were open to comments, along with the comments on them.

  2. The function get_articles can be used as an API wrapper for the NYT article search API. It returns cleaned and preprocessed article data as a ready-to-use pandas dataframe (with an option to store it in csv files). The retrieval can be customized with the same options as above. Unlike get_dataset, it returns all the articles that satisfy the search criteria.

  3. The function get_comments retrieves the comments on NYT article(s) given their urls. It can be used as a substitute for the comments by url option in the NYT Community API, which is now deprecated and, on account of an unresolved issue, only returns comments that were picked as editor's selections. Unlike the two functions above, get_comments does not use the NYT API for the retrieval.
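As a rough illustration of the kind of request that an article search wrapper such as get_articles sends, the sketch below builds an NYT article search URL from a few of the options listed above. The helper name build_search_url and its arguments are hypothetical and are not part of nytcomments; only the endpoint and the query parameter names (q, fq, begin_date, end_date, sort, page, api-key) come from the NYT article search API. The package's own implementation may differ.

```python
from urllib.parse import urlencode

# Public endpoint of the NYT article search API (version 2).
SEARCH_ENDPOINT = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(api_key, query=None, filter_query=None,
                     begin_date=None, end_date=None, sort="newest", page=0):
    """Build the URL for one page of article search results.

    Hypothetical helper for illustration only; not a nytcomments function.
    Dates use the YYYYMMDD format expected by the API.
    """
    # This sketch only supports the two chronological orders described above.
    if sort not in ("newest", "oldest"):
        raise ValueError("sort must be 'newest' or 'oldest'")

    params = {"api-key": api_key, "sort": sort, "page": page}
    optional = {
        "q": query,            # search keywords
        "fq": filter_query,    # filter query, e.g. on word count or source
        "begin_date": begin_date,
        "end_date": end_date,
    }
    # Send only the options the caller actually set.
    params.update({name: value for name, value in optional.items()
                   if value is not None})
    return SEARCH_ENDPOINT + "?" + urlencode(params)


url = build_search_url("YOUR_API_KEY", query="climate",
                       begin_date="20180101", end_date="20180131")
```

The resulting url can be fetched with requests.get(url) once a real key replaces the placeholder. Options left as None are omitted from the query string.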

Dependencies

  • Python 3.4+

  • pandas

  • requests

Usage

from nytcomments.nytcomments import get_dataset
articles_df, comments_df = get_dataset(ARTICLE_API_KEY, page_lower=0, page_upper=2)

Please refer to the tutorial for an illustration of the three functions get_dataset, get_comments and get_articles, as well as detailed information about the function arguments. The functions get_dataset and get_articles require an NYT API key, which can be obtained by registering at the NYT developers' site, whereas get_comments can be used without an API key. You must agree to the Terms of Use for the NYT article search API to use the key.
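Since get_dataset and get_articles need the key on every call, one common pattern is to read it from an environment variable instead of hard-coding it in scripts. The sketch below is one way to do that; the helper load_api_key and the variable name NYT_API_KEY are assumptions made for this example and are not defined by nytcomments.

```python
import os


def load_api_key(env_var="NYT_API_KEY"):
    """Return the API key stored in the given environment variable.

    Hypothetical helper; the variable name NYT_API_KEY is only a convention
    chosen for this example.
    """
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError(
            "Set the {} environment variable to your NYT API key".format(env_var)
        )
    return key


# Usage with the call shown above (requires nytcomments and a valid key):
# from nytcomments.nytcomments import get_dataset
# ARTICLE_API_KEY = load_api_key()
# articles_df, comments_df = get_dataset(ARTICLE_API_KEY, page_lower=0, page_upper=2)
```

Failing early with a clear message avoids sending requests with an empty key.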

Note: The dataset of comments posted on NYT articles in the periods January to May 2017 and January to April 2018 is available on Kaggle and, as of April 28, 2018, is at the top of the 20 featured datasets.

Acknowledgement

  • The url used in the function get_comments to retrieve the comments on a given article is taken from a blog post by Neal Caren.

  • NYT article search API is used for the article search.

Project details



Version: 0.1
