Package to retrieve comments from New York Times articles. It also serves as an API wrapper for the NYT Article Search API and performs the function of the now-deprecated NYT Community API.
The package includes three main functions that perform three distinct tasks, each retrieving comments and/or articles from the New York Times as ready-to-use datasets for data science/machine learning projects:
- The main function get_dataset returns two dataframes: one for the articles and one for the comments on them. The retrieval can be customized with a number of optional parameters: a specific timeline for the articles; search keywords; filter queries based on options such as the day of the week, the word count of the articles, or the source; a maximum limit on the number of comments, articles, or both; sorting the articles chronologically by newest or oldest; an option to suppress or activate the output log for the process; an option to save the data as two csv files; etc. The function returns only the articles that were open to comments, along with the comments on them.
- The function get_articles can be used as an API wrapper for the NYT Article Search API. It returns the cleaned-up and preprocessed article data as a ready-to-use pandas dataframe (with an option to store it in a csv file). The retrieval can be customized with the same options as get_dataset. Unlike get_dataset, it returns all the articles that satisfy the search criteria, whether or not they were open to comments.
- The function get_comments retrieves the comments on one or more NYT articles given their URLs. It can be used as a substitute for the comments-by-URL option in the NYT Community API, which is now deprecated and, on account of an unresolved issue, only returns comments that were picked as editor's selections. Unlike the two functions above, get_comments does not use the NYT API for the retrieval.
- Python 3.4+
from nytcomments.nytcomments import get_dataset
articles_df, comments_df = get_dataset(ARTICLE_API_KEY, page_lower=0, page_upper=2)
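The call above expects ARTICLE_API_KEY to hold a valid Article Search API key. One way to avoid hard-coding the key is to read it from an environment variable, as in the minimal sketch below. The variable name NYT_ARTICLE_API_KEY and the helpers load_api_key and fetch_first_pages are hypothetical and not part of the package; only the get_dataset call itself is taken from the example above.

```python
import os


def load_api_key(env_var="NYT_ARTICLE_API_KEY"):
    """Return the API key stored in env_var (a hypothetical name), or raise."""
    key = os.environ.get(env_var, "").strip()
    if not key:
        raise RuntimeError("Set the %s environment variable first" % env_var)
    return key


def fetch_first_pages(api_key):
    """Call get_dataset with the same arguments as the example above."""
    # Imported lazily so this module still loads where nytcomments
    # is not installed.
    from nytcomments.nytcomments import get_dataset
    return get_dataset(api_key, page_lower=0, page_upper=2)


# Usage (requires nytcomments and a valid key in the environment):
#   articles_df, comments_df = fetch_first_pages(load_api_key())
```

Keeping the key out of the source file makes it easier to share scripts or notebooks without exposing credentials.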
- The URL used by get_comments to retrieve the comments on a given article is taken from a blog post by Neal Caren.
- The NYT Article Search API is used for the article search.
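
For readers unfamiliar with the Article Search API, the sketch below shows how options such as keywords, filter queries, a date range, and sort order might map onto a request URL. It is an illustration only, not the package's actual implementation: the function build_search_url is hypothetical, and the endpoint and parameter names (q, fq, begin_date, end_date, sort, page, api-key) are assumed from version 2 of the public Article Search API.

```python
from urllib.parse import urlencode

# Assumed endpoint of the public Article Search API (v2).
ARTICLE_SEARCH_URL = "https://api.nytimes.com/svc/search/v2/articlesearch.json"


def build_search_url(api_key, q=None, fq=None, begin_date=None,
                     end_date=None, sort=None, page=0):
    """Build an Article Search request URL (hypothetical helper)."""
    if sort not in (None, "newest", "oldest"):
        raise ValueError("sort must be 'newest' or 'oldest'")
    params = [
        ("q", q),                    # search keywords
        ("fq", fq),                  # filter query, e.g. 'day_of_week:("Monday")'
        ("begin_date", begin_date),  # YYYYMMDD
        ("end_date", end_date),      # YYYYMMDD
        ("sort", sort),              # "newest" or "oldest"
        ("page", page),              # zero-based page of results
        ("api-key", api_key),
    ]
    # Drop unset options so only the requested filters are sent.
    query = urlencode([(k, v) for k, v in params if v is not None])
    return ARTICLE_SEARCH_URL + "?" + query


# Build (but do not send) a request for the second page of newest results.
url = build_search_url("YOUR_KEY", q="climate", sort="newest", page=1)
print(url)
```

The function only builds the URL, so it can be inspected or tested without network access or a valid key.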