Skip to main content

Tag your pocket articles from getpocket.com automatically using NLP

Project description

Auto Pocket Tagger

Use Google cloud's Natural Language Processing API to automatically analyze the webpage from articles saved in your Pocket list, derive tags/keywords based on the content of the page, and add tags to the articles in Pocket list for free.

Pocket has suggested tags service for their paid premium plans. You can find more about it here. This still requires manual work of adding the tags to each article one-by-one. This package automates all of it for free.

Features

Usage

Installation

Install published version from pypi

$ pip install pocket-tagger

Install latest version from git

$ pip install git+https://github.com/sanghviharshit/pocket-tagger

Prerequisites

Google Cloud

This package relies on Google cloud natural language processing API, which requires billing enabled on your project. You can find the quickstart instructions here Options:

  1. Create a service account and download the credentials file - https://cloud.google.com/video-intelligence/docs/common/auth
tagger = PocketTagger(gcloud_credentials_file="gcloud_credentials_file.json")
  1. or Configure gloud locally - https://cloud.google.com/sdk/gcloud/reference/init
tagger = PocketTagger()

Pocket API

To fetch the articles list and add tags, you need a developer key from here Create a new Application with modify and retrieve permissions. Save the Consumer Key and Access Token.

tagger = PocketTagger(consumer_key='your-consumer-key',
                access_token='your-access-token')

Examples

# Initialize PocketTagger with GCloud and Pocket API Credentials
tagger = PocketTagger(gcloud_credentials_file="gcloud_credentials_file.json",
                consumer_key='pocket-consumer-key',
                access_token='pocket-access-token')

# Check https://getpocket.com/developer/docs/v3/retrieve for additional list of options you can pass for retrieving pocket list
articles = tagger.get_articles_from_api(count=10, offset=10, detailType='complete')

# Alternatively you can load the articles from file if you saved them previously using save_articles_to_file
# articles = tagger.get_articles_from_file("20190621.json")
# Generate tags for each article
articles_with_tags = tagger.get_tags_for_articles(articles)

# Save the articles with tags to file. You can use this file to verify it looks good before running the final step to tag the articles.
tagger.save_articles_to_file(today.strftime('%Y%m%d-with-tags.json'), articles_with_tags)

# You can skip this step if you want to do a dry run. Verify the tags in the file we generated in the previous step.
tagger.add_tags_to_articles(articles_with_tags)

Optional overrides

You can override the default thresholds for entity salience and category confidence

thresholds = {
  'entity_salience_threshold': 0.7
  'category_confidence_threshold': 0.3
}
articles_with_tags = tagger.get_tags_for_articles(articles, thresholds)

Sample

Sample output from running it for my 490 items long Pocket list

X under Entities or Categories denotes the NLP client returned those as potential candidates, but we skipped them because it didn't meet the threshold. You can see the last line Tags: abc, xyz for list of tags pocket-tagger added for each URL.

(1/490) https://www.reddit.com/r/explainlikeimfive/comments/bvweym/eli5_why_do_coffee_drinkers_feel_more_clear/?utm_source=share&utm_medium=ios_app
         Title: ELI5: Why do coffee drinkers feel more clear headed after consuming caffeine? Why do some get a headache without it? Does caffeine cause any permanent brain changes and can the brain go back to 'normal' after years of caffeine use? : explainlikeimfive
         Description: r/explainlikeimfive: **Explain Like I'm Five is the best forum and archive on the internet for layperson-friendly explanations.**   Don't Panic!
         Entities:
            X Coffee Drinkers: 0.2438652664422989
            X Eli5: 0.14941969513893127
            X Caffeine: 0.12065556645393372
            X Caffeine: 0.0874909833073616
            X Some: 0.06917785853147507
            X Headache: 0.0606028214097023
            X Brain: 0.03606536239385605
            X Explainlikeimfive: 0.033727116882801056
            X Brain Changes: 0.03211209550499916
            X Caffeine Use: 0.029848895967006683
            X R: 0.02966366335749626
            X Forum: 0.028598546981811523
            X Internet: 0.022404097020626068
            X Archive: 0.022404097020626068
            X Explainlikeimfive: 0.017647551372647285
            X Don'T Panic: 0.009302889928221703
            X Five: 0.007013489492237568
            X Five: 0.0
         Categories:
              /Food & Drink/Beverages/Coffee & Tea: 0.6700000166893005
         Tags: Food & Drink, Beverages, Coffee & Tea
(2/490) https://www.reddit.com/r/television/comments/bnpwe3/enjoy_three_full_minutes_of_the_cast_of_game_of/?utm_source=share&utm_medium=ios_app
         Title: Enjoy three full minutes of the cast of 'Game of Thrones' expressing disappointment in Season 8. : television
         Description: r/television:
         Entities:
            X Cast: 0.31218624114990234
            X Disappointment: 0.20341947674751282
            X Season: 0.20341947674751282
            X Game Of Thrones: 0.13265934586524963
            X Television: 0.08712445199489594
            X Television: 0.06119102984666824
            X 8: 0.0
            X Three: 0.0
         Categories:
              /Arts & Entertainment/TV & Video/TV Shows & Programs: 0.75
         Tags: Arts & Entertainment, TV & Video, TV Shows & Programs
(3/490) https://www.reddit.com/r/homeautomation/comments/awvf5r/local_realtime_person_detection_for_rtsp_cameras/
         Title: Local realtime person detection for RTSP cameras : homeautomation
         Description: r/homeautomation: A subreddit focused on automating your home, housework or household activity. Sensors, switches, cameras, locks, etc. Any …
         Entities:
            X Realtime Person Detection: 0.3057926297187805
            X Homeautomation: 0.15315502882003784
            X Cameras: 0.14035314321517944
            X Rtsp: 0.07461880147457123
            X Homeautomation: 0.051411159336566925
            X Home: 0.047811269760131836
            X Housework: 0.04366889223456383
            X Subreddit: 0.04183248057961464
            X R: 0.04132793843746185
            X Cameras: 0.032860007137060165
            X Locks: 0.028899790719151497
            X Household Activity: 0.012798599898815155
            X Switches: 0.012735127471387386
            X Sensors: 0.012735127471387386
         Categories:
              /Computers & Electronics: 0.7900000214576721
         Tags: Computers & Electronics

References

Project details


Download files

Download the file for your platform. If you're not sure which to choose, learn more about installing packages.

Source Distribution

pocket-tagger-0.1.1.tar.gz (8.1 kB view details)

Uploaded Source

File details

Details for the file pocket-tagger-0.1.1.tar.gz.

File metadata

  • Download URL: pocket-tagger-0.1.1.tar.gz
  • Upload date:
  • Size: 8.1 kB
  • Tags: Source
  • Uploaded using Trusted Publishing? No
  • Uploaded via: twine/1.13.0 pkginfo/1.5.0.1 requests/2.21.0 setuptools/40.4.3 requests-toolbelt/0.9.1 tqdm/4.31.1 CPython/3.6.3

File hashes

Hashes for pocket-tagger-0.1.1.tar.gz
Algorithm Hash digest
SHA256 0ae5a7868c331b504cafb46596278d6f638b337d5aef9f23c443fde47414555f
MD5 efa89980db80d221dcbb09d1e76c4903
BLAKE2b-256 ecb3cf84b113b5d1b495779b70bc04a1454e8a2d0af47452960cd03cf704ac71

See more details on using hashes here.

Supported by

AWS AWS Cloud computing and Security Sponsor Datadog Datadog Monitoring Fastly Fastly CDN Google Google Download Analytics Microsoft Microsoft PSF Sponsor Pingdom Pingdom Monitoring Sentry Sentry Error logging StatusPage StatusPage Status page